From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11])
	by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p665A44G078511
	for ; Wed, 6 Jul 2011 00:10:04 -0500
Received: from mga14.intel.com (localhost [127.0.0.1])
	by cuda.sgi.com (Spam Firewall) with ESMTP id 470E0E792D1
	for ; Tue, 5 Jul 2011 21:53:15 -0700 (PDT)
Received: from mga14.intel.com (mga14.intel.com [143.182.124.37])
	by cuda.sgi.com with ESMTP id LXOxLS5QzF3Ek26O
	for ; Tue, 05 Jul 2011 21:53:15 -0700 (PDT)
Date: Tue, 5 Jul 2011 21:53:01 -0700
From: Wu Fengguang
Subject: Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering
Message-ID: <20110706045301.GA11604@localhost>
References: <20110629140109.003209430@bombadil.infradead.org>
	<20110629140336.950805096@bombadil.infradead.org>
	<20110701022248.GM561@dastard>
	<20110701041851.GN561@dastard>
	<20110701093305.GA28531@infradead.org>
	<20110701154136.GA17881@localhost>
	<20110704032534.GD1026@dastard>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20110704032534.GD1026@dastard>
List-Id: XFS Filesystem from SGI
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Sender: xfs-bounces@oss.sgi.com
Errors-To: xfs-bounces@oss.sgi.com
To: Dave Chinner
Cc: Christoph Hellwig , "linux-mm@kvack.org" , "xfs@oss.sgi.com" , Mel Gorman , Johannes Weiner

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243])
	by kanga.kvack.org (Postfix) with SMTP id 4A8849000C2
	for ; Wed, 6 Jul 2011 00:53:17 -0400 (EDT)
Date: Tue, 5 Jul 2011 21:53:01 -0700
From: Wu Fengguang
Subject: Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering
Message-ID: <20110706045301.GA11604@localhost>
References: <20110629140109.003209430@bombadil.infradead.org>
	<20110629140336.950805096@bombadil.infradead.org>
	<20110701022248.GM561@dastard>
	<20110701041851.GN561@dastard>
	<20110701093305.GA28531@infradead.org>
	<20110701154136.GA17881@localhost>
	<20110704032534.GD1026@dastard>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20110704032534.GD1026@dastard>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Dave Chinner
Cc: Christoph Hellwig , Mel Gorman , Johannes Weiner , "xfs@oss.sgi.com" , "linux-mm@kvack.org"

On Mon, Jul 04, 2011 at 11:25:34AM +0800, Dave Chinner wrote:
> On Fri, Jul 01, 2011 at 11:41:36PM +0800, Wu Fengguang wrote:
> > Christoph,
> >
> > On Fri, Jul 01, 2011 at 05:33:05PM +0800, Christoph Hellwig wrote:
> > > Johannes, Mel, Wu,
> > >
> > > Dave has been stressing some XFS patches of mine that remove the XFS
> > > internal writeback clustering in favour of using write_cache_pages.
> > >
> > > As part of investigating the behaviour he found out that we're still
> > > doing lots of I/O from the end of the LRU in kswapd. Not only is that
> > > pretty bad behaviour in general, but it also means we really can't
> > > just remove the writeback clustering in writepage given how much
> > > I/O is still done through that.
> > >
> > > Any chance we could the writeback vs kswap behaviour sorted out a bit
> > > better finally?
> >
> > I once tried this approach:
> >
> > http://www.spinics.net/lists/linux-mm/msg09202.html
> >
> > It used a list structure that is not linearly scalable, however that
> > part should be independently improvable when necessary.
>
> I don't think that handing random writeback to the flusher thread is
> much better than doing random writeback directly. Yes, you added
> some clustering, but I'm still don't think writing specific pages is
> the best solution.

I agree that the VM should avoid writing specific pages as much as
possible. Most often, it's indeed OK to just skip sporadically
encountered dirty pages and reclaim the clean pages that are presumably
not far away in the LRU list. So your 2-liner patch is all good if it is
constrained to low scan pressure, which will look like

        if (priority == DEF_PRIORITY)
                tag PG_reclaim on encountered dirty pages and
                skip writing it

However the VM in general does need the ability to write specific
pages, such as when reclaiming from specific zone/memcg. So I'll still
propose to do bdi_start_inode_writeback().

Below is the patch rebased to linux-next. It's good enough for testing
purposes, and I guess even with the ->nr_pages work issue, it's
complete enough to get roughly the same performance as your 2-liner
patch.

> > The real problem was, it seem to not very effective in my test runs.
> > I found many ->nr_pages works queued before the ->inode works, which
> > effectively makes the flusher working on more dispersed pages rather
> > than focusing on the dirty pages encountered in LRU reclaim.
>
> But that's really just an implementation issue related to how you
> tried to solve the problem. That could be addressed.
>
> However, what I'm questioning is whether we should even care what
> page memory reclaim wants to write - it seems to make fundamentally
> bad decisions from an IO persepctive.
>
> We have to remember that memory reclaim is doing LRU reclaim and the
> flusher threads are doing "oldest first" writeback. IOWs, both are trying
> to operate in the same direction (oldest to youngest) for the same
> purpose. The fundamental problem that occurs when memory reclaim
> starts writing pages back from the LRU is this:
>
>       - memory reclaim has run ahead of IO writeback -
>
> The LRU usually looks like this:
>
>       oldest                                  youngest
>       +---------------+---------------+--------------+
>       clean           writeback       dirty
>                       ^               ^
>                       |               |
>                       |               Where flusher will next work from
>                       |               Where kswapd is working from
>                       |
>                       IO submitted by flusher, waiting on completion
>
>
> If memory reclaim is hitting dirty pages on the LRU, it means it has
> got ahead of writeback without being throttled - it's passed over
> all the pages currently under writeback and is trying to write back
> pages that are *newer* than what writeback is working on. IOWs, it
> starts trying to do the job of the flusher threads, and it does that
> very badly.
>
> The $100 question is *why is it getting ahead of writeback*?

The most important case is: faster reader + relatively slow writer.

Assume for every 10 pages read, 1 page is dirtied, and the dirty speed
is fast enough to trigger the 20% dirty ratio and hence dirty balancing.

That pattern is able to evenly distribute dirty pages all over the LRU
list and hence trigger lots of pageout()s. The "skip reclaim writes on
low pressure" approach can fix this case.

Thanks,
Fengguang
---
Subject: writeback: introduce bdi_start_inode_writeback()
Date: Thu Jul 29 14:41:19 CST 2010

This relays ASYNC file writeback IOs to the flusher threads.

pageout() will continue to serve the SYNC file page writes for necessary
throttling for preventing OOM, which may happen if the LRU list is small
and/or the storage is slow, so that the flusher cannot clean enough
pages before the LRU is fully scanned.

Only ASYNC pageout() is relayed to the flusher threads, the less
frequent SYNC pageout()s will work as before as a last resort.
This helps to avoid OOM when the LRU list is small and/or the storage is
slow, and the flusher cannot clean enough pages before the LRU is
fully scanned.

The flusher will piggy back more dirty pages for IO
- it's more IO efficient
- it helps clean more pages, a good number of them may sit in the same
  LRU list that is being scanned.

To avoid memory allocations at page reclaim, a mempool is created.

Background/periodic works will quit automatically (as done in another
patch), so as to clean the pages under reclaim ASAP. However for now the
sync work can still block us for a long time.

Jan Kara: limit the search scope.

CC: Jan Kara <jack@suse.cz>
CC: Rik van Riel <riel@redhat.com>
CC: Mel Gorman <mel@linux.vnet.ibm.com>
CC: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c                |  156 ++++++++++++++++++++++++++++-
 include/linux/backing-dev.h      |    1
 include/trace/events/writeback.h |   15 ++
 mm/vmscan.c                      |    8 +
 4 files changed, 174 insertions(+), 6 deletions(-)

--- linux-next.orig/mm/vmscan.c	2011-06-29 20:43:10.000000000 -0700
+++ linux-next/mm/vmscan.c	2011-07-05 18:30:19.000000000 -0700
@@ -825,6 +825,14 @@ static unsigned long shrink_page_list(st
 		if (PageDirty(page)) {
 			nr_dirty++;
 
+			if (page_is_file_cache(page) && mapping &&
+			    sc->reclaim_mode != RECLAIM_MODE_SYNC) {
+				if (flush_inode_page(page, mapping) >= 0) {
+					SetPageReclaim(page);
+					goto keep_locked;
+				}
+			}
+
 			if (references == PAGEREF_RECLAIM_CLEAN)
 				goto keep_locked;
 			if (!may_enter_fs)
--- linux-next.orig/fs/fs-writeback.c	2011-07-05 18:30:16.000000000 -0700
+++ linux-next/fs/fs-writeback.c	2011-07-05 18:30:52.000000000 -0700
@@ -30,12 +30,21 @@
 #include "internal.h"
 
 /*
+ * When flushing an inode page (for page reclaim), try to piggy back up to
+ * 4MB nearby pages for IO efficiency. These pages will have good opportunity
+ * to be in the same LRU list.
+ */
+#define WRITE_AROUND_PAGES	MIN_WRITEBACK_PAGES
+
+/*
  * Passed into wb_writeback(), essentially a subset of writeback_control
  */
 struct wb_writeback_work {
 	long nr_pages;
 	struct super_block *sb;
 	unsigned long *older_than_this;
+	struct inode *inode;
+	pgoff_t offset;
 	enum writeback_sync_modes sync_mode;
 	unsigned int tagged_writepages:1;
 	unsigned int for_kupdate:1;
@@ -59,6 +68,27 @@ struct wb_writeback_work {
  */
 int nr_pdflush_threads;
 
+static mempool_t *wb_work_mempool;
+
+static void *wb_work_alloc(gfp_t gfp_mask, void *pool_data)
+{
+	/*
+	 * bdi_start_inode_writeback() may be called on page reclaim
+	 */
+	if (current->flags & PF_MEMALLOC)
+		return NULL;
+
+	return kmalloc(sizeof(struct wb_writeback_work), gfp_mask);
+}
+
+static __init int wb_work_init(void)
+{
+	wb_work_mempool = mempool_create(1024,
+					 wb_work_alloc, mempool_kfree, NULL);
+	return wb_work_mempool ? 0 : -ENOMEM;
+}
+fs_initcall(wb_work_init);
+
 /**
  * writeback_in_progress - determine whether there is writeback in progress
  * @bdi: the device's backing_dev_info structure.
@@ -123,7 +153,7 @@ __bdi_start_writeback(struct backing_dev
 	 * This is WB_SYNC_NONE writeback, so if allocation fails just
 	 * wakeup the thread for old dirty data writeback
 	 */
-	work = kzalloc(sizeof(*work), GFP_ATOMIC);
+	work = mempool_alloc(wb_work_mempool, GFP_NOWAIT);
 	if (!work) {
 		if (bdi->wb.task) {
 			trace_writeback_nowork(bdi);
@@ -132,6 +162,7 @@ __bdi_start_writeback(struct backing_dev
 		return;
 	}
 
+	memset(work, 0, sizeof(*work));
 	work->sync_mode	= WB_SYNC_NONE;
 	work->nr_pages	= nr_pages;
 	work->range_cyclic = range_cyclic;
@@ -177,6 +208,107 @@ void bdi_start_background_writeback(stru
 	spin_unlock_bh(&bdi->wb_lock);
 }
 
+static bool extend_writeback_range(struct wb_writeback_work *work,
+				   pgoff_t offset)
+{
+	pgoff_t end = work->offset + work->nr_pages;
+
+	if (offset >= work->offset && offset < end)
+		return true;
+
+	/* the unsigned comparison helps eliminate one compare */
+	if (work->offset - offset < WRITE_AROUND_PAGES) {
+		work->nr_pages += WRITE_AROUND_PAGES;
+		work->offset -= WRITE_AROUND_PAGES;
+		return true;
+	}
+
+	if (offset - end < WRITE_AROUND_PAGES) {
+		work->nr_pages += WRITE_AROUND_PAGES;
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * schedule writeback on a range of inode pages.
+ */
+static struct wb_writeback_work *
+bdi_flush_inode_range(struct backing_dev_info *bdi,
+		      struct inode *inode,
+		      pgoff_t offset,
+		      pgoff_t len)
+{
+	struct wb_writeback_work *work;
+
+	if (!igrab(inode))
+		return ERR_PTR(-ENOENT);
+
+	work = mempool_alloc(wb_work_mempool, GFP_NOWAIT);
+	if (!work)
+		return ERR_PTR(-ENOMEM);
+
+	memset(work, 0, sizeof(*work));
+	work->sync_mode		= WB_SYNC_NONE;
+	work->inode		= inode;
+	work->offset		= offset;
+	work->nr_pages		= len;
+
+	bdi_queue_work(bdi, work);
+
+	return work;
+}
+
+/*
+ * Called by page reclaim code to flush the dirty page ASAP. Do write-around to
+ * improve IO throughput. The nearby pages will have good chance to reside in
+ * the same LRU list that vmscan is working on, and even close to each other
+ * inside the LRU list in the common case of sequential read/write.
+ *
+ * ret > 0: success, found/reused a previous writeback work
+ * ret = 0: success, allocated/queued a new writeback work
+ * ret < 0: failed
+ */
+long flush_inode_page(struct page *page, struct address_space *mapping)
+{
+	struct backing_dev_info *bdi = mapping->backing_dev_info;
+	struct inode *inode = mapping->host;
+	pgoff_t offset = page->index;
+	pgoff_t len = 0;
+	struct wb_writeback_work *work;
+	long ret = -ENOENT;
+
+	if (unlikely(!inode))
+		goto out;
+
+	len = 1;
+	spin_lock_bh(&bdi->wb_lock);
+	list_for_each_entry_reverse(work, &bdi->work_list, list) {
+		if (work->inode != inode)
+			continue;
+		if (extend_writeback_range(work, offset)) {
+			ret = len;
+			offset = work->offset;
+			len = work->nr_pages;
+			break;
+		}
+		if (len++ > 30)	/* do limited search */
+			break;
+	}
+	spin_unlock_bh(&bdi->wb_lock);
+
+	if (ret > 0)
+		goto out;
+
+	offset = round_down(offset, WRITE_AROUND_PAGES);
+	len = WRITE_AROUND_PAGES;
+	work = bdi_flush_inode_range(bdi, inode, offset, len);
+	ret = IS_ERR(work) ? PTR_ERR(work) : 0;
+out:
+	return ret;
+}
+
 /*
  * Remove the inode from the writeback list it is on.
*/ @@ -830,6 +962,21 @@ static unsigned long get_nr_dirty_pages( get_nr_dirty_inodes(); } +static long wb_flush_inode(struct bdi_writeback *wb, + struct wb_writeback_work *work) +{ + loff_t start = work->offset; + loff_t end = work->offset + work->nr_pages - 1; + int wrote; + + wrote = __filemap_fdatawrite_range(work->inode->i_mapping, + start << PAGE_CACHE_SHIFT, + end << PAGE_CACHE_SHIFT, + WB_SYNC_NONE); + iput(work->inode); + return wrote; +} + static long wb_check_background_flush(struct bdi_writeback *wb) { if (over_bground_thresh()) { @@ -900,7 +1047,10 @@ long wb_do_writeback(struct bdi_writebac trace_writeback_exec(bdi, work); - wrote += wb_writeback(wb, work); + if (work->inode) + wrote += wb_flush_inode(wb, work); + else + wrote += wb_writeback(wb, work); /* * Notify the caller of completion if this is a synchronous @@ -909,7 +1059,7 @@ long wb_do_writeback(struct bdi_writebac if (work->done) complete(work->done); else - kfree(work); + mempool_free(work, wb_work_mempool); } /* --- linux-next.orig/include/linux/backing-dev.h 2011-07-03 20:03:37.000000000 -0700 +++ linux-next/include/linux/backing-dev.h 2011-07-05 18:30:19.000000000 -0700 @@ -109,6 +109,7 @@ void bdi_unregister(struct backing_dev_i int bdi_setup_and_register(struct backing_dev_info *, char *, unsigned int); void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages); void bdi_start_background_writeback(struct backing_dev_info *bdi); +long flush_inode_page(struct page *page, struct address_space *mapping); int bdi_writeback_thread(void *data); int bdi_has_dirty_io(struct backing_dev_info *bdi); void bdi_arm_supers_timer(void); --- linux-next.orig/include/trace/events/writeback.h 2011-07-05 18:30:16.000000000 -0700 +++ linux-next/include/trace/events/writeback.h 2011-07-05 18:30:19.000000000 -0700 @@ -28,31 +28,40 @@ DECLARE_EVENT_CLASS(writeback_work_class TP_ARGS(bdi, work), TP_STRUCT__entry( __array(char, name, 32) + __field(struct wb_writeback_work*, work) __field(long, 
nr_pages) __field(dev_t, sb_dev) __field(int, sync_mode) __field(int, for_kupdate) __field(int, range_cyclic) __field(int, for_background) + __field(unsigned long, ino) + __field(unsigned long, offset) ), TP_fast_assign( strncpy(__entry->name, dev_name(bdi->dev), 32); + __entry->work = work; __entry->nr_pages = work->nr_pages; __entry->sb_dev = work->sb ? work->sb->s_dev : 0; __entry->sync_mode = work->sync_mode; __entry->for_kupdate = work->for_kupdate; __entry->range_cyclic = work->range_cyclic; __entry->for_background = work->for_background; + __entry->ino = work->inode ? work->inode->i_ino : 0; + __entry->offset = work->offset; ), - TP_printk("bdi %s: sb_dev %d:%d nr_pages=%ld sync_mode=%d " - "kupdate=%d range_cyclic=%d background=%d", + TP_printk("bdi %s: sb_dev %d:%d %p nr_pages=%ld sync_mode=%d " + "kupdate=%d range_cyclic=%d background=%d ino=%lu offset=%lu", __entry->name, MAJOR(__entry->sb_dev), MINOR(__entry->sb_dev), + __entry->work, __entry->nr_pages, __entry->sync_mode, __entry->for_kupdate, __entry->range_cyclic, - __entry->for_background + __entry->for_background, + __entry->ino, + __entry->offset ) ); #define DEFINE_WRITEBACK_WORK_EVENT(name) \ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org
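
For anyone following the write-around merge test in extend_writeback_range(), here is a minimal userspace sketch of the same three-way check. It is not kernel code and not part of the patch. The names struct range and extend_range() are hypothetical, and the WRITE_AROUND_PAGES value of 1024 is an assumption made only so the sketch compiles (the patch defines its own value elsewhere). The clamp at page 0 when growing downward is an addition of this sketch; the patch subtracts WRITE_AROUND_PAGES unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value, for illustration only; the patch defines its own. */
#define WRITE_AROUND_PAGES 1024UL

/* Hypothetical stand-in for the offset/nr_pages pair of a writeback work. */
struct range {
	unsigned long offset;	/* first page index covered */
	unsigned long nr_pages;	/* number of pages covered */
};

/*
 * Return true if page index `offset` is already inside the range, or is
 * within WRITE_AROUND_PAGES of either edge, in which case the range is
 * grown by one chunk on that side. Return false if it is too far away.
 */
bool extend_range(struct range *r, unsigned long offset)
{
	unsigned long end;

	assert(r != NULL);
	end = r->offset + r->nr_pages;

	/* Already covered: nothing to do. */
	if (offset >= r->offset && offset < end)
		return true;

	/*
	 * Just below the range. When offset lies above the range the
	 * unsigned subtraction wraps to a huge value, so one compare
	 * covers both "below" and "close enough".
	 */
	if (r->offset - offset < WRITE_AROUND_PAGES) {
		/* Sketch-only clamp so the start cannot wrap below page 0. */
		unsigned long grow = r->offset < WRITE_AROUND_PAGES ?
				     r->offset : WRITE_AROUND_PAGES;

		r->offset -= grow;
		r->nr_pages += grow;
		return true;
	}

	/* Just above the range; wraps to a huge value if offset is below. */
	if (offset - end < WRITE_AROUND_PAGES) {
		r->nr_pages += WRITE_AROUND_PAGES;
		return true;
	}

	return false;
}
```

Starting from a range that covers pages 2048..3071, a request for page 2000 grows the range downward so it starts at page 1024, and a following request for page 3100 grows it upward so it ends at page 4095. A request for page 100000 is rejected, which in the patch is the case where flush_inode_page() falls through and queues a new work item.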