Message-ID: <494840C4.50000@cosmosbay.com>
Date: Wed, 17 Dec 2008 00:59:00 +0100
From: Eric Dumazet
To: Rusty Russell
CC: David Miller , rostedt@goodmis.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, mathieu.desnoyers@polymtl.ca, paulus@samba.org, benh@kernel.crashing.org, linux-ia64@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: local_add_return

Rusty Russell wrote:
> On Tuesday 16 December 2008 17:43:14 David Miller wrote:
>> Here ya go:
>
> Very interesting. There's a little noise there (that first local_inc of 243
> is wrong), but the picture is clear: trivalue is the best implementation for
> sparc64.
>
> Note: trivalue uses 3 values, so instead of hitting random values across 8MB
> it's across 24MB, and despite the resulting cache damage it's 15% faster. The
> cpu_local_inc test is a single value, so no cache effects: it shows trivalue
> to be 3 to 3.5 times faster in the cache-hot case.
>
> This sucks, because it really does mean that there's no one-size-fits-all
> implementation of local_t. There's also no platform yet where atomic_long_t
> is the right choice; and that's the default!
>
> Any chance of an IA64 or s390 run? You can normalize if you like, since
> it's only to compare the different approaches.
>
> Cheers,
> Rusty.
>
> Benchmarks for local_t variants
>
> (This patch also fixes the x86 cpu_local_* macros, which are obviously
> unused).
>
> I chose a large array (1M longs) for the inc/add/add_return tests so
> the trivalue case would show some cache pressure.
>
> The cpu_local_inc case is always cache-hot, so it's not comparable to
> the others.

It would be good to differentiate the results by whether the data is already in cache or not...

>
> Time in ns per iteration (brackets is with CONFIG_PREEMPT=y):
>
>                 inc     add     add_return      cpu_local_inc   read
> x86-32: 2.13 Ghz Core Duo 2
> atomic_long     118     118     115             17              17

It is really strange that atomic_long performs so badly here.
LOCK + data not in cache -> really, really bad...

> irqsave/rest    77      78      77              23              16
> trivalue        45      45      127             3(6)            21
> local_t         36      36      36              1(5)            17
>
> x86-64: 2.6 GHz Dual-Core AMD Opteron 2218
> atomic_long     55      60      -               6               19
> irqsave/rest    54      54      -               11              19
> trivalue        47      47      -               5               28
> local_t         47      46      -               1               19
>

Running local_t variant benchmarks
atomic_long: local_inc=395001846/11 local_add=395000325/11 cpu_local_inc=362000295/10 local_read=49000040/1 local_add_return=396000322/11 (total was 1728053248)
irqsave/restore: local_inc=498000400/14 local_add=496000395/14 cpu_local_inc=486000384/14 local_read=68000054/2 local_add_return=502000394/14 (total was 1728053248)
trivalue: local_inc=1325001024/39 local_add=1324001226/39 cpu_local_inc=81000080/2 local_read=786000766/23 local_add_return=4193003781/124 (total was 1728053248)
local_t: local_inc=69000059/2 local_add=69000058/2 cpu_local_inc=42000035/1 local_read=50000043/1 local_add_return=90000076/2 (total was 1728053248, warm_total 62914562)

Intel(R) Xeon(R) CPU E5450 @ 3.00GHz

two quadcore cpus, x86-32 kernel

It seems Core2 are really better than Core Duo 2,
or their cache is big enough to hold the array of your test...

(at least for l1 & l2, their 4 Mbytes working set fits in cache)

processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU E5450 @ 3.00GHz
stepping        : 6
cpu MHz         : 3000.099
cache size      : 6144 KB    <<<< yes, that's big :) >>>>

If I double the size of the working set

#define NUM_LOCAL_TEST (2*1024*1024)

then I get quite different numbers:

Running local_t variant benchmarks
atomic_long: local_inc=6729007264/100 local_add=6727005943/100 cpu_local_inc=724000569/10 local_read=1030000784/15 local_add_return=6623004616/98 (total was 3456106496)
irqsave/restore: local_inc=4458002796/66 local_add=4459001998/66 cpu_local_inc=971000381/14 local_read=1060000389/15 local_add_return=4528001388/67 (total was 3456106496)
trivalue: local_inc=2871000855/42 local_add=2867000976/42 cpu_local_inc=162000052/2 local_read=1747000551/26 local_add_return=8829002352/131 (total was 3456106496)
local_t: local_inc=2210000492/32 local_add=2206000460/32 cpu_local_inc=84000017/1 local_read=1029000203/15 local_add_return=2216000415/33 (total was 3456106496, warm_total 125829124)

If I now reduce NUM_LOCAL_TEST to 256*1024, so that even trivalue l3 fits in cache:

Running local_t variant benchmarks
atomic_long: local_inc=98984929/11 local_add=98984889/11 cpu_local_inc=89986248/10 local_read=11998165/1 local_add_return=99003292/11 (total was 2579496960)
irqsave/restore: local_inc=124000102/14 local_add=124000102/14 cpu_local_inc=121000100/14 local_read=17000013/2 local_add_return=126000103/15 (total was 2579496960)
trivalue: local_inc=21000017/2 local_add=20000016/2 cpu_local_inc=20000017/2 local_read=25000021/2 local_add_return=136000110/16 (total was 2579496960)
local_t: local_inc=17000014/2 local_add=17000015/2 cpu_local_inc=11000009/1 local_read=12000010/1 local_add_return=23000019/2 (total was 2579496960, warm_total 15728642)

About trivalues: using them for percpu_counter local storage (one trivalue for each cpu)
would make the accuracy a little bit more lazy...