From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752958AbZKIHgx (ORCPT ); Mon, 9 Nov 2009 02:36:53 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752802AbZKIHgx (ORCPT ); Mon, 9 Nov 2009 02:36:53 -0500 Received: from fgwmail6.fujitsu.co.jp ([192.51.44.36]:44128 "EHLO fgwmail6.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752417AbZKIHgw (ORCPT ); Mon, 9 Nov 2009 02:36:52 -0500 X-SecurityPolicyCheck-FJ: OK by FujitsuOutboundMailChecker v1.3.1 Date: Mon, 9 Nov 2009 16:34:20 +0900 From: KAMEZAWA Hiroyuki To: balbir@linux.vnet.ibm.com Cc: "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , "nishimura@mxp.nes.nec.co.jp" , cl@linux-foundation.org, "akpm@linux-foundation.org" Subject: Re: [PATCH 0/2] memcg make use of new percpu implementations Message-Id: <20091109163420.5a2fa803.kamezawa.hiroyu@jp.fujitsu.com> In-Reply-To: <20091109070421.GD3042@balbir.in.ibm.com> References: <20091106175242.6e13ee29.kamezawa.hiroyu@jp.fujitsu.com> <20091109070421.GD3042@balbir.in.ibm.com> Organization: FUJITSU Co. LTD. X-Mailer: Sylpheed 2.5.0 (GTK+ 2.10.14; i686-pc-mingw32) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 9 Nov 2009 12:34:21 +0530 Balbir Singh wrote: > * KAMEZAWA Hiroyuki [2009-11-06 17:52:42]: > > > Hi, > > > > Recent updates on dynamic percpu allocation looks good and I tries to rewrite > > memcg's poor implementation of percpu status counter. > > (It's not NUMA-aware ...) > > Thanks for great works. > > > > For this time. I added Christoph to CC because I'm not fully sure my usage of > > __this_cpu_xxx is correct...I'm glad if you check the usage when you have time. > > > > > > Patch 1/2 is just clean up (prepare for patch 2/2) > > Patch 2/2 is for percpu. > > > > Tested on my 8cpu box and works well. > > Pathcesa are against the latest mmotm. > > How do the test results look? DO you see a significant boost? Because my test enviroment is just an SMP (not NUMA), improvement will not be siginificant (visible), I think. But It's good to make use of recent updates of per_cpu implementation. We can make use of offset calculation methods of them and one-instruction-access macros. This is code size (PREEMPT=y) [Before patch] [kamezawa@bluextal mmotm-2.6.32-Nov2]$ size mm/memcontrol.o text data bss dec hex filename 22403 3420 4132 29955 7503 mm/memcontrol.o [After patch] [kamezawa@bluextal mmotm-2.6.32-Nov2]$ size mm/memcontrol.o text data bss dec hex filename 22188 3420 4132 29740 742c mm/memcontrol.o Then, text size is surely reduced. One example is mem_cgroup_swap_statistics(). This function is inlined after this patch. this code: mem_cgroup_uncharge_swap(), modifies percpu counter. == memcg = mem_cgroup_lookup(id); if (memcg) { /* * We uncharge this because swap is freed. * This memcg can be obsolete one. We avoid calling css_tryget */ if (!mem_cgroup_is_root(memcg)) res_counter_uncharge(&memcg->memsw, PAGE_SIZE); mem_cgroup_swap_statistics(memcg, false); mem_cgroup_put(memcg); } == [before patch] mem_cgroup_swap_statistics() is not inlined and uses 0x69 bytes. 0000000000001d30 : 1d30: push %rbp 1d31: mov %rsp,%rbp 1d34: push %r12 1d88: callq 1d8d 1d8d: nopl (%rax) 1d90: jmp 1d83 1d92: nopw %cs:0x0(%rax,%rax,1) 1d99: [After patch] mem_cgroup_uncharge_swap()'s inlined code. 3b67: cmp 0x0(%rip),%rax # 3b6e 3b6e: je 3b81 <=== check mem_cgroup_is_root 3b70: lea 0x90(%rax),%rdi 3b77: mov $0x1000,%esi 3b7c: callq 3b81 <=== calling res_counter_uncahrge() 3b81: 48 89 df mov %rbx,%rdi 3b84: 48 8b 83 70 01 00 00 mov 0x170(%rbx),%rax <=== get offset of mem->cpustat 3b8b: 65 48 83 40 30 ff addq $0xffffffffffffffff,%gs:0x30(%rax) mem->cpustat.count[index]--; 3b91: e8 6a e0 ff ff callq 1c00 This uses 2 instruction. Then, code size reduction is enough large, I think. >BTW, I've been experimenting a bit with the earlier percpu counter patches, > I might post an iteration once I have some good results. > Thank you, it's helpful. Regards, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35]) by kanga.kvack.org (Postfix) with SMTP id 3653D6B004D for ; Mon, 9 Nov 2009 02:37:00 -0500 (EST) Received: from m2.gw.fujitsu.co.jp ([10.0.50.72]) by fgwmail5.fujitsu.co.jp (Fujitsu Gateway) with ESMTP id nA97avhL008919 for (envelope-from kamezawa.hiroyu@jp.fujitsu.com); Mon, 9 Nov 2009 16:36:57 +0900 Received: from smail (m2 [127.0.0.1]) by outgoing.m2.gw.fujitsu.co.jp (Postfix) with ESMTP id E055D45DE51 for ; Mon, 9 Nov 2009 16:36:56 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (s2.gw.fujitsu.co.jp [10.0.50.92]) by m2.gw.fujitsu.co.jp (Postfix) with ESMTP id B795745DE4F for ; Mon, 9 Nov 2009 16:36:56 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 9E7801DB8038 for ; Mon, 9 Nov 2009 16:36:56 +0900 (JST) Received: from m105.s.css.fujitsu.com (m105.s.css.fujitsu.com [10.249.87.105]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 0E5F71DB803A for ; Mon, 9 Nov 2009 16:36:56 +0900 (JST) Date: Mon, 9 Nov 2009 16:34:20 +0900 From: KAMEZAWA Hiroyuki Subject: Re: [PATCH 0/2] memcg make use of new percpu implementations Message-Id: <20091109163420.5a2fa803.kamezawa.hiroyu@jp.fujitsu.com> In-Reply-To: <20091109070421.GD3042@balbir.in.ibm.com> References: <20091106175242.6e13ee29.kamezawa.hiroyu@jp.fujitsu.com> <20091109070421.GD3042@balbir.in.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: balbir@linux.vnet.ibm.com Cc: "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , "nishimura@mxp.nes.nec.co.jp" , cl@linux-foundation.org, "akpm@linux-foundation.org" List-ID: On Mon, 9 Nov 2009 12:34:21 +0530 Balbir Singh wrote: > * KAMEZAWA Hiroyuki [2009-11-06 17:52:42]: > > > Hi, > > > > Recent updates on dynamic percpu allocation looks good and I tries to rewrite > > memcg's poor implementation of percpu status counter. > > (It's not NUMA-aware ...) > > Thanks for great works. > > > > For this time. I added Christoph to CC because I'm not fully sure my usage of > > __this_cpu_xxx is correct...I'm glad if you check the usage when you have time. > > > > > > Patch 1/2 is just clean up (prepare for patch 2/2) > > Patch 2/2 is for percpu. > > > > Tested on my 8cpu box and works well. > > Pathcesa are against the latest mmotm. > > How do the test results look? DO you see a significant boost? Because my test enviroment is just an SMP (not NUMA), improvement will not be siginificant (visible), I think. But It's good to make use of recent updates of per_cpu implementation. We can make use of offset calculation methods of them and one-instruction-access macros. This is code size (PREEMPT=y) [Before patch] [kamezawa@bluextal mmotm-2.6.32-Nov2]$ size mm/memcontrol.o text data bss dec hex filename 22403 3420 4132 29955 7503 mm/memcontrol.o [After patch] [kamezawa@bluextal mmotm-2.6.32-Nov2]$ size mm/memcontrol.o text data bss dec hex filename 22188 3420 4132 29740 742c mm/memcontrol.o Then, text size is surely reduced. One example is mem_cgroup_swap_statistics(). This function is inlined after this patch. this code: mem_cgroup_uncharge_swap(), modifies percpu counter. == memcg = mem_cgroup_lookup(id); if (memcg) { /* * We uncharge this because swap is freed. * This memcg can be obsolete one. We avoid calling css_tryget */ if (!mem_cgroup_is_root(memcg)) res_counter_uncharge(&memcg->memsw, PAGE_SIZE); mem_cgroup_swap_statistics(memcg, false); mem_cgroup_put(memcg); } == [before patch] mem_cgroup_swap_statistics() is not inlined and uses 0x69 bytes. 0000000000001d30 : 1d30: push %rbp 1d31: mov %rsp,%rbp 1d34: push %r12 1d88: callq 1d8d 1d8d: nopl (%rax) 1d90: jmp 1d83 1d92: nopw %cs:0x0(%rax,%rax,1) 1d99: [After patch] mem_cgroup_uncharge_swap()'s inlined code. 3b67: cmp 0x0(%rip),%rax # 3b6e 3b6e: je 3b81 <=== check mem_cgroup_is_root 3b70: lea 0x90(%rax),%rdi 3b77: mov $0x1000,%esi 3b7c: callq 3b81 <=== calling res_counter_uncahrge() 3b81: 48 89 df mov %rbx,%rdi 3b84: 48 8b 83 70 01 00 00 mov 0x170(%rbx),%rax <=== get offset of mem->cpustat 3b8b: 65 48 83 40 30 ff addq $0xffffffffffffffff,%gs:0x30(%rax) mem->cpustat.count[index]--; 3b91: e8 6a e0 ff ff callq 1c00 This uses 2 instruction. Then, code size reduction is enough large, I think. >BTW, I've been experimenting a bit with the earlier percpu counter patches, > I might post an iteration once I have some good results. > Thank you, it's helpful. Regards, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org