From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexey Klimov <klimov.linux@gmail.com>
Subject: Re: [PATCH] lib: Make _find_next_bit helper function inline
Date: Mon, 24 Aug 2015 01:53:59 +0300
Message-ID: 
References: <1438110564-19932-1-git-send-email-cburden@codeaurora.org>
 <55B7F2C6.9010000@gmail.com>
 <20150728144537.67d46b5714c99d25f0bb33fb@linux-foundation.org>
 <1438176656.18723.8.camel@ceres>
 <55B93A47.90107@codeaurora.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Return-path: 
Received: from mail-la0-f51.google.com ([209.85.215.51]:33513 "EHLO
 mail-la0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1753637AbbHWWyB (ORCPT );
 Sun, 23 Aug 2015 18:54:01 -0400
In-Reply-To: <55B93A47.90107@codeaurora.org>
Sender: linux-arm-msm-owner@vger.kernel.org
List-Id: linux-arm-msm@vger.kernel.org
To: Cassidy Burden <cburden@codeaurora.org>
Cc: Andrew Morton, Yury, linux-arm-msm@vger.kernel.org,
 Linux Kernel Mailing List, linux-arm-kernel@lists.infradead.org,
 "David S. Miller", Daniel Borkmann, Hannes Frederic Sowa,
 Lai Jiangshan, Mark Salter, AKASHI Takahiro, Thomas Graf,
 Valentin Rothberg, Chris Wilson

Hi Cassidy,

On Wed, Jul 29, 2015 at 11:40 PM, Cassidy Burden wrote:
> I changed the test module to now set the entire array to all 0s/1s and
> only flip a few bits. There appears to be a performance benefit, but
> it's only 2-3% better (if that). If the main benefit of the original
> patch was to save space, then inlining definitely doesn't seem worth the
> small gains in real use cases.
>
> find_next_zero_bit (us)
>    old     new  inline
>  14440   17080   17086
>   4779    5181    5069
>  10844   12720   12746
>   9642   11312   11253
>   3858    3818    3668
>  10540   12349   12307
>  12470   14716   14697
>   5403    6002    5942
>   2282    1820    1418
>  13632   16056   15998
>  11048   13019   13030
>   6025    6790    6706
>  13255   15586   15605
>   3038    2744    2539
>  10353   12219   12239
>  10498   12251   12322
>  14767   17452   17454
>  12785   15048   15052
>   1655    1034     691
>   9924   11611   11558
>
> find_next_bit (us)
>    old     new  inline
>   8535    9936    9667
>  14666   17372   16880
>   2315    1799    1355
>   6578    9092    8806
>   6548    7558    7274
>   9448   11213   10821
>   3467    3497    3449
>   2719    3079    2911
>   6115    7989    7796
>  13582   16113   15643
>   4643    4946    4766
>   3406    3728    3536
>   7118    9045    8805
>   3174    3011    2701
>  13300   16780   16252
>  14285   16848   16330
>  11583   13669   13207
>  13063   15455   14989
>  12661   14955   14500
>  12068   14166   13790
>
> On 7/29/2015 6:30 AM, Alexey Klimov wrote:
>>
>> I will re-check on another machine. It's really interesting if
>> __always_inline makes things better for aarch64 and worse for x86_64.
>> It would be nice if someone could check it on x86_64 too.
>
> Very odd, this may be related to the other compiler optimizations Yury
> mentioned?

It's better to ask Yury; I hope he can answer some day.

Do you need to re-check this (with more iterations, or on other machines)?

--
Best regards,
Klimov Alexey
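
For context, the helper the patch subject refers to can be sketched in a
self-contained form as below. This is an approximation modeled on the
public find_next_bit()/find_next_zero_bit() API, not a copy of
lib/find_bit.c; the kernel's actual helper differs in detail (it uses
kernel macros such as BITMAP_FIRST_WORD_MASK and __ffs).

#define BITS_PER_LONG (8 * sizeof(unsigned long))
/* Kernel-style attribute, spelled out so this sketch is self-contained. */
#define __always_inline inline __attribute__((__always_inline__))

/*
 * Common worker. XORing each word with 'invert' lets one body serve both
 * searches: invert == 0 looks for set bits, invert == ~0UL for zero bits.
 */
static __always_inline unsigned long _find_next_bit(const unsigned long *addr,
		unsigned long nbits, unsigned long start, unsigned long invert)
{
	unsigned long tmp;

	if (start >= nbits)
		return nbits;

	tmp = addr[start / BITS_PER_LONG] ^ invert;
	tmp &= ~0UL << (start % BITS_PER_LONG);	/* drop bits below 'start' */
	start -= start % BITS_PER_LONG;

	while (!tmp) {
		start += BITS_PER_LONG;
		if (start >= nbits)
			return nbits;
		tmp = addr[start / BITS_PER_LONG] ^ invert;
	}

	start += __builtin_ctzl(tmp);		/* lowest set bit in the word */
	return start < nbits ? start : nbits;
}

unsigned long find_next_bit(const unsigned long *addr, unsigned long nbits,
			    unsigned long start)
{
	return _find_next_bit(addr, nbits, start, 0UL);
}

unsigned long find_next_zero_bit(const unsigned long *addr,
				 unsigned long nbits, unsigned long start)
{
	return _find_next_bit(addr, nbits, start, ~0UL);
}

With a plain 'static' helper the compiler may keep one out-of-line copy of
_find_next_bit() that both wrappers call; __always_inline forces a copy
into each wrapper and lets 'invert' constant-fold away, which is the
size-for-speed trade-off the numbers above are probing.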
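
A minimal userspace approximation of the benchmark methodology Cassidy
describes (set the whole bitmap, clear a handful of bits, time the scan)
might look like the following. It is hypothetical, not the actual kernel
test module, and its bit-at-a-time scanner merely stands in for the
kernel's word-at-a-time code.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))
#define NBITS (1UL << 20)
#define NWORDS (NBITS / BITS_PER_LONG)

/* Simplified stand-in for the kernel API: first zero bit at or after 'start'. */
static unsigned long scan_next_zero_bit(const unsigned long *addr,
					unsigned long nbits,
					unsigned long start)
{
	for (; start < nbits; start++)
		if (!(addr[start / BITS_PER_LONG] &
		      (1UL << (start % BITS_PER_LONG))))
			return start;
	return nbits;
}

int main(void)
{
	unsigned long *bitmap = malloc(NWORDS * sizeof(unsigned long));
	struct timespec t0, t1;
	unsigned long i, found = 0;

	if (!bitmap)
		return 1;

	memset(bitmap, 0xff, NWORDS * sizeof(unsigned long)); /* all ones */
	for (i = 0; i < NBITS; i += NBITS / 8)                /* flip a few */
		bitmap[i / BITS_PER_LONG] &= ~(1UL << (i % BITS_PER_LONG));

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = scan_next_zero_bit(bitmap, NBITS, 0); i < NBITS;
	     i = scan_next_zero_bit(bitmap, NBITS, i + 1))
		found++;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("found %lu zero bits in %ld us\n", found,
	       (t1.tv_sec - t0.tv_sec) * 1000000L +
	       (t1.tv_nsec - t0.tv_nsec) / 1000);
	free(bitmap);
	return 0;
}

Built at different optimization levels, or with the scanner forced inline,
a harness like this can exhibit the same compiler- and arch-dependent
swings the tables above hint at.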