From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexey Klimov <klimov.linux@gmail.com>
Subject: Re: [PATCH] lib: Make _find_next_bit helper function inline
Date: Mon, 24 Aug 2015 01:53:59 +0300
Message-ID: 
References: <1438110564-19932-1-git-send-email-cburden@codeaurora.org>
 <55B7F2C6.9010000@gmail.com>
 <20150728144537.67d46b5714c99d25f0bb33fb@linux-foundation.org>
 <1438176656.18723.8.camel@ceres>
 <55B93A47.90107@codeaurora.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Return-path: 
Received: from mail-la0-f51.google.com ([209.85.215.51]:33513 "EHLO
 mail-la0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1753637AbbHWWyB (ORCPT );
 Sun, 23 Aug 2015 18:54:01 -0400
In-Reply-To: <55B93A47.90107@codeaurora.org>
Sender: linux-arm-msm-owner@vger.kernel.org
List-Id: linux-arm-msm@vger.kernel.org
To: Cassidy Burden <cburden@codeaurora.org>
Cc: Andrew Morton, Yury, linux-arm-msm@vger.kernel.org,
 Linux Kernel Mailing List, linux-arm-kernel@lists.infradead.org,
 "David S. Miller", Daniel Borkmann, Hannes Frederic Sowa,
 Lai Jiangshan, Mark Salter, AKASHI Takahiro, Thomas Graf,
 Valentin Rothberg, Chris Wilson

Hi Cassidy,

On Wed, Jul 29, 2015 at 11:40 PM, Cassidy Burden wrote:
> I changed the test module to now set the entire array to all 0s/1s and
> only flip a few bits. There appears to be a performance benefit, but
> it's only 2-3% better (if that). If the main benefit of the original
> patch was to save space, then inlining definitely doesn't seem worth the
> small gains in real use cases.
>
> find_next_zero_bit (us)
>    old     new  inline
>  14440   17080   17086
>   4779    5181    5069
>  10844   12720   12746
>   9642   11312   11253
>   3858    3818    3668
>  10540   12349   12307
>  12470   14716   14697
>   5403    6002    5942
>   2282    1820    1418
>  13632   16056   15998
>  11048   13019   13030
>   6025    6790    6706
>  13255   15586   15605
>   3038    2744    2539
>  10353   12219   12239
>  10498   12251   12322
>  14767   17452   17454
>  12785   15048   15052
>   1655    1034     691
>   9924   11611   11558
>
> find_next_bit (us)
>    old     new  inline
>   8535    9936    9667
>  14666   17372   16880
>   2315    1799    1355
>   6578    9092    8806
>   6548    7558    7274
>   9448   11213   10821
>   3467    3497    3449
>   2719    3079    2911
>   6115    7989    7796
>  13582   16113   15643
>   4643    4946    4766
>   3406    3728    3536
>   7118    9045    8805
>   3174    3011    2701
>  13300   16780   16252
>  14285   16848   16330
>  11583   13669   13207
>  13063   15455   14989
>  12661   14955   14500
>  12068   14166   13790
>
> On 7/29/2015 6:30 AM, Alexey Klimov wrote:
>>
>> I will re-check on another machine. It's really interesting if
>> __always_inline makes things better for aarch64 and worse for x86_64.
>> It would be nice if someone could check it on x86_64 too.
>
> Very odd, this may be related to the other compiler optimizations Yury
> mentioned?

It's better to ask Yury; I hope he can answer some day.

Do you need to re-check this (with more iterations, or on other machines)?

--
Best regards,
Klimov Alexey
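
For context, the helper the patch subject refers to can be sketched in a
self-contained form as below. This is an approximation modeled on the
public find_next_bit()/find_next_zero_bit() API, not a copy of
lib/find_bit.c; the kernel's actual helper differs in detail (it uses
kernel macros such as BITMAP_FIRST_WORD_MASK and __ffs).

#define BITS_PER_LONG (8 * sizeof(unsigned long))
/* Kernel-style attribute, spelled out so this sketch is self-contained. */
#define __always_inline inline __attribute__((__always_inline__))

/*
 * Common worker. XORing each word with 'invert' lets one body serve both
 * searches: invert == 0 looks for set bits, invert == ~0UL for zero bits.
 */
static __always_inline unsigned long _find_next_bit(const unsigned long *addr,
		unsigned long nbits, unsigned long start, unsigned long invert)
{
	unsigned long tmp;

	if (start >= nbits)
		return nbits;

	tmp = addr[start / BITS_PER_LONG] ^ invert;
	tmp &= ~0UL << (start % BITS_PER_LONG);	/* drop bits below 'start' */
	start -= start % BITS_PER_LONG;

	while (!tmp) {
		start += BITS_PER_LONG;
		if (start >= nbits)
			return nbits;
		tmp = addr[start / BITS_PER_LONG] ^ invert;
	}

	start += __builtin_ctzl(tmp);		/* lowest set bit in the word */
	return start < nbits ? start : nbits;
}

unsigned long find_next_bit(const unsigned long *addr, unsigned long nbits,
			    unsigned long start)
{
	return _find_next_bit(addr, nbits, start, 0UL);
}

unsigned long find_next_zero_bit(const unsigned long *addr,
				 unsigned long nbits, unsigned long start)
{
	return _find_next_bit(addr, nbits, start, ~0UL);
}

With a plain 'static' helper the compiler may keep one out-of-line copy of
_find_next_bit() that both wrappers call; __always_inline forces a copy
into each wrapper and lets 'invert' constant-fold away, which is the
size-for-speed trade-off the numbers above are probing.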
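
A minimal userspace approximation of the benchmark methodology Cassidy
describes (set the whole bitmap, clear a handful of bits, time the scan)
might look like the following. It is hypothetical, not the actual kernel
test module, and its bit-at-a-time scanner merely stands in for the
kernel's word-at-a-time code.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))
#define NBITS (1UL << 20)
#define NWORDS (NBITS / BITS_PER_LONG)

/* Simplified stand-in for the kernel API: first zero bit at or after 'start'. */
static unsigned long scan_next_zero_bit(const unsigned long *addr,
					unsigned long nbits,
					unsigned long start)
{
	for (; start < nbits; start++)
		if (!(addr[start / BITS_PER_LONG] &
		      (1UL << (start % BITS_PER_LONG))))
			return start;
	return nbits;
}

int main(void)
{
	unsigned long *bitmap = malloc(NWORDS * sizeof(unsigned long));
	struct timespec t0, t1;
	unsigned long i, found = 0;

	if (!bitmap)
		return 1;

	memset(bitmap, 0xff, NWORDS * sizeof(unsigned long)); /* all ones */
	for (i = 0; i < NBITS; i += NBITS / 8)                /* flip a few */
		bitmap[i / BITS_PER_LONG] &= ~(1UL << (i % BITS_PER_LONG));

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = scan_next_zero_bit(bitmap, NBITS, 0); i < NBITS;
	     i = scan_next_zero_bit(bitmap, NBITS, i + 1))
		found++;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("found %lu zero bits in %ld us\n", found,
	       (t1.tv_sec - t0.tv_sec) * 1000000L +
	       (t1.tv_nsec - t0.tv_nsec) / 1000);
	free(bitmap);
	return 0;
}

Built at different optimization levels, or with the scanner forced inline,
a harness like this can exhibit the same compiler- and arch-dependent
swings the tables above hint at.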