Date: Thu, 25 Feb 2021 11:53:29 +0000
To: Yury Norov
From: Alexander Lobakin
Cc: Alexander Lobakin, Catalin Marinas, Will Deacon, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-mips@vger.kernel.org
Reply-To: Alexander Lobakin
Subject: Re: [PATCH] arm64: enable GENERIC_FIND_FIRST_BIT
Message-ID:
 <20210225115320.3491-1-alobakin@pm.me>
In-Reply-To: <20210224154416.GA1181413@yury-ThinkPad>
References: <20201205165406.108990-1-yury.norov@gmail.com> <20210224115247.1618-1-alobakin@pm.me> <20210224154416.GA1181413@yury-ThinkPad>
X-Mailing-List: linux-arch@vger.kernel.org

From: Yury Norov
Date: Wed, 24 Feb 2021 07:44:16 -0800

> On Wed, Feb 24, 2021 at 11:52:55AM +0000, Alexander Lobakin wrote:
> > From: Yury Norov
> > Date: Sat, 5 Dec 2020 08:54:06 -0800
> >
> > Hi,
> >
> > > ARM64 doesn't implement find_first_{zero}_bit in arch code and doesn't
> > > enable it in config. It leads to using find_next_bit() which is less
> > > efficient:
> > >
> > > 0000000000000000 :
> > >    0:   aa0003e4    mov     x4, x0
> > >    4:   aa0103e0    mov     x0, x1
> > >    8:   b4000181    cbz     x1, 38
> > >    c:   f9400083    ldr     x3, [x4]
> > >   10:   d2800802    mov     x2, #0x40                   // #64
> > >   14:   91002084    add     x4, x4, #0x8
> > >   18:   b40000c3    cbz     x3, 30
> > >   1c:   14000008    b       3c
> > >   20:   f8408483    ldr     x3, [x4], #8
> > >   24:   91010045    add     x5, x2, #0x40
> > >   28:   b50000c3    cbnz    x3, 40
> > >   2c:   aa0503e2    mov     x2, x5
> > >   30:   eb02001f    cmp     x0, x2
> > >   34:   54ffff68    b.hi    20  // b.pmore
> > >   38:   d65f03c0    ret
> > >   3c:   d2800002    mov     x2, #0x0                    // #0
> > >   40:   dac00063    rbit    x3, x3
> > >   44:   dac01063    clz     x3, x3
> > >   48:   8b020062    add     x2, x3, x2
> > >   4c:   eb02001f    cmp     x0, x2
> > >   50:   9a829000    csel    x0, x0, x2, ls  // ls = plast
> > >   54:   d65f03c0    ret
> > >
> > > ...
> > >
> > > 0000000000000118 <_find_next_bit.constprop.1>:
> > >  118:   eb02007f    cmp     x3, x2
> > >  11c:   540002e2    b.cs    178 <_find_next_bit.constprop.1+0x60>  // b.hs, b.nlast
> > >  120:   d346fc66    lsr     x6, x3, #6
> > >  124:   f8667805    ldr     x5, [x0, x6, lsl #3]
> > >  128:   b4000061    cbz     x1, 134 <_find_next_bit.constprop.1+0x1c>
> > >  12c:   f8667826    ldr     x6, [x1, x6, lsl #3]
> > >  130:   8a0600a5    and     x5, x5, x6
> > >  134:   ca0400a6    eor     x6, x5, x4
> > >  138:   92800005    mov     x5, #0xffffffffffffffff     // #-1
> > >  13c:   9ac320a5    lsl     x5, x5, x3
> > >  140:   927ae463    and     x3, x3, #0xffffffffffffffc0
> > >  144:   ea0600a5    ands    x5, x5, x6
> > >  148:   54000120    b.eq    16c <_find_next_bit.constprop.1+0x54>  // b.none
> > >  14c:   1400000e    b       184 <_find_next_bit.constprop.1+0x6c>
> > >  150:   d346fc66    lsr     x6, x3, #6
> > >  154:   f8667805    ldr     x5, [x0, x6, lsl #3]
> > >  158:   b4000061    cbz     x1, 164 <_find_next_bit.constprop.1+0x4c>
> > >  15c:   f8667826    ldr     x6, [x1, x6, lsl #3]
> > >  160:   8a0600a5    and     x5, x5, x6
> > >  164:   eb05009f    cmp     x4, x5
> > >  168:   540000c1    b.ne    180 <_find_next_bit.constprop.1+0x68>  // b.any
> > >  16c:   91010063    add     x3, x3, #0x40
> > >  170:   eb03005f    cmp     x2, x3
> > >  174:   54fffee8    b.hi    150 <_find_next_bit.constprop.1+0x38>  // b.pmore
> > >  178:   aa0203e0    mov     x0, x2
> > >  17c:   d65f03c0    ret
> > >  180:   ca050085    eor     x5, x4, x5
> > >  184:   dac000a5    rbit    x5, x5
> > >  188:   dac010a5    clz     x5, x5
> > >  18c:   8b0300a3    add     x3, x5, x3
> > >  190:   eb03005f    cmp     x2, x3
> > >  194:   9a839042    csel    x2, x2, x3, ls  // ls = plast
> > >  198:   aa0203e0    mov     x0, x2
> > >  19c:   d65f03c0    ret
> > >
> > > ...
> > >
> > > 0000000000000238 :
> > >  238:   a9bf7bfd    stp     x29, x30, [sp, #-16]!
> > >  23c:   aa0203e3    mov     x3, x2
> > >  240:   d2800004    mov     x4, #0x0                    // #0
> > >  244:   aa0103e2    mov     x2, x1
> > >  248:   910003fd    mov     x29, sp
> > >  24c:   d2800001    mov     x1, #0x0                    // #0
> > >  250:   97ffffb2    bl      118 <_find_next_bit.constprop.1>
> > >  254:   a8c17bfd    ldp     x29, x30, [sp], #16
> > >  258:   d65f03c0    ret
> > >
> > > Enabling these functions would also benefit for_each_{set,clear}_bit().
> > > Would it make sense to enable this config for all such architectures by
> > > default?
> >
> > I confirm that GENERIC_FIND_FIRST_BIT also produces more optimized and
> > faster code on MIPS (32 R2), where there are also no architecture-specific
> > bit-searching routines.
> > So, if it's okay with other folks, I'd suggest going for it and enabling
> > it for all similar arches.
>
> As far as I understand the idea of GENERIC_FIND_FIRST_BIT=n, it's
> intended to save some space in .text. But in fact it bloats the
> kernel:
>
>     yury:linux$ scripts/bloat-o-meter vmlinux vmlinux.ffb
>     add/remove: 4/1 grow/shrink: 19/251 up/down: 564/-1692 (-1128)
>     ...

Same for MIPS: enabling GENERIC_FIND_FIRST_BIT saves a bunch of .text
memory even though it introduces new entries.

> For the next cycle, I'm going to submit a patch that removes
> GENERIC_FIND_FIRST_BIT completely and forces all architectures to
> use find_first{_zero}_bit()

I like that idea. I'm almost sure there'll be no arch that benefits from
CONFIG_GENERIC_FIND_FIRST_BIT=n (and has no arch-optimized versions).

> > (otherwise, I'll publish a separate entry for mips-next after 5.12-rc1
> > release and mention you in "Suggested-by:")
>
> I think it's worth enabling GENERIC_FIND_FIRST_BIT for mips and arm now
> to see how it works for people. If there are no complaints, I'll remove
> the config entirely. I'm OK if you submit the patch for mips now, or we
> can make a series and submit together.

Works either way. Let's make a series and see how it goes.
I'll send you the MIPS part soon.

Al
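
P.S. For readers following the thread, the difference between the two
search routines can be sketched in userspace C. This is only an
illustration under stated assumptions, not the kernel's implementation:
the sketch_* names are hypothetical, BITS_PER_LONG is redefined locally,
and __builtin_ctzl() (a GCC/Clang builtin) stands in for the kernel's
__ffs(). It shows why a dedicated "first bit" search can be smaller: it
needs no start offset, so there is no first-word masking or rounding.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Local stand-in for the kernel macro of the same name (assumption). */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical sketch: scan whole words from the start of the bitmap. */
static unsigned long sketch_find_first_bit(const unsigned long *addr,
					   unsigned long size)
{
	unsigned long idx;

	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
		if (addr[idx]) {
			unsigned long bit = idx * BITS_PER_LONG +
				(unsigned long)__builtin_ctzl(addr[idx]);

			/* Clamp hits that land in the unused tail bits. */
			return bit < size ? bit : size;
		}
	}
	return size;	/* no set bit found */
}

/* Hypothetical sketch: same search, but from an arbitrary bit offset. */
static unsigned long sketch_find_next_bit(const unsigned long *addr,
					  unsigned long size,
					  unsigned long start)
{
	unsigned long tmp;

	if (start >= size)
		return size;

	/* Extra work: drop the bits below 'start' in the first word. */
	tmp = addr[start / BITS_PER_LONG];
	tmp &= ~0UL << (start % BITS_PER_LONG);
	start -= start % BITS_PER_LONG;	/* round down to a word boundary */

	while (!tmp) {
		start += BITS_PER_LONG;
		if (start >= size)
			return size;
		tmp = addr[start / BITS_PER_LONG];
	}

	start += (unsigned long)__builtin_ctzl(tmp);
	return start < size ? start : size;
}

/* Both sketches must agree when the search starts at bit 0. */
static void sketch_check_agree(const unsigned long *addr, unsigned long size)
{
	assert(sketch_find_first_bit(addr, size) ==
	       sketch_find_next_bit(addr, size, 0));
}
```

Calling sketch_find_next_bit(addr, size, 0) gives the same answer as
sketch_find_first_bit(addr, size), but the caller still pays for the
offset handling, which is roughly the overhead visible in the
_find_next_bit.constprop.1 listing quoted above.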