From: Vineet Gupta <Vineet.Gupta1@synopsys.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>,
	Khalid Aziz <khalid.aziz@oracle.com>,
	Andrey Konovalov <andreyknvl@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Kees Cook <keescook@chromium.org>,
	Ingo Molnar <mingo@kernel.org>,
	Aleksa Sarai <cyphar@cyphar.com>,
	"open list:SYNOPSYS ARC ARCHITECTURE" <linux-snps-arc@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>
Subject: Re: [RFC 4/4] ARC: uaccess: use optimized generic __strnlen_user/__strncpy_from_user
Date: Tue, 14 Jan 2020 22:14:31 +0000
Message-ID: <67715aba-fa40-1f46-288d-391d086328ac@synopsys.com>
In-Reply-To: <CAHk-=wjX-c9YpPhbQ073UPnTvELNQCN49vqK1yY7JGuHSn5-ew@mail.gmail.com>

On 1/14/20 1:49 PM, Linus Torvalds wrote:
> On Tue, Jan 14, 2020 at 1:37 PM Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
>>
>> On 1/14/20 12:42 PM, Arnd Bergmann wrote:
>>>
>>> What's wrong with the generic version on little-endian? Any
>>> chance you can find a way to make it work as well for you as
>>> this copy?
>>
>> find_zero() by default doesn't use pop count instructions.
>
> Don't you think the generic find_zero() is likely just as fast as the
> pop count instruction? On 32-bit, I think it's like a shift and a mask
> and a couple of additions.

You are right that in the grand scheme of things it may be less than noise.

ARC pop count version

	# bits = (bits - 1) & ~bits;
	# return bits >> 7;
	sub	r0,r6,1
	bic	r6,r0,r6
	lsr	r0,r6,7

	# return fls(mask) >> 3;
	fls.f	r0, r0
	add.nz	r0, r0, 1
	asr	r5,r0,3

	j_s.d	[blink]

Generic version

	# bits = (bits - 1) & ~bits;
	# return bits >> 7;
	sub	r5,r6,1
	bic	r6,r5,r6
	lsr	r5,r6,7

	# unsigned long a = (0x0ff0001+mask) >> 23;
	# return a & mask;
	add	r0,r5,0x0ff0001   <-- this is 8 byte instruction though
	lsr_s	r0,r0,23
	and	r5,r5,r0

	j_s.d	[blink]

But it's the usual itch/inclination of arch people to try and use the specific
instruction if available.

>
> The 64-bit case has a multiply that is likely expensive unless you
> have a good multiplication unit (but what 64-bit architecture
> doesn't?), but the generic 32-bit LE code should already be pretty
> close to optimal, and it might not be worth it to worry about it.
>
> (The big-endian case is very different, and architectures really can
> do much better. But LE allows for bit tricks using the carry chain)

-Vineet
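For readers following along, here is a minimal C sketch of the two find_zero() variants compared above, for the 32-bit little-endian case only. The expressions come from the comments in the assembly listings; the function names find_zero_generic(), find_zero_fls(), fls32() and zero_byte_index(), and the use of uint32_t, are assumptions made for this illustration and are not the kernel's actual definitions. fls32() is a portable loop standing in for the ARC fls instruction plus its add.nz fix-up.

```c
#include <assert.h>
#include <stdint.h>

/* Word-at-a-time constants for a 32-bit word (illustrative names). */
#define ONE_BITS  0x01010101u
#define HIGH_BITS 0x80808080u

/* Nonzero if any byte of 'a' is zero; the lowest set bit marks the first one. */
static inline uint32_t has_zero(uint32_t a)
{
	return (a - ONE_BITS) & ~a & HIGH_BITS;
}

/* Turn the has_zero() result into 0, 0xff, 0xffff or 0xffffff. */
static inline uint32_t create_zero_mask(uint32_t bits)
{
	bits = (bits - 1) & ~bits;
	return bits >> 7;
}

/* Generic variant: an add, a shift and a mask, using the carry chain. */
static inline uint32_t find_zero_generic(uint32_t mask)
{
	uint32_t a = (0x0ff0001u + mask) >> 23;
	return a & mask;	/* fixes up the mask == 0 case */
}

/* 1-based index of the highest set bit, 0 if x == 0 (stand-in for fls). */
static inline uint32_t fls32(uint32_t x)
{
	uint32_t n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Instruction-based variant: fls(mask) >> 3. */
static inline uint32_t find_zero_fls(uint32_t mask)
{
	return fls32(mask) >> 3;
}

/* Byte index of the first zero byte; the caller must know one exists. */
static inline uint32_t zero_byte_index(uint32_t word)
{
	return find_zero_generic(create_zero_mask(has_zero(word)));
}
```

Only four mask values can reach find_zero() here (0, 0xff, 0xffff, 0xffffff), so the two variants can be checked against each other exhaustively; both map them to byte indices 0 through 3.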