From: Vineet Gupta <Vineet.Gupta1@synopsys.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>,
	Khalid Aziz <khalid.aziz@oracle.com>,
	Andrey Konovalov <andreyknvl@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Kees Cook <keescook@chromium.org>, Ingo Molnar <mingo@kernel.org>,
	Aleksa Sarai <cyphar@cyphar.com>,
	"open list:SYNOPSYS ARC ARCHITECTURE" 
	<linux-snps-arc@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>
Subject: Re: [RFC 4/4] ARC: uaccess: use optimized generic __strnlen_user/__strncpy_from_user
Date: Tue, 14 Jan 2020 22:14:31 +0000
Message-ID: <67715aba-fa40-1f46-288d-391d086328ac@synopsys.com>
In-Reply-To: <CAHk-=wjX-c9YpPhbQ073UPnTvELNQCN49vqK1yY7JGuHSn5-ew@mail.gmail.com>

On 1/14/20 1:49 PM, Linus Torvalds wrote:
> On Tue, Jan 14, 2020 at 1:37 PM Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
>>
>> On 1/14/20 12:42 PM, Arnd Bergmann wrote:
>>>
>>> What's wrong with the generic version on little-endian? Any
>>> chance you can find a way to make it work as well for you as
>>> this copy?
>>
>> find_zero() by default doesn't use pop count instructions.
> 
> Don't you think the generic find_zero() is likely just as fast as the
> pop count instruction? On 32-bit, I think it's like a shift and a mask
> and a couple of additions.

You are right that in the grand scheme of things it may be less than noise.

ARC pop count (fls) version

# 	bits = (bits - 1) & ~bits;
#  	return bits >> 7;

	sub r0,r6,1
	bic r6,r0,r6
	lsr r0,r6,7

# 	return fls(mask) >> 3;

	fls.f	r0, r0
	add.nz	r0, r0, 1
	asr r5,r0,3

	j_s.d [blink]
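For readers without the ARC ISA at hand, here is a minimal C sketch of the two steps annotated in the listing above, assuming a 32-bit little-endian word. `fls_u32()`, `create_zero_mask32()`, `find_zero_fls32()` and `check_find_zero_fls32()` are hypothetical helper names, not kernel symbols. `fls_u32()` mimics the kernel's 1-based `fls()` using the GCC/Clang `__builtin_clz()` builtin, which a compiler may or may not lower to a single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's fls(): 1-based index of the top set bit, 0 for 0. */
static inline unsigned int fls_u32(uint32_t x)
{
	/* __builtin_clz(0) is undefined, so handle 0 explicitly */
	return x ? 32u - (unsigned int)__builtin_clz(x) : 0u;
}

/*
 * bits has 0x80 set in each byte flagged as zero; the result keeps 0xff
 * in every byte below the first (lowest, on little-endian) flagged byte.
 */
static inline uint32_t create_zero_mask32(uint32_t bits)
{
	bits = (bits - 1) & ~bits;
	return bits >> 7;
}

/* mask is 0, 0xff, 0xffff or 0xffffff -> index 0..3 of the first zero byte */
static inline unsigned int find_zero_fls32(uint32_t mask)
{
	return fls_u32(mask) >> 3;
}

/* Walk each possible position of the first zero byte through both steps. */
static void check_find_zero_fls32(void)
{
	for (unsigned int i = 0; i < 4; i++) {
		uint32_t bits = 0x80u << (8 * i);	/* zero byte at index i */
		assert(find_zero_fls32(create_zero_mask32(bits)) == i);
	}
}
```

The `add.nz` in the listing appears to convert ARC's 0-based `fls` result into the kernel's 1-based convention, which the C stand-in builds in directly.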

Generic version

# 	bits = (bits - 1) & ~bits;
#  	return bits >> 7;

	sub r5,r6,1
	bic r6,r5,r6
	lsr r5,r6,7

#  	unsigned long a = (0x0ff0001+mask) >> 23;
# 	return a & mask;

	add r0,r5,0x0ff0001	<-- this is an 8-byte instruction, though
	lsr_s r0,r0,23
	and r5,r5,r0

	j_s.d [blink]
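The generic 32-bit trick can be checked in plain C too. The sketch below transcribes the commented `count_masked_bytes()` formula with `uint32_t` (the kernel uses `long`); `count_masked_bytes32()` and `check_count_masked_bytes32()` are hypothetical names. The addition carries differently for each mask: 0x0ff0001 + 0xff = 0x0ff0100 (>> 23 gives 1), + 0xffff = 0x1000000 (gives 2), + 0xffffff = 0x1ff0000 (gives 3), and the final AND turns the 1 produced for an all-zero mask back into 0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 32-bit count_masked_bytes() from the comment above, with uint32_t.
 * mask is one of 0x000000, 0x0000ff, 0x00ffff or 0xffffff and the
 * result is 0, 1, 2 or 3.
 */
static inline uint32_t count_masked_bytes32(uint32_t mask)
{
	/* (000000 0000ff 00ffff ffffff) -> (1 1 2 3) via the carry into bits 23/24 */
	uint32_t a = (0x0ff0001u + mask) >> 23;
	/* the AND fixes the all-zero mask, where a == 1 */
	return a & mask;
}

/* Cross-check against the fls()-based answer for every valid mask. */
static void check_count_masked_bytes32(void)
{
	static const uint32_t masks[] = { 0x0u, 0xffu, 0xffffu, 0xffffffu };

	for (unsigned int i = 0; i < 4; i++) {
		uint32_t m = masks[i];
		/* fls(m) via the GCC/Clang builtin; __builtin_clz(0) is undefined */
		unsigned int fls_m = m ? 32u - (unsigned int)__builtin_clz(m) : 0u;

		assert(count_masked_bytes32(m) == i);
		assert(count_masked_bytes32(m) == (fls_m >> 3));
	}
}
```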


But it's the usual inclination of arch people to use the specific
instruction when one is available.

> 
> The 64-bit case has a multiply that is likely expensive unless you
> have a good multiplication unit (but what 64-bit architecture
> doesn't?), but the generic 32-bit LE code should already be pretty
> close to optimal, and it might not be worth it to worry about it.
> 
> (The big-endian case is very different, and architectures really can
> do much better. But LE allows for bit tricks using the carry chain)
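To make the 64-bit and carry-chain remarks concrete, here is a minimal sketch of a little-endian word-at-a-time zero-byte search, assuming a 64-bit `uint64_t` word and a little-endian host. It follows the shape of the generic helpers discussed in this thread (a `has_zero()`-style test, `create_zero_mask()`, and the multiply-based byte count), but the `*64` names and `first_nul_in_word()` are hypothetical, not kernel API. The borrow from `a - 0x01..01` can also flag bytes above the first zero byte; on little-endian that is harmless because only the lowest flagged byte is used.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ONE_BITS64  0x0101010101010101ull	/* 0x01 in every byte */
#define HIGH_BITS64 0x8080808080808080ull	/* 0x80 in every byte */

/* Nonzero iff some byte of a is 0; the lowest flagged byte is the first zero. */
static inline uint64_t has_zero64(uint64_t a)
{
	return ((a - ONE_BITS64) & ~a) & HIGH_BITS64;
}

/* 0xff in each byte below the first flagged byte, 0 elsewhere */
static inline uint64_t create_zero_mask64(uint64_t bits)
{
	bits = (bits - 1) & ~bits;
	return bits >> 7;
}

/* Number of 0xff bytes in the mask: one multiply and one shift */
static inline unsigned int count_masked_bytes64(uint64_t mask)
{
	return (unsigned int)((mask * 0x0001020304050608ull) >> 56);
}

/* Index of the first NUL among the 8 bytes at p, or 8 if there is none. */
static unsigned int first_nul_in_word(const char *p)
{
	uint64_t a, bits;

	memcpy(&a, p, sizeof(a));	/* assumes a little-endian host */
	bits = has_zero64(a);
	if (!bits)
		return 8;
	return count_masked_bytes64(create_zero_mask64(bits));
}

/* The multiply maps a mask of k bytes of 0xff (k = 0..7) to k. */
static void check_count_masked_bytes64(void)
{
	for (unsigned int k = 0; k < 8; k++) {
		uint64_t mask = k ? ~0ull >> (64 - 8 * k) : 0;
		assert(count_masked_bytes64(mask) == k);
	}
}
```

A strnlen-style loop would call something like `first_nul_in_word()` once per word and stop at the first result below 8; the alignment, length-limit and user-access fault handling a real implementation such as the kernel's lib/strnlen_user.c has to do is left out here.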

-Vineet
