From: Vineet Gupta <Vineet.Gupta1@synopsys.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Richard Henderson <rth@twiddle.net>,
	Russell King <linux@armlinux.org.uk>,
	Will Deacon <will.deacon@arm.com>,
	Haavard Skinnemoen <hskinnemoen@gmail.com>,
	Steven Miao <realmz6@gmail.com>,
	Jesper Nilsson <jesper.nilsson@axis.com>,
	Mark Salter <msalter@redhat.com>,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	Richard Kuo <rkuo@codeaurora.org>,
	Tony Luck <tony.luck@intel.com>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	James Hogan <james.hogan@imgtec.com>,
	Michal Simek <monstr@monstr.eu>,
	David Howells <dhowells@redhat.com>,
	Ley Foon Tan <lftan@altera.com>,
	Jonas Bonn <Jonas.Nilsson@synopsys>
Subject: Re: [RFC][CFT][PATCHSET v1] uaccess unification
Date: Thu, 30 Mar 2017 13:40:31 -0700
Message-ID: <efb7aaa4-7d25-0c68-ebf8-cdd7eb1297dc@synopsys.com>
In-Reply-To: <CA+55aFyGwYwdk8i7-GbXV7NLTn38e-bow3VD-hHcQmTr9ebAjw@mail.gmail.com>

On 03/29/2017 05:27 PM, Linus Torvalds wrote:
> On Wed, Mar 29, 2017 at 5:02 PM, Vineet Gupta
> <Vineet.Gupta1@synopsys.com> wrote:
>>
>> I guess I can in next day or two - but mind you the inline version for ARC is kind
>> of special vs. other arches. We have this "manual" constant propagation to elide
>> the unrolled LD/ST for 1-15 byte stragglers, when @sz is constant.
>
> I don't think that's special. We do that on x86 too, and I suspect ARC
> copied it from there (or from somebody else who did it).

No, I (re)wrote that code and AFAIKR didn't copy from anyone and AFAICS it is
certainly different from others if not special. If you look closely at
arc:uaccess.h it is not the trivial check for 1-2-4 conversion as in the commit
you referred to.
It actually tries to eliminate hunks from the inline assembly at compile time,
for constant @sz (so it is designed purely for inlined variants; whether that
matters or not is a different story). Thing is, from the hardware POV, 4 LD/ST
in flight is good (at least for ARC700 cores), so we wrap it up in a Zero delay
loop. This takes care of multiples of 16 bytes; the last 15 bytes are the
killer, requiring a bunch of conditionals, which is what I try to eliminate.

FWIW, I experimented with uaccess inlining on ARC:

1. pristine 4.11-rc1 (all inline)
2. Inline + disabling the "smart" const propagation
3. Out of line only variants (which already existed/default on ARC for -Os,
   but hacked for current -O3)

Numbers for LMBench FS latency (off of tmpfs to avoid any device related
perturbation). Note that LMBench already runs them several times itself and
each of below is obviously with a fresh reboot since kernels were different.

So it seems 0k file create/del gets worse without the smart inline, while 10k
gets better. mmap (16k) got worse as well. With out of line, some got better
while some got worse.

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS    0K File      10K File     Mmap   Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
170329-v4 Linux 4.11.0-  124.3   75.3  734.2  147.8  2200.0 6.205    10.9  87.6
170330-v4 Linux 4.11.0-  154.9   88.3  709.2  131.2  2494.0 4.056    11.0  91.1
170330-v4 Linux 4.11.0-  157.7   69.8  622.7  140.8  2168.0 5.654    10.8  91.0

Compare that to data against:

1. pristine 4.11-rc1 (all inline)
2. Al's series + ARC forced inline
3. Al's series + ARC forced NOT inline

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS    0K File      10K File     Mmap   Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
170329-v4 Linux 4.11.0-  124.3   75.3  734.2  147.8  2200.0 6.205    10.9  87.6
170329-v4 Linux 4.11.0-  141.2   63.4  629.7  130.0  2172.0 5.796    10.8  90.0
170329-v4 Linux 4.11.0-  154.9   89.2  691.6  147.7  2323.0 4.922    10.8  92.3

So it's a mixed bag really. Maybe we need some better directed test to really
drill it down.

> But at least on x86 it is limited entirely to the "__" versions, and
> it's almost entirely pointless. We actually removed some of that kind
> of code because it was *so* pointless, and it had just been copied
> around into the "atomic" versions too.
>
> See for example commit bd28b14591b9 ("x86: remove more uaccess_32.h
> complexity"), which did that.
>
> The basic "__" versions still do that constant-size thing, but they
> really are questionable.

Perhaps because the scope of constant usage was pretty narrow - it would only
benefit if *copy_from_user() were called with 1,2,4, which is relatively
unlikely as we have __get_user and friends for that already.

> Exactly because it's just the "__" versions -
> the *regular* "copy_to/from_user()" is an unconditional function call,
> because inlining it isn't just the access operations, it's the size
> check, and on modern x86 it's also the "set AC to mark the user access
> as safe".

So what you are saying is it is relatively costly on x86 because of SMAP, which
may not be true for arches w/o hardware support.

Note that I'm not arguing for/against inlining per se; it seems it doesn't
matter.

-Vineet
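
For readers following along, here is a rough userspace sketch of the "manual"
constant propagation idea discussed in the message above: a 16-bytes-per-pass
bulk loop, plus separate 8/4/2/1-byte straggler hunks that the compiler can
drop when @sz is a compile-time constant. This is not the arch/arc code. The
names copy_sketch(), copy_chunk(), copy_runtime() and copy_sketch_demo() are
made up for illustration, plain memcpy() stands in for the inline-assembly
LD/ST hunks, and the use of GCC's __builtin_constant_p() is an assumption about
how such a check would typically be written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one LD/ST hunk of inline assembly:
 * copy n bytes and advance both cursors. */
static inline void copy_chunk(unsigned char **d, const unsigned char **s,
			      size_t n)
{
	memcpy(*d, *s, n);
	*d += n;
	*s += n;
}

/* Generic path, used when the size is only known at run time. */
static size_t copy_runtime(void *dst, const void *src, size_t sz)
{
	memcpy(dst, src, sz);
	return 0;	/* bytes NOT copied, as copy_{to,from}_user() report */
}

static inline __attribute__((always_inline))
size_t copy_sketch(void *dst, const void *src, size_t sz)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/* Non-constant size (or an unoptimized build): take the generic path. */
	if (!__builtin_constant_p(sz))
		return copy_runtime(dst, src, sz);

	/* Bulk: 16 bytes per pass, i.e. four 4-byte LD/ST pairs in flight. */
	for (size_t i = 0; i < sz / 16; i++)
		copy_chunk(&d, &s, 16);

	/* Stragglers (1-15 bytes): sz is constant here, so each test folds
	 * at compile time and the hunks that are not needed disappear. */
	if (sz & 8)
		copy_chunk(&d, &s, 8);
	if (sz & 4)
		copy_chunk(&d, &s, 4);
	if (sz & 2)
		copy_chunk(&d, &s, 2);
	if (sz & 1)
		copy_chunk(&d, &s, 1);
	return 0;
}

/* Exercise both paths; returns 0 when the copies match. */
static int copy_sketch_demo(void)
{
	unsigned char src[64], dst[64];
	volatile size_t n = 37;		/* deliberately not a constant */

	for (size_t i = 0; i < sizeof(src); i++)
		src[i] = (unsigned char)i;

	memset(dst, 0xff, sizeof(dst));
	copy_sketch(dst, src, 23);	/* constant: 16 + 4 + 2 + 1 */
	assert(memcmp(dst, src, 23) == 0 && dst[23] == 0xff);

	memset(dst, 0xff, sizeof(dst));
	copy_sketch(dst, src, n);	/* run-time size */
	assert(memcmp(dst, src, 37) == 0 && dst[37] == 0xff);
	return 0;
}
```

In plain C the compiler would fold these tests by itself once the function is
inlined; the explicit structure only matters when each hunk is a separate
inline-assembly statement that the compiler cannot see into, which is the
situation the message describes. Forcing the function out of line makes @sz a
run-time value and takes that elimination away.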