From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11]
Date: Tue, 14 Nov 2017 23:24:17 +0100 [thread overview]
Message-ID: <CAKwiHFgXDxpQc279NUrO8X3E+UysBoVQ63s=y5oVu0jKysT7Rw@mail.gmail.com> (raw)
In-Reply-To: <CA+55aFwWt-hRSRqJRP6Wc67COZFZwpcMy4o06bRNBGnwW8QBFw@mail.gmail.com>
On 14 November 2017 at 00:54, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Nov 13, 2017 at 3:30 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> So let's just rewrite that mnt_flags conversion that way, justr to get
>> gcc to generate the obvious code.
>
> Oh wow. I tried to do the same thing in fs/namespace.c where it does
> the reverse bit translation, and gcc makes a _horrible_ mess of it and
> actually makes the code much worse because for some reason the pattern
> doesn't trigger.
[trimming cc list]
Can you be more specific? That's not what I see with gcc 7.1. I've
found two blocks where I replaced the if's with the ternary
expressions, one in clone_mnt, one in do_mount. In clone_mnt, gcc-7.1
seems to do (first instruction is the unconditional |= MNT_LOCK_ATIME)
10d7: 0d 00 00 04 00 or $0x40000,%eax
10dc: 89 43 30 mov %eax,0x30(%rbx)
10df: a8 02 test $0x2,%al
10e1: 74 08 je 10eb <clone_mnt+0x9b>
10e3: 0d 00 00 20 00 or $0x200000,%eax
10e8: 89 43 30 mov %eax,0x30(%rbx)
10eb: a8 01 test $0x1,%al
10ed: 74 08 je 10f7 <clone_mnt+0xa7>
10ef: 0d 00 00 10 00 or $0x100000,%eax
10f4: 89 43 30 mov %eax,0x30(%rbx)
and after patching
10cc: 0d 00 00 04 00 or $0x40000,%eax
10d1: 89 c2 mov %eax,%edx
10d3: c1 e2 10 shl $0x10,%edx
10d6: 81 e2 00 00 40 00 and $0x400000,%edx
10dc: 09 d0 or %edx,%eax
10de: 89 c2 mov %eax,%edx
10e0: c1 e2 14 shl $0x14,%edx
10e3: 81 e2 00 00 20 00 and $0x200000,%edx
10e9: 09 c2 or %eax,%edx
(with a final store of %eax to 0x30(%rbx)). Either way it's four
instructions per flag, but I assume the one without the branches is
preferable.
Similarly, in do_mount, before we have
3429: 44 89 f8 mov %r15d,%eax
342c: 83 c8 01 or $0x1,%eax
342f: 40 f6 c5 02 test $0x2,%bpl
3433: 44 0f 45 f8 cmovne %eax,%r15d
3437: 44 89 f8 mov %r15d,%eax
343a: 83 c8 02 or $0x2,%eax
343d: 40 f6 c5 04 test $0x4,%bpl
3441: 44 0f 45 f8 cmovne %eax,%r15d
but after patching it does something like
3425: 4d 89 fe mov %r15,%r14
3428: 48 c1 ea 07 shr $0x7,%rdx
342c: 49 d1 ee shr %r14
342f: 89 d0 mov %edx,%eax
3431: c1 e1 05 shl $0x5,%ecx
3434: 83 e0 08 and $0x8,%eax
3437: 41 83 e6 07 and $0x7,%r14d
343b: 41 09 c6 or %eax,%r14d
343e: 89 d0 mov %edx,%eax
which actually makes use of MS_{NOSUID,NODEV,NOEXEC} all being 1 bit
off from their MNT_ counterparts (witness the shr %r14 and the and
with 0x7), so there we also cut 37 bytes according to bloat-o-meter.
Now, it does seem that older (and not that old in absolute terms)
compilers may generate worse code after the transformation - the
'replace with shift-mask-or' pattern doesn't exist until 7.1. E.g.
with 5.4 we have
$ scripts/bloat-o-meter fs/namespace.o.{0,1}-5.4
add/remove: 0/0 grow/shrink: 2/0 up/down: 50/0 (50)
function old new delta
do_mount 3153 3195 +42
clone_mnt 768 776 +8
5.4 compiles the "if() ..." construction roughly as 7.1, i.e. with
cmovs, but the ternary expressions are transformed into this
abomination
336f: 48 89 da mov %rbx,%rdx
3372: 83 e2 04 and $0x4,%edx
3375: 48 83 fa 01 cmp $0x1,%rdx
3379: 19 d2 sbb %edx,%edx
337b: f7 d2 not %edx
337d: 83 e2 02 and $0x2,%edx
3380: 09 d0 or %edx,%eax
3382: 48 89 da mov %rbx,%rdx
3385: 83 e2 08 and $0x8,%edx
3388: 48 83 fa 01 cmp $0x1,%rdx
338c: 19 d2 sbb %edx,%edx
338e: f7 d2 not %edx
3390: 83 e2 04 and $0x4,%edx
3393: 09 d0 or %edx,%eax
Was it something like this you saw?
Rasmus
next prev parent reply other threads:[~2017-11-14 22:24 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-09 0:43 [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 Patrick McLean
2017-11-09 2:40 ` Linus Torvalds
2017-11-09 3:45 ` Al Viro
2017-11-09 19:34 ` Patrick McLean
2017-11-09 19:38 ` Al Viro
2017-11-09 19:42 ` Patrick McLean
2017-11-09 19:37 ` Al Viro
2017-11-09 19:51 ` Patrick McLean
2017-11-09 20:04 ` Linus Torvalds
2017-11-09 21:16 ` Al Viro
2017-11-10 1:58 ` Patrick McLean
2017-11-10 13:53 ` Arnd Bergmann
2017-11-10 18:42 ` Linus Torvalds
2017-11-10 23:26 ` Patrick McLean
2017-11-11 0:27 ` Patrick McLean
2017-11-11 2:36 ` Linus Torvalds
2017-11-11 2:36 ` [kernel-hardening] " Linus Torvalds
2017-11-11 2:36 ` Linus Torvalds
2017-11-11 16:13 ` Kees Cook
2017-11-11 16:13 ` [kernel-hardening] " Kees Cook
2017-11-11 16:13 ` Kees Cook
2017-11-11 17:31 ` Linus Torvalds
2017-11-11 17:31 ` [kernel-hardening] " Linus Torvalds
2017-11-11 17:31 ` Linus Torvalds
2017-11-13 22:48 ` Patrick McLean
2017-11-13 22:48 ` [kernel-hardening] " Patrick McLean
2017-11-13 22:48 ` Patrick McLean
2017-11-17 0:54 ` Kees Cook
2017-11-17 0:54 ` [kernel-hardening] " Kees Cook
2017-11-17 0:54 ` Kees Cook
2017-11-17 19:03 ` Patrick McLean
2017-11-17 19:03 ` [kernel-hardening] " Patrick McLean
2017-11-17 19:03 ` Patrick McLean
2017-11-17 21:26 ` Kees Cook
2017-11-17 21:26 ` [kernel-hardening] " Kees Cook
2017-11-17 21:26 ` Kees Cook
2017-11-18 0:27 ` Patrick McLean
2017-11-18 0:27 ` [kernel-hardening] " Patrick McLean
2017-11-18 0:27 ` Patrick McLean
2017-11-18 0:55 ` Linus Torvalds
2017-11-18 0:55 ` [kernel-hardening] " Linus Torvalds
2017-11-18 0:55 ` Linus Torvalds
2017-11-18 1:54 ` Patrick McLean
2017-11-18 1:54 ` [kernel-hardening] " Patrick McLean
2017-11-18 1:54 ` Patrick McLean
2017-11-18 5:14 ` Kees Cook
2017-11-18 5:14 ` [kernel-hardening] " Kees Cook
2017-11-18 5:14 ` Kees Cook
2017-11-18 5:29 ` Linus Torvalds
2017-11-18 5:29 ` [kernel-hardening] " Linus Torvalds
2017-11-18 5:29 ` Linus Torvalds
2017-11-18 8:20 ` Kees Cook
2017-11-18 8:20 ` [kernel-hardening] " Kees Cook
2017-11-18 8:20 ` Kees Cook
2018-02-21 22:19 ` RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11) Maciej S. Szmigiero
2018-02-21 22:47 ` Linus Torvalds
2018-02-21 22:47 ` Linus Torvalds
2018-02-21 23:34 ` Kees Cook
2018-02-21 23:34 ` Kees Cook
2018-03-05 9:27 ` Masahiro Yamada
2018-03-05 9:27 ` Masahiro Yamada
2018-03-05 19:15 ` Kees Cook
2018-03-05 19:18 ` Linus Torvalds
2018-02-21 22:52 ` Kees Cook
2018-02-21 23:24 ` Linus Torvalds
2018-02-22 0:12 ` Kees Cook
2018-02-22 0:22 ` Linus Torvalds
2018-02-22 0:23 ` Kees Cook
2018-02-22 0:27 ` Kees Cook
2017-11-11 1:13 ` [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 J. Bruce Fields
2017-11-11 2:32 ` Al Viro
2017-11-10 1:47 ` Patrick McLean
2017-11-09 20:47 ` J. Bruce Fields
2017-11-09 23:07 ` Patrick McLean
2017-11-13 22:59 ` bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11] Rasmus Villemoes
2017-11-13 23:30 ` Linus Torvalds
2017-11-13 23:54 ` Linus Torvalds
2017-11-14 22:24 ` Rasmus Villemoes [this message]
2017-11-14 22:43 ` Linus Torvalds
2017-11-14 23:53 ` Rasmus Villemoes
2017-11-15 0:02 ` Linus Torvalds
2017-11-11 2:47 ` [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 Alan Cox
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAKwiHFgXDxpQc279NUrO8X3E+UysBoVQ63s=y5oVu0jKysT7Rw@mail.gmail.com' \
--to=linux@rasmusvillemoes.dk \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.