All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11]
Date: Tue, 14 Nov 2017 23:24:17 +0100	[thread overview]
Message-ID: <CAKwiHFgXDxpQc279NUrO8X3E+UysBoVQ63s=y5oVu0jKysT7Rw@mail.gmail.com> (raw)
In-Reply-To: <CA+55aFwWt-hRSRqJRP6Wc67COZFZwpcMy4o06bRNBGnwW8QBFw@mail.gmail.com>

On 14 November 2017 at 00:54, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Nov 13, 2017 at 3:30 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> So let's just rewrite that mnt_flags conversion that way, justr to get
>> gcc to generate the obvious code.
>
> Oh wow. I tried to do the same thing in fs/namespace.c where it does
> the reverse bit translation, and gcc makes a _horrible_ mess of it and
> actually makes the code much worse because for some reason the pattern
> doesn't trigger.

[trimming cc list]

Can you be more specific? That's not what I see with gcc 7.1. I've
found two blocks where I replaced the if's with the ternary
expressions, one in clone_mnt, one in do_mount. In clone_mnt, gcc-7.1
seems to do (first instruction is the unconditional |= MNT_LOCK_ATIME)

    10d7:       0d 00 00 04 00          or     $0x40000,%eax
    10dc:       89 43 30                mov    %eax,0x30(%rbx)
    10df:       a8 02                   test   $0x2,%al
    10e1:       74 08                   je     10eb <clone_mnt+0x9b>
    10e3:       0d 00 00 20 00          or     $0x200000,%eax
    10e8:       89 43 30                mov    %eax,0x30(%rbx)
    10eb:       a8 01                   test   $0x1,%al
    10ed:       74 08                   je     10f7 <clone_mnt+0xa7>
    10ef:       0d 00 00 10 00          or     $0x100000,%eax
    10f4:       89 43 30                mov    %eax,0x30(%rbx)

and after patching

    10cc:       0d 00 00 04 00          or     $0x40000,%eax
    10d1:       89 c2                   mov    %eax,%edx
    10d3:       c1 e2 10                shl    $0x10,%edx
    10d6:       81 e2 00 00 40 00       and    $0x400000,%edx
    10dc:       09 d0                   or     %edx,%eax
    10de:       89 c2                   mov    %eax,%edx
    10e0:       c1 e2 14                shl    $0x14,%edx
    10e3:       81 e2 00 00 20 00       and    $0x200000,%edx
    10e9:       09 c2                   or     %eax,%edx

(with a final store of %eax to 0x30(%rbx)). Either way it's four
instructions per flag, but I assume the one without the branches is
preferable.

Similarly, in do_mount, before we have

    3429:       44 89 f8                mov    %r15d,%eax
    342c:       83 c8 01                or     $0x1,%eax
    342f:       40 f6 c5 02             test   $0x2,%bpl
    3433:       44 0f 45 f8             cmovne %eax,%r15d
    3437:       44 89 f8                mov    %r15d,%eax
    343a:       83 c8 02                or     $0x2,%eax
    343d:       40 f6 c5 04             test   $0x4,%bpl
    3441:       44 0f 45 f8             cmovne %eax,%r15d

but after patching it does something like

    3425:       4d 89 fe                mov    %r15,%r14
    3428:       48 c1 ea 07             shr    $0x7,%rdx
    342c:       49 d1 ee                shr    %r14
    342f:       89 d0                   mov    %edx,%eax
    3431:       c1 e1 05                shl    $0x5,%ecx
    3434:       83 e0 08                and    $0x8,%eax
    3437:       41 83 e6 07             and    $0x7,%r14d
    343b:       41 09 c6                or     %eax,%r14d
    343e:       89 d0                   mov    %edx,%eax

which actually makes use of MS_{NOSUID,NODEV,NOEXEC} all being 1 bit
off from their MNT_ counterparts (witness the shr %r14 and the and
with 0x7), so there we also cut 37 bytes according to bloat-o-meter.

Now, it does seem that older (and not that old in absolute terms)
compilers may generate worse code after the transformation - the
'replace with shift-mask-or' pattern doesn't exist until 7.1. E.g.
with 5.4 we have

$ scripts/bloat-o-meter fs/namespace.o.{0,1}-5.4
add/remove: 0/0 grow/shrink: 2/0 up/down: 50/0 (50)
function                                     old     new   delta
do_mount                                    3153    3195     +42
clone_mnt                                    768     776      +8

5.4 compiles the "if() ..." construction roughly as 7.1, i.e. with
cmovs, but the ternary expressions are transformed into this
abomination

    336f:       48 89 da                mov    %rbx,%rdx
    3372:       83 e2 04                and    $0x4,%edx
    3375:       48 83 fa 01             cmp    $0x1,%rdx
    3379:       19 d2                   sbb    %edx,%edx
    337b:       f7 d2                   not    %edx
    337d:       83 e2 02                and    $0x2,%edx
    3380:       09 d0                   or     %edx,%eax
    3382:       48 89 da                mov    %rbx,%rdx
    3385:       83 e2 08                and    $0x8,%edx
    3388:       48 83 fa 01             cmp    $0x1,%rdx
    338c:       19 d2                   sbb    %edx,%edx
    338e:       f7 d2                   not    %edx
    3390:       83 e2 04                and    $0x4,%edx
    3393:       09 d0                   or     %edx,%eax

Was it something like this you saw?

Rasmus

  reply	other threads:[~2017-11-14 22:24 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-09  0:43 [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 Patrick McLean
2017-11-09  2:40 ` Linus Torvalds
2017-11-09  3:45   ` Al Viro
2017-11-09 19:34   ` Patrick McLean
2017-11-09 19:38     ` Al Viro
2017-11-09 19:42       ` Patrick McLean
2017-11-09 19:37   ` Al Viro
2017-11-09 19:51     ` Patrick McLean
2017-11-09 20:04       ` Linus Torvalds
2017-11-09 21:16         ` Al Viro
2017-11-10  1:58         ` Patrick McLean
2017-11-10 13:53           ` Arnd Bergmann
2017-11-10 18:42           ` Linus Torvalds
2017-11-10 23:26             ` Patrick McLean
2017-11-11  0:27               ` Patrick McLean
2017-11-11  2:36                 ` Linus Torvalds
2017-11-11  2:36                   ` [kernel-hardening] " Linus Torvalds
2017-11-11  2:36                   ` Linus Torvalds
2017-11-11 16:13                   ` Kees Cook
2017-11-11 16:13                     ` [kernel-hardening] " Kees Cook
2017-11-11 16:13                     ` Kees Cook
2017-11-11 17:31                     ` Linus Torvalds
2017-11-11 17:31                       ` [kernel-hardening] " Linus Torvalds
2017-11-11 17:31                       ` Linus Torvalds
2017-11-13 22:48                       ` Patrick McLean
2017-11-13 22:48                         ` [kernel-hardening] " Patrick McLean
2017-11-13 22:48                         ` Patrick McLean
2017-11-17  0:54                         ` Kees Cook
2017-11-17  0:54                           ` [kernel-hardening] " Kees Cook
2017-11-17  0:54                           ` Kees Cook
2017-11-17 19:03                           ` Patrick McLean
2017-11-17 19:03                             ` [kernel-hardening] " Patrick McLean
2017-11-17 19:03                             ` Patrick McLean
2017-11-17 21:26                             ` Kees Cook
2017-11-17 21:26                               ` [kernel-hardening] " Kees Cook
2017-11-17 21:26                               ` Kees Cook
2017-11-18  0:27                               ` Patrick McLean
2017-11-18  0:27                                 ` [kernel-hardening] " Patrick McLean
2017-11-18  0:27                                 ` Patrick McLean
2017-11-18  0:55                                 ` Linus Torvalds
2017-11-18  0:55                                   ` [kernel-hardening] " Linus Torvalds
2017-11-18  0:55                                   ` Linus Torvalds
2017-11-18  1:54                                   ` Patrick McLean
2017-11-18  1:54                                     ` [kernel-hardening] " Patrick McLean
2017-11-18  1:54                                     ` Patrick McLean
2017-11-18  5:14                                     ` Kees Cook
2017-11-18  5:14                                       ` [kernel-hardening] " Kees Cook
2017-11-18  5:14                                       ` Kees Cook
2017-11-18  5:29                                       ` Linus Torvalds
2017-11-18  5:29                                         ` [kernel-hardening] " Linus Torvalds
2017-11-18  5:29                                         ` Linus Torvalds
2017-11-18  8:20                                         ` Kees Cook
2017-11-18  8:20                                           ` [kernel-hardening] " Kees Cook
2017-11-18  8:20                                           ` Kees Cook
2018-02-21 22:19                                       ` RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11) Maciej S. Szmigiero
2018-02-21 22:47                                         ` Linus Torvalds
2018-02-21 22:47                                           ` Linus Torvalds
2018-02-21 23:34                                           ` Kees Cook
2018-02-21 23:34                                             ` Kees Cook
2018-03-05  9:27                                           ` Masahiro Yamada
2018-03-05  9:27                                             ` Masahiro Yamada
2018-03-05 19:15                                             ` Kees Cook
2018-03-05 19:18                                             ` Linus Torvalds
2018-02-21 22:52                                         ` Kees Cook
2018-02-21 23:24                                           ` Linus Torvalds
2018-02-22  0:12                                             ` Kees Cook
2018-02-22  0:22                                               ` Linus Torvalds
2018-02-22  0:23                                                 ` Kees Cook
2018-02-22  0:27                                                   ` Kees Cook
2017-11-11  1:13               ` [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 J. Bruce Fields
2017-11-11  2:32                 ` Al Viro
2017-11-10  1:47       ` Patrick McLean
2017-11-09 20:47   ` J. Bruce Fields
2017-11-09 23:07     ` Patrick McLean
2017-11-13 22:59   ` bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11] Rasmus Villemoes
2017-11-13 23:30     ` Linus Torvalds
2017-11-13 23:54       ` Linus Torvalds
2017-11-14 22:24         ` Rasmus Villemoes [this message]
2017-11-14 22:43           ` Linus Torvalds
2017-11-14 23:53             ` Rasmus Villemoes
2017-11-15  0:02               ` Linus Torvalds
2017-11-11  2:47 ` [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 Alan Cox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKwiHFgXDxpQc279NUrO8X3E+UysBoVQ63s=y5oVu0jKysT7Rw@mail.gmail.com' \
    --to=linux@rasmusvillemoes.dk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.