All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Murray <andrew.murray@arm.com>
To: Nick Desaulniers <ndesaulniers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Ard.Biesheuvel@arm.com,
	Nathan Chancellor <natechancellor@gmail.com>,
	Will Deacon <will@kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics
Date: Thu, 5 Sep 2019 12:25:19 +0100	[thread overview]
Message-ID: <20190905112519.GY9720@e119886-lin.cambridge.arm.com> (raw)
In-Reply-To: <CAKwvOdkVatgMBLiuKV1bLdDKj_czsaGXuXWXp-9VR6zLyv+U4g@mail.gmail.com>

On Wed, Sep 04, 2019 at 10:28:14AM -0700, Nick Desaulniers wrote:
> On Tue, Sep 3, 2019 at 3:04 PM Andrew Murray <andrew.murray@arm.com> wrote:
> >
> > On Tue, Sep 03, 2019 at 05:37:55PM +0100, Will Deacon wrote:
> > > On Tue, Sep 03, 2019 at 04:31:20PM +0100, Andrew Murray wrote:
> > > > On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote:
> > > > > On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote:
> > > > > > Does it work if the only thing you change is the toolchain, and use GCC
> > > > > > instead?
> > > > >
> > > > > Yup.
> > > >
> > > > Also this is Clang generation:
> > > >
> > > > ffff8000100f2700 <__ptrace_link>:
> > > > ffff8000100f2700:       f9426009        ldr     x9, [x0, #1216]
> > > > ffff8000100f2704:       91130008        add     x8, x0, #0x4c0
> > > > ffff8000100f2708:       eb09011f        cmp     x8, x9
> > > > ffff8000100f270c:       540002a1        b.ne    ffff8000100f2760 <__ptrace_link+0x60>  // b.any
> > > > ffff8000100f2710:       f9425829        ldr     x9, [x1, #1200]
> > > > ffff8000100f2714:       9112c02a        add     x10, x1, #0x4b0
> > > > ffff8000100f2718:       f9000528        str     x8, [x9, #8]
> > > > ffff8000100f271c:       f9026009        str     x9, [x0, #1216]
> > > > ffff8000100f2720:       f902640a        str     x10, [x0, #1224]
> > > > ffff8000100f2724:       f9025828        str     x8, [x1, #1200]
> > > > ffff8000100f2728:       f9024001        str     x1, [x0, #1152]
> > > > ffff8000100f272c:       b4000162        cbz     x2, ffff8000100f2758 <__ptrace_link+0x58>
> > > > ffff8000100f2730:       b900985f        str     wzr, [x2, #152]
> > > > ffff8000100f2734:       14000004        b       ffff8000100f2744 <__ptrace_link+0x44>
> > > > ffff8000100f2738:       14000001        b       ffff8000100f273c <__ptrace_link+0x3c>
> > > > ffff8000100f273c:       14000006        b       ffff8000100f2754 <__ptrace_link+0x54>
> > > > ffff8000100f2740:       14000001        b       ffff8000100f2744 <__ptrace_link+0x44>
> > > > ffff8000100f2744:       52800028        mov     w8, #0x1                        // #1
> > > > ffff8000100f2748:       b828005f        stadd   w8, [x2]
> > > > ffff8000100f274c:       f9030002        str     x2, [x0, #1536]
> > > > ffff8000100f2750:       d65f03c0        ret
> > > > ffff8000100f2754:       140007fd        b       ffff8000100f4748 <ptrace_check_attach+0xf8>
> > > > ...
> > > >
> > > > This looks like the default path (before we write over it) will take you to
> > > > the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at
> > > > least not what we expected to see. Also why 4 branches?
> > >
> > > So I reproduced this with a silly atomic_inc wrapper:
> > >
> > > void will_atomic_inc(atomic_t *v)
> > > {
> > >         atomic_inc(v);
> > > }
> > >
> > > Compiles to:
> > >
> > > 0000000000000018 <will_atomic_inc>:
> > >   18: 14000004        b       28 <will_atomic_inc+0x10>
> > >   1c: 14000001        b       20 <will_atomic_inc+0x8>
> > >   20: 14000005        b       34 <will_atomic_inc+0x1c>
> > >   24: 14000001        b       28 <will_atomic_inc+0x10>
> > >   28: 52800028        mov     w8, #0x1                        // #1
> > >   2c: b828001f        stadd   w8, [x0]
> > >   30: d65f03c0        ret
> > >   34: 14000027        b       d0 <dump_kernel_offset+0x60>
> > >   38: d65f03c0        ret
> > >
> > > which is going to explode.
> >
> > I've come up with a simple reproducer for this issue:
> >
> > static bool branch_jump()
> > {
> >         asm_volatile_goto(
> >                 "1: b %l[l_yes2]"
> >                  : : : : l_yes2);
> >
> >         return false;
> > l_yes2:
> >         return true;
> > }
> >
> > static bool branch_test()
> > {
> >         return (!branch_jump() && !branch_jump());
> > }
> >
> > void andy_test(int *v)
> > {
> >         if (branch_test())
> >                 *v = 0xff;
> > }
> >
> > This leads to the following (it shouldn't do anything):
> >
> > 0000000000000000 <andy_test>:
> >    0:   14000004        b       10 <andy_test+0x10>
> >    4:   14000001        b       8 <andy_test+0x8>
> >    8:   14000004        b       18 <andy_test+0x18>
> >    c:   14000001        b       10 <andy_test+0x10>
> >   10:   52801fe8        mov     w8, #0xff                       // #255
> >   14:   b9000008        str     w8, [x0]
> >   18:   d65f03c0        ret
> >
> > The issue goes away with any of the following hunks:
> >
> >
> > @@ -55,7 +55,7 @@ static bool branch_jump()
> >
> >  static bool branch_test()
> >  {
> > -       return (!branch_jump() && !branch_jump());
> > +       return (!branch_jump());
> >  }
> >
> >  void andy_test(int *v)
> >
> >
> > or:
> >
> >
> > @@ -53,14 +53,10 @@ static bool branch_jump()
> >          return true;
> >  }
> >
> > -static bool branch_test()
> > -{
> > -       return (!branch_jump() && !branch_jump());
> > -}
> >
> >  void andy_test(int *v)
> >  {
> > -       if (branch_test())
> > +       if (!branch_jump() && !branch_jump())
> >                 *v = 0xff;
> >  }
> 
> Indeed, playing with the definition of `__lse_ll_sc_body`, I can get
> the kernel to boot again.

Thanks for investigating this.

Did it boot to a prompt? I played with the structure of the code and
too was able to get it to boot, but I found that it hung later-on during
boot. Thus I lost a bit of confidence in it.

> 
> So I think your very helpful test cases are illustrating two different problems:
> https://godbolt.org/z/dMf7x-
> See the disassembly of `andy_test2`.  Reference to the correct label
> is emitted in the inline asm, but there's some silly unconditional
> branches to the next instruction.  That's issue #1 and part of the
> reason you see superfluous branches.  With that fixed, `andy_test2`
> would match between GCC and Clang.  I think that can be a very late
> peephole optimization (and further, we could probably combine labels
> that refer to the same location, oh and .Lfunc_endX could just use
> `.`, too!). LLVM devs noted that the x86 backend doesn't have this
> issue, but this is a curiously recurring pattern I'm noticing in LLVM
> where some arch agnostic optimization is only implemented for x86...
> I'm reading through our Branch Folding pass which I think should
> handle this, but I'll need to fire up a debugger.
> 
> Issue #2 is the more critical issue, but may be conflated with issue
> #1.  Issue #2 is the nonsensical control flow with one level of
> inlining.  See how in the disassembly of `andy_test`, the first label
> referenced from inline assembly is *before* the mov/str when it should
> have been *after*.  Not sure where we could be going wrong, but it's
> straightforward for me to observe the code change as its transformed
> through LLVM, and I've debugged and fixed issues related to inlining
> asm goto before.

You may also be interested in this:

https://godbolt.org/z/8OthP2

void andy_test3(int *v)
{
    if (!branch_jump())
        return;

    if (!branch_jump())
        return;

    *v = 0xff;
}

(I used a similar approach with system_uses_lse_atomics to get the
kernel to boot a bit more).

This generated code does the right thing here (in comparison to andy_test2).
I felt like this gave an insight as to what is going on, but I don't
have the knowledge to know what. It's as if the early return prevents the
compiler from getting confused when it should otherwise jump to the second
goto.

Thanks,

Andrew Murray

> -- 
> Thanks,
> ~Nick Desaulniers

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2019-09-05 11:25 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-29 15:48 [PATCH v5 00/10] arm64: avoid out-of-line ll/sc atomics Will Deacon
2019-08-29 15:48 ` [PATCH v5 01/10] jump_label: Don't warn on __exit jump entries Will Deacon
2019-08-29 15:48 ` [PATCH v5 02/10] arm64: Use correct ll/sc atomic constraints Will Deacon
2019-08-29 15:48 ` [PATCH v5 03/10] arm64: atomics: avoid out-of-line ll/sc atomics Will Deacon
2019-09-03  6:00   ` Nathan Chancellor
2019-09-03  6:39     ` Will Deacon
2019-09-03 14:31     ` Andrew Murray
2019-09-03 14:45       ` Will Deacon
2019-09-03 15:15         ` Andrew Murray
2019-09-03 15:31           ` Andrew Murray
2019-09-03 16:37             ` Will Deacon
2019-09-03 22:04               ` Andrew Murray
2019-09-03 22:35                 ` Nick Desaulniers
     [not found]                   ` <CANW9uyuRFtNKMnSwmHWt_RebJA1ADXdZfeDHc6=yaaFH2NsyWg@mail.gmail.com>
2019-09-03 22:53                     ` Nick Desaulniers
2019-09-04 10:20                       ` Will Deacon
2019-09-04 17:28                 ` Nick Desaulniers
2019-09-05 11:25                   ` Andrew Murray [this message]
2019-09-06 19:44                     ` Nick Desaulniers
2019-08-29 15:48 ` [PATCH v5 04/10] arm64: avoid using hard-coded registers for LSE atomics Will Deacon
2019-08-29 15:48 ` [PATCH v5 05/10] arm64: atomics: Remove atomic_ll_sc compilation unit Will Deacon
2019-08-29 17:47   ` Nick Desaulniers
2019-08-29 20:07     ` Tri Vo
2019-08-29 21:54       ` Will Deacon
2019-08-29 15:48 ` [PATCH v5 06/10] arm64: lse: Remove unused 'alt_lse' assembly macro Will Deacon
2019-08-29 23:39   ` Andrew Murray
2019-08-29 15:48 ` [PATCH v5 07/10] arm64: asm: Kill 'asm/atomic_arch.h' Will Deacon
2019-08-29 23:43   ` Andrew Murray
2019-08-29 15:48 ` [PATCH v5 08/10] arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL Will Deacon
2019-08-29 23:44   ` Andrew Murray
2019-08-29 15:48 ` [PATCH v5 09/10] arm64: atomics: Undefine internal macros after use Will Deacon
2019-08-29 23:44   ` Andrew Murray
2019-08-29 15:48 ` [PATCH v5 10/10] arm64: atomics: Use K constraint when toolchain appears to support it Will Deacon
2019-08-29 16:54   ` Will Deacon
2019-08-29 17:45     ` Nick Desaulniers
2019-08-29 21:53       ` Will Deacon
2019-08-30 20:57         ` Nick Desaulniers
2019-08-30  0:08     ` Andrew Murray
2019-08-30  7:52       ` Will Deacon
2019-08-30  9:11         ` Andrew Murray
2019-08-30 10:17           ` Will Deacon
2019-08-30 11:57             ` Andrew Murray
2019-08-30 10:40           ` Mark Rutland
2019-08-30 11:53             ` Andrew Murray
2019-08-29 23:49   ` Andrew Murray

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190905112519.GY9720@e119886-lin.cambridge.arm.com \
    --to=andrew.murray@arm.com \
    --cc=Ard.Biesheuvel@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=mark.rutland@arm.com \
    --cc=natechancellor@gmail.com \
    --cc=ndesaulniers@google.com \
    --cc=peterz@infradead.org \
    --cc=robin.murphy@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.