Linux-ARM-Kernel Archive on lore.kernel.org
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Andrew Murray <andrew.murray@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Will Deacon <will.deacon@arm.com>,
	Ard.Biesheuvel@arm.com,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v1 0/5] arm64: avoid out-of-line ll/sc atomics
Date: Fri, 17 May 2019 12:29:54 +0200
Message-ID: <CAKv+Gu_fhFB-fFw20OjhPt5BM2cFuYxbD99JJK963gQftAAn3Q@mail.gmail.com>
In-Reply-To: <20190517100802.GS8268@e119886-lin.cambridge.arm.com>

On Fri, 17 May 2019 at 12:08, Andrew Murray <andrew.murray@arm.com> wrote:
>
> On Fri, May 17, 2019 at 09:24:01AM +0200, Peter Zijlstra wrote:
> > On Thu, May 16, 2019 at 04:53:39PM +0100, Andrew Murray wrote:
> > > When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
> > > or toolchain doesn't support it the existing code will fall back to ll/sc
> > > atomics. It achieves this by branching from inline assembly to a function
> > > that is built with special compile flags. Further, this results in the
> > > clobbering of registers even when the fallback isn't used, increasing
> > > register pressure.
> > >
> > > Let's improve this by providing inline implementations of both LSE and
> > > ll/sc and use a static key to select between them. This allows for the
> > > compiler to generate better atomics code.
> >
> > Don't you guys have alternatives? That would avoid having both versions
> > in the code, and thus significantly cuts back on the bloat.
>
> Yes we do.
>
> Prior to patch 3 of this series, the ARM64_LSE_ATOMIC_INSN macro used
> ALTERNATIVE to either bl to a fallback ll/sc function (and nops) - or execute
> some LSE instructions.
>
> But this approach limits the compiler's ability to optimise the code due to
> the asm clobber list being the superset of both ll/sc and LSE - and the gcc
> compiler flags used on the ll/sc functions.
>
> I think the alternative solution (excuse the pun) that you are suggesting
> is to put the body of the ll/sc or LSE code in the ALTERNATIVE oldinstr/newinstr
> blocks (i.e. drop the fallback branches). However this still gives us some
> bloat (but less than my current solution) because we're now inlining the
> larger fallback ll/sc sequences, whereas previously they were out-of-line
> functions. We still end up with potentially unnecessary clobbers for LSE
> code with this approach.
>
> Approach prior to this series:
>
>    BL 1 or NOP <- single alternative instruction
>    LSE
>    LSE
>    ...
>
> 1: LL/SC <- LL/SC fallback not inlined so reused
>    LL/SC
>    LL/SC
>    LL/SC
>
> Approach proposed by this series:
>
>    BL 1 or NOP <- single alternative instruction
>    LSE
>    LSE
>    BL 2
> 1: LL/SC <- inlined LL/SC and thus duplicated
>    LL/SC
>    LL/SC
>    LL/SC
> 2: ..
>
> Approach using alternative without branches:
>
>    LSE
>    LSE
>    NOP
>    NOP
>
> or
>
>    LL/SC <- inlined LL/SC and thus duplicated
>    LL/SC
>    LL/SC
>    LL/SC
>
> I guess there is a balance here between bloat and code optimisation.
>


So there are two separate questions here:
1) whether or not we should merge the inline asm blocks so that the
compiler sees a single set of constraints and operands
2) whether the LL/SC sequence should be inlined and/or duplicated.

This approach appears to be based on the assumption that reserving one
or sometimes two additional registers for the LL/SC fallback has a
more severe impact on performance than the unconditional branch.
However, it seems to me that any call site that uses the atomics has
to deal with the possibility of either version being invoked, and so
the additional registers need to be freed up in any case. Or am I
missing something?
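
To make the point about both versions being reachable from one call site
concrete, here is a minimal portable C sketch. It is not code from the
series: the names use_lse_path, fetch_add_lse_like, fetch_add_llsc_like and
sketch_fetch_add are hypothetical, a plain bool stands in for the static
key, and the GCC/Clang __atomic builtins stand in for the LSE and LL/SC
instruction sequences.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the static key; a real static key patches a
 * branch at runtime instead of testing a variable. */
static bool use_lse_path = false;

/* Stand-in for the LSE form: a single atomic read-modify-write. */
static inline unsigned int fetch_add_lse_like(unsigned int *v, unsigned int i)
{
	return __atomic_fetch_add(v, i, __ATOMIC_SEQ_CST);
}

/* Stand-in for the LL/SC form: a retry loop that needs an extra temporary. */
static inline unsigned int fetch_add_llsc_like(unsigned int *v, unsigned int i)
{
	unsigned int old = __atomic_load_n(v, __ATOMIC_RELAXED);

	/* On failure the builtin reloads 'old' with the current value. */
	while (!__atomic_compare_exchange_n(v, &old, old + i, true,
					    __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
		;
	return old;
}

/* Both bodies are inlined here, so the compiler has to allocate registers
 * for whichever path is taken at this call site. */
static inline unsigned int sketch_fetch_add(unsigned int *v, unsigned int i)
{
	assert(v);
	return use_lse_path ? fetch_add_lse_like(v, i)
			    : fetch_add_llsc_like(v, i);
}
```

Both paths return the value held before the addition, so callers cannot
tell them apart; the only difference the compiler sees is how many
temporaries each path needs.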

As for the duplication: a while ago, I suggested an approach [0] using
alternatives and asm subsections, which moved the duplicated LL/SC
fallbacks out of the hot path. This does not remove the bloat, but it
does mitigate its impact on I-cache efficiency when running on
hardware that does not require the fallbacks.
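
As a loose analogy only (this is not the mechanism used in [0]), the same
hot/cold separation can be sketched in portable C with the GCC/Clang 'cold'
function attribute, which asks the compiler to place a function away from
hot code. The names cpu_has_fast_atomics, fallback_fetch_add and
hot_fetch_add are hypothetical. Note that this sketch shares one
out-of-line fallback, whereas the subsection approach keeps a copy per
call site.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability flag, standing in for the alternatives framework. */
static bool cpu_has_fast_atomics = true;

/* 'cold' hints that this is rarely executed, so the compiler may emit it
 * away from the hot code; 'noinline' keeps it out of the caller's body. */
static __attribute__((noinline, cold))
unsigned int fallback_fetch_add(unsigned int *v, unsigned int i)
{
	unsigned int old = __atomic_load_n(v, __ATOMIC_RELAXED);

	/* Retry until the compare-and-exchange succeeds. */
	while (!__atomic_compare_exchange_n(v, &old, old + i, true,
					    __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
		;
	return old;
}

/* The hot path stays short; the fallback costs only a branch here. */
static inline unsigned int hot_fetch_add(unsigned int *v, unsigned int i)
{
	assert(v);
	if (__builtin_expect(!cpu_has_fast_atomics, 0))
		return fallback_fetch_add(v, i);
	return __atomic_fetch_add(v, i, __ATOMIC_SEQ_CST);
}
```

The total text size is not reduced by this kind of layout; what changes is
that the fallback bytes no longer sit between the instructions that run on
hardware with the fast path.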


[0] https://lore.kernel.org/linux-arm-kernel/20181113233923.20098-1-ard.biesheuvel@linaro.org/



> >
> > > These changes add a small amount of bloat on defconfig according to
> > > bloat-o-meter:
> > >
> > > text:
> > >   add/remove: 1/108 grow/shrink: 3448/20 up/down: 272768/-4320 (268448)
> > >   Total: Before=12363112, After=12631560, chg +2.17%
> >
> > I'd say 2% is quite significant bloat.
>
> Thanks,
>
> Andrew Murray
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

