All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Wei Liu <wl@xen.org>, Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH v2] x86: use POPCNT for hweight<N>() when available
Date: Wed, 20 May 2020 19:18:29 +0200	[thread overview]
Message-ID: <20200520171829.GO54375@Air-de-Roger> (raw)
In-Reply-To: <d0a15359-339f-6edd-034c-cd6385e929d1@suse.com>

On Wed, May 20, 2020 at 03:12:25PM +0200, Jan Beulich wrote:
> On 20.05.2020 13:43, Roger Pau Monné wrote:
> > On Wed, May 20, 2020 at 12:57:27PM +0200, Jan Beulich wrote:
> >> On 20.05.2020 12:28, Roger Pau Monné wrote:
> >>> On Wed, May 20, 2020 at 12:17:15PM +0200, Jan Beulich wrote:
> >>>> On 20.05.2020 11:31, Roger Pau Monné wrote:
> >>>>> On Wed, May 20, 2020 at 10:31:38AM +0200, Jan Beulich wrote:
> >>>>>> On 14.05.2020 16:05, Roger Pau Monné wrote:
> >>>>>>> On Mon, Jul 15, 2019 at 02:39:04PM +0000, Jan Beulich wrote:
> >>>>>>>> @@ -251,6 +255,10 @@ boot/mkelf32: boot/mkelf32.c
> >>>>>>>>   efi/mkreloc: efi/mkreloc.c
> >>>>>>>>   	$(HOSTCC) $(HOSTCFLAGS) -g -o $@ $<
> >>>>>>>>   
> >>>>>>>> +nocov-y += hweight.o
> >>>>>>>> +noubsan-y += hweight.o
> >>>>>>>> +hweight.o: CFLAGS += $(foreach reg,cx dx si 8 9 10 11,-ffixed-r$(reg))
> >>>>>>>
> >>>>>>> Why not use clobbers in the asm to list the scratch registers? Is it
> >>>>>>> that much expensive?
> >>>>>>
> >>>>>> The goal is to disturb the call sites as little as possible. There's
> >>>>>> no point avoiding the scratch registers when no call is made (i.e.
> >>>>>> when the POPCNT insn can be used). Taking away from the compiler 7
> >>>>>> out of 15 registers (that it can hold live data in) seems quite a
> >>>>>> lot to me.
> >>>>>
> >>>>> IMO using -ffixed-reg for all those registers is even worse, as that
> >>>>> prevents the generated code in hweight from using any of those, thus
> >>>>> greatly limiting the amount of registers and likely making the
> >>>>> generated code rely heavily on pushing an popping from the stack?
> >>>>
> >>>> Okay, that's the other side of the idea behind all this: Virtually no
> >>>> hardware we run on will lack POPCNT support, hence the quality of
> >>>> these fallback routines matters only the very old hardware, where we
> >>>> likely don't perform optimally already anyway.
> >>>>
> >>>>> This also has the side effect to limiting the usage of popcnt to gcc,
> >>>>> which IMO is also not ideal.
> >>>>
> >>>> Agreed. I don't know enough about clang to be able to think of
> >>>> possible alternatives. In any event there's no change to current
> >>>> behavior for hypervisors built with clang.
> >>>>
> >>>>> I also wondered, since the in-place asm before patching is a call
> >>>>> instruction, wouldn't inline asm at build time already assume that the
> >>>>> scratch registers are clobbered?
> >>>>
> >>>> That would imply the compiler peeks into the string literal of the
> >>>> asm(). At least gcc doesn't, and even if it did it couldn't infer an
> >>>> ABI from seeing a CALL insn.
> >>>
> >>> Please bear with me, but then I don't understand what Linux is doing
> >>> in arch/x86/include/asm/arch_hweight.h. I see no clobbers there,
> >>> neither it seems like the __sw_hweight{32/64} functions are built
> >>> without the usage of the scratch registers.
> >>
> >> __sw_hweight{32,64} are implemented in assembly, avoiding most
> >> scratch registers while pushing/popping the ones which do get
> >> altered.
> > 
> > Oh right, I was looking at lib/hweight.c instead of the arch one.
> > 
> > Would you agree to use the no_caller_saved_registers attribute (which
> > is available AFAICT for both gcc and clang) for generic_hweightXX and
> > then remove the asm prefix code in favour of the defines for
> > hweight{8/16}?
> 
> At least for gcc no_caller_saved_registers isn't old enough to be
> used unconditionally (nor is its companion -mgeneral-regs-only).
> If you tell me it's fine to use unconditionally with clang, then
> I can see about making this the preferred variant, with the
> present one as a fallback.

Hm, so my suggestion was bad, no_caller_saved_registers is only
implemented starting clang 5, which is newer than the minimum we
currently require (3.5).

So apart from adding a clobber to the asm instance covering the
scratch registers the only option I can see as viable is using a bunch
of dummy global variables assigned to the registers we need to prevent
the software generic_hweightXX functions from using, but that's ugly
and will likely trigger warnings at least, since I'm not sure the
compiler will find it safe to clobber a function call register with a
global variable. Or adding a prolegue / epilogue to the call
instruction in order to push / pop the relevant registers. None seems
like a very good option IMO.

I also assume that using no_caller_saved_registers when available or
else keeping the current behavior is not an acceptable solution? FWIW,
from a FreeBSD PoV I would be OK with that, as I don't think there are
any supported targets with clang < 5.

Thanks, Roger.


  reply	other threads:[~2020-05-20 17:19 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-15 14:39 [Xen-devel] [PATCH v2] x86: use POPCNT for hweight<N>() when available Jan Beulich
2020-05-14 14:05 ` Roger Pau Monné
2020-05-20  8:31   ` Jan Beulich
2020-05-20  9:31     ` Roger Pau Monné
2020-05-20 10:17       ` Jan Beulich
2020-05-20 10:28         ` Roger Pau Monné
2020-05-20 10:57           ` Jan Beulich
2020-05-20 11:43             ` Roger Pau Monné
2020-05-20 13:12               ` Jan Beulich
2020-05-20 17:18                 ` Roger Pau Monné [this message]
2020-05-22  9:58                   ` Jan Beulich
2020-05-22 10:19                     ` Roger Pau Monné
2023-03-17 11:22 ` Roger Pau Monné
2023-03-17 12:26   ` Andrew Cooper
2023-03-20  9:48     ` Jan Beulich
2023-03-21 14:57       ` Roger Pau Monné
2023-03-21 15:35         ` Jan Beulich
2023-03-21 15:44           ` Juergen Gross
2023-03-21 16:31           ` Roger Pau Monné
2023-03-21 16:41             ` Jan Beulich
2023-03-21 17:02               ` Roger Pau Monné

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200520171829.GO54375@Air-de-Roger \
    --to=roger.pau@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.