* powerpc Linux scv support and scv system call ABI proposal
@ 2020-01-28 10:50 Nicholas Piggin
  2020-01-28 13:09 ` Florian Weimer
  0 siblings, 1 reply; 26+ messages in thread
From: Nicholas Piggin @ 2020-01-28 10:50 UTC
  To: linuxppc-dev; +Cc: Tulio Magno Quites Machado Filho, libc-alpha

I would like to enable support for the scv instruction to provide the Linux
system calls.

This requires two things to be defined: first, how to advertise support
for scv and how to allocate and advertise support for individual scv
vectors; second, how to define a Linux system call ABI with this new
instruction.

I have put together a rough proposal along with some options and
questions. Any thoughts or input would be welcome. I have probably
missed some things, so please point them out.

(I will be on vacation for two weeks from the end of the week, may not
get to replying immediately)

Thanks,
Nick

System Call Vectored (scv) ABI

The scv instruction was introduced with POWER9 / ISA v3.0 and comes with an
rfscv counterpart. The benefit of these instructions is performance
(trading the slower SRR0/1 for the faster LR/CTR registers, and entering the
kernel with MSR[EE] and MSR[RI] left enabled, which can avoid one mtmsrd
instruction). Another benefit is that the ABI can be changed if there is
a good reason to.

The scv instruction has 128 interrupt entry points (not enough to cover
the Linux system call space). The proposal is to assign scv numbers
conservatively. 'scv 0' could be used for the regular Linux system call
ABI initially. Examples of other assignments could be 32-bit compat
system calls, and firmware service calls.

Linux has not enabled FSCR[SCV] yet, so the instruction will trap with illegal
instruction on current environments. Linux has defined a HWCAP2 bit
PPC_FEATURE2_SCV for SCV support, but does not set it.

One option is for PPC_FEATURE2_SCV to indicate 'scv 0' support, and a new HWCAP
bit assigned for each new scv vector supported for userspace. This is the most
regular and flexible approach. It requires the most HWCAP space, but vector
usage is not expected to grow quickly.

Another option is for PPC_FEATURE2_SCV to indicate 'scv 0', and other vectors
will each return -ENOSYS, then when they are assigned to a new ABI, it will
define a particular way they can be queried for support (which would return
something other than -ENOSYS if supported). This will not require more HWCAP
bits, but it's less regular and more complicated to determine.
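
Either option starts from userspace testing PPC_FEATURE2_SCV in AT_HWCAP2.
A minimal sketch of such a test follows, assuming glibc's getauxval(). The
helper names hwcap2_has_scv and scv0_supported are hypothetical, and the
numeric value of the bit is an assumption that should be checked against
the kernel's cputable.h.

```c
#include <stdbool.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP2 */

/* Assumed value of the HWCAP2 bit; verify against the kernel's
   arch/powerpc/include/uapi/asm/cputable.h before relying on it.  */
#ifndef PPC_FEATURE2_SCV
# define PPC_FEATURE2_SCV 0x00100000
#endif

/* Hypothetical helper: does this AT_HWCAP2 word advertise 'scv 0'?  */
static bool
hwcap2_has_scv (unsigned long hwcap2)
{
  return (hwcap2 & PPC_FEATURE2_SCV) != 0;
}

/* Hypothetical probe.  The bit only has this meaning on powerpc, so
   every other architecture reports "not supported".  */
static bool
scv0_supported (void)
{
#if defined (__powerpc__)
  return hwcap2_has_scv (getauxval (AT_HWCAP2));
#else
  return false;
#endif
}
```

A caller would evaluate scv0_supported () once at startup and cache the
result rather than query the auxiliary vector on every system call.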

* Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all other
  vectors will return -ENOSYS, and the decision for how to add support for
  a new vector deferred until we see the next user.

* Proposal is for scv 0 to provide the standard Linux system call ABI with some
  differences:

- LR is volatile across scv calls. This is necessary for support because the
  scv instruction clobbers LR.

- CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
  system call exit to avoid restoring the CR register.

- Error handling: use of CR0[SO] to indicate error requires a mtcr / mtocr
  instruction on the kernel side, and it is currently not implemented well
  in glibc, requiring a mfcr (mfocr should be possible and asm goto support
  would allow a better implementation). Is it worth continuing this style of
  error handling? Or just move to -ve return means error? Using a different
  bit would allow the kernel to piggy back the CR return code setting with
  a test for the error case exit.

- R2 could be volatile as though it's an external function call, which
  would avoid one store in the system call entry path. However it would
  require the caller to load R2 after the system call returns, where the
  latency of the load cannot be overlapped with the costly system call
  exit sequence. On balance, it may be better to keep R2 as non-volatile.

- Number of volatile registers available seems sufficient. Linux's 'sc'
  handler is badly constrained here, but that is because it is shared
  between both hypercall and syscall handlers, which have different
  call conventions that share no volatile GPR registers! r9-r12 should
  be quite enough.

- Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
  calls if there was interest in developing an ABI for 32-bit programs.
  Marginal benefit in avoiding compat syscall selection.



* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-28 10:50 powerpc Linux scv support and scv system call ABI proposal Nicholas Piggin
@ 2020-01-28 13:09 ` Florian Weimer
  2020-01-28 14:05   ` Nicholas Piggin
  2020-01-28 22:14   ` Joseph Myers
  0 siblings, 2 replies; 26+ messages in thread
From: Florian Weimer @ 2020-01-28 13:09 UTC
  To: Nicholas Piggin
  Cc: libc-alpha, Tulio Magno Quites Machado Filho, linuxppc-dev

* Nicholas Piggin:

> * Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all other
>   vectors will return -ENOSYS, and the decision for how to add support for
>   a new vector deferred until we see the next user.

Seems reasonable.  We don't have to decide this today.

> * Proposal is for scv 0 to provide the standard Linux system call ABI with some
>   differences:
>
> - LR is volatile across scv calls. This is necessary for support because the
>   scv instruction clobbers LR.

I think we can express this in the glibc system call assembler wrapper
generators.  The mcount profiling wrappers already have this property.

But I don't think we are so lucky for the inline system calls.  GCC
recognizes an "lr" clobber with inline asm (even though it is not
documented), but it generates rather strange assembler output as a
result:

long
f (long x)
{
  long y;
  asm ("#" : "=r" (y) : "r" (x) : "lr");
  return y;
}

	.abiversion 2
	.section	".text"
	.align 2
	.p2align 4,,15
	.globl f
	.type	f, @function
f:
.LFB0:
	.cfi_startproc
	mflr 0
	.cfi_register 65, 0
#APP
 # 5 "t.c" 1
	#
 # 0 "" 2
#NO_APP
	std 0,16(1)
	.cfi_offset 65, 16
	ori 2,2,0
	ld 0,16(1)
	mtlr 0
	.cfi_restore 65
	blr
	.long 0
	.byte 0,0,0,1,0,0,0,0
	.cfi_endproc
.LFE0:
	.size	f,.-f


That's with GCC 8.3 at -O2.  I don't understand what the ori is about.

I don't think we can save LR in a regular register around the system
call, explicitly in the inline asm statement, because we still have to
generate proper unwinding information using CFI directives, something
that you cannot do from within the asm statement.

Supporting this in GCC should not be impossible, but someone who
actually knows this stuff needs to look at it.

> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
>   system call exit to avoid restoring the CR register.

This sounds reasonable, but I don't know what kind of knock-on effects
this has.  The inline system call wrappers can handle this with minor
tweaks.

> - Error handling: use of CR0[SO] to indicate error requires a mtcr / mtocr
>   instruction on the kernel side, and it is currently not implemented well
>   in glibc, requiring a mfcr (mfocr should be possible and asm goto support
>   would allow a better implementation). Is it worth continuing this style of
>   error handling? Or just move to -ve return means error? Using a different
>   bit would allow the kernel to piggy back the CR return code setting with
>   a test for the error case exit.

GCC does not model the condition registers, so for inline system calls,
we have to produce a value anyway that the subsequent C code can check.
The assembler syscall wrappers do not need to do this, of course, but
I'm not sure which category of interfaces is more important.

But the kernel uses the -errno convention internally, so I think it
would make sense to pass this to userspace and not convert back and
forth.  This would align with what most of the architectures do, and
also avoids the GCC oddity.

> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
>   calls if there was interest in developing an ABI for 32-bit programs.
>   Marginal benefit in avoiding compat syscall selection.

We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
provide an ELFv1 port for this given that it's POWER9-specific.

From the glibc perspective, the major question is how we handle run-time
selection of the system call instruction sequence.  On i386, we use a
function pointer in the TCB to call an instruction sequence in the vDSO.
That's problematic from a security perspective.  I expect that on
POWER9, using a pointer in read-only memory would be equally
non-attractive due to a similar lack of PC-relative addressing.  We
could use the HWCAP bit in the TCB, but that would add another (easy to
predict) conditional branch to every system call.
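
The conditional-branch variant can be sketched in portable C as below. This
is only an illustration under stated assumptions: tcb_use_scv stands in for
a HWCAP-derived flag cached in the TCB, issue_sc and issue_scv are
hypothetical stubs standing in for the two instruction sequences (they only
count calls here), and __builtin_expect is the GCC/Clang builtin used to
mark the branch as likely.

```c
#include <stdbool.h>

/* Hypothetical per-thread flag, standing in for a bit cached in the TCB.  */
static _Thread_local bool tcb_use_scv;

/* Call counters so the chosen path can be observed in this sketch.  */
static int sc_calls, scv_calls;

/* Stub for the legacy 'sc' sequence plus its error decoding.  */
static long
issue_sc (long nr)
{
  sc_calls++;
  return nr;
}

/* Stub for the 'scv 0' sequence plus its error decoding.  */
static long
issue_scv (long nr)
{
  scv_calls++;
  return nr;
}

/* One easy-to-predict branch on every system call selects the path.  */
static long
do_syscall (long nr)
{
  if (__builtin_expect (tcb_use_scv, 1))
    return issue_scv (nr);
  return issue_sc (nr);
}
```

Because each stub owns its own result decoding, the two paths are free to
use different error conventions.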

I don't think it matters whether both system call variants use the same
error convention because we could have different error code extraction
code on the two branches.

Thanks,
Florian



* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-28 13:09 ` Florian Weimer
@ 2020-01-28 14:05   ` Nicholas Piggin
  2020-01-28 15:40     ` Segher Boessenkool
                       ` (2 more replies)
  2020-01-28 22:14   ` Joseph Myers
  1 sibling, 3 replies; 26+ messages in thread
From: Nicholas Piggin @ 2020-01-28 14:05 UTC
  To: Florian Weimer; +Cc: linuxppc-dev, Tulio Magno Quites Machado Filho, libc-alpha

Florian Weimer's on January 28, 2020 11:09 pm:
> * Nicholas Piggin:
> 
>> * Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all other
>>   vectors will return -ENOSYS, and the decision for how to add support for
>>   a new vector deferred until we see the next user.
> 
> Seems reasonable.  We don't have to decide this today.
> 
>> * Proposal is for scv 0 to provide the standard Linux system call ABI with some
>>   differences:
>>
>> - LR is volatile across scv calls. This is necessary for support because the
>>   scv instruction clobbers LR.
> 
> I think we can express this in the glibc system call assembler wrapper
> generators.  The mcount profiling wrappers already have this property.
> 
> But I don't think we are so lucky for the inline system calls.  GCC
> recognizes an "lr" clobber with inline asm (even though it is not
> documented), but it generates rather strange assembler output as a
> result:
> 
> long
> f (long x)
> {
>   long y;
>   asm ("#" : "=r" (y) : "r" (x) : "lr");
>   return y;
> }
> 
> 	.abiversion 2
> 	.section	".text"
> 	.align 2
> 	.p2align 4,,15
> 	.globl f
> 	.type	f, @function
> f:
> .LFB0:
> 	.cfi_startproc
> 	mflr 0
> 	.cfi_register 65, 0
> #APP
>  # 5 "t.c" 1
> 	#
>  # 0 "" 2
> #NO_APP
> 	std 0,16(1)
> 	.cfi_offset 65, 16
> 	ori 2,2,0
> 	ld 0,16(1)
> 	mtlr 0
> 	.cfi_restore 65
> 	blr
> 	.long 0
> 	.byte 0,0,0,1,0,0,0,0
> 	.cfi_endproc
> .LFE0:
> 	.size	f,.-f
> 
> 
> That's with GCC 8.3 at -O2.  I don't understand what the ori is about.

ori 2,2,0 is the group terminating nop hint for POWER8 type cores
which had dispatch grouping rules.

> 
> I don't think we can save LR in a regular register around the system
> call, explicitly in the inline asm statement, because we still have to
> generate proper unwinding information using CFI directives, something
> that you cannot do from within the asm statement.
> 
> Supporting this in GCC should not be impossible, but someone who
> actually knows this stuff needs to look at it.

The generated assembler actually seems okay to me. If we compile
something like a syscall and with -mcpu=power9:

long
f (long _r3, long _r4, long _r5, long _r6, long _r7, long _r8, long _r0)
{
  register long r0 asm ("r0") = _r0;
  register long r3 asm ("r3") = _r3;
  register long r4 asm ("r4") = _r4;
  register long r5 asm ("r5") = _r5;
  register long r6 asm ("r6") = _r6;
  register long r7 asm ("r7") = _r7;
  register long r8 asm ("r8") = _r8;

  asm ("# scv" : "=r"(r3) : "r"(r0), "r"(r4), "r"(r5), "r"(r6), "r"(r7), "r"(r8) : "lr", "ctr", "cc", "xer");

  return r3;
}


f:
.LFB0:
        .cfi_startproc
        mflr 0
        std 0,16(1)
        .cfi_offset 65, 16
        mr 0,9
#APP
 # 12 "a.c" 1
        # scv
 # 0 "" 2
#NO_APP
        ld 0,16(1)
        mtlr 0
        .cfi_restore 65
        blr
        .long 0
        .byte 0,0,0,1,0,0,0,0
        .cfi_endproc

That gets the LR save/restore right when we're also using r0.

> 
>> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
>>   system call exit to avoid restoring the CR register.
> 
> This sounds reasonable, but I don't know what kind of knock-on effects
> this has.  The inline system call wrappers can handle this with minor
> tweaks.

Okay, good. In the end we would have to check code trace through the
kernel and libc of course, but I think there's little to no opportunity
to take advantage of current extra non-volatile cr regs.

mtcr has to write 8 independently renamed registers so it's cracked into
2 insns on POWER9 (and likely to always be a bit troublesome). It's not
much in the scheme of a system call, but while we can tweak the ABI...

> 
>> - Error handling: use of CR0[SO] to indicate error requires a mtcr / mtocr
>>   instruction on the kernel side, and it is currently not implemented well
>>   in glibc, requiring a mfcr (mfocr should be possible and asm goto support
>>   would allow a better implementation). Is it worth continuing this style of
>>   error handling? Or just move to -ve return means error? Using a different
>>   bit would allow the kernel to piggy back the CR return code setting with
>>   a test for the error case exit.
> 
> GCC does not model the condition registers, so for inline system calls,
> we have to produce a value anyway that the subsequent C code can check.
> The assembler syscall wrappers do not need to do this, of course, but
> I'm not sure which category of interfaces is more important.

Right. asm goto can improve this kind of pattern if it's inlined
into the C code which tests the result, it can branch using the flags
to the C error handling label, rather than move flags into GPR, test
GPR, branch. However...

> But the kernel uses the -errno convention internally, so I think it
> would make sense to pass this to userspace and not convert back and
> forth.  This would align with what most of the architectures do, and
> also avoids the GCC oddity.

Yes I would be interested in opinions for this option. It seems like
matching other architectures is a good idea. Maybe there are some
reasons not to.

>> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
>>   calls if there was interest in developing an ABI for 32-bit programs.
>>   Marginal benefit in avoiding compat syscall selection.
> 
> We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
> provide an ELFv1 port for this given that it's POWER9-specific.

Okay. There's no reason not to enable this for BE, at least for the
kernel it's no additional work so it probably remains enabled (unless
there is something really good we could do with the ABI if we exclude
ELFv1 but I don't see anything).

But if glibc only builds for ELFv2 support that's probably reasonable.

> 
> From the glibc perspective, the major question is how we handle run-time
> selection of the system call instruction sequence.  On i386, we use a
> function pointer in the TCB to call an instruction sequence in the vDSO.
> That's problematic from a security perspective.  I expect that on
> POWER9, using a pointer in read-only memory would be equally
> non-attractive due to a similar lack of PC-relative addressing.  We
> could use the HWCAP bit in the TCB, but that would add another (easy to
> predict) conditional branch to every system call.

I would have to defer to glibc devs on this. Conditional branch
should be acceptable I think, scv improves speed as much as several
mispredicted branches (about 90 cycles).

> I don't think it matters whether both system call variants use the same
> error convention because we could have different error code extraction
> code on the two branches.

That's one less difficulty.

Thanks,
Nick


* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-28 14:05   ` Nicholas Piggin
@ 2020-01-28 15:40     ` Segher Boessenkool
  2020-01-28 16:04       ` Florian Weimer
  2020-01-28 15:58     ` Florian Weimer
  2020-01-28 17:26     ` Adhemerval Zanella
  2 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2020-01-28 15:40 UTC
  To: Nicholas Piggin
  Cc: Florian Weimer, libc-alpha, Tulio Magno Quites Machado Filho,
	linuxppc-dev

On Wed, Jan 29, 2020 at 12:05:40AM +1000, Nicholas Piggin wrote:
> Florian Weimer's on January 28, 2020 11:09 pm:
> > But I don't think we are so lucky for the inline system calls.  GCC
> > recognizes an "lr" clobber with inline asm (even though it is not
> > documented), but it generates rather strange assembler output as a
> > result:

> > 	std 0,16(1)
> > 	ori 2,2,0
> > 	ld 0,16(1)

> > That's with GCC 8.3 at -O2.  I don't understand what the ori is about.
> 
> ori 2,2,0 is the group terminating nop hint for POWER8 type cores
> which had dispatch grouping rules.

Yup.  GCC generates that here to force the load into a different
scheduling group than the corresponding store is, because that otherwise
would cause very expensive pipeline flushes.  It does that if it knows it
is the same address (like here).

> > I don't think we can save LR in a regular register around the system
> > call, explicitly in the inline asm statement, because we still have to
> > generate proper unwinding information using CFI directives, something
> > that you cannot do from within the asm statement.

Why not?

> >> - Error handling: use of CR0[SO] to indicate error requires a mtcr / mtocr
> >>   instruction on the kernel side, and it is currently not implemented well
> >>   in glibc, requiring a mfcr (mfocr should be possible and asm goto support
> >>   would allow a better implementation). Is it worth continuing this style of
> >>   error handling? Or just move to -ve return means error? Using a different
> >>   bit would allow the kernel to piggy back the CR return code setting with
> >>   a test for the error case exit.
> > 
> > GCC does not model the condition registers,

Huh?  It does model the condition register, as 8 registers in GCC's
internal model (one each for CR0..CR7).

There is no way to use CR0 across function calls, with our ABIs: it is
a volatile register.

GCC does not model the SO bits in the CR fields.

If the calling convention would only use registers GCC *does* know
about, we can have a builtin for this, so that you can get better
inlining etc., no need for an assembler wrapper.

> > But the kernel uses the -errno convention internally, so I think it
> > would make sense to pass this to userspace and not convert back and
> > forth.  This would align with what most of the architectures do, and
> > also avoids the GCC oddity.
> 
> Yes I would be interested in opinions for this option. It seems like
> matching other architectures is a good idea. Maybe there are some
> reasons not to.

Agreed with you both here.

> >> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
> >>   calls if there was interest in developing an ABI for 32-bit programs.
> >>   Marginal benefit in avoiding compat syscall selection.
> > 
> > We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
> > provide an ELFv1 port for this given that it's POWER9-specific.

We *do* have a 32-bit LE ABI.  And ELFv1 is not 32-bit either.  Please
don't confuse these things :-)

The 64-bit LE kernel does not really support 32-bit userland (or BE
userland), *that* is what you want to say.

> > From the glibc perspective, the major question is how we handle run-time
> > selection of the system call instruction sequence.

Well, if it is inlined you don't have this problem either!  :-)


Segher


* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-28 14:05   ` Nicholas Piggin
  2020-01-28 15:40     ` Segher Boessenkool
@ 2020-01-28 15:58     ` Florian Weimer
  2020-01-29  4:41       ` Nicholas Piggin
  2020-01-28 17:26     ` Adhemerval Zanella
  2 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2020-01-28 15:58 UTC
  To: Nicholas Piggin
  Cc: linuxppc-dev, Tulio Magno Quites Machado Filho, libc-alpha

* Nicholas Piggin:

> That gets the LR save/restore right when we're also using r0.

Yes, I agree it looks good.  Nice.

>> But the kernel uses the -errno convention internally, so I think it
>> would make sense to pass this to userspace and not convert back and
>> forth.  This would align with what most of the architectures do, and
>> also avoids the GCC oddity.
>
> Yes I would be interested in opinions for this option. It seems like
> matching other architectures is a good idea. Maybe there are some
> reasons not to.

If there were a POWER-specific system call that uses all result bits and
doesn't have room for the 4096 error states (or an error number that's
outside that range), that would be a blocker.  I can't find such a
system call wrapped in the glibc sources.  musl's inline syscalls always
convert the errno state to -errno, so it's not possible to use such a
system call there.
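
The conversion described for musl can be sketched as follows. This is not
musl's actual code: the function name and the boolean parameter are
hypothetical, with cr0_so standing in for the CR0[SO] bit and r3 for the
raw result register after a legacy 'sc'.

```c
#include <stdbool.h>

/* Fold the legacy (r3, CR0[SO]) pair into a single -errno style value.
   With SO set, r3 holds a positive errno, so it is negated; otherwise r3
   is the successful result and passes through unchanged.  */
static long
legacy_result_to_neg_errno (long r3, bool cr0_so)
{
  return cr0_so ? -r3 : r3;
}
```

After this step the caller sees one value, which is why a system call that
needed every result bit for successful returns could not be represented.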

>>> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
>>>   calls if there was interest in developing an ABI for 32-bit programs.
>>>   Marginal benefit in avoiding compat syscall selection.
>> 
>> We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
>> provide an ELFv1 port for this given that it's POWER9-specific.
>
> Okay. There's no reason not to enable this for BE, at least for the
> kernel it's no additional work so it probably remains enabled (unless
> there is something really good we could do with the ABI if we exclude
> ELFv1 but I don't see anything).
>
> But if glibc only builds for ELFv2 support that's probably reasonable.

To be clear, we still support ELFv1 for POWER, but given that this
feature is POWER9 and later, I expect the number of users benefiting
from 32-bit support (or ELFv1 and thus big-endian support) to be quite
small.

Especially if we go the conditional branch route, I would restrict this
to ELFv2 in glibc, at least for default builds.

>> From the glibc perspective, the major question is how we handle run-time
>> selection of the system call instruction sequence.  On i386, we use a
>> function pointer in the TCB to call an instruction sequence in the vDSO.
>> That's problematic from a security perspective.  I expect that on
>> POWER9, using a pointer in read-only memory would be equally
>> non-attractive due to a similar lack of PC-relative addressing.  We
>> could use the HWCAP bit in the TCB, but that would add another (easy to
>> predict) conditional branch to every system call.
>
> I would have to defer to glibc devs on this. Conditional branch
> should be acceptable I think, scv improves speed as much as several
> mispredicted branches (about 90 cycles).

But we'd have to pay for that branch (and likely the LR clobber) on
legacy POWER, and that's something to consider, too.

Is there an additional performance hit if a process uses both the old
and new system call sequence?

Thanks,
Florian



* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-28 15:40     ` Segher Boessenkool
@ 2020-01-28 16:04       ` Florian Weimer
  2020-01-28 20:01         ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2020-01-28 16:04 UTC
  To: Segher Boessenkool
  Cc: libc-alpha, Tulio Magno Quites Machado Filho, linuxppc-dev,
	Nicholas Piggin

* Segher Boessenkool:

>> > I don't think we can save LR in a regular register around the system
>> > call, explicitly in the inline asm statement, because we still have to
>> > generate proper unwinding information using CFI directives, something
>> > that you cannot do from within the asm statement.
>
> Why not?

As far as I know, there isn't a CFI directive that allows us to restore
the CFI state at the end of the inline assembly.  If we say that LR is
stored in a different register than what the rest of the function uses,
that would lead to incorrect CFI after the exit of the inline assembler
fragment.

At least that's what I think.  Compilers aren't really my thing.

>
>> >> - Error handling: use of CR0[SO] to indicate error requires a mtcr / mtocr
>> >>   instruction on the kernel side, and it is currently not implemented well
>> >>   in glibc, requiring a mfcr (mfocr should be possible and asm goto support
>> >>   would allow a better implementation). Is it worth continuing this style of
>> >>   error handling? Or just move to -ve return means error? Using a different
>> >>   bit would allow the kernel to piggy back the CR return code setting with
>> >>   a test for the error case exit.
>> > 
>> > GCC does not model the condition registers,
>
> Huh?  It does model the condition register, as 8 registers in GCC's
> internal model (one each for CR0..CR7).

But GCC doesn't expose them as integers to C code, so you can't do much
with them.

>> >> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
>> >>   calls if there was interest in developing an ABI for 32-bit programs.
>> >>   Marginal benefit in avoiding compat syscall selection.
>> > 
>> > We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
>> > provide an ELFv1 port for this given that it's POWER9-specific.
>
> We *do* have a 32-bit LE ABI.  And ELFv1 is not 32-bit either.  Please
> don't confuse these things :-)
>
> The 64-bit LE kernel does not really support 32-bit userland (or BE
> userland), *that* is what you want to say.

Sorry for the confusion.  Is POWER9 running kernels which are not 64-bit
LE really a thing in practice, though?

>> > From the glibc perspective, the major question is how we handle run-time
>> > selection of the system call instruction sequence.
>
> Well, if it is inlined you don't have this problem either!  :-)

How so?  We would have to put the conditional sequence into all inline
system calls, of course.

Thanks,
Florian



* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-28 14:05   ` Nicholas Piggin
  2020-01-28 15:40     ` Segher Boessenkool
  2020-01-28 15:58     ` Florian Weimer
@ 2020-01-28 17:26     ` Adhemerval Zanella
  2020-01-29  4:58       ` Nicholas Piggin
  2 siblings, 1 reply; 26+ messages in thread
From: Adhemerval Zanella @ 2020-01-28 17:26 UTC
  To: linuxppc-dev, npiggin



On 28/01/2020 11:05, Nicholas Piggin wrote:
> Florian Weimer's on January 28, 2020 11:09 pm:
>> * Nicholas Piggin:
>>
>>> * Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all other
>>>   vectors will return -ENOSYS, and the decision for how to add support for
>>>   a new vector deferred until we see the next user.
>>
>> Seems reasonable.  We don't have to decide this today.
>>
>>> * Proposal is for scv 0 to provide the standard Linux system call ABI with some
>>>   differences:
>>>
>>> - LR is volatile across scv calls. This is necessary for support because the
>>>   scv instruction clobbers LR.
>>
>> I think we can express this in the glibc system call assembler wrapper
>> generators.  The mcount profiling wrappers already have this property.
>>
>> But I don't think we are so lucky for the inline system calls.  GCC
>> recognizes an "lr" clobber with inline asm (even though it is not
>> documented), but it generates rather strange assembler output as a
>> result:
>>
>> long
>> f (long x)
>> {
>>   long y;
>>   asm ("#" : "=r" (y) : "r" (x) : "lr");
>>   return y;
>> }
>>
>> 	.abiversion 2
>> 	.section	".text"
>> 	.align 2
>> 	.p2align 4,,15
>> 	.globl f
>> 	.type	f, @function
>> f:
>> .LFB0:
>> 	.cfi_startproc
>> 	mflr 0
>> 	.cfi_register 65, 0
>> #APP
>>  # 5 "t.c" 1
>> 	#
>>  # 0 "" 2
>> #NO_APP
>> 	std 0,16(1)
>> 	.cfi_offset 65, 16
>> 	ori 2,2,0
>> 	ld 0,16(1)
>> 	mtlr 0
>> 	.cfi_restore 65
>> 	blr
>> 	.long 0
>> 	.byte 0,0,0,1,0,0,0,0
>> 	.cfi_endproc
>> .LFE0:
>> 	.size	f,.-f
>>
>>
>> That's with GCC 8.3 at -O2.  I don't understand what the ori is about.
> 
> ori 2,2,0 is the group terminating nop hint for POWER8 type cores
> which had dispatch grouping rules.

It is worth noting that it aims to mitigate a load-hit-store CPU stall
on some powerpc chips.

> 
>>
>> I don't think we can save LR in a regular register around the system
>> call, explicitly in the inline asm statement, because we still have to
>> generate proper unwinding information using CFI directives, something
>> that you cannot do from within the asm statement.
>>
>> Supporting this in GCC should not be impossible, but someone who
>> actually knows this stuff needs to look at it.
> 
> The generated assembler actually seems okay to me. If we compile
> something like a syscall and with -mcpu=power9:
> 
> long
> f (long _r3, long _r4, long _r5, long _r6, long _r7, long _r8, long _r0)
> {
>   register long r0 asm ("r0") = _r0;
>   register long r3 asm ("r3") = _r3;
>   register long r4 asm ("r4") = _r4;
>   register long r5 asm ("r5") = _r5;
>   register long r6 asm ("r6") = _r6;
>   register long r7 asm ("r7") = _r7;
>   register long r8 asm ("r8") = _r8;
> 
>   asm ("# scv" : "=r"(r3) : "r"(r0), "r"(r4), "r"(r5), "r"(r6), "r"(r7), "r"(r8) : "lr", "ctr", "cc", "xer");
> 
>   return r3;
> }
> 
> 
> f:
> .LFB0:
>         .cfi_startproc
>         mflr 0
>         std 0,16(1)
>         .cfi_offset 65, 16
>         mr 0,9
> #APP
>  # 12 "a.c" 1
>         # scv
>  # 0 "" 2
> #NO_APP
>         ld 0,16(1)
>         mtlr 0
>         .cfi_restore 65
>         blr
>         .long 0
>         .byte 0,0,0,1,0,0,0,0
>         .cfi_endproc
> 
> That gets the LR save/restore right when we're also using r0.
> 
>>
>>> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
>>>   system call exit to avoid restoring the CR register.
>>
>> This sounds reasonable, but I don't know what kind of knock-on effects
>> this has.  The inline system call wrappers can handle this with minor
>> tweaks.
> 
> Okay, good. In the end we would have to check code trace through the
> kernel and libc of course, but I think there's little to no opportunity
> to take advantage of current extra non-volatile cr regs.
> 
> mtcr has to write 8 independently renamed registers so it's cracked into
> 2 insns on POWER9 (and likely to always be a bit troublesome). It's not
> much in the scheme of a system call, but while we can tweak the ABI...

We don't really need a mfcr/mfocr to implement the Linux syscall ABI on
powerpc, we can use a 'bns+' plus a neg instead as:

--
#define internal_syscall6(name, err, nr, arg1, arg2, arg3, arg4, arg5,  \
                          arg6)                                         \
  ({                                                                    \
    register long int r0  __asm__ ("r0") = (long int) (name);           \
    register long int r3  __asm__ ("r3") = (long int) (arg1);           \
    register long int r4  __asm__ ("r4") = (long int) (arg2);           \
    register long int r5  __asm__ ("r5") = (long int) (arg3);           \
    register long int r6  __asm__ ("r6") = (long int) (arg4);           \
    register long int r7  __asm__ ("r7") = (long int) (arg5);           \
    register long int r8  __asm__ ("r8") = (long int) (arg6);           \
    __asm__ __volatile__                                                \
      ("sc\n\t"                                                         \
       "bns+ 1f\n\t"                                                    \
       "neg %1, %1\n\t"                                                 \
       "1:\n\t"                                                         \
       : "+r" (r0), "+r" (r3), "+r" (r4), "+r" (r5), "+r" (r6),         \
         "+r" (r7), "+r" (r8)                                           \
       :                                                                \
       : "r9", "r10", "r11", "r12",                                     \
         "cr0", "memory");                                              \
    r3;                                                                 \
  })
--

And change INTERNAL_SYSCALL_ERROR_P to check for the expected invalid
range (((unsigned long) (val) >= (unsigned long) -4095)) and 
INTERNAL_SYSCALL_ERRNO to return a negative value (since the value will
be negated by INTERNAL_SYSCALL_ERROR_P).
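
As a rough illustration of the range check described above, here is a
minimal C sketch. The helper names syscall_error_p and syscall_errno are
hypothetical stand-ins for the glibc macros, and the sketch assumes the
raw return value already carries -errno on failure:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helpers modeling the macros discussed above; these are
   not the actual glibc definitions.  */

/* True if the raw return value lies in the reserved error range
   [-4095, -1], i.e. it encodes -errno.  */
static inline bool
syscall_error_p (long int val)
{
  return (unsigned long int) val >= (unsigned long int) -4095L;
}

/* Recover the positive errno code.  Only meaningful when
   syscall_error_p (val) is true.  */
static inline int
syscall_errno (long int val)
{
  assert (syscall_error_p (val));
  return (int) -val;
}
```

With this convention a single unsigned comparison separates valid results
from errors, so no condition register bit needs to be read.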

Having the powerpc kernel ABI use a different mechanism to signal errors
also requires glibc to reimplement the vDSO symbol call in an
arch-specific way instead of as a straight function call (since it might
fall back to a syscall).

Even for a POWER-specific system call that uses all result bits, either
it should not fail or it would require an arch-specific implementation
to set up the expected error value (since the information would require
another source or a pre-defined value).

In fact I think we could make the assumption that INTERNAL_SYSCALL returns
a negative errno value in case of an error and make all the handling
that checks for a syscall failure and sets errno generic. This would
require changing ia64, mips, nios2, and sparc though.

> 
>>
>>> - Error handling: use of CR0[SO] to indicate error requires a mtcr / mtocr
>>>   instruction on the kernel side, and it is currently not implemented well
>>>   in glibc, requiring a mfcr (mfocr should be possible and asm goto support
>>>   would allow a better implementation). Is it worth continuing this style of
>>>   error handling? Or just move to -ve return means error? Using a different
>>>   bit would allow the kernel to piggy back the CR return code setting with
>>>   a test for the error case exit.
>>
>> GCC does not model the condition registers, so for inline system calls,
>> we have to produce a value anyway that the subsequent C code can check.
>> The assembler syscall wrappers do not need to do this, of course, but
>> I'm not sure which category of interfaces is more important.
> 
> Right. asm goto can improve this kind of pattern if it's inlined
> into the C code which tests the result, it can branch using the flags
> to the C error handling label, rather than move flags into GPR, test
> GPR, branch. However...
> 
>> But the kernel uses the -errno convention internally, so I think it
>> would make sense to pass this to userspace and not convert back and
>> forth.  This would align with what most of the architectures do, and
>> also avoids the GCC oddity.
> 
> Yes I would be interested in opinions for this option. It seems like
> matching other architectures is a good idea. Maybe there are some
> reasons not to.
> 
>>> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
>>>   calls if there was interest in developing an ABI for 32-bit programs.
>>>   Marginal benefit in avoiding compat syscall selection.
>>
>> We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
>> provide an ELFv1 port for this given that it's POWER9-specific.
> 
> Okay. There's no reason not to enable this for BE, at least for the
> kernel it's no additional work so it probably remains enabled (unless
> there is something really good we could do with the ABI if we exclude
> ELFv1 but I don't see anything).
> 
> But if glibc only builds for ELFv2 support that's probably reasonable.
> 
>>
>> From the glibc perspective, the major question is how we handle run-time
>> selection of the system call instruction sequence.  On i386, we use a
>> function pointer in the TCB to call an instruction sequence in the vDSO.
>> That's problematic from a security perspective.  I expect that on
>> POWER9, using a pointer in read-only memory would be equally
>> non-attractive due to a similar lack of PC-relative addressing.  We
>> could use the HWCAP bit in the TCB, but that would add another (easy to
>> predict) conditional branch to every system call.
> 
> I would have to defer to glibc devs on this. Conditional branch
> should be acceptable I think, scv improves speed as much as several
> mispredicted branches (about 90 cycles).
> 
>> I don't think it matters whether both system call variants use the same
>> error convention because we could have different error code extraction
>> code on the two branches.
> 
> That's one less difficulty.

We already had to push a similar hack where glibc used to abort transactions
prior to syscalls to avoid some side effects in the kernel (commit 56cf2763819d2f).
It was eventually removed from syscall handling by f0458cf4f9ff3d870, where
we only enable TLE if the kernel supports PPC_FEATURE2_HTM_NOSC.

The transaction syscall abort used to read a variable directly from the TCB,
so this could be an option. I would expect that we could optimize it so that,
if glibc is built against a recent kernel and the compiler is targeting
an ISA 3.0+ CPU, we could remove the 'sc' code.

* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-28 16:04       ` Florian Weimer
@ 2020-01-28 20:01         ` Segher Boessenkool
  2020-01-29 16:19           ` Florian Weimer
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2020-01-28 20:01 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, Tulio Magno Quites Machado Filho, linuxppc-dev,
	Nicholas Piggin

On Tue, Jan 28, 2020 at 05:04:49PM +0100, Florian Weimer wrote:
> * Segher Boessenkool:
> 
> >> > I don't think we can save LR in a regular register around the system
> >> > call, explicitly in the inline asm statement, because we still have to
> >> > generate proper unwinding information using CFI directives, something
> >> > that you cannot do from within the asm statement.
> >
> > Why not?
> 
> As far as I know, there isn't a CFI directive that allows us to restore
> the CFI state at the end of the inline assembly.  If we say that LR is
> stored in a different register than what the rest of the function uses,
> that would lead to incorrect CFI after the exit of the inline assembler
> fragment.
> 
> At least that's what I think.  Compilers aren't really my thing.

.cfi_restore?  Or .cfi_remember_state / .cfi_restore_state, that is
probably easiest in inline assembler.

> >> > GCC does not model the condition registers,
> >
> > Huh?  It does model the condition register, as 8 registers in GCC's
> > internal model (one each for CR0..CR7).
> 
> But GCC doesn't expose them as integers to C code, so you can't do much
> without them.

Sure, it doesn't expose any other registers directly, either.

> >> > We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
> >> > provide an ELFv1 port for this given that it's POWER9-specific.
> >
> > We *do* have a 32-bit LE ABI.  And ELFv1 is not 32-bit either.  Please
> > don't confuse these things :-)
> >
> > The 64-bit LE kernel does not really support 32-bit userland (or BE
> > userland), *that* is what you want to say.
> 
> Sorry for the confusion.  Is POWER9 running kernels which are not 64-bit
> LE really a thing in practice, though?

Linux only really supports 64-bit LE userland on p9.  Anything else is
not supported.

> >> > From the glibc perspective, the major question is how we handle run-time
> >> > selection of the system call instruction sequence.
> >
> > Well, if it is inlined you don't have this problem either!  :-)
> 
> How so?  We would have to put the conditional sequence into all inline
> system calls, of course.

Ah, if you support older systems in your program as well, gotcha.  That
is not the usual case (just like people use -mcpu=power9 frequently,
which means the resulting program will not run on any older CPU).


Segher

* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-28 13:09 ` Florian Weimer
  2020-01-28 14:05   ` Nicholas Piggin
@ 2020-01-28 22:14   ` Joseph Myers
  1 sibling, 0 replies; 26+ messages in thread
From: Joseph Myers @ 2020-01-28 22:14 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, Tulio Magno Quites Machado Filho, linuxppc-dev,
	Nicholas Piggin

On Tue, 28 Jan 2020, Florian Weimer wrote:

> I don't think we can save LR in a regular register around the system
> call, explicitly in the inline asm statement, because we still have to
> generate proper unwinding information using CFI directives, something
> that you cannot do from within the asm statement.

What other architectures in glibc have done for syscall code sequences
that are problematic for compiler-generated CFI is to make the C
syscall macros call separate functions defined in a .S file (see
sysdeps/unix/sysv/linux/arm/libc-do-syscall.S, 
sysdeps/unix/sysv/linux/i386/libc-do-syscall.S, 
sysdeps/unix/sysv/linux/mips/mips32/mips-syscall[567].S).  I don't know if 
you can do that in this case and still get the performance benefits of the 
new instruction.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-28 15:58     ` Florian Weimer
@ 2020-01-29  4:41       ` Nicholas Piggin
  0 siblings, 0 replies; 26+ messages in thread
From: Nicholas Piggin @ 2020-01-29  4:41 UTC (permalink / raw)
  To: Florian Weimer; +Cc: linuxppc-dev, Tulio Magno Quites Machado Filho, libc-alpha

Florian Weimer's on January 29, 2020 1:58 am:
> * Nicholas Piggin:
> 
>> That gets the LR save/restore right when we're also using r0.
> 
> Yes, I agree it looks good.  Nice.
> 
>>> But the kernel uses the -errno convention internally, so I think it
>>> would make sense to pass this to userspace and not convert back and
>>> forth.  This would align with what most of the architectures do, and
>>> also avoids the GCC oddity.
>>
>> Yes I would be interested in opinions for this option. It seems like
>> matching other architectures is a good idea. Maybe there are some
>> reasons not to.
> 
> If there were a POWER-specific system call that uses all result bits and
> doesn't have room for the 4096 error states (or an error number that's
> outside that range), that would be a blocker.  I can't find such a
> system call wrapped in the glibc sources.

Nothing apparent in the kernel sources either.

> musl's inline syscalls always
> convert the errno state to -errno, so it's not possible to use such a
> system call there.
> 
>>>> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
>>>>   calls if there was interest in developing an ABI for 32-bit programs.
>>>>   Marginal benefit in avoiding compat syscall selection.
>>> 
>>> We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
>>> provide an ELFv1 port for this given that it's POWER9-specific.
>>
>> Okay. There's no reason not to enable this for BE, at least for the
>> kernel it's no additional work so it probably remains enabled (unless
>> there is something really good we could do with the ABI if we exclude
>> ELFv1 but I don't see anything).
>>
>> But if glibc only builds for ELFv2 support that's probably reasonable.
> 
> To be clear, we still support ELFv1 for POWER, but given that this
> feature is POWER9 and later, I expect the number of users benefiting
> from 32-bit support (or ELFv1 and thus big-endian support) to be quite
> small.
> 
> Especially if we go the conditional branch route, I would restrict this
> to ELFv2 in glibc, at least for default builds.
> 
>>> From the glibc perspective, the major question is how we handle run-time
>>> selection of the system call instruction sequence.  On i386, we use a
>>> function pointer in the TCB to call an instruction sequence in the vDSO.
>>> That's problematic from a security perspective.  I expect that on
>>> POWER9, using a pointer in read-only memory would be equally
>>> non-attractive due to a similar lack of PC-relative addressing.  We
>>> could use the HWCAP bit in the TCB, but that would add another (easy to
>>> predict) conditional branch to every system call.
>>
>> I would have to defer to glibc devs on this. Conditional branch
>> should be acceptable I think, scv improves speed as much as several
>> mispredicted branches (about 90 cycles).
> 
> But we'd have to pay for that branch (and likely the LR clobber) on
> legacy POWER, and that's something to consider, too.

We would, that's true.

> Is there an additional performance hit if a process uses both the old
> and new system call sequence?

No state or logic required to switch between them or run them
concurrently. Just the extra instruction footprint.

Thanks,
Nick


* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-28 17:26     ` Adhemerval Zanella
@ 2020-01-29  4:58       ` Nicholas Piggin
  2020-01-29 13:20         ` Segher Boessenkool
  2020-01-29 15:51         ` Tulio Magno Quites Machado Filho
  0 siblings, 2 replies; 26+ messages in thread
From: Nicholas Piggin @ 2020-01-29  4:58 UTC (permalink / raw)
  To: Adhemerval Zanella, linuxppc-dev

Adhemerval Zanella's on January 29, 2020 3:26 am:
> 
> 
> On 28/01/2020 11:05, Nicholas Piggin wrote:
>> Florian Weimer's on January 28, 2020 11:09 pm:
>>> * Nicholas Piggin:
>>>
>>>> * Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all other
>>>>   vectors will return -ENOSYS, and the decision for how to add support for
>>>>   a new vector deferred until we see the next user.
>>>
>>> Seems reasonable.  We don't have to decide this today.
>>>
>>>> * Proposal is for scv 0 to provide the standard Linux system call ABI with some
>>>>   differences:
>>>>
>>>> - LR is volatile across scv calls. This is necessary for support because the
>>>>   scv instruction clobbers LR.
>>>
>>> I think we can express this in the glibc system call assembler wrapper
>>> generators.  The mcount profiling wrappers already have this property.
>>>
>>> But I don't think we are so lucky for the inline system calls.  GCC
>>> recognizes an "lr" clobber with inline asm (even though it is not
>>> documented), but it generates rather strange assembler output as a
>>> result:
>>>
>>> long
>>> f (long x)
>>> {
>>>   long y;
>>>   asm ("#" : "=r" (y) : "r" (x) : "lr");
>>>   return y;
>>> }
>>>
>>> 	.abiversion 2
>>> 	.section	".text"
>>> 	.align 2
>>> 	.p2align 4,,15
>>> 	.globl f
>>> 	.type	f, @function
>>> f:
>>> .LFB0:
>>> 	.cfi_startproc
>>> 	mflr 0
>>> 	.cfi_register 65, 0
>>> #APP
>>>  # 5 "t.c" 1
>>> 	#
>>>  # 0 "" 2
>>> #NO_APP
>>> 	std 0,16(1)
>>> 	.cfi_offset 65, 16
>>> 	ori 2,2,0
>>> 	ld 0,16(1)
>>> 	mtlr 0
>>> 	.cfi_restore 65
>>> 	blr
>>> 	.long 0
>>> 	.byte 0,0,0,1,0,0,0,0
>>> 	.cfi_endproc
>>> .LFE0:
>>> 	.size	f,.-f
>>>
>>>
>>> That's with GCC 8.3 at -O2.  I don't understand what the ori is about.
>> 
>> ori 2,2,0 is the group terminating nop hint for POWER8 type cores
>> which had dispatch grouping rules.
> 
> It is worth noting that it aims to mitigate a load-hit-store cpu stall
> on some powerpc chips.
> 
>> 
>>>
>>> I don't think we can save LR in a regular register around the system
>>> call, explicitly in the inline asm statement, because we still have to
>>> generate proper unwinding information using CFI directives, something
>>> that you cannot do from within the asm statement.
>>>
>>> Supporting this in GCC should not be impossible, but someone who
>>> actually knows this stuff needs to look at it.
>> 
>> The generated assembler actually seems okay to me. If we compile
>> something like a syscall and with -mcpu=power9:
>> 
>> long
>> f (long _r3, long _r4, long _r5, long _r6, long _r7, long _r8, long _r0)
>> {
>>   register long r0 asm ("r0") = _r0;
>>   register long r3 asm ("r3") = _r3;
>>   register long r4 asm ("r4") = _r4;
>>   register long r5 asm ("r5") = _r5;
>>   register long r6 asm ("r6") = _r6;
>>   register long r7 asm ("r7") = _r7;
>>   register long r8 asm ("r8") = _r8;
>> 
>>   asm ("# scv" : "=r"(r3) : "r"(r0), "r"(r4), "r"(r5), "r"(r6), "r"(r7), "r"(r8) : "lr", "ctr", "cc", "xer");
>> 
>>   return r3;
>> }
>> 
>> 
>> f:
>> .LFB0:
>>         .cfi_startproc
>>         mflr 0
>>         std 0,16(1)
>>         .cfi_offset 65, 16
>>         mr 0,9
>> #APP
>>  # 12 "a.c" 1
>>         # scv
>>  # 0 "" 2
>> #NO_APP
>>         ld 0,16(1)
>>         mtlr 0
>>         .cfi_restore 65
>>         blr
>>         .long 0
>>         .byte 0,0,0,1,0,0,0,0
>>         .cfi_endproc
>> 
>> That gets the LR save/restore right when we're also using r0.
>> 
>>>
>>>> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
>>>>   system call exit to avoid restoring the CR register.
>>>
>>> This sounds reasonable, but I don't know what kind of knock-on effects
>>> this has.  The inline system call wrappers can handle this with minor
>>> tweaks.
>> 
>> Okay, good. In the end we would have to check code trace through the
>> kernel and libc of course, but I think there's little to no opportunity
>> to take advantage of current extra non-volatile cr regs.
>> 
>> mtcr has to write 8 independently renamed registers so it's cracked into
>> 2 insns on POWER9 (and likely to always be a bit troublesome). It's not
>> much in the scheme of a system call, but while we can tweak the ABI...
> 
> We don't really need a mfcr/mfocr to implement the Linux syscall ABI on
> powerpc, we can use a 'bns+' plus a neg instead as:
> 
> --
> #define internal_syscall6(name, err, nr, arg1, arg2, arg3, arg4, arg5,  \
>                           arg6)                                         \
>   ({                                                                    \
>     register long int r0  __asm__ ("r0") = (long int) (name);           \
>     register long int r3  __asm__ ("r3") = (long int) (arg1);           \
>     register long int r4  __asm__ ("r4") = (long int) (arg2);           \
>     register long int r5  __asm__ ("r5") = (long int) (arg3);           \
>     register long int r6  __asm__ ("r6") = (long int) (arg4);           \
>     register long int r7  __asm__ ("r7") = (long int) (arg5);           \
>     register long int r8  __asm__ ("r8") = (long int) (arg6);           \
>     __asm__ __volatile__                                                \
>       ("sc\n\t"                                                         \
>        "bns+ 1f\n\t"                                                    \
>        "neg %1, %1\n\t"                                                 \
>        "1:\n\t"                                                         \
>        : "+r" (r0), "+r" (r3), "+r" (r4), "+r" (r5), "+r" (r6),         \
>          "+r" (r7), "+r" (r8)                                           \
>        :                                                                \
>        : "r9", "r10", "r11", "r12",                                     \
>          "cr0", "memory");                                              \
>     r3;                                                                 \
>   })

True, but the taken branch would be a 1 cycle bubble in fetch. Could 
avoid that by branching out of line then back for the error case. But
mfocrf is fine (only sources one register), that's what should be used
here I think.

That probably makes the performance argument for avoiding CR[SO] for
error return indication less significant. Commonality with other
architectures is probably the bigger reason for it.

> --
> 
> And change INTERNAL_SYSCALL_ERROR_P to check for the expected invalid
> range (((unsigned long) (val) >= (unsigned long) -4095)) and 
> INTERNAL_SYSCALL_ERRNO to return a negative value (since the value will
> be negated by INTERNAL_SYSCALL_ERROR_P).
> 
> Having the powerpc kernel ABI use a different mechanism to signal errors
> also requires glibc to reimplement the vDSO symbol call in an
> arch-specific way instead of as a straight function call (since it might
> fall back to a syscall).
> 
> Even for a POWER-specific system call that uses all result bits, either
> it should not fail or it would require an arch-specific implementation
> to set up the expected error value (since the information would require
> another source or a pre-defined value).
> 
> In fact I think we could make the assumption that INTERNAL_SYSCALL returns
> a negative errno value in case of an error and make all the handling
> that checks for a syscall failure and sets errno generic. This would
> require changing ia64, mips, nios2, and sparc though.
> 
>> 
>>>
>>>> - Error handling: use of CR0[SO] to indicate error requires a mtcr / mtocr
>>>>   instruction on the kernel side, and it is currently not implemented well
>>>>   in glibc, requiring a mfcr (mfocr should be possible and asm goto support
>>>>   would allow a better implementation). Is it worth continuing this style of
>>>>   error handling? Or just move to -ve return means error? Using a different
>>>>   bit would allow the kernel to piggy back the CR return code setting with
>>>>   a test for the error case exit.
>>>
>>> GCC does not model the condition registers, so for inline system calls,
>>> we have to produce a value anyway that the subsequent C code can check.
>>> The assembler syscall wrappers do not need to do this, of course, but
>>> I'm not sure which category of interfaces is more important.
>> 
>> Right. asm goto can improve this kind of pattern if it's inlined
>> into the C code which tests the result, it can branch using the flags
>> to the C error handling label, rather than move flags into GPR, test
>> GPR, branch. However...
>> 
>>> But the kernel uses the -errno convention internally, so I think it
>>> would make sense to pass this to userspace and not convert back and
>>> forth.  This would align with what most of the architectures do, and
>>> also avoids the GCC oddity.
>> 
>> Yes I would be interested in opinions for this option. It seems like
>> matching other architectures is a good idea. Maybe there are some
>> reasons not to.
>> 
>>>> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
>>>>   calls if there was interest in developing an ABI for 32-bit programs.
>>>>   Marginal benefit in avoiding compat syscall selection.
>>>
>>> We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
>>> provide an ELFv1 port for this given that it's POWER9-specific.
>> 
>> Okay. There's no reason not to enable this for BE, at least for the
>> kernel it's no additional work so it probably remains enabled (unless
>> there is something really good we could do with the ABI if we exclude
>> ELFv1 but I don't see anything).
>> 
>> But if glibc only builds for ELFv2 support that's probably reasonable.
>> 
>>>
>>> From the glibc perspective, the major question is how we handle run-time
>>> selection of the system call instruction sequence.  On i386, we use a
>>> function pointer in the TCB to call an instruction sequence in the vDSO.
>>> That's problematic from a security perspective.  I expect that on
>>> POWER9, using a pointer in read-only memory would be equally
>>> non-attractive due to a similar lack of PC-relative addressing.  We
>>> could use the HWCAP bit in the TCB, but that would add another (easy to
>>> predict) conditional branch to every system call.
>> 
>> I would have to defer to glibc devs on this. Conditional branch
>> should be acceptable I think, scv improves speed as much as several
>> mispredicted branches (about 90 cycles).
>> 
>>> I don't think it matters whether both system call variants use the same
>>> error convention because we could have different error code extraction
>>> code on the two branches.
>> 
>> That's one less difficulty.
> 
> We already had to push a similar hack where glibc used to abort transactions
> prior to syscalls to avoid some side effects in the kernel (commit 56cf2763819d2f).
> It was eventually removed from syscall handling by f0458cf4f9ff3d870, where
> we only enable TLE if the kernel supports PPC_FEATURE2_HTM_NOSC.
> 
> The transaction syscall abort used to read a variable directly from the TCB,
> so this could be an option. I would expect that we could optimize it so that,
> if glibc is built against a recent kernel and the compiler is targeting
> an ISA 3.0+ CPU, we could remove the 'sc' code.
> 

We would just have to be careful of running on ISA 3.0 CPUs on older
kernels which do not support scv.

I don't think this is such a big deal, yes we'll have a switch for a
time and it will be slightly slower, but we can eventually remove it.
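
To make the caveat about ISA 3.0 CPUs on older kernels concrete, here is
a small C sketch of the check. The macro names HWCAP2_ARCH_3_00 and
HWCAP2_SCV are local to the sketch; their values are assumed to mirror
PPC_FEATURE2_ARCH_3_00 and PPC_FEATURE2_SCV from the kernel headers and
should be checked against them. The function name scv_usable is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local names for this sketch; values assumed to match the kernel's
   PPC_FEATURE2_ARCH_3_00 and PPC_FEATURE2_SCV definitions.  */
#define HWCAP2_ARCH_3_00 0x00800000UL
#define HWCAP2_SCV       0x00100000UL

static_assert ((HWCAP2_ARCH_3_00 & HWCAP2_SCV) == 0,
               "the two capabilities must be separate bits");

/* Decide from the AT_HWCAP2 word whether 'scv 0' may be used.  The ISA
   level alone is not enough: a POWER9 running an older kernel reports
   ARCH_3_00 without SCV, and scv would then trap.  */
static inline bool
scv_usable (unsigned long int hwcap2)
{
  return (hwcap2 & HWCAP2_SCV) != 0;
}
```

In a real program the argument would come from getauxval (AT_HWCAP2) in
<sys/auxv.h>; it is taken as a parameter here so the logic can be
exercised on any machine.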

Thanks,
Nick

* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-29  4:58       ` Nicholas Piggin
@ 2020-01-29 13:20         ` Segher Boessenkool
  2020-01-29 15:51         ` Tulio Magno Quites Machado Filho
  1 sibling, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2020-01-29 13:20 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Adhemerval Zanella

On Wed, Jan 29, 2020 at 02:58:44PM +1000, Nicholas Piggin wrote:
> Adhemerval Zanella's on January 29, 2020 3:26 am:
> >     __asm__ __volatile__                                                \
> >       ("sc\n\t"                                                         \
> >        "bns+ 1f\n\t"                                                    \
> >        "neg %1, %1\n\t"                                                 \
> >        "1:\n\t"                                                         \

> True, but the taken branch would be a 1 cycle bubble in fetch. Could 
> avoid that by branching out of line then back for the error case. But
> mfocrf is fine (only sources one register), that's what should be used
> here I think.

        neg %9,%1 ; isel %1,%9,%1,so
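
For readers less familiar with isel, the two-instruction sequence above
can be modeled in C roughly as follows. This is only a model: C cannot
read CR0[SO], so the flag is passed in as a parameter, and the function
name fold_error_select is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* C model of "neg %9,%1 ; isel %1,%9,%1,so".  R3 is the raw result
   register and SO stands in for CR0[SO] as set by the kernel on the
   'sc' path (set means R3 holds a positive errno).  */
static inline long int
fold_error_select (long int r3, bool so)
{
  /* Under the existing convention an error code is in 1..4095.  */
  assert (!so || (r3 > 0 && r3 <= 4095));

  /* The hardware computes the negation unconditionally and isel then
     picks one of the two values without a branch; the conditional
     expression below yields the same result.  */
  return so ? -r3 : r3;
}
```

The result follows the -errno convention without a taken branch on
either the success or the error path.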

> That probably makes the performance argument for avoiding CR[SO] for
> error return indication less significant. Commonality with other
> architectures is probably the bigger reason for it.

Yes, and to have the syscall calling convention closer to the normal
function calling convention would be good, too.


Segher

* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-29  4:58       ` Nicholas Piggin
  2020-01-29 13:20         ` Segher Boessenkool
@ 2020-01-29 15:51         ` Tulio Magno Quites Machado Filho
  2020-02-19 11:03           ` Nicholas Piggin
  1 sibling, 1 reply; 26+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2020-01-29 15:51 UTC (permalink / raw)
  To: Nicholas Piggin, Adhemerval Zanella, linuxppc-dev,
	Libc-alpha Mailing List

Nicholas Piggin <npiggin@gmail.com> writes:

> Adhemerval Zanella's on January 29, 2020 3:26 am:
>> 
>> We already had to push a similar hack where glibc used to abort transactions
>> prior to syscalls to avoid some side effects in the kernel (commit 56cf2763819d2f).
>> It was eventually removed from syscall handling by f0458cf4f9ff3d870, where
>> we only enable TLE if the kernel supports PPC_FEATURE2_HTM_NOSC.
>> 
>> The transaction syscall abort used to read a variable directly from the TCB,
>> so this could be an option. I would expect that we could optimize it so that,
>> if glibc is built against a recent kernel and the compiler is targeting
>> an ISA 3.0+ CPU, we could remove the 'sc' code.
>> 
>
> We would just have to be careful of running on ISA 3.0 CPUs on older
> kernels which do not support scv.

Can we assume that, if a syscall is available through sc it's also available
in scv 0?

Because if that's true, I believe your suggestion to interpret PPC_FEATURE2_SCV
as scv 0 support would be helpful to provide this support via IFUNC even
when glibc is built using --with-cpu=power8, which is the most common scenario
in ppc64le.

In that scenario, it seems new HWCAP bits for new vectors wouldn't be too
frequent, which was the only downside of this proposal.
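
A rough C sketch of the one-time selection this would allow is shown
below. It uses a plain function pointer as a stand-in for a real IFUNC
resolver, and it assumes the answer to the question above is yes, so
that either path can serve any system call. All names (HWCAP2_SCV,
path_sc, path_scv0, resolve_syscall_path) are hypothetical, and the two
path functions only return a tag instead of entering the kernel.

```c
#include <assert.h>

/* Local name for this sketch; value assumed to mirror PPC_FEATURE2_SCV.  */
#define HWCAP2_SCV 0x00100000UL

typedef long int (*syscall_path_fn) (long int nr);

enum { PATH_SC = 0, PATH_SCV0 = 1 };

/* Stand-in for the legacy 'sc' sequence.  */
static long int
path_sc (long int nr)
{
  assert (nr >= 0);          /* system call numbers are non-negative */
  return PATH_SC;
}

/* Stand-in for the 'scv 0' sequence.  */
static long int
path_scv0 (long int nr)
{
  assert (nr >= 0);
  return PATH_SCV0;
}

/* Resolver-style selection: choose the path once from the HWCAP2 word,
   so callers pay no conditional branch per system call.  */
static syscall_path_fn
resolve_syscall_path (unsigned long int hwcap2)
{
  return (hwcap2 & HWCAP2_SCV) != 0 ? path_scv0 : path_sc;
}
```

The trade-off is an indirect call in place of the conditional branch,
which only applies to out-of-line wrappers, not to inlined system calls.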

-- 
Tulio Magno

* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-28 20:01         ` Segher Boessenkool
@ 2020-01-29 16:19           ` Florian Weimer
  2020-01-29 16:29             ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2020-01-29 16:19 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: libc-alpha, Tulio Magno Quites Machado Filho, linuxppc-dev,
	Nicholas Piggin

* Segher Boessenkool:

> On Tue, Jan 28, 2020 at 05:04:49PM +0100, Florian Weimer wrote:
>> * Segher Boessenkool:
>> 
>> >> > I don't think we can save LR in a regular register around the system
>> >> > call, explicitly in the inline asm statement, because we still have to
>> >> > generate proper unwinding information using CFI directives, something
>> >> > that you cannot do from within the asm statement.
>> >
>> > Why not?
>> 
>> As far as I know, there isn't a CFI directive that allows us to restore
>> the CFI state at the end of the inline assembly.  If we say that LR is
>> stored in a different register than what the rest of the function uses,
>> that would lead to incorrect CFI after the exit of the inline assembler
>> fragment.
>> 
>> At least that's what I think.  Compilers aren't really my thing.
>
> .cfi_restore?  Or .cfi_remember_state / .cfi_restore_state, that is
> probably easiest in inline assembler.

Oh, right, .cfi_remember_state and .cfi_restore_state should work, as
long as -fno-dwarf2-cfi-asm isn't used (but that's okay given that we
need this only for the glibc build).

But it looks like we can use an explicit "lr" clobber, so we should be
good anyway.

>> >> > GCC does not model the condition registers,
>> >
>> > Huh?  It does model the condition register, as 8 registers in GCC's
>> > internal model (one each for CR0..CR7).
>> 
>> But GCC doesn't expose them as integers to C code, so you can't do much
>> without them.
>
> Sure, it doesn't expose any other registers directly, either.

I can use r0 & 1 with a register variable r0 to check a bit.  I don't
think writing a similar check against a condition register is possible
today.

Thanks,
Florian


* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-29 16:19           ` Florian Weimer
@ 2020-01-29 16:29             ` Segher Boessenkool
  2020-01-29 17:02               ` Florian Weimer
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2020-01-29 16:29 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, Tulio Magno Quites Machado Filho, linuxppc-dev,
	Nicholas Piggin

On Wed, Jan 29, 2020 at 05:19:19PM +0100, Florian Weimer wrote:
> * Segher Boessenkool:
> >> But GCC doesn't expose them as integers to C code, so you can't do much
> >> without them.
> >
> > Sure, it doesn't expose any other registers directly, either.
> 
> I can use r0 & 1 with a register variable r0 to check a bit.

That is not reliable, or supported, and it *will* break.  This is
explicit for local register asm, and global register asm is
underdefined.

> I don't
> think writing a similar check against a condition register is possible
> today.

That's right.  You cannot express CCmode (MODE_CC) values in C at all.


Segher

* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-29 16:29             ` Segher Boessenkool
@ 2020-01-29 17:02               ` Florian Weimer
  2020-01-29 17:51                 ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2020-01-29 17:02 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: libc-alpha, Tulio Magno Quites Machado Filho, linuxppc-dev,
	Nicholas Piggin

* Segher Boessenkool:

> On Wed, Jan 29, 2020 at 05:19:19PM +0100, Florian Weimer wrote:
>> * Segher Boessenkool:
>> >> But GCC doesn't expose them as integers to C code, so you can't do much
>> >> without them.
>> >
>> > Sure, it doesn't expose any other registers directly, either.
>> 
>> I can use r0 & 1 with a register variable r0 to check a bit.
>
> That is not reliable, or supported, and it *will* break.  This is
> explicit for local register asm, and global register asm is
> underdefined.

Ugh.  I did not know that.  And neither did the person who wrote
powerpc64/sysdep.h because it uses register variables in regular C
expressions. 8-(  Other architectures are affected as well.

One set of issues is less of a problem if system call arguments are
variables and not complex expressions, so that side effects do not
clobber registers in the initialization:

	register long __a0 asm ("$4") = (long) (arg1);

But I wasn't aware of that constraint on the macro users at all.
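As an illustration of that constraint, here is a minimal sketch of the
"evaluate first, then load the registers" ordering.  It assumes an x86-64
target with GCC or Clang (other targets take a plain C fallback so the
snippet still builds); first_arg, second_arg and add_in_registers are
made-up names, not glibc macros.

```c
/* Hypothetical helpers standing in for arbitrary argument expressions
   that might themselves involve function calls.  */
static long first_arg (void) { return 40; }
static long second_arg (void) { return 2; }

long
add_in_registers (void)
{
  /* Evaluate the argument expressions into ordinary temporaries first,
     so no call can happen between the register variable
     initializations and the asm that consumes them.  */
  long t0 = first_arg ();
  long t1 = second_arg ();
  long sum;
#if defined (__x86_64__)
  /* The register variables are only used as asm operands.  */
  register long a0 __asm__ ("rdi") = t0;
  register long a1 __asm__ ("rsi") = t1;
  __asm__ ("lea (%1,%2), %0" : "=r" (sum) : "r" (a0), "r" (a1));
#else
  /* Assumed fallback for other targets: same result in plain C.  */
  sum = t0 + t1;
#endif
  return sum;
}
```

Writing a0 = first_arg () and a1 = second_arg () directly would leave a
call between the first initialization and the asm, which is the case the
GCC manual warns about.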

Thanks,
Florian



* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-29 17:02               ` Florian Weimer
@ 2020-01-29 17:51                 ` Segher Boessenkool
  2020-01-30 10:42                   ` Florian Weimer
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2020-01-29 17:51 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, Tulio Magno Quites Machado Filho, linuxppc-dev,
	Nicholas Piggin

On Wed, Jan 29, 2020 at 06:02:34PM +0100, Florian Weimer wrote:
> * Segher Boessenkool:
> 
> > On Wed, Jan 29, 2020 at 05:19:19PM +0100, Florian Weimer wrote:
> >> * Segher Boessenkool:
> >> >> But GCC doesn't expose them as integers to C code, so you can't do much
> >> >> without them.
> >> >
> >> > Sure, it doesn't expose any other registers directly, either.
> >> 
> >> I can use r0 & 1 with a register variable r0 to check a bit.
> >
> > That is not reliable, or supported, and it *will* break.  This is
> > explicit for local register asm, and global register asm is
> > underdefined.
> 
> Ugh.  I did not know that.  And neither did the person who wrote
> powerpc64/sysdep.h because it uses register variables in regular C
> expressions. 8-(  Other architectures are affected as well.

Where?  I don't see any?  Ah, the other one, heh (there are two).

No, that *is* supported: as input to or output from an asm, a local
register asm variable *is* guaranteed to live in the specified register.
This is the *only* supported use.  Other uses may sometimes still work,
but they never worked reliably, and it cannot be made reliable; it has
been documented as not supported since ages, and it will not work at all
anymore some day.


Segher


* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-29 17:51                 ` Segher Boessenkool
@ 2020-01-30 10:42                   ` Florian Weimer
  2020-01-30 11:25                     ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2020-01-30 10:42 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: libc-alpha, Tulio Magno Quites Machado Filho, linuxppc-dev,
	Nicholas Piggin

* Segher Boessenkool:

> On Wed, Jan 29, 2020 at 06:02:34PM +0100, Florian Weimer wrote:
>> * Segher Boessenkool:
>> 
>> > On Wed, Jan 29, 2020 at 05:19:19PM +0100, Florian Weimer wrote:
>> >> * Segher Boessenkool:
>> >> >> But GCC doesn't expose them as integers to C code, so you can't do much
>> >> >> without them.
>> >> >
>> >> > Sure, it doesn't expose any other registers directly, either.
>> >> 
>> >> I can use r0 & 1 with a register variable r0 to check a bit.
>> >
>> > That is not reliable, or supported, and it *will* break.  This is
>> > explicit for local register asm, and global register asm is
>> > underdefined.
>> 
>> Ugh.  I did not know that.  And neither did the person who wrote
>> powerpc64/sysdep.h because it uses register variables in regular C
>> expressions. 8-(  Other architectures are affected as well.
>
> Where?  I don't see any?  Ah, the other one, heh (there are two).
>
> No, that *is* supported: as input to or output from an asm, a local
> register asm variable *is* guaranteed to live in the specified register.
> This is the *only* supported use.  Other uses may sometimes still work,
> but they never worked reliably, and it cannot be made reliable; it has
> been documented as not supported since ages, and it will not work at all
> anymore some day.

I must say I find this situation *very* confusing.

You said that r0 & 1 is undefined.  I *assumed* that I would still get
the value of r0 (the register) from the associated extended asm in this
expression, even if it may now be a different register.  Your comment
made me think that this is undefined.  But then the syscall wrappers use
this construct:

    __asm__ __volatile__						\
      ("sc\n\t"								\
       "mfcr  %0\n\t"							\
       "0:"								\
       : "=&r" (r0),							\
         "=&r" (r3), "=&r" (r4), "=&r" (r5),				\
         "=&r" (r6), "=&r" (r7), "=&r" (r8)				\
       : ASM_INPUT_##nr							\
       : "r9", "r10", "r11", "r12",					\
         "cr0", "ctr", "memory");					\
	  err = r0;  \
    r3;  \

That lone r3 at the end would be equally undefined because it is not
used in an input or output operand of an extended asm statement.
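For illustration: once the two asm outputs have been copied into
ordinary variables, the error check itself needs no register variables.
A minimal sketch, assuming the mfcr result and the r3 value are already
available as plain integers; the helper names are hypothetical, and the
mask is the CR0.SO bit as it appears in the mfcr result (bit 28 of the
low 32 bits).

```c
#include <stdbool.h>

/* CR0.SO in the value produced by mfcr.  */
#define CR0_SO_MASK (1UL << 28)

/* True if the copied condition register says the sc call failed.  */
static inline bool
sc_failed (unsigned long cr)
{
  return (cr & CR0_SO_MASK) != 0;
}

/* With sc, a failing call leaves a positive errno value in r3.  Fold
   both outputs into one value, negative on error.  */
static inline long
sc_result (unsigned long cr, long r3)
{
  return sc_failed (cr) ? -r3 : r3;
}
```

Both helpers operate on copies only, so they do not depend on where the
compiler keeps r0 or r3 after the asm statement.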

The GCC documentation has this warning:

|  _Warning:_ In the above example, be aware that a register (for
| example 'r0') can be call-clobbered by subsequent code, including
| function calls and library calls for arithmetic operators on other
| variables (for example the initialization of 'p2').

On POWER, the LOADARGS macros attempt to deal with this by using
non-register temporaries.  However, I don't know how effective this is
if the compiler really doesn't deal with call-clobbered registers
properly.

For the extended asm use case (to express register assignments that
cannot be listed in constraints), I would expect that these variables
retain their values according to the C specification (so they are never
clobbered by calls), but that they only reside in their designated
registers when used as input or output operands in extended asm
statements.

However, this is incompatible with other (ab)uses of local and global
register variables, which may well expect that they are clobbered by
calls.  It looks like GCC uses the same construct for two unrelated
things.

Thanks,
Florian



* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-30 10:42                   ` Florian Weimer
@ 2020-01-30 11:25                     ` Segher Boessenkool
  2020-01-30 12:03                       ` Florian Weimer
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2020-01-30 11:25 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, Tulio Magno Quites Machado Filho, linuxppc-dev,
	Nicholas Piggin

On Thu, Jan 30, 2020 at 11:42:51AM +0100, Florian Weimer wrote:
> * Segher Boessenkool:
> > No, that *is* supported: as input to or output from an asm, a local
> > register asm variable *is* guaranteed to live in the specified register.
> > This is the *only* supported use.  Other uses may sometimes still work,
> > but they never worked reliably, and it cannot be made reliable; it has
> > been documented as not supported since ages, and it will not work at all
> > anymore some day.
> 
> I must say I find this situation *very* confusing.

Local register variables live in that register when they are operands to
an (extended) inline asm.  There are no other guarantees.  That is all.

> You said that r0 & 1 is undefined.

I said that in

  int reg asm("r0");
  ...
  ...  reg & 1  ...

in that last expression, reg can be in any register, not necessarily r0.
The code is still perfectly well-defined of course, it just might not do
what you expected.

>  I *assumed* that I would still get
> the value of r0 (the register) from the associated extended asm in this
> expression, even if it may now be a different register.  Your comment
> made me think that this is undefined.

Please show full(er) examples, I think we are talking about something
else?

> But then the syscall wrappers use
> this construct:
> 
>     __asm__ __volatile__						\
>       ("sc\n\t"								\
>        "mfcr  %0\n\t"							\
>        "0:"								\
>        : "=&r" (r0),							\
>          "=&r" (r3), "=&r" (r4), "=&r" (r5),				\
>          "=&r" (r6), "=&r" (r7), "=&r" (r8)				\
>        : ASM_INPUT_##nr							\
>        : "r9", "r10", "r11", "r12",					\
>          "cr0", "ctr", "memory");					\
> 	  err = r0;  \
>     r3;  \
> 
> That lone r3 at the end would be equally undefined because it is not
> used in an input or output operand of an extended asm statement.

Nothing is undefined.  That r3 variable at the end might not live in
register r3 there; but the output from the asm does (the compiler can
have swapped registers around already, or even put this in memory
(which is what will probably happen here at -O0!), etc.).

> The GCC documentation has this warning:
> 
> |  _Warning:_ In the above example, be aware that a register (for
> | example 'r0') can be call-clobbered by subsequent code, including
> | function calls and library calls for arithmetic operators on other
> | variables (for example the initialization of 'p2').

Yes.  This does not matter for the only supported use.  This is why that
*is* the only supported use.  The documentation could use a touch-up, I
think.  Unless we still have problems here?

> On POWER, the LOADARGS macros attempt to deal with this by using
> non-register temporaries.  However, I don't know how effective this is
> if the compiler really doesn't deal with call-clobbered registers
> properly.

It worked on old compilers.  This isn't necessary on newer compilers.

> For the extended asm use case (to express register assignments that
> cannot be listed in constraints), I would expect that these variables
> retain their values according to the C specification (so they are never
> clobbered by calls), but that they only reside in their designated
> registers when used as input or output operands in extended asm
> statements.

That is what is done now.

In the old days (more than ten years ago), local register variables had
more guarantees, guarantees that were broken one by one (or all of the
time, depending on your viewpoint ;-) ).  Such variables *did* live in
the specified register everywhere, well, not *everywhere*, and there
the problems started.

Nowadays:

Local register variables live in that register when they are operands to
an (extended) inline asm.  There are no other guarantees.  That is all.


Segher


* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-30 11:25                     ` Segher Boessenkool
@ 2020-01-30 12:03                       ` Florian Weimer
  2020-01-30 13:50                         ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2020-01-30 12:03 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: libc-alpha, Tulio Magno Quites Machado Filho, linuxppc-dev,
	Nicholas Piggin

* Segher Boessenkool:

>> I *assumed* that I would still get
>> the value of r0 (the register) from the associated extended asm in this
>> expression, even if it may now be a different register.  Your comment
>> made me think that this is undefined.
>
> Please show full(er) examples, I think we are talking about something
> else?

No, I think we are in agreement here how things should behave under the
new model.  But I have doubts whether that is implemented in GCC 9.

>> The GCC documentation has this warning:
>> 
>> |  _Warning:_ In the above example, be aware that a register (for
>> | example 'r0') can be call-clobbered by subsequent code, including
>> | function calls and library calls for arithmetic operators on other
>> | variables (for example the initialization of 'p2').
>
> Yes.  This does not matter for the only supported use.

I'm not so sure.  See below.

> This is why that *is* the only supported use.  The documentation could
> use a touch-up, I think.  Unless we still have problems here?

I really don't know.  GCC still has *some* support for the old behavior,
though.  For example, local register variables are treated as
initialized, and I think you can still use registers like global
variables.  GCC does not perform copy propagation here:

int f1 (int);

int
f (void)
{
  register int edi __asm__ ("edi");
  int edi_copy = edi;
  return f1 (1) + edi_copy;
}

And the case that we agreed should be defined in fact is not:

void f1 (int);

int
f (void)
{
  register int edi __asm__ ("edi");
  asm ("#" : "=r" (edi));
  f1 (1);
  return edi;
}

On x86-64, %edi is used to pass the first function parameter, so the
call clobbers %edi.  It is simply ambiguous whether edi (the variable)
should retain the value prior to the call to f1 (which I think is what
should happen under the new model, where register variables only
affect asm operands), or if edi (the variable) should have the value of
%edi (the register) after the call (the old model).
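One way to sidestep the question, sketched under the assumption of an
x86 target with GCC or Clang: copy the asm output into an ordinary
variable immediately after the asm statement and use only the copy
afterwards.  The constant 42, the static f1 stub and the name f_fixed
are made up for the example; other targets take a plain C fallback so
the snippet still builds.

```c
/* Stand-in for the external f1.  On x86-64 the call passes its
   argument in %edi, so it clobbers that register.  */
__attribute__ ((noinline)) static void
f1 (int x)
{
  (void) x;
}

int
f_fixed (void)
{
#if defined (__x86_64__) || defined (__i386__)
  register int edi __asm__ ("edi");
  /* The asm output operand is placed in %edi here.  */
  __asm__ volatile ("movl $42, %0" : "=r" (edi));
  /* Copy it out right away; the copy is an ordinary C variable.  */
  int saved = edi;
#else
  /* Assumed fallback for other targets.  */
  int saved = 42;
#endif
  f1 (1);
  return saved;
}
```

The copy keeps its value across the call under either reading of what
the register variable itself should hold afterwards.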

Should we move this to the gcc list?

Thanks,
Florian



* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-30 12:03                       ` Florian Weimer
@ 2020-01-30 13:50                         ` Segher Boessenkool
  2020-01-30 17:04                           ` Adhemerval Zanella
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2020-01-30 13:50 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, Tulio Magno Quites Machado Filho, linuxppc-dev,
	Nicholas Piggin

Hi again,

On Thu, Jan 30, 2020 at 01:03:53PM +0100, Florian Weimer wrote:
> > This is why that *is* the only supported use.  The documentation could
> > use a touch-up, I think.  Unless we still have problems here?
> 
> I really don't know.  GCC still has *some* support for the old behavior,
> though.

No.  No support.  It still does some of the same things, but that can
change (and probably should).  But this hasn't been supported since the
dark ages, and the documentation has become gradually more explicit
about it.

> For example, local register variables are treated as
> initialized, and I think you can still use registers like global
> variables.  GCC does not perform copy propagation here:
> 
> int f1 (int);
> 
> int
> f (void)
> {
>   register int edi __asm__ ("edi");
>   int edi_copy = edi;
>   return f1 (1) + edi_copy;
> }

f:
        pushl   %edi
        subl    $20, %esp
        pushl   $1
        call    f1
        addl    %edi, %eax
        addl    $24, %esp
        popl    %edi
        ret

It takes the edi value *after* the call.  The behaviour is undefined,
so that is not a problem.  (This is a GCC 10 from September, fwiw).

> And the case that we agreed should be defined in fact is not:
> 
> void f1 (int);
> 
> int
> f (void)
> {
>   register int edi __asm__ ("edi");
>   asm ("#" : "=r" (edi));
>   f1 (1);
>   return edi;
> }

f:
        pushl   %edi
        subl    $20, %esp
#APP
        #
#NO_APP
        pushl   $1
        call    f1
        movl    %edi, %eax
        addl    $24, %esp
        popl    %edi
        ret

Changing it to "# %0" (so that we can see what we are doing) gives

#APP
        # %edi
#NO_APP

All as expected.

> On x86-64,

Oh, this was i386, since you used edi.  On x86-64:

f:
        pushq   %rbx
        movl    %edi, %ebx
        movl    $1, %edi
        call    f1
        addl    %ebx, %eax
        popq    %rbx
        ret

for that first testcase, taking edi before the call, which is not
*guaranteed* to happen, but still can; and

f:
        subq    $8, %rsp
        movl    $1, %edi
        call    f1
        addq    $8, %rsp
        movl    %edi, %eax
        ret

The asm was right before that "mov $1,%edi", so it was optimised away:
it is not a volatile asm, and its output is unused.  Making the asm
statement volatile gives

f:
        subq    $8, %rsp
#APP
        # %edi
#NO_APP
        movl    $1, %edi
        call    f1
        addq    $8, %rsp
        movl    %edi, %eax
        ret

> %edi is used to pass the first function parameter, so the
> call clobbers %edi.  It is simply ambiguous whether edi (the variable)
> should retain the value prior to the call to f1 (which I think is what
> should happen under the new model, where register variables only
> affect asm operands), or if edi (the variable) should have the value of
> %edi (the register) after the call (the old model).

There is nothing ambiguous there, afaics?  Other than the edi value you
use in the asm is coming out of thin air (but it will always work with
current GCC; that's not really specified though).

> Should we move this to the gcc list?

Whoops, I thought that was on Cc:.  Sure.


Segher


* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-30 13:50                         ` Segher Boessenkool
@ 2020-01-30 17:04                           ` Adhemerval Zanella
  2020-01-30 21:41                             ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Adhemerval Zanella @ 2020-01-30 17:04 UTC (permalink / raw)
  To: Segher Boessenkool, Florian Weimer
  Cc: libc-alpha, Tulio Magno Quites Machado Filho, linuxppc-dev,
	Nicholas Piggin



On 30/01/2020 10:50, Segher Boessenkool wrote:
> Hi again,
> 
> On Thu, Jan 30, 2020 at 01:03:53PM +0100, Florian Weimer wrote:
>>> This is why that *is* the only supported use.  The documentation could
>>> use a touch-up, I think.  Unless we still have problems here?
>>
>> I really don't know.  GCC still has *some* support for the old behavior,
>> though.
> 
> No.  No support.  It still does some of the same things, but that can
> change (and probably should).  But this hasn't been supported since the
> dark ages, and the documentation has become gradually more explicit
> about it.
> 

I think this might be related to an odd sparc32 issue I am seeing with 
newer clock_nanosleep.  The expanded code is:

--
  register long err __asm__("g1");                                   // INTERNAL_SYSCALL_DECL  (err)
  r = ({                                                             // r = INTERNAL_SYSCALL_CANCEL (...)
	 long int sc_ret;
         if (SINGLE_THREAD_P)
	   sc_ret = INTERNAL_SYSCALL_CALL (__VA_ARGS__);
         else
           {
	     int sc_cancel_oldtype = __libc_enable_asynccancel ();
	     sc_ret = INTERNAL_SYSCALL_CALL (__VA_ARGS__);          // It issues the syscall with the asm (...)
	     __librt_disable_asynccancel (sc_cancel_oldtype);
	   }
         sc_ret;
       });
  if ((void) (val), __builtin_expect((err) != 0, 0))                // if (! INTERNAL_SYSCALL_ERROR_P (r, err))
    return 0;
  if ((-(val)) != ENOSYS)                                           // if (INTERNAL_SYSCALL_ERRNO (r, err) != ENOSYS)
    return ((-(val)));                                              //   return INTERNAL_SYSCALL_ERRNO (r, err);

  [...]

  r = ({                                                             // r = INTERNAL_SYSCALL_CANCEL (...)
       [...]
      });
  if ((void) (val), __builtin_expect((err) != 0, 0))                // if (! INTERNAL_SYSCALL_ERROR_P (r, err))
    {
      [...]
    }
  return ((void) (val), __builtin_expect((err) != 0, 0))            // return (INTERNAL_SYSCALL_ERROR_P (r, err)
         ? ((-(val))) : 0;                                          //        ? INTERNAL_SYSCALL_ERRNO (r, err) : 0);
--

It requires that 'err' (assigned to 'g1') be value propagated over
function calls and over different scopes, which I take from your
explanation is not supported and fragile. It also seems that if I
move the __libc_enable_* calls before 'err' initialization and after
its usage the code seems to work, but again it seems this usage
is not really supported on gcc.

So it seems that the current usage of 'INTERNAL_SYSCALL_DECL' and
'INTERNAL_SYSCALL_ERROR_P' is fragile if the architecture *does*
use the 'err' variable and it is defined as a register alias (which
is the case only for sparc currently).

Although a straightforward fix for sparc would be to redefine
INTERNAL_SYSCALL_DECL to not use a register alias, I still think
we should just follow the Linux kernel ABI convention where a value in
the range between -4095 and -1 indicates an error and handle any
specific symbols that might not strictly follow it with an
arch-specific implementation (as we do for lseek on x32 and
mips64n32).  It would allow cleaning up a lot of code and avoid such
pitfalls.
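
For illustration, the range check described above can be written in
plain C with no register variable involved.  This is a minimal sketch
with hypothetical helper names, not the actual glibc macros:

```c
#include <stdbool.h>

/* A raw syscall return value in [-4095, -1] denotes an error.  Seen as
   unsigned, that range is the top 4095 values of unsigned long.  */
static inline bool
syscall_ret_is_error (long ret)
{
  return (unsigned long) ret >= (unsigned long) -4095L;
}

/* Only meaningful when syscall_ret_is_error (ret) is true.  */
static inline int
syscall_ret_errno (long ret)
{
  return (int) -ret;
}
```

Since the error state travels in the return value itself, nothing has to
survive in a particular register across the cancellation calls.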


* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-30 17:04                           ` Adhemerval Zanella
@ 2020-01-30 21:41                             ` Segher Boessenkool
  2020-01-31 11:30                               ` Adhemerval Zanella
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2020-01-30 21:41 UTC (permalink / raw)
  To: Adhemerval Zanella
  Cc: Florian Weimer, libc-alpha, Tulio Magno Quites Machado Filho,
	linuxppc-dev, Nicholas Piggin

Hi!

On Thu, Jan 30, 2020 at 02:04:51PM -0300, Adhemerval Zanella wrote:
> On 30/01/2020 10:50, Segher Boessenkool wrote:
> > On Thu, Jan 30, 2020 at 01:03:53PM +0100, Florian Weimer wrote:
> >>> This is why that *is* the only supported use.  The documentation could
> >>> use a touch-up, I think.  Unless we still have problems here?
> >>
> >> I really don't know.  GCC still has *some* support for the old behavior,
> >> though.
> > 
> > No.  No support.  It still does some of the same things, but that can
> > change (and probably should).  But this hasn't been supported since the
> > dark ages, and the documentation has become gradually more explicit
> > about it.
> > 
> 
> I think this might be related to an odd sparc32 issue I am seeing with 
> newer clock_nanosleep.  The expanded code is:
> 
> --
>   register long err __asm__("g1");                                   // INTERNAL_SYSCALL_DECL  (err)
>   r = ({                                                             // r = INTERNAL_SYSCALL_CANCEL (...)
> 	 long int sc_ret;
>          if (SINGLE_THREAD_P)
> 	   sc_ret = INTERNAL_SYSCALL_CALL (__VA_ARGS__);
>          else
>            {
> 	     int sc_cancel_oldtype = __libc_enable_asynccancel ();
> 	     sc_ret = INTERNAL_SYSCALL_CALL (__VA_ARGS__);          // It issues the syscall with the asm (...)
> 	     __librt_disable_asynccancel (sc_cancel_oldtype);
> 	   }
>          sc_ret;
>        });
>   if ((void) (val), __builtin_expect((err) != 0, 0))                // if (! INTERNAL_SYSCALL_ERROR_P (r, err))
>     return 0;
>   if ((-(val)) != ENOSYS)                                           // if (INTERNAL_SYSCALL_ERRNO (r, err) != ENOSYS)
>     return ((-(val)));                                              //   return INTERNAL_SYSCALL_ERRNO (r, err);
> 
>   [...]
> 
>   r = ({                                                             // r = INTERNAL_SYSCALL_CANCEL (...)
>        [...]
>       });
>   if ((void) (val), __builtin_expect((err) != 0, 0))                // if (! INTERNAL_SYSCALL_ERROR_P (r, err))
>     {
>       [...]
>     }
>   return ((void) (val), __builtin_expect((err) != 0, 0))            // return (INTERNAL_SYSCALL_ERROR_P (r, err)
>          ? ((-(val))) : 0;                                          //        ? INTERNAL_SYSCALL_ERRNO (r, err) : 0);
> --
> 
> It requires that 'err' (assigned to 'g1')

What do you mean by "assigned to g1"?

> be value propagated over
> function calls and over different scopes, which I take from your
> explanation is not supported and fragile.

You probably misunderstand that, but let me ask: where is err assigned to
at all in the code you quoted?  I don't see it.  Maybe it's hidden in some
macro?

Or, maybe some asm writes to g1?  This is explicitly not okay (quote
from the GCC manual):

  Defining a register variable does not reserve the register.  Other than
  when invoking the Extended 'asm', the contents of the specified register
  are not guaranteed.  For this reason, the following uses are explicitly
  _not_ supported.  If they appear to work, it is only happenstance, and
  may stop working as intended due to (seemingly) unrelated changes in
  surrounding code, or even minor changes in the optimization of a future
  version of gcc:

   * Passing parameters to or from Basic 'asm'
   * Passing parameters to or from Extended 'asm' without using input or
     output operands.
   * Passing parameters to or from routines written in assembler (or
     other languages) using non-standard calling conventions.

> It also seems that if I
> move the __libc_enable_* calls before 'err' initialization and after
> its usage the code seems to work, but again it seems this usage
> is not really supported on gcc.
>
> So it seems that the current usage of 'INTERNAL_SYSCALL_DECL' and
> 'INTERNAL_SYSCALL_ERROR_P' is fragile if the architecture *does*
> use the 'err' variable and it is defined as a register alias (which
> is the case only for sparc currently).
>
> Although a straightforward fix for sparc would be to redefine
> INTERNAL_SYSCALL_DECL to not use a register alias, I still think
> we should just follow the Linux kernel ABI convention where a value in
> the range between -4095 and -1 indicates an error and handle any
> specific symbols that might not strictly follow it with an
> arch-specific implementation (as we do for lseek on x32 and
> mips64n32).  It would allow cleaning up a lot of code and avoid such
> pitfalls.

I don't really understand what you call a "register alias", either.
(And I don't know the Sparc ABI well enough to help you with that.)


Segher


* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-30 21:41                             ` Segher Boessenkool
@ 2020-01-31 11:30                               ` Adhemerval Zanella
  2020-01-31 11:55                                 ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Adhemerval Zanella @ 2020-01-31 11:30 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Florian Weimer, libc-alpha, Tulio Magno Quites Machado Filho,
	linuxppc-dev, Nicholas Piggin



On 30/01/2020 18:41, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jan 30, 2020 at 02:04:51PM -0300, Adhemerval Zanella wrote:
>> On 30/01/2020 10:50, Segher Boessenkool wrote:
>>> On Thu, Jan 30, 2020 at 01:03:53PM +0100, Florian Weimer wrote:
>>>>> This is why that *is* the only supported use.  The documentation could
>>>>> use a touch-up, I think.  Unless we still have problems here?
>>>>
>>>> I really don't know.  GCC still has *some* support for the old behavior,
>>>> though.
>>>
>>> No.  No support.  It still does some of the same things, but that can
>>> change (and probably should).  But this hasn't been supported since the
>>> dark ages, and the documentation has become gradually more explicit
>>> about it.
>>>
>>
>> I think this might be related to an odd sparc32 issue I am seeing with 
>> newer clock_nanosleep.  The expanded code is:
>>
>> --
>>   register long err __asm__("g1");                                   // INTERNAL_SYSCALL_DECL  (err)
>>   r = ({                                                             // r = INTERNAL_SYSCALL_CANCEL (...)
>> 	 long int sc_ret;
>>          if (SINGLE_THREAD_P)
>> 	   sc_ret = INTERNAL_SYSCALL_CALL (__VA_ARGS__);
>>          else
>>            {
>> 	     int sc_cancel_oldtype = __libc_enable_asynccancel ();
>> 	     sc_ret = INTERNAL_SYSCALL_CALL (__VA_ARGS__);          // It issues the syscall with the asm (...)
>> 	     __librt_disable_asynccancel (sc_cancel_oldtype);
>> 	   }
>>          sc_ret;
>>        });
>>   if ((void) (val), __builtin_expect((err) != 0, 0))                // if (! INTERNAL_SYSCALL_ERROR_P (r, err))
>>     return 0;
>>   if ((-(val)) != ENOSYS)                                           // if (INTERNAL_SYSCALL_ERRNO (r, err) != ENOSYS)
>>     return ((-(val)));                                              //   return INTERNAL_SYSCALL_ERRNO (r, err);
>>
>>   [...]
>>
>>   r = ({                                                             // r = INTERNAL_SYSCALL_CANCEL (...)
>>        [...]
>>       });
>>   if ((void) (val), __builtin_expect((err) != 0, 0))                // if (! INTERNAL_SYSCALL_ERROR_P (r, err))
>>     {
>>       [...]
>>     }
>>   return ((void) (val), __builtin_expect((err) != 0, 0))            // return (INTERNAL_SYSCALL_ERROR_P (r, err)
>>          ? ((-(val))) : 0;                                          //        ? INTERNAL_SYSCALL_ERRNO (r, err) : 0);
>> --
>>
>> It requires that 'err' (assigned to 'g1')
> 
> What do you mean by "assigned to g1"?

I meant 'err' being a local register variable.

> 
>> be value propagated over
>> function calls and over different scopes, which I take from your
>> explanation is not supported and fragile.
> 
> You probably misunderstand that, but let me ask: where is err assigned to
> at all in the code you quoted?  I don't see it.  Maybe it's hidden in some
> macro?

Indeed it was not explicit in the example code, it is buried in the
INTERNAL_SYSCALL_CALL macro which calls sparc-defined macros. For instance,
with a 6-argument kernel syscall, it issues:

#define inline_syscall6(string,err,name,arg1,arg2,arg3,arg4,arg5,arg6)  \
({                                                                      \
        register long __o0 __asm__ ("o0") = (long)(arg1);               \
        register long __o1 __asm__ ("o1") = (long)(arg2);               \
        register long __o2 __asm__ ("o2") = (long)(arg3);               \
        register long __o3 __asm__ ("o3") = (long)(arg4);               \
        register long __o4 __asm__ ("o4") = (long)(arg5);               \
        register long __o5 __asm__ ("o5") = (long)(arg6);               \
        err = name;                                                     \
        __asm __volatile (string : "=r" (err), "=r" (__o0) :            \
                          "0" (err), "1" (__o0), "r" (__o1),            \
                          "r" (__o2), "r" (__o3), "r" (__o4),           \
                          "r" (__o5) :                                  \
                          __SYSCALL_CLOBBERS);                          \
        __o0;                                                           \
})

Where 'err' defined by 'INTERNAL_SYSCALL_DECL' should be the 'err' macro
argument.

> 
> Or, maybe some asm writes to g1?  This is explicitly not okay (quote
> from the GCC manual):

Yes, that's the case.

> 
>   Defining a register variable does not reserve the register.  Other than
>   when invoking the Extended 'asm', the contents of the specified register
>   are not guaranteed.  For this reason, the following uses are explicitly
>   _not_ supported.  If they appear to work, it is only happenstance, and
>   may stop working as intended due to (seemingly) unrelated changes in
>   surrounding code, or even minor changes in the optimization of a future
>   version of gcc:
> 
>    * Passing parameters to or from Basic 'asm'
>    * Passing parameters to or from Extended 'asm' without using input or
>      output operands.
>    * Passing parameters to or from routines written in assembler (or
>      other languages) using non-standard calling conventions.
> 
>> It also seems that if I
>> move the __libc_enable_* calls before 'err' initialization and after
>> its usage the code seems to work, but again it seems this usage
>> is not really supported on gcc.
>>
>> So it seems that the current usage of 'INTERNAL_SYSCALL_DECL' and
>> 'INTERNAL_SYSCALL_ERROR_P' is fragile if the architecture *does*
>> use the 'err' variable and it is defined as a register alias (which
>> is the case only for sparc currently).
>>
>> Although a straightforward fix for sparc would be to redefine
>> INTERNAL_SYSCALL_DECL to not use a register alias, I still think
>> we should just follow the Linux kernel ABI convention where a value in
>> the range between -4095 and -1 indicates an error and handle any
>> specific symbols that might not strictly follow it with an
>> arch-specific implementation (as we do for lseek on x32 and
>> mips64n32).  It would allow cleaning up a lot of code and avoid such
>> pitfalls.
> 
> I don't really understand what you call a "register alias", either.
> (And I don't know the Sparc ABI well enough to help you with that).

I meant a register variable whose use 'after' the extended asm
is expected to still be in the defined register.
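
The kernel convention quoted above (a raw return value between -4095 and -1
means an error) can be sketched in C as follows. This is only an
illustration: the helper names and MAX_ERRNO are hypothetical, not glibc's
actual macros, and it assumes the error is reported in the return value
itself rather than in a separate flag or register.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not glibc's real macros. */
#define MAX_ERRNO 4095L

/* True when a raw kernel return value lies in [-4095, -1]. */
static inline bool raw_result_is_error(long raw)
{
    /* As unsigned, the error range maps to the top 4095 values. */
    return (unsigned long)raw >= (unsigned long)-MAX_ERRNO;
}

/* Positive errno value; only meaningful if raw_result_is_error(raw). */
static inline long raw_result_errno(long raw)
{
    return -raw;
}
```

With this shape no separate 'err' variable has to stay live between the
syscall and the error check, e.g. assert(raw_result_is_error(-2)) holds and
raw_result_errno(-2) is 2.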



* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-31 11:30                               ` Adhemerval Zanella
@ 2020-01-31 11:55                                 ` Segher Boessenkool
  0 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2020-01-31 11:55 UTC (permalink / raw)
  To: Adhemerval Zanella
  Cc: Florian Weimer, libc-alpha, Tulio Magno Quites Machado Filho,
	linuxppc-dev, Nicholas Piggin

Hi!

On Fri, Jan 31, 2020 at 08:30:45AM -0300, Adhemerval Zanella wrote:
> On 30/01/2020 18:41, Segher Boessenkool wrote:
> > On Thu, Jan 30, 2020 at 02:04:51PM -0300, Adhemerval Zanella wrote:
> >> be value propagated over
> >> functions calls and over different scopes, which I take from your 
> >> explanation is not supported and fragile.
> > 
> > You probably misunderstand that, but let me ask: where is err assigned to
> > at all in the code you quoted?  I don't see it.  Maybe it's hidden in some
> > macro?
> 
> Indeed it was not explicit in the example code, it is buried in the
> INTERNAL_SYSCALL_CALL macro which calls sparc-defined macros. For instance,
> for a 6-argument kernel syscall, it issues:
> 
> #define inline_syscall6(string,err,name,arg1,arg2,arg3,arg4,arg5,arg6)  \
> ({                                                                      \
>         register long __o0 __asm__ ("o0") = (long)(arg1);               \
>         register long __o1 __asm__ ("o1") = (long)(arg2);               \
>         register long __o2 __asm__ ("o2") = (long)(arg3);               \
>         register long __o3 __asm__ ("o3") = (long)(arg4);               \
>         register long __o4 __asm__ ("o4") = (long)(arg5);               \
>         register long __o5 __asm__ ("o5") = (long)(arg6);               \
>         err = name;                                                     \
>         __asm __volatile (string : "=r" (err), "=r" (__o0) :            \
>                           "0" (err), "1" (__o0), "r" (__o1),            \
>                           "r" (__o2), "r" (__o3), "r" (__o4),           \
>                           "r" (__o5) :                                  \
>                           __SYSCALL_CLOBBERS);                          \
>         __o0;                                                           \
> })
> 
> Here 'err', declared by 'INTERNAL_SYSCALL_DECL', is what gets passed as the
> 'err' macro argument.

GCC makes sure that what is in register g1 at the end of this asm does
end up in the C variable "err" (at least conceptually, the actual code
can be optimised further).

> I meant a register variable whose use 'after' the extended asm
> is expected to still be in the defined register.

Yes, that is not supported like this.  You'll have to use some more
inline asm at that use (with "err" as input there).  Or, if you actually
care about this being in a specific register, maybe you shouldn't write
this in C at all?  Writing assembler code in assembler language (in a
single inline asm block, or even in an assembler source file) tends to
give much better results (and is a lot easier) than trying to second-
guess the compiler.  You can write pretty much anything as inline
assembler code, but that doesn't mean you have to, or that that would
be a good idea.
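
As a minimal sketch of the supported pattern (values enter and leave an
extended asm only through its operands, and are ordinary C values
afterwards), consider the following. The helper name is hypothetical, the
asm template is deliberately empty so it builds on any GCC or Clang target,
and it is not taken from glibc.

```c
#include <assert.h>

/* Hypothetical helper: route a value through an empty extended asm.
   "=r" (out) is an output operand; "0" (in) ties the input to the same
   register as operand 0, like the "=r" (err) / "0" (err) pair in the
   quoted macro. */
static inline long through_asm(long in)
{
    long out;
    __asm__ __volatile__ ("" : "=r" (out) : "0" (in));
    /* From here on 'out' is a plain C variable; the compiler is free to
       keep it in any register or on the stack. */
    return out;
}
```

Any later use that needs the value in a particular register again would
pass it as an input operand to another asm, as described above; for
example, assert(through_asm(42) == 42) holds without assuming where the
value lives in between.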

Things on the border like system calls are hard to handle.  I like the
idea of doing those in compiler builtins ("compiler intrinsics"), but
that has its own problems as well: mostly, need to {write down / lock
down / determine in advance} more of the calling convention than the
other approaches.  And of course it will take years before most projects
can use it :-/


Segher


* Re: powerpc Linux scv support and scv system call ABI proposal
  2020-01-29 15:51         ` Tulio Magno Quites Machado Filho
@ 2020-02-19 11:03           ` Nicholas Piggin
  0 siblings, 0 replies; 26+ messages in thread
From: Nicholas Piggin @ 2020-02-19 11:03 UTC (permalink / raw)
  To: Adhemerval Zanella, Libc-alpha Mailing List, linuxppc-dev,
	Tulio Magno Quites Machado Filho

Tulio Magno Quites Machado Filho's on January 30, 2020 1:51 am:
> Nicholas Piggin <npiggin@gmail.com> writes:
> 
>> Adhemerval Zanella's on January 29, 2020 3:26 am:
>>> 
>>> We already had to push a similar hack where glibc used to abort transactions
>>> prior to syscalls to avoid some side-effects in the kernel (commit 56cf2763819d2f).
>>> It was eventually removed from syscall handling by f0458cf4f9ff3d870, where
>>> we only enable TLE if the kernel supports PPC_FEATURE2_HTM_NOSC.
>>> 
>>> The transaction syscall abort used to read a variable directly from the TCB,
>>> so this could be an option. I would expect that we could optimize it so that,
>>> if glibc is built against a recent kernel and the compiler is targeting
>>> an ISA 3.0+ CPU, we could remove the 'sc' code.
>>> 
>>
>> We would just have to be careful of running on ISA 3.0 CPUs on older
>> kernels which do not support scv.
> 
> Can we assume that, if a syscall is available through sc it's also available
> in scv 0?

Was on vacation, thanks for waiting.

Yes, except for the difference in calling convention, we would require
that the set of syscalls available through `sc` be exactly the same as
through `scv 0`. This happens as a natural consequence of the kernel
implementation, which re-uses the same code to select the syscall.

> 
> Because if that's true, I believe your suggestion to interpret PPC_FEATURE2_SCV
> as scv 0 support would be helpful to provide this support via IFUNC even
> when glibc is built using --with-cpu=power8, which is the most common scenario
> in ppc64le.
> 
> In that scenario, it seems new HWCAP bits for new vectors wouldn't be too
> frequent, which was the only downside of this proposal.

Okay good feedback, thanks.
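
For illustration, the runtime choice discussed above (treat PPC_FEATURE2_SCV
as meaning 'scv 0' is usable, else fall back to 'sc') might be sketched like
this. The enum and function names are hypothetical, and the fallback
definition of PPC_FEATURE2_SCV is an assumption mirroring the kernel's
powerpc uapi header so the sketch builds on any host.

```c
#include <assert.h>

/* Assumed to match the kernel's powerpc cputable.h; only defined here
   so the sketch compiles where that header is not available. */
#ifndef PPC_FEATURE2_SCV
#define PPC_FEATURE2_SCV 0x00100000UL
#endif

/* Hypothetical names, for illustration only. */
enum syscall_insn { USE_SC, USE_SCV0 };

/* Pick the entry instruction from the AT_HWCAP2 bits. */
static inline enum syscall_insn pick_syscall_insn(unsigned long hwcap2)
{
    return (hwcap2 & PPC_FEATURE2_SCV) ? USE_SCV0 : USE_SC;
}
```

On powerpc64 Linux the argument would typically come from
getauxval(AT_HWCAP2); an older kernel that never sets the bit then selects
'sc' even on an ISA 3.0 CPU, e.g.
assert(pick_syscall_insn(0) == USE_SC).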

Thanks,
Nick


