* Re: Please help to confirm the risk if using TPIDRRO_EL0 to save CPU number, thanks.
       [not found] <1D289F1E6D91D2489524BBB0B8880A7DA1A39219@dggeml509-mbx.china.huawei.com>
@ 2020-06-01  7:03 ` Will Deacon
  2020-06-05 12:10   ` Mark Brown
  0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2020-06-01  7:03 UTC (permalink / raw)
  To: Lixin (Victor, Kirin); +Cc: fujun (F), Wuxuecheng, linux-arm-kernel

On Fri, May 29, 2020 at 09:03:37AM +0000, Lixin (Victor, Kirin) wrote:
>    Intel optimized getcpu syscall on Linux/Android system by using vDSO, but
>    ARM doesn't do any optimizations for getcpu syscall.
> 
>    In Apple open source, TPIDRRO_EL0/TPIDRURO is used to save the CPU number,
>    [1]https://opensource.apple.com/source/xnu/xnu-4570.1.46/osfmk/arm/cswitch.s.auto.html
> 
>               Is there any risk if using TPIDRRO_EL0/TPIDRURO to implement
>    the vDSO for getcpu? Is there any possible to break any ARM ABI? Can you
>    help us to confirm the considerations?

Do you have a use-case for high-performance getcpu() that isn't better
suited to rseq()?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: Please help to confirm the risk if using TPIDRRO_EL0 to save CPU number, thanks.
  2020-06-01  7:03 ` Please help to confirm the risk if using TPIDRRO_EL0 to save CPU number, thanks Will Deacon
@ 2020-06-05 12:10   ` Mark Brown
  2020-06-05 12:33     ` Will Deacon
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Brown @ 2020-06-05 12:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: fujun (F), Wuxuecheng, linux-arm-kernel, Lixin (Victor, Kirin)


On Mon, Jun 01, 2020 at 08:03:12AM +0100, Will Deacon wrote:
> On Fri, May 29, 2020 at 09:03:37AM +0000, Lixin (Victor, Kirin) wrote:

> >    Intel optimized getcpu syscall on Linux/Android system by using vDSO, but
> >    ARM doesn't do any optimizations for getcpu syscall.

> >    In Apple open source, TPIDRRO_EL0/TPIDRURO is used to save the CPU number,
> >    [1]https://opensource.apple.com/source/xnu/xnu-4570.1.46/osfmk/arm/cswitch.s.auto.html

> >               Is there any risk if using TPIDRRO_EL0/TPIDRURO to implement
> >    the vDSO for getcpu? Is there any possible to break any ARM ABI? Can you
> >    help us to confirm the considerations?

> Do you have a use-case for high-performance getcpu() that isn't better
> suited to rseq()?

I actually have an implementation of this that I'd been waiting for the
end of the merge window to post, largely because I first heard of the
use of restartable sequences for this after I'd already implemented the
vDSO version - this stuff is not as discoverable as one might desire.
It doesn't store the CPU ID directly in TPIDRRO but rather uses TPIDRRO
to store the offset of a per-CPU struct in the vDSO data in order to
allow for the addition of further data in the future.  I'll post it
today for discussion.
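
As a rough illustration of the offset scheme described above, the following is a user-space model only, not the posted patch. The struct layout, the names `percpu_vdso_data`, `vdso_percpu`, `offset_for_cpu` and `model_getcpu`, and the four-CPU table are all assumptions made up for this sketch. The register value is passed in as a plain argument; on real arm64 hardware it would be read with `mrs <Xt>, tpidrro_el0`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU record; the fields are illustrative only. */
struct percpu_vdso_data {
    uint32_t cpu;
    uint32_t node;
};

#define NR_CPUS_MODEL 4

/* Stand-in for the per-CPU area of the vDSO data page. */
static const struct percpu_vdso_data vdso_percpu[NR_CPUS_MODEL] = {
    { 0, 0 }, { 1, 0 }, { 2, 1 }, { 3, 1 },
};

/* Value the kernel would place in the register for a given CPU:
 * the byte offset of that CPU's record, not the CPU number itself. */
static uint64_t offset_for_cpu(unsigned int cpu)
{
    return (uint64_t)cpu * sizeof(struct percpu_vdso_data);
}

/* What a vDSO getcpu() could do with the register value: add it to the
 * base of the per-CPU area and read whichever fields it needs. */
static int model_getcpu(uint64_t tpidrro, unsigned int *cpu, unsigned int *node)
{
    const struct percpu_vdso_data *d =
        (const struct percpu_vdso_data *)((const char *)vdso_percpu + tpidrro);

    if (cpu)
        *cpu = d->cpu;
    if (node)
        *node = d->node;
    return 0;
}
```

Because the register holds an offset rather than the CPU number, further fields could be appended to the record without changing what is stored in the register.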

The latest version of Mathieu's glibc integration patches is:

    https://lore.kernel.org/lkml/20200527185130.5604-3-mathieu.desnoyers@efficios.com/

The only things I can see where the vDSO does better are support for the
node parameter of getcpu() and the ease of implementation for the users:
the restartable sequences code was merged all the way back in v4.18 and
it's still not used by any of the libcs as far as I can see.  The node
to CPU mapping is static so I'm not sure how exciting that is; it could
be looked up separately when processing data if it's important, but the
ease of use feels like something.
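
For reference, the node parameter mentioned above is the second argument of the getcpu() system call. A minimal sketch of a caller, assuming Linux and going through the generic `syscall()` wrapper so it does not depend on a particular libc version (the helper name `current_cpu_and_node` is made up for this example):

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fetch the current CPU and NUMA node with the getcpu system call.
 * Returns 0 on success, -1 on failure with errno set by syscall(). */
static int current_cpu_and_node(unsigned int *cpu, unsigned int *node)
{
    /* The third argument (tcache) has been unused since Linux 2.6.24. */
    return (int)syscall(SYS_getcpu, cpu, node, NULL);
}
```

The result is only a snapshot: the calling thread may be migrated to another CPU immediately after the call returns.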

One important caveat with using TPIDRRO is that if KPTI is active then
the KPTI trampoline uses TPIDRRO as a scratch register so unless we can
find another register for scratch usage the user would need to give up
the protections offered by KPTI or run on future hardware which can use
E0PD instead.  This severely limits the usefulness on current systems.


* Re: Please help to confirm the risk if using TPIDRRO_EL0 to save CPU number, thanks.
  2020-06-05 12:10   ` Mark Brown
@ 2020-06-05 12:33     ` Will Deacon
  2020-06-05 12:58       ` Robin Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2020-06-05 12:33 UTC (permalink / raw)
  To: Mark Brown; +Cc: fujun (F), Wuxuecheng, linux-arm-kernel, Lixin (Victor, Kirin)

On Fri, Jun 05, 2020 at 01:10:29PM +0100, Mark Brown wrote:
> On Mon, Jun 01, 2020 at 08:03:12AM +0100, Will Deacon wrote:
> > On Fri, May 29, 2020 at 09:03:37AM +0000, Lixin (Victor, Kirin) wrote:
> 
> > >    Intel optimized getcpu syscall on Linux/Android system by using vDSO, but
> > >    ARM doesn't do any optimizations for getcpu syscall.
> 
> > >    In Apple open source, TPIDRRO_EL0/TPIDRURO is used to save the CPU number,
> > >    [1]https://opensource.apple.com/source/xnu/xnu-4570.1.46/osfmk/arm/cswitch.s.auto.html
> 
> > >               Is there any risk if using TPIDRRO_EL0/TPIDRURO to implement
> > >    the vDSO for getcpu? Is there any possible to break any ARM ABI? Can you
> > >    help us to confirm the considerations?
> 
> > Do you have a use-case for high-performance getcpu() that isn't better
> > suited to rseq()?
> 
> I actually have an implementation of this that I'd been waiting for the
> end of the merge window to post, largely because I first heard of the
> use of restartable sequences for this after I'd already implemented the
> vDSO version - this stuff is not as discoverable as one might desire.
> It doesn't store the CPU ID directly in TPIDRRO but rather uses TPIDRRO
> to store the offset of a per-CPU struct in the vDSO data in order to
> allow for the addition of further data in the future.  I'll post it
> today for discussion.
> 
> The latest version of the Mathieu's glibc integration patches is:
> 
>     https://lore.kernel.org/lkml/20200527185130.5604-3-mathieu.desnoyers@efficios.com/
> 
> The only things I can see where the vDSO does better are support for the
> node parameter of getcpu() and the ease of implementation for the users,
> the restartable sequences code was merged all the way back in v4.18 and
> it's still not used by any of the libcs as far as I can see.  The node
> to CPU mapping is static so I'm not sure how exciting that is, it could
> be looked up separately when processing data if it's important, but the 
> ease of use feels like something.
> 
> One important caveat with using TPIDRRO is that if KPTI is active then
> the KPTI trampoline uses TPIDRRO as a scratch register so unless we can
> find another register for scratch usage the user would need to give up
> the protections offered by KPTI or run on future hardware which can use
> E0PD instead.  This severely limits the usefulness on current systems.

We only trash TPIDRRO on entry, so I think you could repopulate it on every
exception from userspace and it *should* work with KPTI (famous last words!)
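
The repopulation idea can be pictured with a toy user-space model. This is only a sketch of the reasoning, not kernel code: `struct model_cpu`, both function names and the scratch value are invented for illustration.

```c
#include <stdint.h>

/* Toy model of one CPU: the user-visible register plus the value the
 * kernel wants userspace to see in it. */
struct model_cpu {
    uint64_t tpidrro;        /* models TPIDRRO_EL0 */
    uint64_t percpu_offset;  /* per-CPU value to expose to userspace */
};

/* On exception entry from 64-bit userspace the KPTI trampoline uses the
 * register as scratch, so whatever it held is lost. */
static void model_exception_entry(struct model_cpu *c)
{
    c->tpidrro = UINT64_C(0xdeadbeef); /* arbitrary scratch contents */
}

/* Rewriting the register on every return to userspace means userspace
 * never observes the scratch contents. */
static void model_exception_return(struct model_cpu *c)
{
    c->tpidrro = c->percpu_offset;
}
```

In this model the register is only wrong while the kernel is running, which is the window in which userspace cannot read it.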

Will


* Re: Please help to confirm the risk if using TPIDRRO_EL0 to save CPU number, thanks.
  2020-06-05 12:33     ` Will Deacon
@ 2020-06-05 12:58       ` Robin Murphy
  2020-06-05 13:02         ` Will Deacon
  0 siblings, 1 reply; 6+ messages in thread
From: Robin Murphy @ 2020-06-05 12:58 UTC (permalink / raw)
  To: Will Deacon, Mark Brown
  Cc: fujun (F), Wuxuecheng, Lixin (Victor, Kirin), linux-arm-kernel

On 2020-06-05 13:33, Will Deacon wrote:
> On Fri, Jun 05, 2020 at 01:10:29PM +0100, Mark Brown wrote:
>> On Mon, Jun 01, 2020 at 08:03:12AM +0100, Will Deacon wrote:
>>> On Fri, May 29, 2020 at 09:03:37AM +0000, Lixin (Victor, Kirin) wrote:
>>
>>>>     Intel optimized getcpu syscall on Linux/Android system by using vDSO, but
>>>>     ARM doesn't do any optimizations for getcpu syscall.
>>
>>>>     In Apple open source, TPIDRRO_EL0/TPIDRURO is used to save the CPU number,
>>>>     [1]https://opensource.apple.com/source/xnu/xnu-4570.1.46/osfmk/arm/cswitch.s.auto.html
>>
>>>>                Is there any risk if using TPIDRRO_EL0/TPIDRURO to implement
>>>>     the vDSO for getcpu? Is there any possible to break any ARM ABI? Can you
>>>>     help us to confirm the considerations?
>>
>>> Do you have a use-case for high-performance getcpu() that isn't better
>>> suited to rseq()?
>>
>> I actually have an implementation of this that I'd been waiting for the
>> end of the merge window to post, largely because I first heard of the
>> use of restartable sequences for this after I'd already implemented the
>> vDSO version - this stuff is not as discoverable as one might desire.
>> It doesn't store the CPU ID directly in TPIDRRO but rather uses TPIDRRO
>> to store the offset of a per-CPU struct in the vDSO data in order to
>> allow for the addition of further data in the future.  I'll post it
>> today for discussion.
>>
>> The latest version of the Mathieu's glibc integration patches is:
>>
>>      https://lore.kernel.org/lkml/20200527185130.5604-3-mathieu.desnoyers@efficios.com/
>>
>> The only things I can see where the vDSO does better are support for the
>> node parameter of getcpu() and the ease of implementation for the users,
>> the restartable sequences code was merged all the way back in v4.18 and
>> it's still not used by any of the libcs as far as I can see.  The node
>> to CPU mapping is static so I'm not sure how exciting that is, it could
>> be looked up separately when processing data if it's important, but the
>> ease of use feels like something.
>>
>> One important caveat with using TPIDRRO is that if KPTI is active then
>> the KPTI trampoline uses TPIDRRO as a scratch register so unless we can
>> find another register for scratch usage the user would need to give up
>> the protections offered by KPTI or run on future hardware which can use
>> E0PD instead.  This severely limits the usefulness on current systems.
> 
> We only trash TPIDRRO on entry, so I think you could repopulate it on every
> exception from userspace and it *should* work with KPTI (famous last words!)

Is that not already the case given that we keep TLS gubbins in there for 
compat tasks?

Robin.


* Re: Please help to confirm the risk if using TPIDRRO_EL0 to save CPU number, thanks.
  2020-06-05 12:58       ` Robin Murphy
@ 2020-06-05 13:02         ` Will Deacon
  2020-06-05 13:23           ` Robin Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2020-06-05 13:02 UTC (permalink / raw)
  To: Robin Murphy
  Cc: fujun (F), Mark Brown, Wuxuecheng, Lixin (Victor, Kirin),
	linux-arm-kernel

On Fri, Jun 05, 2020 at 01:58:39PM +0100, Robin Murphy wrote:
> On 2020-06-05 13:33, Will Deacon wrote:
> > On Fri, Jun 05, 2020 at 01:10:29PM +0100, Mark Brown wrote:
> > > On Mon, Jun 01, 2020 at 08:03:12AM +0100, Will Deacon wrote:
> > > > On Fri, May 29, 2020 at 09:03:37AM +0000, Lixin (Victor, Kirin) wrote:
> > > 
> > > > >     Intel optimized getcpu syscall on Linux/Android system by using vDSO, but
> > > > >     ARM doesn't do any optimizations for getcpu syscall.
> > > 
> > > > >     In Apple open source, TPIDRRO_EL0/TPIDRURO is used to save the CPU number,
> > > > >     [1]https://opensource.apple.com/source/xnu/xnu-4570.1.46/osfmk/arm/cswitch.s.auto.html
> > > 
> > > > >                Is there any risk if using TPIDRRO_EL0/TPIDRURO to implement
> > > > >     the vDSO for getcpu? Is there any possible to break any ARM ABI? Can you
> > > > >     help us to confirm the considerations?
> > > 
> > > > Do you have a use-case for high-performance getcpu() that isn't better
> > > > suited to rseq()?
> > > 
> > > I actually have an implementation of this that I'd been waiting for the
> > > end of the merge window to post, largely because I first heard of the
> > > use of restartable sequences for this after I'd already implemented the
> > > vDSO version - this stuff is not as discoverable as one might desire.
> > > It doesn't store the CPU ID directly in TPIDRRO but rather uses TPIDRRO
> > > to store the offset of a per-CPU struct in the vDSO data in order to
> > > allow for the addition of further data in the future.  I'll post it
> > > today for discussion.
> > > 
> > > The latest version of the Mathieu's glibc integration patches is:
> > > 
> > >      https://lore.kernel.org/lkml/20200527185130.5604-3-mathieu.desnoyers@efficios.com/
> > > 
> > > The only things I can see where the vDSO does better are support for the
> > > node parameter of getcpu() and the ease of implementation for the users,
> > > the restartable sequences code was merged all the way back in v4.18 and
> > > it's still not used by any of the libcs as far as I can see.  The node
> > > to CPU mapping is static so I'm not sure how exciting that is, it could
> > > be looked up separately when processing data if it's important, but the
> > > ease of use feels like something.
> > > 
> > > One important caveat with using TPIDRRO is that if KPTI is active then
> > > the KPTI trampoline uses TPIDRRO as a scratch register so unless we can
> > > find another register for scratch usage the user would need to give up
> > > the protections offered by KPTI or run on future hardware which can use
> > > E0PD instead.  This severely limits the usefulness on current systems.
> > 
> > We only trash TPIDRRO on entry, so I think you could repopulate it on every
> > exception from userspace and it *should* work with KPTI (famous last words!)
> 
> Is that not already the case given that we keep TLS gubbins in there for
> compat tasks?

No; we only trash TPIDRRO for 64-bit tasks. 32-bit tasks have loads of
free registers :D

Will


* Re: Please help to confirm the risk if using TPIDRRO_EL0 to save CPU number, thanks.
  2020-06-05 13:02         ` Will Deacon
@ 2020-06-05 13:23           ` Robin Murphy
  0 siblings, 0 replies; 6+ messages in thread
From: Robin Murphy @ 2020-06-05 13:23 UTC (permalink / raw)
  To: Will Deacon
  Cc: fujun (F), Mark Brown, Wuxuecheng, Lixin (Victor, Kirin),
	linux-arm-kernel

On 2020-06-05 14:02, Will Deacon wrote:
> On Fri, Jun 05, 2020 at 01:58:39PM +0100, Robin Murphy wrote:
>> On 2020-06-05 13:33, Will Deacon wrote:
>>> On Fri, Jun 05, 2020 at 01:10:29PM +0100, Mark Brown wrote:
>>>> On Mon, Jun 01, 2020 at 08:03:12AM +0100, Will Deacon wrote:
>>>>> On Fri, May 29, 2020 at 09:03:37AM +0000, Lixin (Victor, Kirin) wrote:
>>>>
>>>>>>      Intel optimized getcpu syscall on Linux/Android system by using vDSO, but
>>>>>>      ARM doesn't do any optimizations for getcpu syscall.
>>>>
>>>>>>      In Apple open source, TPIDRRO_EL0/TPIDRURO is used to save the CPU number,
>>>>>>      [1]https://opensource.apple.com/source/xnu/xnu-4570.1.46/osfmk/arm/cswitch.s.auto.html
>>>>
>>>>>>                 Is there any risk if using TPIDRRO_EL0/TPIDRURO to implement
>>>>>>      the vDSO for getcpu? Is there any possible to break any ARM ABI? Can you
>>>>>>      help us to confirm the considerations?
>>>>
>>>>> Do you have a use-case for high-performance getcpu() that isn't better
>>>>> suited to rseq()?
>>>>
>>>> I actually have an implementation of this that I'd been waiting for the
>>>> end of the merge window to post, largely because I first heard of the
>>>> use of restartable sequences for this after I'd already implemented the
>>>> vDSO version - this stuff is not as discoverable as one might desire.
>>>> It doesn't store the CPU ID directly in TPIDRRO but rather uses TPIDRRO
>>>> to store the offset of a per-CPU struct in the vDSO data in order to
>>>> allow for the addition of further data in the future.  I'll post it
>>>> today for discussion.
>>>>
>>>> The latest version of the Mathieu's glibc integration patches is:
>>>>
>>>>       https://lore.kernel.org/lkml/20200527185130.5604-3-mathieu.desnoyers@efficios.com/
>>>>
>>>> The only things I can see where the vDSO does better are support for the
>>>> node parameter of getcpu() and the ease of implementation for the users,
>>>> the restartable sequences code was merged all the way back in v4.18 and
>>>> it's still not used by any of the libcs as far as I can see.  The node
>>>> to CPU mapping is static so I'm not sure how exciting that is, it could
>>>> be looked up separately when processing data if it's important, but the
>>>> ease of use feels like something.
>>>>
>>>> One important caveat with using TPIDRRO is that if KPTI is active then
>>>> the KPTI trampoline uses TPIDRRO as a scratch register so unless we can
>>>> find another register for scratch usage the user would need to give up
>>>> the protections offered by KPTI or run on future hardware which can use
>>>> E0PD instead.  This severely limits the usefulness on current systems.
>>>
>>> We only trash TPIDRRO on entry, so I think you could repopulate it on every
>>> exception from userspace and it *should* work with KPTI (famous last words!)
>>
>> Is that not already the case given that we keep TLS gubbins in there for
>> compat tasks?
> 
> No; we only trash TPIDRRO for 64-bit tasks. 32-bit tasks have loads of
> free registers :D

Derp, I thought that was one that we always rewrote somewhere in the 
exception return path, but I must have got muddled up with CONTEXTIDR 
(which upon double-checking, I see we 'restore' from itself, so I didn't 
even remember that quite right...)

Robin.

