* Bringing rseq back into glibc
@ 2021-11-18 10:17 Florian Weimer
  2021-11-18 16:32 ` Mathieu Desnoyers
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Florian Weimer @ 2021-11-18 10:17 UTC
  To: libc-alpha
  Cc: linux-api, Mathieu Desnoyers, Jeremy Linton, Vincenzo Frascino,
	Rich Felker

I would like to bring back rseq for glibc 2.35.  I propose the following
steps:

1. Enable rseq registration in glibc, for internal use only.  This time,
   put the rseq area into struct pthread, not into an initial-exec TLS
   symbol.  (This helps to avoid initial-exec TLS bloat with dlopen
   and simplifies initialization somewhat.)

2. Add a tunable to disable rseq registration in glibc.  This way, if
   there is already an rseq user, it can be made to work again by
   setting the tunable.

3. Implement sched_getcpu on top of rseq.

4. Add public symbols __rseq_abi_offset, __rseq_abi_size (currently 32
   or 0), __rseq_abi_flags (currently 0).  __rseq_abi_offset is the
   offset to add to the thread pointer (see __builtin_thread_pointer) to
   get to the rseq area.  They will be public ABI symbols.  These
   variables are initialized before user code runs, and changing them
   results in undefined behavior.

Under this model, the rseq area offset is clearly constant across all
threads.  (This was previously implied by using initial-exec TLS
memory.)  rseq registration failure is indicated by __rseq_abi_size ==
0.  If the size is non-zero, rseq will be registered on all threads
created by glibc, and all the time as far as user code is concerned.
(This assumes that if rseq registration succeeds on the main thread, it
will succeed on all other threads.  We will terminate the process if
not.)  For example, if a JIT compiler sees __rseq_abi_size >= 32, in
generated code, it can inline a version of sched_getcpu that
materializes the thread pointer and loads the cpu_id field from the rseq
area, without further checks.  Under the old TLS-based model, it was
less clear that this was a valid optimization.
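
A minimal sketch of the inlined lookup described above, written in C
rather than as JIT output.  It assumes the proposed semantics: a size of
at least 32 means the area is registered, and the offset is the same for
every thread.  The names rseq_area_sketch, fake_thread_block and
sketch_getcpu are hypothetical.  The thread pointer, offset and size are
passed as parameters so that the code compiles without the proposed
symbols; real code would pass __builtin_thread_pointer () and the values
of __rseq_abi_offset and __rseq_abi_size.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* ptrdiff_t, offsetof */
#include <stdint.h>

/* The first 32 bytes of the kernel's struct rseq, redeclared under a
   hypothetical name so that this sketch is self-contained.  */
struct rseq_area_sketch
{
  uint32_t cpu_id_start;
  uint32_t cpu_id;
  uint64_t rseq_cs;
  uint32_t flags;
  uint32_t padding[3];
};
static_assert (sizeof (struct rseq_area_sketch) == 32, "32-byte rseq area");

/* Hypothetical stand-in for the per-thread block that holds the rseq
   area.  It is not the layout of glibc's struct pthread.  */
struct fake_thread_block
{
  char other_fields[64];
  struct rseq_area_sketch rseq;
};

/* Return the current CPU number, or -1 if the reported size says that
   no usable rseq area exists and the caller must use a fallback.  */
static inline int
sketch_getcpu (const void *thread_pointer, ptrdiff_t abi_offset,
               unsigned int abi_size)
{
  if (abi_size < 32)
    return -1;
  const struct rseq_area_sketch *area
    = (const struct rseq_area_sketch *)
        ((const char *) thread_pointer + abi_offset);
  /* The kernel updates cpu_id behind the compiler's back, so read it
     through a volatile lvalue.  */
  return (int) *(const volatile uint32_t *) &area->cpu_id;
}
```

With a fake block whose rseq.cpu_id is 5, calling sketch_getcpu with
offsetof (struct fake_thread_block, rseq) and a size of 32 returns 5;
with a size of 0 it returns -1.  A JIT would check the size once, at
code generation time, and emit only the load.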

Furthermore, I believe this approach will be more compatible with
potential future kernel changes in this area.  If the kernel tells us
some day through the auxiliary vector that we should register a 128-byte
rseq area with 64-byte alignment, we can make that happen and change
__rseq_abi_offset and __rseq_abi_size accordingly.
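
To illustrate the arithmetic involved, here is a sketch of how a larger,
more strictly aligned area could be placed inside a per-thread block.
The values 128 and 64 are the example numbers from the paragraph above.
The function names are hypothetical, and this does not describe how
glibc actually lays out struct pthread; it also assumes that the block
itself starts at an address aligned to the requested alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Round VALUE up to the next multiple of ALIGNMENT.  ALIGNMENT must be
   a power of two.  */
static inline size_t
sketch_align_up (size_t value, size_t alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}

/* FIRST_FREE is the first unused byte offset in the block.  Return the
   offset at which an area of AREA_SIZE bytes with AREA_ALIGN alignment
   can start, and store the offset just past the area in *END.  */
static inline size_t
sketch_place_area (size_t first_free, size_t area_size,
                   size_t area_align, size_t *end)
{
  size_t start = sketch_align_up (first_free, area_align);
  *end = start + area_size;
  return start;
}
```

With 100 bytes already in use, a 128-byte area with 64-byte alignment
starts at offset 128 and ends at offset 256.  Because users read the
offset and size from the public symbols, a change like this would need
no change on their side.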

Steps 1 to 3 are backportable to previous glibc versions, especially to
2.34 with its integrated libpthread.

Comments?  As I said, I'd like to bring these changes into glibc 2.35,
hopefully in early December.

Thanks,
Florian



* Re: Bringing rseq back into glibc
  2021-11-18 10:17 Bringing rseq back into glibc Florian Weimer
@ 2021-11-18 16:32 ` Mathieu Desnoyers
  2021-11-18 16:54   ` Florian Weimer
  2021-11-18 18:42 ` Noah Goldstein
  2021-11-18 18:48 ` Cristian Rodríguez
  2 siblings, 1 reply; 8+ messages in thread
From: Mathieu Desnoyers @ 2021-11-18 16:32 UTC
  To: Florian Weimer
  Cc: libc-alpha, linux-api, Jeremy Linton, Vincenzo Frascino,
	Rich Felker, Peter Zijlstra, paulmck, Boqun Feng

----- On Nov 18, 2021, at 5:17 AM, Florian Weimer fweimer@redhat.com wrote:

> I would like to bring back rseq for glibc 2.35.

That's excellent news ! Thanks for looking into this.

> I propose the following steps:
> 
> 1. Enable rseq registration in glibc, for internal use only.  This time,
>   put the rseq area into struct pthread, not into an initial-exec TLS
>   symbol.  (This helps to avoid initial-exec TLS bloat with dlopen
>   and simplifies initialization somewhat.)

That works for me.

> 
> 2. Add a tunable to disable rseq registration in glibc.  This way, if
>   there is already an rseq user, it can be made to work again by
>   setting the tunable.

Out of curiosity, how is the glibc tunable exposed ? Can it be called
from the application, or is it an environment variable which needs to
be set before running the application ?

> 
> 3. Implement sched_getcpu on top of rseq.
> 
> 4. Add public symbols __rseq_abi_offset, __rseq_abi_size (currently 32
>   or 0), __rseq_abi_flags (currently 0).  __rseq_abi_offset is the
>   offset to add to the thread pointer (see __builtin_thread_pointer) to
>   get to the rseq area.  They will be public ABI symbols.  These
>   variables are initialized before user code runs, and changing them
>   results in undefined behavior.

Works for me. So if the Linux kernel eventually implements something along
the lines of an extensible kTLS, we could use that underneath.

Small bike-shedding comment: I wonder if we want those public glibc
symbols to be called "__rseq_abi_{offset,size,flags}", or if a name like
"__ktls_{offset,size,flags}" might be more appropriate and future-proof
from a glibc ABI standpoint ?

> 
> Under this model, the rseq area offset is clearly constant across all
> threads.  (This was previously implied by using initial-exec TLS
> memory.)  rseq registration failure is indicated by __rseq_abi_size ==
> 0.  If the size is non-zero, rseq will be registered on all threads
> created by glibc, and all the time as far as user code is concerned.
> (This assumes that if rseq registration succeeds on the main thread, it
> will succeed on all other threads.  We will terminate the process if
> not.)  For example, if a JIT compiler sees __rseq_abi_size >= 32, in
> generated code, it can inline a version of sched_getcpu that
> materializes the thread pointer and loads the cpu_id field from the rseq
> area, without further checks.  Under the old TLS-based model, it was
> less clear that this was a valid optimization.

Sounds good.

Note that multiple applications wishing to use rseq on a shared memory
area may find themselves in a situation where some applications support
rseq, and others don't. So it would be up to the application to negotiate
whether they can use rseq in a shared memory area or not.

> 
> Furthermore, I believe this approach will be more compatible with
> potential future kernel changes in this area.  If the kernel tells us
> some day through the auxiliary vector that we should register a 128-byte
> rseq area with 64-byte alignment, we can make that happen and change
> __rseq_abi_offset and __rseq_abi_size accordingly.

Yes, hence my question about __ktls_* naming for the glibc symbols.

> 
> Steps 1 to 3 are backportable to previous glibc versions, especially to
> 2.34 with its integrated libpthread.

So if we have an application or library already using rseq directly through
the system call, upgrading glibc may cause it to fail. Arguably, no new
symbols are exposed, so I guess it's OK with the backport guidelines.
My question here is: is it OK for a backported patch to break an
application which uses the Linux kernel system calls directly ?

> 
> Comments?  As I said, I'd like to bring these changes into glibc 2.35,
> hopefully in early December.

I won't have time to do the implementation effort myself this time due to
other commitments, but I will try to free up some time for review. Feel
free to grab whatever code you feel is useful from my earlier rseq
integration patches (if any).

Thanks,

Mathieu

> 
> Thanks,
> Florian

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: Bringing rseq back into glibc
  2021-11-18 16:32 ` Mathieu Desnoyers
@ 2021-11-18 16:54   ` Florian Weimer
  2021-11-18 17:52     ` Mathieu Desnoyers
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2021-11-18 16:54 UTC
  To: Mathieu Desnoyers
  Cc: libc-alpha, linux-api, Jeremy Linton, Vincenzo Frascino,
	Rich Felker, Peter Zijlstra, paulmck, Boqun Feng

* Mathieu Desnoyers:

> ----- On Nov 18, 2021, at 5:17 AM, Florian Weimer fweimer@redhat.com wrote:
>
>> I would like to bring back rseq for glibc 2.35.
>
> That's excellent news ! Thanks for looking into this.
>
>> I propose the following steps:
>> 
>> 1. Enable rseq registration in glibc, for internal use only.  This time,
>>   put the rseq area into struct pthread, not into an initial-exec TLS
>>   symbol.  (This helps to avoid initial-exec TLS bloat with dlopen
>>   and simplifies initialization somewhat.)
>
> That works for me.
>
>> 
>> 2. Add a tunable to disable rseq registration in glibc.  This way, if
>>   there is already an rseq user, it can be made to work again by
>>   setting the tunable.
>
> Out of curiosity, how is the glibc tunable exposed ? Can it be called
> from the application, or is it an environment variable which needs to
> be set before running the application ?

Today, it's an environment variable.
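
For illustration, a sketch of a small launcher that sets the variable
before starting a program.  The tunable name glibc.pthread.rseq is the
one that later shipped in glibc 2.35; it had not been chosen at the time
of this message.  The function names are hypothetical.  Tunables are
read at process startup, so the setting only affects the program that is
exec'ed, not the launcher itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>   /* setenv */
#include <unistd.h>   /* execvp */

/* Ask glibc in programs started from this process not to register
   rseq.  This overwrites any existing GLIBC_TUNABLES value; a real
   launcher would append to it instead.  Returns 0 on success.  */
static inline int
sketch_disable_glibc_rseq (void)
{
  return setenv ("GLIBC_TUNABLES", "glibc.pthread.rseq=0", 1);
}

/* Replace the current process with ARGV[0], with glibc's rseq
   registration disabled.  Only returns on failure, with -1.  */
static inline int
sketch_exec_without_glibc_rseq (char *const argv[])
{
  assert (argv != NULL && argv[0] != NULL);
  if (sketch_disable_glibc_rseq () != 0)
    return -1;
  execvp (argv[0], argv);
  return -1;   /* execvp only returns on error, with errno set.  */
}
```

From a shell, the same effect is obtained by prefixing the command with
GLIBC_TUNABLES=glibc.pthread.rseq=0.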

>> 3. Implement sched_getcpu on top of rseq.
>> 
>> 4. Add public symbols __rseq_abi_offset, __rseq_abi_size (currently 32
>>   or 0), __rseq_abi_flags (currently 0).  __rseq_abi_offset is the
>>   offset to add to the thread pointer (see __builtin_thread_pointer) to
>>   get to the rseq area.  They will be public ABI symbols.  These
>>   variables are initialized before user code runs, and changing them
>>   results in undefined behavior.
>
> Works for me. So if the Linux kernel eventually implements something along
> the lines of an extensible kTLS, we could use that underneath.
>
> Small bike-shedding comment: I wonder if we want those public glibc
> symbols to be called "__rseq_abi_{offset,size,flags}", or if a name like
> "__ktls_{offset,size,flags}" might be more appropriate and future-proof
> from a glibc ABI standpoint ?

No, if the kTLS stuff arrives, it might have different sizes and
offsets, and the rseq area is just a slice of that.  So the numbers
could be different.  We could do things as you propose if rseq is
guaranteed to be at the start of the kernel area, always, but do we know
that yet?

Also, kTLS will likely be called something else to avoid confusion with
Kernel Transport Layer Security.  That's another reason to stick with
__rseq_.

>> Steps 1 to 3 are backportable to previous glibc versions, especially to
>> 2.34 with its integrated libpthread.
>
> So if we have an application or library already using rseq directly through
> the system call, upgrading glibc may cause it to fail. Arguably, no new
> symbols are exposed, so I guess it's OK with the backport guidelines.
> My question here is: is it OK for a backported patch to break an
> application which uses the Linux kernel system calls directly ?

It depends. 8-)

I think we can get away with it because shipping software for deployment
on other people's systems must have a fallback path for non-rseq mode
outside of specialized environments.  For the (hopefully) rare
exceptions, we'll provide the tunable setting.

We must have done it before with similar system calls (set_tid_address,
set_robust_list).  But system call design tends to avoid creating new
examples.  rseq is similar to set_tid_address and set_robust_list in
that it more or less has to be this way, with the single-user property.
(Supporting multiple users is undesirable from a performance/complexity
perspective.)

>> Comments?  As I said, I'd like to bring these changes into glibc 2.35,
>> hopefully in early December.
>
> I won't have time to do the implementation effort myself this time due to
> other commitments, but I will try to free up some time for review. Feel
> free to grab whatever code you feel is useful from my earlier rseq
> integration patches (if any).

I plan to reuse the architecture-specific marker constants from your
version at least.  That's already going to save a lot of work.  Thanks.

Florian



* Re: Bringing rseq back into glibc
  2021-11-18 16:54   ` Florian Weimer
@ 2021-11-18 17:52     ` Mathieu Desnoyers
  0 siblings, 0 replies; 8+ messages in thread
From: Mathieu Desnoyers @ 2021-11-18 17:52 UTC
  To: Florian Weimer
  Cc: libc-alpha, linux-api, Jeremy Linton, Vincenzo Frascino,
	Rich Felker, Peter Zijlstra, paulmck, Boqun Feng

----- On Nov 18, 2021, at 11:54 AM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
> 
>> ----- On Nov 18, 2021, at 5:17 AM, Florian Weimer fweimer@redhat.com wrote:

[...]

> 
>>> 3. Implement sched_getcpu on top of rseq.
>>> 
>>> 4. Add public symbols __rseq_abi_offset, __rseq_abi_size (currently 32
>>>   or 0), __rseq_abi_flags (currently 0).  __rseq_abi_offset is the
>>>   offset to add to the thread pointer (see __builtin_thread_pointer) to
>>>   get to the rseq area.  They will be public ABI symbols.  These
>>>   variables are initialized before user code runs, and changing them
>>>   results in undefined behavior.
>>
>> Works for me. So if the Linux kernel eventually implements something along
>> the lines of an extensible kTLS, we could use that underneath.
>>
>> Small bike-shedding comment: I wonder if we want those public glibc
>> symbols to be called "__rseq_abi_{offset,size,flags}", or if a name like
>> "__ktls_{offset,size,flags}" might be more appropriate and future-proof
>> from a glibc ABI standpoint ?
> 
> No, if the kTLS stuff arrives, it might have different sizes and
> offsets, and the rseq area is just a slice of that.  So the numbers
> could be different.  We could do things as you propose if rseq is
> guaranteed to be at the start of the kernel area, always, but do we know
> that yet?

You're right, we don't. So let's stick with __rseq_abi_.

> 
> Also, kTLS will likely be called something else to avoid confusion with
> Kernel Transport Layer Security.  That's another reason to stick with
> __rseq_.

Yep.

> 
>>> Steps 1 to 3 are backportable to previous glibc versions, especially to
>>> 2.34 with its integrated libpthread.
>>
>> So if we have an application or library already using rseq directly through
>> the system call, upgrading glibc may cause it to fail. Arguably, no new
>> symbols are exposed, so I guess it's OK with the backport guidelines.
>> My question here is: is it OK for a backported patch to break an
>> application which uses the Linux kernel system calls directly ?
> 
> It depends. 8-)
> 
> I think we can get away with it because shipping software for deployment
> on other people's systems must have a fallback path for non-rseq mode
> outside of specialized environments.  For the (hopefully) rare
> exceptions, we'll provide the tunable setting.

Fair enough.

> 
> We must have done it before with similar system calls (set_tid_address,
> set_robust_list).  But system call design tends to avoid creating new
> examples.  rseq is similar to set_tid_address and set_robust_list in
> that it more or less has to be this way, with the single-user property.
> (Supporting multiple users is undesirable from a performance/complexity
> perspective.)

Right.

> 
>>> Comments?  As I said, I'd like to bring these changes into glibc 2.35,
>>> hopefully in early December.
>>
>> I won't have time to do the implementation effort myself this time due to
>> other commitments, but I will try to free up some time for review. Feel
>> free to grab whatever code you feel is useful from my earlier rseq
>> integration patches (if any).
> 
> I plan to reuse the architecture-specific marker constants from your
> version at least.  That's already going to save a lot of work.  Thanks.

You're welcome. Let me know if I can be of further assistance.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: Bringing rseq back into glibc
  2021-11-18 10:17 Bringing rseq back into glibc Florian Weimer
  2021-11-18 16:32 ` Mathieu Desnoyers
@ 2021-11-18 18:42 ` Noah Goldstein
  2021-11-18 18:55   ` Florian Weimer
  2021-11-18 18:48 ` Cristian Rodríguez
  2 siblings, 1 reply; 8+ messages in thread
From: Noah Goldstein @ 2021-11-18 18:42 UTC
  To: Florian Weimer
  Cc: GNU C Library, Linux API, Vincenzo Frascino, Mathieu Desnoyers,
	Jeremy Linton, Rich Felker

On Thu, Nov 18, 2021 at 4:17 AM Florian Weimer via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> I would like to bring back rseq for glibc 2.35.  I propose the following
> steps:
>
> 1. Enable rseq registration in glibc, for internal use only.  This time,
>    put the rseq area into struct pthread, not into an initial-exec TLS
>    symbol.  (This helps to avoid initial-exec TLS bloat with dlopen
>    and simplifies initialization somewhat.)

Isn't THREAD_SELF also implemented in TLS? Or am I missing
something?

>
> 2. Add a tunable to disable rseq registration in glibc.  This way, if
>    there is already an rseq user, it can be made to work again by
>    setting the tunable.
>
> 3. Implement sched_getcpu on top of rseq.
>
> 4. Add public symbols __rseq_abi_offset, __rseq_abi_size (currently 32
>    or 0), __rseq_abi_flags (currently 0).  __rseq_abi_offset is the
>    offset to add to the thread pointer (see __builtin_thread_pointer) to
>    get to the rseq area.  They will be public ABI symbols.  These
>    variables are initialized before user code runs, and changing them
>    results in undefined behavior.
>
> Under this model, the rseq area offset is clearly constant across all
> threads.  (This was previously implied by using initial-exec TLS
> memory.)  rseq registration failure is indicated by __rseq_abi_size ==
> 0.  If the size is non-zero, rseq will be registered on all threads
> created by glibc, and all the time as far as user code is concerned.
> (This assumes that if rseq registration succeeds on the main thread, it
> will succeed on all other threads.  We will terminate the process if
> not.)  For example, if a JIT compiler sees __rseq_abi_size >= 32, in
> generated code, it can inline a version of sched_getcpu that
> materializes the thread pointer and loads the cpu_id field from the rseq
> area, without further checks.  Under the old TLS-based model, it was
> less clear that this was a valid optimization.
>
> Furthermore, I believe this approach will be more compatible with
> potential future kernel changes in this area.  If the kernel tells us
> some day through the auxiliary vector that we should register a 128-byte
> rseq area with 64-byte alignment, we can make that happen and change
> __rseq_abi_offset and __rseq_abi_size accordingly.
>
> Steps 1 to 3 are backportable to previous glibc versions, especially to
> 2.34 with its integrated libpthread.
>
> Comments?  As I said, I'd like to bring these changes into glibc 2.35,
> hopefully in early December.
>
> Thanks,
> Florian
>


* Re: Bringing rseq back into glibc
  2021-11-18 10:17 Bringing rseq back into glibc Florian Weimer
  2021-11-18 16:32 ` Mathieu Desnoyers
  2021-11-18 18:42 ` Noah Goldstein
@ 2021-11-18 18:48 ` Cristian Rodríguez
  2021-11-18 19:41   ` Mathieu Desnoyers
  2 siblings, 1 reply; 8+ messages in thread
From: Cristian Rodríguez @ 2021-11-18 18:48 UTC
  To: Florian Weimer
  Cc: libc-alpha, linux-api, Vincenzo Frascino, Mathieu Desnoyers,
	Jeremy Linton, Rich Felker

On Thu, Nov 18, 2021 at 7:17 AM Florian Weimer via Libc-alpha
<libc-alpha@sourceware.org> wrote:

> 4. Add public symbols __rseq_abi_offset, __rseq_abi_size (currently 32
>    or 0), __rseq_abi_flags (currently 0).  __rseq_abi_offset is the
>    offset to add to the thread pointer (see __builtin_thread_pointer) to
>    get to the rseq area.  They will be public ABI symbols.  These
>    variables are initialized before user code runs, and changing them
>    results in undefined behavior.

Why not then __get_rseq_whatever functions and not variables ? or
maybe writing to these variables results in a compiler or linker error
instead of UB ?


* Re: Bringing rseq back into glibc
  2021-11-18 18:42 ` Noah Goldstein
@ 2021-11-18 18:55   ` Florian Weimer
  0 siblings, 0 replies; 8+ messages in thread
From: Florian Weimer @ 2021-11-18 18:55 UTC
  To: Noah Goldstein
  Cc: GNU C Library, Linux API, Vincenzo Frascino, Mathieu Desnoyers,
	Jeremy Linton, Rich Felker

* Noah Goldstein:

> On Thu, Nov 18, 2021 at 4:17 AM Florian Weimer via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>> I would like to bring back rseq for glibc 2.35.  I propose the following
>> steps:
>>
>> 1. Enable rseq registration in glibc, for internal use only.  This time,
>>    put the rseq area into struct pthread, not into an initial-exec TLS
>>    symbol.  (This helps to avoid initial-exec TLS bloat with dlopen
>>    and simplifies initialization somewhat.)
>
> Isn't THREAD_SELF also implemented in TLS? Or am I missing
> something?

THREAD_SELF uses a pointer in the thread control block, and that pointer
is not replicated for different libc.so.6 copies with dlmopen (like the
rest of the TCB and struct pthread).

Thanks,
Florian



* Re: Bringing rseq back into glibc
  2021-11-18 18:48 ` Cristian Rodríguez
@ 2021-11-18 19:41   ` Mathieu Desnoyers
  0 siblings, 0 replies; 8+ messages in thread
From: Mathieu Desnoyers @ 2021-11-18 19:41 UTC
  To: Cristian Rodríguez
  Cc: Florian Weimer, libc-alpha, linux-api, Vincenzo Frascino,
	Jeremy Linton, Rich Felker

----- On Nov 18, 2021, at 1:48 PM, Cristian Rodríguez crrodriguez@opensuse.org wrote:

> On Thu, Nov 18, 2021 at 7:17 AM Florian Weimer via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> 
>> 4. Add public symbols __rseq_abi_offset, __rseq_abi_size (currently 32
>>    or 0), __rseq_abi_flags (currently 0).  __rseq_abi_offset is the
>>    offset to add to the thread pointer (see __builtin_thread_pointer) to
>>    get to the rseq area.  They will be public ABI symbols.  These
>>    variables are initialized before user code runs, and changing them
>>    results in undefined behavior.
> 
> Why not then __get_rseq_whatever functions and not variables ? or
> maybe writing to these variables results in a compiler or linker error
> instead of UB ?

rseq critical sections cannot issue function calls, and also function calls
are noticeably expensive compared to an rseq critical section. So all users
would end up needing to make a local copy of the information fetched by those
getters.

So rather than require all those extra per-user copies, I suspect exposing
a single copy through public glibc symbols is more efficient.

The downside is indeed that writing to those variables is UB.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
