linux-kernel.vger.kernel.org archive mirror
* Restartable Sequences system call merged into Linux
@ 2018-06-11 19:49 Mathieu Desnoyers
  2018-06-11 19:55 ` Florian Weimer
  2018-06-13 11:48 ` Heiko Carstens
  0 siblings, 2 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2018-06-11 19:49 UTC (permalink / raw)
  To: Carlos O'Donell, Florian Weimer
  Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Thomas Gleixner,
	linux-kernel, libc-alpha

Hi!

Good news! The restartable sequences (rseq) system call is now merged into the master
branch of the Linux kernel within the 4.18 merge window:

https://github.com/torvalds/linux/commit/d82991a8688ad128b46db1b42d5d84396487a508

It would be important to discuss how we should proceed to integrate the library part
of rseq (see tools/testing/selftests/rseq/rseq*.{ch}) into glibc, or if it should
live in a standalone project.

It should be noted that there can be only one rseq TLS area registered per thread,
which can then be used by many libraries and by the executable, so this is a
process-wide (per-thread) resource that we need to manage carefully.
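
For illustration, registering one such per-thread area could look roughly like
the sketch below. It is a sketch under assumptions, not the selftests code: the
struct mirrors the fixed 32-byte layout of struct rseq from
include/uapi/linux/rseq.h under the hypothetical name sketch_rseq_area (so it
builds without recent kernel headers), the signature value is the one I believe
the selftests use and should be checked against rseq.h, and
sketch_rseq_register() is a made-up helper. If something else in the process
has already registered an area for the thread, the call is expected to fail
rather than succeed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local mirror of the fixed-size struct rseq layout (assumption: the
 * original 4.18 layout, 32 bytes, 32-byte aligned). */
struct sketch_rseq_area {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
    uint64_t rseq_cs;
    uint32_t flags;
} __attribute__((aligned(32)));

/* Assumed signature value; verify against the selftests' rseq.h. */
#define SKETCH_RSEQ_SIG 0x53053053u

/* One area per thread; cpu_id starts out as "uninitialized" (-1). */
static __thread struct sketch_rseq_area sketch_area = {
    .cpu_id = (uint32_t)-1,
};

/* Returns 0 on success, or a negative errno value on failure
 * (e.g. -ENOSYS on old kernels, -EINVAL or -EBUSY when the thread
 * already has an area registered). */
static int sketch_rseq_register(void)
{
#ifdef SYS_rseq
    long rc = syscall(SYS_rseq, &sketch_area,
                      (uint32_t)sizeof(sketch_area), 0, SKETCH_RSEQ_SIG);
    return rc == 0 ? 0 : -errno;
#else
    return -ENOSYS;
#endif
}
```

Because only one area can be registered per thread, a helper like this is only
safe to call from whichever component owns the registration.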

Thoughts ?

Thanks!

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: Restartable Sequences system call merged into Linux
  2018-06-11 19:49 Restartable Sequences system call merged into Linux Mathieu Desnoyers
@ 2018-06-11 19:55 ` Florian Weimer
  2018-06-11 20:04   ` Mathieu Desnoyers
  2018-06-13 11:48 ` Heiko Carstens
  1 sibling, 1 reply; 25+ messages in thread
From: Florian Weimer @ 2018-06-11 19:55 UTC (permalink / raw)
  To: Mathieu Desnoyers, Carlos O'Donell
  Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Thomas Gleixner,
	linux-kernel, libc-alpha

On 06/11/2018 09:49 PM, Mathieu Desnoyers wrote:
> It should be noted that there can be only one rseq TLS area registered per thread,
> which can then be used by many libraries and by the executable, so this is a
> process-wide (per-thread) resource that we need to manage carefully.

Is it possible to resize the area after thread creation, perhaps even 
from other threads?

If there is only one contiguous area, this generally means there needs 
to be linker support, similar to what we have for initial-exec TLS today.

Thanks,
Florian


* Re: Restartable Sequences system call merged into Linux
  2018-06-11 19:55 ` Florian Weimer
@ 2018-06-11 20:04   ` Mathieu Desnoyers
  2018-06-12 13:11     ` Florian Weimer
  0 siblings, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2018-06-11 20:04 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Carlos O'Donell, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha

----- On Jun 11, 2018, at 3:55 PM, Florian Weimer fweimer@redhat.com wrote:

> On 06/11/2018 09:49 PM, Mathieu Desnoyers wrote:
>> It should be noted that there can be only one rseq TLS area registered per
>> thread,
>> which can then be used by many libraries and by the executable, so this is a
>> process-wide (per-thread) resource that we need to manage carefully.
> 
> Is it possible to resize the area after thread creation, perhaps even
> from other threads?

I'm not sure why we would want to resize it. The per-thread area is fixed-size.
Its layout is here: include/uapi/linux/rseq.h: struct rseq

The ABI is designed so that all users (program and libraries) can interact
through this per-thread TLS area.

> 
> If there is only one contiguous area, this generally means there needs
> to be linker support, similar to what we have for initial-exec TLS today.

Not entirely sure what you imply by "one contiguous area". All we need is
a single fixed-size TLS area for each thread.

Thanks,

Mathieu

> 
> Thanks,
> Florian

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: Restartable Sequences system call merged into Linux
  2018-06-11 20:04   ` Mathieu Desnoyers
@ 2018-06-12 13:11     ` Florian Weimer
  2018-06-12 16:31       ` Mathieu Desnoyers
  0 siblings, 1 reply; 25+ messages in thread
From: Florian Weimer @ 2018-06-12 13:11 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Carlos O'Donell, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha

On 06/11/2018 10:04 PM, Mathieu Desnoyers wrote:
> ----- On Jun 11, 2018, at 3:55 PM, Florian Weimer fweimer@redhat.com wrote:
> 
>> On 06/11/2018 09:49 PM, Mathieu Desnoyers wrote:
>>> It should be noted that there can be only one rseq TLS area registered per
>>> thread,
>>> which can then be used by many libraries and by the executable, so this is a
>>> process-wide (per-thread) resource that we need to manage carefully.
>>
>> Is it possible to resize the area after thread creation, perhaps even
>> from other threads?
> 
> I'm not sure why we would want to resize it. The per-thread area is fixed-size.
> Its layout is here: include/uapi/linux/rseq.h: struct rseq

Looks like I was mistaken and this is very similar to the robust mutex list.

Should we treat it the same way?  Always allocate it for each new thread 
and register it with the kernel?

> The ABI is designed so that all users (program and libraries) can interact
> through this per-thread TLS area.

Then the user code needs just the address of the structure.

How much coordination is needed between different users of this
interface?  Looking at the section hacks, I don't think we want to
put this into glibc at this stage.  It looks more like something for
which we traditionally require compiler support.

Thanks,
Florian


* Re: Restartable Sequences system call merged into Linux
  2018-06-12 13:11     ` Florian Weimer
@ 2018-06-12 16:31       ` Mathieu Desnoyers
  2018-06-13  8:21         ` Florian Weimer
  2018-06-14 12:27         ` Pavel Machek
  0 siblings, 2 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2018-06-12 16:31 UTC (permalink / raw)
  To: Florian Weimer
  Cc: carlos, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
	Thomas Gleixner, linux-kernel, libc-alpha

----- On Jun 12, 2018, at 9:11 AM, Florian Weimer fweimer@redhat.com wrote:

> On 06/11/2018 10:04 PM, Mathieu Desnoyers wrote:
>> ----- On Jun 11, 2018, at 3:55 PM, Florian Weimer fweimer@redhat.com wrote:
>> 
>>> On 06/11/2018 09:49 PM, Mathieu Desnoyers wrote:
>>>> It should be noted that there can be only one rseq TLS area registered per
>>>> thread,
>>>> which can then be used by many libraries and by the executable, so this is a
>>>> process-wide (per-thread) resource that we need to manage carefully.
>>>
>>> Is it possible to resize the area after thread creation, perhaps even
>>> from other threads?
>> 
>> I'm not sure why we would want to resize it. The per-thread area is fixed-size.
>> Its layout is here: include/uapi/linux/rseq.h: struct rseq
> 
> Looks like I was mistaken and this is very similar to the robust mutex list.
> 
> Should we treat it the same way?  Always allocate it for each new thread
> and register it with the kernel?

That would be an efficient way to do it, indeed. There is very little
performance overhead to have rseq registered for all threads, whether or
not they intend to run rseq critical sections.

> 
>> The ABI is designed so that all users (program and libraries) can interact
>> through this per-thread TLS area.
> 
> Then the user code needs just the address of the structure.

Yes.

> 
> How much coordination is needed between different users of this
> interface?  Looking at the section hacks, I don't think we want to
> put this into glibc at this stage.  It looks more like something for
> which we traditionally require compiler support.

I really don't mind maintaining a separate project containing librseq
along with the headers needed to facilitate declaration of rseq critical
sections. This specifically does not need much coordination between users of
the interface.

The part which really requires coordination between users is registration
to the kernel (and ownership) of the rseq TLS area.

I have a few possible approaches in mind (feel free to suggest other
options):

A) glibc exposes a strong __rseq_abi TLS symbol:

   - should ideally *not* be global-dynamic for performance reasons,
   - registration to kernel can either be handled explicitly by requiring
     application or libraries to call an API, or implicitly at thread
     creation,
   - requires all rseq users to upgrade to newer glibc. Early rseq users
     (libs and applications) registering their own rseq TLS will conflict
     with newer glibc.

B) librseq.so exposes a strong __rseq_abi symbol:

   - should ideally *not* be global-dynamic for performance reasons, but
     testing shows that using initial-exec causes issues in situations where
     librseq.so ends up being dlopen'd (e.g. java virtual machine dlopening
     the lttng-ust tracer linked against librseq.so),
   - registration/unregistration of area to kernel can either be performed
     lazily on first use, destruction done using pthread_key, or require an
     explicit API call from application,
   - A per-thread refcount in a TLS could allow many users to call the
     registration/unregistration API, and lazy registration,
   - an early-user application which also exposes a __rseq_abi strong symbol
     would conflict with librseq.so.

C) __rseq_abi symbol declared weak within each user (application, librseq,
   other libraries, glibc):

   - should ideally *not* be global-dynamic for performance reasons,
     - however, initial-exec causes issues when librseq or early user libraries
       are dlopen'd (e.g. java runtime dlopening lttng-ust),
   - a weak symbol allows combining early user libs/apps with glibc/librseq
     exposing the same symbol,
   - considering that glibc is AFAIK never dlopen'd, does not cause exhaustion
     of initial-exec TLS entries in cases where librseq.so or early adopter
     libs are dlopen'd,
   - if glibc implicitly registers the rseq area, *and* librseq.so also wants
     to register it, *and* early adopters also want to register it, we should
     come up with a refcount scheme in the TLS ensuring that registration and
     unregistration is only done when the first/last user comes/goes away.
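
The per-thread refcount idea from options B and C could be sketched as below.
This is only a model of the bookkeeping: all names are hypothetical, and the
two sketch_kernel_* helpers are stand-ins for the real registration and
unregistration system calls. It also ignores signal handlers that might use
rseq while the count is being updated.

```c
#include <stdbool.h>

/* Per-thread state shared by every rseq user in the thread. */
static __thread unsigned int sketch_refcount;
static __thread bool sketch_registered;

/* Stand-ins for the real rseq register/unregister system calls. */
static int sketch_kernel_register(void)
{
    sketch_registered = true;
    return 0;
}

static int sketch_kernel_unregister(void)
{
    sketch_registered = false;
    return 0;
}

/* Called by each user (glibc, librseq, early adopters) before first use.
 * Only the first caller in a thread triggers the actual registration. */
static int sketch_rseq_get(void)
{
    if (sketch_refcount == 0) {
        int rc = sketch_kernel_register();
        if (rc != 0)
            return rc;
    }
    sketch_refcount++;
    return 0;
}

/* Called by each user when it is done. Only the last caller in a thread
 * triggers the actual unregistration. Returns -1 on an unbalanced put. */
static int sketch_rseq_put(void)
{
    if (sketch_refcount == 0)
        return -1;
    if (--sketch_refcount == 0)
        return sketch_kernel_unregister();
    return 0;
}
```

With this shape, two users calling sketch_rseq_get() in the same thread cause
one registration, and the area stays registered until both have called
sketch_rseq_put().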

Thoughts ?

Thanks!

Mathieu

> 
> Thanks,
> Florian

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: Restartable Sequences system call merged into Linux
  2018-06-12 16:31       ` Mathieu Desnoyers
@ 2018-06-13  8:21         ` Florian Weimer
  2018-06-14 12:27         ` Pavel Machek
  1 sibling, 0 replies; 25+ messages in thread
From: Florian Weimer @ 2018-06-13  8:21 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: carlos, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
	Thomas Gleixner, linux-kernel, libc-alpha

On 06/12/2018 06:31 PM, Mathieu Desnoyers wrote:
> ----- On Jun 12, 2018, at 9:11 AM, Florian Weimer fweimer@redhat.com wrote:
> 
>> On 06/11/2018 10:04 PM, Mathieu Desnoyers wrote:
>>> ----- On Jun 11, 2018, at 3:55 PM, Florian Weimer fweimer@redhat.com wrote:
>>>
>>>> On 06/11/2018 09:49 PM, Mathieu Desnoyers wrote:
>>>>> It should be noted that there can be only one rseq TLS area registered per
>>>>> thread,
>>>>> which can then be used by many libraries and by the executable, so this is a
>>>>> process-wide (per-thread) resource that we need to manage carefully.
>>>>
>>>> Is it possible to resize the area after thread creation, perhaps even
>>>> from other threads?
>>>
>>> I'm not sure why we would want to resize it. The per-thread area is fixed-size.
>>> Its layout is here: include/uapi/linux/rseq.h: struct rseq
>>
>> Looks like I was mistaken and this is very similar to the robust mutex list.
>>
>> Should we treat it the same way?  Always allocate it for each new thread
>> and register it with the kernel?
> 
> That would be an efficient way to do it, indeed. There is very little
> performance overhead to have rseq registered for all threads, whether or
> not they intend to run rseq critical sections.
> 
>>
>>> The ABI is designed so that all users (program and libraries) can interact
>>> through this per-thread TLS area.
>>
>> Then the user code needs just the address of the structure.
> 
> Yes.

So we'd add

   struct rseq *rseq_location (void);

and be done with it?  It would return the address of the thread-local 
variable, similar to __errno_location.

Or we could add something like this:

   extern __thread struct rseq pthread_rseq_area_np
     __attribute__ ((__tls_model__ ("initial-exec")));

But of course only for recent-enough GNU compilers (and Clang, which 
identifies itself as GNU).

The advantage of the function call is that it often results in more 
compact code.  Making the initial-exec nature part of the ABI has the 
advantage that the applications could use the fact of the constant 
offset to the thread pointer if they desire to do so.
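
To make the two options concrete, here is a small sketch that defines both in
one translation unit. The names sketch_rseq, sketch_rseq_area_np and
sketch_rseq_location are hypothetical, and the struct is a local mirror of the
fixed-size layout rather than the real uapi definition. In an actual libc the
variable would live in the library and users would see only the declaration or
the accessor.

```c
#include <stdint.h>

/* Local mirror of the fixed-size per-thread area (assumed layout). */
struct sketch_rseq {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
    uint64_t rseq_cs;
    uint32_t flags;
} __attribute__((aligned(32)));

/* Option 2: a thread-local variable whose initial-exec TLS model is part
 * of the ABI, so its offset from the thread pointer is constant. */
__thread struct sketch_rseq sketch_rseq_area_np
    __attribute__((tls_model("initial-exec")));

/* Option 1: an accessor in the style of __errno_location() that returns
 * the address of the calling thread's area. */
struct sketch_rseq *sketch_rseq_location(void)
{
    return &sketch_rseq_area_np;
}
```

Callers of the accessor pay for a function call but do not depend on the TLS
model, while direct users of the variable can address it relative to the thread
pointer.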

Would we need to document which glibc functions use
pthread_rseq_area_np, so that applications do not call them when they
themselves use the area?

Do we actually need to use RSEQ_FLAG_UNREGISTER prior to thread exit? 
Why can't the kernel do it for us?

>     - requires all rseq users to upgrade to newer glibc. Early rseq users
>       (libs and applications) registering their own rseq TLS will conflict
>       with newer glibc.

We will need to do something about stack unwinding and longjmp anyway (I
assume the kernel already handles signals for us), so it may not be
possible to use restartable sequences in any substantial way without a
system upgrade anyway.

> B) librseq.so exposes a strong __rseq_abi symbol:
> 
>     - should ideally *not* be global-dynamic for performance reasons, but
>       testing shows that using initial-exec causes issues in situations where
>       librseq.so ends up being dlopen'd (e.g. java virtual machine dlopening
>       the lttng-ust tracer linked against librseq.so),

Just an aside:

You can work around that using preloading.  On the glibc side, we could 
also make the initial reserve configurable.  On 64-bit, there really is 
no reason not to use a different TCB allocation scheme which would allow 
you to create a few threads before the initial-exec TLS area cannot be 
extended.

The existing approach dates back to LinuxThreads and its TCB collocated
with the stack.  But changes in the next few months are not very likely.

> C) __rseq_abi symbol declared weak within each user (application, librseq,
>     other libraries, glibc):

We can have multiple non-weak definitions for the symbol.  It should work
as long as only the definition in glibc has a symbol version.

__rseq_abi as a name is problematic because it's in the internal namespace.

Thanks,
Florian


* Re: Restartable Sequences system call merged into Linux
  2018-06-11 19:49 Restartable Sequences system call merged into Linux Mathieu Desnoyers
  2018-06-11 19:55 ` Florian Weimer
@ 2018-06-13 11:48 ` Heiko Carstens
  2018-06-13 16:14   ` Mathieu Desnoyers
  1 sibling, 1 reply; 25+ messages in thread
From: Heiko Carstens @ 2018-06-13 11:48 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Carlos O'Donell, Florian Weimer, Peter Zijlstra,
	Paul E. McKenney, Boqun Feng, Thomas Gleixner, linux-kernel,
	libc-alpha

On Mon, Jun 11, 2018 at 03:49:18PM -0400, Mathieu Desnoyers wrote:
> Hi!
> 
> Good news! The restartable sequences (rseq) system call is now merged into the master
> branch of the Linux kernel within the 4.18 merge window:
> 
> https://github.com/torvalds/linux/commit/d82991a8688ad128b46db1b42d5d84396487a508
> 
> It would be important to discuss how we should proceed to integrate the library part
> of rseq (see tools/testing/selftests/rseq/rseq*.{ch}) into glibc, or if it should
> live in a standalone project.

Is there any documentation available of what is the exact semantics of the
functions that have to be implemented for additional architectures?

I could look at rseq-skip.h and e.g. rseq-x86.h and try to figure out what
would be the correct implementation for s390. But having that somewhere
written down, e.g. as comments in one of the implementations, would be very
helpful.



* Re: Restartable Sequences system call merged into Linux
  2018-06-13 11:48 ` Heiko Carstens
@ 2018-06-13 16:14   ` Mathieu Desnoyers
  2018-06-13 19:53     ` Mathieu Desnoyers
  0 siblings, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2018-06-13 16:14 UTC (permalink / raw)
  To: heiko carstens
  Cc: carlos, Florian Weimer, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha

----- On Jun 13, 2018, at 7:48 AM, heiko carstens heiko.carstens@de.ibm.com wrote:

> On Mon, Jun 11, 2018 at 03:49:18PM -0400, Mathieu Desnoyers wrote:
>> Hi!
>> 
>> Good news! The restartable sequences (rseq) system call is now merged into the
>> master
>> branch of the Linux kernel within the 4.18 merge window:
>> 
>> https://github.com/torvalds/linux/commit/d82991a8688ad128b46db1b42d5d84396487a508
>> 
>> It would be important to discuss how we should proceed to integrate the library
>> part
>> of rseq (see tools/testing/selftests/rseq/rseq*.{ch}) into glibc, or if it
>> should
>> live in a standalone project.
> 
> Is there any documentation available of what is the exact semantics of the
> functions that have to be implemented for additional architectures?

It's documented on top of kernel/rseq.c:

/*
 *
 * Restartable sequences are a lightweight interface that allows
 * user-level code to be executed atomically relative to scheduler
 * preemption and signal delivery. Typically used for implementing
 * per-cpu operations.
 *
 * It allows user-space to perform update operations on per-cpu data
 * without requiring heavy-weight atomic operations.
 *
 * Detailed algorithm of rseq user-space assembly sequences:
 *
 *                     init(rseq_cs)
 *                     cpu = TLS->rseq::cpu_id_start
 *   [1]               TLS->rseq::rseq_cs = rseq_cs
 *   [start_ip]        ----------------------------
 *   [2]               if (cpu != TLS->rseq::cpu_id)
 *                             goto abort_ip;
 *   [3]               <last_instruction_in_cs>
 *   [post_commit_ip]  ----------------------------
 *
 *   The address of jump target abort_ip must be outside the critical
 *   region, i.e.:
 *
 *     [abort_ip] < [start_ip]  || [abort_ip] >= [post_commit_ip]
 *
 *   Steps [2]-[3] (inclusive) need to be a sequence of instructions in
 *   userspace that can handle being interrupted between any of those
 *   instructions, and then resumed to the abort_ip.
 *
 *   1.  Userspace stores the address of the struct rseq_cs assembly
 *       block descriptor into the rseq_cs field of the registered
 *       struct rseq TLS area. This update is performed through a single
 *       store within the inline assembly instruction sequence.
 *       [start_ip]
 *
 *   2.  Userspace tests to check whether the current cpu_id field match
 *       the cpu number loaded before start_ip, branching to abort_ip
 *       in case of a mismatch.
 *
 *       If the sequence is preempted or interrupted by a signal
 *       at or after start_ip and before post_commit_ip, then the kernel
 *       clears TLS->__rseq_abi::rseq_cs, and sets the user-space return
 *       ip to abort_ip before returning to user-space, so the preempted
 *       execution resumes at abort_ip.
 *
 *   3.  Userspace critical section final instruction before
 *       post_commit_ip is the commit. The critical section is
 *       self-terminating.
 *       [post_commit_ip]
 *
 *   4.  <success>
 *
 *   On failure at [2], or if interrupted by preempt or signal delivery
 *   between [1] and [3]:
 *
 *       [abort_ip]
 *   F1. <failure>
 */
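
As a reading aid for the comment above, here is a plain-C model of the control
flow for a per-cpu counter increment. It is not a restartable sequence: a real
one must be written in inline assembly so that the kernel can redirect
execution to abort_ip, and the kernel also aborts on preemption and signal
delivery, which this model does not represent. The names model_rseq,
model_try_increment and model_increment are hypothetical, and the migrate_to
parameter merely simulates the kernel updating the TLS fields after a migration
that happens between the cpu load and step [2].

```c
#include <stdint.h>

/* Only the two fields the algorithm reads. */
struct model_rseq {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
};

/* One attempt. Returns 0 if the commit at [3] ran, -1 if the attempt
 * took the abort path. migrate_to >= 0 simulates a migration to that
 * cpu right after the initial cpu load. */
static int model_try_increment(struct model_rseq *tls, long *percpu,
                               int migrate_to)
{
    uint32_t cpu = tls->cpu_id_start;   /* cpu = TLS->rseq::cpu_id_start */

    /* [1] real code stores the rseq_cs descriptor address here. */

    if (migrate_to >= 0) {              /* simulated kernel update */
        tls->cpu_id_start = (uint32_t)migrate_to;
        tls->cpu_id = (uint32_t)migrate_to;
    }

    if (cpu != tls->cpu_id)             /* [2] cpu check */
        return -1;                      /* abort_ip: F1, <failure> */

    percpu[cpu] += 1;                   /* [3] commit, last instruction */
    return 0;                           /* 4. <success> */
}

/* Typical caller: retry until the sequence commits. Returns the number
 * of aborted attempts. The migration is injected only once. */
static int model_increment(struct model_rseq *tls, long *percpu,
                           int migrate_once_to)
{
    int aborts = 0;

    while (model_try_increment(tls, percpu, migrate_once_to) != 0) {
        migrate_once_to = -1;
        aborts++;
    }
    return aborts;
}
```

In the model, an injected migration makes the first attempt fail at [2]; the
retry then reloads the cpu number and commits to the new cpu's counter, which
is the behaviour the abort path exists to provide.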

> I could look at rseq-skip.h and e.g. rseq-x86.h and try to figure out what
> would be the correct implementation for s390. But having that somewhere
> written down, e.g. as comments in one of the implementations, would be very
> helpful.

The first architecture implemented was rseq-x86.h. Boqun derived rseq-ppc.h
from it, and I derived rseq-arm.h from it. Feel free to start from whichever
architecture has the instruction set which is most similar to yours.

Thanks!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: Restartable Sequences system call merged into Linux
  2018-06-13 16:14   ` Mathieu Desnoyers
@ 2018-06-13 19:53     ` Mathieu Desnoyers
  0 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2018-06-13 19:53 UTC (permalink / raw)
  To: heiko carstens
  Cc: carlos, Florian Weimer, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha

----- On Jun 13, 2018, at 12:14 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Jun 13, 2018, at 7:48 AM, heiko carstens heiko.carstens@de.ibm.com
> wrote:
[...]
>> 
>> Is there any documentation available of what is the exact semantics of the
>> functions that have to be implemented for additional architectures?
> 
> It's documented on top of kernel/rseq.c:
> 

[...]

> 
> The first architecture implemented was rseq-x86.h. Boqun derived rseq-ppc.h
> from it, and I derived rseq-arm.h from it. Feel free to start from whichever
> architecture has the instruction set which is most similar to yours.

One more thing: adding full support for your architecture to the rseq selftests
also requires extending tools/testing/selftests/rseq/param_test.c to implement
the RSEQ_INJECT_INPUT, INJECT_ASM_REG, RSEQ_INJECT_CLOBBER and RSEQ_INJECT_ASM
macros for your architecture. Those simply define the inline asm operands
and assembly code needed to inject delay loops within the rseq critical sections,
which greatly facilitates testing.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: Restartable Sequences system call merged into Linux
  2018-06-12 16:31       ` Mathieu Desnoyers
  2018-06-13  8:21         ` Florian Weimer
@ 2018-06-14 12:27         ` Pavel Machek
  2018-06-14 13:01           ` Mathieu Desnoyers
  2018-06-15  5:07           ` Florian Weimer
  1 sibling, 2 replies; 25+ messages in thread
From: Pavel Machek @ 2018-06-14 12:27 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Florian Weimer, carlos, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha


On Tue 2018-06-12 12:31:24, Mathieu Desnoyers wrote:
> ----- On Jun 12, 2018, at 9:11 AM, Florian Weimer fweimer@redhat.com wrote:
> 
> > On 06/11/2018 10:04 PM, Mathieu Desnoyers wrote:
> >> ----- On Jun 11, 2018, at 3:55 PM, Florian Weimer fweimer@redhat.com wrote:
> >> 
> >>> On 06/11/2018 09:49 PM, Mathieu Desnoyers wrote:
> >>>> It should be noted that there can be only one rseq TLS area registered per
> >>>> thread,
> >>>> which can then be used by many libraries and by the executable, so this is a
> >>>> process-wide (per-thread) resource that we need to manage carefully.
> >>>
> >>> Is it possible to resize the area after thread creation, perhaps even
> >>> from other threads?
> >> 
> >> I'm not sure why we would want to resize it. The per-thread area is fixed-size.
> >> Its layout is here: include/uapi/linux/rseq.h: struct rseq
> > 
> > Looks like I was mistaken and this is very similar to the robust mutex list.
> > 
> > Should we treat it the same way?  Always allocate it for each new thread
> > and register it with the kernel?
> 
> That would be an efficient way to do it, indeed. There is very little
> performance overhead to have rseq registered for all threads, whether or
> not they intend to run rseq critical sections.

People with slow / low memory machines would prefer not to see
overhead they don't need...

> I have a few possible approaches in mind (feel free to suggest other
> options):
> 
> A) glibc exposes a strong __rseq_abi TLS symbol:
> 
>    - should ideally *not* be global-dynamic for performance reasons,
>    - registration to kernel can either be handled explicitly by requiring
>      application or libraries to call an API, or implicitly at thread
>      creation,

...so I'd prefer explicit API call.

> B) librseq.so exposes a strong __rseq_abi symbol:

Works for me.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



* Re: Restartable Sequences system call merged into Linux
  2018-06-14 12:27         ` Pavel Machek
@ 2018-06-14 13:01           ` Mathieu Desnoyers
  2018-06-14 13:25             ` Pavel Machek
  2018-06-15  5:09             ` Florian Weimer
  2018-06-15  5:07           ` Florian Weimer
  1 sibling, 2 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2018-06-14 13:01 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Florian Weimer, carlos, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha

----- On Jun 14, 2018, at 8:27 AM, Pavel Machek pavel@ucw.cz wrote:

> On Tue 2018-06-12 12:31:24, Mathieu Desnoyers wrote:
>> ----- On Jun 12, 2018, at 9:11 AM, Florian Weimer fweimer@redhat.com wrote:
>> 
>> > On 06/11/2018 10:04 PM, Mathieu Desnoyers wrote:
>> >> ----- On Jun 11, 2018, at 3:55 PM, Florian Weimer fweimer@redhat.com wrote:
>> >> 
>> >>> On 06/11/2018 09:49 PM, Mathieu Desnoyers wrote:
>> >>>> It should be noted that there can be only one rseq TLS area registered per
>> >>>> thread,
>> >>>> which can then be used by many libraries and by the executable, so this is a
>> >>>> process-wide (per-thread) resource that we need to manage carefully.
>> >>>
>> >>> Is it possible to resize the area after thread creation, perhaps even
>> >>> from other threads?
>> >> 
>> >> I'm not sure why we would want to resize it. The per-thread area is fixed-size.
>> >> Its layout is here: include/uapi/linux/rseq.h: struct rseq
>> > 
>> > Looks like I was mistaken and this is very similar to the robust mutex list.
>> > 
>> > Should we treat it the same way?  Always allocate it for each new thread
>> > and register it with the kernel?
>> 
>> That would be an efficient way to do it, indeed. There is very little
>> performance overhead to have rseq registered for all threads, whether or
>> not they intend to run rseq critical sections.
> 
> People with slow / low memory machines would prefer not to see
> overhead they don't need...

In terms of memory usage, if people don't want the extra few bytes of memory
used by rseq in the kernel, they should use CONFIG_RSEQ=n.

In terms of overhead, let's have a closer look at what it means: when a thread
is registered to rseq, but does not enter rseq critical sections, only this
extra work is done by the kernel:

- rseq_preempt(): on preemption, the scheduler sets the TIF_NOTIFY_RESUME thread
  flag, so rseq_handle_notify_resume() can check whether it's in a rseq critical
  section when returning to user-space,
- rseq_signal_deliver(): on signal delivery, rseq_handle_notify_resume() checks
  whether it's in a rseq critical section,
- rseq_migrate(): on migration, the scheduler sets TIF_NOTIFY_RESUME as well.

> 
>> I have a few possible approaches in mind (feel free to suggest other
>> options):
>> 
>> A) glibc exposes a strong __rseq_abi TLS symbol:
>> 
>>    - should ideally *not* be global-dynamic for performance reasons,
>>    - registration to kernel can either be handled explicitly by requiring
>>      application or libraries to call an API, or implicitly at thread
>>      creation,
> 
> ...so I'd prefer explicit API call.

I have use-cases where a library wants to link against librseq and have rseq
critical sections, without requiring the application to explicitly add rseq
registration calls on thread creation/destruction. Is there a way to register
callbacks to glibc which could be invoked on thread creation/destruction ?

Then if we include dynamic loading of libraries (dlopen/dlclose) in the
picture, this gets even worse, as we'd need to be able to iterate on all
existing threads to invoke registration/unregistration callbacks.

One alternative approach would be to let the user library lazily register rseq
when needed, and use a pthread_key for unregistration. However, this does not
allow dlclose of the user library without figuring out a way to iterate over
all threads.
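
A rough sketch of that lazy scheme follows, with hypothetical names throughout.
The rseq system calls are replaced by counters so that the example runs
anywhere; a real library would issue the registration where
sketch_register_calls is bumped and the unregistration in the key destructor.
The dlclose problem is visible here as well: the destructor lives in the
library, so the library cannot be unloaded while any thread still holds the
key value.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static pthread_key_t sketch_key;
static pthread_once_t sketch_key_once = PTHREAD_ONCE_INIT;
static int sketch_key_rc;

/* Per-thread flag: has this thread registered already? */
static __thread int sketch_registered;

/* Counters standing in for the real register/unregister system calls. */
static atomic_int sketch_register_calls;
static atomic_int sketch_unregister_calls;

/* Runs at thread exit for every thread that set the key value. */
static void sketch_thread_exit(void *arg)
{
    (void)arg;
    atomic_fetch_add(&sketch_unregister_calls, 1);
}

static void sketch_make_key(void)
{
    sketch_key_rc = pthread_key_create(&sketch_key, sketch_thread_exit);
}

/* Called at the top of every library entry point that uses rseq.
 * Returns 0 on success, or an error number from the pthread calls. */
static int sketch_rseq_lazy_register(void)
{
    int rc;

    if (sketch_registered)
        return 0;
    pthread_once(&sketch_key_once, sketch_make_key);
    if (sketch_key_rc != 0)
        return sketch_key_rc;
    atomic_fetch_add(&sketch_register_calls, 1);
    /* Any non-NULL value makes the destructor run at thread exit. */
    rc = pthread_setspecific(sketch_key, &sketch_registered);
    if (rc != 0)
        return rc;
    sketch_registered = 1;
    return 0;
}

static void *sketch_worker(void *arg)
{
    sketch_rseq_lazy_register();
    sketch_rseq_lazy_register();    /* second call is a no-op */
    return arg;
}

/* Runs one worker thread; returns 1 if it registered exactly once and
 * unregistered exactly once, 0 otherwise, -1 if the thread could not
 * be created. */
static int sketch_demo(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, sketch_worker, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return atomic_load(&sketch_register_calls) == 1 &&
           atomic_load(&sketch_unregister_calls) == 1;
}
```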

Another alternative would be to somehow let glibc handle the registration,
perhaps only doing it for applications expressing their interest for rseq.

Thoughts ?

Thanks,

Mathieu

> 
>> B) librseq.so exposes a strong __rseq_abi symbol:
> 
> Works for me.
>									Pavel
> 
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: Restartable Sequences system call merged into Linux
  2018-06-14 13:01           ` Mathieu Desnoyers
@ 2018-06-14 13:25             ` Pavel Machek
  2018-06-14 13:32               ` Florian Weimer
  2018-06-14 13:38               ` Mathieu Desnoyers
  2018-06-15  5:09             ` Florian Weimer
  1 sibling, 2 replies; 25+ messages in thread
From: Pavel Machek @ 2018-06-14 13:25 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Florian Weimer, carlos, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha


Hi!

> >> >>>> It should be noted that there can be only one rseq TLS area registered per
> >> >>>> thread,
> >> >>>> which can then be used by many libraries and by the executable, so this is a
> >> >>>> process-wide (per-thread) resource that we need to manage carefully.
> >> >>>
> >> >>> Is it possible to resize the area after thread creation, perhaps even
> >> >>> from other threads?
> >> >> 
> >> >> I'm not sure why we would want to resize it. The per-thread area is fixed-size.
> >> >> Its layout is here: include/uapi/linux/rseq.h: struct rseq
> >> > 
> >> > Looks like I was mistaken and this is very similar to the robust mutex list.
> >> > 
> >> > Should we treat it the same way?  Always allocate it for each new thread
> >> > and register it with the kernel?
> >> 
> >> That would be an efficient way to do it, indeed. There is very little
> >> performance overhead to have rseq registered for all threads, whether or
> >> not they intend to run rseq critical sections.
> > 
> > People with slow / low memory machines would prefer not to see
> > overhead they don't need...
> 
> In terms of memory usage, if people don't want the extra few bytes of memory
> used by rseq in the kernel, they should use CONFIG_RSEQ=n.
> 
> In terms of overhead, let's have a closer look at what it means: when a thread
> is registered to rseq, but does not enter rseq critical sections, only this
> extra work is done by the kernel:
> 
> - rseq_preempt(): on preemption, the scheduler sets the TIF_NOTIFY_RESUME thread
>   flag, so rseq_handle_notify_resume() can check whether it's in a rseq critical
>   section when returning to user-space,
> - rseq_signal_deliver(): on signal delivery, rseq_handle_notify_resume() checks
>   whether it's in a rseq critical section,
> - rseq_migrate: on migration, the scheduler sets TIF_NOTIFY_RESUME as well,

Yes, this is not likely to be noticeable.

But the proposal wanted to add a syscall to thread creation, right?
And I believe that may be noticeable.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: Restartable Sequences system call merged into Linux
  2018-06-14 13:25             ` Pavel Machek
@ 2018-06-14 13:32               ` Florian Weimer
  2018-06-14 13:46                 ` Mathieu Desnoyers
  2018-06-14 13:38               ` Mathieu Desnoyers
  1 sibling, 1 reply; 25+ messages in thread
From: Florian Weimer @ 2018-06-14 13:32 UTC (permalink / raw)
  To: Pavel Machek, Mathieu Desnoyers
  Cc: carlos, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
	Thomas Gleixner, linux-kernel, libc-alpha

On 06/14/2018 03:25 PM, Pavel Machek wrote:

> But the proposal wanted to add a syscall to thread creation, right?
> And I believe that may be noticeable.

We already call set_robust_list, so we could just pass a larger area to 
that and the kernel could use it.  Then no additional system call would 
be needed in the common case (new kernel which recognizes the new area 
size).

But then we cannot use an initial-exec thread local variable for it 
(although the offset from the thread pointer will still be constant, of 
course).

Thanks,
Florian

* Re: Restartable Sequences system call merged into Linux
  2018-06-14 13:25             ` Pavel Machek
  2018-06-14 13:32               ` Florian Weimer
@ 2018-06-14 13:38               ` Mathieu Desnoyers
  2018-06-14 13:49                 ` Pavel Machek
  1 sibling, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2018-06-14 13:38 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Florian Weimer, carlos, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha

----- On Jun 14, 2018, at 9:25 AM, Pavel Machek pavel@ucw.cz wrote:

> Hi!
> 
>> >> >>>> It should be noted that there can be only one rseq TLS area registered per
>> >> >>>> thread,
>> >> >>>> which can then be used by many libraries and by the executable, so this is a
>> >> >>>> process-wide (per-thread) resource that we need to manage carefully.
>> >> >>>
>> >> >>> Is it possible to resize the area after thread creation, perhaps even
>> >> >>> from other threads?
>> >> >> 
>> >> >> I'm not sure why we would want to resize it. The per-thread area is fixed-size.
>> >> >> Its layout is here: include/uapi/linux/rseq.h: struct rseq
>> >> > 
>> >> > Looks like I was mistaken and this is very similar to the robust mutex list.
>> >> > 
>> >> > Should we treat it the same way?  Always allocate it for each new thread
>> >> > and register it with the kernel?
>> >> 
>> >> That would be an efficient way to do it, indeed. There is very little
>> >> performance overhead to have rseq registered for all threads, whether or
>> >> not they intend to run rseq critical sections.
>> > 
>> > People with slow / low memory machines would prefer not to see
>> > overhead they don't need...
>> 
>> In terms of memory usage, if people don't want the extra few bytes of memory
>> used by rseq in the kernel, they should use CONFIG_RSEQ=n.
>> 
>> In terms of overhead, let's have a closer look at what it means: when a thread
>> is registered to rseq, but does not enter rseq critical sections, only this
>> extra work is done by the kernel:
>> 
>> - rseq_preempt(): on preemption, the scheduler sets the TIF_NOTIFY_RESUME thread
>>   flag, so rseq_handle_notify_resume() can check whether it's in a rseq critical
>>   section when returning to user-space,
>> - rseq_signal_deliver(): on signal delivery, rseq_handle_notify_resume() checks
>>   whether it's in a rseq critical section,
>> - rseq_migrate: on migration, the scheduler sets TIF_NOTIFY_RESUME as well,
> 
> Yes, this is not likely to be noticeable.
> 
> But the proposal wanted to add a syscall to thread creation, right?
> And I believe that may be noticeable.

Fair point! Do we have a standard benchmark that would stress this?

If the overhead turns out to be noticeable, I wonder whether we could extend
clone() with a new CLONE_RSEQ flag, so that glibc could pass a pointer to the
rseq TLS area through an extra clone argument rather than issue an extra
system call on thread creation?

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: Restartable Sequences system call merged into Linux
  2018-06-14 13:32               ` Florian Weimer
@ 2018-06-14 13:46                 ` Mathieu Desnoyers
  2018-06-15  5:10                   ` Florian Weimer
  0 siblings, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2018-06-14 13:46 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Pavel Machek, carlos, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha

----- On Jun 14, 2018, at 9:32 AM, Florian Weimer fweimer@redhat.com wrote:

> On 06/14/2018 03:25 PM, Pavel Machek wrote:
> 
>> But the proposal wanted to add a syscall to thread creation, right?
>> And I believe that may be noticeable.
> 
> We already call set_robust_list, so we could just pass a larger area to
> that and the kernel could use it.  Then no additional system call would
> be needed in the common case (new kernel which recognizes the new area
> size).
> 
> But then we cannot use an initial-exec thread local variable for it
> (although the offset from the thread pointer will still be constant, of
> course).

I'm wondering whether we could turn the problem around: expose a new
system call that registers an array of pointers to per-thread data, to be
used instead of set_robust_list when available. This way, we could register
both the robust list and rseq with a single system call, e.g.:

enum linux_tls_area_type {
    LINUX_TLS_ROBUST_LIST,
    LINUX_TLS_RSEQ,
};

struct linux_tls_area_item {
    enum linux_tls_area_type type;
    void *p;
};

long sys_register_tls_areas(struct linux_tls_area_item *array, size_t nb);

This would allow registering various TLS data structures with a single
system call without hindering flexibility on the user-space side. For
instance, we could still use initial-exec and the __rseq_abi symbol for
rseq with this approach.
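
To make the calling convention concrete, here is a user-space mock of how
such a call might validate and record the array. It is only a sketch of the
proposal: sys_register_tls_areas does not exist in any kernel, and
mock_register_tls_areas, struct mock_thread_state, the LINUX_TLS_NR_TYPES
sentinel and the choice of error codes are all assumptions made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum linux_tls_area_type {
    LINUX_TLS_ROBUST_LIST,
    LINUX_TLS_RSEQ,
    LINUX_TLS_NR_TYPES,   /* assumption: sentinel added for the mock */
};

struct linux_tls_area_item {
    enum linux_tls_area_type type;
    void *p;
};

/* Hypothetical stand-in for the per-thread state the kernel would keep. */
struct mock_thread_state {
    void *area[LINUX_TLS_NR_TYPES];
};

/* Mock of the proposed call: one invocation registers every listed area.
 * Returns 0 on success or a negative errno-style value (assumed codes). */
long mock_register_tls_areas(struct mock_thread_state *st,
                             const struct linux_tls_area_item *array,
                             size_t nb)
{
    size_t i;

    assert(st != NULL);
    if (array == NULL)
        return nb == 0 ? 0 : -EFAULT;

    /* Validate everything first so that a failed call registers nothing. */
    for (i = 0; i < nb; i++) {
        if ((unsigned int)array[i].type >= LINUX_TLS_NR_TYPES ||
            array[i].p == NULL)
            return -EINVAL;
        if (st->area[array[i].type] != NULL)
            return -EBUSY;    /* only one area of each type per thread */
    }

    /* Commit. If one call lists a type twice, the last entry wins here. */
    for (i = 0; i < nb; i++)
        st->area[array[i].type] = array[i].p;
    return 0;
}
```

A thread-start routine would build a two-element array (robust list head
and rseq area) and make one call in place of two separate registrations.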

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: Restartable Sequences system call merged into Linux
  2018-06-14 13:38               ` Mathieu Desnoyers
@ 2018-06-14 13:49                 ` Pavel Machek
  2018-06-14 14:00                   ` Florian Weimer
  0 siblings, 1 reply; 25+ messages in thread
From: Pavel Machek @ 2018-06-14 13:49 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Florian Weimer, carlos, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha


Hi!

> >> - rseq_preempt(): on preemption, the scheduler sets the TIF_NOTIFY_RESUME thread
> >>   flag, so rseq_handle_notify_resume() can check whether it's in a rseq critical
> >>   section when returning to user-space,
> >> - rseq_signal_deliver(): on signal delivery, rseq_handle_notify_resume() checks
> >>   whether it's in a rseq critical section,
> >> - rseq_migrate: on migration, the scheduler sets TIF_NOTIFY_RESUME as well,
> > 
> > Yes, this is not likely to be noticeable.
> > 
> > But the proposal wanted to add a syscall to thread creation, right?
> > And I believe that may be noticeable.
> 
> Fair point! Do we have a standard benchmark that would stress this ?

Web server performance benchmarks basically test clone() performance
in many cases.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: Restartable Sequences system call merged into Linux
  2018-06-14 13:49                 ` Pavel Machek
@ 2018-06-14 14:00                   ` Florian Weimer
  2018-06-14 14:36                     ` Mathieu Desnoyers
  0 siblings, 1 reply; 25+ messages in thread
From: Florian Weimer @ 2018-06-14 14:00 UTC (permalink / raw)
  To: Pavel Machek, Mathieu Desnoyers
  Cc: carlos, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
	Thomas Gleixner, linux-kernel, libc-alpha

On 06/14/2018 03:49 PM, Pavel Machek wrote:
> Hi!
> 
>>>> - rseq_preempt(): on preemption, the scheduler sets the TIF_NOTIFY_RESUME thread
>>>>    flag, so rseq_handle_notify_resume() can check whether it's in a rseq critical
>>>>    section when returning to user-space,
>>>> - rseq_signal_deliver(): on signal delivery, rseq_handle_notify_resume() checks
>>>>    whether it's in a rseq critical section,
>>>> - rseq_migrate: on migration, the scheduler sets TIF_NOTIFY_RESUME as well,
>>>
>>> Yes, this is not likely to be noticeable.
>>>
>>> But the proposal wanted to add a syscall to thread creation, right?
>>> And I believe that may be noticeable.
>>
>> Fair point! Do we have a standard benchmark that would stress this ?
> 
> Web server performance benchmarks basically test clone() performance
> in many cases.

Isn't that fork?  I expect that the rseq arena is inherited on fork and 
fork-type clone, otherwise it's going to be painful.

Thanks,
Florian

* Re: Restartable Sequences system call merged into Linux
  2018-06-14 14:00                   ` Florian Weimer
@ 2018-06-14 14:36                     ` Mathieu Desnoyers
  2018-06-14 14:41                       ` Florian Weimer
  0 siblings, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2018-06-14 14:36 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Pavel Machek, carlos, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha

----- On Jun 14, 2018, at 10:00 AM, Florian Weimer fweimer@redhat.com wrote:

> On 06/14/2018 03:49 PM, Pavel Machek wrote:
>> Hi!
>> 
>>>>> - rseq_preempt(): on preemption, the scheduler sets the TIF_NOTIFY_RESUME thread
>>>>>    flag, so rseq_handle_notify_resume() can check whether it's in a rseq critical
>>>>>    section when returning to user-space,
>>>>> - rseq_signal_deliver(): on signal delivery, rseq_handle_notify_resume() checks
>>>>>    whether it's in a rseq critical section,
>>>>> - rseq_migrate: on migration, the scheduler sets TIF_NOTIFY_RESUME as well,
>>>>
>>>> Yes, this is not likely to be noticeable.
>>>>
>>>> But the proposal wanted to add a syscall to thread creation, right?
>>>> And I believe that may be noticeable.
>>>
>>> Fair point! Do we have a standard benchmark that would stress this ?
>> 
>> Web server performance benchmarks basically test clone() performance
>> in many cases.
> 
> Isn't that fork?  I expect that the rseq arena is inherited on fork and
> fork-type clone, otherwise it's going to be painful.

On fork, or on a clone that creates a new process, the rseq TLS area
registration is inherited from the thread that issues the system call.

When clone creates a new thread, there is no such inheritance.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: Restartable Sequences system call merged into Linux
  2018-06-14 14:36                     ` Mathieu Desnoyers
@ 2018-06-14 14:41                       ` Florian Weimer
  2018-06-14 15:09                         ` Mathieu Desnoyers
  0 siblings, 1 reply; 25+ messages in thread
From: Florian Weimer @ 2018-06-14 14:41 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Pavel Machek, carlos, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha

On 06/14/2018 04:36 PM, Mathieu Desnoyers wrote:
> ----- On Jun 14, 2018, at 10:00 AM, Florian Weimer fweimer@redhat.com wrote:
> 
>> On 06/14/2018 03:49 PM, Pavel Machek wrote:
>>> Hi!
>>>
>>>>>> - rseq_preempt(): on preemption, the scheduler sets the TIF_NOTIFY_RESUME thread
>>>>>>     flag, so rseq_handle_notify_resume() can check whether it's in a rseq critical
>>>>>>     section when returning to user-space,
>>>>>> - rseq_signal_deliver(): on signal delivery, rseq_handle_notify_resume() checks
>>>>>>     whether it's in a rseq critical section,
>>>>>> - rseq_migrate: on migration, the scheduler sets TIF_NOTIFY_RESUME as well,
>>>>>
>>>>> Yes, this is not likely to be noticeable.
>>>>>
>>>>> But the proposal wanted to add a syscall to thread creation, right?
>>>>> And I believe that may be noticeable.
>>>>
>>>> Fair point! Do we have a standard benchmark that would stress this ?
>>>
>>> Web server performance benchmarks basically test clone() performance
>>> in many cases.
>>
>> Isn't that fork?  I expect that the rseq arena is inherited on fork and
>> fork-type clone, otherwise it's going to be painful.
> 
> On fork or clone creating a new process, the rseq tls area is inherited
> from the thread that does the fork syscall.
> 
> On creation of a new thread with clone, there is no such inheritance.

Makes sense.  So fork-based (web) servers will not be impacted by the 
additional system call, and thread-based servers likely use a thread 
pool anyway.  I'm not really concerned about the additional system call 
here.

Thanks,
Florian

* Re: Restartable Sequences system call merged into Linux
  2018-06-14 14:41                       ` Florian Weimer
@ 2018-06-14 15:09                         ` Mathieu Desnoyers
  0 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2018-06-14 15:09 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Pavel Machek, carlos, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha

----- On Jun 14, 2018, at 10:41 AM, Florian Weimer fweimer@redhat.com wrote:

> On 06/14/2018 04:36 PM, Mathieu Desnoyers wrote:
>> ----- On Jun 14, 2018, at 10:00 AM, Florian Weimer fweimer@redhat.com wrote:
>> 
>>> On 06/14/2018 03:49 PM, Pavel Machek wrote:
>>>> Hi!
>>>>
>>>>>>> - rseq_preempt(): on preemption, the scheduler sets the TIF_NOTIFY_RESUME thread
>>>>>>>     flag, so rseq_handle_notify_resume() can check whether it's in a rseq critical
>>>>>>>     section when returning to user-space,
>>>>>>> - rseq_signal_deliver(): on signal delivery, rseq_handle_notify_resume() checks
>>>>>>>     whether it's in a rseq critical section,
>>>>>>> - rseq_migrate: on migration, the scheduler sets TIF_NOTIFY_RESUME as well,
>>>>>>
>>>>>> Yes, this is not likely to be noticeable.
>>>>>>
>>>>>> But the proposal wanted to add a syscall to thread creation, right?
>>>>>> And I believe that may be noticeable.
>>>>>
>>>>> Fair point! Do we have a standard benchmark that would stress this ?
>>>>
>>>> Web server performance benchmarks basically test clone() performance
>>>> in many cases.
>>>
>>> Isn't that fork?  I expect that the rseq arena is inherited on fork and
>>> fork-type clone, otherwise it's going to be painful.
>> 
>> On fork or clone creating a new process, the rseq tls area is inherited
>> from the thread that does the fork syscall.
>> 
>> On creation of a new thread with clone, there is no such inheritance.
> 
> Makes sense.  So fork-based (web) servers will not be impacted by the
> additional system call, and thread-based servers likely use a thread
> pool anyway.  I'm not really concerned about the additional system call
> here.

Just for the sake of completeness, there is (of course) no inheritance
on exec(). So glibc would also have to register the rseq TLS in its
constructors.
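
The three cases discussed in this sub-thread can be summarized as a tiny
toy model. It only restates the rules given in these messages; enum
toy_event and toy_rseq_registered_after are hypothetical names, and nothing
here queries the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the events discussed in the thread. */
enum toy_event {
    TOY_FORK,           /* fork, or a clone that creates a new process */
    TOY_THREAD_CREATE,  /* clone that creates a new thread */
    TOY_EXEC,           /* exec() of a new program image */
};

/* Is the resulting task registered with rseq right after the event,
 * given whether the originating thread was registered before it? */
bool toy_rseq_registered_after(enum toy_event ev, bool registered_before)
{
    switch (ev) {
    case TOY_FORK:
        return registered_before;  /* inherited from the forking thread */
    case TOY_THREAD_CREATE:
        return false;              /* new thread must register itself */
    case TOY_EXEC:
        return false;              /* registration does not survive exec */
    }
    assert(!"unknown event");
    return false;
}
```

Whenever the function returns false, user space (glibc, in the scenario
discussed here) would have to issue the registration itself: in the thread
start path for new threads, and in its constructors after exec.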

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: Restartable Sequences system call merged into Linux
  2018-06-14 12:27         ` Pavel Machek
  2018-06-14 13:01           ` Mathieu Desnoyers
@ 2018-06-15  5:07           ` Florian Weimer
  1 sibling, 0 replies; 25+ messages in thread
From: Florian Weimer @ 2018-06-15  5:07 UTC (permalink / raw)
  To: Pavel Machek, Mathieu Desnoyers
  Cc: carlos, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
	Thomas Gleixner, linux-kernel, libc-alpha

On 06/14/2018 02:27 PM, Pavel Machek wrote:

>>> Should we treat it the same way?  Always allocate it for each new thread
>>> and register it with the kernel?
>>
>> That would be an efficient way to do it, indeed. There is very little
>> performance overhead to have rseq registered for all threads, whether or
>> not they intend to run rseq critical sections.
> 
> People with slow / low memory machines would prefer not to see
> overhead they don't need...

I can try to get rid of the >500 byte per-thread area for the stub 
resolver.  That should compensate for the overhead introduced.

Thanks,
Florian

* Re: Restartable Sequences system call merged into Linux
  2018-06-14 13:01           ` Mathieu Desnoyers
  2018-06-14 13:25             ` Pavel Machek
@ 2018-06-15  5:09             ` Florian Weimer
  2018-06-15 17:50               ` Mathieu Desnoyers
  1 sibling, 1 reply; 25+ messages in thread
From: Florian Weimer @ 2018-06-15  5:09 UTC (permalink / raw)
  To: Mathieu Desnoyers, Pavel Machek
  Cc: carlos, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
	Thomas Gleixner, linux-kernel, libc-alpha

On 06/14/2018 03:01 PM, Mathieu Desnoyers wrote:
> Another alternative would be to somehow let glibc handle the registration,
> perhaps only doing it for applications expressing their interest for rseq.

That's not really possible.  We can't rely on the visibility of symbol 
bindings due to lazy binding and hidden visibility.  Registration of 
intent by other means will not work because if it is done from user 
code, some other library may have already launched a thread at this point.

(It's also a moot point if we want to use restartable sequences in glibc 
itself.)

Thanks,
Florian

* Re: Restartable Sequences system call merged into Linux
  2018-06-14 13:46                 ` Mathieu Desnoyers
@ 2018-06-15  5:10                   ` Florian Weimer
  2018-06-15 17:44                     ` Mathieu Desnoyers
  0 siblings, 1 reply; 25+ messages in thread
From: Florian Weimer @ 2018-06-15  5:10 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Pavel Machek, carlos, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha

On 06/14/2018 03:46 PM, Mathieu Desnoyers wrote:
> This would allow registering various TLS data structures with a single
> system call without hindering flexibility on the user-space side. For
> instance, we could still use initial-exec and the __rseq_abi symbol for
> rseq with this approach.
> 
> Thoughts ?

Isn't this just a very narrow case of the usual batched syscalls 
proposal? 8-)

Florian

* Re: Restartable Sequences system call merged into Linux
  2018-06-15  5:10                   ` Florian Weimer
@ 2018-06-15 17:44                     ` Mathieu Desnoyers
  0 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2018-06-15 17:44 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Pavel Machek, carlos, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha

----- On Jun 15, 2018, at 1:10 AM, Florian Weimer fweimer@redhat.com wrote:

> On 06/14/2018 03:46 PM, Mathieu Desnoyers wrote:
>> This would allow registering various TLS data structures with a single
>> system call without hindering flexibility on the user-space side. For
>> instance, we could still use initial-exec and the __rseq_abi symbol for
>> rseq with this approach.
>> 
>> Thoughts ?
> 
> Isn't this just a very narrow case of the usual batched syscalls
> proposal? 8-)

Pretty much. But let's not go there unless this is really needed.
It looks like the added syscall on thread creation is not an issue
so far.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: Restartable Sequences system call merged into Linux
  2018-06-15  5:09             ` Florian Weimer
@ 2018-06-15 17:50               ` Mathieu Desnoyers
  0 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2018-06-15 17:50 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Pavel Machek, carlos, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Thomas Gleixner, linux-kernel, libc-alpha

----- On Jun 15, 2018, at 1:09 AM, Florian Weimer fweimer@redhat.com wrote:

> On 06/14/2018 03:01 PM, Mathieu Desnoyers wrote:
>> Another alternative would be to somehow let glibc handle the registration,
>> perhaps only doing it for applications expressing their interest for rseq.
> 
> That's not really possible.  We can't rely on the visibility of symbol
> bindings due to lazy binding and hidden visibility.  Registration of
> intent by other means will not work because if it is done from user
> code, some other library may have already launched a thread at this point.
> 
> (It's also a moot point if we want to use restartable sequences in glibc
> itself.)

Considering that we can expect glibc to use rseq to speed up its memory
allocator, pretty much any application linked against glibc *will* end up
using rseq indirectly.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Thread overview: 25+ messages
2018-06-11 19:49 Restartable Sequences system call merged into Linux Mathieu Desnoyers
2018-06-11 19:55 ` Florian Weimer
2018-06-11 20:04   ` Mathieu Desnoyers
2018-06-12 13:11     ` Florian Weimer
2018-06-12 16:31       ` Mathieu Desnoyers
2018-06-13  8:21         ` Florian Weimer
2018-06-14 12:27         ` Pavel Machek
2018-06-14 13:01           ` Mathieu Desnoyers
2018-06-14 13:25             ` Pavel Machek
2018-06-14 13:32               ` Florian Weimer
2018-06-14 13:46                 ` Mathieu Desnoyers
2018-06-15  5:10                   ` Florian Weimer
2018-06-15 17:44                     ` Mathieu Desnoyers
2018-06-14 13:38               ` Mathieu Desnoyers
2018-06-14 13:49                 ` Pavel Machek
2018-06-14 14:00                   ` Florian Weimer
2018-06-14 14:36                     ` Mathieu Desnoyers
2018-06-14 14:41                       ` Florian Weimer
2018-06-14 15:09                         ` Mathieu Desnoyers
2018-06-15  5:09             ` Florian Weimer
2018-06-15 17:50               ` Mathieu Desnoyers
2018-06-15  5:07           ` Florian Weimer
2018-06-13 11:48 ` Heiko Carstens
2018-06-13 16:14   ` Mathieu Desnoyers
2018-06-13 19:53     ` Mathieu Desnoyers
