* Disabling TLS address caching to help QEMU on GNU/Linux
@ 2021-07-20 14:52 Florian Weimer
  2021-07-20 15:31 ` Iain Sandoe
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Florian Weimer @ 2021-07-20 14:52 UTC (permalink / raw)
  To: gcc, libc-alpha, qemu-devel

Currently, the GNU/Linux ABI does not really specify whether the thread
pointer (the address of the TCB) may change at a function boundary.

Traditionally, GCC assumes that the ABI allows caching addresses of
thread-local variables across function calls.  Such caching varies in
aggressiveness between targets, probably due to differences in the
choice of -mtls-dialect=gnu and -mtls-dialect=gnu2 as the default for
the targets.  (Caching with -mtls-dialect=gnu2 appears to be more
aggressive.)

In addition to that, glibc defines errno like this:

extern int *__errno_location (void) __attribute__ ((__const__));
#define errno (*__errno_location ())

And the const attribute has the side effect of caching the address of
errno within the same stack frame.
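
As a minimal sketch of that effect (the names my_errno_location, my_errno_storage and might_switch_thread are hypothetical stand-ins, not glibc's definitions): because the accessor is declared const, GCC is allowed to assume that both uses below yield the same pointer, and it may keep that pointer live across the intervening call.

```c
#include <assert.h>

/* Hypothetical stand-in for errno storage; not glibc's implementation. */
static __thread int my_errno_storage;

/* Declared const, like __errno_location: the compiler may assume that
   repeated calls return the same value and merge them. */
int *my_errno_location(void) __attribute__((__const__));

int *my_errno_location(void)
{
    return &my_errno_storage;
}

#define my_errno (*my_errno_location())

/* Placeholder for a call that could suspend a stackful coroutine. */
void might_switch_thread(void)
{
}

int set_then_read(int value)
{
    my_errno = value;          /* first address computation */
    might_switch_thread();     /* the address may stay cached across this */
    return my_errno;           /* may reuse the pointer computed above */
}

/* On a single thread the cached address is always the right one. */
void check_single_thread(void)
{
    assert(set_then_read(7) == 7);
}
```

On a single thread this is harmless.  It only becomes a problem if the code is resumed on a different thread between the two uses.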

With stackful coroutines, such address caching is only valid if
coroutines are only ever resumed on the same thread on which they were
suspended.  (The C++ coroutine implementation is not stackful and is not
affected by this at the ABI level.)  Historically, I think we took the
position that cross-thread resumption is undefined.  But the ABIs aren't
crystal-clear on this matter.
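
The reason a cached address goes stale can be shown without any coroutine library.  The sketch below assumes POSIX threads and uses made-up names (counter, compare_address, tls_addresses_differ).  It only demonstrates that each thread has its own instance of a thread-local variable, so a pointer computed on one thread names the wrong instance when used on another.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

static __thread int counter;   /* one instance per thread */

/* Runs on the new thread: compare the creator's address of `counter`
   with this thread's own address of `counter`. */
static void *compare_address(void *creator_address)
{
    assert(creator_address != NULL);
    int differs = ((int *) creator_address != &counter);
    return (void *) (intptr_t) differs;
}

/* Returns 1 if the second thread sees a different address for `counter`,
   0 if it sees the same one, and -1 if the thread could not be run. */
int tls_addresses_differ(void)
{
    void *result = NULL;
    pthread_t thread;

    if (pthread_create(&thread, NULL, compare_address, &counter) != 0)
        return -1;
    if (pthread_join(thread, &result) != 0)
        return -1;
    return (int) (intptr_t) result;
}
```

A stackful coroutine that is suspended on one thread and resumed on another is in the position of compare_address holding creator_address: any TLS address it computed before the switch still points at the first thread's data.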

One important piece of software for GNU is QEMU (not just for GNU/Linux,
Hurd development also benefits from virtualization).  QEMU uses stackful
coroutines extensively.  Unfortunately, some hard-to-change code areas
resume coroutines across threads.  These increasingly
cause problems with more inlining, inter-procedural analysis, and a
general push towards LTO (which is also needed for some security
hardening features).

Should the GNU toolchain offer something to help out the QEMU
developers?  Maybe GCC could offer an option to disable the caching for
all TLS models.  glibc could detect that mode based on a new
preprocessor macro and adjust its __errno_location declaration and
similar function declarations.  There will be a performance impact of
this, of course, but it would make the QEMU usage well-defined (at the
lowest levels).
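
A rough sketch of how a header could react to such a mode follows.  EXAMPLE_TLS_ADDRESS_MAY_CHANGE is a hypothetical macro invented for this illustration; it is an assumption, not an existing GCC or glibc macro, and the example_* names are likewise made up.

```c
#include <assert.h>

/* EXAMPLE_TLS_ADDRESS_MAY_CHANGE is a hypothetical macro that a new
   compiler option could predefine; it is an assumption of this sketch. */
#ifdef EXAMPLE_TLS_ADDRESS_MAY_CHANGE
# define EXAMPLE_TLS_CONST            /* no const: recompute on every use */
#else
# define EXAMPLE_TLS_CONST __attribute__((__const__))
#endif

static __thread int example_errno_storage;

int *example_errno_location(void) EXAMPLE_TLS_CONST;

int *example_errno_location(void)
{
    return &example_errno_storage;
}

#define example_errno (*example_errno_location())

/* Either way the accessor behaves the same on a single thread. */
int example_roundtrip(int value)
{
    example_errno = value;
    assert(example_errno_location() != 0);
    return example_errno;
}
```

Dropping the attribute only addresses the errno-style accessor functions.  Direct accesses to __thread variables would still need the compiler-side change.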

If this is a programming model that should be supported, then restoring
some of the optimizations would be possible, by annotating
context-switching functions and TLS-address-dependent functions.  But I
think QEMU would immediately benefit from just the simple approach that
disables address caching of TLS variables.

Thanks,
Florian




* Re: Disabling TLS address caching to help QEMU on GNU/Linux
  2021-07-20 14:52 Disabling TLS address caching to help QEMU on GNU/Linux Florian Weimer
@ 2021-07-20 15:31 ` Iain Sandoe
  2021-07-21  7:22 ` Thomas Huth
  2021-07-22 12:12 ` Richard Biener
  2 siblings, 0 replies; 5+ messages in thread
From: Iain Sandoe @ 2021-07-20 15:31 UTC (permalink / raw)
  To: Florian Weimer; +Cc: gcc, libc-alpha, qemu-devel

Hi Florian,

This also affects fibre implementations (at least the C++ and D ones, judging
from discussions with both communities).

> On 20 Jul 2021, at 15:52, Florian Weimer via Gcc <gcc@gcc.gnu.org> wrote:
> 
> Currently, the GNU/Linux ABI does not really specify whether the thread
> pointer (the address of the TCB) may change at a function boundary.
> 
> Traditionally, GCC assumes that the ABI allows caching addresses of
> thread-local variables across function calls.  Such caching varies in
> aggressiveness between targets, probably due to differences in the
> choice of -mtls-dialect=gnu and -mtls-dialect=gnu2 as the default for
> the targets.  (Caching with -mtls-dialect=gnu2 appears to be more
> aggressive.)
> 
> In addition to that, glibc defines errno as this:
> 
> extern int *__errno_location (void) __attribute__ ((__const__));
> #define errno (*__errno_location ())
> 
> And the const attribute has the side effect of caching the address of
> errno within the same stack frame.
> 
> With stackful coroutines, such address caching is only valid if
> coroutines are only ever resumed on the same thread on which they were
> suspended.  (The C++ coroutine implementation is not stackful and is not
> affected by this at the ABI level.)

There are C++20 coroutine library writers who want to switch threads in
symmetric transfers [ I am not entirely convinced about this at present and it
certainly would be suspect with TLS address caching enabled - since a TLS
pointer could equally be cached in the coroutine frame ].

The C++20 coroutine ABI is silent on such matters (it only describes the
visible part of the coroutine frame and the builtins used by the std library).

>  Historically, I think we took the
> position that cross-thread resumption is undefined.  But the ABIs aren't
> crystal-clear on this matter.


> One important piece of software for GNU is QEMU (not just for GNU/Linux,
> Hurd development also benefits from virtualization).  QEMU uses stackful
> coroutines extensively.  There are some hard-to-change code areas where
> resumption happens across threads unfortunately.  These increasingly
> cause problems with more inlining, inter-procedural analysis, and a
> general push towards LTO (which is also needed for some security
> hardening features).
> 
> Should the GNU toolchain offer something to help out the QEMU
> developers?  Maybe GCC could offer an option to disable the caching for
> all TLS models.  glibc could detect that mode based on a new
> preprocessor macro and adjust its __errno_location declaration and
> similar function declarations.  There will be a performance impact of
> this, of course, but it would make the QEMU usage well-defined (at the
> lowest levels).
> 
> If this is a programming model that should be supported, then restoring
> some of the optimizations would be possible, by annotating
> context-switching functions and TLS-address-dependent functions.  But I
> think QEMU would immediately benefit from just the simple approach that
> disables address caching of TLS variables.

IMO the general cases you note above are enough reason to want some
mechanism to control this,
thanks
Iain

> 
> Thanks,
> Florian
> 




* Re: Disabling TLS address caching to help QEMU on GNU/Linux
  2021-07-20 14:52 Disabling TLS address caching to help QEMU on GNU/Linux Florian Weimer
  2021-07-20 15:31 ` Iain Sandoe
@ 2021-07-21  7:22 ` Thomas Huth
  2021-07-22 12:12 ` Richard Biener
  2 siblings, 0 replies; 5+ messages in thread
From: Thomas Huth @ 2021-07-21  7:22 UTC (permalink / raw)
  To: Florian Weimer, gcc, libc-alpha, qemu-devel; +Cc: Iain Sandoe, Stefan Hajnoczi

On 20/07/2021 16.52, Florian Weimer wrote:
> Currently, the GNU/Linux ABI does not really specify whether the thread
> pointer (the address of the TCB) may change at a function boundary.
[...]
> One important piece of software for GNU is QEMU (not just for GNU/Linux,
> Hurd development also benefits from virtualization).  QEMU uses stackful
> coroutines extensively.  There are some hard-to-change code areas where
> resumption happens across threads unfortunately.  These increasingly
> cause problems with more inlining, inter-procedural analysis, and a
> general push towards LTO (which is also needed for some security
> hardening features).

Thanks a lot for your mail, Florian!

As context for those reading about this for the first time: we are
currently facing the problem that the coroutines in QEMU fail when QEMU is
compiled with -flto on a non-x86 architecture, see:

  https://bugzilla.redhat.com/show_bug.cgi?id=1952483#c6

> Should the GNU toolchain offer something to help out the QEMU
> developers?

I guess that would be extremely helpful...

  Thomas




* Re: Disabling TLS address caching to help QEMU on GNU/Linux
  2021-07-20 14:52 Disabling TLS address caching to help QEMU on GNU/Linux Florian Weimer
  2021-07-20 15:31 ` Iain Sandoe
  2021-07-21  7:22 ` Thomas Huth
@ 2021-07-22 12:12 ` Richard Biener
  2021-07-22 16:01   ` Michael Matz
  2 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2021-07-22 12:12 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GCC Development, GNU C Library, qemu-devel

On Tue, Jul 20, 2021 at 4:54 PM Florian Weimer via Gcc <gcc@gcc.gnu.org> wrote:
>
> Currently, the GNU/Linux ABI does not really specify whether the thread
> pointer (the address of the TCB) may change at a function boundary.
>
> Traditionally, GCC assumes that the ABI allows caching addresses of
> thread-local variables across function calls.  Such caching varies in
> aggressiveness between targets, probably due to differences in the
> choice of -mtls-dialect=gnu and -mtls-dialect=gnu2 as the default for
> the targets.  (Caching with -mtls-dialect=gnu2 appears to be more
> aggressive.)
>
> In addition to that, glibc defines errno as this:
>
> extern int *__errno_location (void) __attribute__ ((__const__));
> #define errno (*__errno_location ())
>
> And the const attribute has the side effect of caching the address of
> errno within the same stack frame.
>
> With stackful coroutines, such address caching is only valid if
> coroutines are only ever resumed on the same thread on which they were
> suspended.  (The C++ coroutine implementation is not stackful and is not
> affected by this at the ABI level.)  Historically, I think we took the
> position that cross-thread resumption is undefined.  But the ABIs aren't
> crystal-clear on this matter.
>
> One important piece of software for GNU is QEMU (not just for GNU/Linux,
> Hurd development also benefits from virtualization).  QEMU uses stackful
> coroutines extensively.  There are some hard-to-change code areas where
> resumption happens across threads unfortunately.  These increasingly
> cause problems with more inlining, inter-procedural analysis, and a
> general push towards LTO (which is also needed for some security
> hardening features).
>
> Should the GNU toolchain offer something to help out the QEMU
> developers?  Maybe GCC could offer an option to disable the caching for
> all TLS models.  glibc could detect that mode based on a new
> preprocessor macro and adjust its __errno_location declaration and
> similar function declarations.  There will be a performance impact of
> this, of course, but it would make the QEMU usage well-defined (at the
> lowest levels).

But how does TLS usage transfer between threads?  On the gimple
level the TLS pointer is not visible and thus we'd happily CSE its address:

__thread int x[2];

void bar (int *);

int *foo(int i)
{
  int *p = &x[i];
  bar (p);
  return &x[i];
}

results in

int * foo (int i)
{
  int * p;
  sizetype _5;
  sizetype _6;

  <bb 2> [local count: 1073741824]:
  _5 = (sizetype) i_1(D);
  _6 = _5 * 4;
  p_2 = &x + _6;
  bar (p_2);
  return p_2;
}

To make this work as expected, one would need to expose the TLS pointer
access at the GIMPLE level.

> If this is a programming model that should be supported, then restoring
> some of the optimizations would be possible, by annotating
> context-switching functions and TLS-address-dependent functions.  But I
> think QEMU would immediately benefit from just the simple approach that
> disables address caching of TLS variables.
>
> Thanks,
> Florian
>



* Re: Disabling TLS address caching to help QEMU on GNU/Linux
  2021-07-22 12:12 ` Richard Biener
@ 2021-07-22 16:01   ` Michael Matz
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Matz @ 2021-07-22 16:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: Florian Weimer, GCC Development, GNU C Library, qemu-devel

Hello,

On Thu, 22 Jul 2021, Richard Biener via Gcc wrote:

> But how does TLS usage transfer between threads?  On the gimple level 
> the TLS pointer is not visible and thus we'd happily CSE its address:

Yes.  All take-address operations then need to be encoded explicitly with 
a non-CSE-able internal function (or so):

  &x --> __ifn_get_tls_addr(&x);

(&x in the argument just so that it's clear that it doesn't access the 
value at x and to get the current effects of address-taken marking of 
ADDR_EXPR).

(Or of course, ADDR_EXPR could be taken as unstable when applied to TLS 
decls).

Quite a big hammer IMHO.
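
The idea of a non-CSE-able address computation can be approximated at the source level today.  This is only a sketch under assumptions: get_x_address is a made-up helper, not the internal function proposed above, and it relies on the GCC extensions noinline and inline asm.  It reuses the foo/bar example from earlier in the thread.

```c
#include <assert.h>

static __thread int x[2];

/* Source-level approximation of a non-CSE-able address computation:
   noinline keeps the TLS access out of the caller, and the asm with a
   memory clobber keeps GCC from inferring that the function is const
   or pure, so repeated calls are not merged. */
__attribute__((__noinline__)) int *get_x_address(int i)
{
    __asm__ volatile ("" ::: "memory");
    return &x[i];
}

/* Stand-in for a call that might yield to another coroutine. */
void bar(int *p)
{
    assert(p != 0);
    *p += 1;
}

int *foo(int i)
{
    int *p = get_x_address(i);
    bar(p);
    return get_x_address(i);   /* recomputed after the call */
}
```

Every access pays for a function call, which is the performance cost mentioned at the start of the thread.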


Ciao,
Michael.


