Disabling TLS address caching to help QEMU on GNU/Linux

From: Florian Weimer @ 2021-07-20 14:52 UTC
  To: gcc, libc-alpha, qemu-devel

Currently, the GNU/Linux ABI does not really specify whether the thread
pointer (the address of the TCB) may change at a function boundary.

Traditionally, GCC assumes that the ABI allows caching addresses of
thread-local variables across function calls.  Such caching varies in
aggressiveness between targets, probably due to differences in the
choice of -mtls-dialect=gnu and -mtls-dialect=gnu2 as the default for
the targets.  (Caching with -mtls-dialect=gnu2 appears to be more
aggressive.)

In addition, glibc defines errno like this:

extern int *__errno_location (void) __attribute__ ((__const__));
#define errno (*__errno_location ())

The const attribute has the side effect of letting the compiler cache
the address of errno within the same stack frame.
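
To illustrate the effect, here is a rough sketch of the transformation
that the const attribute permits.  The helper maybe_yield is a
hypothetical stand-in for any call that could suspend a coroutine; it is
not a glibc or QEMU function, and the second function only approximates
what an optimizer may produce.

```c
#include <errno.h>

/* Hypothetical stand-in for a call that might suspend the current
   coroutine and resume it later, possibly on another thread.  Here it
   does nothing, so the example stays single-threaded.  */
static void
maybe_yield (void)
{
}

/* What the programmer writes: two separate uses of errno.  */
int
set_then_get (int value)
{
  errno = value;
  maybe_yield ();
  return errno;
}

/* Roughly what the compiler may turn it into: because
   __errno_location is declared const, one call can serve both uses,
   and the resulting pointer stays live across maybe_yield.  */
int
set_then_get_cached (int value)
{
  int *cached = &errno;   /* stands for a single __errno_location call */
  *cached = value;
  maybe_yield ();
  return *cached;         /* would be stale if the thread had changed */
}
```

On a single thread the two functions behave identically, which is why
the optimization is normally safe.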

With stackful coroutines, such address caching is only valid if
coroutines are only ever resumed on the same thread on which they were
suspended.  (The C++ coroutine implementation is not stackful and is not
affected by this at the ABI level.)  Historically, I think we took the
position that cross-thread resumption is undefined.  But the ABIs aren't
crystal-clear on this matter.
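
The underlying hazard is simply that each thread has its own copy of a
thread-local variable, at its own address.  A minimal sketch using POSIX
threads follows; the names tls_counter and tls_address_differs are made
up for this example, and it assumes a build with -pthread.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* One instance of this variable exists per thread.  */
static _Thread_local int tls_counter;

/* Thread entry point: store the address of this thread's tls_counter
   in the location passed by the creating thread.  */
static void *
report_tls_address (void *arg)
{
  *(uintptr_t *) arg = (uintptr_t) &tls_counter;
  return NULL;
}

/* Return 1 if a second thread sees a different address for
   tls_counter than the calling thread, 0 if the addresses match, and
   -1 if the helper thread could not be created or joined.  */
int
tls_address_differs (void)
{
  uintptr_t here = (uintptr_t) &tls_counter;
  uintptr_t there = 0;
  pthread_t thread;

  if (pthread_create (&thread, NULL, report_tls_address, &there) != 0)
    return -1;
  if (pthread_join (thread, NULL) != 0)
    return -1;
  return here != there;
}
```

A coroutine that computed &tls_counter on the first thread and was then
resumed on the second would keep using the first thread's copy.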

One important piece of software for GNU is QEMU (not just for GNU/Linux;
Hurd development also benefits from virtualization).  QEMU uses stackful
coroutines extensively.  Unfortunately, there are some hard-to-change
areas of the code where resumption happens across threads.  These
increasingly cause problems given more inlining, inter-procedural
analysis, and a general push towards LTO (which is also needed for some
security hardening features).

Should the GNU toolchain offer something to help out the QEMU
developers?  Maybe GCC could offer an option to disable the caching for
all TLS models.  glibc could detect that mode based on a new
preprocessor macro and adjust its __errno_location declaration and
similar function declarations.  This will have a performance impact, of
course, but it would make the QEMU usage well-defined (at the lowest
levels).
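
As a sketch of what such a header adjustment might look like: the macro
name HYPOTHETICAL_NO_TLS_ADDRESS_CACHING is invented here (no such GCC
macro exists today), and my_errno and my_errno_location are stand-ins
for the real glibc internals.

```c
/* Assumption: the compiler would predefine some macro when the
   proposed option is active.  The name used here is made up.  */
#if defined (HYPOTHETICAL_NO_TLS_ADDRESS_CACHING)
# define TLS_ADDRESS_ATTR   /* no const: re-evaluate on every use */
#else
# define TLS_ADDRESS_ATTR __attribute__ ((__const__))
#endif

/* Stand-in for the per-thread errno storage.  */
static _Thread_local int my_errno;

/* Stand-in for __errno_location; const only when caching is allowed.  */
int *my_errno_location (void) TLS_ADDRESS_ATTR;

int *
my_errno_location (void)
{
  return &my_errno;
}

#define my_errno_value (*my_errno_location ())
```

Without the const attribute, the compiler has to call the function again
after any intervening call it cannot see through, which is where the
performance cost comes from.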

If this is a programming model that should be supported, then restoring
some of the optimizations would be possible, by annotating
context-switching functions and TLS-address-dependent functions.  But I
think QEMU would immediately benefit from just the simple approach that
disables address caching of TLS variables.

Thanks,
Florian
