All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v2 0/4] tls: add macros for coroutine-safe TLS variables
@ 2021-12-01 17:01 Stefan Hajnoczi
  2021-12-01 17:01 ` [RFC v2 1/4] " Stefan Hajnoczi
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2021-12-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, fweimer, thuth, Daniel Berrange, qemu-block,
	Richard Henderson, Stefan Hajnoczi, Paolo Bonzini, Fam Zheng,
	Warner Losh, sguelton

This patch series solves the coroutines TLS problem. Coroutines re-entered from
another thread sometimes see stale TLS values. This happens because compilers
may cache values across yield points, so a value from the previous thread will
be used when the coroutine is re-entered in another thread.

Serge Guelton developed a portable technique and Richard Henderson developed an
inline-friendly architecture-specific technique, see the first patch for
details.

I have audited all __thread variables in QEMU and converted those that can be
used from coroutines. Most actually look safe to me. This patch does not
include a checkpatch.pl rule that requires coroutine-tls.h usage, but perhaps
we should add one for block/ at least?

Stefan Hajnoczi (4):
  tls: add macros for coroutine-safe TLS variables
  util/async: replace __thread with QEMU TLS macros
  rcu: use coroutine TLS macros
  cpus: use coroutine TLS macros for iothread_locked

 include/qemu/coroutine-tls.h | 202 +++++++++++++++++++++++++++++++++++
 include/qemu/rcu.h           |   7 +-
 softmmu/cpus.c               |   8 +-
 tests/unit/rcutorture.c      |  10 +-
 tests/unit/test-rcu-list.c   |   4 +-
 util/async.c                 |  12 ++-
 util/rcu.c                   |  10 +-
 7 files changed, 229 insertions(+), 24 deletions(-)
 create mode 100644 include/qemu/coroutine-tls.h

-- 
2.33.1




^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC v2 1/4] tls: add macros for coroutine-safe TLS variables
  2021-12-01 17:01 [RFC v2 0/4] tls: add macros for coroutine-safe TLS variables Stefan Hajnoczi
@ 2021-12-01 17:01 ` Stefan Hajnoczi
  2021-12-01 18:24   ` Florian Weimer
  2021-12-02 14:44   ` Peter Maydell
  2021-12-01 17:01 ` [RFC v2 2/4] util/async: replace __thread with QEMU TLS macros Stefan Hajnoczi
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2021-12-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, fweimer, thuth, Daniel Berrange, qemu-block,
	Richard Henderson, Stefan Hajnoczi, Paolo Bonzini, Fam Zheng,
	Warner Losh, sguelton

Compiler optimizations can cache TLS values across coroutine yield
points, resulting in stale values from the previous thread when a
coroutine is re-entered by a new thread.

Serge Guelton developed an __attribute__((noinline)) wrapper and tested
it with clang and gcc. I formatted his idea according to QEMU's coding
style and wrote documentation.

Richard Henderson developed an alternative approach that can be inlined
by the compiler. This is included for architectures where we have inline
assembly that determines the address of a TLS variable.

These macros must be used instead of __thread from now on to prevent
coroutine TLS bugs. Here is an x86_64 TLS variable access before this patch:

  mov    %fs:-0x19c,%edx

And here is the same access using Richard's approach:

  rdfsbase %rax             # %fs contains the base address
  lea    -0x1a8(%rax),%rax  # -0x1a8 is the offset of our variable
  mov    0xc(%rax),%edx     # here we access the TLS variable via %rax

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1952483
Suggested-by: Serge Guelton <sguelton@redhat.com>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
Richard's suggested code used a MOV instruction on x86_64 but we need
LEA semantics. LEA doesn't support %fs so I switched to RDFSBASE+LEA.
Otherwise Richard's approach is unchanged.
---
 include/qemu/coroutine-tls.h | 202 +++++++++++++++++++++++++++++++++++
 1 file changed, 202 insertions(+)
 create mode 100644 include/qemu/coroutine-tls.h

diff --git a/include/qemu/coroutine-tls.h b/include/qemu/coroutine-tls.h
new file mode 100644
index 0000000000..3158f9c0eb
--- /dev/null
+++ b/include/qemu/coroutine-tls.h
@@ -0,0 +1,202 @@
+/*
+ * QEMU Thread Local Storage for coroutines
+ *
+ * Copyright Red Hat
+ *
+ * SPDX-License-Identifier: LGPL-2.1-or-later
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ * It is forbidden to access Thread Local Storage in coroutines because
+ * compiler optimizations may cause values to be cached across coroutine
+ * re-entry. Coroutines can run in more than one thread through the course of
+ * their life, leading bugs when stale TLS values from the wrong thread are
+ * used as a result of compiler optimization.
+ *
+ * An example is:
+ *
+ * ..code-block:: c
+ *   :caption: A coroutine that may see the wrong TLS value
+ *
+ *   static __thread AioContext *current_aio_context;
+ *   ...
+ *   static void coroutine_fn foo(void)
+ *   {
+ *       aio_notify(current_aio_context);
+ *       qemu_coroutine_yield();
+ *       aio_notify(current_aio_context); // <-- may be stale after yielding!
+ *   }
+ *
+ * This header provides macros for safely defining variables in Thread Local
+ * Storage:
+ *
+ * ..code-block:: c
+ *   :caption: A coroutine that safely uses TLS
+ *
+ *   QEMU_DEFINE_STATIC_CO_TLS(AioContext *, current_aio_context)
+ *   ...
+ *   static void coroutine_fn foo(void)
+ *   {
+ *       aio_notify(get_current_aio_context());
+ *       qemu_coroutine_yield();
+ *       aio_notify(get_current_aio_context()); // <-- safe
+ *   }
+ */
+
+#ifndef QEMU_COROUTINE_TLS_H
+#define QEMU_COROUTINE_TLS_H
+
+/*
+ * Two techniques are available to stop the compiler from caching TLS values:
+ * 1. Accessor functions with __attribute__((noinline)). This is portable but
+ *    prevents inlining optimizations.
+ * 2. TLS address-of implemented as asm volatile so it can be inlined safely.
+ *    This enables inlining optimizations but requires architecture-specific
+ *    inline assembly.
+ */
+#if defined(__aarch64__)
+#define QEMU_CO_TLS_ADDR(ret, var)                              \
+    asm volatile("mrs %0, tpidr_el0\n\t"                        \
+                 "add %0, %0, #:tprel_hi12:"#var", lsl #12\n\t" \
+                 "add %0, %0, #:tprel_lo12_nc:"#var             \
+                 : "=r"(ret))
+#elif defined(__powerpc64__)
+#define QEMU_CO_TLS_ADDR(ret, var)                              \
+    asm volatile("addis %0,13,"#var"@tprel@ha\n\t"              \
+                 "add   %0,%0,"#var"@tprel@l"                   \
+                 : "=r"(ret))
+#elif defined(__riscv)
+#define QEMU_CO_TLS_ADDR(ret, var)                              \
+    asm volatile("lui  %0,%%tprel_hi("#var")\n\t"               \
+                 "add  %0,%0,%%tprel_add("#var")\n\t"           \
+                 "addi %0,%0,%%tprel_lo("#var")"                \
+                 : "=r"(ret))
+#elif defined(__x86_64__)
+#define QEMU_CO_TLS_ADDR(ret, var)                              \
+    asm volatile("rdfsbase %0\n\t"                              \
+                 "lea "#var"@tpoff(%0), %0" : "=r"(ret))
+#endif
+
+/**
+ * QEMU_DECLARE_CO_TLS:
+ * @type: the variable's C type
+ * @var: the variable name
+ *
+ * Declare an extern variable in Thread Local Storage from a header file:
+ *
+ * .. code-block:: c
+ *   :caption: Declaring an extern variable in Thread Local Storage
+ *
+ *   QEMU_DECLARE_CO_TLS(int, my_count)
+ *   ...
+ *   int c = get_my_count();
+ *   set_my_count(c + 1);
+ *   *get_ptr_my_count() = 0;
+ *
+ * Use this instead of:
+ *
+ * .. code-block:: c
+ *   :caption: Declaring a TLS variable using __thread
+ *
+ *   extern __thread int my_count;
+ *   ...
+ *   int c = my_count;
+ *   my_count = c + 1;
+ *   *(&my_count) = 0;
+ */
+#ifdef QEMU_CO_TLS_ADDR
+#define QEMU_DECLARE_CO_TLS(type, var)                          \
+    extern __thread type co_tls_##var;                          \
+    static inline type get_##var(void)                          \
+    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); return *p; }  \
+    static inline void set_##var(type v)                        \
+    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); *p = v; }     \
+    static inline type *get_ptr_##var(void)                     \
+    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); return p; }
+#else
+#define QEMU_DECLARE_CO_TLS(type, var)                          \
+    __attribute__((noinline)) type get_##var(void);             \
+    __attribute__((noinline)) void set_##var(type v);           \
+    __attribute__((noinline)) type *get_ptr_##var(void);
+#endif
+
+/**
+ * QEMU_DEFINE_CO_TLS:
+ * @type: the variable's C type
+ * @var: the variable name
+ *
+ * Define an variable in Thread Local Storage that was previously declared from
+ * a header file with QEMU_DECLARE_CO_TLS():
+ *
+ * .. code-block:: c
+ *   :caption: Defining a variable in Thread Local Storage
+ *
+ *   QEMU_DEFINE_CO_TLS(int, my_count)
+ *
+ * Use this instead of:
+ *
+ * .. code-block:: c
+ *   :caption: Defining a TLS variable using __thread
+ *
+ *   __thread int my_count;
+ */
+#ifdef QEMU_CO_TLS_ADDR
+#define QEMU_DEFINE_CO_TLS(type, var)                           \
+    __thread type co_tls_##var;
+#else
+#define QEMU_DEFINE_CO_TLS(type, var)                           \
+    static __thread type co_tls_##var;                          \
+    type get_##var(void) { return co_tls_##var; }               \
+    void set_##var(type v) { co_tls_##var = v; }                \
+    type *get_ptr_##var(void) { return &co_tls_##var; }
+#endif
+
+/**
+ * QEMU_DEFINE_STATIC_CO_TLS:
+ * @type: the variable's C type
+ * @var: the variable name
+ *
+ * Define a static variable in Thread Local Storage:
+ *
+ * .. code-block:: c
+ *   :caption: Defining a static variable in Thread Local Storage
+ *
+ *   QEMU_DEFINE_STATIC_CO_TLS(int, my_count)
+ *   ...
+ *   int c = get_my_count();
+ *   set_my_count(c + 1);
+ *   *get_ptr_my_count() = 0;
+ *
+ * Use this instead of:
+ *
+ * .. code-block:: c
+ *   :caption: Defining a static TLS variable using __thread
+ *
+ *   static __thread int my_count;
+ *   ...
+ *   int c = my_count;
+ *   my_count = c + 1;
+ *   *(&my_count) = 0;
+ */
+#ifdef QEMU_CO_TLS_ADDR
+#define QEMU_DEFINE_STATIC_CO_TLS(type, var)                    \
+    __thread type co_tls_##var;  \
+    static inline type get_##var(void)                          \
+    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); return *p; }  \
+    static inline void set_##var(type v)                        \
+    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); *p = v; }     \
+    static inline type *get_ptr_##var(void)                     \
+    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); return p; }
+#else
+#define QEMU_DEFINE_STATIC_CO_TLS(type, var)                    \
+    static __thread type co_tls_##var;                          \
+    static __attribute__((noinline, unused)) type get_##var(void)       \
+    { return co_tls_##var; }                                    \
+    static __attribute__((noinline, unused)) void set_##var(type v)     \
+    { co_tls_##var = v; }                                       \
+    static __attribute__((noinline, unused)) type *get_ptr_##var(void)  \
+    { return &co_tls_##var; }
+#endif
+
+#endif /* QEMU_COROUTINE_TLS_H */
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC v2 2/4] util/async: replace __thread with QEMU TLS macros
  2021-12-01 17:01 [RFC v2 0/4] tls: add macros for coroutine-safe TLS variables Stefan Hajnoczi
  2021-12-01 17:01 ` [RFC v2 1/4] " Stefan Hajnoczi
@ 2021-12-01 17:01 ` Stefan Hajnoczi
  2021-12-01 17:01 ` [RFC v2 3/4] rcu: use coroutine " Stefan Hajnoczi
  2021-12-01 17:01 ` [RFC v2 4/4] cpus: use coroutine TLS macros for iothread_locked Stefan Hajnoczi
  3 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2021-12-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, fweimer, thuth, Daniel Berrange, qemu-block,
	Richard Henderson, Stefan Hajnoczi, Paolo Bonzini, Fam Zheng,
	Warner Losh, sguelton

QEMU TLS macros must be used to make TLS variables safe with coroutines.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/async.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/util/async.c b/util/async.c
index 6f6717a34b..ddd9f24419 100644
--- a/util/async.c
+++ b/util/async.c
@@ -32,6 +32,7 @@
 #include "qemu/rcu_queue.h"
 #include "block/raw-aio.h"
 #include "qemu/coroutine_int.h"
+#include "qemu/coroutine-tls.h"
 #include "trace.h"
 
 /***********************************************************/
@@ -669,12 +670,13 @@ void aio_context_release(AioContext *ctx)
     qemu_rec_mutex_unlock(&ctx->lock);
 }
 
-static __thread AioContext *my_aiocontext;
+QEMU_DEFINE_STATIC_CO_TLS(AioContext *, my_aiocontext)
 
 AioContext *qemu_get_current_aio_context(void)
 {
-    if (my_aiocontext) {
-        return my_aiocontext;
+    AioContext *ctx = get_my_aiocontext();
+    if (ctx) {
+        return ctx;
     }
     if (qemu_mutex_iothread_locked()) {
         /* Possibly in a vCPU thread.  */
@@ -685,6 +687,6 @@ AioContext *qemu_get_current_aio_context(void)
 
 void qemu_set_current_aio_context(AioContext *ctx)
 {
-    assert(!my_aiocontext);
-    my_aiocontext = ctx;
+    assert(!get_my_aiocontext());
+    set_my_aiocontext(ctx);
 }
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC v2 3/4] rcu: use coroutine TLS macros
  2021-12-01 17:01 [RFC v2 0/4] tls: add macros for coroutine-safe TLS variables Stefan Hajnoczi
  2021-12-01 17:01 ` [RFC v2 1/4] " Stefan Hajnoczi
  2021-12-01 17:01 ` [RFC v2 2/4] util/async: replace __thread with QEMU TLS macros Stefan Hajnoczi
@ 2021-12-01 17:01 ` Stefan Hajnoczi
  2021-12-01 17:01 ` [RFC v2 4/4] cpus: use coroutine TLS macros for iothread_locked Stefan Hajnoczi
  3 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2021-12-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, fweimer, thuth, Daniel Berrange, qemu-block,
	Richard Henderson, Stefan Hajnoczi, Paolo Bonzini, Fam Zheng,
	Warner Losh, sguelton

RCU may be used from coroutines. Standard __thread variables cannot be
used by coroutines. Use the coroutine TLS macros instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/rcu.h         |  7 ++++---
 tests/unit/rcutorture.c    | 10 +++++-----
 tests/unit/test-rcu-list.c |  4 ++--
 util/rcu.c                 | 10 +++++-----
 4 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index e69efbd47f..b063c6fde8 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -29,6 +29,7 @@
 #include "qemu/atomic.h"
 #include "qemu/notify.h"
 #include "qemu/sys_membarrier.h"
+#include "qemu/coroutine-tls.h"
 
 #ifdef __cplusplus
 extern "C" {
@@ -76,11 +77,11 @@ struct rcu_reader_data {
     NotifierList force_rcu;
 };
 
-extern __thread struct rcu_reader_data rcu_reader;
+QEMU_DECLARE_CO_TLS(struct rcu_reader_data, rcu_reader)
 
 static inline void rcu_read_lock(void)
 {
-    struct rcu_reader_data *p_rcu_reader = &rcu_reader;
+    struct rcu_reader_data *p_rcu_reader = get_ptr_rcu_reader();
     unsigned ctr;
 
     if (p_rcu_reader->depth++ > 0) {
@@ -96,7 +97,7 @@ static inline void rcu_read_lock(void)
 
 static inline void rcu_read_unlock(void)
 {
-    struct rcu_reader_data *p_rcu_reader = &rcu_reader;
+    struct rcu_reader_data *p_rcu_reader = get_ptr_rcu_reader();
 
     assert(p_rcu_reader->depth != 0);
     if (--p_rcu_reader->depth > 0) {
diff --git a/tests/unit/rcutorture.c b/tests/unit/rcutorture.c
index de6f649058..495a4e6f42 100644
--- a/tests/unit/rcutorture.c
+++ b/tests/unit/rcutorture.c
@@ -122,7 +122,7 @@ static void *rcu_read_perf_test(void *arg)
 
     rcu_register_thread();
 
-    *(struct rcu_reader_data **)arg = &rcu_reader;
+    *(struct rcu_reader_data **)arg = get_ptr_rcu_reader();
     qatomic_inc(&nthreadsrunning);
     while (goflag == GOFLAG_INIT) {
         g_usleep(1000);
@@ -148,7 +148,7 @@ static void *rcu_update_perf_test(void *arg)
 
     rcu_register_thread();
 
-    *(struct rcu_reader_data **)arg = &rcu_reader;
+    *(struct rcu_reader_data **)arg = get_ptr_rcu_reader();
     qatomic_inc(&nthreadsrunning);
     while (goflag == GOFLAG_INIT) {
         g_usleep(1000);
@@ -253,7 +253,7 @@ static void *rcu_read_stress_test(void *arg)
 
     rcu_register_thread();
 
-    *(struct rcu_reader_data **)arg = &rcu_reader;
+    *(struct rcu_reader_data **)arg = get_ptr_rcu_reader();
     while (goflag == GOFLAG_INIT) {
         g_usleep(1000);
     }
@@ -304,7 +304,7 @@ static void *rcu_update_stress_test(void *arg)
     struct rcu_stress *cp = qatomic_read(&rcu_stress_current);
 
     rcu_register_thread();
-    *(struct rcu_reader_data **)arg = &rcu_reader;
+    *(struct rcu_reader_data **)arg = get_ptr_rcu_reader();
 
     while (goflag == GOFLAG_INIT) {
         g_usleep(1000);
@@ -347,7 +347,7 @@ static void *rcu_fake_update_stress_test(void *arg)
 {
     rcu_register_thread();
 
-    *(struct rcu_reader_data **)arg = &rcu_reader;
+    *(struct rcu_reader_data **)arg = get_ptr_rcu_reader();
     while (goflag == GOFLAG_INIT) {
         g_usleep(1000);
     }
diff --git a/tests/unit/test-rcu-list.c b/tests/unit/test-rcu-list.c
index 49641e1936..64b81ae058 100644
--- a/tests/unit/test-rcu-list.c
+++ b/tests/unit/test-rcu-list.c
@@ -171,7 +171,7 @@ static void *rcu_q_reader(void *arg)
 
     rcu_register_thread();
 
-    *(struct rcu_reader_data **)arg = &rcu_reader;
+    *(struct rcu_reader_data **)arg = get_ptr_rcu_reader();
     qatomic_inc(&nthreadsrunning);
     while (qatomic_read(&goflag) == GOFLAG_INIT) {
         g_usleep(1000);
@@ -206,7 +206,7 @@ static void *rcu_q_updater(void *arg)
     long long n_removed_local = 0;
     struct list_element *el, *prev_el;
 
-    *(struct rcu_reader_data **)arg = &rcu_reader;
+    *(struct rcu_reader_data **)arg = get_ptr_rcu_reader();
     qatomic_inc(&nthreadsrunning);
     while (qatomic_read(&goflag) == GOFLAG_INIT) {
         g_usleep(1000);
diff --git a/util/rcu.c b/util/rcu.c
index c91da9f137..b6d6c71cff 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -65,7 +65,7 @@ static inline int rcu_gp_ongoing(unsigned long *ctr)
 /* Written to only by each individual reader. Read by both the reader and the
  * writers.
  */
-__thread struct rcu_reader_data rcu_reader;
+QEMU_DEFINE_CO_TLS(struct rcu_reader_data, rcu_reader)
 
 /* Protected by rcu_registry_lock.  */
 typedef QLIST_HEAD(, rcu_reader_data) ThreadList;
@@ -355,23 +355,23 @@ void drain_call_rcu(void)
 
 void rcu_register_thread(void)
 {
-    assert(rcu_reader.ctr == 0);
+    assert(get_ptr_rcu_reader()->ctr == 0);
     qemu_mutex_lock(&rcu_registry_lock);
-    QLIST_INSERT_HEAD(&registry, &rcu_reader, node);
+    QLIST_INSERT_HEAD(&registry, get_ptr_rcu_reader(), node);
     qemu_mutex_unlock(&rcu_registry_lock);
 }
 
 void rcu_unregister_thread(void)
 {
     qemu_mutex_lock(&rcu_registry_lock);
-    QLIST_REMOVE(&rcu_reader, node);
+    QLIST_REMOVE(get_ptr_rcu_reader(), node);
     qemu_mutex_unlock(&rcu_registry_lock);
 }
 
 void rcu_add_force_rcu_notifier(Notifier *n)
 {
     qemu_mutex_lock(&rcu_registry_lock);
-    notifier_list_add(&rcu_reader.force_rcu, n);
+    notifier_list_add(&get_ptr_rcu_reader()->force_rcu, n);
     qemu_mutex_unlock(&rcu_registry_lock);
 }
 
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC v2 4/4] cpus: use coroutine TLS macros for iothread_locked
  2021-12-01 17:01 [RFC v2 0/4] tls: add macros for coroutine-safe TLS variables Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2021-12-01 17:01 ` [RFC v2 3/4] rcu: use coroutine " Stefan Hajnoczi
@ 2021-12-01 17:01 ` Stefan Hajnoczi
  3 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2021-12-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, fweimer, thuth, Daniel Berrange, qemu-block,
	Richard Henderson, Stefan Hajnoczi, Paolo Bonzini, Fam Zheng,
	Warner Losh, sguelton

qemu_mutex_iothread_locked() may be used from coroutines. Standard
__thread variables cannot be used by coroutines. Use the coroutine TLS
macros instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 softmmu/cpus.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 071085f840..b02d3211e1 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -474,11 +474,11 @@ bool qemu_in_vcpu_thread(void)
     return current_cpu && qemu_cpu_is_self(current_cpu);
 }
 
-static __thread bool iothread_locked = false;
+QEMU_DEFINE_STATIC_CO_TLS(bool, iothread_locked)
 
 bool qemu_mutex_iothread_locked(void)
 {
-    return iothread_locked;
+    return get_iothread_locked();
 }
 
 /*
@@ -491,13 +491,13 @@ void qemu_mutex_lock_iothread_impl(const char *file, int line)
 
     g_assert(!qemu_mutex_iothread_locked());
     bql_lock(&qemu_global_mutex, file, line);
-    iothread_locked = true;
+    set_iothread_locked(true);
 }
 
 void qemu_mutex_unlock_iothread(void)
 {
     g_assert(qemu_mutex_iothread_locked());
-    iothread_locked = false;
+    set_iothread_locked(false);
     qemu_mutex_unlock(&qemu_global_mutex);
 }
 
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC v2 1/4] tls: add macros for coroutine-safe TLS variables
  2021-12-01 17:01 ` [RFC v2 1/4] " Stefan Hajnoczi
@ 2021-12-01 18:24   ` Florian Weimer
  2021-12-02  9:53     ` Stefan Hajnoczi
  2021-12-02 14:44   ` Peter Maydell
  1 sibling, 1 reply; 11+ messages in thread
From: Florian Weimer @ 2021-12-01 18:24 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, thuth, Daniel Berrange, qemu-block,
	Richard Henderson, qemu-devel, Paolo Bonzini, Warner Losh,
	sguelton

* Stefan Hajnoczi:

> +#elif defined(__x86_64__)
> +#define QEMU_CO_TLS_ADDR(ret, var)                              \
> +    asm volatile("rdfsbase %0\n\t"                              \
> +                 "lea "#var"@tpoff(%0), %0" : "=r"(ret))
> +#endif

RDFSBASE needs quite recent kernels.  I think you should use

  movq %%fs:0, %0

instead, which is equivalent for the x86-64 psABI.

Thanks,
Florian



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC v2 1/4] tls: add macros for coroutine-safe TLS variables
  2021-12-01 18:24   ` Florian Weimer
@ 2021-12-02  9:53     ` Stefan Hajnoczi
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2021-12-02  9:53 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Kevin Wolf, Fam Zheng, thuth, Daniel Berrange, qemu-block,
	Richard Henderson, qemu-devel, Paolo Bonzini, Warner Losh,
	sguelton

[-- Attachment #1: Type: text/plain, Size: 658 bytes --]

On Wed, Dec 01, 2021 at 07:24:33PM +0100, Florian Weimer wrote:
> * Stefan Hajnoczi:
> 
> > +#elif defined(__x86_64__)
> > +#define QEMU_CO_TLS_ADDR(ret, var)                              \
> > +    asm volatile("rdfsbase %0\n\t"                              \
> > +                 "lea "#var"@tpoff(%0), %0" : "=r"(ret))
> > +#endif
> 
> RDFSBASE needs quite recent kernels.  I think you should use
> 
>   movq %%fs:0, %0
> 
> instead, which is equivalent for the x86-64 psABI.

Nice trick! I remember reading that the address of the thread data is
stored in the first element of the thread data itself, so this makes
sense :).

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC v2 1/4] tls: add macros for coroutine-safe TLS variables
  2021-12-01 17:01 ` [RFC v2 1/4] " Stefan Hajnoczi
  2021-12-01 18:24   ` Florian Weimer
@ 2021-12-02 14:44   ` Peter Maydell
  2021-12-02 14:50     ` Peter Maydell
  2021-12-03  6:24     ` Serge Guelton
  1 sibling, 2 replies; 11+ messages in thread
From: Peter Maydell @ 2021-12-02 14:44 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, fweimer, thuth, Daniel Berrange, qemu-block,
	Richard Henderson, qemu-devel, Paolo Bonzini, Fam Zheng,
	Warner Losh, sguelton

On Wed, 1 Dec 2021 at 17:19, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> Compiler optimizations can cache TLS values across coroutine yield
> points, resulting in stale values from the previous thread when a
> coroutine is re-entered by a new thread.
>
> Serge Guelton developed an __attribute__((noinline)) wrapper and tested
> it with clang and gcc. I formatted his idea according to QEMU's coding
> style and wrote documentation.

> +#ifdef QEMU_CO_TLS_ADDR
> +#define QEMU_DEFINE_STATIC_CO_TLS(type, var)                    \
> +    __thread type co_tls_##var;  \
> +    static inline type get_##var(void)                          \
> +    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); return *p; }  \
> +    static inline void set_##var(type v)                        \
> +    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); *p = v; }     \
> +    static inline type *get_ptr_##var(void)                     \
> +    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); return p; }
> +#else
> +#define QEMU_DEFINE_STATIC_CO_TLS(type, var)                    \
> +    static __thread type co_tls_##var;                          \
> +    static __attribute__((noinline, unused)) type get_##var(void)       \
> +    { return co_tls_##var; }                                    \
> +    static __attribute__((noinline, unused)) void set_##var(type v)     \
> +    { co_tls_##var = v; }                                       \
> +    static __attribute__((noinline, unused)) type *get_ptr_##var(void)  \
> +    { return &co_tls_##var; }
> +#endif

My compiler-developer colleagues present the following case where
'noinline' is not sufficient for the compiler to definitely
use different values of the address-of-the-TLS-var across an
intervening function call:

  __thread int i;

  __attribute__((noinline)) long get_ptr_i()
  {
    return (long)&i;
  }

  void switcher();

  int g()
  {
    long a = get_ptr_i();
    switcher();
    return a == get_ptr_i();
  }

Trunk clang optimizes the comparison of the two pointers down
to "must be always true" even though they might not be if the
switcher() function has resulted in a change-of-thread:
  https://godbolt.org/z/hd67zh6jW

The 'optnone' attribute (clang-specific) seems to be able
to suppress this attribute. The equivalent on gcc may
(or may not) be 'noipa'.

-- PMM


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC v2 1/4] tls: add macros for coroutine-safe TLS variables
  2021-12-02 14:44   ` Peter Maydell
@ 2021-12-02 14:50     ` Peter Maydell
  2021-12-02 14:57       ` Florian Weimer
  2021-12-03  6:24     ` Serge Guelton
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Maydell @ 2021-12-02 14:50 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, fweimer, thuth, Daniel Berrange, qemu-block,
	Richard Henderson, qemu-devel, Paolo Bonzini, Fam Zheng,
	Warner Losh, sguelton

On Thu, 2 Dec 2021 at 14:44, Peter Maydell <peter.maydell@linaro.org> wrote:
> My compiler-developer colleagues present the following case where
> 'noinline' is not sufficient for the compiler to definitely
> use different values of the address-of-the-TLS-var across an
> intervening function call:
>
>   __thread int i;
>
>   __attribute__((noinline)) long get_ptr_i()
>   {
>     return (long)&i;
>   }
>
>   void switcher();
>
>   int g()
>   {
>     long a = get_ptr_i();
>     switcher();
>     return a == get_ptr_i();
>   }
>
> Trunk clang optimizes the comparison of the two pointers down
> to "must be always true" even though they might not be if the
> switcher() function has resulted in a change-of-thread:
>   https://godbolt.org/z/hd67zh6jW
>
> The 'optnone' attribute (clang-specific) seems to be able
> to suppress this

...no it doesn't -- although the get_ptr_i() function is
called both before and after, its return value is ignored
and g() always still returns 'true'.

-- PMM


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC v2 1/4] tls: add macros for coroutine-safe TLS variables
  2021-12-02 14:50     ` Peter Maydell
@ 2021-12-02 14:57       ` Florian Weimer
  0 siblings, 0 replies; 11+ messages in thread
From: Florian Weimer @ 2021-12-02 14:57 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Kevin Wolf, Fam Zheng, thuth, Daniel Berrange, qemu-block,
	Richard Henderson, qemu-devel, Stefan Hajnoczi, Paolo Bonzini,
	Warner Losh, sguelton

* Peter Maydell:

> On Thu, 2 Dec 2021 at 14:44, Peter Maydell <peter.maydell@linaro.org> wrote:
>> My compiler-developer colleagues present the following case where
>> 'noinline' is not sufficient for the compiler to definitely
>> use different values of the address-of-the-TLS-var across an
>> intervening function call:
>>
>>   __thread int i;
>>
>>   __attribute__((noinline)) long get_ptr_i()
>>   {
>>     return (long)&i;
>>   }
>>
>>   void switcher();
>>
>>   int g()
>>   {
>>     long a = get_ptr_i();
>>     switcher();
>>     return a == get_ptr_i();
>>   }
>>
>> Trunk clang optimizes the comparison of the two pointers down
>> to "must be always true" even though they might not be if the
>> switcher() function has resulted in a change-of-thread:
>>   https://godbolt.org/z/hd67zh6jW
>>
>> The 'optnone' attribute (clang-specific) seems to be able
>> to suppress this
>
> ...no it doesn't -- although the get_ptr_i() function is
> called both before and after, its return value is ignored
> and g() always still returns 'true'.

__attribute__ ((weak)) would mark get_ptr_i as interposable and should
act as an optimization barrier for any compiler.  Unless the compiler
defaults -fno-semantic-interposition *and* feels very, very strongly
about its meaning (clang 13 doesn't seem to, despite the
-fno-semantic-interposition default even with -fpic).

Thanks,
Florian



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC v2 1/4] tls: add macros for coroutine-safe TLS variables
  2021-12-02 14:44   ` Peter Maydell
  2021-12-02 14:50     ` Peter Maydell
@ 2021-12-03  6:24     ` Serge Guelton
  1 sibling, 0 replies; 11+ messages in thread
From: Serge Guelton @ 2021-12-03  6:24 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Kevin Wolf, fweimer, thuth, Daniel Berrange, qemu-block,
	Richard Henderson, qemu-devel, Stefan Hajnoczi, Paolo Bonzini,
	Fam Zheng, Warner Losh

On Thu, Dec 02, 2021 at 02:44:42PM +0000, Peter Maydell wrote:
> On Wed, 1 Dec 2021 at 17:19, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > Compiler optimizations can cache TLS values across coroutine yield
> > points, resulting in stale values from the previous thread when a
> > coroutine is re-entered by a new thread.
> >
> > Serge Guelton developed an __attribute__((noinline)) wrapper and tested
> > it with clang and gcc. I formatted his idea according to QEMU's coding
> > style and wrote documentation.
> 
> > +#ifdef QEMU_CO_TLS_ADDR
> > +#define QEMU_DEFINE_STATIC_CO_TLS(type, var)                    \
> > +    __thread type co_tls_##var;  \
> > +    static inline type get_##var(void)                          \
> > +    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); return *p; }  \
> > +    static inline void set_##var(type v)                        \
> > +    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); *p = v; }     \
> > +    static inline type *get_ptr_##var(void)                     \
> > +    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); return p; }
> > +#else
> > +#define QEMU_DEFINE_STATIC_CO_TLS(type, var)                    \
> > +    static __thread type co_tls_##var;                          \
> > +    static __attribute__((noinline, unused)) type get_##var(void)       \
> > +    { return co_tls_##var; }                                    \
> > +    static __attribute__((noinline, unused)) void set_##var(type v)     \
> > +    { co_tls_##var = v; }                                       \
> > +    static __attribute__((noinline, unused)) type *get_ptr_##var(void)  \
> > +    { return &co_tls_##var; }
> > +#endif
> 
> My compiler-developer colleagues present the following case where
> 'noinline' is not sufficient for the compiler to definitely
> use different values of the address-of-the-TLS-var across an
> intervening function call:
> 
>   __thread int i;
> 
>   __attribute__((noinline)) long get_ptr_i()
>   {
>     return (long)&i;
>   }
> 
>   void switcher();
> 
>   int g()
>   {
>     long a = get_ptr_i();
>     switcher();
>     return a == get_ptr_i();
>   }

You can also force an extra mov through `volatile` as in
https://godbolt.org/z/hWvdb7o9G


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-12-03  6:26 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-01 17:01 [RFC v2 0/4] tls: add macros for coroutine-safe TLS variables Stefan Hajnoczi
2021-12-01 17:01 ` [RFC v2 1/4] " Stefan Hajnoczi
2021-12-01 18:24   ` Florian Weimer
2021-12-02  9:53     ` Stefan Hajnoczi
2021-12-02 14:44   ` Peter Maydell
2021-12-02 14:50     ` Peter Maydell
2021-12-02 14:57       ` Florian Weimer
2021-12-03  6:24     ` Serge Guelton
2021-12-01 17:01 ` [RFC v2 2/4] util/async: replace __thread with QEMU TLS macros Stefan Hajnoczi
2021-12-01 17:01 ` [RFC v2 3/4] rcu: use coroutine " Stefan Hajnoczi
2021-12-01 17:01 ` [RFC v2 4/4] cpus: use coroutine TLS macros for iothread_locked Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.