* [Qemu-devel] [PATCH 0/6] i386 + x86_64 mttcg
@ 2018-09-03 17:18 Emilio G. Cota
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 1/6] qsp: drop atomics when using the seqlock Emilio G. Cota
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Emilio G. Cota @ 2018-09-03 17:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Eduardo Habkost, Alex Bennée

About 20 days ago I sent a series that allowed me to boot x86_64 with MTTCG:
  https://patchwork.kernel.org/cover/10564977/

Thanks to Paolo's work (already merged), we don't need to hold
the BQL when calling cpu_get_ticks, which makes the MTTCG conversion
even simpler. [ I have a couple more patches related to some of
that work, but I'll send them next week once Paolo is back ]

This series enables MTTCG for both x86_64 and i386. I tested both
with a buildroot image, and looked for races with Valgrind's helgrind.

The first four patches are not really necessary for i386 MTTCG,
but since they'll eventually go via Paolo's tree I included them here.

You can fetch this series from:
  https://github.com/cota/qemu/tree/i386-mttcg

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 1/6] qsp: drop atomics when using the seqlock
  2018-09-03 17:18 [Qemu-devel] [PATCH 0/6] i386 + x86_64 mttcg Emilio G. Cota
@ 2018-09-03 17:18 ` Emilio G. Cota
  2018-09-09 23:32   ` Paolo Bonzini
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 2/6] test-rcu-list: avoid torn accesses to n_reclaims and n_nodes_removed Emilio G. Cota
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Emilio G. Cota @ 2018-09-03 17:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Eduardo Habkost, Alex Bennée

Using atomics here is a mistake since they're not guaranteed
to compile.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 util/qsp.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/util/qsp.c b/util/qsp.c
index b0c2575d10..a1ee03b84b 100644
--- a/util/qsp.c
+++ b/util/qsp.c
@@ -351,6 +351,7 @@ static QSPEntry *qsp_entry_get(const void *obj, const char *file, int line,
 static void qsp_entry_aggregate(QSPEntry *to, const QSPEntry *from)
 {
 #ifdef CONFIG_ATOMIC64
+    /* use __nocheck because sizeof(void *) might be < sizeof(u64) */
     to->ns += atomic_read__nocheck(&from->ns);
     to->n_acqs += atomic_read__nocheck(&from->n_acqs);
 #else
@@ -359,8 +360,8 @@ static void qsp_entry_aggregate(QSPEntry *to, const QSPEntry *from)
 
     do {
         version = seqlock_read_begin(&from->sequence);
-        ns = atomic_read__nocheck(&from->ns);
-        n_acqs = atomic_read__nocheck(&from->n_acqs);
+        ns = from->ns;
+        n_acqs = from->n_acqs;
     } while (seqlock_read_retry(&from->sequence, version));
 
     to->ns += ns;
@@ -375,14 +376,17 @@ static void qsp_entry_aggregate(QSPEntry *to, const QSPEntry *from)
  */
 static inline void do_qsp_entry_record(QSPEntry *e, int64_t delta, bool acq)
 {
-#ifndef CONFIG_ATOMIC64
-    seqlock_write_begin(&e->sequence);
-#endif
+#ifdef CONFIG_ATOMIC64
     atomic_set__nocheck(&e->ns, e->ns + delta);
     if (acq) {
         atomic_set__nocheck(&e->n_acqs, e->n_acqs + 1);
     }
-#ifndef CONFIG_ATOMIC64
+#else
+    seqlock_write_begin(&e->sequence);
+    e->ns += delta;
+    if (acq) {
+        e->n_acqs++;
+    }
     seqlock_write_end(&e->sequence);
 #endif
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 2/6] test-rcu-list: avoid torn accesses to n_reclaims and n_nodes_removed
  2018-09-03 17:18 [Qemu-devel] [PATCH 0/6] i386 + x86_64 mttcg Emilio G. Cota
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 1/6] qsp: drop atomics when using the seqlock Emilio G. Cota
@ 2018-09-03 17:18 ` Emilio G. Cota
  2018-09-04 17:37   ` Murilo Opsfelder Araujo
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 3/6] atomic: fix comment s/x64_64/x86_64/ Emilio G. Cota
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Emilio G. Cota @ 2018-09-03 17:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Eduardo Habkost, Alex Bennée

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tests/test-rcu-list.c | 67 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 59 insertions(+), 8 deletions(-)

diff --git a/tests/test-rcu-list.c b/tests/test-rcu-list.c
index 192bfbf02e..2606b7c19d 100644
--- a/tests/test-rcu-list.c
+++ b/tests/test-rcu-list.c
@@ -25,6 +25,23 @@
 #include "qemu/rcu.h"
 #include "qemu/thread.h"
 #include "qemu/rcu_queue.h"
+#include "qemu/seqlock.h"
+
+/*
+ * Abstraction to avoid torn accesses when there is a single thread updating
+ * the count.
+ *
+ * If CONFIG_ATOMIC64 is defined, we simply use atomic accesses. Otherwise, we
+ * use a seqlock without a lock, since only one thread can update the count.
+ */
+struct Count {
+    long long val;
+#ifndef CONFIG_ATOMIC64
+    QemuSeqLock sequence;
+#endif
+};
+
+typedef struct Count Count;
 
 /*
  * Test variables.
@@ -33,8 +50,8 @@
 static QemuMutex counts_mutex;
 static long long n_reads = 0LL;
 static long long n_updates = 0LL;
-static long long n_reclaims = 0LL;
-static long long n_nodes_removed = 0LL;
+static Count n_reclaims;
+static Count n_nodes_removed;
 static long long n_nodes = 0LL;
 static int g_test_in_charge = 0;
 
@@ -60,6 +77,38 @@ static int select_random_el(int max)
     return (rand() % max);
 }
 
+static inline long long count_read(Count *c)
+{
+#ifdef CONFIG_ATOMIC64
+    /* use __nocheck because sizeof(void *) might be < sizeof(long long) */
+    return atomic_read__nocheck(&c->val);
+#else
+    unsigned int version;
+    long long val;
+
+    do {
+        version = seqlock_read_begin(&c->sequence);
+        val = c->val;
+    } while (seqlock_read_retry(&c->sequence, version));
+    return val;
+#endif
+}
+
+static inline void count_add(Count *c, long long val)
+{
+#ifdef CONFIG_ATOMIC64
+    atomic_set__nocheck(&c->val, c->val + val);
+#else
+    seqlock_write_begin(&c->sequence);
+    c->val += val;
+    seqlock_write_end(&c->sequence);
+#endif
+}
+
+static inline void count_inc(Count *c)
+{
+    count_add(c, 1);
+}
 
 static void create_thread(void *(*func)(void *))
 {
@@ -104,7 +153,7 @@ static void reclaim_list_el(struct rcu_head *prcu)
     struct list_element *el = container_of(prcu, struct list_element, rcu);
     g_free(el);
     /* Accessed only from call_rcu thread.  */
-    n_reclaims++;
+    count_inc(&n_reclaims);
 }
 
 #if TEST_LIST_TYPE == 1
@@ -232,7 +281,7 @@ static void *rcu_q_updater(void *arg)
     qemu_mutex_lock(&counts_mutex);
     n_nodes += n_nodes_local;
     n_updates += n_updates_local;
-    n_nodes_removed += n_removed_local;
+    count_add(&n_nodes_removed, n_removed_local);
     qemu_mutex_unlock(&counts_mutex);
     return NULL;
 }
@@ -286,19 +335,21 @@ static void rcu_qtest(const char *test, int duration, int nreaders)
         n_removed_local++;
     }
     qemu_mutex_lock(&counts_mutex);
-    n_nodes_removed += n_removed_local;
+    count_add(&n_nodes_removed, n_removed_local);
     qemu_mutex_unlock(&counts_mutex);
     synchronize_rcu();
-    while (n_nodes_removed > n_reclaims) {
+    while (count_read(&n_nodes_removed) > count_read(&n_reclaims)) {
         g_usleep(100);
         synchronize_rcu();
     }
     if (g_test_in_charge) {
-        g_assert_cmpint(n_nodes_removed, ==, n_reclaims);
+        g_assert_cmpint(count_read(&n_nodes_removed), ==,
+                        count_read(&n_reclaims));
     } else {
         printf("%s: %d readers; 1 updater; nodes read: "  \
                "%lld, nodes removed: %lld; nodes reclaimed: %lld\n",
-               test, nthreadsrunning - 1, n_reads, n_nodes_removed, n_reclaims);
+               test, nthreadsrunning - 1, n_reads,
+               count_read(&n_nodes_removed), count_read(&n_reclaims));
         exit(0);
     }
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 3/6] atomic: fix comment s/x64_64/x86_64/
  2018-09-03 17:18 [Qemu-devel] [PATCH 0/6] i386 + x86_64 mttcg Emilio G. Cota
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 1/6] qsp: drop atomics when using the seqlock Emilio G. Cota
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 2/6] test-rcu-list: avoid torn accesses to n_reclaims and n_nodes_removed Emilio G. Cota
@ 2018-09-03 17:18 ` Emilio G. Cota
  2018-09-10  9:12   ` Alex Bennée
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 4/6] cpus: initialize timers_state.vm_clock_lock Emilio G. Cota
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Emilio G. Cota @ 2018-09-03 17:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Eduardo Habkost, Alex Bennée

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/qemu/atomic.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index 9ed39effd3..de3e36f400 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -98,7 +98,7 @@
  * We'd prefer not want to pull in everything else TCG related, so handle
  * those few cases by hand.
  *
- * Note that x32 is fully detected with __x64_64__ + _ILP32, and that for
+ * Note that x32 is fully detected with __x86_64__ + _ILP32, and that for
  * Sparc we always force the use of sparcv9 in configure.
  */
 #if defined(__x86_64__) || defined(__sparc__)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 4/6] cpus: initialize timers_state.vm_clock_lock
  2018-09-03 17:18 [Qemu-devel] [PATCH 0/6] i386 + x86_64 mttcg Emilio G. Cota
                   ` (2 preceding siblings ...)
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 3/6] atomic: fix comment s/x64_64/x86_64/ Emilio G. Cota
@ 2018-09-03 17:18 ` Emilio G. Cota
  2018-09-10  9:13   ` Alex Bennée
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 5/6] target/i386/translate: use thread-local storage in !user-mode Emilio G. Cota
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Emilio G. Cota @ 2018-09-03 17:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Eduardo Habkost, Alex Bennée

We forgot to initialize the spinlock introduced in 94377115b2
("cpus: protect TimerState writes with a spinlock", 2018-08-23).
Fix it.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 cpus.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/cpus.c b/cpus.c
index 8ee6e5db93..ebc13bac2d 100644
--- a/cpus.c
+++ b/cpus.c
@@ -823,6 +823,7 @@ int cpu_throttle_get_percentage(void)
 void cpu_ticks_init(void)
 {
     seqlock_init(&timers_state.vm_clock_seqlock);
+    qemu_spin_init(&timers_state.vm_clock_lock);
     vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
     throttle_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
                                            cpu_throttle_timer_tick, NULL);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 5/6] target/i386/translate: use thread-local storage in !user-mode
  2018-09-03 17:18 [Qemu-devel] [PATCH 0/6] i386 + x86_64 mttcg Emilio G. Cota
                   ` (3 preceding siblings ...)
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 4/6] cpus: initialize timers_state.vm_clock_lock Emilio G. Cota
@ 2018-09-03 17:18 ` Emilio G. Cota
  2018-09-10  9:17   ` Alex Bennée
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 6/6] configure: enable mttcg for i386 and x86_64 Emilio G. Cota
  2018-09-11 11:25 ` [Qemu-devel] [PATCH 0/6] i386 + x86_64 mttcg Paolo Bonzini
  6 siblings, 1 reply; 23+ messages in thread
From: Emilio G. Cota @ 2018-09-03 17:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Eduardo Habkost, Alex Bennée

Needed for MTTCG.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/i386/translate.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 1f9d1d9b24..9a6a72e205 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -71,26 +71,34 @@
 
 //#define MACRO_TEST   1
 
+/* we need thread-local storage for mttcg */
+#ifdef CONFIG_USER_ONLY
+#define I386_THREAD
+#else
+#define I386_THREAD __thread
+#endif
+
 /* global register indexes */
-static TCGv cpu_A0;
-static TCGv cpu_cc_dst, cpu_cc_src, cpu_cc_src2, cpu_cc_srcT;
+static TCGv cpu_cc_dst, cpu_cc_src, cpu_cc_src2;
 static TCGv_i32 cpu_cc_op;
 static TCGv cpu_regs[CPU_NB_REGS];
 static TCGv cpu_seg_base[6];
 static TCGv_i64 cpu_bndl[4];
 static TCGv_i64 cpu_bndu[4];
 /* local temps */
-static TCGv cpu_T0, cpu_T1;
+static I386_THREAD TCGv cpu_cc_srcT;
+static I386_THREAD TCGv cpu_A0;
+static I386_THREAD TCGv cpu_T0, cpu_T1;
 /* local register indexes (only used inside old micro ops) */
-static TCGv cpu_tmp0, cpu_tmp4;
-static TCGv_ptr cpu_ptr0, cpu_ptr1;
-static TCGv_i32 cpu_tmp2_i32, cpu_tmp3_i32;
-static TCGv_i64 cpu_tmp1_i64;
+static I386_THREAD TCGv cpu_tmp0, cpu_tmp4;
+static I386_THREAD TCGv_ptr cpu_ptr0, cpu_ptr1;
+static I386_THREAD TCGv_i32 cpu_tmp2_i32, cpu_tmp3_i32;
+static I386_THREAD TCGv_i64 cpu_tmp1_i64;
 
 #include "exec/gen-icount.h"
 
 #ifdef TARGET_X86_64
-static int x86_64_hregs;
+static I386_THREAD int x86_64_hregs;
 #endif
 
 typedef struct DisasContext {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 6/6] configure: enable mttcg for i386 and x86_64
  2018-09-03 17:18 [Qemu-devel] [PATCH 0/6] i386 + x86_64 mttcg Emilio G. Cota
                   ` (4 preceding siblings ...)
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 5/6] target/i386/translate: use thread-local storage in !user-mode Emilio G. Cota
@ 2018-09-03 17:18 ` Emilio G. Cota
  2018-09-11 11:25 ` [Qemu-devel] [PATCH 0/6] i386 + x86_64 mttcg Paolo Bonzini
  6 siblings, 0 replies; 23+ messages in thread
From: Emilio G. Cota @ 2018-09-03 17:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Eduardo Habkost, Alex Bennée

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 configure | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/configure b/configure
index 58862d2ae8..f715252c9f 100755
--- a/configure
+++ b/configure
@@ -7025,12 +7025,14 @@ TARGET_ABI_DIR=""
 
 case "$target_name" in
   i386)
+    mttcg="yes"
     gdb_xml_files="i386-32bit.xml i386-32bit-core.xml i386-32bit-sse.xml"
     target_compiler=$cross_cc_i386
     target_compiler_cflags=$cross_cc_ccflags_i386
   ;;
   x86_64)
     TARGET_BASE_ARCH=i386
+    mttcg="yes"
     gdb_xml_files="i386-64bit.xml i386-64bit-core.xml i386-64bit-sse.xml"
     target_compiler=$cross_cc_x86_64
   ;;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] test-rcu-list: avoid torn accesses to n_reclaims and n_nodes_removed
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 2/6] test-rcu-list: avoid torn accesses to n_reclaims and n_nodes_removed Emilio G. Cota
@ 2018-09-04 17:37   ` Murilo Opsfelder Araujo
  2018-09-04 19:35     ` Emilio G. Cota
  0 siblings, 1 reply; 23+ messages in thread
From: Murilo Opsfelder Araujo @ 2018-09-04 17:37 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Paolo Bonzini, Richard Henderson, Alex Bennée,
	Eduardo Habkost, Peter Crosthwaite

Hi, Emilio.

On Mon, Sep 03, 2018 at 01:18:27PM -0400, Emilio G. Cota wrote:
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  tests/test-rcu-list.c | 67 +++++++++++++++++++++++++++++++++++++------
>  1 file changed, 59 insertions(+), 8 deletions(-)
>
> diff --git a/tests/test-rcu-list.c b/tests/test-rcu-list.c
> index 192bfbf02e..2606b7c19d 100644
> --- a/tests/test-rcu-list.c
> +++ b/tests/test-rcu-list.c
> @@ -25,6 +25,23 @@
>  #include "qemu/rcu.h"
>  #include "qemu/thread.h"
>  #include "qemu/rcu_queue.h"
> +#include "qemu/seqlock.h"
> +
> +/*
> + * Abstraction to avoid torn accesses when there is a single thread updating
> + * the count.
> + *
> + * If CONFIG_ATOMIC64 is defined, we simply use atomic accesses. Otherwise, we
> + * use a seqlock without a lock, since only one thread can update the count.
> + */
> +struct Count {
> +    long long val;
> +#ifndef CONFIG_ATOMIC64
> +    QemuSeqLock sequence;
> +#endif
> +};
> +
> +typedef struct Count Count;
>
>  /*
>   * Test variables.
> @@ -33,8 +50,8 @@
>  static QemuMutex counts_mutex;
>  static long long n_reads = 0LL;
>  static long long n_updates = 0LL;
> -static long long n_reclaims = 0LL;
> -static long long n_nodes_removed = 0LL;
> +static Count n_reclaims;
> +static Count n_nodes_removed;

Don't we need to init the seqlocks?

  seqlock_init(&n_reclaims.sequence);
  seqlock_init(&n_nodes_removed.sequence);

Don't we need to init ->val with 0LL as well?

>  static long long n_nodes = 0LL;
>  static int g_test_in_charge = 0;
>
> @@ -60,6 +77,38 @@ static int select_random_el(int max)
>      return (rand() % max);
>  }
>
> +static inline long long count_read(Count *c)
> +{
> +#ifdef CONFIG_ATOMIC64
> +    /* use __nocheck because sizeof(void *) might be < sizeof(long long) */
> +    return atomic_read__nocheck(&c->val);
> +#else
> +    unsigned int version;
> +    long long val;
> +
> +    do {
> +        version = seqlock_read_begin(&c->sequence);
> +        val = c->val;
> +    } while (seqlock_read_retry(&c->sequence, version));
> +    return val;
> +#endif
> +}
> +
> +static inline void count_add(Count *c, long long val)
> +{
> +#ifdef CONFIG_ATOMIC64
> +    atomic_set__nocheck(&c->val, c->val + val);
> +#else
> +    seqlock_write_begin(&c->sequence);
> +    c->val += val;
> +    seqlock_write_end(&c->sequence);
> +#endif
> +}
> +
> +static inline void count_inc(Count *c)
> +{
> +    count_add(c, 1);
> +}

Are these `#ifdef CONFIG_ATOMIC64` required?

The bodies of

  seqlock_read_begin()
  seqlock_read_retry()
  seqlock_write_begin()
  seqlock_write_end()

in include/qemu/seqlock.h make me think that they already use atomic operations.
What am I missing?

>
>  static void create_thread(void *(*func)(void *))
>  {
> @@ -104,7 +153,7 @@ static void reclaim_list_el(struct rcu_head *prcu)
>      struct list_element *el = container_of(prcu, struct list_element, rcu);
>      g_free(el);
>      /* Accessed only from call_rcu thread.  */
> -    n_reclaims++;
> +    count_inc(&n_reclaims);
>  }
>
>  #if TEST_LIST_TYPE == 1
> @@ -232,7 +281,7 @@ static void *rcu_q_updater(void *arg)
>      qemu_mutex_lock(&counts_mutex);
>      n_nodes += n_nodes_local;
>      n_updates += n_updates_local;
> -    n_nodes_removed += n_removed_local;
> +    count_add(&n_nodes_removed, n_removed_local);

I'm just curious why n_nodes and n_updates don't need seqlocks.  Are
n_nodes_removed and n_reclaims somehow special, such that seqlocks are required?

>      qemu_mutex_unlock(&counts_mutex);
>      return NULL;
>  }
> @@ -286,19 +335,21 @@ static void rcu_qtest(const char *test, int duration, int nreaders)
>          n_removed_local++;
>      }
>      qemu_mutex_lock(&counts_mutex);
> -    n_nodes_removed += n_removed_local;
> +    count_add(&n_nodes_removed, n_removed_local);
>      qemu_mutex_unlock(&counts_mutex);

Does this count_add() need to be guarded by a mutex?

>      synchronize_rcu();
> -    while (n_nodes_removed > n_reclaims) {
> +    while (count_read(&n_nodes_removed) > count_read(&n_reclaims)) {
>          g_usleep(100);
>          synchronize_rcu();
>      }
>      if (g_test_in_charge) {
> -        g_assert_cmpint(n_nodes_removed, ==, n_reclaims);
> +        g_assert_cmpint(count_read(&n_nodes_removed), ==,
> +                        count_read(&n_reclaims));
>      } else {
>          printf("%s: %d readers; 1 updater; nodes read: "  \
>                 "%lld, nodes removed: %lld; nodes reclaimed: %lld\n",
> -               test, nthreadsrunning - 1, n_reads, n_nodes_removed, n_reclaims);
> +               test, nthreadsrunning - 1, n_reads,
> +               count_read(&n_nodes_removed), count_read(&n_reclaims));
>          exit(0);
>      }
>  }
> --
> 2.17.1
>
>

--
Murilo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] test-rcu-list: avoid torn accesses to n_reclaims and n_nodes_removed
  2018-09-04 17:37   ` Murilo Opsfelder Araujo
@ 2018-09-04 19:35     ` Emilio G. Cota
  2018-09-04 20:56       ` Murilo Opsfelder Araujo
  0 siblings, 1 reply; 23+ messages in thread
From: Emilio G. Cota @ 2018-09-04 19:35 UTC (permalink / raw)
  To: Murilo Opsfelder Araujo
  Cc: qemu-devel, Paolo Bonzini, Richard Henderson, Alex Bennée,
	Eduardo Habkost, Peter Crosthwaite

On Tue, Sep 04, 2018 at 14:37:34 -0300, Murilo Opsfelder Araujo wrote:
> Hi, Emilio.
> 
> On Mon, Sep 03, 2018 at 01:18:27PM -0400, Emilio G. Cota wrote:
> > Signed-off-by: Emilio G. Cota <cota@braap.org>
> > ---
> >  tests/test-rcu-list.c | 67 +++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 59 insertions(+), 8 deletions(-)
> >
> > diff --git a/tests/test-rcu-list.c b/tests/test-rcu-list.c
> > index 192bfbf02e..2606b7c19d 100644
> > --- a/tests/test-rcu-list.c
> > +++ b/tests/test-rcu-list.c
> > @@ -25,6 +25,23 @@
> >  #include "qemu/rcu.h"
> >  #include "qemu/thread.h"
> >  #include "qemu/rcu_queue.h"
> > +#include "qemu/seqlock.h"
> > +
> > +/*
> > + * Abstraction to avoid torn accesses when there is a single thread updating
> > + * the count.
> > + *
> > + * If CONFIG_ATOMIC64 is defined, we simply use atomic accesses. Otherwise, we
> > + * use a seqlock without a lock, since only one thread can update the count.
> > + */
> > +struct Count {
> > +    long long val;
> > +#ifndef CONFIG_ATOMIC64
> > +    QemuSeqLock sequence;
> > +#endif
> > +};
> > +
> > +typedef struct Count Count;
> >
> >  /*
> >   * Test variables.
> > @@ -33,8 +50,8 @@
> >  static QemuMutex counts_mutex;
> >  static long long n_reads = 0LL;
> >  static long long n_updates = 0LL;
> > -static long long n_reclaims = 0LL;
> > -static long long n_nodes_removed = 0LL;
> > +static Count n_reclaims;
> > +static Count n_nodes_removed;
> 
> Don't we need to init the seqlocks?
> 
>   seqlock_init(&n_reclaims.sequence);
>   seqlock_init(&n_nodes_removed.sequence);
> 
> Don't we need to init ->val with 0LL as well?

These are all zeroed out due to being static.

We could add seqlock_init calls just to be more clear, but seqlock_init
would not have any actual effect (it just sets the sequence to 0).

> >  static long long n_nodes = 0LL;
> >  static int g_test_in_charge = 0;
> >
> > @@ -60,6 +77,38 @@ static int select_random_el(int max)
> >      return (rand() % max);
> >  }
> >
> > +static inline long long count_read(Count *c)
> > +{
> > +#ifdef CONFIG_ATOMIC64
> > +    /* use __nocheck because sizeof(void *) might be < sizeof(long long) */
> > +    return atomic_read__nocheck(&c->val);
> > +#else
> > +    unsigned int version;
> > +    long long val;
> > +
> > +    do {
> > +        version = seqlock_read_begin(&c->sequence);
> > +        val = c->val;
> > +    } while (seqlock_read_retry(&c->sequence, version));
> > +    return val;
> > +#endif
> > +}
> > +
> > +static inline void count_add(Count *c, long long val)
> > +{
> > +#ifdef CONFIG_ATOMIC64
> > +    atomic_set__nocheck(&c->val, c->val + val);
> > +#else
> > +    seqlock_write_begin(&c->sequence);
> > +    c->val += val;
> > +    seqlock_write_end(&c->sequence);
> > +#endif
> > +}
> > +
> > +static inline void count_inc(Count *c)
> > +{
> > +    count_add(c, 1);
> > +}
> 
> Are these `#ifdef CONFIG_ATOMIC64` required?
> 
> The bodies of
> 
>   seqlock_read_begin()
>   seqlock_read_retry()
>   seqlock_write_begin()
>   seqlock_write_end()
> 
> in include/qemu/seqlock.h make me think that they already use atomic operations.
> What am I missing?

atomic_read/set(*foo), as defined in qemu/atomic.h, are as wide as
foo. The sequence number in a seqlock is an "unsigned", so those
atomics won't be larger than 32 bits.

The counts we're dealing with here are 64-bits, so with
#ifdef CONFIG_ATOMIC64 we ensure that the host can actually
perform those 64-bit atomic accesses.
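
To make that concrete, here is a standalone sketch of the property that
CONFIG_ATOMIC64 stands for. It uses the GCC/Clang __atomic builtins
directly rather than QEMU's macros, and it is not QEMU's actual
configure test, just an illustration:

    #include <stdint.h>
    #include <stdio.h>

    /* Probe whether the toolchain can do untorn 64-bit atomic loads and
     * stores; this is roughly what CONFIG_ATOMIC64 is meant to capture. */
    int main(void)
    {
        uint64_t v = 0;
        uint64_t r;

        __atomic_store_n(&v, 42, __ATOMIC_RELAXED);
        r = __atomic_load_n(&v, __ATOMIC_RELAXED);
        printf("64-bit atomics %s always lock-free here\n",
               __atomic_always_lock_free(sizeof(v), 0) ? "are" : "are not");
        return r == 42 ? 0 : 1;
    }

On 32-bit hosts without native 64-bit atomics, the 64-bit builtins above
either fail to link or fall back to a locked implementation, which is
the situation the seqlock fallback is there for.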

> >
> >  static void create_thread(void *(*func)(void *))
> >  {
> > @@ -104,7 +153,7 @@ static void reclaim_list_el(struct rcu_head *prcu)
> >      struct list_element *el = container_of(prcu, struct list_element, rcu);
> >      g_free(el);
> >      /* Accessed only from call_rcu thread.  */
> > -    n_reclaims++;
> > +    count_inc(&n_reclaims);
> >  }
> >
> >  #if TEST_LIST_TYPE == 1
> > @@ -232,7 +281,7 @@ static void *rcu_q_updater(void *arg)
> >      qemu_mutex_lock(&counts_mutex);
> >      n_nodes += n_nodes_local;
> >      n_updates += n_updates_local;
> > -    n_nodes_removed += n_removed_local;
> > +    count_add(&n_nodes_removed, n_removed_local);
> 
> I'm just curious why n_nodes and n_updates don't need seqlocks.  Are
> n_nodes_removed and n_reclaims somehow special, such that seqlocks are required?

n_nodes and n_updates are serialized by counts_mutex.

n_nodes_removed and n_reclaims don't really need the lock (even though
some accesses to it are protected by it; more on this below), since
they're updated by a single thread at a time. This is why just using
atomic_set is enough for these, and why we use seqlocks if the host
cannot do 64-bit atomic accesses.

Note that these "atomics" are not "read-modify-write"; they are
atomic in the sense that they prevent torn accesses, but
otherwise imply no memory fences or synchronization.
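
As a standalone sketch of that distinction (plain GCC/Clang builtins,
not QEMU's macros, and not part of the patch):

    #include <stdio.h>

    static long long val;

    /* Single writer: a plain read of val is fine since nobody else
     * modifies it; only the store has to be untorn. This is roughly
     * what atomic_set__nocheck(&val, val + delta) amounts to. */
    static void single_writer_add(long long delta)
    {
        __atomic_store_n(&val, val + delta, __ATOMIC_RELAXED);
    }

    /* By contrast, a full read-modify-write with ordering, which the
     * patch deliberately does not need. */
    static void rmw_add(long long delta)
    {
        __atomic_fetch_add(&val, delta, __ATOMIC_SEQ_CST);
    }

    int main(void)
    {
        single_writer_add(3);
        rmw_add(4);
        printf("%lld\n", val); /* prints 7 */
        return 0;
    }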

> >      qemu_mutex_unlock(&counts_mutex);
> >      return NULL;
> >  }
> > @@ -286,19 +335,21 @@ static void rcu_qtest(const char *test, int duration, int nreaders)
> >          n_removed_local++;
> >      }
> >      qemu_mutex_lock(&counts_mutex);
> > -    n_nodes_removed += n_removed_local;
> > +    count_add(&n_nodes_removed, n_removed_local);
> >      qemu_mutex_unlock(&counts_mutex);
> 
> Does this count_add() need to be guarded by a mutex?

No. In fact, accesses to n_nodes_removed don't need the mutex
at all, because only one thread writes to it at a time (first
the updater thread, then the main thread joins the updater
thread, and then the main thread updates it).

Performance-wise, this change wouldn't change much though, which
is why I decided not to include it. The main purpose of this
patch is to avoid undefined behaviour when different threads
access the same variable with regular accesses without always
holding the same lock, as is the case with the two variables
converted to Count.

Cheers,

		Emilio

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] test-rcu-list: avoid torn accesses to n_reclaims and n_nodes_removed
  2018-09-04 19:35     ` Emilio G. Cota
@ 2018-09-04 20:56       ` Murilo Opsfelder Araujo
  2018-09-04 21:32         ` Emilio G. Cota
  2018-09-11 11:25         ` Paolo Bonzini
  0 siblings, 2 replies; 23+ messages in thread
From: Murilo Opsfelder Araujo @ 2018-09-04 20:56 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Paolo Bonzini, Richard Henderson, Alex Bennée,
	Eduardo Habkost, Peter Crosthwaite

Hi, Emilio.

On Tue, Sep 04, 2018 at 03:35:38PM -0400, Emilio G. Cota wrote:
> On Tue, Sep 04, 2018 at 14:37:34 -0300, Murilo Opsfelder Araujo wrote:
> > Hi, Emilio.
> > 
> > On Mon, Sep 03, 2018 at 01:18:27PM -0400, Emilio G. Cota wrote:
> > > Signed-off-by: Emilio G. Cota <cota@braap.org>
> > > ---
> > >  tests/test-rcu-list.c | 67 +++++++++++++++++++++++++++++++++++++------
> > >  1 file changed, 59 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/tests/test-rcu-list.c b/tests/test-rcu-list.c
> > > index 192bfbf02e..2606b7c19d 100644
> > > --- a/tests/test-rcu-list.c
> > > +++ b/tests/test-rcu-list.c
> > > @@ -25,6 +25,23 @@
> > >  #include "qemu/rcu.h"
> > >  #include "qemu/thread.h"
> > >  #include "qemu/rcu_queue.h"
> > > +#include "qemu/seqlock.h"
> > > +
> > > +/*
> > > + * Abstraction to avoid torn accesses when there is a single thread updating
> > > + * the count.
> > > + *
> > > + * If CONFIG_ATOMIC64 is defined, we simply use atomic accesses. Otherwise, we
> > > + * use a seqlock without a lock, since only one thread can update the count.
> > > + */
> > > +struct Count {
> > > +    long long val;
> > > +#ifndef CONFIG_ATOMIC64
> > > +    QemuSeqLock sequence;
> > > +#endif
> > > +};
> > > +
> > > +typedef struct Count Count;
> > >
> > >  /*
> > >   * Test variables.
> > > @@ -33,8 +50,8 @@
> > >  static QemuMutex counts_mutex;
> > >  static long long n_reads = 0LL;
> > >  static long long n_updates = 0LL;
> > > -static long long n_reclaims = 0LL;
> > > -static long long n_nodes_removed = 0LL;
> > > +static Count n_reclaims;
> > > +static Count n_nodes_removed;
> > 
> > Don't we need to init the seqlocks?
> > 
> >   seqlock_init(&n_reclaims.sequence);
> >   seqlock_init(&n_nodes_removed.sequence);
> > 
> > Don't we need to init ->val with 0LL as well?
> 
> These are all zeroed out due to being static.
> 
> We could add seqlock_init calls just to be more clear, but seqlock_init
> would not have any actual effect (it just sets the sequence to 0).
> 
> > >  static long long n_nodes = 0LL;
> > >  static int g_test_in_charge = 0;
> > >
> > > @@ -60,6 +77,38 @@ static int select_random_el(int max)
> > >      return (rand() % max);
> > >  }
> > >
> > > +static inline long long count_read(Count *c)
> > > +{
> > > +#ifdef CONFIG_ATOMIC64
> > > +    /* use __nocheck because sizeof(void *) might be < sizeof(long long) */
> > > +    return atomic_read__nocheck(&c->val);
> > > +#else
> > > +    unsigned int version;
> > > +    long long val;
> > > +
> > > +    do {
> > > +        version = seqlock_read_begin(&c->sequence);
> > > +        val = c->val;
> > > +    } while (seqlock_read_retry(&c->sequence, version));
> > > +    return val;
> > > +#endif
> > > +}
> > > +
> > > +static inline void count_add(Count *c, long long val)
> > > +{
> > > +#ifdef CONFIG_ATOMIC64
> > > +    atomic_set__nocheck(&c->val, c->val + val);
> > > +#else
> > > +    seqlock_write_begin(&c->sequence);
> > > +    c->val += val;
> > > +    seqlock_write_end(&c->sequence);
> > > +#endif
> > > +}
> > > +
> > > +static inline void count_inc(Count *c)
> > > +{
> > > +    count_add(c, 1);
> > > +}
> > 
> > Are these `#ifdef CONFIG_ATOMIC64` required?
> > 
> > The bodies of
> > 
> >   seqlock_read_begin()
> >   seqlock_read_retry()
> >   seqlock_write_begin()
> >   seqlock_write_end()
> > 
> > in include/qemu/seqlock.h make me think that they already use atomic operations.
> > What am I missing?
> 
> atomic_read/set(*foo), as defined in qemu/atomic.h, are as wide as
> foo. The sequence number in a seqlock is an "unsigned", so those
> atomics won't be larger than 32 bits.
> 
> The counts we're dealing with here are 64-bits, so with
> #ifdef CONFIG_ATOMIC64 we ensure that the host can actually
> perform those 64-bit atomic accesses.
> 
> > >
> > >  static void create_thread(void *(*func)(void *))
> > >  {
> > > @@ -104,7 +153,7 @@ static void reclaim_list_el(struct rcu_head *prcu)
> > >      struct list_element *el = container_of(prcu, struct list_element, rcu);
> > >      g_free(el);
> > >      /* Accessed only from call_rcu thread.  */
> > > -    n_reclaims++;
> > > +    count_inc(&n_reclaims);
> > >  }
> > >
> > >  #if TEST_LIST_TYPE == 1
> > > @@ -232,7 +281,7 @@ static void *rcu_q_updater(void *arg)
> > >      qemu_mutex_lock(&counts_mutex);
> > >      n_nodes += n_nodes_local;
> > >      n_updates += n_updates_local;
> > > -    n_nodes_removed += n_removed_local;
> > > +    count_add(&n_nodes_removed, n_removed_local);
> > 
> > I'm just curious why n_nodes and n_updates don't need seqlocks.  Are
> > n_nodes_removed and n_reclaims somehow special, such that seqlocks are required?
> 
> n_nodes and n_updates are serialized by counts_mutex.
> 
> n_nodes_removed and n_reclaims don't really need the lock (even though
> some accesses to them are protected by it; more on this below), since
> they're updated by a single thread at a time. This is why just using
> atomic_set is enough for these, and why we use seqlocks if the host
> cannot do 64-bit atomic accesses.
> 
> Note that these "atomics" are not "read-modify-write"; they are
> atomic in the sense that they prevent torn accesses, but
> otherwise imply no memory fences or synchronization.
> 
> > >      qemu_mutex_unlock(&counts_mutex);
> > >      return NULL;
> > >  }
> > > @@ -286,19 +335,21 @@ static void rcu_qtest(const char *test, int duration, int nreaders)
> > >          n_removed_local++;
> > >      }
> > >      qemu_mutex_lock(&counts_mutex);
> > > -    n_nodes_removed += n_removed_local;
> > > +    count_add(&n_nodes_removed, n_removed_local);
> > >      qemu_mutex_unlock(&counts_mutex);
> > 
> > Does this count_add() need to be guarded by a mutex?
> 
> No. In fact, accesses to n_nodes_removed don't need the mutex
> at all, because only one thread writes to it at a time (first
> the updater thread, then the main thread joins the updater
> thread, and then the main thread updates it).
> 
> Performance-wise, this change wouldn't change much though, which
> is why I decided not to include it. The main purpose of this
> patch is to avoid undefined behaviour when different threads
> access the same variable with regular accesses without always
> holding the same lock, as is the case with the two variables
> converted to Count.
> 
> Cheers,
> 
> 		Emilio
> 

The explanations you provided made a lot of difference in understanding the why
of your patch.  Thank you!

Is it possible to enhance the commit message and add the explanations?

-- 
Murilo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] test-rcu-list: avoid torn accesses to n_reclaims and n_nodes_removed
  2018-09-04 20:56       ` Murilo Opsfelder Araujo
@ 2018-09-04 21:32         ` Emilio G. Cota
  2018-09-11 11:25         ` Paolo Bonzini
  1 sibling, 0 replies; 23+ messages in thread
From: Emilio G. Cota @ 2018-09-04 21:32 UTC (permalink / raw)
  To: Murilo Opsfelder Araujo
  Cc: qemu-devel, Paolo Bonzini, Richard Henderson, Alex Bennée,
	Eduardo Habkost, Peter Crosthwaite

On Tue, Sep 04, 2018 at 17:56:31 -0300, Murilo Opsfelder Araujo wrote:
> The explanations you provided made a lot of difference in understanding the why
> of your patch.  Thank you!

Glad that helped!

> Is it possible to enhance the commit message and add the explanations?

If I end up submitting a v2 of this series due to code-related
changes, then I'll expand this patch's commit message.

Thanks,

		E.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 1/6] qsp: drop atomics when using the seqlock
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 1/6] qsp: drop atomics when using the seqlock Emilio G. Cota
@ 2018-09-09 23:32   ` Paolo Bonzini
  2018-09-10 15:44     ` Emilio G. Cota
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2018-09-09 23:32 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Peter Crosthwaite, Richard Henderson, Eduardo Habkost, Alex Bennée

On 03/09/2018 19:18, Emilio G. Cota wrote:
> Using atomics here is a mistake since they're not guaranteed
> to compile.

But isn't it technically a C11 data race if you don't use atomics?
Could we make nocheck read/set degrade to just a volatile access when
used on a variable that is bigger than pointers, or perhaps always
except when using tsan?

Paolo

> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  util/qsp.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 3/6] atomic: fix comment s/x64_64/x86_64/
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 3/6] atomic: fix comment s/x64_64/x86_64/ Emilio G. Cota
@ 2018-09-10  9:12   ` Alex Bennée
  0 siblings, 0 replies; 23+ messages in thread
From: Alex Bennée @ 2018-09-10  9:12 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Eduardo Habkost


Emilio G. Cota <cota@braap.org> writes:

> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  include/qemu/atomic.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index 9ed39effd3..de3e36f400 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -98,7 +98,7 @@
>   * We'd prefer not want to pull in everything else TCG related, so handle
>   * those few cases by hand.
>   *
> - * Note that x32 is fully detected with __x64_64__ + _ILP32, and that for
> + * Note that x32 is fully detected with __x86_64__ + _ILP32, and that for
>   * Sparc we always force the use of sparcv9 in configure.
>   */
>  #if defined(__x86_64__) || defined(__sparc__)

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 4/6] cpus: initialize timers_state.vm_clock_lock
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 4/6] cpus: initialize timers_state.vm_clock_lock Emilio G. Cota
@ 2018-09-10  9:13   ` Alex Bennée
  0 siblings, 0 replies; 23+ messages in thread
From: Alex Bennée @ 2018-09-10  9:13 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Eduardo Habkost


Emilio G. Cota <cota@braap.org> writes:

> We forgot to initialize the spinlock introduced in 94377115b2
> ("cpus: protect TimerState writes with a spinlock", 2018-08-23).
> Fix it.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  cpus.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/cpus.c b/cpus.c
> index 8ee6e5db93..ebc13bac2d 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -823,6 +823,7 @@ int cpu_throttle_get_percentage(void)
>  void cpu_ticks_init(void)
>  {
>      seqlock_init(&timers_state.vm_clock_seqlock);
> +    qemu_spin_init(&timers_state.vm_clock_lock);
>      vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
>      throttle_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
>                                             cpu_throttle_timer_tick, NULL);


Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] target/i386/translate: use thread-local storage in !user-mode
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 5/6] target/i386/translate: use thread-local storage in !user-mode Emilio G. Cota
@ 2018-09-10  9:17   ` Alex Bennée
  2018-09-10 12:30     ` Emilio G. Cota
  0 siblings, 1 reply; 23+ messages in thread
From: Alex Bennée @ 2018-09-10  9:17 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Eduardo Habkost


Emilio G. Cota <cota@braap.org> writes:

> Needed for MTTCG.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  target/i386/translate.c | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index 1f9d1d9b24..9a6a72e205 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -71,26 +71,34 @@
>  
>  //#define MACRO_TEST   1
>  
> +/* we need thread-local storage for mttcg */
> +#ifdef CONFIG_USER_ONLY
> +#define I386_THREAD
> +#else
> +#define I386_THREAD __thread
> +#endif
> +

I'm confused - as we can have multi-threaded user space don't the same
requirements apply?

>  /* global register indexes */
> -static TCGv cpu_A0;
> -static TCGv cpu_cc_dst, cpu_cc_src, cpu_cc_src2, cpu_cc_srcT;
> +static TCGv cpu_cc_dst, cpu_cc_src, cpu_cc_src2;
>  static TCGv_i32 cpu_cc_op;
>  static TCGv cpu_regs[CPU_NB_REGS];
>  static TCGv cpu_seg_base[6];
>  static TCGv_i64 cpu_bndl[4];
>  static TCGv_i64 cpu_bndu[4];
>  /* local temps */
> -static TCGv cpu_T0, cpu_T1;
> +static I386_THREAD TCGv cpu_cc_srcT;
> +static I386_THREAD TCGv cpu_A0;
> +static I386_THREAD TCGv cpu_T0, cpu_T1;
>  /* local register indexes (only used inside old micro ops) */
> -static TCGv cpu_tmp0, cpu_tmp4;
> -static TCGv_ptr cpu_ptr0, cpu_ptr1;
> -static TCGv_i32 cpu_tmp2_i32, cpu_tmp3_i32;
> -static TCGv_i64 cpu_tmp1_i64;
> +static I386_THREAD TCGv cpu_tmp0, cpu_tmp4;
> +static I386_THREAD TCGv_ptr cpu_ptr0, cpu_ptr1;
> +static I386_THREAD TCGv_i32 cpu_tmp2_i32, cpu_tmp3_i32;
> +static I386_THREAD TCGv_i64 cpu_tmp1_i64;
>  
>  #include "exec/gen-icount.h"
>  
>  #ifdef TARGET_X86_64
> -static int x86_64_hregs;
> +static I386_THREAD int x86_64_hregs;
>  #endif
>  
>  typedef struct DisasContext {


-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] target/i386/translate: use thread-local storage in !user-mode
  2018-09-10  9:17   ` Alex Bennée
@ 2018-09-10 12:30     ` Emilio G. Cota
  2018-09-10 13:43       ` Alex Bennée
  2018-09-11 11:24       ` Paolo Bonzini
  0 siblings, 2 replies; 23+ messages in thread
From: Emilio G. Cota @ 2018-09-10 12:30 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Eduardo Habkost

On Mon, Sep 10, 2018 at 10:17:53 +0100, Alex Bennée wrote:
> 
> Emilio G. Cota <cota@braap.org> writes:
> 
> > Needed for MTTCG.
> >
> > Signed-off-by: Emilio G. Cota <cota@braap.org>
> > ---
> >  target/i386/translate.c | 24 ++++++++++++++++--------
> >  1 file changed, 16 insertions(+), 8 deletions(-)
> >
> > diff --git a/target/i386/translate.c b/target/i386/translate.c
> > index 1f9d1d9b24..9a6a72e205 100644
> > --- a/target/i386/translate.c
> > +++ b/target/i386/translate.c
> > @@ -71,26 +71,34 @@
> >  
> >  //#define MACRO_TEST   1
> >  
> > +/* we need thread-local storage for mttcg */
> > +#ifdef CONFIG_USER_ONLY
> > +#define I386_THREAD
> > +#else
> > +#define I386_THREAD __thread
> > +#endif
> > +
> 
> I'm confused - as we can have multi-threaded user space don't the same
> requirements apply?

In user-mode, code generation is serialized by mmap_lock.
Making these per-thread would just waste TLS space.
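
As a sketch of where that serialization happens (a hypothetical wrapper,
heavily simplified from accel/tcg, not actual QEMU code):

    static TranslationBlock *gen_one_tb(CPUState *cpu, target_ulong pc,
                                        target_ulong cs_base, uint32_t flags)
    {
        TranslationBlock *tb;

        /* In user-mode this excludes every other translating thread;
         * in softmmu it is a no-op, so under MTTCG several vCPU
         * threads can be inside the translator at the same time. */
        mmap_lock();
        tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
        mmap_unlock();
        return tb;
    }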

		E.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] target/i386/translate: use thread-local storage in !user-mode
  2018-09-10 12:30     ` Emilio G. Cota
@ 2018-09-10 13:43       ` Alex Bennée
  2018-09-11 11:24       ` Paolo Bonzini
  1 sibling, 0 replies; 23+ messages in thread
From: Alex Bennée @ 2018-09-10 13:43 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Eduardo Habkost


Emilio G. Cota <cota@braap.org> writes:

> On Mon, Sep 10, 2018 at 10:17:53 +0100, Alex Bennée wrote:
>>
>> Emilio G. Cota <cota@braap.org> writes:
>>
>> > Needed for MTTCG.
>> >
>> > Signed-off-by: Emilio G. Cota <cota@braap.org>
>> > ---
>> >  target/i386/translate.c | 24 ++++++++++++++++--------
>> >  1 file changed, 16 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/target/i386/translate.c b/target/i386/translate.c
>> > index 1f9d1d9b24..9a6a72e205 100644
>> > --- a/target/i386/translate.c
>> > +++ b/target/i386/translate.c
>> > @@ -71,26 +71,34 @@
>> >
>> >  //#define MACRO_TEST   1
>> >
>> > +/* we need thread-local storage for mttcg */
>> > +#ifdef CONFIG_USER_ONLY
>> > +#define I386_THREAD
>> > +#else
>> > +#define I386_THREAD __thread
>> > +#endif
>> > +
>>
>> I'm confused - as we can have multi-threaded user space don't the same
>> requirements apply?
>
> In user-mode, code generation is serialized by mmap_lock.
> Making these per-thread would just waste TLS space.

Ahh this is still the case - ok.

Acked-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 1/6] qsp: drop atomics when using the seqlock
  2018-09-09 23:32   ` Paolo Bonzini
@ 2018-09-10 15:44     ` Emilio G. Cota
  2018-09-11 11:11       ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Emilio G. Cota @ 2018-09-10 15:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, Peter Crosthwaite, Richard Henderson,
	Eduardo Habkost, Alex Bennée

On Mon, Sep 10, 2018 at 01:32:15 +0200, Paolo Bonzini wrote:
> On 03/09/2018 19:18, Emilio G. Cota wrote:
> > Using atomics here is a mistake since they're not guaranteed
> > to compile.
> 
> But isn't it technically a C11 data race if you don't use atomics?

Yes, it's undefined behaviour.

> Could we make nocheck read/set degrade to just a volatile access when
> used on a variable that is bigger than pointers, or perhaps always
> except when using tsan?

But volatile wouldn't save you from undefined behaviour, would it?

A simpler and definitely correct alternative is to just use a
spinlock instead of the seqlock also for reads when !CONFIG_ATOMIC64.
We don't care about scalability on those rare hosts anyway, so
I'd go with that.
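
In sketch form (hypothetical helper and lock names, not an actual
patch), the !CONFIG_ATOMIC64 path would then look like:

    #ifndef CONFIG_ATOMIC64
    /* hypothetical names; the lock would be set up with qemu_spin_init() */
    static QemuSpin qsp_count_lock;
    static uint64_t qsp_count_val;

    static uint64_t qsp_count_read(void)
    {
        uint64_t v;

        qemu_spin_lock(&qsp_count_lock);
        v = qsp_count_val;
        qemu_spin_unlock(&qsp_count_lock);
        return v;
    }

    static void qsp_count_add(uint64_t delta)
    {
        qemu_spin_lock(&qsp_count_lock);
        qsp_count_val += delta;
        qemu_spin_unlock(&qsp_count_lock);
    }
    #endif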

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 1/6] qsp: drop atomics when using the seqlock
  2018-09-10 15:44     ` Emilio G. Cota
@ 2018-09-11 11:11       ` Paolo Bonzini
  0 siblings, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2018-09-11 11:11 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Peter Crosthwaite, Richard Henderson,
	Eduardo Habkost, Alex Bennée

On 10/09/2018 17:44, Emilio G. Cota wrote:
> On Mon, Sep 10, 2018 at 01:32:15 +0200, Paolo Bonzini wrote:
>> On 03/09/2018 19:18, Emilio G. Cota wrote:
>>> Using atomics here is a mistake since they're not guaranteed
>>> to compile.
>>
>> But isn't it technically a C11 data race if you don't use atomics?
> 
> Yes, it's undefined behaviour.
> 
>> Could we make nocheck read/set degrade to just a volatile access when
>> used on a variable that is bigger than pointers, or perhaps always
>> except when using tsan?
> 
> But volatile wouldn't save you from undefined behaviour, would it?

Yeah, but 1) only on those hosts that cannot do CONFIG_ATOMIC64 2) we
pretty much already define what we expect from volatile.

Paolo

> A simpler and definitely correct alternative is to just use a
> spinlock instead of the seqlock also for reads when !CONFIG_ATOMIC64.
> We don't care about scalability on those rare hosts anyway, so
> I'd go with that.
> 
> Thanks,
> 
> 		Emilio
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] target/i386/translate: use thread-local storage in !user-mode
  2018-09-10 12:30     ` Emilio G. Cota
  2018-09-10 13:43       ` Alex Bennée
@ 2018-09-11 11:24       ` Paolo Bonzini
  2018-09-11 17:21         ` Emilio G. Cota
  1 sibling, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2018-09-11 11:24 UTC (permalink / raw)
  To: Emilio G. Cota, Alex Bennée
  Cc: qemu-devel, Peter Crosthwaite, Richard Henderson, Eduardo Habkost

On 10/09/2018 14:30, Emilio G. Cota wrote:
>> I'm confused - as we can have multi-threaded user space don't the same
>> requirements apply?
> In user-mode, code generation is serialized by mmap_lock.
> Making these per-thread would just waste TLS space.

It's stupid question time!  How can the TLS work?  tcg_x86_init is only
called once, the first time cpu_exec_realizefn is called.

Either they can be kept in non-TLS, or you should move them to DisasContext.

Paolo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] test-rcu-list: avoid torn accesses to n_reclaims and n_nodes_removed
  2018-09-04 20:56       ` Murilo Opsfelder Araujo
  2018-09-04 21:32         ` Emilio G. Cota
@ 2018-09-11 11:25         ` Paolo Bonzini
  1 sibling, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2018-09-11 11:25 UTC (permalink / raw)
  To: Murilo Opsfelder Araujo, Emilio G. Cota
  Cc: qemu-devel, Richard Henderson, Alex Bennée, Eduardo Habkost,
	Peter Crosthwaite

On 04/09/2018 22:56, Murilo Opsfelder Araujo wrote:
>>> +static inline void count_add(Count *c, long long val)
>>> +{
>>> +#ifdef CONFIG_ATOMIC64
>>> +    atomic_set__nocheck(&c->val, c->val + val);
>>> +#else
>>> +    seqlock_write_begin(&c->sequence);
>>> +    c->val += val;
>>> +    seqlock_write_end(&c->sequence);
>>> +#endif
>>> +}
>>> +
>>> +static inline void count_inc(Count *c)
>>> +{
>>> +    count_add(c, 1);
>>> +}
>> Are these `#ifdef CONFIG_ATOMIC64` required?
>>
>> The bodies of
>>
>>   seqlock_read_begin()
>>   seqlock_read_retry()
>>   seqlock_write_begin()
>>   seqlock_write_end()
>>
>> in include/qemu/seqlock.h make me think that they already use atomic operations.
>> What am I missing?
> atomic_read/set(*foo), as defined in qemu/atomic.h, are as wide as
> foo. The sequence number in a seqlock is an "unsigned", so those
> atomics won't be larger than 32 bits.
> 
> The counts we're dealing with here are 64-bits, so with
> #ifdef CONFIG_ATOMIC64 we ensure that the host can actually
> perform those 64-bit atomic accesses.

Like for patch 1, I'm not sure I like introducing the data races...
While reads inside the seqlock do not need atomics, I think the write
should be atomic in both cases.

Paolo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 0/6] i386 + x86_64 mttcg
  2018-09-03 17:18 [Qemu-devel] [PATCH 0/6] i386 + x86_64 mttcg Emilio G. Cota
                   ` (5 preceding siblings ...)
  2018-09-03 17:18 ` [Qemu-devel] [PATCH 6/6] configure: enable mttcg for i386 and x86_64 Emilio G. Cota
@ 2018-09-11 11:25 ` Paolo Bonzini
  6 siblings, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2018-09-11 11:25 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Peter Crosthwaite, Richard Henderson, Eduardo Habkost, Alex Bennée

On 03/09/2018 19:18, Emilio G. Cota wrote:
> I sent ~20 days ago a series that allowed me to boot x86_64 with mttcg:
>   https://patchwork.kernel.org/cover/10564977/
> 
> Thanks to Paolo's work (already merged), we don't need to hold
> the BQL when calling cpu_get_ticks, which makes the MTTCG conversion
> even simpler. [ I have a couple more patches related to some of
> that work, but I'll send them next week once Paolo is back ]
> 
> This series enables MTTCG for both x86_64 and i386. I tested both
> with a buildroot image, and looked for races with Valgrind's helgrind.
> 
> The first four patches are not really necessary for i386 MTTCG,
> but since they'll eventually go via Paolo's tree I included them here.

Applied 3-4 for now.

Paolo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] target/i386/translate: use thread-local storage in !user-mode
  2018-09-11 11:24       ` Paolo Bonzini
@ 2018-09-11 17:21         ` Emilio G. Cota
  0 siblings, 0 replies; 23+ messages in thread
From: Emilio G. Cota @ 2018-09-11 17:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alex Bennée, qemu-devel, Peter Crosthwaite,
	Richard Henderson, Eduardo Habkost

On Tue, Sep 11, 2018 at 13:24:03 +0200, Paolo Bonzini wrote:
> On 10/09/2018 14:30, Emilio G. Cota wrote:
> >> I'm confused - as we can have multi-threaded user space don't the same
> >> requirements apply?
> > In user-mode, code generation is serialized by mmap_lock.
> > Making these per-thread would just waste TLS space.
> 
> It's stupid question time!  How can the TLS work?  tcg_x86_init is only
> called once, the first time cpu_exec_realizefn is called.
>
> Either they can be kept in non-TLS, or you should move them to DisasContext.

Yes, the latter is the Right Thing (tm), as both you and Richard
pointed out. I should have done that in the first place.

Will send a v3 with just this change + the configure patch.
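
In sketch form (hypothetical field names, not the eventual v3 patch),
the temporaries move from file-scope statics into DisasContext, which
is per-translation and therefore naturally per-thread:

    typedef struct DisasContext {
        DisasContextBase base;
        /* ... existing fields ... */
        TCGv A0;          /* was the file-scope static cpu_A0 */
        TCGv T0, T1;      /* was cpu_T0 / cpu_T1 */
        TCGv cc_srcT;     /* was cpu_cc_srcT */
        /* ... */
    } DisasContext;

    /* call sites then use s->T0 and friends, where s is the DisasContext */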

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 23+ messages in thread
