* [Qemu-devel] [PATCH v2 02/13] compiler.h: add QEMU_ALIGNED() to enforce struct alignment
2016-04-07 17:32 [Qemu-devel] [PATCH v2 00/10] tb hash improvements Emilio G. Cota
@ 2016-04-07 17:32 ` Emilio G. Cota
2016-04-08 18:26 ` Richard Henderson
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 03/13] seqlock: remove optional mutex Emilio G. Cota
` (10 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Emilio G. Cota @ 2016-04-07 17:32 UTC (permalink / raw)
To: QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, Peter Maydell, Sergey Fedorov
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
include/qemu/compiler.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 8f1cc7b..1978ddc 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -41,6 +41,8 @@
# define QEMU_PACKED __attribute__((packed))
#endif
+#define QEMU_ALIGNED(B) __attribute__((aligned(B)))
+
#ifndef glue
#define xglue(x, y) x ## y
#define glue(x, y) xglue(x, y)
--
2.5.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [Qemu-devel] [PATCH v2 03/13] seqlock: remove optional mutex
2016-04-07 17:32 [Qemu-devel] [PATCH v2 00/10] tb hash improvements Emilio G. Cota
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 02/13] compiler.h: add QEMU_ALIGNED() to enforce struct alignment Emilio G. Cota
@ 2016-04-07 17:32 ` Emilio G. Cota
2016-04-08 18:26 ` Richard Henderson
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 04/13] seqlock: rename write_lock/unlock to write_begin/end Emilio G. Cota
` (9 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Emilio G. Cota @ 2016-04-07 17:32 UTC (permalink / raw)
To: QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, Peter Maydell, Sergey Fedorov
This option is unused; besides, it bloats the struct when not needed.
Let's just let writers define their own locks elsewhere.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
cpus.c | 2 +-
include/qemu/seqlock.h | 10 +---------
2 files changed, 2 insertions(+), 10 deletions(-)
diff --git a/cpus.c b/cpus.c
index 8ae4777..7f550b9 100644
--- a/cpus.c
+++ b/cpus.c
@@ -611,7 +611,7 @@ int cpu_throttle_get_percentage(void)
void cpu_ticks_init(void)
{
- seqlock_init(&timers_state.vm_clock_seqlock, NULL);
+ seqlock_init(&timers_state.vm_clock_seqlock);
vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
throttle_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
cpu_throttle_timer_tick, NULL);
diff --git a/include/qemu/seqlock.h b/include/qemu/seqlock.h
index 70b01fd..e673482 100644
--- a/include/qemu/seqlock.h
+++ b/include/qemu/seqlock.h
@@ -19,22 +19,17 @@
typedef struct QemuSeqLock QemuSeqLock;
struct QemuSeqLock {
- QemuMutex *mutex;
unsigned sequence;
};
-static inline void seqlock_init(QemuSeqLock *sl, QemuMutex *mutex)
+static inline void seqlock_init(QemuSeqLock *sl)
{
- sl->mutex = mutex;
sl->sequence = 0;
}
/* Lock out other writers and update the count. */
static inline void seqlock_write_lock(QemuSeqLock *sl)
{
- if (sl->mutex) {
- qemu_mutex_lock(sl->mutex);
- }
++sl->sequence;
/* Write sequence before updating other fields. */
@@ -47,9 +42,6 @@ static inline void seqlock_write_unlock(QemuSeqLock *sl)
smp_wmb();
++sl->sequence;
- if (sl->mutex) {
- qemu_mutex_unlock(sl->mutex);
- }
}
static inline unsigned seqlock_read_begin(QemuSeqLock *sl)
--
2.5.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [PATCH v2 03/13] seqlock: remove optional mutex
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 03/13] seqlock: remove optional mutex Emilio G. Cota
@ 2016-04-08 18:26 ` Richard Henderson
0 siblings, 0 replies; 28+ messages in thread
From: Richard Henderson @ 2016-04-08 18:26 UTC (permalink / raw)
To: Emilio G. Cota, QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Peter Maydell, Sergey Fedorov
On 04/07/2016 10:32 AM, Emilio G. Cota wrote:
> This option is unused; besides, it bloats the struct when not needed.
> Let's just let writers define their own locks elsewhere.
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
> cpus.c | 2 +-
> include/qemu/seqlock.h | 10 +---------
> 2 files changed, 2 insertions(+), 10 deletions(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Qemu-devel] [PATCH v2 04/13] seqlock: rename write_lock/unlock to write_begin/end
2016-04-07 17:32 [Qemu-devel] [PATCH v2 00/10] tb hash improvements Emilio G. Cota
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 02/13] compiler.h: add QEMU_ALIGNED() to enforce struct alignment Emilio G. Cota
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 03/13] seqlock: remove optional mutex Emilio G. Cota
@ 2016-04-07 17:32 ` Emilio G. Cota
2016-04-08 18:27 ` Richard Henderson
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 05/13] include/processor.h: define cpu_relax() Emilio G. Cota
` (8 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Emilio G. Cota @ 2016-04-07 17:32 UTC (permalink / raw)
To: QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, Peter Maydell, Sergey Fedorov
It is a more appropriate name, now that the mutex embedded
in the seqlock is gone.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
cpus.c | 28 ++++++++++++++--------------
include/qemu/seqlock.h | 4 ++--
2 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/cpus.c b/cpus.c
index 7f550b9..7dad2e6 100644
--- a/cpus.c
+++ b/cpus.c
@@ -247,13 +247,13 @@ int64_t cpu_get_clock(void)
void cpu_enable_ticks(void)
{
/* Here, the really thing protected by seqlock is cpu_clock_offset. */
- seqlock_write_lock(&timers_state.vm_clock_seqlock);
+ seqlock_write_begin(&timers_state.vm_clock_seqlock);
if (!timers_state.cpu_ticks_enabled) {
timers_state.cpu_ticks_offset -= cpu_get_host_ticks();
timers_state.cpu_clock_offset -= get_clock();
timers_state.cpu_ticks_enabled = 1;
}
- seqlock_write_unlock(&timers_state.vm_clock_seqlock);
+ seqlock_write_end(&timers_state.vm_clock_seqlock);
}
/* disable cpu_get_ticks() : the clock is stopped. You must not call
@@ -263,13 +263,13 @@ void cpu_enable_ticks(void)
void cpu_disable_ticks(void)
{
/* Here, the really thing protected by seqlock is cpu_clock_offset. */
- seqlock_write_lock(&timers_state.vm_clock_seqlock);
+ seqlock_write_begin(&timers_state.vm_clock_seqlock);
if (timers_state.cpu_ticks_enabled) {
timers_state.cpu_ticks_offset += cpu_get_host_ticks();
timers_state.cpu_clock_offset = cpu_get_clock_locked();
timers_state.cpu_ticks_enabled = 0;
}
- seqlock_write_unlock(&timers_state.vm_clock_seqlock);
+ seqlock_write_end(&timers_state.vm_clock_seqlock);
}
/* Correlation between real and virtual time is always going to be
@@ -292,7 +292,7 @@ static void icount_adjust(void)
return;
}
- seqlock_write_lock(&timers_state.vm_clock_seqlock);
+ seqlock_write_begin(&timers_state.vm_clock_seqlock);
cur_time = cpu_get_clock_locked();
cur_icount = cpu_get_icount_locked();
@@ -313,7 +313,7 @@ static void icount_adjust(void)
last_delta = delta;
timers_state.qemu_icount_bias = cur_icount
- (timers_state.qemu_icount << icount_time_shift);
- seqlock_write_unlock(&timers_state.vm_clock_seqlock);
+ seqlock_write_end(&timers_state.vm_clock_seqlock);
}
static void icount_adjust_rt(void *opaque)
@@ -345,7 +345,7 @@ static void icount_warp_rt(void)
return;
}
- seqlock_write_lock(&timers_state.vm_clock_seqlock);
+ seqlock_write_begin(&timers_state.vm_clock_seqlock);
if (runstate_is_running()) {
int64_t clock = REPLAY_CLOCK(REPLAY_CLOCK_VIRTUAL_RT,
cpu_get_clock_locked());
@@ -364,7 +364,7 @@ static void icount_warp_rt(void)
timers_state.qemu_icount_bias += warp_delta;
}
vm_clock_warp_start = -1;
- seqlock_write_unlock(&timers_state.vm_clock_seqlock);
+ seqlock_write_end(&timers_state.vm_clock_seqlock);
if (qemu_clock_expired(QEMU_CLOCK_VIRTUAL)) {
qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
@@ -389,9 +389,9 @@ void qtest_clock_warp(int64_t dest)
int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
int64_t warp = qemu_soonest_timeout(dest - clock, deadline);
- seqlock_write_lock(&timers_state.vm_clock_seqlock);
+ seqlock_write_begin(&timers_state.vm_clock_seqlock);
timers_state.qemu_icount_bias += warp;
- seqlock_write_unlock(&timers_state.vm_clock_seqlock);
+ seqlock_write_end(&timers_state.vm_clock_seqlock);
qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
timerlist_run_timers(aio_context->tlg.tl[QEMU_CLOCK_VIRTUAL]);
@@ -458,9 +458,9 @@ void qemu_start_warp_timer(void)
* It is useful when we want a deterministic execution time,
* isolated from host latencies.
*/
- seqlock_write_lock(&timers_state.vm_clock_seqlock);
+ seqlock_write_begin(&timers_state.vm_clock_seqlock);
timers_state.qemu_icount_bias += deadline;
- seqlock_write_unlock(&timers_state.vm_clock_seqlock);
+ seqlock_write_end(&timers_state.vm_clock_seqlock);
qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
} else {
/*
@@ -471,11 +471,11 @@ void qemu_start_warp_timer(void)
* you will not be sending network packets continuously instead of
* every 100ms.
*/
- seqlock_write_lock(&timers_state.vm_clock_seqlock);
+ seqlock_write_begin(&timers_state.vm_clock_seqlock);
if (vm_clock_warp_start == -1 || vm_clock_warp_start > clock) {
vm_clock_warp_start = clock;
}
- seqlock_write_unlock(&timers_state.vm_clock_seqlock);
+ seqlock_write_end(&timers_state.vm_clock_seqlock);
timer_mod_anticipate(icount_warp_timer, clock + deadline);
}
} else if (deadline == 0) {
diff --git a/include/qemu/seqlock.h b/include/qemu/seqlock.h
index e673482..4dfc055 100644
--- a/include/qemu/seqlock.h
+++ b/include/qemu/seqlock.h
@@ -28,7 +28,7 @@ static inline void seqlock_init(QemuSeqLock *sl)
}
/* Lock out other writers and update the count. */
-static inline void seqlock_write_lock(QemuSeqLock *sl)
+static inline void seqlock_write_begin(QemuSeqLock *sl)
{
++sl->sequence;
@@ -36,7 +36,7 @@ static inline void seqlock_write_lock(QemuSeqLock *sl)
smp_wmb();
}
-static inline void seqlock_write_unlock(QemuSeqLock *sl)
+static inline void seqlock_write_end(QemuSeqLock *sl)
{
/* Write other fields before finalizing sequence. */
smp_wmb();
--
2.5.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [PATCH v2 04/13] seqlock: rename write_lock/unlock to write_begin/end
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 04/13] seqlock: rename write_lock/unlock to write_begin/end Emilio G. Cota
@ 2016-04-08 18:27 ` Richard Henderson
0 siblings, 0 replies; 28+ messages in thread
From: Richard Henderson @ 2016-04-08 18:27 UTC (permalink / raw)
To: Emilio G. Cota, QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Peter Maydell, Sergey Fedorov
On 04/07/2016 10:32 AM, Emilio G. Cota wrote:
> It is a more appropriate name, now that the mutex embedded
> in the seqlock is gone.
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
> cpus.c | 28 ++++++++++++++--------------
> include/qemu/seqlock.h | 4 ++--
> 2 files changed, 16 insertions(+), 16 deletions(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Qemu-devel] [PATCH v2 05/13] include/processor.h: define cpu_relax()
2016-04-07 17:32 [Qemu-devel] [PATCH v2 00/10] tb hash improvements Emilio G. Cota
` (2 preceding siblings ...)
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 04/13] seqlock: rename write_lock/unlock to write_begin/end Emilio G. Cota
@ 2016-04-07 17:32 ` Emilio G. Cota
2016-04-08 18:33 ` Richard Henderson
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 06/13] qemu-thread: add simple test-and-set spinlock Emilio G. Cota
` (7 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Emilio G. Cota @ 2016-04-07 17:32 UTC (permalink / raw)
To: QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, Peter Maydell, Sergey Fedorov
Taken from the linux kernel.
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
include/qemu/processor.h | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
create mode 100644 include/qemu/processor.h
diff --git a/include/qemu/processor.h b/include/qemu/processor.h
new file mode 100644
index 0000000..675a00a
--- /dev/null
+++ b/include/qemu/processor.h
@@ -0,0 +1,28 @@
+#ifndef QEMU_PROCESSOR_H
+#define QEMU_PROCESSOR_H
+
+#include "qemu/atomic.h"
+
+#if defined(__i386__) || defined(__x86_64__)
+#define cpu_relax() asm volatile("rep; nop" ::: "memory")
+#endif
+
+#ifdef __ia64__
+#define cpu_relax() asm volatile("hint @pause" ::: "memory")
+#endif
+
+#ifdef __aarch64__
+#define cpu_relax() asm volatile("yield" ::: "memory")
+#endif
+
+#if defined(__powerpc64__)
+/* set Hardware Multi-Threading (HMT) priority to low; then back to medium */
+#define cpu_relax() asm volatile("or 1, 1, 1;"
+ "or 2, 2, 2;" ::: "memory")
+#endif
+
+#ifndef cpu_relax
+#define cpu_relax() barrier()
+#endif
+
+#endif /* QEMU_PROCESSOR_H */
--
2.5.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [PATCH v2 05/13] include/processor.h: define cpu_relax()
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 05/13] include/processor.h: define cpu_relax() Emilio G. Cota
@ 2016-04-08 18:33 ` Richard Henderson
0 siblings, 0 replies; 28+ messages in thread
From: Richard Henderson @ 2016-04-08 18:33 UTC (permalink / raw)
To: Emilio G. Cota, QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Peter Maydell, Sergey Fedorov
On 04/07/2016 10:32 AM, Emilio G. Cota wrote:
> Taken from the linux kernel.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
> include/qemu/processor.h | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
> create mode 100644 include/qemu/processor.h
Reviewed-by: Richard Henderson <rth@twiddle.net>
> +#if defined(__i386__) || defined(__x86_64__)
> +#define cpu_relax() asm volatile("rep; nop" ::: "memory")
> +#endif
Not that it matters much, but for the record there's an ICC inspired builtin in
<xmmintrin.h> for this: _mm_pause(). But so long as we're always using gcc or
clang, that's probably just an extra dependency to worry about.
r~
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Qemu-devel] [PATCH v2 06/13] qemu-thread: add simple test-and-set spinlock
2016-04-07 17:32 [Qemu-devel] [PATCH v2 00/10] tb hash improvements Emilio G. Cota
` (3 preceding siblings ...)
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 05/13] include/processor.h: define cpu_relax() Emilio G. Cota
@ 2016-04-07 17:32 ` Emilio G. Cota
2016-04-08 13:02 ` Alex Bennée
2016-04-08 18:35 ` Richard Henderson
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 07/13] qemu-thread: call cpu_relax() while spinning Emilio G. Cota
` (6 subsequent siblings)
11 siblings, 2 replies; 28+ messages in thread
From: Emilio G. Cota @ 2016-04-07 17:32 UTC (permalink / raw)
To: QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, Peter Maydell, Sergey Fedorov
From: Guillaume Delbergue <guillaume.delbergue@greensocs.com>
Signed-off-by: Guillaume Delbergue <guillaume.delbergue@greensocs.com>
[Rewritten. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/qemu/thread.h | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index bdae6df..1aa843b 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -1,6 +1,8 @@
#ifndef __QEMU_THREAD_H
#define __QEMU_THREAD_H 1
+#include <errno.h>
+#include "qemu/atomic.h"
typedef struct QemuMutex QemuMutex;
typedef struct QemuCond QemuCond;
@@ -60,4 +62,33 @@ struct Notifier;
void qemu_thread_atexit_add(struct Notifier *notifier);
void qemu_thread_atexit_remove(struct Notifier *notifier);
+typedef struct QemuSpin {
+ int value;
+} QemuSpin;
+
+static inline void qemu_spin_init(QemuSpin *spin)
+{
+ spin->value = 0;
+}
+
+static inline void qemu_spin_lock(QemuSpin *spin)
+{
+ do {
+ while (atomic_read(&spin->value));
+ } while (atomic_xchg(&spin->value, true));
+}
+
+static inline int qemu_spin_trylock(QemuSpin *spin)
+{
+ if (atomic_read(&spin->value) || atomic_xchg(&spin->value, true)) {
+ return -EBUSY;
+ }
+ return 0;
+}
+
+static inline void qemu_spin_unlock(QemuSpin *spin)
+{
+ atomic_mb_set(&spin->value, 0);
+}
+
#endif
--
2.5.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [PATCH v2 06/13] qemu-thread: add simple test-and-set spinlock
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 06/13] qemu-thread: add simple test-and-set spinlock Emilio G. Cota
@ 2016-04-08 13:02 ` Alex Bennée
2016-04-08 18:38 ` Richard Henderson
2016-04-08 18:35 ` Richard Henderson
1 sibling, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2016-04-08 13:02 UTC (permalink / raw)
To: Emilio G. Cota
Cc: QEMU Developers, MTTCG Devel, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, Peter Maydell, Sergey Fedorov
Emilio G. Cota <cota@braap.org> writes:
> From: Guillaume Delbergue <guillaume.delbergue@greensocs.com>
>
> Signed-off-by: Guillaume Delbergue <guillaume.delbergue@greensocs.com>
> [Rewritten. - Paolo]
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> include/qemu/thread.h | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/include/qemu/thread.h b/include/qemu/thread.h
> index bdae6df..1aa843b 100644
> --- a/include/qemu/thread.h
> +++ b/include/qemu/thread.h
> @@ -1,6 +1,8 @@
> #ifndef __QEMU_THREAD_H
> #define __QEMU_THREAD_H 1
>
> +#include <errno.h>
> +#include "qemu/atomic.h"
>
> typedef struct QemuMutex QemuMutex;
> typedef struct QemuCond QemuCond;
> @@ -60,4 +62,33 @@ struct Notifier;
> void qemu_thread_atexit_add(struct Notifier *notifier);
> void qemu_thread_atexit_remove(struct Notifier *notifier);
>
> +typedef struct QemuSpin {
> + int value;
If we are throwing true and false around as the only two values can we
use bool here and be consistent when setting/clearing.
> +} QemuSpin;
> +
> +static inline void qemu_spin_init(QemuSpin *spin)
> +{
> + spin->value = 0;
> +}
> +
> +static inline void qemu_spin_lock(QemuSpin *spin)
> +{
> + do {
> + while (atomic_read(&spin->value));
> + } while (atomic_xchg(&spin->value, true));
> +}
> +
> +static inline int qemu_spin_trylock(QemuSpin *spin)
> +{
> + if (atomic_read(&spin->value) || atomic_xchg(&spin->value, true)) {
> + return -EBUSY;
> + }
> + return 0;
> +}
> +
> +static inline void qemu_spin_unlock(QemuSpin *spin)
> +{
> + atomic_mb_set(&spin->value, 0);
> +}
> +
> #endif
--
Alex Bennée
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [PATCH v2 06/13] qemu-thread: add simple test-and-set spinlock
2016-04-08 13:02 ` Alex Bennée
@ 2016-04-08 18:38 ` Richard Henderson
2016-04-08 21:24 ` Alex Bennée
0 siblings, 1 reply; 28+ messages in thread
From: Richard Henderson @ 2016-04-08 18:38 UTC (permalink / raw)
To: Alex Bennée, Emilio G. Cota
Cc: QEMU Developers, MTTCG Devel, Paolo Bonzini, Peter Crosthwaite,
Peter Maydell, Sergey Fedorov
On 04/08/2016 06:02 AM, Alex Bennée wrote:
>> > +typedef struct QemuSpin {
>> > + int value;
> If we are throwing true and false around as the only two values can we
> use bool here and be consistent when setting/clearing.
>
Except that quite a lot of hosts can only (efficiently) do atomic operations on
a minimum of 4 byte quantities. I'd rather continue to use int here.
r~
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [PATCH v2 06/13] qemu-thread: add simple test-and-set spinlock
2016-04-08 18:38 ` Richard Henderson
@ 2016-04-08 21:24 ` Alex Bennée
2016-04-08 21:26 ` Paolo Bonzini
0 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2016-04-08 21:24 UTC (permalink / raw)
To: Richard Henderson
Cc: Emilio G. Cota, QEMU Developers, MTTCG Devel, Paolo Bonzini,
Peter Crosthwaite, Peter Maydell, Sergey Fedorov
Richard Henderson <rth@twiddle.net> writes:
> On 04/08/2016 06:02 AM, Alex Bennée wrote:
>>> > +typedef struct QemuSpin {
>>> > + int value;
>> If we are throwing true and false around as the only two values can we
>> use bool here and be consistent when setting/clearing.
>>
>
> Except that quite a lot of hosts can only (efficiently) do atomic operations on
> a minimum of 4 byte quantities. I'd rather continue to use int here.
I suspect bool == unsigned int underneath. But having true/false and 0/1 mixed up
gets confusing even if they are equivalent.
>
>
> r~
--
Alex Bennée
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [PATCH v2 06/13] qemu-thread: add simple test-and-set spinlock
2016-04-08 21:24 ` Alex Bennée
@ 2016-04-08 21:26 ` Paolo Bonzini
2016-04-08 21:31 ` Richard Henderson
0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2016-04-08 21:26 UTC (permalink / raw)
To: Alex Bennée, Richard Henderson
Cc: Emilio G. Cota, QEMU Developers, MTTCG Devel, Peter Crosthwaite,
Peter Maydell, Sergey Fedorov
On 08/04/2016 23:24, Alex Bennée wrote:
> > Except that quite a lot of hosts can only (efficiently) do atomic operations on
> > a minimum of 4 byte quantities. I'd rather continue to use int here.
>
> I suspect bool == unsigned int underneath. But having true/false and 0/1 mixed up
> gets confusing even if they are equivalent.
Sometimes sizeof(bool) == 1.
Paolo
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [PATCH v2 06/13] qemu-thread: add simple test-and-set spinlock
2016-04-08 21:26 ` Paolo Bonzini
@ 2016-04-08 21:31 ` Richard Henderson
2016-04-08 21:35 ` Sergey Fedorov
0 siblings, 1 reply; 28+ messages in thread
From: Richard Henderson @ 2016-04-08 21:31 UTC (permalink / raw)
To: Paolo Bonzini, Alex Bennée
Cc: Emilio G. Cota, QEMU Developers, MTTCG Devel, Peter Crosthwaite,
Peter Maydell, Sergey Fedorov
On 04/08/2016 02:26 PM, Paolo Bonzini wrote:
>
>
> On 08/04/2016 23:24, Alex Bennée wrote:
>>> Except that quite a lot of hosts can only (efficiently) do atomic operations on
>>> a minimum of 4 byte quantities. I'd rather continue to use int here.
>>
>> I suspect bool == unsigned int underneath. But having true/false and 0/1 mixed up
>> gets confusing even if they are equivalent.
>
> Sometimes sizeof(bool) == 1.
sizeof(bool) == 1 everywhere except MacOSX, where it's 4.
r~
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [PATCH v2 06/13] qemu-thread: add simple test-and-set spinlock
2016-04-08 21:31 ` Richard Henderson
@ 2016-04-08 21:35 ` Sergey Fedorov
0 siblings, 0 replies; 28+ messages in thread
From: Sergey Fedorov @ 2016-04-08 21:35 UTC (permalink / raw)
To: Richard Henderson, Paolo Bonzini, Alex Bennée
Cc: Emilio G. Cota, QEMU Developers, MTTCG Devel, Peter Crosthwaite,
Peter Maydell
On 09/04/16 00:31, Richard Henderson wrote:
> On 04/08/2016 02:26 PM, Paolo Bonzini wrote:
>>
>> On 08/04/2016 23:24, Alex Bennée wrote:
>>>> Except that quite a lot of hosts can only (efficiently) do atomic operations on
>>>> a minimum of 4 byte quantities. I'd rather continue to use int here.
>>> I suspect bool == unsigned int underneath. But having true/false and 0/1 mixed up
>>> gets confusing even if they are equivalent.
>> Sometimes sizeof(bool) == 1.
> sizeof(bool) == 1 everywhere except MacOSX, where it's 4.
>
Hm, that's too strange:
$ gcc a.c
a.c: In function ‘main’:
a.c:6:5: warning: format ‘%d’ expects argument of type ‘int’, but
argument 2 has type ‘long unsigned int’ [-Wformat=]
printf("%d\n", sizeof(bool));
^
$ ./a.out
1
Kind regards,
Sergey
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [PATCH v2 06/13] qemu-thread: add simple test-and-set spinlock
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 06/13] qemu-thread: add simple test-and-set spinlock Emilio G. Cota
2016-04-08 13:02 ` Alex Bennée
@ 2016-04-08 18:35 ` Richard Henderson
1 sibling, 0 replies; 28+ messages in thread
From: Richard Henderson @ 2016-04-08 18:35 UTC (permalink / raw)
To: Emilio G. Cota, QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Peter Maydell, Sergey Fedorov
On 04/07/2016 10:32 AM, Emilio G. Cota wrote:
> + while (atomic_read(&spin->value));
I really really don't like ; snuggled up behind loop conditions.
Isn't this where you want to use pause, anyway?
r~
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Qemu-devel] [PATCH v2 07/13] qemu-thread: call cpu_relax() while spinning
2016-04-07 17:32 [Qemu-devel] [PATCH v2 00/10] tb hash improvements Emilio G. Cota
` (4 preceding siblings ...)
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 06/13] qemu-thread: add simple test-and-set spinlock Emilio G. Cota
@ 2016-04-07 17:32 ` Emilio G. Cota
2016-04-08 18:39 ` Richard Henderson
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 08/13] qemu-thread: optimize spin_lock for uncontended locks Emilio G. Cota
` (5 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Emilio G. Cota @ 2016-04-07 17:32 UTC (permalink / raw)
To: QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, Peter Maydell, Sergey Fedorov
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
include/qemu/thread.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 1aa843b..599965e 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -2,6 +2,7 @@
#define __QEMU_THREAD_H 1
#include <errno.h>
+#include "qemu/processor.h"
#include "qemu/atomic.h"
typedef struct QemuMutex QemuMutex;
@@ -74,7 +75,9 @@ static inline void qemu_spin_init(QemuSpin *spin)
static inline void qemu_spin_lock(QemuSpin *spin)
{
do {
- while (atomic_read(&spin->value));
+ while (atomic_read(&spin->value)) {
+ cpu_relax();
+ }
} while (atomic_xchg(&spin->value, true));
}
--
2.5.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [Qemu-devel] [PATCH v2 08/13] qemu-thread: optimize spin_lock for uncontended locks
2016-04-07 17:32 [Qemu-devel] [PATCH v2 00/10] tb hash improvements Emilio G. Cota
` (5 preceding siblings ...)
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 07/13] qemu-thread: call cpu_relax() while spinning Emilio G. Cota
@ 2016-04-07 17:32 ` Emilio G. Cota
2016-04-08 18:40 ` Richard Henderson
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 09/13] exec: add tb_hash_func5, derived from xxhash Emilio G. Cota
` (4 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Emilio G. Cota @ 2016-04-07 17:32 UTC (permalink / raw)
To: QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, Peter Maydell, Sergey Fedorov
This way we can acquire the lock with xchg+test, instead of test+xchg+test.
Most spinlocks should be uncontended so this should result in a ne
performance gain.
Before:
4ad957: eb 09 jmp 4ad962 <qht_insert+0x32>
4ad959: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4ad960: f3 90 pause
4ad962: 8b 03 mov (%rbx),%eax
4ad964: 85 c0 test %eax,%eax
4ad966: 75 f8 jne 4ad960 <qht_insert+0x30>
4ad968: 89 f8 mov %edi,%eax
4ad96a: 87 03 xchg %eax,(%rbx)
4ad96c: 85 c0 test %eax,%eax
4ad96e: 75 f2 jne 4ad962 <qht_insert+0x32>
After:
4ad980: 89 f8 mov %edi,%eax
4ad982: 87 03 xchg %eax,(%rbx)
4ad984: 85 c0 test %eax,%eax
4ad986: 74 12 je 4ad99a <qht_insert+0x4a>
4ad988: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
4ad98f: 00
4ad990: 8b 03 mov (%rbx),%eax
4ad992: 85 c0 test %eax,%eax
4ad994: 74 ea je 4ad980 <qht_insert+0x30>
4ad996: f3 90 pause
4ad998: eb f6 jmp 4ad990 <qht_insert+0x40>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
include/qemu/thread.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 599965e..e2af57c 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -74,11 +74,11 @@ static inline void qemu_spin_init(QemuSpin *spin)
static inline void qemu_spin_lock(QemuSpin *spin)
{
- do {
+ while (atomic_xchg(&spin->value, true)) {
while (atomic_read(&spin->value)) {
cpu_relax();
}
- } while (atomic_xchg(&spin->value, true));
+ }
}
static inline int qemu_spin_trylock(QemuSpin *spin)
--
2.5.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [PATCH v2 08/13] qemu-thread: optimize spin_lock for uncontended locks
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 08/13] qemu-thread: optimize spin_lock for uncontended locks Emilio G. Cota
@ 2016-04-08 18:40 ` Richard Henderson
0 siblings, 0 replies; 28+ messages in thread
From: Richard Henderson @ 2016-04-08 18:40 UTC (permalink / raw)
To: Emilio G. Cota, QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Peter Maydell, Sergey Fedorov
On 04/07/2016 10:32 AM, Emilio G. Cota wrote:
> static inline void qemu_spin_lock(QemuSpin *spin)
> {
> - do {
> + while (atomic_xchg(&spin->value, true)) {
> while (atomic_read(&spin->value)) {
> cpu_relax();
> }
> - } while (atomic_xchg(&spin->value, true));
> + }
> }
And merge this one as well, please. It's a good improvement, but there's
little point to keeping these separate.
r~
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Qemu-devel] [PATCH v2 09/13] exec: add tb_hash_func5, derived from xxhash
2016-04-07 17:32 [Qemu-devel] [PATCH v2 00/10] tb hash improvements Emilio G. Cota
` (6 preceding siblings ...)
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 08/13] qemu-thread: optimize spin_lock for uncontended locks Emilio G. Cota
@ 2016-04-07 17:32 ` Emilio G. Cota
2016-04-08 18:45 ` Richard Henderson
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 10/13] tb hash: hash phys_pc, pc, and flags with xxhash Emilio G. Cota
` (3 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Emilio G. Cota @ 2016-04-07 17:32 UTC (permalink / raw)
To: QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, Peter Maydell, Sergey Fedorov
This will be used by upcoming changes for hashing the tb hash.
Add this into a separate file to include the copyright notice from
xxhash.
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
include/exec/tb-hash-xx.h | 103 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 103 insertions(+)
create mode 100644 include/exec/tb-hash-xx.h
diff --git a/include/exec/tb-hash-xx.h b/include/exec/tb-hash-xx.h
new file mode 100644
index 0000000..64d9865
--- /dev/null
+++ b/include/exec/tb-hash-xx.h
@@ -0,0 +1,103 @@
+/*
+ * xxHash - Fast Hash algorithm
+ * Copyright (C) 2012-2016, Yann Collet
+ *
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * + Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * + Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following disclaimer
+ * in the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * You can contact the author at :
+ * - xxHash source repository : https://github.com/Cyan4973/xxHash
+ */
+#ifndef EXEC_TB_HASH_XX
+#define EXEC_TB_HASH_XX
+
+#include <stdint.h>
+
+#define PRIME32_1 2654435761U
+#define PRIME32_2 2246822519U
+#define PRIME32_3 3266489917U
+#define PRIME32_4 668265263U
+#define PRIME32_5 374761393U
+
+/*
+ * Note : although _rotl exists for minGW (GCC under windows), performance
+ * seems poor.
+ */
+#if defined(_MSC_VER)
+# define XXH_rotl32(x, r) _rotl(x, r)
+#else
+# define XXH_rotl32(x, r) ((x << r) | (x >> (32 - r)))
+#endif
+
+/*
+ * xxhash32, customized for input variables that are not guaranteed to be
+ * contiguous in memory.
+ */
+static inline
+uint32_t tb_hash_func5(uint64_t a0, uint64_t b0, uint32_t e, int seed)
+{
+ uint32_t v1 = seed + PRIME32_1 + PRIME32_2;
+ uint32_t v2 = seed + PRIME32_2;
+ uint32_t v3 = seed + 0;
+ uint32_t v4 = seed - PRIME32_1;
+ uint32_t a = a0 >> 31 >> 1;
+ uint32_t b = a0;
+ uint32_t c = b0 >> 31 >> 1;
+ uint32_t d = b0;
+ uint32_t h32;
+
+ v1 += a * PRIME32_2;
+ v1 = XXH_rotl32(v1, 13);
+ v1 *= PRIME32_1;
+
+ v2 += b * PRIME32_2;
+ v2 = XXH_rotl32(v2, 13);
+ v2 *= PRIME32_1;
+
+ v3 += c * PRIME32_2;
+ v3 = XXH_rotl32(v3, 13);
+ v3 *= PRIME32_1;
+
+ v4 += d * PRIME32_2;
+ v4 = XXH_rotl32(v4, 13);
+ v4 *= PRIME32_1;
+
+ h32 = XXH_rotl32(v1, 1) + XXH_rotl32(v2, 7) + XXH_rotl32(v3, 12) +
+ XXH_rotl32(v4, 18);
+ h32 += 20;
+
+ h32 += e * PRIME32_3;
+ h32 = XXH_rotl32(h32, 17) * PRIME32_4;
+
+ h32 ^= h32 >> 15;
+ h32 *= PRIME32_2;
+ h32 ^= h32 >> 13;
+ h32 *= PRIME32_3;
+ h32 ^= h32 >> 16;
+
+ return h32;
+}
+
+#endif /* EXEC_TB_HASH_XX */
--
2.5.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [PATCH v2 09/13] exec: add tb_hash_func5, derived from xxhash
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 09/13] exec: add tb_hash_func5, derived from xxhash Emilio G. Cota
@ 2016-04-08 18:45 ` Richard Henderson
0 siblings, 0 replies; 28+ messages in thread
From: Richard Henderson @ 2016-04-08 18:45 UTC (permalink / raw)
To: Emilio G. Cota, QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Peter Maydell, Sergey Fedorov
On 04/07/2016 10:32 AM, Emilio G. Cota wrote:
> +#include <stdint.h>
Should already be done by <qemu/osdep.h>
> +/*
> + * Note : although _rotl exists for minGW (GCC under windows), performance
> + * seems poor.
> + */
> +#if defined(_MSC_VER)
> +# define XXH_rotl32(x, r) _rotl(x, r)
> +#else
> +# define XXH_rotl32(x, r) ((x << r) | (x >> (32 - r)))
> +#endif
Please use rol32 from <qemu/bitops.h>
> +static inline
> +uint32_t tb_hash_func5(uint64_t a0, uint64_t b0, uint32_t e, int seed)
> +{
Is there really any point in passing in "seed" anymore?
We certainly don't want different values passed in...
r~
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Qemu-devel] [PATCH v2 10/13] tb hash: hash phys_pc, pc, and flags with xxhash
2016-04-07 17:32 [Qemu-devel] [PATCH v2 00/10] tb hash improvements Emilio G. Cota
` (7 preceding siblings ...)
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 09/13] exec: add tb_hash_func5, derived from xxhash Emilio G. Cota
@ 2016-04-07 17:32 ` Emilio G. Cota
2016-04-08 18:48 ` Richard Henderson
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 12/13] qht: add test program Emilio G. Cota
` (2 subsequent siblings)
11 siblings, 1 reply; 28+ messages in thread
From: Emilio G. Cota @ 2016-04-07 17:32 UTC (permalink / raw)
To: QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, Peter Maydell, Sergey Fedorov
For some workloads such as arm bootup, tb_phys_hash is performance-critical.
The is due to the high frequency of accesses to the hash table, originated
by (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's.
More info:
https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html
To dig further into this I modified an arm image booting debian jessie to
immediately shut down after boot. Analysis revealed that quite a bit of time
is unnecessarily spent in tb_phys_hash: the cause is poor hashing that
results in very uneven loading of chains in the hash table's buckets;
the longest observed chain had ~550 elements.
The appended addresses this with two changes:
1) Use xxhash as the hash table's hash function. xxhash is a fast,
high-quality hashing function.
2) Feed the hashing function with not just tb_phys, but also pc and flags.
This improves performance over using just tb_phys for hashing, since that
resulted in some hash buckets having many TB's, while others getting very few;
with these changes, the longest observed chain on a single hash bucket is
brought down from ~550 to ~40.
Tests show that the other element checked for in tb_find_physical,
cs_base, is always a match when tb_phys+pc+flags are a match,
so hashing cs_base is wasteful. It could be that this is an ARM-only
thing, though.
BTW, after this change the hash table should not be called "tb_hash_phys"
anymore; this is addressed later in this series.
This change gives consistent bootup time improvements. I tested two
host machines:
- Intel Xeon E5-2690: 11.6% less time
- Intel i7-4790K: 19.2% less time
Increasing the number of hash buckets yields further improvements. However,
using a larger, fixed number of buckets can degrade performance for other
workloads that do not translate as many blocks (600K+ for debian-jessie arm
bootup). This is dealt with later in this series.
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
cpu-exec.c | 2 +-
include/exec/tb-hash.h | 8 ++++++--
translate-all.c | 6 +++---
3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index debc65c..a889cf1 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -233,7 +233,7 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
/* find translated block using physical mappings */
phys_pc = get_page_addr_code(env, pc);
phys_page1 = phys_pc & TARGET_PAGE_MASK;
- h = tb_phys_hash_func(phys_pc);
+ h = tb_hash_func(phys_pc, pc, flags);
ptb1 = &tcg_ctx.tb_ctx.tb_phys_hash[h];
for(;;) {
tb = *ptb1;
diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 0f4e8a0..164b6f4 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -20,6 +20,9 @@
#ifndef EXEC_TB_HASH
#define EXEC_TB_HASH
+#include "exec/exec-all.h"
+#include "exec/tb-hash-xx.h"
+
/* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
addresses on the same page. The top bits are the same. This allows
TLB invalidation to quickly clear a subset of the hash table. */
@@ -43,9 +46,10 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
| (tmp & TB_JMP_ADDR_MASK));
}
-static inline unsigned int tb_phys_hash_func(tb_page_addr_t pc)
+static inline
+uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, int flags)
{
- return (pc >> 2) & (CODE_GEN_PHYS_HASH_SIZE - 1);
+ return tb_hash_func5(phys_pc, pc, flags, 1) & (CODE_GEN_PHYS_HASH_SIZE - 1);
}
#endif
diff --git a/translate-all.c b/translate-all.c
index 1a8f68b..eca2f16 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -972,7 +972,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
/* remove the TB from the hash list */
phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
- h = tb_phys_hash_func(phys_pc);
+ h = tb_hash_func(phys_pc, tb->pc, tb->flags);
tb_hash_remove(&tcg_ctx.tb_ctx.tb_phys_hash[h], tb);
/* remove the TB from the page list */
@@ -1474,8 +1474,8 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
unsigned int h;
TranslationBlock **ptb;
- /* add in the physical hash table */
- h = tb_phys_hash_func(phys_pc);
+ /* add in the hash table */
+ h = tb_hash_func(phys_pc, tb->pc, tb->flags);
ptb = &tcg_ctx.tb_ctx.tb_phys_hash[h];
tb->phys_hash_next = *ptb;
*ptb = tb;
--
2.5.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [PATCH v2 10/13] tb hash: hash phys_pc, pc, and flags with xxhash
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 10/13] tb hash: hash phys_pc, pc, and flags with xxhash Emilio G. Cota
@ 2016-04-08 18:48 ` Richard Henderson
0 siblings, 0 replies; 28+ messages in thread
From: Richard Henderson @ 2016-04-08 18:48 UTC (permalink / raw)
To: Emilio G. Cota, QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Peter Maydell, Sergey Fedorov
On 04/07/2016 10:32 AM, Emilio G. Cota wrote:
> For some workloads such as arm bootup, tb_phys_hash is performance-critical.
> The is due to the high frequency of accesses to the hash table, originated
> by (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's.
> More info:
> https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html
>
> To dig further into this I modified an arm image booting debian jessie to
> immediately shut down after boot. Analysis revealed that quite a bit of time
> is unnecessarily spent in tb_phys_hash: the cause is poor hashing that
> results in very uneven loading of chains in the hash table's buckets;
> the longest observed chain had ~550 elements.
>
> The appended addresses this with two changes:
>
> 1) Use xxhash as the hash table's hash function. xxhash is a fast,
> high-quality hashing function.
>
> 2) Feed the hashing function with not just tb_phys, but also pc and flags.
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Qemu-devel] [PATCH v2 12/13] qht: add test program
2016-04-07 17:32 [Qemu-devel] [PATCH v2 00/10] tb hash improvements Emilio G. Cota
` (8 preceding siblings ...)
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 10/13] tb hash: hash phys_pc, pc, and flags with xxhash Emilio G. Cota
@ 2016-04-07 17:32 ` Emilio G. Cota
2016-04-08 12:41 ` [Qemu-devel] [PATCH v2 00/10] tb hash improvements Alex Bennée
[not found] ` <1460050358-25025-2-git-send-email-cota@braap.org>
11 siblings, 0 replies; 28+ messages in thread
From: Emilio G. Cota @ 2016-04-07 17:32 UTC (permalink / raw)
To: QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, Peter Maydell, Sergey Fedorov
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
tests/.gitignore | 1 +
tests/Makefile | 6 ++-
tests/test-qht.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 115 insertions(+), 1 deletion(-)
create mode 100644 tests/test-qht.c
diff --git a/tests/.gitignore b/tests/.gitignore
index b7bf13e..d6d0700 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -47,6 +47,7 @@ test-qapi-visit.[ch]
test-qdev-global-props
test-qemu-opts
test-qga
+test-qht
test-qmp-commands
test-qmp-commands.h
test-qmp-event
diff --git a/tests/Makefile b/tests/Makefile
index 45b9048..b5a99d6 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -68,6 +68,8 @@ check-unit-y += tests/rcutorture$(EXESUF)
gcov-files-rcutorture-y = util/rcu.c
check-unit-y += tests/test-rcu-list$(EXESUF)
gcov-files-test-rcu-list-y = util/rcu.c
+check-unit-y += tests/test-qht$(EXESUF)
+gcov-files-test-qht-y = util/qht.c
check-unit-y += tests/test-bitops$(EXESUF)
check-unit-$(CONFIG_HAS_GLIB_SUBPROCESS_TESTS) += tests/test-qdev-global-props$(EXESUF)
check-unit-y += tests/check-qom-interface$(EXESUF)
@@ -387,7 +389,8 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \
tests/test-qmp-commands.o tests/test-visitor-serialization.o \
tests/test-x86-cpuid.o tests/test-mul64.o tests/test-int128.o \
tests/test-opts-visitor.o tests/test-qmp-event.o \
- tests/rcutorture.o tests/test-rcu-list.o
+ tests/rcutorture.o tests/test-rcu-list.o \
+ tests/test-qht.o
$(test-obj-y): QEMU_INCLUDES += -Itests
QEMU_CFLAGS += -I$(SRC_PATH)/tests
@@ -425,6 +428,7 @@ tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o
tests/test-int128$(EXESUF): tests/test-int128.o
tests/rcutorture$(EXESUF): tests/rcutorture.o $(test-util-obj-y)
tests/test-rcu-list$(EXESUF): tests/test-rcu-list.o $(test-util-obj-y)
+tests/test-qht$(EXESUF): tests/test-qht.o $(test-util-obj-y)
tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
hw/core/qdev.o hw/core/qdev-properties.o hw/core/hotplug.o\
diff --git a/tests/test-qht.c b/tests/test-qht.c
new file mode 100644
index 0000000..45e29a2
--- /dev/null
+++ b/tests/test-qht.c
@@ -0,0 +1,109 @@
+#include "qemu/osdep.h"
+#include "qemu/qht.h"
+#include "exec/tb-hash-xx.h"
+
+#define N 5000
+#define SEED 1
+
+static struct qht ht;
+static int32_t arr[N];
+
+/*
+ * We might be tempted to use val as the hash. However, val
+ * could be 0, and all hashes passed to qht must be !0.
+ */
+static inline uint32_t hash_func(uint32_t val)
+{
+ return tb_hash_func5(val, 0, 0, SEED);
+}
+
+static bool is_equal(const void *obj, const void *userp)
+{
+ const int32_t *a = obj;
+ const int32_t *b = userp;
+
+ return *a == *b;
+}
+
+static void insert(int a, int b)
+{
+ int i;
+
+ for (i = a; i < b; i++) {
+ uint32_t hash;
+
+ arr[i] = i;
+ hash = hash_func(i);
+
+ qht_insert(&ht, &arr[i], hash);
+ }
+}
+
+static void rm(int init, int end)
+{
+ int i;
+
+ for (i = init; i < end; i++) {
+ uint32_t hash;
+
+ hash = hash_func(arr[i]);
+ assert(qht_remove(&ht, &arr[i], hash));
+ }
+}
+
+static void check(int a, int b, bool expected)
+{
+ int i;
+
+ for (i = a; i < b; i++) {
+ void *p;
+ uint32_t hash;
+ int32_t val;
+
+ val = i;
+ hash = hash_func(i);
+ p = qht_lookup(&ht, is_equal, &val, hash);
+ assert(!!p == expected);
+ }
+}
+
+static void count_func(struct qht *ht, void *p, uint32_t hash, void *userp)
+{
+ unsigned int *curr = userp;
+
+ (*curr)++;
+}
+
+static void iter_check(unsigned int count)
+{
+ unsigned int curr = 0;
+
+ qht_iter(&ht, count_func, &curr);
+ assert(curr == count);
+}
+
+static void qht_test(unsigned int mode)
+{
+ qht_init(&ht, 0, mode);
+
+ insert(0, N);
+ check(0, N, true);
+ check(-N, -1, false);
+ iter_check(N);
+ rm(1, 2);
+ qht_reset_size(&ht, 0);
+ check(0, N, false);
+
+ qht_destroy(&ht);
+}
+
+int main(int argc, char *argv[])
+{
+ qht_test(0);
+ qht_test(QHT_MODE_MRU_LOOKUP);
+ qht_test(QHT_MODE_MRU_LOOKUP | QHT_MODE_MRU_INSERT);
+ qht_test(QHT_MODE_MRU_LOOKUP | QHT_MODE_MRU_INSERT | QHT_MODE_AUTO_RESIZE);
+ qht_test(QHT_MODE_AUTO_RESIZE | QHT_MODE_MRU_INSERT);
+ qht_test(QHT_MODE_MRU_LOOKUP | QHT_MODE_MRU_INSERT);
+ return 0;
+}
--
2.5.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [PATCH v2 00/10] tb hash improvements
2016-04-07 17:32 [Qemu-devel] [PATCH v2 00/10] tb hash improvements Emilio G. Cota
` (9 preceding siblings ...)
2016-04-07 17:32 ` [Qemu-devel] [PATCH v2 12/13] qht: add test program Emilio G. Cota
@ 2016-04-08 12:41 ` Alex Bennée
[not found] ` <1460050358-25025-2-git-send-email-cota@braap.org>
11 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2016-04-08 12:41 UTC (permalink / raw)
To: Emilio G. Cota
Cc: QEMU Developers, MTTCG Devel, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, Peter Maydell, Sergey Fedorov
Emilio G. Cota <cota@braap.org> writes:
> See v1 for context:
> https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00587.html
>
> All patches in v2 are checkpatch-clean, except 05 (checkpatch should
> be ignored for this one) and 06, which I took unmodified (later patches
> fix those warnings while doing other things, anyway).
>
> Note that patch 01 has already been picked up by Paolo; I left it
> here for completeness.
>
> Another patch that is related to this series is the transition of
> tb->flags to uint32_t; this has been sent as a separate patch since
> it touches all targets. It shouldn't conflict with this patchset.
I've just gotten through v1 so I'll leave this for others to look at and
pick up my review on v3. I have been benchmarking though and I like what
I see.
>
> Changes from v1:
> - Drop QEMU_CACHELINE, define QEMU_ALIGNED()
> + Remove excessive caution about Windows: it supports it.
> + define QHT_BUCKET_ALIGN as 64
> - Add some reviewed-by tags from Alex.
> - Drop POSIX spinlock wrapper; use the one pointed out by Paolo
> + Add a couple of fixes over this spinlock implementation:
> * define cpu_relax() for some architectures
> * Optimize spin_lock() for uncontended cases
> - Add tb_hash_func5, a version of xxhash32 customized for tb-hash, so
> that the input values do not have to be contiguous in memory.
> + Drop xxhash.h, add only the customized version at exec/tb-hash-xx.h
> - qht
> + use size_t for counts in qht
> * Drop some 2**32 size checks in there; it's not really a problem
> in practice.
> + s/__func/func__locked/
> + Test program: hash with tb_hash_func5
>
> Thanks,
>
> Emilio
--
Alex Bennée
^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <1460050358-25025-2-git-send-email-cota@braap.org>]
* Re: [Qemu-devel] [PATCH v2 01/13] translate-all: add missing fold of tb_ctx into tcg_ctx
[not found] ` <1460050358-25025-2-git-send-email-cota@braap.org>
@ 2016-04-08 18:25 ` Richard Henderson
0 siblings, 0 replies; 28+ messages in thread
From: Richard Henderson @ 2016-04-08 18:25 UTC (permalink / raw)
To: Emilio G. Cota, QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite,
Peter Maydell, Sergey Fedorov
On 04/07/2016 10:32 AM, Emilio G. Cota wrote:
> Since 5e5f07e08 "TCG: Move translation block variables
> to new context inside tcg_ctx: tb_ctx" on Feb 1 2013, compilation
> of usermode + TB_DEBUG_CHECK has been broken. Fix it.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
> translate-all.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
^ permalink raw reply [flat|nested] 28+ messages in thread