* [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

From: Pranith Kumar @ 2016-10-25 15:35 UTC
To: Richard Henderson, Alex Bennée, Markus Armbruster, Sergey Fedorov,
    Paolo Bonzini, Emilio G. Cota, open list:All patches CC here

Using perf, I see that the sequence lock is a bottleneck, since it is
read by everyone. Giving it its own cache line seems to help things
quite a bit.

Using qht-bench, I measured the following for:

$ ./tests/qht-bench -d 10 -n 24 -u <x>

              throughput (MT/s/thread)
  update %    base    patch    %change
         0    8.07    13.33      +65%
        10    7.10     8.90      +25%
        20    6.34     7.02      +10%
        30    5.48     6.11     +9.6%
        40    4.90     5.46   +11.42%

I am not able to see any significant increases for lower thread
counts, though.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 include/qemu/seqlock.h | 2 +-
 util/qht.c             | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/qemu/seqlock.h b/include/qemu/seqlock.h
index 8dee11d..954abe8 100644
--- a/include/qemu/seqlock.h
+++ b/include/qemu/seqlock.h
@@ -21,7 +21,7 @@ typedef struct QemuSeqLock QemuSeqLock;
 
 struct QemuSeqLock {
     unsigned sequence;
-};
+} QEMU_ALIGNED(64);
 
 static inline void seqlock_init(QemuSeqLock *sl)
 {
diff --git a/util/qht.c b/util/qht.c
index ff4d2e6..4d82609 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -101,14 +101,14 @@
  * be grabbed first.
  */
 struct qht_bucket {
-    QemuSpin lock;
     QemuSeqLock sequence;
+    QemuSpin lock;
     uint32_t hashes[QHT_BUCKET_ENTRIES];
     void *pointers[QHT_BUCKET_ENTRIES];
     struct qht_bucket *next;
 } QEMU_ALIGNED(QHT_BUCKET_ALIGN);
 
-QEMU_BUILD_BUG_ON(sizeof(struct qht_bucket) > QHT_BUCKET_ALIGN);
+QEMU_BUILD_BUG_ON(sizeof(struct qht_bucket) > 2 * QHT_BUCKET_ALIGN);
 
 /**
  * struct qht_map - structure to track an array of buckets
--
2.10.1
* Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

From: Paolo Bonzini @ 2016-10-25 15:41 UTC
To: Pranith Kumar, Richard Henderson, Alex Bennée, Markus Armbruster,
    Sergey Fedorov, Emilio G. Cota, open list:All patches CC here

On 25/10/2016 17:35, Pranith Kumar wrote:
> Using perf, I see that the sequence lock is a bottleneck, since it is
> read by everyone. Giving it its own cache line seems to help things
> quite a bit.
>
> Using qht-bench, I measured the following for:
>
> $ ./tests/qht-bench -d 10 -n 24 -u <x>
>
>   update %    base    patch    %change
>          0    8.07    13.33      +65%
>         10    7.10     8.90      +25%
>         20    6.34     7.02      +10%
>         30    5.48     6.11     +9.6%
>         40    4.90     5.46   +11.42%
>
> I am not able to see any significant increases for lower thread
> counts, though.

Whoa, that's a bit of a heavy hammer. The idea is that whoever modifies
the seqlock must take the spinlock first, and whoever reads the seqlock
will read one of the members of hashes[]/pointers[]. So having
everything in the same cache line should be better.

What happens if you just change QHT_BUCKET_ALIGN to 128?

Paolo
* Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

From: Pranith Kumar @ 2016-10-25 15:49 UTC
To: Paolo Bonzini
Cc: Richard Henderson, Alex Bennée, Markus Armbruster,
    Emilio G. Cota, open list:All patches CC here

On Tue, Oct 25, 2016 at 11:41 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 25/10/2016 17:35, Pranith Kumar wrote:
>> Using perf, I see that the sequence lock is a bottleneck, since it is
>> read by everyone. Giving it its own cache line seems to help things
>> quite a bit.
>>
>> I am not able to see any significant increases for lower thread
>> counts, though.
>
> Whoa, that's a bit of a heavy hammer. The idea is that whoever modifies
> the seqlock must take the spinlock first, and whoever reads the seqlock
> will read one of the members of hashes[]/pointers[]. So having
> everything in the same cache line should be better.

But we are taking the seqlock of only the head bucket, while the
readers are reading hashes/pointers of the chained buckets.

> What happens if you just change QHT_BUCKET_ALIGN to 128?

  update %    throughput
         0         4.47
        10         9.82   (weird!)
        20         8.13
        30         7.13
        40         6.45

Thanks,
--
Pranith
* Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

From: Paolo Bonzini @ 2016-10-25 15:54 UTC
To: Pranith Kumar
Cc: Richard Henderson, Alex Bennée, Markus Armbruster,
    Emilio G. Cota, open list:All patches CC here

On 25/10/2016 17:49, Pranith Kumar wrote:
> But we are taking the seqlock of only the head bucket, while the
> readers are reading hashes/pointers of the chained buckets.

No, we aren't. See qht_lookup__slowpath.

This patch:

  update %    base    patch    %change
         0    8.07    13.33      +65%
        10    7.10     8.90      +25%
        20    6.34     7.02      +10%
        30    5.48     6.11     +9.6%
        40    4.90     5.46   +11.42%

Just doubling QHT_BUCKET_ALIGN to 128:

  update %    base    patch    %change
         0    8.07     4.47    -45% ?!?
        10    7.10     9.82      +38%
        20    6.34     8.13      +28%
        30    5.48     7.13      +30%
        40    4.90     6.45      +32%

It seems to me that your machine has 128-byte cache lines.

Paolo
* Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

From: Pranith Kumar @ 2016-10-25 19:12 UTC
To: Paolo Bonzini
Cc: Richard Henderson, Alex Bennée, Markus Armbruster,
    Emilio G. Cota, open list:All patches CC here

Paolo Bonzini writes:

> On 25/10/2016 17:49, Pranith Kumar wrote:
>> But we are taking the seqlock of only the head bucket, while the
>> readers are reading hashes/pointers of the chained buckets.
>
> No, we aren't. See qht_lookup__slowpath.

I don't see it. The reader is taking the head bucket lock in
qht_lookup__slowpath() and then iterating over the chained buckets in
qht_do_lookup(). The writer is doing the same: it is taking the head
bucket lock in qht_insert__locked().

I've written a patch (see below) to take the per-bucket sequence locks.

> This patch:
>
>   update %    base    patch    %change
>          0    8.07    13.33      +65%
>         10    7.10     8.90      +25%
>         20    6.34     7.02      +10%
>         30    5.48     6.11     +9.6%
>         40    4.90     5.46   +11.42%
>
> Just doubling QHT_BUCKET_ALIGN to 128:
>
>   update %    base    patch    %change
>          0    8.07     4.47    -45% ?!?
>         10    7.10     9.82      +38%
>         20    6.34     8.13      +28%
>         30    5.48     7.13      +30%
>         40    4.90     6.45      +32%
>
> It seems to me that your machine has 128-byte cache lines.

Nope. It is just the regular 64-byte cache line:

$ getconf LEVEL1_DCACHE_LINESIZE
64

(The machine model is Xeon CPU E5-2620.)

Take the per-bucket sequence locks instead of the head bucket lock.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 util/qht.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/util/qht.c b/util/qht.c
index 4d82609..cfce5fc 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -374,19 +374,19 @@ static void qht_bucket_reset__locked(struct qht_bucket *head)
     struct qht_bucket *b = head;
     int i;
 
-    seqlock_write_begin(&head->sequence);
     do {
+        seqlock_write_begin(&b->sequence);
         for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
             if (b->pointers[i] == NULL) {
-                goto done;
+                seqlock_write_end(&b->sequence);
+                return;
             }
             atomic_set(&b->hashes[i], 0);
             atomic_set(&b->pointers[i], NULL);
         }
+        seqlock_write_end(&b->sequence);
         b = b->next;
     } while (b);
- done:
-    seqlock_write_end(&head->sequence);
 }
 
 /* call with all bucket locks held */
@@ -446,6 +446,8 @@ void *qht_do_lookup(struct qht_bucket *head, qht_lookup_func_t func,
     int i;
 
     do {
+        void *q = NULL;
+        unsigned int version = seqlock_read_begin(&b->sequence);
         for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
             if (atomic_read(&b->hashes[i]) == hash) {
                 /* The pointer is dereferenced before seqlock_read_retry,
@@ -455,11 +457,16 @@ void *qht_do_lookup(struct qht_bucket *head, qht_lookup_func_t func,
                 void *p = atomic_rcu_read(&b->pointers[i]);
 
                 if (likely(p) && likely(func(p, userp))) {
-                    return p;
+                    q = p;
+                    break;
                 }
             }
         }
-        b = atomic_rcu_read(&b->next);
+        if (!q) {
+            b = atomic_rcu_read(&b->next);
+        } else if (!seqlock_read_retry(&b->sequence, version)) {
+            return q;
+        }
     } while (b);
 
     return NULL;
@@ -469,14 +476,7 @@ static __attribute__((noinline))
 void *qht_lookup__slowpath(struct qht_bucket *b, qht_lookup_func_t func,
                            const void *userp, uint32_t hash)
 {
-    unsigned int version;
-    void *ret;
-
-    do {
-        version = seqlock_read_begin(&b->sequence);
-        ret = qht_do_lookup(b, func, userp, hash);
-    } while (seqlock_read_retry(&b->sequence, version));
-    return ret;
+    return qht_do_lookup(b, func, userp, hash);
 }
 
 void *qht_lookup(struct qht *ht, qht_lookup_func_t func, const void *userp,
@@ -537,14 +537,14 @@ static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
 
  found:
     /* found an empty key: acquire the seqlock and write */
-    seqlock_write_begin(&head->sequence);
+    seqlock_write_begin(&b->sequence);
     if (new) {
         atomic_rcu_set(&prev->next, b);
     }
     /* smp_wmb() implicit in seqlock_write_begin.  */
     atomic_set(&b->hashes[i], hash);
     atomic_set(&b->pointers[i], p);
-    seqlock_write_end(&head->sequence);
+    seqlock_write_end(&b->sequence);
     return true;
 }
 
@@ -665,9 +665,9 @@ bool qht_remove__locked(struct qht_map *map, struct qht_bucket *head,
         }
         if (q == p) {
             qht_debug_assert(b->hashes[i] == hash);
-            seqlock_write_begin(&head->sequence);
+            seqlock_write_begin(&b->sequence);
             qht_bucket_remove_entry(b, i);
-            seqlock_write_end(&head->sequence);
+            seqlock_write_end(&b->sequence);
             return true;
         }
     }
--
2.10.1
* Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

From: Paolo Bonzini @ 2016-10-25 20:02 UTC
To: Pranith Kumar
Cc: open list:All patches CC here, Emilio G. Cota, Alex Bennée,
    Markus Armbruster, Richard Henderson

On 25/10/2016 21:12, Pranith Kumar wrote:
> Paolo Bonzini writes:
>> On 25/10/2016 17:49, Pranith Kumar wrote:
>>> But we are taking the seqlock of only the head bucket, while the
>>> readers are reading hashes/pointers of the chained buckets.
>>
>> No, we aren't. See qht_lookup__slowpath.
>
> I don't see it. The reader is taking the head bucket lock in
> qht_lookup__slowpath() and then iterating over the chained buckets in
> qht_do_lookup(). The writer is doing the same: it is taking the head
> bucket lock in qht_insert__locked().

Uh, you're right. Sorry, I wasn't reading it correctly.

> I've written a patch (see below) to take the per-bucket sequence locks.

What's the performance like?

Paolo
* Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

From: Pranith Kumar @ 2016-10-25 20:35 UTC
To: Paolo Bonzini
Cc: open list:All patches CC here, Emilio G. Cota, Alex Bennée,
    Markus Armbruster, Richard Henderson

On Tue, Oct 25, 2016 at 4:02 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> I've written a patch (see below) to take the per-bucket sequence locks.
>
> What's the performance like?

Applying only this patch, the perf numbers are similar to those with
the 128-byte alignment you suggested:

  update %    throughput
         0         4
        10         9.70
        20         8.09
        30         7.13
        40         6.49

I am not sure why only the 100% reader case is so low. Applying the
sequence lock cache alignment patch brings it back up to 13
MT/s/thread.

--
Pranith
* Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

From: Emilio G. Cota @ 2016-10-25 20:45 UTC
To: Pranith Kumar
Cc: Paolo Bonzini, open list:All patches CC here, Alex Bennée,
    Markus Armbruster, Richard Henderson

On Tue, Oct 25, 2016 at 16:35:48 -0400, Pranith Kumar wrote:
> On Tue, Oct 25, 2016 at 4:02 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> I've written a patch (see below) to take the per-bucket sequence locks.
>>
>> What's the performance like?
>
> Applying only this patch, the perf numbers are similar to those with
> the 128-byte alignment you suggested.

That makes sense. Having a single seqlock per bucket is simple and
fast; note that bucket chains should be very short (we use good hashing
and automatic resize for this purpose).

		E.
* Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

From: Paolo Bonzini @ 2016-10-25 20:58 UTC
To: Emilio G. Cota, Pranith Kumar
Cc: Richard Henderson, Alex Bennée, open list:All patches CC here,
    Markus Armbruster

On 25/10/2016 22:45, Emilio G. Cota wrote:
> On Tue, Oct 25, 2016 at 16:35:48 -0400, Pranith Kumar wrote:
>> Applying only this patch, the perf numbers are similar to those with
>> the 128-byte alignment you suggested.
>
> That makes sense. Having a single seqlock per bucket is simple and
> fast; note that bucket chains should be very short (we use good
> hashing and automatic resize for this purpose).

But why do we get such worse performance in the 100% reader case? (And
even more puzzling, why does Pranith's original patch improve
performance instead of causing more cache misses?)

Thanks,

Paolo
* Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

From: Emilio G. Cota @ 2016-10-25 20:55 UTC
To: Pranith Kumar
Cc: Richard Henderson, Alex Bennée, Markus Armbruster, Sergey Fedorov,
    Paolo Bonzini, open list:All patches CC here

On Tue, Oct 25, 2016 at 11:35:06 -0400, Pranith Kumar wrote:
> Using perf, I see that the sequence lock is a bottleneck, since it is
> read by everyone. Giving it its own cache line seems to help things
> quite a bit.
>
> Using qht-bench, I measured the following for:
>
> $ ./tests/qht-bench -d 10 -n 24 -u <x>
>
>   update %    base    patch    %change
>          0    8.07    13.33      +65%
>         10    7.10     8.90      +25%
>         20    6.34     7.02      +10%
>         30    5.48     6.11     +9.6%
>         40    4.90     5.46   +11.42%
>
> I am not able to see any significant increases for lower thread
> counts, though.

Honestly, I don't know what you're measuring here. Your results are
low (I assume you're showing throughput per thread, but still they're
low), and it makes no sense that increasing the cache line footprint
will make a *read-only* workload (0% updates) run faster. More below.

> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  include/qemu/seqlock.h | 2 +-
>  util/qht.c             | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/include/qemu/seqlock.h b/include/qemu/seqlock.h
> index 8dee11d..954abe8 100644
> --- a/include/qemu/seqlock.h
> +++ b/include/qemu/seqlock.h
> @@ -21,7 +21,7 @@ typedef struct QemuSeqLock QemuSeqLock;
>
>  struct QemuSeqLock {
>      unsigned sequence;
> -};
> +} QEMU_ALIGNED(64);
>
>  static inline void seqlock_init(QemuSeqLock *sl)
>  {
> diff --git a/util/qht.c b/util/qht.c
> index ff4d2e6..4d82609 100644
> --- a/util/qht.c
> +++ b/util/qht.c
> @@ -101,14 +101,14 @@
>   * be grabbed first.
>   */
>  struct qht_bucket {
> -    QemuSpin lock;
>      QemuSeqLock sequence;
> +    QemuSpin lock;
>      uint32_t hashes[QHT_BUCKET_ENTRIES];
>      void *pointers[QHT_BUCKET_ENTRIES];
>      struct qht_bucket *next;
>  } QEMU_ALIGNED(QHT_BUCKET_ALIGN);

I understand this is a hack, but this would have been more localized:

diff --git a/util/qht.c b/util/qht.c
index ff4d2e6..55db907 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -101,14 +101,16 @@
  * be grabbed first.
  */
 struct qht_bucket {
+    struct {
+        QemuSeqLock sequence;
+    } QEMU_ALIGNED(QHT_BUCKET_ALIGN);
     QemuSpin lock;
-    QemuSeqLock sequence;
     uint32_t hashes[QHT_BUCKET_ENTRIES];
     void *pointers[QHT_BUCKET_ENTRIES];
     struct qht_bucket *next;
 } QEMU_ALIGNED(QHT_BUCKET_ALIGN);

So I tested my change above vs. master on a 16-core (32-way) Intel
machine (Xeon E5-2690 @ 2.90GHz with turbo-boost disabled), making
sure threads are scheduled on separate cores, favouring same-socket
ones.

Results: http://imgur.com/a/c4dTB

So really I don't know what you're measuring. The idea of decoupling
the seqlock from the spinlock's cache line doesn't make sense to me,
because:

- Bucket lock holders are very likely to update the seqlock, so it
  makes sense to have them in the same cache line (exceptions to this
  are resizes or traversals, but those are very rare and we're not
  measuring those in qht-bench).

- Thanks to resizing + good hashing, bucket chains are very short, so
  a single seqlock per bucket is all we need.

- We can have *many* buckets (200K is not crazy for the TB htable), so
  anything that increases their size needs very good justification
  (see the 200K results above).

		E.