linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] futex: eliminate cache miss from futex_hash()
@ 2015-06-22 13:16 Rasmus Villemoes
  2015-06-22 13:16 ` [not-a-PATCH 2/2] futex: eliminate instruction from hash_futex Rasmus Villemoes
  2015-06-22 13:33 ` [PATCH 1/2] futex: eliminate cache miss from futex_hash() Peter Zijlstra
  0 siblings, 2 replies; 5+ messages in thread
From: Rasmus Villemoes @ 2015-06-22 13:16 UTC (permalink / raw)
  To: Andrew Morton, Richard Weinberger, Peter Zijlstra (Intel),
	Andy Lutomirski, Darren Hart, Oleg Nesterov, Michael Kerrisk
  Cc: Rasmus Villemoes, linux-kernel

futex_hash() references two global variables: the base pointer
futex_queues and the size of the array futex_hashsize. The latter is
marked __read_mostly, while the former is not, so they are likely to
end up very far from each other. This means that futex_hash() is
likely to encounter two cache misses.

We could mark futex_queues as __read_mostly as well, but that doesn't
guarantee they'll end up next to each other. So put the two variables
in a small singleton struct and mark that as __read_mostly.

A diff of the disassembly shows what I'd expect:

@@ -213289,14 +213289,14 @@
 :      31 d1                   xor    %edx,%ecx
 :      c1 ca 12                ror    $0x12,%edx
 :      29 d1                   sub    %edx,%ecx
-:      48 8b 15 95 61 e4 00    mov    0xe46195(%rip),%rdx        # ffffffff81eff9e8 <futex_hashsize>
+:      48 8b 15 a5 61 e4 00    mov    0xe461a5(%rip),%rdx        # ffffffff81eff9f8 <__futex_data+0x8>
 :      31 c8                   xor    %ecx,%eax
 :      c1 c9 08                ror    $0x8,%ecx
 :      29 c8                   sub    %ecx,%eax
 :      48 83 ea 01             sub    $0x1,%rdx
 :      48 21 d0                and    %rdx,%rax
 :      48 c1 e0 06             shl    $0x6,%rax
-:      48 03 05 74 dc 00 01    add    0x100dc74(%rip),%rax        # ffffffff820c74e0 <futex_queues>
+:      48 03 05 84 61 e4 00    add    0xe46184(%rip),%rax        # ffffffff81eff9f0 <__futex_data>
 :      c3                      retq
 :      0f 1f 00                nopl   (%rax)

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---

If this is worth applying, one could consider giving dentry_hashtable,
inode_hashtable and their friends the same treatment. The variables
are all __read_mostly, but may still end up in separate cache lines
(even if the linker places them next to each other).

Even better would be to have some alternatives-like mechanism for
replacing the code with instructions using immediates. But that's far
more complicated...

 kernel/futex.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 2579e407ff67..c5f33bf78293 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -254,9 +254,18 @@ struct futex_hash_bucket {
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;
 
-static unsigned long __read_mostly futex_hashsize;
+/*
+ * The base of the bucket array and its size are always used together
+ * (after initialization only in hash_futex()), so ensure that they
+ * reside in the same cacheline.
+ */
+static struct {
+	struct futex_hash_bucket *queues;
+	unsigned long            hashsize;
+} __futex_data __read_mostly __aligned(16);
+#define futex_queues   (__futex_data.queues)
+#define futex_hashsize (__futex_data.hashsize)
 
-static struct futex_hash_bucket *futex_queues;
 
 static inline void futex_get_mm(union futex_key *key)
 {
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [not-a-PATCH 2/2] futex: eliminate instruction from hash_futex
  2015-06-22 13:16 [PATCH 1/2] futex: eliminate cache miss from futex_hash() Rasmus Villemoes
@ 2015-06-22 13:16 ` Rasmus Villemoes
  2015-06-22 13:33 ` [PATCH 1/2] futex: eliminate cache miss from futex_hash() Peter Zijlstra
  1 sibling, 0 replies; 5+ messages in thread
From: Rasmus Villemoes @ 2015-06-22 13:16 UTC (permalink / raw)
  To: Andrew Morton, Richard Weinberger, Peter Zijlstra (Intel),
	Andy Lutomirski, Darren Hart, Oleg Nesterov, Michael Kerrisk
  Cc: Rasmus Villemoes, linux-kernel

After initialization, futex_hashsize is only used in hash_futex, where
we subtract 1 to get the proper mask. So store that mask instead.

Or not: Unfortunately, this causes gcc to emit an 'and with memory op'
instead of doing a load which can happen at least somewhat in parallel
with the last few instructions computing hash, as this diff shows:

-:      48 8b 15 a5 61 e4 00    mov    0xe461a5(%rip),%rdx        # ffffffff81eff9f8 <__futex_data+0x8>
 :      31 c8                   xor    %ecx,%eax
 :      c1 c9 08                ror    $0x8,%ecx
 :      29 c8                   sub    %ecx,%eax
-:      48 83 ea 01             sub    $0x1,%rdx
-:      48 21 d0                and    %rdx,%rax
+:      23 05 9f 61 e4 00       and    0xe4619f(%rip),%eax        # ffffffff81eff9f8 <__futex_data+0x8>
 :      48 c1 e0 06             shl    $0x6,%rax
-:      48 03 05 84 61 e4 00    add    0xe46184(%rip),%rax        # ffffffff81eff9f0 <__futex_data>
+:      48 03 05 8c 61 e4 00    add    0xe4618c(%rip),%rax        # ffffffff81eff9f0 <__futex_data>
 :      c3                      retq

Not-signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 kernel/futex.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index c5f33bf78293..22865bcbcae8 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -255,16 +255,16 @@ struct futex_hash_bucket {
 } ____cacheline_aligned_in_smp;
 
 /*
- * The base of the bucket array and its size are always used together
+ * The base of the bucket array and the mask are always used together
  * (after initialization only in hash_futex()), so ensure that they
  * reside in the same cacheline.
  */
 static struct {
 	struct futex_hash_bucket *queues;
-	unsigned long            hashsize;
+	u32                      mask;
 } __futex_data __read_mostly __aligned(16);
 #define futex_queues   (__futex_data.queues)
-#define futex_hashsize (__futex_data.hashsize)
+#define futex_hashmask (__futex_data.mask)
 
 
 static inline void futex_get_mm(union futex_key *key)
@@ -320,7 +320,7 @@ static struct futex_hash_bucket *hash_futex(union futex_key *key)
 	u32 hash = jhash2((u32*)&key->both.word,
 			  (sizeof(key->both.word)+sizeof(key->both.ptr))/4,
 			  key->both.offset);
-	return &futex_queues[hash & (futex_hashsize - 1)];
+	return &futex_queues[hash & futex_hashmask];
 }
 
 /*
@@ -3035,6 +3035,7 @@ static int __init futex_init(void)
 {
 	unsigned int futex_shift;
 	unsigned long i;
+	unsigned long futex_hashsize;
 
 #if CONFIG_BASE_SMALL
 	futex_hashsize = 16;
@@ -3045,7 +3046,7 @@ static int __init futex_init(void)
 	futex_queues = alloc_large_system_hash("futex", sizeof(*futex_queues),
 					       futex_hashsize, 0,
 					       futex_hashsize < 256 ? HASH_SMALL : 0,
-					       &futex_shift, NULL,
+					       &futex_shift, &futex_hashmask,
 					       futex_hashsize, futex_hashsize);
 	futex_hashsize = 1UL << futex_shift;
 
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] futex: eliminate cache miss from futex_hash()
  2015-06-22 13:16 [PATCH 1/2] futex: eliminate cache miss from futex_hash() Rasmus Villemoes
  2015-06-22 13:16 ` [not-a-PATCH 2/2] futex: eliminate instruction from hash_futex Rasmus Villemoes
@ 2015-06-22 13:33 ` Peter Zijlstra
  2015-06-22 13:45   ` Rasmus Villemoes
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2015-06-22 13:33 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Andrew Morton, Richard Weinberger, Andy Lutomirski, Darren Hart,
	Oleg Nesterov, Michael Kerrisk, linux-kernel

On Mon, Jun 22, 2015 at 03:16:18PM +0200, Rasmus Villemoes wrote:
> +static struct {
> +	struct futex_hash_bucket *queues;
> +	unsigned long            hashsize;
> +} __futex_data __read_mostly __aligned(16);

Does: __aligned(sizeof(__futex_data)), work?

Because 16 might waste 8 bytes on 32bit.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] futex: eliminate cache miss from futex_hash()
  2015-06-22 13:33 ` [PATCH 1/2] futex: eliminate cache miss from futex_hash() Peter Zijlstra
@ 2015-06-22 13:45   ` Rasmus Villemoes
  2015-06-22 13:55     ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Rasmus Villemoes @ 2015-06-22 13:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Richard Weinberger, Andy Lutomirski, Darren Hart,
	Oleg Nesterov, Michael Kerrisk, linux-kernel

On Mon, Jun 22 2015, Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Jun 22, 2015 at 03:16:18PM +0200, Rasmus Villemoes wrote:
>> +static struct {
>> +	struct futex_hash_bucket *queues;
>> +	unsigned long            hashsize;
>> +} __futex_data __read_mostly __aligned(16);
>
> Does: __aligned(sizeof(__futex_data)), work?
>

Unfortunately not:

kernel/futex.c:265:30: error: ‘__futex_data’ undeclared here (not in a function)
kernel/futex.c:265:1: error: requested alignment is not an integer constant

> Because 16 might waste 8 bytes on 32bit.

Yeah, wasting >= 48 bytes was the reason I didn't make it
____cacheline_aligned. If 8 bytes is also too much, I suppose one could
just give the struct a tag and then use sizeof(struct futex_data).

Rasmus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] futex: eliminate cache miss from futex_hash()
  2015-06-22 13:45   ` Rasmus Villemoes
@ 2015-06-22 13:55     ` Peter Zijlstra
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2015-06-22 13:55 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Andrew Morton, Richard Weinberger, Andy Lutomirski, Darren Hart,
	Oleg Nesterov, Michael Kerrisk, linux-kernel

On Mon, Jun 22, 2015 at 03:45:10PM +0200, Rasmus Villemoes wrote:
> On Mon, Jun 22 2015, Peter Zijlstra <peterz@infradead.org> wrote:

> > Does: __aligned(sizeof(__futex_data)), work?

> Unfortunately not:

> > Because 16 might waste 8 bytes on 32bit.
> 
> Yeah, wasting >= 48 bytes was the reason I didn't make it
> ____cacheline_aligned. If 8 bytes is also too much, I suppose one could
> just give the struct a tag and then use sizeof(struct futex_data).

Nah, its probably fine, just wondering if it could be done better.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-06-22 13:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-22 13:16 [PATCH 1/2] futex: eliminate cache miss from futex_hash() Rasmus Villemoes
2015-06-22 13:16 ` [not-a-PATCH 2/2] futex: eliminate instruction from hash_futex Rasmus Villemoes
2015-06-22 13:33 ` [PATCH 1/2] futex: eliminate cache miss from futex_hash() Peter Zijlstra
2015-06-22 13:45   ` Rasmus Villemoes
2015-06-22 13:55     ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).