All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] NUMA futex hashing
@ 2006-08-08  7:07 Ravikiran G Thirumalai
  2006-08-08  9:14 ` Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: Ravikiran G Thirumalai @ 2006-08-08  7:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: Shai Fultheim (Shai@scalex86.org), pravin b shelar

Current futex hash scheme is not the best for NUMA.   The futex hash table is 
an array of struct futex_hash_bucket, which is just a spinlock and a 
list_head -- this means multiple spinlocks on the same cacheline and on NUMA 
machines, on the same internode cacheline.  If futexes of two unrelated 
threads running on two different nodes happen to hash onto adjacent hash 
buckets, or buckets on the same internode cacheline, then we have the 
internode cacheline bouncing between nodes.

Here is a simple scheme which maintains per-node hash tables for futexes.  

In this scheme, a private futex is assigned to the node id of the futex's KVA. 
The reasoning is, the futex KVA is allocated from the node as indicated
by memory policy set by the process, and that should be a good 'home node' 
for that futex.  Of course this helps workloads where all the threads of a 
process are bound to the same node, but it seems reasonable to run all
threads of a process on the same node.

A shared futex is assigned a home node based on jhash2 itself.  Since inode
and offset are used as the key, the same inode offset is used to arrive at
the home node of a shared futex.  This distributes private futexes across
all nodes.

Comments? Suggestions? Particularly regarding shared futexes.  Any policy
suggestions?

Thanks,
Kiran

Note: This patch needs to have kvaddr_to_nid() reintroduced.  This was taken
out in git commit 9f3fd602aef96c2a490e3bfd669d06475aeba8d8

Index: linux-2.6.18-rc3/kernel/futex.c
===================================================================
--- linux-2.6.18-rc3.orig/kernel/futex.c	2006-08-02 12:11:34.000000000 -0700
+++ linux-2.6.18-rc3/kernel/futex.c	2006-08-02 16:48:47.000000000 -0700
@@ -137,20 +137,35 @@ struct futex_hash_bucket {
        struct list_head       chain;
 };
 
-static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];
+static struct futex_hash_bucket *futex_queues[MAX_NUMNODES] __read_mostly;
 
 /* Futex-fs vfsmount entry: */
 static struct vfsmount *futex_mnt;
 
 /*
  * We hash on the keys returned from get_futex_key (see below).
+ * With NUMA aware futex hashing, we have per-node hash tables.
+ * We determine the home node of a futex based on the KVA -- if the futex
+ * is a private futex.  For shared futexes, we use  jhash2 itself on the
+ * futex_key to arrive at a home node.
  */
 static struct futex_hash_bucket *hash_futex(union futex_key *key)
 {
+	int nodeid;
 	u32 hash = jhash2((u32*)&key->both.word,
 			  (sizeof(key->both.word)+sizeof(key->both.ptr))/4,
 			  key->both.offset);
-	return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)];
+	if (key->both.offset & 0x1) {
+		/* 
+		 * Shared futex: Use any of the 'possible' nodes as home node.
+		 */
+		nodeid = hash & (MAX_NUMNODES -1);
+		BUG_ON(!node_possible(nodeid));
+	} else
+		/* Private futex */ 
+		nodeid = kvaddr_to_nid(key->both.ptr);
+	
+	return &futex_queues[nodeid][hash & ((1 << FUTEX_HASHBITS)-1)];
 }
 
 /*
@@ -1909,13 +1924,25 @@ static int __init init(void)
 {
 	unsigned int i;
 
+	int nid;
+
+	for_each_node(nid)
+	{
+		futex_queues[nid] = kmalloc_node(
+					(sizeof(struct futex_hash_bucket) *
+					(1 << FUTEX_HASHBITS)),
+					GFP_KERNEL, nid);
+		if (!futex_queues[nid])
+			panic("futex_init: Allocation of multi-node futex_queues failed");
+		for (i = 0; i < (1 << FUTEX_HASHBITS); i++) {
+			INIT_LIST_HEAD(&futex_queues[nid][i].chain);
+			spin_lock_init(&futex_queues[nid][i].lock);
+		}
+	}
+
 	register_filesystem(&futex_fs_type);
 	futex_mnt = kern_mount(&futex_fs_type);
 
-	for (i = 0; i < ARRAY_SIZE(futex_queues); i++) {
-		INIT_LIST_HEAD(&futex_queues[i].chain);
-		spin_lock_init(&futex_queues[i].lock);
-	}
 	return 0;
 }
 __initcall(init);


^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2007-04-26 13:38 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-08-08  7:07 [RFC] NUMA futex hashing Ravikiran G Thirumalai
2006-08-08  9:14 ` Eric Dumazet
2006-08-08 20:31   ` Ravikiran G Thirumalai
2006-08-08  9:37 ` Jes Sorensen
2006-08-08  9:58   ` Andi Kleen
2006-08-08 10:07     ` Jes Sorensen
2006-08-08  9:57 ` Andi Kleen
2006-08-08 10:10   ` Eric Dumazet
2006-08-08 10:36     ` Andi Kleen
2006-08-08 12:29       ` Eric Dumazet
2006-08-08 12:47         ` Andi Kleen
2006-08-08 12:57           ` Eric Dumazet
2006-08-08 14:39             ` Ulrich Drepper
2006-08-08 15:11               ` Nick Piggin
2006-08-08 15:36                 ` Ulrich Drepper
2006-08-08 16:22                   ` Nick Piggin
2006-08-08 16:26                     ` Nick Piggin
2006-08-08 16:49                     ` Ulrich Drepper
2006-08-08 16:08                 ` Eric Dumazet
2006-08-08 16:34                   ` Nick Piggin
2006-08-08 16:49                     ` Eric Dumazet
2006-08-08 16:59                       ` Eric Dumazet
2006-08-09  1:56                       ` Nick Piggin
2006-08-08 16:58                   ` Ulrich Drepper
2006-08-08 17:08                     ` Eric Dumazet
2006-08-09  1:58                     ` Nick Piggin
2006-08-09  6:26                       ` Eric Dumazet
2006-08-09  6:43                         ` Eric Dumazet
2007-03-15 19:10                           ` [PATCH 0/3] FUTEX : new PRIVATE futexes, SMP and NUMA improvements Eric Dumazet
2007-03-15 20:15                             ` Nick Piggin
2007-03-16  8:05                             ` Peter Zijlstra
2007-03-16  9:30                               ` Eric Dumazet
2007-03-16 10:10                                 ` Peter Zijlstra
2007-03-16 10:30                                   ` Eric Dumazet
2007-03-16 10:36                                     ` Peter Zijlstra
2007-04-04  7:16                             ` Ulrich Drepper
2007-04-05 17:49                               ` [PATCH] FUTEX : new PRIVATE futexes Eric Dumazet
2007-04-05 20:43                                 ` Ulrich Drepper
2007-04-06  1:19                                 ` Nick Piggin
2007-04-06  5:53                                   ` Eric Dumazet
2007-04-06 11:50                                     ` Nick Piggin
2007-04-06  6:05                                   ` Hugh Dickins
2007-04-06 17:41                                     ` Jan Engelhardt
2007-04-06 12:26                                 ` Shared futexes (was [PATCH] FUTEX : new PRIVATE futexes) Peter Zijlstra
2007-04-06 13:02                                   ` Hugh Dickins
2007-04-06 13:15                                     ` Peter Zijlstra
2007-04-06 13:15                                     ` Nick Piggin
2007-04-06 13:22                                       ` Peter Zijlstra
2007-04-06 13:40                                         ` Nick Piggin
2007-04-06 12:31                                 ` [PATCH] FUTEX : new PRIVATE futexes Peter Zijlstra
2007-04-07  8:43                                 ` [PATCH, take4] " Eric Dumazet
2007-04-07  9:30                                   ` Nick Piggin
2007-04-07 10:00                                     ` Eric Dumazet
2007-04-11  7:22                                       ` Nick Piggin
2007-04-11  8:14                                         ` Eric Dumazet
2007-04-11  9:23                                           ` Nick Piggin
2007-04-11  9:30                                             ` Pierre Peiffer
2007-04-11  9:39                                               ` Nick Piggin
2007-04-11  9:40                                                 ` Nick Piggin
2007-04-11  9:35                                             ` Eric Dumazet
2007-04-12  1:57                                               ` Nick Piggin
2007-04-07 11:18                                   ` Jakub Jelinek
2007-04-07 11:54                                     ` Eric Dumazet
2007-04-07 16:40                                       ` Ulrich Drepper
2007-04-07 22:15                                   ` Andrew Morton
2007-04-10  9:21                                     ` Eric Dumazet
2007-04-11  9:19                                   ` [PATCH, take5] " Eric Dumazet
2007-04-11 12:23                                     ` Rusty Russell
2007-04-26 12:55                                     ` [PATCH, take6] " Eric Dumazet
2007-04-26 13:35                                       ` Pierre Peiffer
2007-03-15 19:13                           ` [PATCH 1/3] FUTEX : introduce PROCESS_PRIVATE semantic Eric Dumazet
2007-03-15 19:16                           ` [PATCH 2/3] FUTEX : introduce private hashtables Eric Dumazet
2007-03-15 20:25                             ` Nick Piggin
2007-03-15 21:09                               ` Ulrich Drepper
2007-03-15 21:29                                 ` Nick Piggin
2007-03-15 22:59                               ` William Lee Irwin III
2007-03-15 19:20                           ` [PATCH 3/3] FUTEX : NUMA friendly global hashtable Eric Dumazet
2006-08-09  0:13     ` [RFC] NUMA futex hashing Ravikiran G Thirumalai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.