From: Eric Dumazet <dada1@cosmosbay.com>
To: Ravikiran G Thirumalai <kiran@scalex86.org>
Cc: linux-kernel@vger.kernel.org,
	"Shai Fultheim (Shai@scalex86.org)" <shai@scalex86.org>,
	pravin b shelar <pravin.shelar@calsoftinc.com>
Subject: Re: [RFC] NUMA futex hashing
Date: Tue, 8 Aug 2006 11:14:49 +0200
Message-ID: <200608081114.50256.dada1@cosmosbay.com>
In-Reply-To: <20060808070708.GA3931@localhost.localdomain>

On Tuesday 08 August 2006 09:07, Ravikiran G Thirumalai wrote:
> The current futex hashing scheme is not well suited to NUMA. The futex hash
> table is an array of struct futex_hash_bucket, each of which is just a
> spinlock and a list_head. This means several spinlocks share a cacheline
> and, on NUMA machines, an internode cacheline. If the futexes of two
> unrelated threads running on two different nodes happen to hash onto
> adjacent hash buckets, or onto buckets in the same internode cacheline,
> that internode cacheline bounces between the nodes.
>
> Here is a simple scheme which maintains per-node hash tables for futexes.
>
> In this scheme, a private futex is assigned to the node id of the futex's
> KVA. The reasoning is that the futex KVA is allocated from the node
> indicated by the memory policy set by the process, so that node should be
> a good 'home node' for the futex. Of course, this helps workloads where all
> the threads of a process are bound to the same node, but it seems
> reasonable to run all threads of a process on the same node.
>
> A shared futex is assigned a home node based on the jhash2 value itself.
> Since the inode and offset are used as the key, the same inode/offset pair
> always maps to the same home node. This distributes shared futexes across
> all nodes.
>
> Comments? Suggestions? Particularly regarding shared futexes.  Any policy
> suggestions?
>

Your patch seems fine, but I have one comment.

On non-NUMA machines (where MAX_NUMNODES is 1), we would pay for one useless
indirection to get the futex_queues pointer, since the table becomes:

static struct futex_hash_bucket *futex_queues[1];

I think it is worth redesigning your patch so that this extra indirection is
needed only on NUMA machines, along these lines:

#if defined(CONFIG_NUMA)
/* One table per node, allocated at init time: needs an indirection. */
static struct futex_hash_bucket *futex_queues[MAX_NUMNODES];
#define FUTEX_QUEUES(nodeid, hash) \
	(&futex_queues[(nodeid)][(hash) & ((1 << FUTEX_HASHBITS) - 1)])
#else
/* Single static table: no indirection, nodeid is ignored. */
static struct futex_hash_bucket futex_queues[1 << FUTEX_HASHBITS];
#define FUTEX_QUEUES(nodeid, hash) \
	(&futex_queues[(hash) & ((1 << FUTEX_HASHBITS) - 1)])
#endif

Thank you

> Thanks,
> Kiran
>
> Note: This patch needs to have kvaddr_to_nid() reintroduced.  This was
> taken out in git commit 9f3fd602aef96c2a490e3bfd669d06475aeba8d8
>
> Index: linux-2.6.18-rc3/kernel/futex.c
> ===================================================================
> --- linux-2.6.18-rc3.orig/kernel/futex.c	2006-08-02 12:11:34.000000000 -0700
> +++ linux-2.6.18-rc3/kernel/futex.c	2006-08-02 16:48:47.000000000 -0700
> @@ -137,20 +137,35 @@ struct futex_hash_bucket {
>         struct list_head       chain;
>  };
>
> -static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];
> +static struct futex_hash_bucket *futex_queues[MAX_NUMNODES] __read_mostly;
>
>  /* Futex-fs vfsmount entry: */
>  static struct vfsmount *futex_mnt;
>
>  /*
>   * We hash on the keys returned from get_futex_key (see below).
> + * With NUMA aware futex hashing, we have per-node hash tables.
> + * We determine the home node of a futex based on the KVA -- if the futex
> + * is a private futex.  For shared futexes, we use  jhash2 itself on the
> + * futex_key to arrive at a home node.
>   */
>  static struct futex_hash_bucket *hash_futex(union futex_key *key)
>  {
> +	int nodeid;
>  	u32 hash = jhash2((u32*)&key->both.word,
>  			  (sizeof(key->both.word)+sizeof(key->both.ptr))/4,
>  			  key->both.offset);
> -	return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)];
> +	if (key->both.offset & 0x1) {
> +		/*
> +		 * Shared futex: Use any of the 'possible' nodes as home node.
> +		 */
> +		nodeid = hash & (MAX_NUMNODES -1);
> +		BUG_ON(!node_possible(nodeid));
> +	} else
> +		/* Private futex */
> +		nodeid = kvaddr_to_nid(key->both.ptr);
> +
> +	return &futex_queues[nodeid][hash & ((1 << FUTEX_HASHBITS)-1)];
>  }
>
>  /*
> @@ -1909,13 +1924,25 @@ static int __init init(void)
>  {
>  	unsigned int i;
>
> +	int nid;
> +
> +	for_each_node(nid)
> +	{
> +		futex_queues[nid] = kmalloc_node(
> +					(sizeof(struct futex_hash_bucket) *
> +					(1 << FUTEX_HASHBITS)),
> +					GFP_KERNEL, nid);
> +		if (!futex_queues[nid])
> +			panic("futex_init: Allocation of multi-node futex_queues failed");
> +		for (i = 0; i < (1 << FUTEX_HASHBITS); i++) {
> +			INIT_LIST_HEAD(&futex_queues[nid][i].chain);
> +			spin_lock_init(&futex_queues[nid][i].lock);
> +		}
> +	}
> +
>  	register_filesystem(&futex_fs_type);
>  	futex_mnt = kern_mount(&futex_fs_type);
>
> -	for (i = 0; i < ARRAY_SIZE(futex_queues); i++) {
> -		INIT_LIST_HEAD(&futex_queues[i].chain);
> -		spin_lock_init(&futex_queues[i].lock);
> -	}
>  	return 0;
>  }
>  __initcall(init);
>
