All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch V2 0/7] Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Linus Torvalds <torvalds@linux-foundation.org>, Darren Hart <darren@dvhart.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>, Michael Kerrisk <mtk.manpages@googlemail.com>, Davidlohr Bueso <dave@stgolabs.net>, Chris Mason <clm@fb.com>, Carlos O'Donell <carlos@redhat.com>, Torvald Riegel <triegel@redhat.com>, Eric Dumazet <edumazet@google.com>
@ 2016-05-05 19:03 Thomas Gleixner
  2016-05-05 19:03 ` [patch V2 1/7] futex: Add some more function commentry Thomas Gleixner
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Thomas Gleixner @ 2016-05-05 19:03 UTC (permalink / raw)
  To: LKML

The standard futex mechanism in the Linux kernel uses a global hash to store
transient state. Collisions on that hash can lead to performance degradation
and on real-time enabled kernels to unbound priority inversions.

This new attempt to solve the issue does not require user space changes and
operates transparently. On the first futex operation of a process the kernel
allocates a hash private to the process. All process private futexes are
hashed in this hash. Process shared futexes still use the global hash.

For RT applications and pathological use cases a new futex op is provided
which allows the application to preallocate and thereby size the process
private hash.

The last two patches add support to the perf futex-hash benchmark so test can
be run on nodes and the preallocation sizing can be tested.

The last patch contains a first update for the futex man page.

The difference vs. V1 of this series is that it uses hash_long() now that the
hash_64 implementation has been fixed in mainline. The performance numbers are
more or less the same as with V1 [1].

Thanks,

	tglx

[1] http://lkml.kernel.org/r/20160428161742.363543816@linutronix.de

----
 Documentation/sysctl/kernel.txt |   17 +++
 b/include/linux/futex_types.h   |   12 ++
 include/linux/futex.h           |   39 +++++--
 include/linux/mm_types.h        |    4 
 include/uapi/linux/futex.h      |    1 
 init/Kconfig                    |    4 
 kernel/fork.c                   |    3 
 kernel/futex.c                  |  215 +++++++++++++++++++++++++++++++++++++++-
 kernel/sysctl.c                 |   21 +++
 tools/perf/bench/Build          |    4 
 tools/perf/bench/futex-hash.c   |  101 ++++++++++++++++--
 tools/perf/bench/futex.h        |    5 
 12 files changed, 403 insertions(+), 23 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch V2 1/7] futex: Add some more function commentry
  2016-05-05 19:03 [patch V2 0/7] Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Linus Torvalds <torvalds@linux-foundation.org>, Darren Hart <darren@dvhart.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>, Michael Kerrisk <mtk.manpages@googlemail.com>, Davidlohr Bueso <dave@stgolabs.net>, Chris Mason <clm@fb.com>, Carlos O'Donell <carlos@redhat.com>, Torvald Riegel <triegel@redhat.com>, Eric Dumazet <edumazet@google.com> Thomas Gleixner
@ 2016-05-05 19:03 ` Thomas Gleixner
  2016-05-05 19:03 ` [patch V2 2/7] futex: Hash private futexes per process Thomas Gleixner
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2016-05-05 19:03 UTC (permalink / raw)
  To: LKML

[-- Attachment #1: futex-Add-some-more-function-commentry.patch --]
[-- Type: text/plain, Size: 1126 bytes --]

Add some more comments and reformat existing ones to kernel doc style.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -373,8 +373,12 @@ static inline int hb_waiters_pending(str
 #endif
 }
 
-/*
- * We hash on the keys returned from get_futex_key (see below).
+/**
+ * hash_futex - Return the hash bucket in the global hash
+ * @key:	Pointer to the futex key for which the hash is calculated
+ *
+ * We hash on the keys returned from get_futex_key (see below) and return the
+ * corresponding hash bucket in the global hash.
  */
 static struct futex_hash_bucket *hash_futex(union futex_key *key)
 {
@@ -384,7 +388,12 @@ static struct futex_hash_bucket *hash_fu
 	return &futex_queues[hash & (futex_hashsize - 1)];
 }
 
-/*
+
+/**
+ * match_futex - Check whether to futex keys are equal
+ * @key1:	Pointer to key1
+ * @key2:	Pointer to key2
+ *
  * Return 1 if two futex_keys are equal, 0 otherwise.
  */
 static inline int match_futex(union futex_key *key1, union futex_key *key2)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch V2 2/7] futex: Hash private futexes per process
  2016-05-05 19:03 [patch V2 0/7] Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Linus Torvalds <torvalds@linux-foundation.org>, Darren Hart <darren@dvhart.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>, Michael Kerrisk <mtk.manpages@googlemail.com>, Davidlohr Bueso <dave@stgolabs.net>, Chris Mason <clm@fb.com>, Carlos O'Donell <carlos@redhat.com>, Torvald Riegel <triegel@redhat.com>, Eric Dumazet <edumazet@google.com> Thomas Gleixner
  2016-05-05 19:03 ` [patch V2 1/7] futex: Add some more function commentry Thomas Gleixner
@ 2016-05-05 19:03 ` Thomas Gleixner
  2016-05-05 19:03 ` [patch V2 3/7] futex: Add op for hash preallocation Thomas Gleixner
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2016-05-05 19:03 UTC (permalink / raw)
  To: LKML; +Cc: Sebastian Siewior

[-- Attachment #1: futex-Hash-private-futexes-per-process.patch --]
[-- Type: text/plain, Size: 13765 bytes --]

From: Sebastian Siewior <bigeasy@linutronix.de>

The standard futex mechanism in the Linux kernel uses a global hash to store
transient state. Collisions on that hash can lead to performance degradation
especially on NUMA systems and on real-time enabled kernels even to priority
inversions.

To mitigate that problem we provide per process private hashing. On the first
futex operation in a process the kernel allocates a hash table. The hash table
is accessible via the process mm_struct. On Numa systems the hash is allocated
node local.

If the allocation fails then the global hash table is used as fallback, so
there is no user space visible impact of this feature.

The hash size is a default value which can be tweaked by the sys admin. The
sysctl interface is implemented in a follow up patch to make the review
simpler. For applications which have special requirements for the private hash
and to allow preallocation of the hash for RT applications, we'll provide a
futex OP in a follow up patch.

Performance data acquired on a 4 socket (node) Intel machine with perf bench
futex-hash:

Threads  G 65536  P 4	  P 8      P 16       P 32     P 64     P 128    P 256

1        8175006  8645465  8617469  8628686   8625223  8664491  8590934  8631582
2	 8149869  8618385  8578185  8622267   8603253  8618787  8595073  8590591
4	 7479482  5867840  7882991  7604838   7894380  7882850  7884911  7886278
8	 7308822  2378057  5731051  5550479   7691198  7672814  7711939  7681549
16	 7295893   677414  2670682  3453552   7158906  7688978  7677603  7690290

So with the proper hash size of the private hash is ~5% faster than the global
hash.

With a full perf bench futex-hash run with one process (36 threads) per node
and 1024 futexes per thread the following results are achieved:

G 65536	 P 4     P 8     P 16     P 32     P 64     P 128    P 256    P 512    P 1024  P 2048     
2673390  368952  682626  1223908  1845922  3003524  3538313  4118533  4286925  4289589 4274020

Ratio:   0,14    0,26    0,46     0,69	   1,12     1,32     1,54     1,60     1,60    1,60

So with a private hash size of 256 buckets and above the performance is almost
steady in this pathological test case and factor 1.6 better than the global
hash. Even a 64 buckets hash is already 10% faster,

Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/futex.h       |   38 ++++++++--
 include/linux/futex_types.h |   12 +++
 include/linux/mm_types.h    |    4 +
 init/Kconfig                |    4 +
 kernel/fork.c               |    3 
 kernel/futex.c              |  162 +++++++++++++++++++++++++++++++++++++++++++-
 6 files changed, 212 insertions(+), 11 deletions(-)
 create mode 100644 include/linux/futex_types.h

--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -1,6 +1,7 @@
 #ifndef _LINUX_FUTEX_H
 #define _LINUX_FUTEX_H
 
+#include <linux/futex_types.h>
 #include <uapi/linux/futex.h>
 
 struct inode;
@@ -21,16 +22,19 @@ handle_futex_death(u32 __user *uaddr, st
  *
  * offset is aligned to a multiple of sizeof(u32) (== 4) by definition.
  * We use the two low order bits of offset to tell what is the kind of key :
- *  00 : Private process futex (PTHREAD_PROCESS_PRIVATE)
- *       (no reference on an inode or mm)
+ *  00 : Private process futex (PTHREAD_PROCESS_PRIVATE) using process private
+ *	 hash (no reference on an inode or mm)
  *  01 : Shared futex (PTHREAD_PROCESS_SHARED)
  *	mapped on a file (reference on the underlying inode)
  *  10 : Shared futex (PTHREAD_PROCESS_SHARED)
  *       (but private mapping on an mm, and reference taken on it)
+ *  11 : Private process futex (PTHREAD_PROCESS_PRIVATE) using global hash
+ *	 (no reference on an inode or mm)
 */
 
-#define FUT_OFF_INODE    1 /* We set bit 0 if key has a reference on inode */
-#define FUT_OFF_MMSHARED 2 /* We set bit 1 if key has a reference on mm */
+#define FUT_OFF_INODE		0x01 /* Key has a reference on inode */
+#define FUT_OFF_MMSHARED	0x02 /* Key has a reference on mm */
+#define FUT_OFF_PRIVATE		0x03 /* Key has no ref on inode/mm */
 
 union futex_key {
 	struct {
@@ -60,12 +64,30 @@ extern void exit_pi_state_list(struct ta
 #else
 extern int futex_cmpxchg_enabled;
 #endif
+
 #else
-static inline void exit_robust_list(struct task_struct *curr)
-{
-}
-static inline void exit_pi_state_list(struct task_struct *curr)
+static inline void exit_robust_list(struct task_struct *curr) { }
+static inline void exit_pi_state_list(struct task_struct *curr) { }
+#endif
+
+#ifdef CONFIG_FUTEX_PRIVATE_HASH
+/* Process private hash data for futexes */
+
+extern unsigned int futex_default_hash_bits;
+extern unsigned int futex_max_hash_bits;
+
+extern void futex_mm_hash_exit(struct mm_struct *mm);
+
+static inline void futex_mm_hash_init(struct mm_struct *mm)
 {
+	raw_spin_lock_init(&mm->futex_hash.lock);
+	mm->futex_hash.hash = NULL;
 }
+
+#else
+
+static inline void futex_mm_hash_init(struct mm_struct *mm) { }
+static inline void futex_mm_hash_exit(struct mm_struct *mm) { }
 #endif
+
 #endif
--- /dev/null
+++ b/include/linux/futex_types.h
@@ -0,0 +1,12 @@
+#ifndef _LINUX_FUTEX_TYPES_H
+#define _LINUX_FUTEX_TYPES_H
+
+struct futex_hash_bucket;
+
+struct futex_hash {
+	struct raw_spinlock		lock;
+	unsigned int			hash_bits;
+	struct futex_hash_bucket	*hash;
+};
+
+#endif
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -11,6 +11,7 @@
 #include <linux/completion.h>
 #include <linux/cpumask.h>
 #include <linux/uprobes.h>
+#include <linux/futex_types.h>
 #include <linux/page-flags-layout.h>
 #include <asm/page.h>
 #include <asm/mmu.h>
@@ -442,6 +443,9 @@ struct mm_struct {
 
 	struct linux_binfmt *binfmt;
 
+#ifdef CONFIG_FUTEX_PRIVATE_HASH
+	struct futex_hash futex_hash;
+#endif
 	cpumask_var_t cpu_vm_mask_var;
 
 	/* Architecture-specific MM context */
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1498,6 +1498,10 @@ config FUTEX
 	  support for "fast userspace mutexes".  The resulting kernel may not
 	  run glibc-based applications correctly.
 
+config FUTEX_PRIVATE_HASH
+	bool
+	default FUTEX && SMP
+
 config HAVE_FUTEX_CMPXCHG
 	bool
 	depends on FUTEX
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -617,6 +617,8 @@ static struct mm_struct *mm_init(struct
 	mm_init_owner(mm, p);
 	mmu_notifier_mm_init(mm);
 	clear_tlb_flush_pending(mm);
+	futex_mm_hash_init(mm);
+
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	mm->pmd_huge_pte = NULL;
 #endif
@@ -713,6 +715,7 @@ void mmput(struct mm_struct *mm)
 		khugepaged_exit(mm); /* must run before exit_mmap */
 		exit_mmap(mm);
 		set_mm_exe_file(mm, NULL);
+		futex_mm_hash_exit(mm);
 		if (!list_empty(&mm->mmlist)) {
 			spin_lock(&mmlist_lock);
 			list_del(&mm->mmlist);
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -23,6 +23,9 @@
  *  Copyright (C) IBM Corporation, 2009
  *  Thanks to Thomas Gleixner for conceptual design and careful reviews.
  *
+ *  Private hashed futex support by Sebastian Siewior and Thomas Gleixner
+ *  Copyright (C) Linutronix GmbH, 2016
+ *
  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
  *  enough at me, Linus for the original (flawed) idea, Matthew
  *  Kirkwood for proof-of-concept implementation.
@@ -49,6 +52,7 @@
 #include <linux/fs.h>
 #include <linux/file.h>
 #include <linux/jhash.h>
+#include <linux/hash.h>
 #include <linux/init.h>
 #include <linux/futex.h>
 #include <linux/mount.h>
@@ -169,6 +173,34 @@
  * the code that actually moves the futex(es) between hash buckets (requeue_futex)
  * will do the additional required waiter count housekeeping. This is done for
  * double_lock_hb() and double_unlock_hb(), respectively.
+ *
+ * For private futexes we (pre)allocate a per process hash. We check lockless
+ * whether the hash is already allocated. To access the hash later we need
+ * information about the hash properties as well. This requires barriers as
+ * follows:
+ *
+ * CPU 0					CPU 1
+ * check_hash_allocation()
+ *	if (mm->futex_hash.hash)
+ *		return;
+ *	hash = alloc_hash()
+ *	lock(&mm->futex_hash.lock);
+ *	if (!mm->futex_hash.hash) {
+ *	  mm->futex_hash.par = params;
+ *
+ *	  smp_wmb(); (A0) <-paired with-|
+ *					|
+ *	  mm->futex_hash.hash = hash;	|
+ *					|	check_hash_allocation()
+ *					|	   if (mm->futex_hash.hash)
+ *					|		return;
+ *	  unlock(&mm->futex_hash.lock);	|	get_futex_key_refs()
+ *					|
+ *					|--------- smp_mb() (B)
+ *						s = hash(f, mm->futex_hash.par);
+ *						hb = &mm->futex_hash.hash[s];
+ *
+ * So we utilize the existing smp_mb() in get_futex_key_refs().
  */
 
 #ifndef CONFIG_HAVE_FUTEX_CMPXCHG
@@ -255,6 +287,22 @@ struct futex_hash_bucket {
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;
 
+#ifdef CONFIG_FUTEX_PRIVATE_HASH
+/*
+ * Process private hash for non-shared futexes
+ */
+#define FUTEX_USE_GLOBAL_HASH		((void *) 0x03)
+
+#define FUTEX_MIN_HASH_BITS		order_base_2(4UL)
+#define FUTEX_DEF_HASH_BITS		order_base_2(8UL)
+#define FUTEX_MAX_HASH_BITS		order_base_2(256UL)
+
+unsigned int futex_default_hash_bits	= FUTEX_DEF_HASH_BITS;
+unsigned int futex_max_hash_bits	= FUTEX_MAX_HASH_BITS;
+#else
+static const unsigned int futex_default_hash_bits = 0;
+#endif
+
 /*
  * The base of the bucket array and its size are always used together
  * (after initialization only in hash_futex()), so ensure that they
@@ -374,13 +422,13 @@ static inline int hb_waiters_pending(str
 }
 
 /**
- * hash_futex - Return the hash bucket in the global hash
+ * hash_global_futex - Return the hash bucket in the global hash
  * @key:	Pointer to the futex key for which the hash is calculated
  *
  * We hash on the keys returned from get_futex_key (see below) and return the
  * corresponding hash bucket in the global hash.
  */
-static struct futex_hash_bucket *hash_futex(union futex_key *key)
+static struct futex_hash_bucket *hash_global_futex(union futex_key *key)
 {
 	u32 hash = jhash2((u32*)&key->both.word,
 			  (sizeof(key->both.word)+sizeof(key->both.ptr))/4,
@@ -388,9 +436,33 @@ static struct futex_hash_bucket *hash_fu
 	return &futex_queues[hash & (futex_hashsize - 1)];
 }
 
+/**
+ * hash_futex - Get the hash bucket for a futex
+ *
+ * Returns either the process private or the global hash bucket which fits the
+ * key.
+ */
+static struct futex_hash_bucket *hash_futex(union futex_key *key)
+{
+#ifdef CONFIG_FUTEX_PRIVATE_HASH
+	struct mm_struct *mm = current->mm;
+	unsigned int slot;
+
+	/*
+	 * Futexes which use the per process hash have the lower bits cleared
+	 */
+	if (key->both.offset & (FUT_OFF_INODE | FUT_OFF_MMSHARED))
+		return hash_global_futex(key);
+
+	slot = hash_long(key->private.address, mm->futex_hash.hash_bits);
+	return &mm->futex_hash.hash[slot];
+#else
+	return hash_global_futex(key);
+#endif
+}
 
 /**
- * match_futex - Check whether to futex keys are equal
+ * match_futex - Check whether two futex keys are equal
  * @key1:	Pointer to key1
  * @key2:	Pointer to key2
  *
@@ -505,7 +577,20 @@ get_futex_key(u32 __user *uaddr, int fsh
 	 */
 	if (!fshared) {
 		key->private.mm = mm;
+		/*
+		 * If we have a process private hash, then we store uaddr
+		 * instead of the page base address.
+		 */
+#ifdef CONFIG_FUTEX_PRIVATE_HASH
+		if (mm->futex_hash.hash != FUTEX_USE_GLOBAL_HASH) {
+			key->private.address = (unsigned long) uaddr;
+		} else {
+			key->private.address = address;
+			key->both.offset |= FUT_OFF_PRIVATE;
+		}
+#else
 		key->private.address = address;
+#endif
 		get_futex_key_refs(key);  /* implies smp_mb(); (B) */
 		return 0;
 	}
@@ -3153,6 +3238,75 @@ void exit_robust_list(struct task_struct
 				   curr, pip);
 }
 
+#ifdef CONFIG_FUTEX_PRIVATE_HASH
+
+void futex_mm_hash_exit(struct mm_struct *mm)
+{
+	if (mm->futex_hash.hash && mm->futex_hash.hash != FUTEX_USE_GLOBAL_HASH)
+		kfree(mm->futex_hash.hash);
+	mm->futex_hash.hash = NULL;
+}
+
+static struct futex_hash_bucket *futex_alloc_hash(unsigned int hash_bits)
+{
+	struct futex_hash_bucket *hb;
+	size_t hash_size, size;
+	int i;
+
+	hash_size = 1 << hash_bits;
+	size = hash_size * sizeof(struct futex_hash_bucket);
+	hb = kzalloc_node(size, GFP_KERNEL, numa_node_id());
+	if (!hb)
+		return NULL;
+
+	for (i = 0; i < hash_size; i++) {
+		atomic_set(&hb[i].waiters, 0);
+		plist_head_init(&hb[i].chain);
+		spin_lock_init(&hb[i].lock);
+	}
+	return hb;
+}
+
+static void futex_populate_hash(unsigned int hash_bits)
+{
+	struct mm_struct *mm = current->mm;
+	struct futex_hash_bucket *hb = NULL;
+
+	/*
+	 * We don't need an explicit smp_mb() when the hash is populated
+	 * because before we dereference mm->futex_hash.hash_bits in the hash
+	 * function we have an smp_mb() in futex_get_key_refs() already.
+	 */
+	if (mm->futex_hash.hash)
+		return;
+
+	/*
+	 * If we failed to allocate a hash on the fly, fall back to the global
+	 * hash.
+	 */
+	hb = futex_alloc_hash(hash_bits);
+	if (!hb)
+		hb = FUTEX_USE_GLOBAL_HASH;
+
+	raw_spin_lock(&mm->futex_hash.lock);
+	/* We might have raced with another task allocating the hash. */
+	if (!mm->futex_hash.hash) {
+		mm->futex_hash.hash_bits = hash_bits;
+		/*
+		 * Ensure that the above is visible before we store
+		 * the pointer.
+		 */
+		smp_wmb(); /* (A0) Pairs with (B) */
+		mm->futex_hash.hash = hb;
+		hb = NULL;
+	}
+	raw_spin_unlock(&mm->futex_hash.lock);
+	kfree(hb);
+}
+#else /* CONFIG_FUTEX_PRIVATE_HASH */
+static inline void futex_populate_hash(unsigned int hash_bits) { }
+#endif
+
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		u32 __user *uaddr2, u32 val2, u32 val3)
 {
@@ -3161,6 +3315,8 @@ long do_futex(u32 __user *uaddr, int op,
 
 	if (!(op & FUTEX_PRIVATE_FLAG))
 		flags |= FLAGS_SHARED;
+	else
+		futex_populate_hash(futex_default_hash_bits);
 
 	if (op & FUTEX_CLOCK_REALTIME) {
 		flags |= FLAGS_CLOCKRT;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch V2 3/7] futex: Add op for hash preallocation
  2016-05-05 19:03 [patch V2 0/7] Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Linus Torvalds <torvalds@linux-foundation.org>, Darren Hart <darren@dvhart.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>, Michael Kerrisk <mtk.manpages@googlemail.com>, Davidlohr Bueso <dave@stgolabs.net>, Chris Mason <clm@fb.com>, Carlos O'Donell <carlos@redhat.com>, Torvald Riegel <triegel@redhat.com>, Eric Dumazet <edumazet@google.com> Thomas Gleixner
  2016-05-05 19:03 ` [patch V2 1/7] futex: Add some more function commentry Thomas Gleixner
  2016-05-05 19:03 ` [patch V2 2/7] futex: Hash private futexes per process Thomas Gleixner
@ 2016-05-05 19:03 ` Thomas Gleixner
  2016-05-05 19:03 ` [patch V2 4/7] futex: Add sysctl knobs for process private hash Thomas Gleixner
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2016-05-05 19:03 UTC (permalink / raw)
  To: LKML; +Cc: Sebastian Siewior

[-- Attachment #1: futex-Add-op-for-hash-preallocation.patch --]
[-- Type: text/plain, Size: 3659 bytes --]

From: Sebastian Siewior <bigeasy@linutronix.de>

The per process hash is allocated on the fly at the first futex operation of a
process. The size of the hash is determined by a system wide default setting
controlled by the sys admin, This is suboptimal for RT applications and
applications with pathological futex abuse,

 - For RT applications its important to allocate the per process hash before the
   first futex operation to avoid the allocation on the first futex operation.

 - For pathological applications which use gazillions of futexes its useful to
   allocate a hash greater than the default hash size.

Add a futex op which allows to preallocate the hash with the requested
size. The size is limited by the systemwide maximum hash size, which can be
set by the admin. The requested size is rounded up to the next order of 2.

The function can be called several times, but ony the first call results in a
hash allocation of the requested size as there is no non-intrusive way to
reallocate/rehash in a multithreaded application.

Note, that this call must be issued before the first futex operation in the
process because that would automatically allocate the default sized hash.

The function returns the actual hash size or 0 if the global hash is used. The
latter is the case on UP and in the rare case that the allocation failed and
the global hash is used as a fallback.

Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/uapi/linux/futex.h |    1 +
 kernel/futex.c             |   41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -20,6 +20,7 @@
 #define FUTEX_WAKE_BITSET	10
 #define FUTEX_WAIT_REQUEUE_PI	11
 #define FUTEX_CMP_REQUEUE_PI	12
+#define FUTEX_PREALLOC_HASH	13
 
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -3307,6 +3307,45 @@ static void futex_populate_hash(unsigned
 static inline void futex_populate_hash(unsigned int hash_bits) { }
 #endif
 
+/**
+ * futex_preallocate_hash - Preallocate the process private hash
+ * @slots:	Number of slots to allocate
+ *
+ * The function will allocate the process private hash with the number of
+ * requested slots. The number is rounded to the next power of two and may not
+ * exceed the current system limit.
+ *
+ * If the hash was already allocated by either an earlier call to
+ * futex_preallocate_hash() or an earlier futex op which allocated the cache
+ * on the fly, we return the size of the active hash.
+ *
+ * Returns::	Size of the hash, if 0 then the global hash is used.
+ */
+static int futex_preallocate_hash(unsigned int slots)
+{
+#ifdef CONFIG_FUTEX_PRIVATE_HASH
+	struct mm_struct *mm = current->mm;
+	struct futex_hash_bucket *hb;
+	unsigned int bits;
+
+	/* Try to allocate the requested nr of slots */
+	bits = order_base_2(slots);
+
+	if (bits < FUTEX_MIN_HASH_BITS)
+		bits = FUTEX_MIN_HASH_BITS;
+
+	if (bits > futex_max_hash_bits)
+		bits = futex_max_hash_bits;
+
+	futex_populate_hash(bits);
+
+	hb = mm->futex_hash.hash;
+	return hb == FUTEX_USE_GLOBAL_HASH ? 0 : 1 << mm->futex_hash.hash_bits;
+#else
+	return 0;
+#endif
+}
+
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		u32 __user *uaddr2, u32 val2, u32 val3)
 {
@@ -3362,6 +3401,8 @@ long do_futex(u32 __user *uaddr, int op,
 					     uaddr2);
 	case FUTEX_CMP_REQUEUE_PI:
 		return futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
+	case FUTEX_PREALLOC_HASH:
+		return futex_preallocate_hash(val);
 	}
 	return -ENOSYS;
 }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch V2 4/7] futex: Add sysctl knobs for process private hash
  2016-05-05 19:03 [patch V2 0/7] Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Linus Torvalds <torvalds@linux-foundation.org>, Darren Hart <darren@dvhart.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>, Michael Kerrisk <mtk.manpages@googlemail.com>, Davidlohr Bueso <dave@stgolabs.net>, Chris Mason <clm@fb.com>, Carlos O'Donell <carlos@redhat.com>, Torvald Riegel <triegel@redhat.com>, Eric Dumazet <edumazet@google.com> Thomas Gleixner
                   ` (2 preceding siblings ...)
  2016-05-05 19:03 ` [patch V2 3/7] futex: Add op for hash preallocation Thomas Gleixner
@ 2016-05-05 19:03 ` Thomas Gleixner
  2016-05-05 19:03 ` [patch V2 6/7] perf/bench/futex-hash: Support preallocate hash table Thomas Gleixner
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2016-05-05 19:03 UTC (permalink / raw)
  To: LKML; +Cc: Sebastian Siewior

[-- Attachment #1: add-sysctl-interface.patch --]
[-- Type: text/plain, Size: 3677 bytes --]

From: Sebastian Siewior <bigeasy@linutronix.de>

To adjust the default hash size and the maximum hash size for process private
futexes we add the following sysctls:

futex_private_default_hash_bits:

     Adjusts the default hash size (in bits) which is used for automatic hash
     allocations on the first futex operation

futex_private_max_hash_bits:

     Adjusts the maximum hash size (in bits). This limits the hash size which
     can be preallocated by applications with the FUTEX_PREALLOC_HASH op.

Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/sysctl/kernel.txt |   17 +++++++++++++++++
 include/linux/futex.h           |    1 +
 kernel/futex.c                  |    5 +++--
 kernel/sysctl.c                 |   21 +++++++++++++++++++++
 4 files changed, 42 insertions(+), 2 deletions(-)

--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -29,6 +29,8 @@ Currently, these files might (depending
 - core_pipe_limit
 - core_uses_pid
 - ctrl-alt-del
+- futex_private_default_hash_bits
+- futex_private_max_hash_bits
 - dmesg_restrict
 - domainname
 - hostname
@@ -265,6 +267,21 @@ to decide what to do with it.
 
 ==============================================================
 
+futex_private_default_hash_bits:
+
+Adjusts the default hash size (in bits) which is used for
+automatic hash allocations on the first futex operation
+
+==============================================================
+
+futex_private_max_hash_bits:
+
+Adjusts the maximum hash size (in bits). This limits the hash
+size which can be preallocated by applications with the
+FUTEX_PREALLOC_HASH op.
+
+==============================================================
+
 dmesg_restrict:
 
 This toggle indicates whether unprivileged users are prevented
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -75,6 +75,7 @@ static inline void exit_pi_state_list(st
 
 extern unsigned int futex_default_hash_bits;
 extern unsigned int futex_max_hash_bits;
+extern unsigned int futex_sysmax_hash_bits;
 
 extern void futex_mm_hash_exit(struct mm_struct *mm);
 
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -297,8 +297,9 @@ struct futex_hash_bucket {
 #define FUTEX_DEF_HASH_BITS		order_base_2(8UL)
 #define FUTEX_MAX_HASH_BITS		order_base_2(256UL)
 
-unsigned int futex_default_hash_bits	= FUTEX_DEF_HASH_BITS;
-unsigned int futex_max_hash_bits	= FUTEX_MAX_HASH_BITS;
+unsigned int futex_default_hash_bits			= FUTEX_DEF_HASH_BITS;
+unsigned int futex_max_hash_bits			= FUTEX_MAX_HASH_BITS;
+unsigned int  __read_mostly futex_sysmax_hash_bits	= FUTEX_MAX_HASH_BITS;
 #else
 static const unsigned int futex_default_hash_bits = 0;
 #endif
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -65,6 +65,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/kexec.h>
 #include <linux/bpf.h>
+#include <linux/futex.h>
 
 #include <asm/uaccess.h>
 #include <asm/processor.h>
@@ -593,6 +594,26 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+#ifdef CONFIG_FUTEX_PRIVATE_HASH
+	{
+		.procname	= "futex_private_default_hash_bits",
+		.data		= &futex_default_hash_bits,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &two,
+		.extra2		= &futex_max_hash_bits,
+	},
+	{
+		.procname	= "futex_private_max_hash_bits",
+		.data		= &futex_max_hash_bits,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &futex_default_hash_bits,
+		.extra2		= &futex_sysmax_hash_bits,
+	},
+#endif
 #ifdef CONFIG_FUNCTION_TRACER
 	{
 		.procname	= "ftrace_enabled",

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch V2 6/7] perf/bench/futex-hash: Support preallocate hash table
  2016-05-05 19:03 [patch V2 0/7] Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Linus Torvalds <torvalds@linux-foundation.org>, Darren Hart <darren@dvhart.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>, Michael Kerrisk <mtk.manpages@googlemail.com>, Davidlohr Bueso <dave@stgolabs.net>, Chris Mason <clm@fb.com>, Carlos O'Donell <carlos@redhat.com>, Torvald Riegel <triegel@redhat.com>, Eric Dumazet <edumazet@google.com> Thomas Gleixner
                   ` (3 preceding siblings ...)
  2016-05-05 19:03 ` [patch V2 4/7] futex: Add sysctl knobs for process private hash Thomas Gleixner
@ 2016-05-05 19:03 ` Thomas Gleixner
  2016-05-05 19:03 ` [patch V2 5/7] perf/bench/futex-hash: Support NUMA Thomas Gleixner
  2016-05-05 19:03 ` [patch V2 7/7] futex.2: Document hash preallocation opcode Thomas Gleixner
  6 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2016-05-05 19:03 UTC (permalink / raw)
  To: LKML; +Cc: Sebastian Andrzej Siewior

[-- Attachment #1: perf-bench-futex-hash-Support-preallocate-hash-table.patch --]
[-- Type: text/plain, Size: 2578 bytes --]

Instead of using the default hash size on the first allocation it is
possible to allocate a specific number of slots upfront.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 tools/perf/bench/futex-hash.c |   16 ++++++++++++++--
 tools/perf/bench/futex.h      |    5 +++++
 2 files changed, 19 insertions(+), 2 deletions(-)

--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -30,6 +30,7 @@ static unsigned int nsecs    = 10;
 static unsigned int nfutexes = 1024;
 static bool fshared = false, done = false, silent = false;
 static int futex_flag = 0;
+static unsigned int prealloc;
 static int numa_node = -1;
 
 struct timeval start, end, runtime;
@@ -51,6 +52,7 @@ static const struct option options[] = {
 	OPT_UINTEGER('f', "futexes", &nfutexes, "Specify amount of futexes per threads"),
 	OPT_BOOLEAN( 's', "silent",  &silent,   "Silent mode: do not display data/details"),
 	OPT_BOOLEAN( 'S', "shared",  &fshared,  "Use shared futexes instead of private ones"),
+	OPT_UINTEGER('p', "prealloc",&prealloc, "Specify number of preallocated hash slots"),
 #ifdef CONFIG_NUMA
 	OPT_INTEGER( 'n', "numa",   &numa_node,  "Specify the NUMA node"),
 #endif
@@ -138,6 +140,7 @@ int bench_futex_hash(int argc, const cha
 	unsigned int i, ncpus;
 	pthread_attr_t thread_attr;
 	struct worker *worker = NULL;
+	char *prealloc_str = NULL;
 	char *node_str = NULL;
 	unsigned int cpunum;
 
@@ -192,11 +195,20 @@ int bench_futex_hash(int argc, const cha
 	if (!fshared)
 		futex_flag = FUTEX_PRIVATE_FLAG;
 
-	printf("Run summary [PID %d]: %d threads%s, each operating on %d [%s] futexes for %d secs.\n\n",
+	if (prealloc) {
+		ret = futex_preallocate(prealloc);
+		if (ret < 0)
+			err(EXIT_FAILURE, "futex_prealloate");
+		ret = asprintf(&prealloc_str, " P %u %d", prealloc, ret);
+		if (ret < 0)
+			err(EXIT_FAILURE, "futex_preallocate, asprintf");
+	}
+
+	printf("Run summary [PID %d]: %d threads%s, each operating on %d [%s%s] futexes for %d secs.\n\n",
 	       getpid(), nthreads,
 	       node_str ? : "",
 	       nfutexes, fshared ? "shared":"private",
-	       nsecs);
+	       prealloc_str ? : "", nsecs);
 
 	init_stats(&throughput_stats);
 	pthread_mutex_init(&thread_lock, NULL);
--- a/tools/perf/bench/futex.h
+++ b/tools/perf/bench/futex.h
@@ -101,4 +101,9 @@ static inline int pthread_attr_setaffini
 }
 #endif
 
+static inline int futex_preallocate(u_int32_t hash_size)
+{
+	return futex(0, FUTEX_PREALLOC_HASH, hash_size, NULL, NULL, 0, 0);
+}
+
 #endif /* _FUTEX_H */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch V2 5/7] perf/bench/futex-hash: Support NUMA
  2016-05-05 19:03 [patch V2 0/7] Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Linus Torvalds <torvalds@linux-foundation.org>, Darren Hart <darren@dvhart.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>, Michael Kerrisk <mtk.manpages@googlemail.com>, Davidlohr Bueso <dave@stgolabs.net>, Chris Mason <clm@fb.com>, Carlos O'Donell <carlos@redhat.com>, Torvald Riegel <triegel@redhat.com>, Eric Dumazet <edumazet@google.com> Thomas Gleixner
                   ` (4 preceding siblings ...)
  2016-05-05 19:03 ` [patch V2 6/7] perf/bench/futex-hash: Support preallocate hash table Thomas Gleixner
@ 2016-05-05 19:03 ` Thomas Gleixner
  2016-05-05 19:03 ` [patch V2 7/7] futex.2: Document hash preallocation opcode Thomas Gleixner
  6 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2016-05-05 19:03 UTC (permalink / raw)
  To: LKML; +Cc: Sebastian Andrzej Siewior

[-- Attachment #1: perf-bench-futex-hash-Support-NUMA.patch --]
[-- Type: text/plain, Size: 5893 bytes --]

This adds a new option to tell perf on which numa node the hash benchmark
should run. If set then 

 - The test is bound to the node
 - Memory is allocated on the local NUMA node
 - The threads are bound to the cpus on the node

The NUMA node can be specified by the -n argument.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 tools/perf/bench/Build        |  4 ++
 tools/perf/bench/futex-hash.c | 89 +++++++++++++++++++++++++++++++++++++------
 2 files changed, 82 insertions(+), 11 deletions(-)

diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
index 60bf119..9e6e518 100644
--- a/tools/perf/bench/Build
+++ b/tools/perf/bench/Build
@@ -1,3 +1,7 @@
+ifdef CONFIG_NUMA
+CFLAGS_futex-hash.o   += -DCONFIG_NUMA=1
+endif
+
 perf-y += sched-messaging.o
 perf-y += sched-pipe.o
 perf-y += mem-functions.o
diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index 0999ac5..a1c6ee9 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -20,6 +20,9 @@
 #include <stdlib.h>
 #include <sys/time.h>
 #include <pthread.h>
+#ifdef CONFIG_NUMA
+#include <numa.h>
+#endif
 
 static unsigned int nthreads = 0;
 static unsigned int nsecs    = 10;
@@ -27,6 +30,7 @@ static unsigned int nsecs    = 10;
 static unsigned int nfutexes = 1024;
 static bool fshared = false, done = false, silent = false;
 static int futex_flag = 0;
+static int numa_node = -1;
 
 struct timeval start, end, runtime;
 static pthread_mutex_t thread_lock;
@@ -39,7 +43,7 @@ struct worker {
 	u_int32_t *futex;
 	pthread_t thread;
 	unsigned long ops;
-};
+} __attribute__((aligned(128)));
 
 static const struct option options[] = {
 	OPT_UINTEGER('t', "threads", &nthreads, "Specify amount of threads"),
@@ -47,9 +51,28 @@ static const struct option options[] = {
 	OPT_UINTEGER('f', "futexes", &nfutexes, "Specify amount of futexes per threads"),
 	OPT_BOOLEAN( 's', "silent",  &silent,   "Silent mode: do not display data/details"),
 	OPT_BOOLEAN( 'S', "shared",  &fshared,  "Use shared futexes instead of private ones"),
+#ifdef CONFIG_NUMA
+	OPT_INTEGER( 'n', "numa",   &numa_node,  "Specify the NUMA node"),
+#endif
 	OPT_END()
 };
 
+#ifndef CONFIG_NUMA
+static int numa_run_on_node(int node __maybe_unused) { return 0; }
+static int numa_node_of_cpu(int node __maybe_unused) { return 0; }
+static void *numa_alloc_local(size_t size) { return malloc(size); }
+static void numa_free(void *p, size_t size __maybe_unused) { return free(p); }
+#endif
+
+static bool cpu_is_local(int cpu)
+{
+	if (numa_node < 0)
+		return true;
+	if (numa_node_of_cpu(cpu) == numa_node)
+		return true;
+	return false;
+}
+
 static const char * const bench_futex_hash_usage[] = {
 	"perf bench futex hash <options>",
 	NULL
@@ -115,6 +138,8 @@ int bench_futex_hash(int argc, const char **argv,
 	unsigned int i, ncpus;
 	pthread_attr_t thread_attr;
 	struct worker *worker = NULL;
+	char *node_str = NULL;
+	unsigned int cpunum;
 
 	argc = parse_options(argc, argv, options, bench_futex_hash_usage, 0);
 	if (argc) {
@@ -128,18 +153,50 @@ int bench_futex_hash(int argc, const char **argv,
 	act.sa_sigaction = toggle_done;
 	sigaction(SIGINT, &act, NULL);
 
-	if (!nthreads) /* default to the number of CPUs */
-		nthreads = ncpus;
+	if (!nthreads) {
+		/* default to the number of CPUs per NUMA node */
+		if (numa_node < 0) {
+			nthreads = ncpus;
+		} else {
+			for (i = 0; i < ncpus; i++) {
+				if (cpu_is_local(i))
+					nthreads++;
+			}
+			if (!nthreads)
+				err(EXIT_FAILURE, "No online CPUs for this node");
+		}
+	} else {
+		int cpu_available = 0;
+
+		for (i = 0; i < ncpus && !cpu_available; i++) {
+			if (cpu_is_local(i))
+				cpu_available = 1;
+		}
+		if (!cpu_available)
+			err(EXIT_FAILURE, "No online CPUs for this node");
+	}
+
+	if (numa_node >= 0) {
+		ret = numa_run_on_node(numa_node);
+		if (ret < 0)
+			err(EXIT_FAILURE, "numa_run_on_node");
+		ret = asprintf(&node_str, " on node %d", numa_node);
+		if (ret < 0)
+			err(EXIT_FAILURE, "numa_node, asprintf");
+	}
 
-	worker = calloc(nthreads, sizeof(*worker));
+	worker = numa_alloc_local(nthreads * sizeof(*worker));
 	if (!worker)
 		goto errmem;
 
 	if (!fshared)
 		futex_flag = FUTEX_PRIVATE_FLAG;
 
-	printf("Run summary [PID %d]: %d threads, each operating on %d [%s] futexes for %d secs.\n\n",
-	       getpid(), nthreads, nfutexes, fshared ? "shared":"private", nsecs);
+	printf("Run summary [PID %d]: %d threads%s, each operating on %d [%s] futexes for %d secs.\n\n",
+	       getpid(), nthreads,
+	       node_str ? : "",
+	       nfutexes, fshared ? "shared":"private",
+	       nsecs);
 
 	init_stats(&throughput_stats);
 	pthread_mutex_init(&thread_lock, NULL);
@@ -149,14 +206,24 @@ int bench_futex_hash(int argc, const char **argv,
 	threads_starting = nthreads;
 	pthread_attr_init(&thread_attr);
 	gettimeofday(&start, NULL);
-	for (i = 0; i < nthreads; i++) {
+	for (cpunum = 0, i = 0; i < nthreads; i++, cpunum++) {
+
+		do {
+			if (cpu_is_local(cpunum))
+				break;
+			cpunum++;
+			if (cpunum > ncpus)
+				cpunum = 0;
+		} while (1);
+
 		worker[i].tid = i;
-		worker[i].futex = calloc(nfutexes, sizeof(*worker[i].futex));
+		worker[i].futex = numa_alloc_local(nfutexes *
+						   sizeof(*worker[i].futex));
 		if (!worker[i].futex)
 			goto errmem;
 
 		CPU_ZERO(&cpu);
-		CPU_SET(i % ncpus, &cpu);
+		CPU_SET(cpunum % ncpus, &cpu);
 
 		ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpu);
 		if (ret)
@@ -203,12 +270,12 @@ int bench_futex_hash(int argc, const char **argv,
 				       &worker[i].futex[nfutexes-1], t);
 		}
 
-		free(worker[i].futex);
+		numa_free(worker[i].futex, nfutexes * sizeof(*worker[i].futex));
 	}
 
 	print_summary();
 
-	free(worker);
+	numa_free(worker, nthreads * sizeof(*worker));
 	return ret;
 errmem:
 	err(EXIT_FAILURE, "calloc");
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [patch V2 7/7] futex.2: Document hash preallocation opcode
  2016-05-05 19:03 [patch V2 0/7] Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Linus Torvalds <torvalds@linux-foundation.org>, Darren Hart <darren@dvhart.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>, Michael Kerrisk <mtk.manpages@googlemail.com>, Davidlohr Bueso <dave@stgolabs.net>, Chris Mason <clm@fb.com>, Carlos O'Donell <carlos@redhat.com>, Torvald Riegel <triegel@redhat.com>, Eric Dumazet <edumazet@google.com> Thomas Gleixner
                   ` (5 preceding siblings ...)
  2016-05-05 19:03 ` [patch V2 5/7] perf/bench/futex-hash: Support NUMA Thomas Gleixner
@ 2016-05-05 19:03 ` Thomas Gleixner
  6 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2016-05-05 19:03 UTC (permalink / raw)
  To: LKML

[-- Attachment #1: futex.2--Document-hash-preallocation-opcode --]
[-- Type: text/plain, Size: 2067 bytes --]

At least an attempt to document the futex attached mode extension.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 man2/futex.2 |   40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

--- a/man2/futex.2
+++ b/man2/futex.2
@@ -767,6 +767,40 @@ operations correspond to
 and
 .BR FUTEX_WAKE_BITSET
 operations where the bit masks are all ones.
+
+.\"
+.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+.\"
+.TP
+.BR FUTEX_PREALLOC_HASH " (since Linux ?.?.?)"
+This operation preallocates the per process private hash with a hash size
+provided by the
+.IR val
+argument.
+
+This operation tells the kernel to preallocate the per process private hash
+with a given hash size. The size is rounded up to the next power of two and
+limited internally in the kernel according to the sys control setting.
+
+The per process private hash is allocated once and cannot be resized. The
+allocation either happens by this op code or automatically when the first real
+futex operation takes place.
+
+If the hash is allocated then subsequent calls with this opcode return the
+currently active hash size.
+
+If the returned size value is 0 then the global hash is used instead of the
+process private hash. This can happen on UP machines where the private hashing
+is disabled and in case of memory allocation failure.
+
+The
+.IR uaddr ,
+.IR timeout ,
+.IR uaddr2
+and
+.I val3
+arguments are ignored.
+
 .\"
 .\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .\"
@@ -1334,6 +1368,12 @@ the futex word at
 Returns 0 if the caller was successfully requeued to the futex for
 the futex word at
 .IR uaddr2 .
+.TP
+.B FUTEX_PREALLOC_HASH
+Returns the size of the allocated hash, i.e. the number of hash slots. If the
+size value is 0 then the global hash is used instead of the process private
+hash. This can happen on UP machines where the private hashing is disabled and
+in case of memory allocation failure.
 .\"
 .\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .\"

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch V2 1/7] futex: Add some more function commentry
  2016-05-05 20:44 ` [patch V2 1/7] futex: Add some more function commentry Thomas Gleixner
@ 2016-05-06 17:37   ` Darren Hart
  0 siblings, 0 replies; 10+ messages in thread
From: Darren Hart @ 2016-05-06 17:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Sebastian Andrzej Siewior, Linus Torvalds, Darren Hart,
	Peter Zijlstra, Ingo Molnar, Michael Kerrisk, Davidlohr Bueso,
	Chris Mason, Carlos O'Donell, Torvald Riegel, Eric Dumazet

On Thu, May 05, 2016 at 08:44:03PM -0000, Thomas Gleixner wrote:
> Add some more comments and reformat existing ones to kernel doc style.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/futex.c |   15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -373,8 +373,12 @@ static inline int hb_waiters_pending(str
>  #endif
>  }
>  
> -/*
> - * We hash on the keys returned from get_futex_key (see below).
> +/**
> + * hash_futex - Return the hash bucket in the global hash
> + * @key:	Pointer to the futex key for which the hash is calculated
> + *
> + * We hash on the keys returned from get_futex_key (see below) and return the
> + * corresponding hash bucket in the global hash.
>   */
>  static struct futex_hash_bucket *hash_futex(union futex_key *key)
>  {
> @@ -384,7 +388,12 @@ static struct futex_hash_bucket *hash_fu
>  	return &futex_queues[hash & (futex_hashsize - 1)];
>  }
>  
> -/*
> +
> +/**
> + * match_futex - Check whether to futex keys are equal

s/to/two/

Otherwise:

Reviewed-by: Darren Hart <dvhart@linux.intel.com>

> + * @key1:	Pointer to key1
> + * @key2:	Pointer to key2
> + *
>   * Return 1 if two futex_keys are equal, 0 otherwise.
>   */
>  static inline int match_futex(union futex_key *key1, union futex_key *key2)
> 
> 
> 

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch V2 1/7] futex: Add some more function commentry
  2016-05-05 20:44 [patch V2 0/7] futex: Add support for process private hashing Thomas Gleixner
@ 2016-05-05 20:44 ` Thomas Gleixner
  2016-05-06 17:37   ` Darren Hart
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2016-05-05 20:44 UTC (permalink / raw)
  To: LKML
  Cc: Sebastian Andrzej Siewior, Linus Torvalds, Darren Hart,
	Peter Zijlstra, Ingo Molnar, Michael Kerrisk, Davidlohr Bueso,
	Chris Mason, Carlos O'Donell, Torvald Riegel, Eric Dumazet

[-- Attachment #1: futex-Add-some-more-function-commentry.patch --]
[-- Type: text/plain, Size: 1126 bytes --]

Add some more comments and reformat existing ones to kernel doc style.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -373,8 +373,12 @@ static inline int hb_waiters_pending(str
 #endif
 }
 
-/*
- * We hash on the keys returned from get_futex_key (see below).
+/**
+ * hash_futex - Return the hash bucket in the global hash
+ * @key:	Pointer to the futex key for which the hash is calculated
+ *
+ * We hash on the keys returned from get_futex_key (see below) and return the
+ * corresponding hash bucket in the global hash.
  */
 static struct futex_hash_bucket *hash_futex(union futex_key *key)
 {
@@ -384,7 +388,12 @@ static struct futex_hash_bucket *hash_fu
 	return &futex_queues[hash & (futex_hashsize - 1)];
 }
 
-/*
+
+/**
+ * match_futex - Check whether to futex keys are equal
+ * @key1:	Pointer to key1
+ * @key2:	Pointer to key2
+ *
  * Return 1 if two futex_keys are equal, 0 otherwise.
  */
 static inline int match_futex(union futex_key *key1, union futex_key *key2)

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-05-06 17:37 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-05 19:03 [patch V2 0/7] Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Linus Torvalds <torvalds@linux-foundation.org>, Darren Hart <darren@dvhart.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>, Michael Kerrisk <mtk.manpages@googlemail.com>, Davidlohr Bueso <dave@stgolabs.net>, Chris Mason <clm@fb.com>, Carlos O'Donell <carlos@redhat.com>, Torvald Riegel <triegel@redhat.com>, Eric Dumazet <edumazet@google.com> Thomas Gleixner
2016-05-05 19:03 ` [patch V2 1/7] futex: Add some more function commentry Thomas Gleixner
2016-05-05 19:03 ` [patch V2 2/7] futex: Hash private futexes per process Thomas Gleixner
2016-05-05 19:03 ` [patch V2 3/7] futex: Add op for hash preallocation Thomas Gleixner
2016-05-05 19:03 ` [patch V2 4/7] futex: Add sysctl knobs for process private hash Thomas Gleixner
2016-05-05 19:03 ` [patch V2 6/7] perf/bench/futex-hash: Support preallocate hash table Thomas Gleixner
2016-05-05 19:03 ` [patch V2 5/7] perf/bench/futex-hash: Support NUMA Thomas Gleixner
2016-05-05 19:03 ` [patch V2 7/7] futex.2: Document hash preallocation opcode Thomas Gleixner
2016-05-05 20:44 [patch V2 0/7] futex: Add support for process private hashing Thomas Gleixner
2016-05-05 20:44 ` [patch V2 1/7] futex: Add some more function commentry Thomas Gleixner
2016-05-06 17:37   ` Darren Hart

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.