All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/15] futex: More futex2 bits
@ 2023-09-21 10:45 peterz
  2023-09-21 10:45 ` [PATCH v3 01/15] futex: Clarify FUTEX2 flags peterz
                   ` (15 more replies)
  0 siblings, 16 replies; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

Hi!

New version of the futex2 patches. Futex2 is a new interface to the same 'old'
futex core. An attempt to get away from the multiplex syscall and add a little
room for extentions.

Changes since v2:
 - Rebased to v6.6-rc
 - New FUTEX_STRICT flag (Andre)
 - Reordered futex_size() helper (tglx)
 - Updated some comments (tglx)
 - Folded some tags

My plan is to push the first 10 patches (all the syscalls) into
tip/locking/core this afternoon. All those patches have plenty review tags
including from Thomas who is the actual maintainer of this lot :-)

This should be plenty for Jens to get a move on with the io-uring stuff.

I'm holding off on the NUMA bits for now, because I want to write some
userspace for it since there is some confusion on that -- but I seem to keep
getting side-tracked :/

Patches also available at:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git locking/core
  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git locking/futex

Where the locking/core thing is the first 10 patches only, and barring Link
tags (which I'll harvest from this posting), will be what I'll push out to tip.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 01/15] futex: Clarify FUTEX2 flags
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
  2023-09-21 10:45 ` [PATCH v3 02/15] futex: Extend the " peterz
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

[-- Attachment #1: peterz-futex2-flags-1.patch --]
[-- Type: text/plain, Size: 2156 bytes --]

sys_futex_waitv() is part of the futex2 series (the first and only so
far) of syscalls and has a flags field per futex (as opposed to flags
being encoded in the futex op).

This new flags field has a new namespace, which unfortunately isn't
super explicit. Notably it currently takes FUTEX_32 and
FUTEX_PRIVATE_FLAG.

Introduce the FUTEX2 namespace to clarify this

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
---
 include/uapi/linux/futex.h |   16 +++++++++++++---
 kernel/futex/syscalls.c    |    7 +++----
 2 files changed, 16 insertions(+), 7 deletions(-)

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -44,10 +44,20 @@
 					 FUTEX_PRIVATE_FLAG)
 
 /*
- * Flags to specify the bit length of the futex word for futex2 syscalls.
- * Currently, only 32 is supported.
+ * Flags for futex2 syscalls.
  */
-#define FUTEX_32		2
+			/*	0x00 */
+			/*	0x01 */
+#define FUTEX2_SIZE_U32		0x02
+			/*	0x04 */
+			/*	0x08 */
+			/*	0x10 */
+			/*	0x20 */
+			/*	0x40 */
+#define FUTEX2_PRIVATE		FUTEX_PRIVATE_FLAG
+
+/* do not use */
+#define FUTEX_32		FUTEX2_SIZE_U32 /* historical accident :-( */
 
 /*
  * Max numbers of elements in a futex_waitv array
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -183,8 +183,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 
-/* Mask of available flags for each futex in futex_waitv list */
-#define FUTEXV_WAITER_MASK (FUTEX_32 | FUTEX_PRIVATE_FLAG)
+#define FUTEX2_VALID_MASK (FUTEX2_SIZE_U32 | FUTEX2_PRIVATE)
 
 /**
  * futex_parse_waitv - Parse a waitv array from userspace
@@ -205,10 +204,10 @@ static int futex_parse_waitv(struct fute
 		if (copy_from_user(&aux, &uwaitv[i], sizeof(aux)))
 			return -EFAULT;
 
-		if ((aux.flags & ~FUTEXV_WAITER_MASK) || aux.__reserved)
+		if ((aux.flags & ~FUTEX2_VALID_MASK) || aux.__reserved)
 			return -EINVAL;
 
-		if (!(aux.flags & FUTEX_32))
+		if (!(aux.flags & FUTEX2_SIZE_U32))
 			return -EINVAL;
 
 		futexv[i].w.flags = aux.flags;




^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 02/15] futex: Extend the FUTEX2 flags
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
  2023-09-21 10:45 ` [PATCH v3 01/15] futex: Clarify FUTEX2 flags peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
  2023-09-21 10:45 ` [PATCH v3 03/15] futex: Flag conversion peterz
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

[-- Attachment #1: peterz-futex2-flags-2.patch --]
[-- Type: text/plain, Size: 2339 bytes --]

Add the definition for the missing but always intended extra sizes,
and add a NUMA flag for the planned numa extention.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
---
 include/uapi/linux/futex.h |   21 ++++++++++++++++++---
 kernel/futex/syscalls.c    |    9 +++++++--
 2 files changed, 25 insertions(+), 5 deletions(-)

Index: linux-2.6/include/uapi/linux/futex.h
===================================================================
--- linux-2.6.orig/include/uapi/linux/futex.h
+++ linux-2.6/include/uapi/linux/futex.h
@@ -45,17 +45,32 @@
 
 /*
  * Flags for futex2 syscalls.
+ *
+ * NOTE: these are not pure flags, they can also be seen as:
+ *
+ *   union {
+ *     u32  flags;
+ *     struct {
+ *       u32 size    : 2,
+ *           numa    : 1,
+ *                   : 4,
+ *           private : 1;
+ *     };
+ *   };
  */
-			/*	0x00 */
-			/*	0x01 */
+#define FUTEX2_SIZE_U8		0x00
+#define FUTEX2_SIZE_U16		0x01
 #define FUTEX2_SIZE_U32		0x02
-			/*	0x04 */
+#define FUTEX2_SIZE_U64		0x03
+#define FUTEX2_NUMA		0x04
 			/*	0x08 */
 			/*	0x10 */
 			/*	0x20 */
 			/*	0x40 */
 #define FUTEX2_PRIVATE		FUTEX_PRIVATE_FLAG
 
+#define FUTEX2_SIZE_MASK	0x03
+
 /* do not use */
 #define FUTEX_32		FUTEX2_SIZE_U32 /* historical accident :-( */
 
Index: linux-2.6/kernel/futex/syscalls.c
===================================================================
--- linux-2.6.orig/kernel/futex/syscalls.c
+++ linux-2.6/kernel/futex/syscalls.c
@@ -183,7 +183,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 
-#define FUTEX2_VALID_MASK (FUTEX2_SIZE_U32 | FUTEX2_PRIVATE)
+#define FUTEX2_VALID_MASK (FUTEX2_SIZE_MASK | FUTEX2_PRIVATE)
 
 /**
  * futex_parse_waitv - Parse a waitv array from userspace
@@ -207,7 +207,12 @@ static int futex_parse_waitv(struct fute
 		if ((aux.flags & ~FUTEX2_VALID_MASK) || aux.__reserved)
 			return -EINVAL;
 
-		if (!(aux.flags & FUTEX2_SIZE_U32))
+		if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
+			if ((aux.flags & FUTEX2_SIZE_MASK) == FUTEX2_SIZE_U64)
+				return -EINVAL;
+		}
+
+		if ((aux.flags & FUTEX2_SIZE_MASK) != FUTEX2_SIZE_U32)
 			return -EINVAL;
 
 		futexv[i].w.flags = aux.flags;




^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 03/15] futex: Flag conversion
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
  2023-09-21 10:45 ` [PATCH v3 01/15] futex: Clarify FUTEX2 flags peterz
  2023-09-21 10:45 ` [PATCH v3 02/15] futex: Extend the " peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
  2023-09-21 10:45 ` [PATCH v3 04/15] futex: Validate futex value against futex size peterz
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

[-- Attachment #1: peterz-futex2-flags.patch --]
[-- Type: text/plain, Size: 5002 bytes --]

Futex has 3 sets of flags:

 - legacy futex op bits
 - futex2 flags
 - internal flags

Add a few helpers to convert from the API flags into the internal
flags.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
---
 kernel/futex/futex.h    |   63 +++++++++++++++++++++++++++++++++++++++++++++---
 kernel/futex/syscalls.c |   24 ++++++------------
 kernel/futex/waitwake.c |    4 +--
 3 files changed, 71 insertions(+), 20 deletions(-)

Index: linux-2.6/kernel/futex/futex.h
===================================================================
--- linux-2.6.orig/kernel/futex/futex.h
+++ linux-2.6/kernel/futex/futex.h
@@ -5,6 +5,7 @@
 #include <linux/futex.h>
 #include <linux/rtmutex.h>
 #include <linux/sched/wake_q.h>
+#include <linux/compat.h>
 
 #ifdef CONFIG_PREEMPT_RT
 #include <linux/rcuwait.h>
@@ -16,8 +17,15 @@
  * Futex flags used to encode options to functions and preserve them across
  * restarts.
  */
+#define FLAGS_SIZE_8		0x00
+#define FLAGS_SIZE_16		0x01
+#define FLAGS_SIZE_32		0x02
+#define FLAGS_SIZE_64		0x03
+
+#define FLAGS_SIZE_MASK		0x03
+
 #ifdef CONFIG_MMU
-# define FLAGS_SHARED		0x01
+# define FLAGS_SHARED		0x10
 #else
 /*
  * NOMMU does not have per process address space. Let the compiler optimize
@@ -25,8 +33,57 @@
  */
 # define FLAGS_SHARED		0x00
 #endif
-#define FLAGS_CLOCKRT		0x02
-#define FLAGS_HAS_TIMEOUT	0x04
+#define FLAGS_CLOCKRT		0x20
+#define FLAGS_HAS_TIMEOUT	0x40
+#define FLAGS_NUMA		0x80
+
+/* FUTEX_ to FLAGS_ */
+static inline unsigned int futex_to_flags(unsigned int op)
+{
+	unsigned int flags = FLAGS_SIZE_32;
+
+	if (!(op & FUTEX_PRIVATE_FLAG))
+		flags |= FLAGS_SHARED;
+
+	if (op & FUTEX_CLOCK_REALTIME)
+		flags |= FLAGS_CLOCKRT;
+
+	return flags;
+}
+
+/* FUTEX2_ to FLAGS_ */
+static inline unsigned int futex2_to_flags(unsigned int flags2)
+{
+	unsigned int flags = flags2 & FUTEX2_SIZE_MASK;
+
+	if (!(flags2 & FUTEX2_PRIVATE))
+		flags |= FLAGS_SHARED;
+
+	if (flags2 & FUTEX2_NUMA)
+		flags |= FLAGS_NUMA;
+
+	return flags;
+}
+
+static inline unsigned int futex_size(unsigned int flags)
+{
+	return 1 << (flags & FLAGS_SIZE_MASK);
+}
+
+static inline bool futex_flags_valid(unsigned int flags)
+{
+	/* Only 64bit futexes for 64bit code */
+	if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
+		if ((flags & FLAGS_SIZE_MASK) == FLAGS_SIZE_64)
+			return false;
+	}
+
+	/* Only 32bit futexes are implemented -- for now */
+	if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
+		return false;
+
+	return true;
+}
 
 #ifdef CONFIG_FAIL_FUTEX
 extern bool should_fail_futex(bool fshared);
Index: linux-2.6/kernel/futex/syscalls.c
===================================================================
--- linux-2.6.orig/kernel/futex/syscalls.c
+++ linux-2.6/kernel/futex/syscalls.c
@@ -1,6 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 
-#include <linux/compat.h>
 #include <linux/syscalls.h>
 #include <linux/time_namespace.h>
 
@@ -85,15 +84,12 @@ err_unlock:
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		u32 __user *uaddr2, u32 val2, u32 val3)
 {
+	unsigned int flags = futex_to_flags(op);
 	int cmd = op & FUTEX_CMD_MASK;
-	unsigned int flags = 0;
 
-	if (!(op & FUTEX_PRIVATE_FLAG))
-		flags |= FLAGS_SHARED;
-
-	if (op & FUTEX_CLOCK_REALTIME) {
-		flags |= FLAGS_CLOCKRT;
-		if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI &&
+	if (flags & FLAGS_CLOCKRT) {
+		if (cmd != FUTEX_WAIT_BITSET &&
+		    cmd != FUTEX_WAIT_REQUEUE_PI &&
 		    cmd != FUTEX_LOCK_PI2)
 			return -ENOSYS;
 	}
@@ -201,21 +197,19 @@ static int futex_parse_waitv(struct fute
 	unsigned int i;
 
 	for (i = 0; i < nr_futexes; i++) {
+		unsigned int flags;
+
 		if (copy_from_user(&aux, &uwaitv[i], sizeof(aux)))
 			return -EFAULT;
 
 		if ((aux.flags & ~FUTEX2_VALID_MASK) || aux.__reserved)
 			return -EINVAL;
 
-		if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
-			if ((aux.flags & FUTEX2_SIZE_MASK) == FUTEX2_SIZE_U64)
-				return -EINVAL;
-		}
-
-		if ((aux.flags & FUTEX2_SIZE_MASK) != FUTEX2_SIZE_U32)
+		flags = futex2_to_flags(aux.flags);
+		if (!futex_flags_valid(flags))
 			return -EINVAL;
 
-		futexv[i].w.flags = aux.flags;
+		futexv[i].w.flags = flags;
 		futexv[i].w.val = aux.val;
 		futexv[i].w.uaddr = aux.uaddr;
 		futexv[i].q = futex_q_init;
Index: linux-2.6/kernel/futex/waitwake.c
===================================================================
--- linux-2.6.orig/kernel/futex/waitwake.c
+++ linux-2.6/kernel/futex/waitwake.c
@@ -419,11 +419,11 @@ static int futex_wait_multiple_setup(str
 	 */
 retry:
 	for (i = 0; i < count; i++) {
-		if ((vs[i].w.flags & FUTEX_PRIVATE_FLAG) && retry)
+		if (!(vs[i].w.flags & FLAGS_SHARED) && retry)
 			continue;
 
 		ret = get_futex_key(u64_to_user_ptr(vs[i].w.uaddr),
-				    !(vs[i].w.flags & FUTEX_PRIVATE_FLAG),
+				    vs[i].w.flags & FLAGS_SHARED,
 				    &vs[i].q.key, FUTEX_READ);
 
 		if (unlikely(ret))




^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 04/15] futex: Validate futex value against futex size
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
                   ` (2 preceding siblings ...)
  2023-09-21 10:45 ` [PATCH v3 03/15] futex: Flag conversion peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
  2023-09-21 10:45 ` [PATCH v3 05/15] futex: Add sys_futex_wake() peterz
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

[-- Attachment #1: peterz-futex2-enforce-bits.patch --]
[-- Type: text/plain, Size: 1460 bytes --]

Ensure the futex value fits in the given futex size. Since this adds a
constraint to an existing syscall, it might possibly change behaviour.

Currently the value would be truncated to a u32 and any high bits
would get silently lost.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex/futex.h    |   10 ++++++++++
 kernel/futex/syscalls.c |    3 +++
 2 files changed, 13 insertions(+)

Index: linux-2.6/kernel/futex/futex.h
===================================================================
--- linux-2.6.orig/kernel/futex/futex.h
+++ linux-2.6/kernel/futex/futex.h
@@ -85,6 +85,16 @@ static inline bool futex_flags_valid(uns
 	return true;
 }
 
+static inline bool futex_validate_input(unsigned int flags, u64 val)
+{
+	int bits = 8 * futex_size(flags);
+
+	if (bits < 64 && (val >> bits))
+		return false;
+
+	return true;
+}
+
 #ifdef CONFIG_FAIL_FUTEX
 extern bool should_fail_futex(bool fshared);
 #else
Index: linux-2.6/kernel/futex/syscalls.c
===================================================================
--- linux-2.6.orig/kernel/futex/syscalls.c
+++ linux-2.6/kernel/futex/syscalls.c
@@ -209,6 +209,9 @@ static int futex_parse_waitv(struct fute
 		if (!futex_flags_valid(flags))
 			return -EINVAL;
 
+		if (!futex_validate_input(flags, aux.val))
+			return -EINVAL;
+
 		futexv[i].w.flags = flags;
 		futexv[i].w.val = aux.val;
 		futexv[i].w.uaddr = aux.uaddr;



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 05/15] futex: Add sys_futex_wake()
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
                   ` (3 preceding siblings ...)
  2023-09-21 10:45 ` [PATCH v3 04/15] futex: Validate futex value against futex size peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
  2023-09-21 10:45 ` [PATCH v3 06/15] futex: FLAGS_STRICT peterz
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke, Geert Uytterhoeven

[-- Attachment #1: peterz-futex2-wake.patch --]
[-- Type: text/plain, Size: 12503 bytes --]

To complement sys_futex_waitv() add sys_futex_wake(). This syscall
implements what was previously known as FUTEX_WAKE_BITSET except it
uses 'unsigned long' for the bitmask and takes FUTEX2 flags.

The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
 arch/alpha/kernel/syscalls/syscall.tbl      |    1 
 arch/arm/tools/syscall.tbl                  |    1 
 arch/arm64/include/asm/unistd.h             |    2 -
 arch/arm64/include/asm/unistd32.h           |    2 +
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 
 arch/s390/kernel/syscalls/syscall.tbl       |    1 
 arch/sh/kernel/syscalls/syscall.tbl         |    1 
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 
 include/linux/syscalls.h                    |    3 ++
 include/uapi/asm-generic/unistd.h           |    5 ++--
 kernel/futex/syscalls.c                     |   33 ++++++++++++++++++++++++++++
 kernel/sys_ni.c                             |    1 
 22 files changed, 59 insertions(+), 3 deletions(-)

Index: linux-2.6/arch/alpha/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/alpha/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/alpha/kernel/syscalls/syscall.tbl
@@ -492,3 +492,4 @@
 560	common	set_mempolicy_home_node		sys_ni_syscall
 561	common	cachestat			sys_cachestat
 562	common	fchmodat2			sys_fchmodat2
+563	common	futex_wake			sys_futex_wake
Index: linux-2.6/arch/arm/tools/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/arm/tools/syscall.tbl
+++ linux-2.6/arch/arm/tools/syscall.tbl
@@ -466,3 +466,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
Index: linux-2.6/arch/arm64/include/asm/unistd.h
===================================================================
--- linux-2.6.orig/arch/arm64/include/asm/unistd.h
+++ linux-2.6/arch/arm64/include/asm/unistd.h
@@ -39,7 +39,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		453
+#define __NR_compat_syscalls		455
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
Index: linux-2.6/arch/arm64/include/asm/unistd32.h
===================================================================
--- linux-2.6.orig/arch/arm64/include/asm/unistd32.h
+++ linux-2.6/arch/arm64/include/asm/unistd32.h
@@ -911,6 +911,8 @@ __SYSCALL(__NR_set_mempolicy_home_node,
 __SYSCALL(__NR_cachestat, sys_cachestat)
 #define __NR_fchmodat2 452
 __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
+#define __NR_futex_wake 454
+__SYSCALL(__NR_futex_wake, sys_futex_wake)
 
 /*
  * Please add new compat syscalls above this comment and update
Index: linux-2.6/arch/ia64/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/ia64/kernel/syscalls/syscall.tbl
@@ -373,3 +373,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
Index: linux-2.6/arch/m68k/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/m68k/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/m68k/kernel/syscalls/syscall.tbl
@@ -452,3 +452,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
Index: linux-2.6/arch/microblaze/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/microblaze/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -458,3 +458,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
Index: linux-2.6/arch/mips/kernel/syscalls/syscall_n32.tbl
===================================================================
--- linux-2.6.orig/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ linux-2.6/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -391,3 +391,4 @@
 450	n32	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	n32	cachestat			sys_cachestat
 452	n32	fchmodat2			sys_fchmodat2
+454	n32	futex_wake			sys_futex_wake
Index: linux-2.6/arch/mips/kernel/syscalls/syscall_n64.tbl
===================================================================
--- linux-2.6.orig/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ linux-2.6/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -367,3 +367,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	n64	cachestat			sys_cachestat
 452	n64	fchmodat2			sys_fchmodat2
+454	n64	futex_wake			sys_futex_wake
Index: linux-2.6/arch/mips/kernel/syscalls/syscall_o32.tbl
===================================================================
--- linux-2.6.orig/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ linux-2.6/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -440,3 +440,4 @@
 450	o32	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	o32	cachestat			sys_cachestat
 452	o32	fchmodat2			sys_fchmodat2
+454	o32	futex_wake			sys_futex_wake
Index: linux-2.6/arch/parisc/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/parisc/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/parisc/kernel/syscalls/syscall.tbl
@@ -451,3 +451,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
Index: linux-2.6/arch/powerpc/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -539,3 +539,4 @@
 450 	nospu	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
Index: linux-2.6/arch/s390/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/s390/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/s390/kernel/syscalls/syscall.tbl
@@ -455,3 +455,4 @@
 450  common	set_mempolicy_home_node	sys_set_mempolicy_home_node	sys_set_mempolicy_home_node
 451  common	cachestat		sys_cachestat			sys_cachestat
 452  common	fchmodat2		sys_fchmodat2			sys_fchmodat2
+454  common	futex_wake		sys_futex_wake			sys_futex_wake
Index: linux-2.6/arch/sh/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/sh/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/sh/kernel/syscalls/syscall.tbl
@@ -455,3 +455,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
Index: linux-2.6/arch/sparc/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/sparc/kernel/syscalls/syscall.tbl
@@ -498,3 +498,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
Index: linux-2.6/arch/x86/entry/syscalls/syscall_32.tbl
===================================================================
--- linux-2.6.orig/arch/x86/entry/syscalls/syscall_32.tbl
+++ linux-2.6/arch/x86/entry/syscalls/syscall_32.tbl
@@ -457,3 +457,4 @@
 450	i386	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	i386	cachestat		sys_cachestat
 452	i386	fchmodat2		sys_fchmodat2
+454	i386	futex_wake		sys_futex_wake
Index: linux-2.6/arch/x86/entry/syscalls/syscall_64.tbl
===================================================================
--- linux-2.6.orig/arch/x86/entry/syscalls/syscall_64.tbl
+++ linux-2.6/arch/x86/entry/syscalls/syscall_64.tbl
@@ -375,6 +375,7 @@
 451	common	cachestat		sys_cachestat
 452	common	fchmodat2		sys_fchmodat2
 453	64	map_shadow_stack	sys_map_shadow_stack
+454	common	futex_wake		sys_futex_wake
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
Index: linux-2.6/arch/xtensa/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/xtensa/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -423,3 +423,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
Index: linux-2.6/include/linux/syscalls.h
===================================================================
--- linux-2.6.orig/include/linux/syscalls.h
+++ linux-2.6/include/linux/syscalls.h
@@ -549,6 +549,9 @@ asmlinkage long sys_set_robust_list(stru
 asmlinkage long sys_futex_waitv(struct futex_waitv *waiters,
 				unsigned int nr_futexes, unsigned int flags,
 				struct __kernel_timespec __user *timeout, clockid_t clockid);
+
+asmlinkage long sys_futex_wake(void __user *uaddr, unsigned long mask, int nr, unsigned int flags);
+
 asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp,
 			      struct __kernel_timespec __user *rmtp);
 asmlinkage long sys_nanosleep_time32(struct old_timespec32 __user *rqtp,
Index: linux-2.6/include/uapi/asm-generic/unistd.h
===================================================================
--- linux-2.6.orig/include/uapi/asm-generic/unistd.h
+++ linux-2.6/include/uapi/asm-generic/unistd.h
@@ -822,9 +822,11 @@ __SYSCALL(__NR_cachestat, sys_cachestat)
 
 #define __NR_fchmodat2 452
 __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
+#define __NR_futex_wake 454
+__SYSCALL(__NR_futex_wake, sys_futex_wake)
 
 #undef __NR_syscalls
-#define __NR_syscalls 453
+#define __NR_syscalls 455
 
 /*
  * 32 bit systems traditionally used different
Index: linux-2.6/kernel/futex/syscalls.c
===================================================================
--- linux-2.6.orig/kernel/futex/syscalls.c
+++ linux-2.6/kernel/futex/syscalls.c
@@ -306,6 +306,36 @@ destroy_timer:
 	return ret;
 }
 
+/*
+ * sys_futex_wake - Wake a number of futexes
+ * @uaddr:	Address of the futex(es) to wake
+ * @mask:	bitmask
+ * @nr:		Number of the futexes to wake
+ * @flags:	FUTEX2 flags
+ *
+ * Identical to the traditional FUTEX_WAKE_BITSET op, except it is part of the
+ * futex2 family of calls.
+ */
+
+SYSCALL_DEFINE4(futex_wake,
+		void __user *, uaddr,
+		unsigned long, mask,
+		int, nr,
+		unsigned int, flags)
+{
+	if (flags & ~FUTEX2_VALID_MASK)
+		return -EINVAL;
+
+	flags = futex2_to_flags(flags);
+	if (!futex_flags_valid(flags))
+		return -EINVAL;
+
+	if (!futex_validate_input(flags, mask))
+		return -EINVAL;
+
+	return futex_wake(uaddr, flags, nr, mask);
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE2(set_robust_list,
 		struct compat_robust_list_head __user *, head,
Index: linux-2.6/kernel/sys_ni.c
===================================================================
--- linux-2.6.orig/kernel/sys_ni.c
+++ linux-2.6/kernel/sys_ni.c
@@ -87,6 +87,7 @@ COND_SYSCALL_COMPAT(set_robust_list);
 COND_SYSCALL(get_robust_list);
 COND_SYSCALL_COMPAT(get_robust_list);
 COND_SYSCALL(futex_waitv);
+COND_SYSCALL(futex_wake);
 COND_SYSCALL(kexec_load);
 COND_SYSCALL_COMPAT(kexec_load);
 COND_SYSCALL(init_module);



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 06/15] futex: FLAGS_STRICT
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
                   ` (4 preceding siblings ...)
  2023-09-21 10:45 ` [PATCH v3 05/15] futex: Add sys_futex_wake() peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 15:14   ` Thomas Gleixner
  2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
  2023-09-21 10:45 ` [PATCH v3 07/15] futex: Add sys_futex_wait() peterz
                   ` (9 subsequent siblings)
  15 siblings, 2 replies; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

[-- Attachment #1: peterz-futex2-wake-0.patch --]
[-- Type: text/plain, Size: 2516 bytes --]

The current semantics for futex_wake() are a bit loose, specifically
asking for 0 futexes to be woken actually gets you 1.

Adding a !nr check to sys_futex_wake() makes that it would return 0
for unaligned futex words, because that check comes in the shared
futex_wake() function. Adding the !nr check there, would affect the
legacy sys_futex() semantics.

Hence frob a flag :-(

Suggested-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
Index: linux-2.6/kernel/futex/futex.h
===================================================================
--- linux-2.6.orig/kernel/futex/futex.h
+++ linux-2.6/kernel/futex/futex.h
@@ -17,25 +17,26 @@
  * Futex flags used to encode options to functions and preserve them across
  * restarts.
  */
-#define FLAGS_SIZE_8		0x00
-#define FLAGS_SIZE_16		0x01
-#define FLAGS_SIZE_32		0x02
-#define FLAGS_SIZE_64		0x03
+#define FLAGS_SIZE_8		0x0000
+#define FLAGS_SIZE_16		0x0001
+#define FLAGS_SIZE_32		0x0002
+#define FLAGS_SIZE_64		0x0003
 
-#define FLAGS_SIZE_MASK		0x03
+#define FLAGS_SIZE_MASK		0x0003
 
 #ifdef CONFIG_MMU
-# define FLAGS_SHARED		0x10
+# define FLAGS_SHARED		0x0010
 #else
 /*
  * NOMMU does not have per process address space. Let the compiler optimize
  * code away.
  */
-# define FLAGS_SHARED		0x00
+# define FLAGS_SHARED		0x0000
 #endif
-#define FLAGS_CLOCKRT		0x20
-#define FLAGS_HAS_TIMEOUT	0x40
-#define FLAGS_NUMA		0x80
+#define FLAGS_CLOCKRT		0x0020
+#define FLAGS_HAS_TIMEOUT	0x0040
+#define FLAGS_NUMA		0x0080
+#define FLAGS_STRICT		0x0100
 
 /* FUTEX_ to FLAGS_ */
 static inline unsigned int futex_to_flags(unsigned int op)
Index: linux-2.6/kernel/futex/syscalls.c
===================================================================
--- linux-2.6.orig/kernel/futex/syscalls.c
+++ linux-2.6/kernel/futex/syscalls.c
@@ -333,7 +333,7 @@ SYSCALL_DEFINE4(futex_wake,
 	if (!futex_validate_input(flags, mask))
 		return -EINVAL;
 
-	return futex_wake(uaddr, flags, nr, mask);
+	return futex_wake(uaddr, FLAGS_STRICT | flags, nr, mask);
 }
 
 #ifdef CONFIG_COMPAT
Index: linux-2.6/kernel/futex/waitwake.c
===================================================================
--- linux-2.6.orig/kernel/futex/waitwake.c
+++ linux-2.6/kernel/futex/waitwake.c
@@ -155,6 +155,9 @@ int futex_wake(u32 __user *uaddr, unsign
 	if (unlikely(ret != 0))
 		return ret;
 
+	if ((flags & FLAGS_STRICT) && !nr_wake)
+		return 0;
+
 	hb = futex_hash(&key);
 
 	/* Make sure we really have tasks to wakeup */




^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 07/15] futex: Add sys_futex_wait()
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
                   ` (5 preceding siblings ...)
  2023-09-21 10:45 ` [PATCH v3 06/15] futex: FLAGS_STRICT peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
  2023-09-21 10:45 ` [PATCH v3 08/15] futex: Propagate flags into get_futex_key() peterz
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke, Geert Uytterhoeven

[-- Attachment #1: peterz-futex2-wait.patch --]
[-- Type: text/plain, Size: 18631 bytes --]

To complement sys_futex_waitv()/wake(), add sys_futex_wait(). This
syscall implements what was previously known as FUTEX_WAIT_BITSET
except it uses 'unsigned long' for the value and bitmask arguments,
takes timespec and clockid_t arguments for the absolute timeout and
uses FUTEX2 flags.

The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
 arch/alpha/kernel/syscalls/syscall.tbl      |    1 
 arch/arm/tools/syscall.tbl                  |    1 
 arch/arm64/include/asm/unistd.h             |    2 
 arch/arm64/include/asm/unistd32.h           |    2 
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 
 arch/s390/kernel/syscalls/syscall.tbl       |    1 
 arch/sh/kernel/syscalls/syscall.tbl         |    1 
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 
 include/linux/syscalls.h                    |    4 
 include/uapi/asm-generic/unistd.h           |    4 
 kernel/futex/futex.h                        |    3 
 kernel/futex/syscalls.c                     |  120 +++++++++++++++++++++-------
 kernel/futex/waitwake.c                     |   67 +++++++++------
 kernel/sys_ni.c                             |    1 
 24 files changed, 159 insertions(+), 60 deletions(-)

Index: linux-2.6/arch/alpha/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/alpha/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/alpha/kernel/syscalls/syscall.tbl
@@ -493,3 +493,4 @@
 561	common	cachestat			sys_cachestat
 562	common	fchmodat2			sys_fchmodat2
 563	common	futex_wake			sys_futex_wake
+564	common	futex_wait			sys_futex_wait
Index: linux-2.6/arch/arm/tools/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/arm/tools/syscall.tbl
+++ linux-2.6/arch/arm/tools/syscall.tbl
@@ -467,3 +467,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
Index: linux-2.6/arch/arm64/include/asm/unistd.h
===================================================================
--- linux-2.6.orig/arch/arm64/include/asm/unistd.h
+++ linux-2.6/arch/arm64/include/asm/unistd.h
@@ -39,7 +39,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		455
+#define __NR_compat_syscalls		456
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
Index: linux-2.6/arch/arm64/include/asm/unistd32.h
===================================================================
--- linux-2.6.orig/arch/arm64/include/asm/unistd32.h
+++ linux-2.6/arch/arm64/include/asm/unistd32.h
@@ -913,6 +913,8 @@ __SYSCALL(__NR_cachestat, sys_cachestat)
 __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
 #define __NR_futex_wake 454
 __SYSCALL(__NR_futex_wake, sys_futex_wake)
+#define __NR_futex_wait 455
+__SYSCALL(__NR_futex_wait, sys_futex_wait)
 
 /*
  * Please add new compat syscalls above this comment and update
Index: linux-2.6/arch/ia64/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/ia64/kernel/syscalls/syscall.tbl
@@ -374,3 +374,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
Index: linux-2.6/arch/m68k/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/m68k/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/m68k/kernel/syscalls/syscall.tbl
@@ -453,3 +453,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
Index: linux-2.6/arch/microblaze/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/microblaze/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -459,3 +459,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
Index: linux-2.6/arch/mips/kernel/syscalls/syscall_n32.tbl
===================================================================
--- linux-2.6.orig/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ linux-2.6/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -392,3 +392,4 @@
 451	n32	cachestat			sys_cachestat
 452	n32	fchmodat2			sys_fchmodat2
 454	n32	futex_wake			sys_futex_wake
+455	n32	futex_wait			sys_futex_wait
Index: linux-2.6/arch/mips/kernel/syscalls/syscall_n64.tbl
===================================================================
--- linux-2.6.orig/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ linux-2.6/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -368,3 +368,4 @@
 451	n64	cachestat			sys_cachestat
 452	n64	fchmodat2			sys_fchmodat2
 454	n64	futex_wake			sys_futex_wake
+455	n64	futex_wait			sys_futex_wait
Index: linux-2.6/arch/mips/kernel/syscalls/syscall_o32.tbl
===================================================================
--- linux-2.6.orig/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ linux-2.6/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -441,3 +441,4 @@
 451	o32	cachestat			sys_cachestat
 452	o32	fchmodat2			sys_fchmodat2
 454	o32	futex_wake			sys_futex_wake
+455	o32	futex_wait			sys_futex_wait
Index: linux-2.6/arch/parisc/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/parisc/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/parisc/kernel/syscalls/syscall.tbl
@@ -452,3 +452,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
Index: linux-2.6/arch/powerpc/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -540,3 +540,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
Index: linux-2.6/arch/s390/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/s390/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/s390/kernel/syscalls/syscall.tbl
@@ -456,3 +456,4 @@
 451  common	cachestat		sys_cachestat			sys_cachestat
 452  common	fchmodat2		sys_fchmodat2			sys_fchmodat2
 454  common	futex_wake		sys_futex_wake			sys_futex_wake
+455  common	futex_wait		sys_futex_wait			sys_futex_wait
Index: linux-2.6/arch/sh/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/sh/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/sh/kernel/syscalls/syscall.tbl
@@ -456,3 +456,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
Index: linux-2.6/arch/sparc/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/sparc/kernel/syscalls/syscall.tbl
@@ -499,3 +499,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
Index: linux-2.6/arch/x86/entry/syscalls/syscall_32.tbl
===================================================================
--- linux-2.6.orig/arch/x86/entry/syscalls/syscall_32.tbl
+++ linux-2.6/arch/x86/entry/syscalls/syscall_32.tbl
@@ -458,3 +458,4 @@
 451	i386	cachestat		sys_cachestat
 452	i386	fchmodat2		sys_fchmodat2
 454	i386	futex_wake		sys_futex_wake
+455	i386	futex_wait		sys_futex_wait
Index: linux-2.6/arch/x86/entry/syscalls/syscall_64.tbl
===================================================================
--- linux-2.6.orig/arch/x86/entry/syscalls/syscall_64.tbl
+++ linux-2.6/arch/x86/entry/syscalls/syscall_64.tbl
@@ -376,6 +376,7 @@
 452	common	fchmodat2		sys_fchmodat2
 453	64	map_shadow_stack	sys_map_shadow_stack
 454	common	futex_wake		sys_futex_wake
+455	common	futex_wait		sys_futex_wait
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
Index: linux-2.6/arch/xtensa/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/xtensa/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -424,3 +424,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
Index: linux-2.6/include/linux/syscalls.h
===================================================================
--- linux-2.6.orig/include/linux/syscalls.h
+++ linux-2.6/include/linux/syscalls.h
@@ -552,6 +552,10 @@ asmlinkage long sys_futex_waitv(struct f
 
 asmlinkage long sys_futex_wake(void __user *uaddr, unsigned long mask, int nr, unsigned int flags);
 
+asmlinkage long sys_futex_wait(void __user *uaddr, unsigned long val, unsigned long mask,
+			       unsigned int flags, struct __kernel_timespec __user *timespec,
+			       clockid_t clockid);
+
 asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp,
 			      struct __kernel_timespec __user *rmtp);
 asmlinkage long sys_nanosleep_time32(struct old_timespec32 __user *rqtp,
Index: linux-2.6/include/uapi/asm-generic/unistd.h
===================================================================
--- linux-2.6.orig/include/uapi/asm-generic/unistd.h
+++ linux-2.6/include/uapi/asm-generic/unistd.h
@@ -824,9 +824,11 @@ __SYSCALL(__NR_cachestat, sys_cachestat)
 __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
 #define __NR_futex_wake 454
 __SYSCALL(__NR_futex_wake, sys_futex_wake)
+#define __NR_futex_wait 455
+__SYSCALL(__NR_futex_wait, sys_futex_wait)
 
 #undef __NR_syscalls
-#define __NR_syscalls 455
+#define __NR_syscalls 456
 
 /*
  * 32 bit systems traditionally used different
Index: linux-2.6/kernel/futex/futex.h
===================================================================
--- linux-2.6.orig/kernel/futex/futex.h
+++ linux-2.6/kernel/futex/futex.h
@@ -332,6 +332,9 @@ extern int futex_requeue(u32 __user *uad
 			 u32 __user *uaddr2, int nr_wake, int nr_requeue,
 			 u32 *cmpval, int requeue_pi);
 
+extern int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
+			struct hrtimer_sleeper *to, u32 bitset);
+
 extern int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
 		      ktime_t *abs_time, u32 bitset);
 
Index: linux-2.6/kernel/futex/syscalls.c
===================================================================
--- linux-2.6.orig/kernel/futex/syscalls.c
+++ linux-2.6/kernel/futex/syscalls.c
@@ -221,6 +221,46 @@ static int futex_parse_waitv(struct fute
 	return 0;
 }
 
+static int futex2_setup_timeout(struct __kernel_timespec __user *timeout,
+				clockid_t clockid, struct hrtimer_sleeper *to)
+{
+	int flag_clkid = 0, flag_init = 0;
+	struct timespec64 ts;
+	ktime_t time;
+	int ret;
+
+	if (!timeout)
+		return 0;
+
+	if (clockid == CLOCK_REALTIME) {
+		flag_clkid = FLAGS_CLOCKRT;
+		flag_init = FUTEX_CLOCK_REALTIME;
+	}
+
+	if (clockid != CLOCK_REALTIME && clockid != CLOCK_MONOTONIC)
+		return -EINVAL;
+
+	if (get_timespec64(&ts, timeout))
+		return -EFAULT;
+
+	/*
+	 * Since there's no opcode for futex_waitv, use
+	 * FUTEX_WAIT_BITSET that uses absolute timeout as well
+	 */
+	ret = futex_init_timeout(FUTEX_WAIT_BITSET, flag_init, &ts, &time);
+	if (ret)
+		return ret;
+
+	futex_setup_timer(&time, to, flag_clkid, 0);
+	return 0;
+}
+
+static inline void futex2_destroy_timeout(struct hrtimer_sleeper *to)
+{
+	hrtimer_cancel(&to->timer);
+	destroy_hrtimer_on_stack(&to->timer);
+}
+
 /**
  * sys_futex_waitv - Wait on a list of futexes
  * @waiters:    List of futexes to wait on
@@ -250,8 +290,6 @@ SYSCALL_DEFINE5(futex_waitv, struct fute
 {
 	struct hrtimer_sleeper to;
 	struct futex_vector *futexv;
-	struct timespec64 ts;
-	ktime_t time;
 	int ret;
 
 	/* This syscall supports no flags for now */
@@ -261,30 +299,8 @@ SYSCALL_DEFINE5(futex_waitv, struct fute
 	if (!nr_futexes || nr_futexes > FUTEX_WAITV_MAX || !waiters)
 		return -EINVAL;
 
-	if (timeout) {
-		int flag_clkid = 0, flag_init = 0;
-
-		if (clockid == CLOCK_REALTIME) {
-			flag_clkid = FLAGS_CLOCKRT;
-			flag_init = FUTEX_CLOCK_REALTIME;
-		}
-
-		if (clockid != CLOCK_REALTIME && clockid != CLOCK_MONOTONIC)
-			return -EINVAL;
-
-		if (get_timespec64(&ts, timeout))
-			return -EFAULT;
-
-		/*
-		 * Since there's no opcode for futex_waitv, use
-		 * FUTEX_WAIT_BITSET that uses absolute timeout as well
-		 */
-		ret = futex_init_timeout(FUTEX_WAIT_BITSET, flag_init, &ts, &time);
-		if (ret)
-			return ret;
-
-		futex_setup_timer(&time, &to, flag_clkid, 0);
-	}
+	if (timeout && (ret = futex2_setup_timeout(timeout, clockid, &to)))
+		return ret;
 
 	futexv = kcalloc(nr_futexes, sizeof(*futexv), GFP_KERNEL);
 	if (!futexv) {
@@ -299,10 +315,8 @@ SYSCALL_DEFINE5(futex_waitv, struct fute
 	kfree(futexv);
 
 destroy_timer:
-	if (timeout) {
-		hrtimer_cancel(&to.timer);
-		destroy_hrtimer_on_stack(&to.timer);
-	}
+	if (timeout)
+		futex2_destroy_timeout(&to);
 	return ret;
 }
 
@@ -336,6 +350,52 @@ SYSCALL_DEFINE4(futex_wake,
 	return futex_wake(uaddr, FLAGS_STRICT | flags, nr, mask);
 }
 
+/*
+ * sys_futex_wait - Wait on a futex
+ * @uaddr:	Address of the futex to wait on
+ * @val:	Value of @uaddr
+ * @mask:	bitmask
+ * @flags:	FUTEX2 flags
+ * @timeout:	Optional absolute timeout
+ * @clockid:	Clock to be used for the timeout, realtime or monotonic
+ *
+ * Identical to the traditional FUTEX_WAIT_BITSET op, except it is part of the
+ * futex2 familiy of calls.
+ */
+
+SYSCALL_DEFINE6(futex_wait,
+		void __user *, uaddr,
+		unsigned long, val,
+		unsigned long, mask,
+		unsigned int, flags,
+		struct __kernel_timespec __user *, timeout,
+		clockid_t, clockid)
+{
+	struct hrtimer_sleeper to;
+	int ret;
+
+	if (flags & ~FUTEX2_VALID_MASK)
+		return -EINVAL;
+
+	flags = futex2_to_flags(flags);
+	if (!futex_flags_valid(flags))
+		return -EINVAL;
+
+	if (!futex_validate_input(flags, val) ||
+	    !futex_validate_input(flags, mask))
+		return -EINVAL;
+
+	if (timeout && (ret = futex2_setup_timeout(timeout, clockid, &to)))
+		return ret;
+
+	ret = __futex_wait(uaddr, flags, val, timeout ? &to : NULL, mask);
+
+	if (timeout)
+		futex2_destroy_timeout(&to);
+
+	return ret;
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE2(set_robust_list,
 		struct compat_robust_list_head __user *, head,
Index: linux-2.6/kernel/futex/waitwake.c
===================================================================
--- linux-2.6.orig/kernel/futex/waitwake.c
+++ linux-2.6/kernel/futex/waitwake.c
@@ -632,20 +632,18 @@ retry_private:
 	return ret;
 }
 
-int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, ktime_t *abs_time, u32 bitset)
+int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
+		 struct hrtimer_sleeper *to, u32 bitset)
 {
-	struct hrtimer_sleeper timeout, *to;
-	struct restart_block *restart;
-	struct futex_hash_bucket *hb;
 	struct futex_q q = futex_q_init;
+	struct futex_hash_bucket *hb;
 	int ret;
 
 	if (!bitset)
 		return -EINVAL;
+
 	q.bitset = bitset;
 
-	to = futex_setup_timer(abs_time, &timeout, flags,
-			       current->timer_slack_ns);
 retry:
 	/*
 	 * Prepare to wait on uaddr. On success, it holds hb->lock and q
@@ -653,18 +651,17 @@ retry:
 	 */
 	ret = futex_wait_setup(uaddr, val, flags, &q, &hb);
 	if (ret)
-		goto out;
+		return ret;
 
 	/* futex_queue and wait for wakeup, timeout, or a signal. */
 	futex_wait_queue(hb, &q, to);
 
 	/* If we were woken (and unqueued), we succeeded, whatever. */
-	ret = 0;
 	if (!futex_unqueue(&q))
-		goto out;
-	ret = -ETIMEDOUT;
+		return 0;
+
 	if (to && !to->task)
-		goto out;
+		return -ETIMEDOUT;
 
 	/*
 	 * We expect signal_pending(current), but we might be the
@@ -673,24 +670,38 @@ retry:
 	if (!signal_pending(current))
 		goto retry;
 
-	ret = -ERESTARTSYS;
-	if (!abs_time)
-		goto out;
-
-	restart = &current->restart_block;
-	restart->futex.uaddr = uaddr;
-	restart->futex.val = val;
-	restart->futex.time = *abs_time;
-	restart->futex.bitset = bitset;
-	restart->futex.flags = flags | FLAGS_HAS_TIMEOUT;
-
-	ret = set_restart_fn(restart, futex_wait_restart);
-
-out:
-	if (to) {
-		hrtimer_cancel(&to->timer);
-		destroy_hrtimer_on_stack(&to->timer);
+	return -ERESTARTSYS;
+}
+
+int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, ktime_t *abs_time, u32 bitset)
+{
+	struct hrtimer_sleeper timeout, *to;
+	struct restart_block *restart;
+	int ret;
+
+	to = futex_setup_timer(abs_time, &timeout, flags,
+			       current->timer_slack_ns);
+
+	ret = __futex_wait(uaddr, flags, val, to, bitset);
+
+	/* No timeout, nothing to clean up. */
+	if (!to)
+		return ret;
+
+	hrtimer_cancel(&to->timer);
+	destroy_hrtimer_on_stack(&to->timer);
+
+	if (ret == -ERESTARTSYS) {
+		restart = &current->restart_block;
+		restart->futex.uaddr = uaddr;
+		restart->futex.val = val;
+		restart->futex.time = *abs_time;
+		restart->futex.bitset = bitset;
+		restart->futex.flags = flags | FLAGS_HAS_TIMEOUT;
+
+		return set_restart_fn(restart, futex_wait_restart);
 	}
+
 	return ret;
 }
 
Index: linux-2.6/kernel/sys_ni.c
===================================================================
--- linux-2.6.orig/kernel/sys_ni.c
+++ linux-2.6/kernel/sys_ni.c
@@ -88,6 +88,7 @@ COND_SYSCALL(get_robust_list);
 COND_SYSCALL_COMPAT(get_robust_list);
 COND_SYSCALL(futex_waitv);
 COND_SYSCALL(futex_wake);
+COND_SYSCALL(futex_wait);
 COND_SYSCALL(kexec_load);
 COND_SYSCALL_COMPAT(kexec_load);
 COND_SYSCALL(init_module);



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 08/15] futex: Propagate flags into get_futex_key()
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
                   ` (6 preceding siblings ...)
  2023-09-21 10:45 ` [PATCH v3 07/15] futex: Add sys_futex_wait() peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
  2023-09-21 10:45 ` [PATCH v3 09/15] futex: Add flags2 argument to futex_requeue() peterz
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

[-- Attachment #1: peterz-futex2-get_futex_key-flags.patch --]
[-- Type: text/plain, Size: 5603 bytes --]

Instead of only passing FLAGS_SHARED as a boolean, pass down flags as
a whole.

No functional change intended.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex/core.c     |    5 ++++-
 kernel/futex/futex.h    |    2 +-
 kernel/futex/pi.c       |    4 ++--
 kernel/futex/requeue.c  |    6 +++---
 kernel/futex/waitwake.c |   14 +++++++-------
 5 files changed, 17 insertions(+), 14 deletions(-)

Index: linux-2.6/kernel/futex/core.c
===================================================================
--- linux-2.6.orig/kernel/futex/core.c
+++ linux-2.6/kernel/futex/core.c
@@ -193,7 +193,7 @@ static u64 get_inode_sequence_number(str
 /**
  * get_futex_key() - Get parameters which are the keys for a futex
  * @uaddr:	virtual address of the futex
- * @fshared:	false for a PROCESS_PRIVATE futex, true for PROCESS_SHARED
+ * @flags:	FLAGS_*
  * @key:	address where result is stored.
  * @rw:		mapping needs to be read/write (values: FUTEX_READ,
  *              FUTEX_WRITE)
@@ -217,7 +217,7 @@ static u64 get_inode_sequence_number(str
  *
  * lock_page() might sleep, the caller should not hold a spinlock.
  */
-int get_futex_key(u32 __user *uaddr, bool fshared, union futex_key *key,
+int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *key,
 		  enum futex_access rw)
 {
 	unsigned long address = (unsigned long)uaddr;
@@ -226,6 +226,9 @@ int get_futex_key(u32 __user *uaddr, boo
 	struct folio *folio;
 	struct address_space *mapping;
 	int err, ro = 0;
+	bool fshared;
+
+	fshared = flags & FLAGS_SHARED;
 
 	/*
 	 * The futex address must be "naturally" aligned.
Index: linux-2.6/kernel/futex/futex.h
===================================================================
--- linux-2.6.orig/kernel/futex/futex.h
+++ linux-2.6/kernel/futex/futex.h
@@ -184,7 +184,7 @@ enum futex_access {
 	FUTEX_WRITE
 };
 
-extern int get_futex_key(u32 __user *uaddr, bool fshared, union futex_key *key,
+extern int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *key,
 			 enum futex_access rw);
 
 extern struct hrtimer_sleeper *
Index: linux-2.6/kernel/futex/pi.c
===================================================================
--- linux-2.6.orig/kernel/futex/pi.c
+++ linux-2.6/kernel/futex/pi.c
@@ -933,7 +933,7 @@ int futex_lock_pi(u32 __user *uaddr, uns
 	to = futex_setup_timer(time, &timeout, flags, 0);
 
 retry:
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key, FUTEX_WRITE);
+	ret = get_futex_key(uaddr, flags, &q.key, FUTEX_WRITE);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -1129,7 +1129,7 @@ retry:
 	if ((uval & FUTEX_TID_MASK) != vpid)
 		return -EPERM;
 
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, FUTEX_WRITE);
+	ret = get_futex_key(uaddr, flags, &key, FUTEX_WRITE);
 	if (ret)
 		return ret;
 
Index: linux-2.6/kernel/futex/requeue.c
===================================================================
--- linux-2.6.orig/kernel/futex/requeue.c
+++ linux-2.6/kernel/futex/requeue.c
@@ -424,10 +424,10 @@ int futex_requeue(u32 __user *uaddr1, un
 	}
 
 retry:
-	ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, FUTEX_READ);
+	ret = get_futex_key(uaddr1, flags, &key1, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
-	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2,
+	ret = get_futex_key(uaddr2, flags, &key2,
 			    requeue_pi ? FUTEX_WRITE : FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
@@ -789,7 +789,7 @@ int futex_wait_requeue_pi(u32 __user *ua
 	 */
 	rt_mutex_init_waiter(&rt_waiter);
 
-	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, FUTEX_WRITE);
+	ret = get_futex_key(uaddr2, flags, &key2, FUTEX_WRITE);
 	if (unlikely(ret != 0))
 		goto out;
 
Index: linux-2.6/kernel/futex/waitwake.c
===================================================================
--- linux-2.6.orig/kernel/futex/waitwake.c
+++ linux-2.6/kernel/futex/waitwake.c
@@ -145,13 +145,13 @@ int futex_wake(u32 __user *uaddr, unsign
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
 	union futex_key key = FUTEX_KEY_INIT;
-	int ret;
 	DEFINE_WAKE_Q(wake_q);
+	int ret;
 
 	if (!bitset)
 		return -EINVAL;
 
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, FUTEX_READ);
+	ret = get_futex_key(uaddr, flags, &key, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
 
@@ -248,10 +248,10 @@ int futex_wake_op(u32 __user *uaddr1, un
 	DEFINE_WAKE_Q(wake_q);
 
 retry:
-	ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, FUTEX_READ);
+	ret = get_futex_key(uaddr1, flags, &key1, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
-	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, FUTEX_WRITE);
+	ret = get_futex_key(uaddr2, flags, &key2, FUTEX_WRITE);
 	if (unlikely(ret != 0))
 		return ret;
 
@@ -426,7 +426,7 @@ retry:
 			continue;
 
 		ret = get_futex_key(u64_to_user_ptr(vs[i].w.uaddr),
-				    vs[i].w.flags & FLAGS_SHARED,
+				    vs[i].w.flags,
 				    &vs[i].q.key, FUTEX_READ);
 
 		if (unlikely(ret))
@@ -438,7 +438,7 @@ retry:
 	for (i = 0; i < count; i++) {
 		u32 __user *uaddr = (u32 __user *)(unsigned long)vs[i].w.uaddr;
 		struct futex_q *q = &vs[i].q;
-		u32 val = (u32)vs[i].w.val;
+		u32 val = vs[i].w.val;
 
 		hb = futex_q_lock(q);
 		ret = futex_get_value_locked(&uval, uaddr);
@@ -602,7 +602,7 @@ int futex_wait_setup(u32 __user *uaddr,
 	 * while the syscall executes.
 	 */
 retry:
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q->key, FUTEX_READ);
+	ret = get_futex_key(uaddr, flags, &q->key, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 09/15] futex: Add flags2 argument to futex_requeue()
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
                   ` (7 preceding siblings ...)
  2023-09-21 10:45 ` [PATCH v3 08/15] futex: Propagate flags into get_futex_key() peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
  2023-09-21 10:45 ` [PATCH v3 10/15] futex: Add sys_futex_requeue() peterz
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

[-- Attachment #1: peterz-futex2-requeue-flags.patch --]
[-- Type: text/plain, Size: 3988 bytes --]

In order to support mixed size requeue, add a second flags argument to
the internal futex_requeue() function.

No functional change intended.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex/futex.h    |    5 +++--
 kernel/futex/requeue.c  |   12 +++++++-----
 kernel/futex/syscalls.c |    6 +++---
 3 files changed, 13 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/futex/futex.h
===================================================================
--- linux-2.6.orig/kernel/futex/futex.h
+++ linux-2.6/kernel/futex/futex.h
@@ -328,8 +328,9 @@ extern int futex_wait_requeue_pi(u32 __u
 				 val, ktime_t *abs_time, u32 bitset, u32 __user
 				 *uaddr2);
 
-extern int futex_requeue(u32 __user *uaddr1, unsigned int flags,
-			 u32 __user *uaddr2, int nr_wake, int nr_requeue,
+extern int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
+			 u32 __user *uaddr2, unsigned int flags2,
+			 int nr_wake, int nr_requeue,
 			 u32 *cmpval, int requeue_pi);
 
 extern int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
Index: linux-2.6/kernel/futex/requeue.c
===================================================================
--- linux-2.6.orig/kernel/futex/requeue.c
+++ linux-2.6/kernel/futex/requeue.c
@@ -346,8 +346,9 @@ futex_proxy_trylock_atomic(u32 __user *p
 /**
  * futex_requeue() - Requeue waiters from uaddr1 to uaddr2
  * @uaddr1:	source futex user address
- * @flags:	futex flags (FLAGS_SHARED, etc.)
+ * @flags1:	futex flags (FLAGS_SHARED, etc.)
  * @uaddr2:	target futex user address
+ * @flags2:	futex flags (FLAGS_SHARED, etc.)
  * @nr_wake:	number of waiters to wake (must be 1 for requeue_pi)
  * @nr_requeue:	number of waiters to requeue (0-INT_MAX)
  * @cmpval:	@uaddr1 expected value (or %NULL)
@@ -361,7 +362,8 @@ futex_proxy_trylock_atomic(u32 __user *p
  *  - >=0 - on success, the number of tasks requeued or woken;
  *  -  <0 - on error
  */
-int futex_requeue(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
+int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
+		  u32 __user *uaddr2, unsigned int flags2,
 		  int nr_wake, int nr_requeue, u32 *cmpval, int requeue_pi)
 {
 	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
@@ -424,10 +426,10 @@ int futex_requeue(u32 __user *uaddr1, un
 	}
 
 retry:
-	ret = get_futex_key(uaddr1, flags, &key1, FUTEX_READ);
+	ret = get_futex_key(uaddr1, flags1, &key1, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
-	ret = get_futex_key(uaddr2, flags, &key2,
+	ret = get_futex_key(uaddr2, flags2, &key2,
 			    requeue_pi ? FUTEX_WRITE : FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
@@ -459,7 +461,7 @@ retry_private:
 			if (ret)
 				return ret;
 
-			if (!(flags & FLAGS_SHARED))
+			if (!(flags1 & FLAGS_SHARED))
 				goto retry_private;
 
 			goto retry;
Index: linux-2.6/kernel/futex/syscalls.c
===================================================================
--- linux-2.6.orig/kernel/futex/syscalls.c
+++ linux-2.6/kernel/futex/syscalls.c
@@ -106,9 +106,9 @@ long do_futex(u32 __user *uaddr, int op,
 	case FUTEX_WAKE_BITSET:
 		return futex_wake(uaddr, flags, val, val3);
 	case FUTEX_REQUEUE:
-		return futex_requeue(uaddr, flags, uaddr2, val, val2, NULL, 0);
+		return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, NULL, 0);
 	case FUTEX_CMP_REQUEUE:
-		return futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 0);
+		return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, &val3, 0);
 	case FUTEX_WAKE_OP:
 		return futex_wake_op(uaddr, flags, uaddr2, val, val2, val3);
 	case FUTEX_LOCK_PI:
@@ -125,7 +125,7 @@ long do_futex(u32 __user *uaddr, int op,
 		return futex_wait_requeue_pi(uaddr, flags, val, timeout, val3,
 					     uaddr2);
 	case FUTEX_CMP_REQUEUE_PI:
-		return futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
+		return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, &val3, 1);
 	}
 	return -ENOSYS;
 }



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 10/15] futex: Add sys_futex_requeue()
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
                   ` (8 preceding siblings ...)
  2023-09-21 10:45 ` [PATCH v3 09/15] futex: Add flags2 argument to futex_requeue() peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 17:35   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
  2023-09-22  9:35   ` [PATCH v3 10/15] " Ingo Molnar
  2023-09-21 10:45 ` [PATCH v3 11/15] mm: Add vmalloc_huge_node() peterz
                   ` (5 subsequent siblings)
  15 siblings, 2 replies; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke, Geert Uytterhoeven

[-- Attachment #1: peterz-futex2-requeue.patch --]
[-- Type: text/plain, Size: 12808 bytes --]

Finish off the 'simple' futex2 syscall group by adding
sys_futex_requeue(). Unlike sys_futex_{wait,wake}() its arguments are
too numerous to fit into a regular syscall. As such, use struct
futex_waitv to pass the 'source' and 'destination' futexes to the
syscall.

This syscall implements what was previously known as FUTEX_CMP_REQUEUE
and uses {val, uaddr, flags} for source and {uaddr, flags} for
destination.

This design explicitly allows requeueing between different types of
futex by having a different flags word per uaddr.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
 arch/alpha/kernel/syscalls/syscall.tbl      |    1 
 arch/arm/tools/syscall.tbl                  |    1 
 arch/arm64/include/asm/unistd.h             |    2 -
 arch/arm64/include/asm/unistd32.h           |    2 +
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 
 arch/s390/kernel/syscalls/syscall.tbl       |    1 
 arch/sh/kernel/syscalls/syscall.tbl         |    1 
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 
 include/linux/syscalls.h                    |    3 ++
 include/uapi/asm-generic/unistd.h           |    4 ++
 kernel/futex/syscalls.c                     |   38 ++++++++++++++++++++++++++++
 kernel/sys_ni.c                             |    1 
 22 files changed, 64 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/alpha/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/alpha/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/alpha/kernel/syscalls/syscall.tbl
@@ -494,3 +494,4 @@
 562	common	fchmodat2			sys_fchmodat2
 563	common	futex_wake			sys_futex_wake
 564	common	futex_wait			sys_futex_wait
+565	common	futex_requeue			sys_futex_requeue
Index: linux-2.6/arch/arm/tools/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/arm/tools/syscall.tbl
+++ linux-2.6/arch/arm/tools/syscall.tbl
@@ -468,3 +468,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
Index: linux-2.6/arch/arm64/include/asm/unistd.h
===================================================================
--- linux-2.6.orig/arch/arm64/include/asm/unistd.h
+++ linux-2.6/arch/arm64/include/asm/unistd.h
@@ -39,7 +39,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		456
+#define __NR_compat_syscalls		457
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
Index: linux-2.6/arch/arm64/include/asm/unistd32.h
===================================================================
--- linux-2.6.orig/arch/arm64/include/asm/unistd32.h
+++ linux-2.6/arch/arm64/include/asm/unistd32.h
@@ -915,6 +915,8 @@ __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
 __SYSCALL(__NR_futex_wake, sys_futex_wake)
 #define __NR_futex_wait 455
 __SYSCALL(__NR_futex_wait, sys_futex_wait)
+#define __NR_futex_requeue 456
+__SYSCALL(__NR_futex_requeue, sys_futex_requeue)
 
 /*
  * Please add new compat syscalls above this comment and update
Index: linux-2.6/arch/ia64/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/ia64/kernel/syscalls/syscall.tbl
@@ -375,3 +375,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
Index: linux-2.6/arch/m68k/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/m68k/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/m68k/kernel/syscalls/syscall.tbl
@@ -454,3 +454,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
Index: linux-2.6/arch/microblaze/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/microblaze/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -460,3 +460,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
Index: linux-2.6/arch/mips/kernel/syscalls/syscall_n32.tbl
===================================================================
--- linux-2.6.orig/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ linux-2.6/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -393,3 +393,4 @@
 452	n32	fchmodat2			sys_fchmodat2
 454	n32	futex_wake			sys_futex_wake
 455	n32	futex_wait			sys_futex_wait
+456	n32	futex_requeue			sys_futex_requeue
Index: linux-2.6/arch/mips/kernel/syscalls/syscall_n64.tbl
===================================================================
--- linux-2.6.orig/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ linux-2.6/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -369,3 +369,4 @@
 452	n64	fchmodat2			sys_fchmodat2
 454	n64	futex_wake			sys_futex_wake
 455	n64	futex_wait			sys_futex_wait
+456	n64	futex_requeue			sys_futex_requeue
Index: linux-2.6/arch/mips/kernel/syscalls/syscall_o32.tbl
===================================================================
--- linux-2.6.orig/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ linux-2.6/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -442,3 +442,4 @@
 452	o32	fchmodat2			sys_fchmodat2
 454	o32	futex_wake			sys_futex_wake
 455	o32	futex_wait			sys_futex_wait
+456	o32	futex_requeue			sys_futex_requeue
Index: linux-2.6/arch/parisc/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/parisc/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/parisc/kernel/syscalls/syscall.tbl
@@ -453,3 +453,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
Index: linux-2.6/arch/powerpc/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -541,3 +541,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
Index: linux-2.6/arch/s390/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/s390/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/s390/kernel/syscalls/syscall.tbl
@@ -457,3 +457,4 @@
 452  common	fchmodat2		sys_fchmodat2			sys_fchmodat2
 454  common	futex_wake		sys_futex_wake			sys_futex_wake
 455  common	futex_wait		sys_futex_wait			sys_futex_wait
+456  common	futex_requeue		sys_futex_requeue			sys_futex_requeue
Index: linux-2.6/arch/sh/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/sh/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/sh/kernel/syscalls/syscall.tbl
@@ -457,3 +457,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
Index: linux-2.6/arch/sparc/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/sparc/kernel/syscalls/syscall.tbl
@@ -500,3 +500,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
Index: linux-2.6/arch/x86/entry/syscalls/syscall_32.tbl
===================================================================
--- linux-2.6.orig/arch/x86/entry/syscalls/syscall_32.tbl
+++ linux-2.6/arch/x86/entry/syscalls/syscall_32.tbl
@@ -459,3 +459,4 @@
 452	i386	fchmodat2		sys_fchmodat2
 454	i386	futex_wake		sys_futex_wake
 455	i386	futex_wait		sys_futex_wait
+456	i386	futex_requeue		sys_futex_requeue
Index: linux-2.6/arch/x86/entry/syscalls/syscall_64.tbl
===================================================================
--- linux-2.6.orig/arch/x86/entry/syscalls/syscall_64.tbl
+++ linux-2.6/arch/x86/entry/syscalls/syscall_64.tbl
@@ -377,6 +377,7 @@
 453	64	map_shadow_stack	sys_map_shadow_stack
 454	common	futex_wake		sys_futex_wake
 455	common	futex_wait		sys_futex_wait
+456	common	futex_requeue		sys_futex_requeue
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
Index: linux-2.6/arch/xtensa/kernel/syscalls/syscall.tbl
===================================================================
--- linux-2.6.orig/arch/xtensa/kernel/syscalls/syscall.tbl
+++ linux-2.6/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -425,3 +425,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
Index: linux-2.6/include/linux/syscalls.h
===================================================================
--- linux-2.6.orig/include/linux/syscalls.h
+++ linux-2.6/include/linux/syscalls.h
@@ -556,6 +556,9 @@ asmlinkage long sys_futex_wait(void __us
 			       unsigned int flags, struct __kernel_timespec __user *timespec,
 			       clockid_t clockid);
 
+asmlinkage long sys_futex_requeue(struct futex_waitv __user *waiters,
+				  unsigned int flags, int nr_wake, int nr_requeue);
+
 asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp,
 			      struct __kernel_timespec __user *rmtp);
 asmlinkage long sys_nanosleep_time32(struct old_timespec32 __user *rqtp,
Index: linux-2.6/include/uapi/asm-generic/unistd.h
===================================================================
--- linux-2.6.orig/include/uapi/asm-generic/unistd.h
+++ linux-2.6/include/uapi/asm-generic/unistd.h
@@ -826,9 +826,11 @@ __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
 __SYSCALL(__NR_futex_wake, sys_futex_wake)
 #define __NR_futex_wait 455
 __SYSCALL(__NR_futex_wait, sys_futex_wait)
+#define __NR_futex_requeue 456
+__SYSCALL(__NR_futex_requeue, sys_futex_requeue)
 
 #undef __NR_syscalls
-#define __NR_syscalls 456
+#define __NR_syscalls 457
 
 /*
  * 32 bit systems traditionally used different
Index: linux-2.6/kernel/futex/syscalls.c
===================================================================
--- linux-2.6.orig/kernel/futex/syscalls.c
+++ linux-2.6/kernel/futex/syscalls.c
@@ -396,6 +396,44 @@ SYSCALL_DEFINE6(futex_wait,
 	return ret;
 }
 
+/*
+ * sys_futex_requeue - Requeue a waiter from one futex to another
+ * @waiters:	array describing the source and destination futex
+ * @flags:	unused
+ * @nr_wake:	number of futexes to wake
+ * @nr_requeue:	number of futexes to requeue
+ *
+ * Identical to the traditional FUTEX_CMP_REQUEUE op, except it is part of the
+ * futex2 family of calls.
+ */
+
+SYSCALL_DEFINE4(futex_requeue,
+		struct futex_waitv __user *, waiters,
+		unsigned int, flags,
+		int, nr_wake,
+		int, nr_requeue)
+{
+	struct futex_vector futexes[2];
+	u32 cmpval;
+	int ret;
+
+	if (flags)
+		return -EINVAL;
+
+	if (!waiters)
+		return -EINVAL;
+
+	ret = futex_parse_waitv(futexes, waiters, 2);
+	if (ret)
+		return ret;
+
+	cmpval = futexes[0].w.val;
+
+	return futex_requeue(u64_to_user_ptr(futexes[0].w.uaddr), futexes[0].w.flags,
+			     u64_to_user_ptr(futexes[1].w.uaddr), futexes[1].w.flags,
+			     nr_wake, nr_requeue, &cmpval, 0);
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE2(set_robust_list,
 		struct compat_robust_list_head __user *, head,
Index: linux-2.6/kernel/sys_ni.c
===================================================================
--- linux-2.6.orig/kernel/sys_ni.c
+++ linux-2.6/kernel/sys_ni.c
@@ -89,6 +89,7 @@ COND_SYSCALL_COMPAT(get_robust_list);
 COND_SYSCALL(futex_waitv);
 COND_SYSCALL(futex_wake);
 COND_SYSCALL(futex_wait);
+COND_SYSCALL(futex_requeue);
 COND_SYSCALL(kexec_load);
 COND_SYSCALL_COMPAT(kexec_load);
 COND_SYSCALL(init_module);



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 11/15] mm: Add vmalloc_huge_node()
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
                   ` (9 preceding siblings ...)
  2023-09-21 10:45 ` [PATCH v3 10/15] futex: Add sys_futex_requeue() peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 10:45 ` [PATCH v3 12/15] futex: Implement FUTEX2_NUMA peterz
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke, Christoph Hellwig

[-- Attachment #1: peterz-vmalloc_huge_node.patch --]
[-- Type: text/plain, Size: 1556 bytes --]

To enable node specific hash-tables.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/vmalloc.h |    1 +
 mm/vmalloc.c            |    7 +++++++
 2 files changed, 8 insertions(+)

Index: linux-2.6/include/linux/vmalloc.h
===================================================================
--- linux-2.6.orig/include/linux/vmalloc.h
+++ linux-2.6/include/linux/vmalloc.h
@@ -152,6 +152,7 @@ extern void *__vmalloc_node_range(unsign
 void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
 		int node, const void *caller) __alloc_size(1);
 void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+void *vmalloc_huge_node(unsigned long size, gfp_t gfp_mask, int node) __alloc_size(1);
 
 extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
 extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
Index: linux-2.6/mm/vmalloc.c
===================================================================
--- linux-2.6.orig/mm/vmalloc.c
+++ linux-2.6/mm/vmalloc.c
@@ -3420,6 +3420,13 @@ void *vmalloc(unsigned long size)
 }
 EXPORT_SYMBOL(vmalloc);
 
+void *vmalloc_huge_node(unsigned long size, gfp_t gfp_mask, int node)
+{
+	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+				    gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+				    node, __builtin_return_address(0));
+}
+
 /**
  * vmalloc_huge - allocate virtually contiguous memory, allow huge pages
  * @size:      allocation size



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 12/15] futex: Implement FUTEX2_NUMA
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
                   ` (10 preceding siblings ...)
  2023-09-21 10:45 ` [PATCH v3 11/15] mm: Add vmalloc_huge_node() peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 10:45 ` [PATCH v3 13/15] futex: Propagate flags into futex_get_value_locked() peterz
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

[-- Attachment #1: peterz-futex2-numa.patch --]
[-- Type: text/plain, Size: 9743 bytes --]

Extend the futex2 interface to be numa aware.

When FUTEX2_NUMA is specified for a futex, the user value is extended
to two words (of the same size). The first is the user value we all
know, the second one will be the node to place this futex on.

  struct futex_numa_32 {
	u32 val;
	u32 node;
  };

When node is set to ~0, WAIT will set it to the current node_id such
that WAKE knows where to find it. If userspace corrupts the node value
between WAIT and WAKE, the futex will not be found and no wakeup will
happen.

When FUTEX2_NUMA is not set, the node is simply an extention of the
hash, such that traditional futexes are still interleaved over the
nodes.

This is done to avoid having to have a separate !numa hash-table.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/futex.h   |    3 +
 kernel/futex/core.c     |  129 +++++++++++++++++++++++++++++++++++++++---------
 kernel/futex/futex.h    |   25 +++++++--
 kernel/futex/syscalls.c |    2 
 4 files changed, 128 insertions(+), 31 deletions(-)

Index: linux-2.6/include/linux/futex.h
===================================================================
--- linux-2.6.orig/include/linux/futex.h
+++ linux-2.6/include/linux/futex.h
@@ -34,6 +34,7 @@ union futex_key {
 		u64 i_seq;
 		unsigned long pgoff;
 		unsigned int offset;
+		/* unsigned int node; */
 	} shared;
 	struct {
 		union {
@@ -42,11 +43,13 @@ union futex_key {
 		};
 		unsigned long address;
 		unsigned int offset;
+		/* unsigned int node; */
 	} private;
 	struct {
 		u64 ptr;
 		unsigned long word;
 		unsigned int offset;
+		unsigned int node;	/* NOT hashed! */
 	} both;
 };
 
Index: linux-2.6/kernel/futex/core.c
===================================================================
--- linux-2.6.orig/kernel/futex/core.c
+++ linux-2.6/kernel/futex/core.c
@@ -34,7 +34,8 @@
 #include <linux/compat.h>
 #include <linux/jhash.h>
 #include <linux/pagemap.h>
-#include <linux/memblock.h>
+#include <linux/gfp.h>
+#include <linux/vmalloc.h>
 #include <linux/fault-inject.h>
 #include <linux/slab.h>
 
@@ -47,12 +48,14 @@
  * reside in the same cacheline.
  */
 static struct {
-	struct futex_hash_bucket *queues;
 	unsigned long            hashsize;
+	unsigned int		 hashshift;
+	struct futex_hash_bucket *queues[MAX_NUMNODES];
 } __futex_data __read_mostly __aligned(2*sizeof(long));
-#define futex_queues   (__futex_data.queues)
-#define futex_hashsize (__futex_data.hashsize)
 
+#define futex_hashsize	(__futex_data.hashsize)
+#define futex_hashshift	(__futex_data.hashshift)
+#define futex_queues	(__futex_data.queues)
 
 /*
  * Fault injections for futexes.
@@ -105,6 +108,26 @@ late_initcall(fail_futex_debugfs);
 
 #endif /* CONFIG_FAIL_FUTEX */
 
+static int futex_get_value(u32 *val, u32 __user *from, unsigned int flags)
+{
+	switch (futex_size(flags)) {
+	case 1: return __get_user(*val, (u8 __user *)from);
+	case 2: return __get_user(*val, (u16 __user *)from);
+	case 4: return __get_user(*val, (u32 __user *)from);
+	default: BUG();
+	}
+}
+
+static int futex_put_value(u32 val, u32 __user *to, unsigned int flags)
+{
+	switch (futex_size(flags)) {
+	case 1: return __put_user(val, (u8 __user *)to);
+	case 2: return __put_user(val, (u16 __user *)to);
+	case 4: return __put_user(val, (u32 __user *)to);
+	default: BUG();
+	}
+}
+
 /**
  * futex_hash - Return the hash bucket in the global hash
  * @key:	Pointer to the futex key for which the hash is calculated
@@ -114,10 +137,29 @@ late_initcall(fail_futex_debugfs);
  */
 struct futex_hash_bucket *futex_hash(union futex_key *key)
 {
-	u32 hash = jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4,
+	u32 hash = jhash2((u32 *)key,
+			  offsetof(typeof(*key), both.offset) / sizeof(u32),
 			  key->both.offset);
+	int node = key->both.node;
+
+	if (node == FUTEX_NO_NODE) {
+		/*
+		 * In case of !FLAGS_NUMA, use some unused hash bits to pick a
+		 * node -- this ensures regular futexes are interleaved across
+		 * the nodes and avoids having to allocate multiple
+		 * hash-tables.
+		 *
+		 * NOTE: this isn't perfectly uniform, but it is fast and
+		 * handles sparse node masks.
+		 */
+		node = (hash >> futex_hashshift) % nr_node_ids;
+		if (!node_possible(node)) {
+			node = find_next_bit_wrap(node_possible_map.bits,
+						  nr_node_ids, node);
+		}
+	}
 
-	return &futex_queues[hash & (futex_hashsize - 1)];
+	return &futex_queues[node][hash & (futex_hashsize - 1)];
 }
 
 
@@ -217,7 +259,7 @@ static u64 get_inode_sequence_number(str
  *
  * lock_page() might sleep, the caller should not hold a spinlock.
  */
-int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *key,
+int get_futex_key(void __user *uaddr, unsigned int flags, union futex_key *key,
 		  enum futex_access rw)
 {
 	unsigned long address = (unsigned long)uaddr;
@@ -225,25 +267,49 @@ int get_futex_key(u32 __user *uaddr, uns
 	struct page *page;
 	struct folio *folio;
 	struct address_space *mapping;
-	int err, ro = 0;
+	int node, err, size, ro = 0;
 	bool fshared;
 
 	fshared = flags & FLAGS_SHARED;
+	size = futex_size(flags);
+	if (flags & FLAGS_NUMA)
+		size *= 2;
 
 	/*
 	 * The futex address must be "naturally" aligned.
 	 */
 	key->both.offset = address % PAGE_SIZE;
-	if (unlikely((address % sizeof(u32)) != 0))
+	if (unlikely((address % size) != 0))
 		return -EINVAL;
 	address -= key->both.offset;
 
-	if (unlikely(!access_ok(uaddr, sizeof(u32))))
+	if (unlikely(!access_ok(uaddr, size)))
 		return -EFAULT;
 
 	if (unlikely(should_fail_futex(fshared)))
 		return -EFAULT;
 
+	if (flags & FLAGS_NUMA) {
+		void __user *naddr = uaddr + size / 2;
+
+		if (futex_get_value(&node, naddr, flags))
+			return -EFAULT;
+
+		if (node == FUTEX_NO_NODE) {
+			node = numa_node_id();
+			if (futex_put_value(node, naddr, flags))
+				return -EFAULT;
+
+		} else if (node >= MAX_NUMNODES || !node_possible(node)) {
+			return -EINVAL;
+		}
+
+		key->both.node = node;
+
+	} else {
+		key->both.node = FUTEX_NO_NODE;
+	}
+
 	/*
 	 * PROCESS_PRIVATE futexes are fast.
 	 * As the mm cannot disappear under us and the 'key' only needs
@@ -1124,26 +1190,42 @@ void futex_exit_release(struct task_stru
 
 static int __init futex_init(void)
 {
-	unsigned int futex_shift;
-	unsigned long i;
+	unsigned int order, n;
+	unsigned long size, i;
 
 #if CONFIG_BASE_SMALL
 	futex_hashsize = 16;
 #else
-	futex_hashsize = roundup_pow_of_two(256 * num_possible_cpus());
+	futex_hashsize = 256 * num_possible_cpus();
+	futex_hashsize /= num_possible_nodes();
+	futex_hashsize = roundup_pow_of_two(futex_hashsize);
 #endif
+	futex_hashshift = ilog2(futex_hashsize);
+	size = sizeof(struct futex_hash_bucket) * futex_hashsize;
+	order = get_order(size);
+
+	for_each_node(n) {
+		struct futex_hash_bucket *table;
+
+		if (order > MAX_ORDER)
+			table = vmalloc_huge_node(size, GFP_KERNEL, n);
+		else
+			table = alloc_pages_exact_nid(n, size, GFP_KERNEL);
+
+		BUG_ON(!table);
+
+		for (i = 0; i < futex_hashsize; i++) {
+			atomic_set(&table[i].waiters, 0);
+			spin_lock_init(&table[i].lock);
+			plist_head_init(&table[i].chain);
+		}
 
-	futex_queues = alloc_large_system_hash("futex", sizeof(*futex_queues),
-					       futex_hashsize, 0, 0,
-					       &futex_shift, NULL,
-					       futex_hashsize, futex_hashsize);
-	futex_hashsize = 1UL << futex_shift;
-
-	for (i = 0; i < futex_hashsize; i++) {
-		atomic_set(&futex_queues[i].waiters, 0);
-		plist_head_init(&futex_queues[i].chain);
-		spin_lock_init(&futex_queues[i].lock);
+		futex_queues[n] = table;
 	}
+	pr_info("futex hash table, %d nodes, %ld entries (order: %d, %lu bytes)\n",
+		num_possible_nodes(),
+		futex_hashsize, order,
+		sizeof(struct futex_hash_bucket) * futex_hashsize);
 
 	return 0;
 }
Index: linux-2.6/kernel/futex/futex.h
===================================================================
--- linux-2.6.orig/kernel/futex/futex.h
+++ linux-2.6/kernel/futex/futex.h
@@ -83,6 +83,19 @@ static inline bool futex_flags_valid(uns
 	if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
 		return false;
 
+	/*
+	 * Must be able to represent both FUTEX_NO_NODE and every valid nodeid
+	 * in a futex word.
+	 */
+	if (flags & FLAGS_NUMA) {
+		int bits = 8 * futex_size(flags);
+		u64 max = ~0ULL;
+
+		max >>= 64 - bits;
+		if (nr_node_ids >= max)
+			return false;
+	}
+
 	return true;
 }
 
@@ -184,7 +197,7 @@ enum futex_access {
 	FUTEX_WRITE
 };
 
-extern int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *key,
+extern int get_futex_key(void __user *uaddr, unsigned int flags, union futex_key *key,
 			 enum futex_access rw);
 
 extern struct hrtimer_sleeper *
Index: linux-2.6/kernel/futex/syscalls.c
===================================================================
--- linux-2.6.orig/kernel/futex/syscalls.c
+++ linux-2.6/kernel/futex/syscalls.c
@@ -179,7 +179,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 
-#define FUTEX2_VALID_MASK (FUTEX2_SIZE_MASK | FUTEX2_PRIVATE)
+#define FUTEX2_VALID_MASK (FUTEX2_SIZE_MASK | FUTEX2_NUMA | FUTEX2_PRIVATE)
 
 /**
  * futex_parse_waitv - Parse a waitv array from userspace
Index: linux-2.6/include/uapi/linux/futex.h
===================================================================
--- linux-2.6.orig/include/uapi/linux/futex.h
+++ linux-2.6/include/uapi/linux/futex.h
@@ -74,6 +74,14 @@
 /* do not use */
 #define FUTEX_32		FUTEX2_SIZE_U32 /* historical accident :-( */
 
+
+/*
+ * When FUTEX2_NUMA doubles the futex word, the second word is a node value.
+ * The special value -1 indicates no-node. This is the same value as
+ * NUMA_NO_NODE, except that value is not ABI, this is.
+ */
+#define FUTEX_NO_NODE		(-1)
+
 /*
  * Max numbers of elements in a futex_waitv array
  */



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 13/15] futex: Propagate flags into futex_get_value_locked()
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
                   ` (11 preceding siblings ...)
  2023-09-21 10:45 ` [PATCH v3 12/15] futex: Implement FUTEX2_NUMA peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 10:45 ` [PATCH v3 14/15] futex: Enable FUTEX2_{8,16} peterz
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

[-- Attachment #1: peterz-futex2-get_values_locked.patch --]
[-- Type: text/plain, Size: 4437 bytes --]

In order to facilitate variable sized futexes propagate the flags into
futex_get_value_locked().

No functional change intended.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex/core.c     |    4 ++--
 kernel/futex/futex.h    |    2 +-
 kernel/futex/pi.c       |    8 ++++----
 kernel/futex/requeue.c  |    4 ++--
 kernel/futex/waitwake.c |    4 ++--
 5 files changed, 11 insertions(+), 11 deletions(-)

Index: linux-2.6/kernel/futex/core.c
===================================================================
--- linux-2.6.orig/kernel/futex/core.c
+++ linux-2.6/kernel/futex/core.c
@@ -516,12 +516,12 @@ int futex_cmpxchg_value_locked(u32 *curv
 	return ret;
 }
 
-int futex_get_value_locked(u32 *dest, u32 __user *from)
+int futex_get_value_locked(u32 *dest, u32 __user *from, unsigned int flags)
 {
 	int ret;
 
 	pagefault_disable();
-	ret = __get_user(*dest, from);
+	ret = futex_get_value(dest, from, flags);
 	pagefault_enable();
 
 	return ret ? -EFAULT : 0;
Index: linux-2.6/kernel/futex/futex.h
===================================================================
--- linux-2.6.orig/kernel/futex/futex.h
+++ linux-2.6/kernel/futex/futex.h
@@ -229,7 +229,7 @@ extern void futex_wake_mark(struct wake_
 
 extern int fault_in_user_writeable(u32 __user *uaddr);
 extern int futex_cmpxchg_value_locked(u32 *curval, u32 __user *uaddr, u32 uval, u32 newval);
-extern int futex_get_value_locked(u32 *dest, u32 __user *from);
+extern int futex_get_value_locked(u32 *dest, u32 __user *from, unsigned int flags);
 extern struct futex_q *futex_top_waiter(struct futex_hash_bucket *hb, union futex_key *key);
 
 extern void __futex_unqueue(struct futex_q *q);
Index: linux-2.6/kernel/futex/pi.c
===================================================================
--- linux-2.6.orig/kernel/futex/pi.c
+++ linux-2.6/kernel/futex/pi.c
@@ -240,7 +240,7 @@ static int attach_to_pi_state(u32 __user
 	 * still is what we expect it to be, otherwise retry the entire
 	 * operation.
 	 */
-	if (futex_get_value_locked(&uval2, uaddr))
+	if (futex_get_value_locked(&uval2, uaddr, FLAGS_SIZE_32))
 		goto out_efault;
 
 	if (uval != uval2)
@@ -359,7 +359,7 @@ static int handle_exit_race(u32 __user *
 	 * The same logic applies to the case where the exiting task is
 	 * already gone.
 	 */
-	if (futex_get_value_locked(&uval2, uaddr))
+	if (futex_get_value_locked(&uval2, uaddr, FLAGS_SIZE_32))
 		return -EFAULT;
 
 	/* If the user space value has changed, try again. */
@@ -527,7 +527,7 @@ int futex_lock_pi_atomic(u32 __user *uad
 	 * Read the user space value first so we can validate a few
 	 * things before proceeding further.
 	 */
-	if (futex_get_value_locked(&uval, uaddr))
+	if (futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32))
 		return -EFAULT;
 
 	if (unlikely(should_fail_futex(true)))
@@ -750,7 +750,7 @@ retry:
 	if (!pi_state->owner)
 		newtid |= FUTEX_OWNER_DIED;
 
-	err = futex_get_value_locked(&uval, uaddr);
+	err = futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32);
 	if (err)
 		goto handle_err;
 
Index: linux-2.6/kernel/futex/requeue.c
===================================================================
--- linux-2.6.orig/kernel/futex/requeue.c
+++ linux-2.6/kernel/futex/requeue.c
@@ -273,7 +273,7 @@ futex_proxy_trylock_atomic(u32 __user *p
 	u32 curval;
 	int ret;
 
-	if (futex_get_value_locked(&curval, pifutex))
+	if (futex_get_value_locked(&curval, pifutex, FLAGS_SIZE_32))
 		return -EFAULT;
 
 	if (unlikely(should_fail_futex(true)))
@@ -451,7 +451,7 @@ retry_private:
 	if (likely(cmpval != NULL)) {
 		u32 curval;
 
-		ret = futex_get_value_locked(&curval, uaddr1);
+		ret = futex_get_value_locked(&curval, uaddr1, FLAGS_SIZE_32);
 
 		if (unlikely(ret)) {
 			double_unlock_hb(hb1, hb2);
Index: linux-2.6/kernel/futex/waitwake.c
===================================================================
--- linux-2.6.orig/kernel/futex/waitwake.c
+++ linux-2.6/kernel/futex/waitwake.c
@@ -441,7 +441,7 @@ retry:
 		u32 val = vs[i].w.val;
 
 		hb = futex_q_lock(q);
-		ret = futex_get_value_locked(&uval, uaddr);
+		ret = futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32);
 
 		if (!ret && uval == val) {
 			/*
@@ -609,7 +609,7 @@ retry:
 retry_private:
 	*hb = futex_q_lock(q);
 
-	ret = futex_get_value_locked(&uval, uaddr);
+	ret = futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32);
 
 	if (ret) {
 		futex_q_unlock(*hb);



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 14/15] futex: Enable FUTEX2_{8,16}
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
                   ` (12 preceding siblings ...)
  2023-09-21 10:45 ` [PATCH v3 13/15] futex: Propagate flags into futex_get_value_locked() peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 10:45 ` [PATCH v3 15/15] futex,selftests: Extend the futex selftests peterz
  2023-09-22 20:01 ` futex2 numa stuff Peter Zijlstra
  15 siblings, 0 replies; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

[-- Attachment #1: peterz-futex2-small.patch --]
[-- Type: text/plain, Size: 3973 bytes --]

When futexes are no longer u32 aligned, the lower offset bits are no
longer available to put type info in. However, since offset is the
offset within a page, there are plenty bits available on the top end.

After that, pass flags into futex_get_value_locked() for WAIT and
disallow FUTEX2_SIZE_U64 instead of mandating FUTEX2_SIZE_U32.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/futex.h   |   11 ++++++-----
 kernel/futex/core.c     |    9 +++++++++
 kernel/futex/futex.h    |    4 ++--
 kernel/futex/waitwake.c |    5 +++--
 4 files changed, 20 insertions(+), 9 deletions(-)

Index: linux-2.6/include/linux/futex.h
===================================================================
--- linux-2.6.orig/include/linux/futex.h
+++ linux-2.6/include/linux/futex.h
@@ -16,18 +16,19 @@ struct task_struct;
  * The key type depends on whether it's a shared or private mapping.
  * Don't rearrange members without looking at hash_futex().
  *
- * offset is aligned to a multiple of sizeof(u32) (== 4) by definition.
- * We use the two low order bits of offset to tell what is the kind of key :
+ * offset is the position within a page and is in the range [0, PAGE_SIZE).
+ * The high bits of the offset indicate what kind of key this is:
  *  00 : Private process futex (PTHREAD_PROCESS_PRIVATE)
  *       (no reference on an inode or mm)
  *  01 : Shared futex (PTHREAD_PROCESS_SHARED)
  *	mapped on a file (reference on the underlying inode)
  *  10 : Shared futex (PTHREAD_PROCESS_SHARED)
  *       (but private mapping on an mm, and reference taken on it)
-*/
+ */
 
-#define FUT_OFF_INODE    1 /* We set bit 0 if key has a reference on inode */
-#define FUT_OFF_MMSHARED 2 /* We set bit 1 if key has a reference on mm */
+#define FUT_OFF_INODE    (PAGE_SIZE << 0)
+#define FUT_OFF_MMSHARED (PAGE_SIZE << 1)
+#define FUT_OFF_SIZE	 (PAGE_SIZE << 2)
 
 union futex_key {
 	struct {
Index: linux-2.6/kernel/futex/core.c
===================================================================
--- linux-2.6.orig/kernel/futex/core.c
+++ linux-2.6/kernel/futex/core.c
@@ -311,6 +311,15 @@ int get_futex_key(void __user *uaddr, un
 	}
 
 	/*
+	 * Encode the futex size in the offset. This makes cross-size
+	 * wake-wait fail -- see futex_match().
+	 *
+	 * NOTE that cross-size wake-wait is fundamentally broken wrt
+	 * FLAGS_NUMA.
+	 */
+	key->both.offset |= FUT_OFF_SIZE * (flags & FLAGS_SIZE_MASK);
+
+	/*
 	 * PROCESS_PRIVATE futexes are fast.
 	 * As the mm cannot disappear under us and the 'key' only needs
 	 * virtual address, we dont even have to find the underlying vma.
Index: linux-2.6/kernel/futex/futex.h
===================================================================
--- linux-2.6.orig/kernel/futex/futex.h
+++ linux-2.6/kernel/futex/futex.h
@@ -79,8 +79,8 @@ static inline bool futex_flags_valid(uns
 			return false;
 	}
 
-	/* Only 32bit futexes are implemented -- for now */
-	if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
+	/* 64bit futexes aren't implemented -- yet */
+	if ((flags & FLAGS_SIZE_MASK) == FLAGS_SIZE_64)
 		return false;
 
 	/*
Index: linux-2.6/kernel/futex/waitwake.c
===================================================================
--- linux-2.6.orig/kernel/futex/waitwake.c
+++ linux-2.6/kernel/futex/waitwake.c
@@ -437,11 +437,12 @@ retry:
 
 	for (i = 0; i < count; i++) {
 		u32 __user *uaddr = (u32 __user *)(unsigned long)vs[i].w.uaddr;
+		unsigned int flags = vs[i].w.flags;
 		struct futex_q *q = &vs[i].q;
 		u32 val = vs[i].w.val;
 
 		hb = futex_q_lock(q);
-		ret = futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32);
+		ret = futex_get_value_locked(&uval, uaddr, flags);
 
 		if (!ret && uval == val) {
 			/*
@@ -609,7 +610,7 @@ retry:
 retry_private:
 	*hb = futex_q_lock(q);
 
-	ret = futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32);
+	ret = futex_get_value_locked(&uval, uaddr, flags);
 
 	if (ret) {
 		futex_q_unlock(*hb);



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 15/15] futex,selftests: Extend the futex selftests
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
                   ` (13 preceding siblings ...)
  2023-09-21 10:45 ` [PATCH v3 14/15] futex: Enable FUTEX2_{8,16} peterz
@ 2023-09-21 10:45 ` peterz
  2023-09-21 11:41   ` Peter Zijlstra
  2023-09-22 20:01 ` futex2 numa stuff Peter Zijlstra
  15 siblings, 1 reply; 36+ messages in thread
From: peterz @ 2023-09-21 10:45 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

[-- Attachment #1: peterz-futex2-tests.patch --]
[-- Type: text/plain, Size: 15711 bytes --]

Extend the wait/requeue selftests to also cover the futex2 syscalls.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/testing/selftests/futex/functional/futex_requeue.c         |  100 +++++++++-
 tools/testing/selftests/futex/functional/futex_wait.c            |   56 ++++-
 tools/testing/selftests/futex/functional/futex_wait_timeout.c    |   14 +
 tools/testing/selftests/futex/functional/futex_wait_wouldblock.c |   28 ++
 tools/testing/selftests/futex/functional/futex_waitv.c           |   15 -
 tools/testing/selftests/futex/functional/run.sh                  |    6 
 tools/testing/selftests/futex/include/futex2test.h               |   39 +++
 7 files changed, 229 insertions(+), 29 deletions(-)

Index: linux-2.6/tools/testing/selftests/futex/functional/futex_requeue.c
===================================================================
--- linux-2.6.orig/tools/testing/selftests/futex/functional/futex_requeue.c
+++ linux-2.6/tools/testing/selftests/futex/functional/futex_requeue.c
@@ -7,8 +7,10 @@
 
 #include <pthread.h>
 #include <limits.h>
+#include <stdbool.h>
 #include "logging.h"
 #include "futextest.h"
+#include "futex2test.h"
 
 #define TEST_NAME "futex-requeue"
 #define timeout_ns  30000000
@@ -16,24 +18,58 @@
 
 volatile futex_t *f1;
 
+bool futex2 = 0;
+bool mixed = 0;
+
 void usage(char *prog)
 {
 	printf("Usage: %s\n", prog);
 	printf("  -c	Use color\n");
+	printf("  -n	Use futex2 interface\n");
+	printf("  -x	Use mixed size futex\n");
 	printf("  -h	Display this help message\n");
 	printf("  -v L	Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n",
 	       VQUIET, VCRITICAL, VINFO);
 }
 
-void *waiterfn(void *arg)
+static void *waiterfn(void *arg)
 {
+	unsigned int flags = 0;
 	struct timespec to;
 
-	to.tv_sec = 0;
-	to.tv_nsec = timeout_ns;
+	if (futex2) {
+		unsigned long mask;
+
+		if (clock_gettime(CLOCK_MONOTONIC, &to)) {
+			printf("clock_gettime() failed errno %d", errno);
+			return NULL;
+		}
+
+		to.tv_nsec += timeout_ns;
+		if (to.tv_nsec >= 1000000000) {
+			to.tv_sec++;
+			to.tv_nsec -= 1000000000;
+		}
+
+		if (mixed) {
+			flags |= FUTEX2_SIZE_U16;
+			mask = (unsigned short)(~0U);
+		} else {
+			flags |= FUTEX2_SIZE_U32;
+			mask = (unsigned int)(~0U);
+		}
+
+		if (futex2_wait(f1, *f1, mask, flags,
+				&to, CLOCK_MONOTONIC))
+			printf("waiter failed errno %d\n", errno);
+	} else {
+
+		to.tv_sec = 0;
+		to.tv_nsec = timeout_ns;
 
-	if (futex_wait(f1, *f1, &to, 0))
-		printf("waiter failed errno %d\n", errno);
+		if (futex_wait(f1, *f1, &to, flags))
+			printf("waiter failed errno %d\n", errno);
+	}
 
 	return NULL;
 }
@@ -48,7 +84,7 @@ int main(int argc, char *argv[])
 
 	f1 = &_f1;
 
-	while ((c = getopt(argc, argv, "cht:v:")) != -1) {
+	while ((c = getopt(argc, argv, "xncht:v:")) != -1) {
 		switch (c) {
 		case 'c':
 			log_color(1);
@@ -59,6 +95,12 @@ int main(int argc, char *argv[])
 		case 'v':
 			log_verbosity(atoi(optarg));
 			break;
+		case 'x':
+			mixed=1;
+			/* fallthrough */
+		case 'n':
+			futex2=1;
+			break;
 		default:
 			usage(basename(argv[0]));
 			exit(1);
@@ -79,7 +121,22 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Requeuing 1 futex from f1 to f2\n");
-	res = futex_cmp_requeue(f1, 0, &f2, 0, 1, 0);
+	if (futex2) {
+		struct futex_waitv futexes[2] = {
+			{
+				.val = 0,
+				.uaddr = (unsigned long)f1,
+				.flags = mixed ? FUTEX2_SIZE_U16 : FUTEX2_SIZE_U32,
+			},
+			{
+				.uaddr = (unsigned long)&f2,
+				.flags = FUTEX2_SIZE_U32,
+			},
+		};
+		res = futex2_requeue(futexes, 0, 0, 1);
+	} else {
+		res = futex_cmp_requeue(f1, 0, &f2, 0, 1, 0);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_requeue simple returned: %d %s\n",
 				      res ? errno : res,
@@ -89,7 +146,11 @@ int main(int argc, char *argv[])
 
 
 	info("Waking 1 futex at f2\n");
-	res = futex_wake(&f2, 1, 0);
+	if (futex2) {
+		res = futex2_wake(&f2, ~0U, 1, FUTEX2_SIZE_U32);
+	} else {
+		res = futex_wake(&f2, 1, 0);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_requeue simple returned: %d %s\n",
 				      res ? errno : res,
@@ -112,7 +173,22 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Waking 3 futexes at f1 and requeuing 7 futexes from f1 to f2\n");
-	res = futex_cmp_requeue(f1, 0, &f2, 3, 7, 0);
+	if (futex2) {
+		struct futex_waitv futexes[2] = {
+			{
+				.val = 0,
+				.uaddr = (unsigned long)f1,
+				.flags = mixed ? FUTEX2_SIZE_U16 : FUTEX2_SIZE_U32,
+			},
+			{
+				.uaddr = (unsigned long)&f2,
+				.flags = FUTEX2_SIZE_U32,
+			},
+		};
+		res = futex2_requeue(futexes, 0, 3, 7);
+	} else {
+		res = futex_cmp_requeue(f1, 0, &f2, 3, 7, 0);
+	}
 	if (res != 10) {
 		ksft_test_result_fail("futex_requeue many returned: %d %s\n",
 				      res ? errno : res,
@@ -121,7 +197,11 @@ int main(int argc, char *argv[])
 	}
 
 	info("Waking INT_MAX futexes at f2\n");
-	res = futex_wake(&f2, INT_MAX, 0);
+	if (futex2) {
+		res = futex2_wake(&f2, ~0U, INT_MAX, FUTEX2_SIZE_U32);
+	} else {
+		res = futex_wake(&f2, INT_MAX, 0);
+	}
 	if (res != 7) {
 		ksft_test_result_fail("futex_requeue many returned: %d %s\n",
 				      res ? errno : res,
Index: linux-2.6/tools/testing/selftests/futex/functional/futex_wait.c
===================================================================
--- linux-2.6.orig/tools/testing/selftests/futex/functional/futex_wait.c
+++ linux-2.6/tools/testing/selftests/futex/functional/futex_wait.c
@@ -9,8 +9,10 @@
 #include <sys/shm.h>
 #include <sys/mman.h>
 #include <fcntl.h>
+#include <stdbool.h>
 #include "logging.h"
 #include "futextest.h"
+#include "futex2test.h"
 
 #define TEST_NAME "futex-wait"
 #define timeout_ns  30000000
@@ -19,10 +21,13 @@
 
 void *futex;
 
+bool futex2 = 0;
+
 void usage(char *prog)
 {
 	printf("Usage: %s\n", prog);
 	printf("  -c	Use color\n");
+	printf("  -n	Use futex2 interface\n");
 	printf("  -h	Display this help message\n");
 	printf("  -v L	Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n",
 	       VQUIET, VCRITICAL, VINFO);
@@ -30,17 +35,35 @@ void usage(char *prog)
 
 static void *waiterfn(void *arg)
 {
-	struct timespec to;
 	unsigned int flags = 0;
+	struct timespec to;
 
 	if (arg)
 		flags = *((unsigned int *) arg);
 
-	to.tv_sec = 0;
-	to.tv_nsec = timeout_ns;
+	if (futex2) {
+		if (clock_gettime(CLOCK_MONOTONIC, &to)) {
+			printf("clock_gettime() failed errno %d", errno);
+			return NULL;
+		}
 
-	if (futex_wait(futex, 0, &to, flags))
-		printf("waiter failed errno %d\n", errno);
+		to.tv_nsec += timeout_ns;
+		if (to.tv_nsec >= 1000000000) {
+			to.tv_sec++;
+			to.tv_nsec -= 1000000000;
+		}
+
+		if (futex2_wait(futex, 0, ~0U, flags | FUTEX2_SIZE_U32,
+				&to, CLOCK_MONOTONIC))
+			printf("waiter failed errno %d\n", errno);
+	} else {
+
+		to.tv_sec = 0;
+		to.tv_nsec = timeout_ns;
+
+		if (futex_wait(futex, 0, &to, flags))
+			printf("waiter failed errno %d\n", errno);
+	}
 
 	return NULL;
 }
@@ -55,7 +78,7 @@ int main(int argc, char *argv[])
 
 	futex = &f_private;
 
-	while ((c = getopt(argc, argv, "cht:v:")) != -1) {
+	while ((c = getopt(argc, argv, "ncht:v:")) != -1) {
 		switch (c) {
 		case 'c':
 			log_color(1);
@@ -66,6 +89,9 @@ int main(int argc, char *argv[])
 		case 'v':
 			log_verbosity(atoi(optarg));
 			break;
+		case 'n':
+			futex2=1;
+			break;
 		default:
 			usage(basename(argv[0]));
 			exit(1);
@@ -84,7 +110,11 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Calling private futex_wake on futex: %p\n", futex);
-	res = futex_wake(futex, 1, FUTEX_PRIVATE_FLAG);
+	if (futex2) {
+		res = futex2_wake(futex, ~0U, 1, FUTEX2_SIZE_U32 | FUTEX2_PRIVATE);
+	} else {
+		res = futex_wake(futex, 1, FUTEX_PRIVATE_FLAG);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_wake private returned: %d %s\n",
 				      errno, strerror(errno));
@@ -112,7 +142,11 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Calling shared (page anon) futex_wake on futex: %p\n", futex);
-	res = futex_wake(futex, 1, 0);
+	if (futex2) {
+		res = futex2_wake(futex, ~0U, 1, FUTEX2_SIZE_U32);
+	} else {
+		res = futex_wake(futex, 1, 0);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_wake shared (page anon) returned: %d %s\n",
 				      errno, strerror(errno));
@@ -151,7 +185,11 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Calling shared (file backed) futex_wake on futex: %p\n", futex);
-	res = futex_wake(shm, 1, 0);
+	if (futex2) {
+		res = futex2_wake(shm, ~0U, 1, FUTEX2_SIZE_U32);
+	} else {
+		res = futex_wake(shm, 1, 0);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_wake shared (file backed) returned: %d %s\n",
 				      errno, strerror(errno));
Index: linux-2.6/tools/testing/selftests/futex/functional/futex_wait_timeout.c
===================================================================
--- linux-2.6.orig/tools/testing/selftests/futex/functional/futex_wait_timeout.c
+++ linux-2.6/tools/testing/selftests/futex/functional/futex_wait_timeout.c
@@ -128,7 +128,7 @@ int main(int argc, char *argv[])
 	}
 
 	ksft_print_header();
-	ksft_set_plan(9);
+	ksft_set_plan(11);
 	ksft_print_msg("%s: Block on a futex and wait for timeout\n",
 	       basename(argv[0]));
 	ksft_print_msg("\tArguments: timeout=%ldns\n", timeout_ns);
@@ -201,6 +201,18 @@ int main(int argc, char *argv[])
 	res = futex_waitv(&waitv, 1, 0, &to, CLOCK_REALTIME);
 	test_timeout(res, &ret, "futex_waitv realtime", ETIMEDOUT);
 
+	/* futex2_wait with CLOCK_MONOTONIC */
+	if (futex_get_abs_timeout(CLOCK_MONOTONIC, &to, timeout_ns))
+		return RET_FAIL;
+	res = futex2_wait(&f1, f1, 1, FUTEX2_SIZE_U32, &to, CLOCK_MONOTONIC);
+	test_timeout(res, &ret, "futex2_wait monotonic", ETIMEDOUT);
+
+	/* futex2_wait with CLOCK_REALTIME */
+	if (futex_get_abs_timeout(CLOCK_REALTIME, &to, timeout_ns))
+		return RET_FAIL;
+	res = futex2_wait(&f1, f1, 1, FUTEX2_SIZE_U32, &to, CLOCK_REALTIME);
+	test_timeout(res, &ret, "futex2_wait realtime", ETIMEDOUT);
+
 	ksft_print_cnts();
 	return ret;
 }
Index: linux-2.6/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
===================================================================
--- linux-2.6.orig/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
+++ linux-2.6/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
@@ -46,7 +46,7 @@ int main(int argc, char *argv[])
 	struct futex_waitv waitv = {
 			.uaddr = (uintptr_t)&f1,
 			.val = f1+1,
-			.flags = FUTEX_32,
+			.flags = FUTEX2_SIZE_U32 | FUTEX2_PRIVATE,
 			.__reserved = 0
 		};
 
@@ -68,7 +68,7 @@ int main(int argc, char *argv[])
 	}
 
 	ksft_print_header();
-	ksft_set_plan(2);
+	ksft_set_plan(3);
 	ksft_print_msg("%s: Test the unexpected futex value in FUTEX_WAIT\n",
 	       basename(argv[0]));
 
@@ -106,6 +106,30 @@ int main(int argc, char *argv[])
 		ksft_test_result_pass("futex_waitv\n");
 	}
 
+	if (clock_gettime(CLOCK_MONOTONIC, &to)) {
+		error("clock_gettime failed\n", errno);
+		return errno;
+	}
+
+	to.tv_nsec += timeout_ns;
+
+	if (to.tv_nsec >= 1000000000) {
+		to.tv_sec++;
+		to.tv_nsec -= 1000000000;
+	}
+
+	info("Calling futex2_wait on f1: %u @ %p with val=%u\n", f1, &f1, f1+1);
+	res = futex2_wait(&f1, f1+1, ~0U, FUTEX2_SIZE_U32 | FUTEX2_PRIVATE,
+			  &to, CLOCK_MONOTONIC);
+	if (!res || errno != EWOULDBLOCK) {
+		ksft_test_result_pass("futex2_wait returned: %d %s\n",
+				      res ? errno : res,
+				      res ? strerror(errno) : "");
+		ret = RET_FAIL;
+	} else {
+		ksft_test_result_pass("futex2_wait\n");
+	}
+
 	ksft_print_cnts();
 	return ret;
 }
Index: linux-2.6/tools/testing/selftests/futex/functional/futex_waitv.c
===================================================================
--- linux-2.6.orig/tools/testing/selftests/futex/functional/futex_waitv.c
+++ linux-2.6/tools/testing/selftests/futex/functional/futex_waitv.c
@@ -88,7 +88,7 @@ int main(int argc, char *argv[])
 
 	for (i = 0; i < NR_FUTEXES; i++) {
 		waitv[i].uaddr = (uintptr_t)&futexes[i];
-		waitv[i].flags = FUTEX_32 | FUTEX_PRIVATE_FLAG;
+		waitv[i].flags = FUTEX2_SIZE_U32 | FUTEX2_PRIVATE;
 		waitv[i].val = 0;
 		waitv[i].__reserved = 0;
 	}
@@ -99,7 +99,8 @@ int main(int argc, char *argv[])
 
 	usleep(WAKE_WAIT_US);
 
-	res = futex_wake(u64_to_ptr(waitv[NR_FUTEXES - 1].uaddr), 1, FUTEX_PRIVATE_FLAG);
+	res = futex2_wake(u64_to_ptr(waitv[NR_FUTEXES - 1].uaddr), ~0U, 1,
+			  FUTEX2_PRIVATE | FUTEX2_SIZE_U32);
 	if (res != 1) {
 		ksft_test_result_fail("futex_wake private returned: %d %s\n",
 				      res ? errno : res,
@@ -122,7 +123,7 @@ int main(int argc, char *argv[])
 
 		*shared_data = 0;
 		waitv[i].uaddr = (uintptr_t)shared_data;
-		waitv[i].flags = FUTEX_32;
+		waitv[i].flags = FUTEX2_SIZE_U32;
 		waitv[i].val = 0;
 		waitv[i].__reserved = 0;
 	}
@@ -145,8 +146,8 @@ int main(int argc, char *argv[])
 	for (i = 0; i < NR_FUTEXES; i++)
 		shmdt(u64_to_ptr(waitv[i].uaddr));
 
-	/* Testing a waiter without FUTEX_32 flag */
-	waitv[0].flags = FUTEX_PRIVATE_FLAG;
+	/* Testing a waiter without FUTEX2_SIZE_U32 flag */
+	waitv[0].flags = FUTEX2_PRIVATE;
 
 	if (clock_gettime(CLOCK_MONOTONIC, &to))
 		error("gettime64 failed\n", errno);
@@ -160,11 +161,11 @@ int main(int argc, char *argv[])
 				      res ? strerror(errno) : "");
 		ret = RET_FAIL;
 	} else {
-		ksft_test_result_pass("futex_waitv without FUTEX_32\n");
+		ksft_test_result_pass("futex_waitv without FUTEX2_SIZE_U32\n");
 	}
 
 	/* Testing a waiter with an unaligned address */
-	waitv[0].flags = FUTEX_PRIVATE_FLAG | FUTEX_32;
+	waitv[0].flags = FUTEX2_PRIVATE | FUTEX2_SIZE_U32;
 	waitv[0].uaddr = 1;
 
 	if (clock_gettime(CLOCK_MONOTONIC, &to))
Index: linux-2.6/tools/testing/selftests/futex/functional/run.sh
===================================================================
--- linux-2.6.orig/tools/testing/selftests/futex/functional/run.sh
+++ linux-2.6/tools/testing/selftests/futex/functional/run.sh
@@ -76,9 +76,15 @@ echo
 
 echo
 ./futex_wait $COLOR
+echo
+./futex_wait -n $COLOR
 
 echo
 ./futex_requeue $COLOR
+echo
+./futex_requeue -n $COLOR
+echo
+./futex_requeue -x $COLOR
 
 echo
 ./futex_waitv $COLOR
Index: linux-2.6/tools/testing/selftests/futex/include/futex2test.h
===================================================================
--- linux-2.6.orig/tools/testing/selftests/futex/include/futex2test.h
+++ linux-2.6/tools/testing/selftests/futex/include/futex2test.h
@@ -8,6 +8,28 @@
 
 #define u64_to_ptr(x) ((void *)(uintptr_t)(x))
 
+#ifndef __NR_futex_wake
+#define __NR_futex_wake 452
+#define __NR_futex_wait 453
+#define __NR_futex_requeue 454
+#endif
+
+#ifndef FUTEX2_SIZE_U8
+/*
+ * Flags for futex2 syscalls.
+ */
+#define FUTEX2_SIZE_U8		0x00
+#define FUTEX2_SIZE_U16		0x01
+#define FUTEX2_SIZE_U32		0x02
+#define FUTEX2_SIZE_U64		0x03
+#define FUTEX2_NUMA		0x04
+			/*	0x08 */
+			/*	0x10 */
+			/*	0x20 */
+			/*	0x40 */
+#define FUTEX2_PRIVATE		FUTEX_PRIVATE_FLAG
+#endif
+
 /**
  * futex_waitv - Wait at multiple futexes, wake on any
  * @waiters:    Array of waiters
@@ -20,3 +42,20 @@ static inline int futex_waitv(volatile s
 {
 	return syscall(__NR_futex_waitv, waiters, nr_waiters, flags, timo, clockid);
 }
+
+static inline int futex2_wake(volatile void *uaddr, unsigned long mask, int nr, unsigned int flags)
+{
+	return syscall(__NR_futex_wake, uaddr, mask, nr, flags);
+}
+
+static inline int futex2_wait(volatile void *uaddr, unsigned long val, unsigned long mask,
+			      unsigned int flags, struct timespec *timo, clockid_t clockid)
+{
+	return syscall(__NR_futex_wait, uaddr, val, mask, flags, timo, clockid);
+}
+
+static inline int futex2_requeue(struct futex_waitv *futexes, unsigned int flags,
+				 int nr_wake, int nr_requeue)
+{
+	return syscall(__NR_futex_requeue, futexes, flags, nr_wake, nr_requeue);
+}



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 15/15] futex,selftests: Extend the futex selftests
  2023-09-21 10:45 ` [PATCH v3 15/15] futex,selftests: Extend the futex selftests peterz
@ 2023-09-21 11:41   ` Peter Zijlstra
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2023-09-21 11:41 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, mingo, dvhart, dave, andrealmeid, Andrew Morton,
	urezki, hch, lstoakes, Arnd Bergmann, linux-api, linux-mm,
	linux-arch, malteskarupke

On Thu, Sep 21, 2023 at 12:45:20PM +0200, peterz@infradead.org wrote:

> +#ifndef __NR_futex_wake
> +#define __NR_futex_wake 452
> +#define __NR_futex_wait 453
> +#define __NR_futex_requeue 454

Clearly I forgot to run the selftests before posting... +2 on those
syscall numbers.

Let me get on with writing some of that numa stuff.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 06/15] futex: FLAGS_STRICT
  2023-09-21 10:45 ` [PATCH v3 06/15] futex: FLAGS_STRICT peterz
@ 2023-09-21 15:14   ` Thomas Gleixner
  2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
  1 sibling, 0 replies; 36+ messages in thread
From: Thomas Gleixner @ 2023-09-21 15:14 UTC (permalink / raw)
  To: peterz, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Thu, Sep 21 2023 at 12:45, peterz@infradead.org wrote:
> The current semantics for futex_wake() are a bit loose, specifically
> asking for 0 futexes to be woken actually gets you 1.
>
> Adding a !nr check to sys_futex_wake() makes that it would return 0
> for unaligned futex words, because that check comes in the shared
> futex_wake() function. Adding the !nr check there, would affect the
> legacy sys_futex() semantics.
>
> Hence frob a flag :-(
>
> Suggested-by: André Almeida <andrealmeid@igalia.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [tip: locking/core] futex: Add sys_futex_requeue()
  2023-09-21 10:45 ` [PATCH v3 10/15] futex: Add sys_futex_requeue() peterz
@ 2023-09-21 17:35   ` tip-bot2 for peterz@infradead.org
  2023-09-22  9:35   ` [PATCH v3 10/15] " Ingo Molnar
  1 sibling, 0 replies; 36+ messages in thread
From: tip-bot2 for peterz@infradead.org @ 2023-09-21 17:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Thomas Gleixner, Geert Uytterhoeven, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     0f4b5f972216782a4acb1ae00dcb55173847c2ff
Gitweb:        https://git.kernel.org/tip/0f4b5f972216782a4acb1ae00dcb55173847c2ff
Author:        peterz@infradead.org <peterz@infradead.org>
AuthorDate:    Thu, 21 Sep 2023 12:45:15 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 21 Sep 2023 19:22:10 +02:00

futex: Add sys_futex_requeue()

Finish off the 'simple' futex2 syscall group by adding
sys_futex_requeue(). Unlike sys_futex_{wait,wake}() its arguments are
too numerous to fit into a regular syscall. As such, use struct
futex_waitv to pass the 'source' and 'destination' futexes to the
syscall.

This syscall implements what was previously known as FUTEX_CMP_REQUEUE
and uses {val, uaddr, flags} for source and {uaddr, flags} for
destination.

This design explicitly allows requeueing between different types of
futex by having a different flags word per uaddr.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20230921105248.511860556@noisy.programming.kicks-ass.net
---
 arch/alpha/kernel/syscalls/syscall.tbl      |  1 +-
 arch/arm/tools/syscall.tbl                  |  1 +-
 arch/arm64/include/asm/unistd.h             |  2 +-
 arch/arm64/include/asm/unistd32.h           |  2 +-
 arch/ia64/kernel/syscalls/syscall.tbl       |  1 +-
 arch/m68k/kernel/syscalls/syscall.tbl       |  1 +-
 arch/microblaze/kernel/syscalls/syscall.tbl |  1 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +-
 arch/mips/kernel/syscalls/syscall_n64.tbl   |  1 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl   |  1 +-
 arch/parisc/kernel/syscalls/syscall.tbl     |  1 +-
 arch/powerpc/kernel/syscalls/syscall.tbl    |  1 +-
 arch/s390/kernel/syscalls/syscall.tbl       |  1 +-
 arch/sh/kernel/syscalls/syscall.tbl         |  1 +-
 arch/sparc/kernel/syscalls/syscall.tbl      |  1 +-
 arch/x86/entry/syscalls/syscall_32.tbl      |  1 +-
 arch/x86/entry/syscalls/syscall_64.tbl      |  1 +-
 arch/xtensa/kernel/syscalls/syscall.tbl     |  1 +-
 include/linux/syscalls.h                    |  3 ++-
 include/uapi/asm-generic/unistd.h           |  4 +-
 kernel/futex/syscalls.c                     | 38 ++++++++++++++++++++-
 kernel/sys_ni.c                             |  1 +-
 22 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index c49f12f..b1865f9 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -494,3 +494,4 @@
 562	common	fchmodat2			sys_fchmodat2
 563	common	futex_wake			sys_futex_wake
 564	common	futex_wait			sys_futex_wait
+565	common	futex_requeue			sys_futex_requeue
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index a6cf562..93d0d46 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -468,3 +468,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index f33190f..531effc 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -39,7 +39,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		456
+#define __NR_compat_syscalls		457
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 6e7d372..c453291 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -915,6 +915,8 @@ __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
 __SYSCALL(__NR_futex_wake, sys_futex_wake)
 #define __NR_futex_wait 455
 __SYSCALL(__NR_futex_wait, sys_futex_wait)
+#define __NR_futex_requeue 456
+__SYSCALL(__NR_futex_requeue, sys_futex_requeue)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index 4043f0c..81375ea 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -375,3 +375,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 2484167..f7f997a 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -454,3 +454,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index f03927a..2967ec2 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -460,3 +460,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index dbb5edf..383abb1 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -393,3 +393,4 @@
 452	n32	fchmodat2			sys_fchmodat2
 454	n32	futex_wake			sys_futex_wake
 455	n32	futex_wait			sys_futex_wait
+456	n32	futex_requeue			sys_futex_requeue
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index faff8df..c9bd09b 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -369,3 +369,4 @@
 452	n64	fchmodat2			sys_fchmodat2
 454	n64	futex_wake			sys_futex_wake
 455	n64	futex_wait			sys_futex_wait
+456	n64	futex_requeue			sys_futex_requeue
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 542f756..ba5ef6c 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -442,3 +442,4 @@
 452	o32	fchmodat2			sys_fchmodat2
 454	o32	futex_wake			sys_futex_wake
 455	o32	futex_wait			sys_futex_wait
+456	o32	futex_requeue			sys_futex_requeue
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 8e50e89..9f0f6df 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -453,3 +453,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index ad33a99..26fc419 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -541,3 +541,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 418853f..31be90b 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -457,3 +457,4 @@
 452  common	fchmodat2		sys_fchmodat2			sys_fchmodat2
 454  common	futex_wake		sys_futex_wake			sys_futex_wake
 455  common	futex_wait		sys_futex_wait			sys_futex_wait
+456  common	futex_requeue		sys_futex_requeue			sys_futex_requeue
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 8ef9557..4bc5d48 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -457,3 +457,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index df59a9d..8404c8e 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -500,3 +500,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 0f66168..31c48bc 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -459,3 +459,4 @@
 452	i386	fchmodat2		sys_fchmodat2
 454	i386	futex_wake		sys_futex_wake
 455	i386	futex_wait		sys_futex_wait
+456	i386	futex_requeue		sys_futex_requeue
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index ddf6288..a577bb2 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -377,6 +377,7 @@
 453	64	map_shadow_stack	sys_map_shadow_stack
 454	common	futex_wake		sys_futex_wake
 455	common	futex_wait		sys_futex_wait
+456	common	futex_requeue		sys_futex_requeue
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index ac278db..dd71ecc 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -425,3 +425,4 @@
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
 455	common	futex_wait			sys_futex_wait
+456	common	futex_requeue			sys_futex_requeue
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 11f3fdd..0901af6 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -556,6 +556,9 @@ asmlinkage long sys_futex_wait(void __user *uaddr, unsigned long val, unsigned l
 			       unsigned int flags, struct __kernel_timespec __user *timespec,
 			       clockid_t clockid);
 
+asmlinkage long sys_futex_requeue(struct futex_waitv __user *waiters,
+				  unsigned int flags, int nr_wake, int nr_requeue);
+
 asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp,
 			      struct __kernel_timespec __user *rmtp);
 asmlinkage long sys_nanosleep_time32(struct old_timespec32 __user *rqtp,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index f6553bd..d9e9cd1 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -826,9 +826,11 @@ __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
 __SYSCALL(__NR_futex_wake, sys_futex_wake)
 #define __NR_futex_wait 455
 __SYSCALL(__NR_futex_wait, sys_futex_wait)
+#define __NR_futex_requeue 456
+__SYSCALL(__NR_futex_requeue, sys_futex_requeue)
 
 #undef __NR_syscalls
-#define __NR_syscalls 456
+#define __NR_syscalls 457
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index dde9b74..8200d86 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -396,6 +396,44 @@ SYSCALL_DEFINE6(futex_wait,
 	return ret;
 }
 
+/*
+ * sys_futex_requeue - Requeue a waiter from one futex to another
+ * @waiters:	array describing the source and destination futex
+ * @flags:	unused
+ * @nr_wake:	number of futexes to wake
+ * @nr_requeue:	number of futexes to requeue
+ *
+ * Identical to the traditional FUTEX_CMP_REQUEUE op, except it is part of the
+ * futex2 family of calls.
+ */
+
+SYSCALL_DEFINE4(futex_requeue,
+		struct futex_waitv __user *, waiters,
+		unsigned int, flags,
+		int, nr_wake,
+		int, nr_requeue)
+{
+	struct futex_vector futexes[2];
+	u32 cmpval;
+	int ret;
+
+	if (flags)
+		return -EINVAL;
+
+	if (!waiters)
+		return -EINVAL;
+
+	ret = futex_parse_waitv(futexes, waiters, 2);
+	if (ret)
+		return ret;
+
+	cmpval = futexes[0].w.val;
+
+	return futex_requeue(u64_to_user_ptr(futexes[0].w.uaddr), futexes[0].w.flags,
+			     u64_to_user_ptr(futexes[1].w.uaddr), futexes[1].w.flags,
+			     nr_wake, nr_requeue, &cmpval, 0);
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE2(set_robust_list,
 		struct compat_robust_list_head __user *, head,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 13df391..9db51ea 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -89,6 +89,7 @@ COND_SYSCALL_COMPAT(get_robust_list);
 COND_SYSCALL(futex_waitv);
 COND_SYSCALL(futex_wake);
 COND_SYSCALL(futex_wait);
+COND_SYSCALL(futex_requeue);
 COND_SYSCALL(kexec_load);
 COND_SYSCALL_COMPAT(kexec_load);
 COND_SYSCALL(init_module);

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip: locking/core] futex: Add flags2 argument to futex_requeue()
  2023-09-21 10:45 ` [PATCH v3 09/15] futex: Add flags2 argument to futex_requeue() peterz
@ 2023-09-21 17:36   ` tip-bot2 for peterz@infradead.org
  0 siblings, 0 replies; 36+ messages in thread
From: tip-bot2 for peterz@infradead.org @ 2023-09-21 17:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     27b88f3519e72d71c8cead6b835a26c171109c9b
Gitweb:        https://git.kernel.org/tip/27b88f3519e72d71c8cead6b835a26c171109c9b
Author:        peterz@infradead.org <peterz@infradead.org>
AuthorDate:    Thu, 21 Sep 2023 12:45:14 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 21 Sep 2023 19:22:09 +02:00

futex: Add flags2 argument to futex_requeue()

In order to support mixed size requeue, add a second flags argument to
the internal futex_requeue() function.

No functional change intended.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230921105248.396780136@noisy.programming.kicks-ass.net
---
 kernel/futex/futex.h    |  5 +++--
 kernel/futex/requeue.c  | 12 +++++++-----
 kernel/futex/syscalls.c |  6 +++---
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index a8ea5ef..a06030a 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -328,8 +328,9 @@ extern int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, u32
 				 val, ktime_t *abs_time, u32 bitset, u32 __user
 				 *uaddr2);
 
-extern int futex_requeue(u32 __user *uaddr1, unsigned int flags,
-			 u32 __user *uaddr2, int nr_wake, int nr_requeue,
+extern int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
+			 u32 __user *uaddr2, unsigned int flags2,
+			 int nr_wake, int nr_requeue,
 			 u32 *cmpval, int requeue_pi);
 
 extern int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index 5bf6958..a0a7995 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -346,8 +346,9 @@ futex_proxy_trylock_atomic(u32 __user *pifutex, struct futex_hash_bucket *hb1,
 /**
  * futex_requeue() - Requeue waiters from uaddr1 to uaddr2
  * @uaddr1:	source futex user address
- * @flags:	futex flags (FLAGS_SHARED, etc.)
+ * @flags1:	futex flags (FLAGS_SHARED, etc.)
  * @uaddr2:	target futex user address
+ * @flags2:	futex flags (FLAGS_SHARED, etc.)
  * @nr_wake:	number of waiters to wake (must be 1 for requeue_pi)
  * @nr_requeue:	number of waiters to requeue (0-INT_MAX)
  * @cmpval:	@uaddr1 expected value (or %NULL)
@@ -361,7 +362,8 @@ futex_proxy_trylock_atomic(u32 __user *pifutex, struct futex_hash_bucket *hb1,
  *  - >=0 - on success, the number of tasks requeued or woken;
  *  -  <0 - on error
  */
-int futex_requeue(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
+int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
+		  u32 __user *uaddr2, unsigned int flags2,
 		  int nr_wake, int nr_requeue, u32 *cmpval, int requeue_pi)
 {
 	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
@@ -424,10 +426,10 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 	}
 
 retry:
-	ret = get_futex_key(uaddr1, flags, &key1, FUTEX_READ);
+	ret = get_futex_key(uaddr1, flags1, &key1, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
-	ret = get_futex_key(uaddr2, flags, &key2,
+	ret = get_futex_key(uaddr2, flags2, &key2,
 			    requeue_pi ? FUTEX_WRITE : FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
@@ -459,7 +461,7 @@ retry_private:
 			if (ret)
 				return ret;
 
-			if (!(flags & FLAGS_SHARED))
+			if (!(flags1 & FLAGS_SHARED))
 				goto retry_private;
 
 			goto retry;
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index e4c8ec7..dde9b74 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -106,9 +106,9 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 	case FUTEX_WAKE_BITSET:
 		return futex_wake(uaddr, flags, val, val3);
 	case FUTEX_REQUEUE:
-		return futex_requeue(uaddr, flags, uaddr2, val, val2, NULL, 0);
+		return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, NULL, 0);
 	case FUTEX_CMP_REQUEUE:
-		return futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 0);
+		return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, &val3, 0);
 	case FUTEX_WAKE_OP:
 		return futex_wake_op(uaddr, flags, uaddr2, val, val2, val3);
 	case FUTEX_LOCK_PI:
@@ -125,7 +125,7 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		return futex_wait_requeue_pi(uaddr, flags, val, timeout, val3,
 					     uaddr2);
 	case FUTEX_CMP_REQUEUE_PI:
-		return futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
+		return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, &val3, 1);
 	}
 	return -ENOSYS;
 }

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip: locking/core] futex: Propagate flags into get_futex_key()
  2023-09-21 10:45 ` [PATCH v3 08/15] futex: Propagate flags into get_futex_key() peterz
@ 2023-09-21 17:36   ` tip-bot2 for peterz@infradead.org
  0 siblings, 0 replies; 36+ messages in thread
From: tip-bot2 for peterz@infradead.org @ 2023-09-21 17:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     3b63a55f498b763aba0886b244df613587a73c46
Gitweb:        https://git.kernel.org/tip/3b63a55f498b763aba0886b244df613587a73c46
Author:        peterz@infradead.org <peterz@infradead.org>
AuthorDate:    Thu, 21 Sep 2023 12:45:13 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 21 Sep 2023 19:22:09 +02:00

futex: Propagate flags into get_futex_key()

Instead of only passing FLAGS_SHARED as a boolean, pass down flags as
a whole.

No functional change intended.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230921105248.282857501@noisy.programming.kicks-ass.net
---
 kernel/futex/core.c     |  7 +++++--
 kernel/futex/futex.h    |  2 +-
 kernel/futex/pi.c       |  4 ++--
 kernel/futex/requeue.c  |  6 +++---
 kernel/futex/waitwake.c | 14 +++++++-------
 5 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index d1d7b3c..ade7c73 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -193,7 +193,7 @@ static u64 get_inode_sequence_number(struct inode *inode)
 /**
  * get_futex_key() - Get parameters which are the keys for a futex
  * @uaddr:	virtual address of the futex
- * @fshared:	false for a PROCESS_PRIVATE futex, true for PROCESS_SHARED
+ * @flags:	FLAGS_*
  * @key:	address where result is stored.
  * @rw:		mapping needs to be read/write (values: FUTEX_READ,
  *              FUTEX_WRITE)
@@ -217,7 +217,7 @@ static u64 get_inode_sequence_number(struct inode *inode)
  *
  * lock_page() might sleep, the caller should not hold a spinlock.
  */
-int get_futex_key(u32 __user *uaddr, bool fshared, union futex_key *key,
+int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *key,
 		  enum futex_access rw)
 {
 	unsigned long address = (unsigned long)uaddr;
@@ -226,6 +226,9 @@ int get_futex_key(u32 __user *uaddr, bool fshared, union futex_key *key,
 	struct folio *folio;
 	struct address_space *mapping;
 	int err, ro = 0;
+	bool fshared;
+
+	fshared = flags & FLAGS_SHARED;
 
 	/*
 	 * The futex address must be "naturally" aligned.
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index e74888a..a8ea5ef 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -184,7 +184,7 @@ enum futex_access {
 	FUTEX_WRITE
 };
 
-extern int get_futex_key(u32 __user *uaddr, bool fshared, union futex_key *key,
+extern int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *key,
 			 enum futex_access rw);
 
 extern struct hrtimer_sleeper *
diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index d636a1b..90e5197 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -933,7 +933,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 	to = futex_setup_timer(time, &timeout, flags, 0);
 
 retry:
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key, FUTEX_WRITE);
+	ret = get_futex_key(uaddr, flags, &q.key, FUTEX_WRITE);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -1129,7 +1129,7 @@ retry:
 	if ((uval & FUTEX_TID_MASK) != vpid)
 		return -EPERM;
 
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, FUTEX_WRITE);
+	ret = get_futex_key(uaddr, flags, &key, FUTEX_WRITE);
 	if (ret)
 		return ret;
 
diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index 4c73e0b..5bf6958 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -424,10 +424,10 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 	}
 
 retry:
-	ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, FUTEX_READ);
+	ret = get_futex_key(uaddr1, flags, &key1, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
-	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2,
+	ret = get_futex_key(uaddr2, flags, &key2,
 			    requeue_pi ? FUTEX_WRITE : FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
@@ -789,7 +789,7 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 	 */
 	rt_mutex_init_waiter(&rt_waiter);
 
-	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, FUTEX_WRITE);
+	ret = get_futex_key(uaddr2, flags, &key2, FUTEX_WRITE);
 	if (unlikely(ret != 0))
 		goto out;
 
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index b109a08..37860f7 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -145,13 +145,13 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
 	union futex_key key = FUTEX_KEY_INIT;
-	int ret;
 	DEFINE_WAKE_Q(wake_q);
+	int ret;
 
 	if (!bitset)
 		return -EINVAL;
 
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, FUTEX_READ);
+	ret = get_futex_key(uaddr, flags, &key, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
 
@@ -248,10 +248,10 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 	DEFINE_WAKE_Q(wake_q);
 
 retry:
-	ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, FUTEX_READ);
+	ret = get_futex_key(uaddr1, flags, &key1, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
-	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, FUTEX_WRITE);
+	ret = get_futex_key(uaddr2, flags, &key2, FUTEX_WRITE);
 	if (unlikely(ret != 0))
 		return ret;
 
@@ -426,7 +426,7 @@ retry:
 			continue;
 
 		ret = get_futex_key(u64_to_user_ptr(vs[i].w.uaddr),
-				    vs[i].w.flags & FLAGS_SHARED,
+				    vs[i].w.flags,
 				    &vs[i].q.key, FUTEX_READ);
 
 		if (unlikely(ret))
@@ -438,7 +438,7 @@ retry:
 	for (i = 0; i < count; i++) {
 		u32 __user *uaddr = (u32 __user *)(unsigned long)vs[i].w.uaddr;
 		struct futex_q *q = &vs[i].q;
-		u32 val = (u32)vs[i].w.val;
+		u32 val = vs[i].w.val;
 
 		hb = futex_q_lock(q);
 		ret = futex_get_value_locked(&uval, uaddr);
@@ -602,7 +602,7 @@ int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
 	 * while the syscall executes.
 	 */
 retry:
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q->key, FUTEX_READ);
+	ret = get_futex_key(uaddr, flags, &q->key, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
 

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip: locking/core] futex: Add sys_futex_wait()
  2023-09-21 10:45 ` [PATCH v3 07/15] futex: Add sys_futex_wait() peterz
@ 2023-09-21 17:36   ` tip-bot2 for peterz@infradead.org
  0 siblings, 0 replies; 36+ messages in thread
From: tip-bot2 for peterz@infradead.org @ 2023-09-21 17:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Thomas Gleixner, Geert Uytterhoeven, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     cb8c4312afca1b2dc64107e7e7cea81911055612
Gitweb:        https://git.kernel.org/tip/cb8c4312afca1b2dc64107e7e7cea81911055612
Author:        peterz@infradead.org <peterz@infradead.org>
AuthorDate:    Thu, 21 Sep 2023 12:45:12 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 21 Sep 2023 19:22:08 +02:00

futex: Add sys_futex_wait()

To complement sys_futex_waitv()/wake(), add sys_futex_wait(). This
syscall implements what was previously known as FUTEX_WAIT_BITSET
except it uses 'unsigned long' for the value and bitmask arguments,
takes timespec and clockid_t arguments for the absolute timeout and
uses FUTEX2 flags.

The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20230921105248.164324363@noisy.programming.kicks-ass.net
---
 arch/alpha/kernel/syscalls/syscall.tbl      |   1 +-
 arch/arm/tools/syscall.tbl                  |   1 +-
 arch/arm64/include/asm/unistd.h             |   2 +-
 arch/arm64/include/asm/unistd32.h           |   2 +-
 arch/ia64/kernel/syscalls/syscall.tbl       |   1 +-
 arch/m68k/kernel/syscalls/syscall.tbl       |   1 +-
 arch/microblaze/kernel/syscalls/syscall.tbl |   1 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl   |   1 +-
 arch/mips/kernel/syscalls/syscall_n64.tbl   |   1 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl   |   1 +-
 arch/parisc/kernel/syscalls/syscall.tbl     |   1 +-
 arch/powerpc/kernel/syscalls/syscall.tbl    |   1 +-
 arch/s390/kernel/syscalls/syscall.tbl       |   1 +-
 arch/sh/kernel/syscalls/syscall.tbl         |   1 +-
 arch/sparc/kernel/syscalls/syscall.tbl      |   1 +-
 arch/x86/entry/syscalls/syscall_32.tbl      |   1 +-
 arch/x86/entry/syscalls/syscall_64.tbl      |   1 +-
 arch/xtensa/kernel/syscalls/syscall.tbl     |   1 +-
 include/linux/syscalls.h                    |   4 +-
 include/uapi/asm-generic/unistd.h           |   4 +-
 kernel/futex/futex.h                        |   3 +-
 kernel/futex/syscalls.c                     | 120 ++++++++++++++-----
 kernel/futex/waitwake.c                     |  61 +++++-----
 kernel/sys_ni.c                             |   1 +-
 24 files changed, 156 insertions(+), 57 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 3b86519..c49f12f 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -493,3 +493,4 @@
 561	common	cachestat			sys_cachestat
 562	common	fchmodat2			sys_fchmodat2
 563	common	futex_wake			sys_futex_wake
+564	common	futex_wait			sys_futex_wait
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 714abeb..a6cf562 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -467,3 +467,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 63a8a9c..f33190f 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -39,7 +39,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		455
+#define __NR_compat_syscalls		456
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 6897468..6e7d372 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -913,6 +913,8 @@ __SYSCALL(__NR_cachestat, sys_cachestat)
 __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
 #define __NR_futex_wake 454
 __SYSCALL(__NR_futex_wake, sys_futex_wake)
+#define __NR_futex_wait 455
+__SYSCALL(__NR_futex_wait, sys_futex_wait)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index cd50247..4043f0c 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -374,3 +374,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 21eb35c..2484167 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -453,3 +453,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 3a4e851..f03927a 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -459,3 +459,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 6883ea3..dbb5edf 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -392,3 +392,4 @@
 451	n32	cachestat			sys_cachestat
 452	n32	fchmodat2			sys_fchmodat2
 454	n32	futex_wake			sys_futex_wake
+455	n32	futex_wait			sys_futex_wait
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 48bc0fb..faff8df 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -368,3 +368,4 @@
 451	n64	cachestat			sys_cachestat
 452	n64	fchmodat2			sys_fchmodat2
 454	n64	futex_wake			sys_futex_wake
+455	n64	futex_wait			sys_futex_wait
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index a92625f..542f756 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -441,3 +441,4 @@
 451	o32	cachestat			sys_cachestat
 452	o32	fchmodat2			sys_fchmodat2
 454	o32	futex_wake			sys_futex_wake
+455	o32	futex_wait			sys_futex_wait
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 57faa97..8e50e89 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -452,3 +452,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index e6c6ed6..ad33a99 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -540,3 +540,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 7547201..418853f 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -456,3 +456,4 @@
 451  common	cachestat		sys_cachestat			sys_cachestat
 452  common	fchmodat2		sys_fchmodat2			sys_fchmodat2
 454  common	futex_wake		sys_futex_wake			sys_futex_wake
+455  common	futex_wait		sys_futex_wait			sys_futex_wait
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 902a997..8ef9557 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -456,3 +456,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 8a1f887..df59a9d 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -499,3 +499,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 9e81323..0f66168 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -458,3 +458,4 @@
 451	i386	cachestat		sys_cachestat
 452	i386	fchmodat2		sys_fchmodat2
 454	i386	futex_wake		sys_futex_wake
+455	i386	futex_wait		sys_futex_wait
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index d10a600..ddf6288 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -376,6 +376,7 @@
 452	common	fchmodat2		sys_fchmodat2
 453	64	map_shadow_stack	sys_map_shadow_stack
 454	common	futex_wake		sys_futex_wake
+455	common	futex_wait		sys_futex_wait
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 4e511bf..ac278db 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -424,3 +424,4 @@
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
 454	common	futex_wake			sys_futex_wake
+455	common	futex_wait			sys_futex_wait
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e174ed8..11f3fdd 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -552,6 +552,10 @@ asmlinkage long sys_futex_waitv(struct futex_waitv *waiters,
 
 asmlinkage long sys_futex_wake(void __user *uaddr, unsigned long mask, int nr, unsigned int flags);
 
+asmlinkage long sys_futex_wait(void __user *uaddr, unsigned long val, unsigned long mask,
+			       unsigned int flags, struct __kernel_timespec __user *timespec,
+			       clockid_t clockid);
+
 asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp,
 			      struct __kernel_timespec __user *rmtp);
 asmlinkage long sys_nanosleep_time32(struct old_timespec32 __user *rqtp,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index f5454e6..f6553bd 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -824,9 +824,11 @@ __SYSCALL(__NR_cachestat, sys_cachestat)
 __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
 #define __NR_futex_wake 454
 __SYSCALL(__NR_futex_wake, sys_futex_wake)
+#define __NR_futex_wait 455
+__SYSCALL(__NR_futex_wait, sys_futex_wait)
 
 #undef __NR_syscalls
-#define __NR_syscalls 455
+#define __NR_syscalls 456
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index 0e7821a..e74888a 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -332,6 +332,9 @@ extern int futex_requeue(u32 __user *uaddr1, unsigned int flags,
 			 u32 __user *uaddr2, int nr_wake, int nr_requeue,
 			 u32 *cmpval, int requeue_pi);
 
+extern int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
+			struct hrtimer_sleeper *to, u32 bitset);
+
 extern int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
 		      ktime_t *abs_time, u32 bitset);
 
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index 4739892..e4c8ec7 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -221,6 +221,46 @@ static int futex_parse_waitv(struct futex_vector *futexv,
 	return 0;
 }
 
+static int futex2_setup_timeout(struct __kernel_timespec __user *timeout,
+				clockid_t clockid, struct hrtimer_sleeper *to)
+{
+	int flag_clkid = 0, flag_init = 0;
+	struct timespec64 ts;
+	ktime_t time;
+	int ret;
+
+	if (!timeout)
+		return 0;
+
+	if (clockid == CLOCK_REALTIME) {
+		flag_clkid = FLAGS_CLOCKRT;
+		flag_init = FUTEX_CLOCK_REALTIME;
+	}
+
+	if (clockid != CLOCK_REALTIME && clockid != CLOCK_MONOTONIC)
+		return -EINVAL;
+
+	if (get_timespec64(&ts, timeout))
+		return -EFAULT;
+
+	/*
+	 * Since there's no opcode for futex_waitv, use
+	 * FUTEX_WAIT_BITSET that uses absolute timeout as well
+	 */
+	ret = futex_init_timeout(FUTEX_WAIT_BITSET, flag_init, &ts, &time);
+	if (ret)
+		return ret;
+
+	futex_setup_timer(&time, to, flag_clkid, 0);
+	return 0;
+}
+
+static inline void futex2_destroy_timeout(struct hrtimer_sleeper *to)
+{
+	hrtimer_cancel(&to->timer);
+	destroy_hrtimer_on_stack(&to->timer);
+}
+
 /**
  * sys_futex_waitv - Wait on a list of futexes
  * @waiters:    List of futexes to wait on
@@ -250,8 +290,6 @@ SYSCALL_DEFINE5(futex_waitv, struct futex_waitv __user *, waiters,
 {
 	struct hrtimer_sleeper to;
 	struct futex_vector *futexv;
-	struct timespec64 ts;
-	ktime_t time;
 	int ret;
 
 	/* This syscall supports no flags for now */
@@ -261,30 +299,8 @@ SYSCALL_DEFINE5(futex_waitv, struct futex_waitv __user *, waiters,
 	if (!nr_futexes || nr_futexes > FUTEX_WAITV_MAX || !waiters)
 		return -EINVAL;
 
-	if (timeout) {
-		int flag_clkid = 0, flag_init = 0;
-
-		if (clockid == CLOCK_REALTIME) {
-			flag_clkid = FLAGS_CLOCKRT;
-			flag_init = FUTEX_CLOCK_REALTIME;
-		}
-
-		if (clockid != CLOCK_REALTIME && clockid != CLOCK_MONOTONIC)
-			return -EINVAL;
-
-		if (get_timespec64(&ts, timeout))
-			return -EFAULT;
-
-		/*
-		 * Since there's no opcode for futex_waitv, use
-		 * FUTEX_WAIT_BITSET that uses absolute timeout as well
-		 */
-		ret = futex_init_timeout(FUTEX_WAIT_BITSET, flag_init, &ts, &time);
-		if (ret)
-			return ret;
-
-		futex_setup_timer(&time, &to, flag_clkid, 0);
-	}
+	if (timeout && (ret = futex2_setup_timeout(timeout, clockid, &to)))
+		return ret;
 
 	futexv = kcalloc(nr_futexes, sizeof(*futexv), GFP_KERNEL);
 	if (!futexv) {
@@ -299,10 +315,8 @@ SYSCALL_DEFINE5(futex_waitv, struct futex_waitv __user *, waiters,
 	kfree(futexv);
 
 destroy_timer:
-	if (timeout) {
-		hrtimer_cancel(&to.timer);
-		destroy_hrtimer_on_stack(&to.timer);
-	}
+	if (timeout)
+		futex2_destroy_timeout(&to);
 	return ret;
 }
 
@@ -336,6 +350,52 @@ SYSCALL_DEFINE4(futex_wake,
 	return futex_wake(uaddr, FLAGS_STRICT | flags, nr, mask);
 }
 
+/*
+ * sys_futex_wait - Wait on a futex
+ * @uaddr:	Address of the futex to wait on
+ * @val:	Value of @uaddr
+ * @mask:	bitmask
+ * @flags:	FUTEX2 flags
+ * @timeout:	Optional absolute timeout
+ * @clockid:	Clock to be used for the timeout, realtime or monotonic
+ *
+ * Identical to the traditional FUTEX_WAIT_BITSET op, except it is part of the
+ * futex2 familiy of calls.
+ */
+
+SYSCALL_DEFINE6(futex_wait,
+		void __user *, uaddr,
+		unsigned long, val,
+		unsigned long, mask,
+		unsigned int, flags,
+		struct __kernel_timespec __user *, timeout,
+		clockid_t, clockid)
+{
+	struct hrtimer_sleeper to;
+	int ret;
+
+	if (flags & ~FUTEX2_VALID_MASK)
+		return -EINVAL;
+
+	flags = futex2_to_flags(flags);
+	if (!futex_flags_valid(flags))
+		return -EINVAL;
+
+	if (!futex_validate_input(flags, val) ||
+	    !futex_validate_input(flags, mask))
+		return -EINVAL;
+
+	if (timeout && (ret = futex2_setup_timeout(timeout, clockid, &to)))
+		return ret;
+
+	ret = __futex_wait(uaddr, flags, val, timeout ? &to : NULL, mask);
+
+	if (timeout)
+		futex2_destroy_timeout(&to);
+
+	return ret;
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE2(set_robust_list,
 		struct compat_robust_list_head __user *, head,
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index ceb05b8..b109a08 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -632,20 +632,18 @@ retry_private:
 	return ret;
 }
 
-int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, ktime_t *abs_time, u32 bitset)
+int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
+		 struct hrtimer_sleeper *to, u32 bitset)
 {
-	struct hrtimer_sleeper timeout, *to;
-	struct restart_block *restart;
-	struct futex_hash_bucket *hb;
 	struct futex_q q = futex_q_init;
+	struct futex_hash_bucket *hb;
 	int ret;
 
 	if (!bitset)
 		return -EINVAL;
+
 	q.bitset = bitset;
 
-	to = futex_setup_timer(abs_time, &timeout, flags,
-			       current->timer_slack_ns);
 retry:
 	/*
 	 * Prepare to wait on uaddr. On success, it holds hb->lock and q
@@ -653,18 +651,17 @@ retry:
 	 */
 	ret = futex_wait_setup(uaddr, val, flags, &q, &hb);
 	if (ret)
-		goto out;
+		return ret;
 
 	/* futex_queue and wait for wakeup, timeout, or a signal. */
 	futex_wait_queue(hb, &q, to);
 
 	/* If we were woken (and unqueued), we succeeded, whatever. */
-	ret = 0;
 	if (!futex_unqueue(&q))
-		goto out;
-	ret = -ETIMEDOUT;
+		return 0;
+
 	if (to && !to->task)
-		goto out;
+		return -ETIMEDOUT;
 
 	/*
 	 * We expect signal_pending(current), but we might be the
@@ -673,24 +670,38 @@ retry:
 	if (!signal_pending(current))
 		goto retry;
 
-	ret = -ERESTARTSYS;
-	if (!abs_time)
-		goto out;
+	return -ERESTARTSYS;
+}
 
-	restart = &current->restart_block;
-	restart->futex.uaddr = uaddr;
-	restart->futex.val = val;
-	restart->futex.time = *abs_time;
-	restart->futex.bitset = bitset;
-	restart->futex.flags = flags | FLAGS_HAS_TIMEOUT;
+int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, ktime_t *abs_time, u32 bitset)
+{
+	struct hrtimer_sleeper timeout, *to;
+	struct restart_block *restart;
+	int ret;
+
+	to = futex_setup_timer(abs_time, &timeout, flags,
+			       current->timer_slack_ns);
+
+	ret = __futex_wait(uaddr, flags, val, to, bitset);
+
+	/* No timeout, nothing to clean up. */
+	if (!to)
+		return ret;
+
+	hrtimer_cancel(&to->timer);
+	destroy_hrtimer_on_stack(&to->timer);
 
-	ret = set_restart_fn(restart, futex_wait_restart);
+	if (ret == -ERESTARTSYS) {
+		restart = &current->restart_block;
+		restart->futex.uaddr = uaddr;
+		restart->futex.val = val;
+		restart->futex.time = *abs_time;
+		restart->futex.bitset = bitset;
+		restart->futex.flags = flags | FLAGS_HAS_TIMEOUT;
 
-out:
-	if (to) {
-		hrtimer_cancel(&to->timer);
-		destroy_hrtimer_on_stack(&to->timer);
+		return set_restart_fn(restart, futex_wait_restart);
 	}
+
 	return ret;
 }
 
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 983c058..13df391 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -88,6 +88,7 @@ COND_SYSCALL(get_robust_list);
 COND_SYSCALL_COMPAT(get_robust_list);
 COND_SYSCALL(futex_waitv);
 COND_SYSCALL(futex_wake);
+COND_SYSCALL(futex_wait);
 COND_SYSCALL(kexec_load);
 COND_SYSCALL_COMPAT(kexec_load);
 COND_SYSCALL(init_module);

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip: locking/core] futex: FLAGS_STRICT
  2023-09-21 10:45 ` [PATCH v3 06/15] futex: FLAGS_STRICT peterz
  2023-09-21 15:14   ` Thomas Gleixner
@ 2023-09-21 17:36   ` tip-bot2 for peterz@infradead.org
  1 sibling, 0 replies; 36+ messages in thread
From: tip-bot2 for peterz@infradead.org @ 2023-09-21 17:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: andrealmeid, Peter Zijlstra (Intel), Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     43adf844951084c266f172561f84c5f8120dd60b
Gitweb:        https://git.kernel.org/tip/43adf844951084c266f172561f84c5f8120dd60b
Author:        peterz@infradead.org <peterz@infradead.org>
AuthorDate:    Thu, 21 Sep 2023 12:45:11 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 21 Sep 2023 19:22:07 +02:00

futex: FLAGS_STRICT

The current semantics for futex_wake() are a bit loose, specifically
asking for 0 futexes to be woken actually gets you 1.

Adding a !nr check to sys_futex_wake() makes that it would return 0
for unaligned futex words, because that check comes in the shared
futex_wake() function. Adding the !nr check there, would affect the
legacy sys_futex() semantics.

Hence frob a flag :-(

Suggested-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230921105248.048643656@noisy.programming.kicks-ass.net
---
 kernel/futex/futex.h    | 21 +++++++++++----------
 kernel/futex/syscalls.c |  2 +-
 kernel/futex/waitwake.c |  3 +++
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index a3f1fce..0e7821a 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -17,25 +17,26 @@
  * Futex flags used to encode options to functions and preserve them across
  * restarts.
  */
-#define FLAGS_SIZE_8		0x00
-#define FLAGS_SIZE_16		0x01
-#define FLAGS_SIZE_32		0x02
-#define FLAGS_SIZE_64		0x03
+#define FLAGS_SIZE_8		0x0000
+#define FLAGS_SIZE_16		0x0001
+#define FLAGS_SIZE_32		0x0002
+#define FLAGS_SIZE_64		0x0003
 
-#define FLAGS_SIZE_MASK		0x03
+#define FLAGS_SIZE_MASK		0x0003
 
 #ifdef CONFIG_MMU
-# define FLAGS_SHARED		0x10
+# define FLAGS_SHARED		0x0010
 #else
 /*
  * NOMMU does not have per process address space. Let the compiler optimize
  * code away.
  */
-# define FLAGS_SHARED		0x00
+# define FLAGS_SHARED		0x0000
 #endif
-#define FLAGS_CLOCKRT		0x20
-#define FLAGS_HAS_TIMEOUT	0x40
-#define FLAGS_NUMA		0x80
+#define FLAGS_CLOCKRT		0x0020
+#define FLAGS_HAS_TIMEOUT	0x0040
+#define FLAGS_NUMA		0x0080
+#define FLAGS_STRICT		0x0100
 
 /* FUTEX_ to FLAGS_ */
 static inline unsigned int futex_to_flags(unsigned int op)
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index 7049a52..4739892 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -333,7 +333,7 @@ SYSCALL_DEFINE4(futex_wake,
 	if (!futex_validate_input(flags, mask))
 		return -EINVAL;
 
-	return futex_wake(uaddr, flags, nr, mask);
+	return futex_wake(uaddr, FLAGS_STRICT | flags, nr, mask);
 }
 
 #ifdef CONFIG_COMPAT
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index fa97577..ceb05b8 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -155,6 +155,9 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 	if (unlikely(ret != 0))
 		return ret;
 
+	if ((flags & FLAGS_STRICT) && !nr_wake)
+		return 0;
+
 	hb = futex_hash(&key);
 
 	/* Make sure we really have tasks to wakeup */

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip: locking/core] futex: Add sys_futex_wake()
  2023-09-21 10:45 ` [PATCH v3 05/15] futex: Add sys_futex_wake() peterz
@ 2023-09-21 17:36   ` tip-bot2 for peterz@infradead.org
  0 siblings, 0 replies; 36+ messages in thread
From: tip-bot2 for peterz@infradead.org @ 2023-09-21 17:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Thomas Gleixner, Geert Uytterhoeven, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     9f6c532f59b20580acf8ede9409c9b8dce6e74e1
Gitweb:        https://git.kernel.org/tip/9f6c532f59b20580acf8ede9409c9b8dce6e74e1
Author:        peterz@infradead.org <peterz@infradead.org>
AuthorDate:    Thu, 21 Sep 2023 12:45:10 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 21 Sep 2023 19:22:07 +02:00

futex: Add sys_futex_wake()

To complement sys_futex_waitv() add sys_futex_wake(). This syscall
implements what was previously known as FUTEX_WAKE_BITSET except it
uses 'unsigned long' for the bitmask and takes FUTEX2 flags.

The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20230921105247.936205525@noisy.programming.kicks-ass.net
---
 arch/alpha/kernel/syscalls/syscall.tbl      |  1 +-
 arch/arm/tools/syscall.tbl                  |  1 +-
 arch/arm64/include/asm/unistd.h             |  2 +-
 arch/arm64/include/asm/unistd32.h           |  2 +-
 arch/ia64/kernel/syscalls/syscall.tbl       |  1 +-
 arch/m68k/kernel/syscalls/syscall.tbl       |  1 +-
 arch/microblaze/kernel/syscalls/syscall.tbl |  1 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +-
 arch/mips/kernel/syscalls/syscall_n64.tbl   |  1 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl   |  1 +-
 arch/parisc/kernel/syscalls/syscall.tbl     |  1 +-
 arch/powerpc/kernel/syscalls/syscall.tbl    |  1 +-
 arch/s390/kernel/syscalls/syscall.tbl       |  1 +-
 arch/sh/kernel/syscalls/syscall.tbl         |  1 +-
 arch/sparc/kernel/syscalls/syscall.tbl      |  1 +-
 arch/x86/entry/syscalls/syscall_32.tbl      |  1 +-
 arch/x86/entry/syscalls/syscall_64.tbl      |  1 +-
 arch/xtensa/kernel/syscalls/syscall.tbl     |  1 +-
 include/linux/syscalls.h                    |  3 ++-
 include/uapi/asm-generic/unistd.h           |  4 ++-
 kernel/futex/syscalls.c                     | 30 ++++++++++++++++++++-
 kernel/sys_ni.c                             |  1 +-
 22 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index ad37569..3b86519 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -492,3 +492,4 @@
 560	common	set_mempolicy_home_node		sys_ni_syscall
 561	common	cachestat			sys_cachestat
 562	common	fchmodat2			sys_fchmodat2
+563	common	futex_wake			sys_futex_wake
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index c572d6c..714abeb 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -466,3 +466,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index bd77253..63a8a9c 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -39,7 +39,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		453
+#define __NR_compat_syscalls		455
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 78b6831..6897468 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -911,6 +911,8 @@ __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
 __SYSCALL(__NR_cachestat, sys_cachestat)
 #define __NR_fchmodat2 452
 __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
+#define __NR_futex_wake 454
+__SYSCALL(__NR_futex_wake, sys_futex_wake)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index 83d8609..cd50247 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -373,3 +373,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 259ceb1..21eb35c 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -452,3 +452,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index a3798c2..3a4e851 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -458,3 +458,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 152034b..6883ea3 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -391,3 +391,4 @@
 450	n32	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	n32	cachestat			sys_cachestat
 452	n32	fchmodat2			sys_fchmodat2
+454	n32	futex_wake			sys_futex_wake
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index cb5e757..48bc0fb 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -367,3 +367,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	n64	cachestat			sys_cachestat
 452	n64	fchmodat2			sys_fchmodat2
+454	n64	futex_wake			sys_futex_wake
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 1a64681..a92625f 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -440,3 +440,4 @@
 450	o32	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	o32	cachestat			sys_cachestat
 452	o32	fchmodat2			sys_fchmodat2
+454	o32	futex_wake			sys_futex_wake
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index e97c175..57faa97 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -451,3 +451,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 20e5058..e6c6ed6 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -539,3 +539,4 @@
 450 	nospu	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 0122cc1..7547201 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -455,3 +455,4 @@
 450  common	set_mempolicy_home_node	sys_set_mempolicy_home_node	sys_set_mempolicy_home_node
 451  common	cachestat		sys_cachestat			sys_cachestat
 452  common	fchmodat2		sys_fchmodat2			sys_fchmodat2
+454  common	futex_wake		sys_futex_wake			sys_futex_wake
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index e90d585..902a997 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -455,3 +455,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 4ed06c7..8a1f887 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -498,3 +498,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 2d0b1bd..9e81323 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -457,3 +457,4 @@
 450	i386	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	i386	cachestat		sys_cachestat
 452	i386	fchmodat2		sys_fchmodat2
+454	i386	futex_wake		sys_futex_wake
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 1d6eee3..d10a600 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -375,6 +375,7 @@
 451	common	cachestat		sys_cachestat
 452	common	fchmodat2		sys_fchmodat2
 453	64	map_shadow_stack	sys_map_shadow_stack
+454	common	futex_wake		sys_futex_wake
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index fc1a4f3..4e511bf 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -423,3 +423,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	fchmodat2			sys_fchmodat2
+454	common	futex_wake			sys_futex_wake
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 22bc6bc..e174ed8 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -549,6 +549,9 @@ asmlinkage long sys_set_robust_list(struct robust_list_head __user *head,
 asmlinkage long sys_futex_waitv(struct futex_waitv *waiters,
 				unsigned int nr_futexes, unsigned int flags,
 				struct __kernel_timespec __user *timeout, clockid_t clockid);
+
+asmlinkage long sys_futex_wake(void __user *uaddr, unsigned long mask, int nr, unsigned int flags);
+
 asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp,
 			      struct __kernel_timespec __user *rmtp);
 asmlinkage long sys_nanosleep_time32(struct old_timespec32 __user *rqtp,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index abe087c..f5454e6 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -822,9 +822,11 @@ __SYSCALL(__NR_cachestat, sys_cachestat)
 
 #define __NR_fchmodat2 452
 __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
+#define __NR_futex_wake 454
+__SYSCALL(__NR_futex_wake, sys_futex_wake)
 
 #undef __NR_syscalls
-#define __NR_syscalls 453
+#define __NR_syscalls 455
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index 2339f9c..7049a52 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -306,6 +306,36 @@ destroy_timer:
 	return ret;
 }
 
+/*
+ * sys_futex_wake - Wake a number of futexes
+ * @uaddr:	Address of the futex(es) to wake
+ * @mask:	bitmask
+ * @nr:		Number of the futexes to wake
+ * @flags:	FUTEX2 flags
+ *
+ * Identical to the traditional FUTEX_WAKE_BITSET op, except it is part of the
+ * futex2 family of calls.
+ */
+
+SYSCALL_DEFINE4(futex_wake,
+		void __user *, uaddr,
+		unsigned long, mask,
+		int, nr,
+		unsigned int, flags)
+{
+	if (flags & ~FUTEX2_VALID_MASK)
+		return -EINVAL;
+
+	flags = futex2_to_flags(flags);
+	if (!futex_flags_valid(flags))
+		return -EINVAL;
+
+	if (!futex_validate_input(flags, mask))
+		return -EINVAL;
+
+	return futex_wake(uaddr, flags, nr, mask);
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE2(set_robust_list,
 		struct compat_robust_list_head __user *, head,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index e137c13..983c058 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -87,6 +87,7 @@ COND_SYSCALL_COMPAT(set_robust_list);
 COND_SYSCALL(get_robust_list);
 COND_SYSCALL_COMPAT(get_robust_list);
 COND_SYSCALL(futex_waitv);
+COND_SYSCALL(futex_wake);
 COND_SYSCALL(kexec_load);
 COND_SYSCALL_COMPAT(kexec_load);
 COND_SYSCALL(init_module);

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip: locking/core] futex: Validate futex value against futex size
  2023-09-21 10:45 ` [PATCH v3 04/15] futex: Validate futex value against futex size peterz
@ 2023-09-21 17:36   ` tip-bot2 for peterz@infradead.org
  0 siblings, 0 replies; 36+ messages in thread
From: tip-bot2 for peterz@infradead.org @ 2023-09-21 17:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     698eb826383616ce0e817d2384da6413d1439fb6
Gitweb:        https://git.kernel.org/tip/698eb826383616ce0e817d2384da6413d1439fb6
Author:        peterz@infradead.org <peterz@infradead.org>
AuthorDate:    Thu, 21 Sep 2023 12:45:09 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 21 Sep 2023 19:22:06 +02:00

futex: Validate futex value against futex size

Ensure the futex value fits in the given futex size. Since this adds a
constraint to an existing syscall, it might possibly change behaviour.

Currently the value would be truncated to a u32 and any high bits
would get silently lost.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230921105247.828934099@noisy.programming.kicks-ass.net
---
 kernel/futex/futex.h    | 10 ++++++++++
 kernel/futex/syscalls.c |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index 68fc052..a3f1fce 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -85,6 +85,16 @@ static inline bool futex_flags_valid(unsigned int flags)
 	return true;
 }
 
+static inline bool futex_validate_input(unsigned int flags, u64 val)
+{
+	int bits = 8 * futex_size(flags);
+
+	if (bits < 64 && (val >> bits))
+		return false;
+
+	return true;
+}
+
 #ifdef CONFIG_FAIL_FUTEX
 extern bool should_fail_futex(bool fshared);
 #else
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index 948ac24..2339f9c 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -209,6 +209,9 @@ static int futex_parse_waitv(struct futex_vector *futexv,
 		if (!futex_flags_valid(flags))
 			return -EINVAL;
 
+		if (!futex_validate_input(flags, aux.val))
+			return -EINVAL;
+
 		futexv[i].w.flags = flags;
 		futexv[i].w.val = aux.val;
 		futexv[i].w.uaddr = aux.uaddr;

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip: locking/core] futex: Flag conversion
  2023-09-21 10:45 ` [PATCH v3 03/15] futex: Flag conversion peterz
@ 2023-09-21 17:36   ` tip-bot2 for peterz@infradead.org
  0 siblings, 0 replies; 36+ messages in thread
From: tip-bot2 for peterz@infradead.org @ 2023-09-21 17:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Thomas Gleixner, andrealmeid, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     5694289ce183bc3336407a78c8c722a0b9208f9b
Gitweb:        https://git.kernel.org/tip/5694289ce183bc3336407a78c8c722a0b9208f9b
Author:        peterz@infradead.org <peterz@infradead.org>
AuthorDate:    Thu, 21 Sep 2023 12:45:08 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 21 Sep 2023 19:22:05 +02:00

futex: Flag conversion

Futex has 3 sets of flags:

 - legacy futex op bits
 - futex2 flags
 - internal flags

Add a few helpers to convert from the API flags into the internal
flags.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/r/20230921105247.722140574@noisy.programming.kicks-ass.net
---
 kernel/futex/futex.h    | 63 ++++++++++++++++++++++++++++++++++++++--
 kernel/futex/syscalls.c | 24 +++++----------
 kernel/futex/waitwake.c |  4 +--
 3 files changed, 71 insertions(+), 20 deletions(-)

diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index b5379c0..68fc052 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -5,6 +5,7 @@
 #include <linux/futex.h>
 #include <linux/rtmutex.h>
 #include <linux/sched/wake_q.h>
+#include <linux/compat.h>
 
 #ifdef CONFIG_PREEMPT_RT
 #include <linux/rcuwait.h>
@@ -16,8 +17,15 @@
  * Futex flags used to encode options to functions and preserve them across
  * restarts.
  */
+#define FLAGS_SIZE_8		0x00
+#define FLAGS_SIZE_16		0x01
+#define FLAGS_SIZE_32		0x02
+#define FLAGS_SIZE_64		0x03
+
+#define FLAGS_SIZE_MASK		0x03
+
 #ifdef CONFIG_MMU
-# define FLAGS_SHARED		0x01
+# define FLAGS_SHARED		0x10
 #else
 /*
  * NOMMU does not have per process address space. Let the compiler optimize
@@ -25,8 +33,57 @@
  */
 # define FLAGS_SHARED		0x00
 #endif
-#define FLAGS_CLOCKRT		0x02
-#define FLAGS_HAS_TIMEOUT	0x04
+#define FLAGS_CLOCKRT		0x20
+#define FLAGS_HAS_TIMEOUT	0x40
+#define FLAGS_NUMA		0x80
+
+/* FUTEX_ to FLAGS_ */
+static inline unsigned int futex_to_flags(unsigned int op)
+{
+	unsigned int flags = FLAGS_SIZE_32;
+
+	if (!(op & FUTEX_PRIVATE_FLAG))
+		flags |= FLAGS_SHARED;
+
+	if (op & FUTEX_CLOCK_REALTIME)
+		flags |= FLAGS_CLOCKRT;
+
+	return flags;
+}
+
+/* FUTEX2_ to FLAGS_ */
+static inline unsigned int futex2_to_flags(unsigned int flags2)
+{
+	unsigned int flags = flags2 & FUTEX2_SIZE_MASK;
+
+	if (!(flags2 & FUTEX2_PRIVATE))
+		flags |= FLAGS_SHARED;
+
+	if (flags2 & FUTEX2_NUMA)
+		flags |= FLAGS_NUMA;
+
+	return flags;
+}
+
+static inline unsigned int futex_size(unsigned int flags)
+{
+	return 1 << (flags & FLAGS_SIZE_MASK);
+}
+
+static inline bool futex_flags_valid(unsigned int flags)
+{
+	/* Only 64bit futexes for 64bit code */
+	if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
+		if ((flags & FLAGS_SIZE_MASK) == FLAGS_SIZE_64)
+			return false;
+	}
+
+	/* Only 32bit futexes are implemented -- for now */
+	if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
+		return false;
+
+	return true;
+}
 
 #ifdef CONFIG_FAIL_FUTEX
 extern bool should_fail_futex(bool fshared);
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index 953f0a4..948ac24 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -1,6 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 
-#include <linux/compat.h>
 #include <linux/syscalls.h>
 #include <linux/time_namespace.h>
 
@@ -85,15 +84,12 @@ err_unlock:
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		u32 __user *uaddr2, u32 val2, u32 val3)
 {
+	unsigned int flags = futex_to_flags(op);
 	int cmd = op & FUTEX_CMD_MASK;
-	unsigned int flags = 0;
 
-	if (!(op & FUTEX_PRIVATE_FLAG))
-		flags |= FLAGS_SHARED;
-
-	if (op & FUTEX_CLOCK_REALTIME) {
-		flags |= FLAGS_CLOCKRT;
-		if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI &&
+	if (flags & FLAGS_CLOCKRT) {
+		if (cmd != FUTEX_WAIT_BITSET &&
+		    cmd != FUTEX_WAIT_REQUEUE_PI &&
 		    cmd != FUTEX_LOCK_PI2)
 			return -ENOSYS;
 	}
@@ -201,21 +197,19 @@ static int futex_parse_waitv(struct futex_vector *futexv,
 	unsigned int i;
 
 	for (i = 0; i < nr_futexes; i++) {
+		unsigned int flags;
+
 		if (copy_from_user(&aux, &uwaitv[i], sizeof(aux)))
 			return -EFAULT;
 
 		if ((aux.flags & ~FUTEX2_VALID_MASK) || aux.__reserved)
 			return -EINVAL;
 
-		if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
-			if ((aux.flags & FUTEX2_SIZE_MASK) == FUTEX2_SIZE_U64)
-				return -EINVAL;
-		}
-
-		if ((aux.flags & FUTEX2_SIZE_MASK) != FUTEX2_SIZE_U32)
+		flags = futex2_to_flags(aux.flags);
+		if (!futex_flags_valid(flags))
 			return -EINVAL;
 
-		futexv[i].w.flags = aux.flags;
+		futexv[i].w.flags = flags;
 		futexv[i].w.val = aux.val;
 		futexv[i].w.uaddr = aux.uaddr;
 		futexv[i].q = futex_q_init;
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index ba01b94..fa97577 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -419,11 +419,11 @@ static int futex_wait_multiple_setup(struct futex_vector *vs, int count, int *wo
 	 */
 retry:
 	for (i = 0; i < count; i++) {
-		if ((vs[i].w.flags & FUTEX_PRIVATE_FLAG) && retry)
+		if (!(vs[i].w.flags & FLAGS_SHARED) && retry)
 			continue;
 
 		ret = get_futex_key(u64_to_user_ptr(vs[i].w.uaddr),
-				    !(vs[i].w.flags & FUTEX_PRIVATE_FLAG),
+				    vs[i].w.flags & FLAGS_SHARED,
 				    &vs[i].q.key, FUTEX_READ);
 
 		if (unlikely(ret))

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip: locking/core] futex: Clarify FUTEX2 flags
  2023-09-21 10:45 ` [PATCH v3 01/15] futex: Clarify FUTEX2 flags peterz
@ 2023-09-21 17:36   ` tip-bot2 for peterz@infradead.org
  0 siblings, 0 replies; 36+ messages in thread
From: tip-bot2 for peterz@infradead.org @ 2023-09-21 17:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Thomas Gleixner, andrealmeid, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     4923954bbc4a760e0b2210e0cb5733726ac2e2e9
Gitweb:        https://git.kernel.org/tip/4923954bbc4a760e0b2210e0cb5733726ac2e2e9
Author:        peterz@infradead.org <peterz@infradead.org>
AuthorDate:    Thu, 21 Sep 2023 12:45:06 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 21 Sep 2023 19:22:04 +02:00

futex: Clarify FUTEX2 flags

sys_futex_waitv() is part of the futex2 series (the first and only so
far) of syscalls and has a flags field per futex (as opposed to flags
being encoded in the futex op).

This new flags field has a new namespace, which unfortunately isn't
super explicit. Notably it currently takes FUTEX_32 and
FUTEX_PRIVATE_FLAG.

Introduce the FUTEX2 namespace to clarify this

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/r/20230921105247.507327749@noisy.programming.kicks-ass.net
---
 include/uapi/linux/futex.h | 16 +++++++++++++---
 kernel/futex/syscalls.c    |  7 +++----
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h
index 71a5df8..21d4eff 100644
--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -44,10 +44,20 @@
 					 FUTEX_PRIVATE_FLAG)
 
 /*
- * Flags to specify the bit length of the futex word for futex2 syscalls.
- * Currently, only 32 is supported.
+ * Flags for futex2 syscalls.
  */
-#define FUTEX_32		2
+			/*	0x00 */
+			/*	0x01 */
+#define FUTEX2_SIZE_U32		0x02
+			/*	0x04 */
+			/*	0x08 */
+			/*	0x10 */
+			/*	0x20 */
+			/*	0x40 */
+#define FUTEX2_PRIVATE		FUTEX_PRIVATE_FLAG
+
+/* do not use */
+#define FUTEX_32		FUTEX2_SIZE_U32 /* historical accident :-( */
 
 /*
  * Max numbers of elements in a futex_waitv array
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index a807407..ff696b0 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -183,8 +183,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 
-/* Mask of available flags for each futex in futex_waitv list */
-#define FUTEXV_WAITER_MASK (FUTEX_32 | FUTEX_PRIVATE_FLAG)
+#define FUTEX2_VALID_MASK (FUTEX2_SIZE_U32 | FUTEX2_PRIVATE)
 
 /**
  * futex_parse_waitv - Parse a waitv array from userspace
@@ -205,10 +204,10 @@ static int futex_parse_waitv(struct futex_vector *futexv,
 		if (copy_from_user(&aux, &uwaitv[i], sizeof(aux)))
 			return -EFAULT;
 
-		if ((aux.flags & ~FUTEXV_WAITER_MASK) || aux.__reserved)
+		if ((aux.flags & ~FUTEX2_VALID_MASK) || aux.__reserved)
 			return -EINVAL;
 
-		if (!(aux.flags & FUTEX_32))
+		if (!(aux.flags & FUTEX2_SIZE_U32))
 			return -EINVAL;
 
 		futexv[i].w.flags = aux.flags;

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip: locking/core] futex: Extend the FUTEX2 flags
  2023-09-21 10:45 ` [PATCH v3 02/15] futex: Extend the " peterz
@ 2023-09-21 17:36   ` tip-bot2 for peterz@infradead.org
  0 siblings, 0 replies; 36+ messages in thread
From: tip-bot2 for peterz@infradead.org @ 2023-09-21 17:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Thomas Gleixner, andrealmeid, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     d6d08d24790e82c69a46ef78ae44fe1b1ed30775
Gitweb:        https://git.kernel.org/tip/d6d08d24790e82c69a46ef78ae44fe1b1ed30775
Author:        peterz@infradead.org <peterz@infradead.org>
AuthorDate:    Thu, 21 Sep 2023 12:45:07 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 21 Sep 2023 19:22:05 +02:00

futex: Extend the FUTEX2 flags

Add the definition for the missing but always intended extra sizes,
and add a NUMA flag for the planned numa extention.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/r/20230921105247.617057368@noisy.programming.kicks-ass.net
---
 include/uapi/linux/futex.h | 21 ++++++++++++++++++---
 kernel/futex/syscalls.c    |  9 +++++++--
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h
index 21d4eff..d2ee625 100644
--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -45,17 +45,32 @@
 
 /*
  * Flags for futex2 syscalls.
+ *
+ * NOTE: these are not pure flags, they can also be seen as:
+ *
+ *   union {
+ *     u32  flags;
+ *     struct {
+ *       u32 size    : 2,
+ *           numa    : 1,
+ *                   : 4,
+ *           private : 1;
+ *     };
+ *   };
  */
-			/*	0x00 */
-			/*	0x01 */
+#define FUTEX2_SIZE_U8		0x00
+#define FUTEX2_SIZE_U16		0x01
 #define FUTEX2_SIZE_U32		0x02
-			/*	0x04 */
+#define FUTEX2_SIZE_U64		0x03
+#define FUTEX2_NUMA		0x04
 			/*	0x08 */
 			/*	0x10 */
 			/*	0x20 */
 			/*	0x40 */
 #define FUTEX2_PRIVATE		FUTEX_PRIVATE_FLAG
 
+#define FUTEX2_SIZE_MASK	0x03
+
 /* do not use */
 #define FUTEX_32		FUTEX2_SIZE_U32 /* historical accident :-( */
 
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index ff696b0..953f0a4 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -183,7 +183,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 
-#define FUTEX2_VALID_MASK (FUTEX2_SIZE_U32 | FUTEX2_PRIVATE)
+#define FUTEX2_VALID_MASK (FUTEX2_SIZE_MASK | FUTEX2_PRIVATE)
 
 /**
  * futex_parse_waitv - Parse a waitv array from userspace
@@ -207,7 +207,12 @@ static int futex_parse_waitv(struct futex_vector *futexv,
 		if ((aux.flags & ~FUTEX2_VALID_MASK) || aux.__reserved)
 			return -EINVAL;
 
-		if (!(aux.flags & FUTEX2_SIZE_U32))
+		if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
+			if ((aux.flags & FUTEX2_SIZE_MASK) == FUTEX2_SIZE_U64)
+				return -EINVAL;
+		}
+
+		if ((aux.flags & FUTEX2_SIZE_MASK) != FUTEX2_SIZE_U32)
 			return -EINVAL;
 
 		futexv[i].w.flags = aux.flags;

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 10/15] futex: Add sys_futex_requeue()
  2023-09-21 10:45 ` [PATCH v3 10/15] futex: Add sys_futex_requeue() peterz
  2023-09-21 17:35   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
@ 2023-09-22  9:35   ` Ingo Molnar
  2023-09-22 11:03     ` Peter Zijlstra
  1 sibling, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2023-09-22  9:35 UTC (permalink / raw)
  To: peterz
  Cc: tglx, axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke, Geert Uytterhoeven


* peterz@infradead.org <peterz@infradead.org> wrote:

> --- linux-2.6.orig/kernel/futex/syscalls.c
> +++ linux-2.6/kernel/futex/syscalls.c
> @@ -396,6 +396,44 @@ SYSCALL_DEFINE6(futex_wait,
>  	return ret;
>  }
>  
> +/*
> + * sys_futex_requeue - Requeue a waiter from one futex to another
> + * @waiters:	array describing the source and destination futex
> + * @flags:	unused
> + * @nr_wake:	number of futexes to wake
> + * @nr_requeue:	number of futexes to requeue
> + *
> + * Identical to the traditional FUTEX_CMP_REQUEUE op, except it is part of the
> + * futex2 family of calls.
> + */
> +
> +SYSCALL_DEFINE4(futex_requeue,
> +		struct futex_waitv __user *, waiters,
> +		unsigned int, flags,
> +		int, nr_wake,
> +		int, nr_requeue)
> +{
> +	struct futex_vector futexes[2];
> +	u32 cmpval;
> +	int ret;
> +
> +	if (flags)
> +		return -EINVAL;

Small detail, but isn't -ENOSYS the canonical error code for functionality 
not yet implemented - which the unused 'flags' ABI is arguably?

-EINVAL is for recognized but incorrect parameters, such as:

> +	if (!waiters)
> +		return -EINVAL;

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 10/15] futex: Add sys_futex_requeue()
  2023-09-22  9:35   ` [PATCH v3 10/15] " Ingo Molnar
@ 2023-09-22 11:03     ` Peter Zijlstra
  2023-09-22 13:56       ` Jens Axboe
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2023-09-22 11:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: tglx, axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke, Geert Uytterhoeven

On Fri, Sep 22, 2023 at 11:35:03AM +0200, Ingo Molnar wrote:
> 
> * peterz@infradead.org <peterz@infradead.org> wrote:
> 
> > --- linux-2.6.orig/kernel/futex/syscalls.c
> > +++ linux-2.6/kernel/futex/syscalls.c
> > @@ -396,6 +396,44 @@ SYSCALL_DEFINE6(futex_wait,
> >  	return ret;
> >  }
> >  
> > +/*
> > + * sys_futex_requeue - Requeue a waiter from one futex to another
> > + * @waiters:	array describing the source and destination futex
> > + * @flags:	unused
> > + * @nr_wake:	number of futexes to wake
> > + * @nr_requeue:	number of futexes to requeue
> > + *
> > + * Identical to the traditional FUTEX_CMP_REQUEUE op, except it is part of the
> > + * futex2 family of calls.
> > + */
> > +
> > +SYSCALL_DEFINE4(futex_requeue,
> > +		struct futex_waitv __user *, waiters,
> > +		unsigned int, flags,
> > +		int, nr_wake,
> > +		int, nr_requeue)
> > +{
> > +	struct futex_vector futexes[2];
> > +	u32 cmpval;
> > +	int ret;
> > +
> > +	if (flags)
> > +		return -EINVAL;
> 
> Small detail, but isn't -ENOSYS the canonical error code for functionality 
> not yet implemented - which the unused 'flags' ABI is arguably?
> 
> -EINVAL is for recognized but incorrect parameters, such as:

IIUC 'unknown flag' falls into the -EINVAL return category. Here we
happen to have no known flags, but that should not matter.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 10/15] futex: Add sys_futex_requeue()
  2023-09-22 11:03     ` Peter Zijlstra
@ 2023-09-22 13:56       ` Jens Axboe
  0 siblings, 0 replies; 36+ messages in thread
From: Jens Axboe @ 2023-09-22 13:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: tglx, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke, Geert Uytterhoeven

On 9/22/23 5:03 AM, Peter Zijlstra wrote:
> On Fri, Sep 22, 2023 at 11:35:03AM +0200, Ingo Molnar wrote:
>>
>> * peterz@infradead.org <peterz@infradead.org> wrote:
>>
>>> --- linux-2.6.orig/kernel/futex/syscalls.c
>>> +++ linux-2.6/kernel/futex/syscalls.c
>>> @@ -396,6 +396,44 @@ SYSCALL_DEFINE6(futex_wait,
>>>  	return ret;
>>>  }
>>>  
>>> +/*
>>> + * sys_futex_requeue - Requeue a waiter from one futex to another
>>> + * @waiters:	array describing the source and destination futex
>>> + * @flags:	unused
>>> + * @nr_wake:	number of futexes to wake
>>> + * @nr_requeue:	number of futexes to requeue
>>> + *
>>> + * Identical to the traditional FUTEX_CMP_REQUEUE op, except it is part of the
>>> + * futex2 family of calls.
>>> + */
>>> +
>>> +SYSCALL_DEFINE4(futex_requeue,
>>> +		struct futex_waitv __user *, waiters,
>>> +		unsigned int, flags,
>>> +		int, nr_wake,
>>> +		int, nr_requeue)
>>> +{
>>> +	struct futex_vector futexes[2];
>>> +	u32 cmpval;
>>> +	int ret;
>>> +
>>> +	if (flags)
>>> +		return -EINVAL;
>>
>> Small detail, but isn't -ENOSYS the canonical error code for functionality 
>> not yet implemented - which the unused 'flags' ABI is arguably?
>>
>> -EINVAL is for recognized but incorrect parameters, such as:
> 
> IIUC 'unknown flag' falls into the -EINVAL return category. Here we
> happen to have no known flags, but that should not matter.

Yep, -ENOSYS is for not having the syscall at all, -EINVAL for unknown
flags set.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 36+ messages in thread

* futex2 numa stuff
  2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
                   ` (14 preceding siblings ...)
  2023-09-21 10:45 ` [PATCH v3 15/15] futex,selftests: Extend the futex selftests peterz
@ 2023-09-22 20:01 ` Peter Zijlstra
  2023-09-22 20:01   ` [PATCH 15/15] futex,selftests: Extend the futex selftests Peter Zijlstra
                     ` (3 more replies)
  15 siblings, 4 replies; 36+ messages in thread
From: Peter Zijlstra @ 2023-09-22 20:01 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke, steve.shaw, marko.makela,
	andrei.artemev

Hi!

Updated version of patch 15/15 and a few extra patches for testing the
FUTEX2_NUMA bits. The last patch (17/15) should never be applied for anything
you care about and exists purely because I'm too lazy to generate actual
hash-bucket contention.

On my 2 node IVB-EP:

 $ echo FUTEX_SQUASH > /debug/sched/features

Effectively reducing each node to 1 bucket.

 $ numactl -m0 -N0 ./futex_numa -c10 -t2 -n0 -N0 &
   numactl -m1 -N1 ./futex_numa -c10 -t2 -n0 -N0

 ...
 contenders: 16154935
 contenders: 16202472

 $ numactl -m0 -N0 ./futex_numa -c10 -t2 -n0 -N0 &
   numactl -m1 -N1 ./futex_numa -c10 -t2 -n0 -N1

 contenders: 48584991
 contenders: 48680560

(loop counts, higher is better)

Clearly showing how separating the hashes works. 

The first one runs 10 contenders on each node but forces the (numa) futex to
hash to node 0 for both. This ensures all 20 contenders hash to the same
bucket and *ouch*.

The second one does the same, except now fully separates the nodes. Performance
is much improved.

Proving the per-node hashing actually works as advertised.

Further:

 $ ./futex_numa -t2 -n50000 -s1 -N
 ...
 node: -1
 node: -1
 node: 0
 node: 0
 node: -1
 node: -1
 node: 1
 node: 1
 ...
 total: 8980

Shows how a FUTEX2_NUMA lock can bounce around the nodes. The test has some
trivial asserts trying to show critical section integrity, but otherwise does
lock+unlock cycles with a nanosleep.

This both illustrates how to build a (trivial) lock using FUTEX2_NUMA and
proves the functionality works.




^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 15/15] futex,selftests: Extend the futex selftests
  2023-09-22 20:01 ` futex2 numa stuff Peter Zijlstra
@ 2023-09-22 20:01   ` Peter Zijlstra
  2023-09-22 20:01   ` [PATCH 16/15] futex,selftests: Extend the futex selftests for NUMA Peter Zijlstra
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2023-09-22 20:01 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke, steve.shaw, marko.makela,
	andrei.artemev

Extend the wait/requeue selftests to also cover the futex2 syscalls.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/testing/selftests/futex/functional/futex_requeue.c         |  100 +++++++++-
 tools/testing/selftests/futex/functional/futex_wait.c            |   56 ++++-
 tools/testing/selftests/futex/functional/futex_wait_timeout.c    |   16 +
 tools/testing/selftests/futex/functional/futex_wait_wouldblock.c |   28 ++
 tools/testing/selftests/futex/functional/futex_waitv.c           |   15 -
 tools/testing/selftests/futex/functional/run.sh                  |    6 
 tools/testing/selftests/futex/include/futex2test.h               |   52 +++++
 7 files changed, 243 insertions(+), 30 deletions(-)

--- a/tools/testing/selftests/futex/functional/futex_requeue.c
+++ b/tools/testing/selftests/futex/functional/futex_requeue.c
@@ -7,8 +7,10 @@
 
 #include <pthread.h>
 #include <limits.h>
+#include <stdbool.h>
 #include "logging.h"
 #include "futextest.h"
+#include "futex2test.h"
 
 #define TEST_NAME "futex-requeue"
 #define timeout_ns  30000000
@@ -16,24 +18,58 @@
 
 volatile futex_t *f1;
 
+bool futex2 = 0;
+bool mixed = 0;
+
 void usage(char *prog)
 {
 	printf("Usage: %s\n", prog);
 	printf("  -c	Use color\n");
+	printf("  -n	Use futex2 interface\n");
+	printf("  -x	Use mixed size futex\n");
 	printf("  -h	Display this help message\n");
 	printf("  -v L	Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n",
 	       VQUIET, VCRITICAL, VINFO);
 }
 
-void *waiterfn(void *arg)
+static void *waiterfn(void *arg)
 {
+	unsigned int flags = 0;
 	struct timespec to;
 
-	to.tv_sec = 0;
-	to.tv_nsec = timeout_ns;
+	if (futex2) {
+		unsigned long mask;
+
+		if (clock_gettime(CLOCK_MONOTONIC, &to)) {
+			printf("clock_gettime() failed errno %d", errno);
+			return NULL;
+		}
+
+		to.tv_nsec += timeout_ns;
+		if (to.tv_nsec >= 1000000000) {
+			to.tv_sec++;
+			to.tv_nsec -= 1000000000;
+		}
+
+		if (mixed) {
+			flags |= FUTEX2_SIZE_U16;
+			mask = (unsigned short)(~0U);
+		} else {
+			flags |= FUTEX2_SIZE_U32;
+			mask = (unsigned int)(~0U);
+		}
+
+		if (futex2_wait(f1, *f1, mask, flags,
+				&to, CLOCK_MONOTONIC))
+			printf("waiter failed errno %d\n", errno);
+	} else {
+
+		to.tv_sec = 0;
+		to.tv_nsec = timeout_ns;
 
-	if (futex_wait(f1, *f1, &to, 0))
-		printf("waiter failed errno %d\n", errno);
+		if (futex_wait(f1, *f1, &to, flags))
+			printf("waiter failed errno %d\n", errno);
+	}
 
 	return NULL;
 }
@@ -48,7 +84,7 @@ int main(int argc, char *argv[])
 
 	f1 = &_f1;
 
-	while ((c = getopt(argc, argv, "cht:v:")) != -1) {
+	while ((c = getopt(argc, argv, "xncht:v:")) != -1) {
 		switch (c) {
 		case 'c':
 			log_color(1);
@@ -59,6 +95,12 @@ int main(int argc, char *argv[])
 		case 'v':
 			log_verbosity(atoi(optarg));
 			break;
+		case 'x':
+			mixed=1;
+			/* fallthrough */
+		case 'n':
+			futex2=1;
+			break;
 		default:
 			usage(basename(argv[0]));
 			exit(1);
@@ -79,7 +121,22 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Requeuing 1 futex from f1 to f2\n");
-	res = futex_cmp_requeue(f1, 0, &f2, 0, 1, 0);
+	if (futex2) {
+		struct futex_waitv futexes[2] = {
+			{
+				.val = 0,
+				.uaddr = (unsigned long)f1,
+				.flags = mixed ? FUTEX2_SIZE_U16 : FUTEX2_SIZE_U32,
+			},
+			{
+				.uaddr = (unsigned long)&f2,
+				.flags = FUTEX2_SIZE_U32,
+			},
+		};
+		res = futex2_requeue(futexes, 0, 0, 1);
+	} else {
+		res = futex_cmp_requeue(f1, 0, &f2, 0, 1, 0);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_requeue simple returned: %d %s\n",
 				      res ? errno : res,
@@ -89,7 +146,11 @@ int main(int argc, char *argv[])
 
 
 	info("Waking 1 futex at f2\n");
-	res = futex_wake(&f2, 1, 0);
+	if (futex2) {
+		res = futex2_wake(&f2, ~0U, 1, FUTEX2_SIZE_U32);
+	} else {
+		res = futex_wake(&f2, 1, 0);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_requeue simple returned: %d %s\n",
 				      res ? errno : res,
@@ -112,7 +173,22 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Waking 3 futexes at f1 and requeuing 7 futexes from f1 to f2\n");
-	res = futex_cmp_requeue(f1, 0, &f2, 3, 7, 0);
+	if (futex2) {
+		struct futex_waitv futexes[2] = {
+			{
+				.val = 0,
+				.uaddr = (unsigned long)f1,
+				.flags = mixed ? FUTEX2_SIZE_U16 : FUTEX2_SIZE_U32,
+			},
+			{
+				.uaddr = (unsigned long)&f2,
+				.flags = FUTEX2_SIZE_U32,
+			},
+		};
+		res = futex2_requeue(futexes, 0, 3, 7);
+	} else {
+		res = futex_cmp_requeue(f1, 0, &f2, 3, 7, 0);
+	}
 	if (res != 10) {
 		ksft_test_result_fail("futex_requeue many returned: %d %s\n",
 				      res ? errno : res,
@@ -121,7 +197,11 @@ int main(int argc, char *argv[])
 	}
 
 	info("Waking INT_MAX futexes at f2\n");
-	res = futex_wake(&f2, INT_MAX, 0);
+	if (futex2) {
+		res = futex2_wake(&f2, ~0U, INT_MAX, FUTEX2_SIZE_U32);
+	} else {
+		res = futex_wake(&f2, INT_MAX, 0);
+	}
 	if (res != 7) {
 		ksft_test_result_fail("futex_requeue many returned: %d %s\n",
 				      res ? errno : res,
--- a/tools/testing/selftests/futex/functional/futex_wait.c
+++ b/tools/testing/selftests/futex/functional/futex_wait.c
@@ -9,8 +9,10 @@
 #include <sys/shm.h>
 #include <sys/mman.h>
 #include <fcntl.h>
+#include <stdbool.h>
 #include "logging.h"
 #include "futextest.h"
+#include "futex2test.h"
 
 #define TEST_NAME "futex-wait"
 #define timeout_ns  30000000
@@ -19,10 +21,13 @@
 
 void *futex;
 
+bool futex2 = 0;
+
 void usage(char *prog)
 {
 	printf("Usage: %s\n", prog);
 	printf("  -c	Use color\n");
+	printf("  -n	Use futex2 interface\n");
 	printf("  -h	Display this help message\n");
 	printf("  -v L	Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n",
 	       VQUIET, VCRITICAL, VINFO);
@@ -30,17 +35,35 @@ void usage(char *prog)
 
 static void *waiterfn(void *arg)
 {
-	struct timespec to;
 	unsigned int flags = 0;
+	struct timespec to;
 
 	if (arg)
 		flags = *((unsigned int *) arg);
 
-	to.tv_sec = 0;
-	to.tv_nsec = timeout_ns;
+	if (futex2) {
+		if (clock_gettime(CLOCK_MONOTONIC, &to)) {
+			printf("clock_gettime() failed errno %d", errno);
+			return NULL;
+		}
 
-	if (futex_wait(futex, 0, &to, flags))
-		printf("waiter failed errno %d\n", errno);
+		to.tv_nsec += timeout_ns;
+		if (to.tv_nsec >= 1000000000) {
+			to.tv_sec++;
+			to.tv_nsec -= 1000000000;
+		}
+
+		if (futex2_wait(futex, 0, ~0U, flags | FUTEX2_SIZE_U32,
+				&to, CLOCK_MONOTONIC))
+			printf("waiter failed errno %d\n", errno);
+	} else {
+
+		to.tv_sec = 0;
+		to.tv_nsec = timeout_ns;
+
+		if (futex_wait(futex, 0, &to, flags))
+			printf("waiter failed errno %d\n", errno);
+	}
 
 	return NULL;
 }
@@ -55,7 +78,7 @@ int main(int argc, char *argv[])
 
 	futex = &f_private;
 
-	while ((c = getopt(argc, argv, "cht:v:")) != -1) {
+	while ((c = getopt(argc, argv, "ncht:v:")) != -1) {
 		switch (c) {
 		case 'c':
 			log_color(1);
@@ -66,6 +89,9 @@ int main(int argc, char *argv[])
 		case 'v':
 			log_verbosity(atoi(optarg));
 			break;
+		case 'n':
+			futex2=1;
+			break;
 		default:
 			usage(basename(argv[0]));
 			exit(1);
@@ -84,7 +110,11 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Calling private futex_wake on futex: %p\n", futex);
-	res = futex_wake(futex, 1, FUTEX_PRIVATE_FLAG);
+	if (futex2) {
+		res = futex2_wake(futex, ~0U, 1, FUTEX2_SIZE_U32 | FUTEX2_PRIVATE);
+	} else {
+		res = futex_wake(futex, 1, FUTEX_PRIVATE_FLAG);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_wake private returned: %d %s\n",
 				      errno, strerror(errno));
@@ -112,7 +142,11 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Calling shared (page anon) futex_wake on futex: %p\n", futex);
-	res = futex_wake(futex, 1, 0);
+	if (futex2) {
+		res = futex2_wake(futex, ~0U, 1, FUTEX2_SIZE_U32);
+	} else {
+		res = futex_wake(futex, 1, 0);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_wake shared (page anon) returned: %d %s\n",
 				      errno, strerror(errno));
@@ -151,7 +185,11 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Calling shared (file backed) futex_wake on futex: %p\n", futex);
-	res = futex_wake(shm, 1, 0);
+	if (futex2) {
+		res = futex2_wake(shm, ~0U, 1, FUTEX2_SIZE_U32);
+	} else {
+		res = futex_wake(shm, 1, 0);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_wake shared (file backed) returned: %d %s\n",
 				      errno, strerror(errno));
--- a/tools/testing/selftests/futex/functional/futex_wait_timeout.c
+++ b/tools/testing/selftests/futex/functional/futex_wait_timeout.c
@@ -103,7 +103,7 @@ int main(int argc, char *argv[])
 	struct futex_waitv waitv = {
 			.uaddr = (uintptr_t)&f1,
 			.val = f1,
-			.flags = FUTEX_32,
+			.flags = FUTEX2_SIZE_U32,
 			.__reserved = 0
 		};
 
@@ -128,7 +128,7 @@ int main(int argc, char *argv[])
 	}
 
 	ksft_print_header();
-	ksft_set_plan(9);
+	ksft_set_plan(11);
 	ksft_print_msg("%s: Block on a futex and wait for timeout\n",
 	       basename(argv[0]));
 	ksft_print_msg("\tArguments: timeout=%ldns\n", timeout_ns);
@@ -201,6 +201,18 @@ int main(int argc, char *argv[])
 	res = futex_waitv(&waitv, 1, 0, &to, CLOCK_REALTIME);
 	test_timeout(res, &ret, "futex_waitv realtime", ETIMEDOUT);
 
+	/* futex2_wait with CLOCK_MONOTONIC */
+	if (futex_get_abs_timeout(CLOCK_MONOTONIC, &to, timeout_ns))
+		return RET_FAIL;
+	res = futex2_wait(&f1, f1, 1, FUTEX2_SIZE_U32, &to, CLOCK_MONOTONIC);
+	test_timeout(res, &ret, "futex2_wait monotonic", ETIMEDOUT);
+
+	/* futex2_wait with CLOCK_REALTIME */
+	if (futex_get_abs_timeout(CLOCK_REALTIME, &to, timeout_ns))
+		return RET_FAIL;
+	res = futex2_wait(&f1, f1, 1, FUTEX2_SIZE_U32, &to, CLOCK_REALTIME);
+	test_timeout(res, &ret, "futex2_wait realtime", ETIMEDOUT);
+
 	ksft_print_cnts();
 	return ret;
 }
--- a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
+++ b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
@@ -46,7 +46,7 @@ int main(int argc, char *argv[])
 	struct futex_waitv waitv = {
 			.uaddr = (uintptr_t)&f1,
 			.val = f1+1,
-			.flags = FUTEX_32,
+			.flags = FUTEX2_SIZE_U32 | FUTEX2_PRIVATE,
 			.__reserved = 0
 		};
 
@@ -68,7 +68,7 @@ int main(int argc, char *argv[])
 	}
 
 	ksft_print_header();
-	ksft_set_plan(2);
+	ksft_set_plan(3);
 	ksft_print_msg("%s: Test the unexpected futex value in FUTEX_WAIT\n",
 	       basename(argv[0]));
 
@@ -106,6 +106,30 @@ int main(int argc, char *argv[])
 		ksft_test_result_pass("futex_waitv\n");
 	}
 
+	if (clock_gettime(CLOCK_MONOTONIC, &to)) {
+		error("clock_gettime failed\n", errno);
+		return errno;
+	}
+
+	to.tv_nsec += timeout_ns;
+
+	if (to.tv_nsec >= 1000000000) {
+		to.tv_sec++;
+		to.tv_nsec -= 1000000000;
+	}
+
+	info("Calling futex2_wait on f1: %u @ %p with val=%u\n", f1, &f1, f1+1);
+	res = futex2_wait(&f1, f1+1, ~0U, FUTEX2_SIZE_U32 | FUTEX2_PRIVATE,
+			  &to, CLOCK_MONOTONIC);
+	if (!res || errno != EWOULDBLOCK) {
+		ksft_test_result_pass("futex2_wait returned: %d %s\n",
+				      res ? errno : res,
+				      res ? strerror(errno) : "");
+		ret = RET_FAIL;
+	} else {
+		ksft_test_result_pass("futex2_wait\n");
+	}
+
 	ksft_print_cnts();
 	return ret;
 }
--- a/tools/testing/selftests/futex/functional/futex_waitv.c
+++ b/tools/testing/selftests/futex/functional/futex_waitv.c
@@ -88,7 +88,7 @@ int main(int argc, char *argv[])
 
 	for (i = 0; i < NR_FUTEXES; i++) {
 		waitv[i].uaddr = (uintptr_t)&futexes[i];
-		waitv[i].flags = FUTEX_32 | FUTEX_PRIVATE_FLAG;
+		waitv[i].flags = FUTEX2_SIZE_U32 | FUTEX2_PRIVATE;
 		waitv[i].val = 0;
 		waitv[i].__reserved = 0;
 	}
@@ -99,7 +99,8 @@ int main(int argc, char *argv[])
 
 	usleep(WAKE_WAIT_US);
 
-	res = futex_wake(u64_to_ptr(waitv[NR_FUTEXES - 1].uaddr), 1, FUTEX_PRIVATE_FLAG);
+	res = futex2_wake(u64_to_ptr(waitv[NR_FUTEXES - 1].uaddr), ~0U, 1,
+			  FUTEX2_PRIVATE | FUTEX2_SIZE_U32);
 	if (res != 1) {
 		ksft_test_result_fail("futex_wake private returned: %d %s\n",
 				      res ? errno : res,
@@ -122,7 +123,7 @@ int main(int argc, char *argv[])
 
 		*shared_data = 0;
 		waitv[i].uaddr = (uintptr_t)shared_data;
-		waitv[i].flags = FUTEX_32;
+		waitv[i].flags = FUTEX2_SIZE_U32;
 		waitv[i].val = 0;
 		waitv[i].__reserved = 0;
 	}
@@ -145,8 +146,8 @@ int main(int argc, char *argv[])
 	for (i = 0; i < NR_FUTEXES; i++)
 		shmdt(u64_to_ptr(waitv[i].uaddr));
 
-	/* Testing a waiter without FUTEX_32 flag */
-	waitv[0].flags = FUTEX_PRIVATE_FLAG;
+	/* Testing a waiter without FUTEX2_SIZE_U32 flag */
+	waitv[0].flags = FUTEX2_PRIVATE;
 
 	if (clock_gettime(CLOCK_MONOTONIC, &to))
 		error("gettime64 failed\n", errno);
@@ -160,11 +161,11 @@ int main(int argc, char *argv[])
 				      res ? strerror(errno) : "");
 		ret = RET_FAIL;
 	} else {
-		ksft_test_result_pass("futex_waitv without FUTEX_32\n");
+		ksft_test_result_pass("futex_waitv without FUTEX2_SIZE_U32\n");
 	}
 
 	/* Testing a waiter with an unaligned address */
-	waitv[0].flags = FUTEX_PRIVATE_FLAG | FUTEX_32;
+	waitv[0].flags = FUTEX2_PRIVATE | FUTEX2_SIZE_U32;
 	waitv[0].uaddr = 1;
 
 	if (clock_gettime(CLOCK_MONOTONIC, &to))
--- a/tools/testing/selftests/futex/functional/run.sh
+++ b/tools/testing/selftests/futex/functional/run.sh
@@ -76,9 +76,15 @@ echo
 
 echo
 ./futex_wait $COLOR
+echo
+./futex_wait -n $COLOR
 
 echo
 ./futex_requeue $COLOR
+echo
+./futex_requeue -n $COLOR
+echo
+./futex_requeue -x $COLOR
 
 echo
 ./futex_waitv $COLOR
--- a/tools/testing/selftests/futex/include/futex2test.h
+++ b/tools/testing/selftests/futex/include/futex2test.h
@@ -8,6 +8,41 @@
 
 #define u64_to_ptr(x) ((void *)(uintptr_t)(x))
 
+#ifndef __NR_futex_waitv
+#define __NR_futex_waitv 449
+
+struct futex_waitv {
+	__u64 val;
+	__u64 uaddr;
+	__u32 flags;
+	__u32 __reserved;
+};
+#endif
+
+#ifndef __NR_futex_wake
+#define __NR_futex_wake 454
+#define __NR_futex_wait 455
+#define __NR_futex_requeue 456
+#endif
+
+#ifndef FUTEX2_SIZE_U8
+/*
+ * Flags for futex2 syscalls.
+ */
+#define FUTEX2_SIZE_U8		0x00
+#define FUTEX2_SIZE_U16		0x01
+#define FUTEX2_SIZE_U32		0x02
+#define FUTEX2_SIZE_U64		0x03
+#define FUTEX2_NUMA		0x04
+			/*	0x08 */
+			/*	0x10 */
+			/*	0x20 */
+			/*	0x40 */
+#define FUTEX2_PRIVATE		FUTEX_PRIVATE_FLAG
+#endif
+
+#define FUTEX_NO_NODE (-1)
+
 /**
  * futex_waitv - Wait at multiple futexes, wake on any
  * @waiters:    Array of waiters
@@ -20,3 +55,20 @@ static inline int futex_waitv(volatile s
 {
 	return syscall(__NR_futex_waitv, waiters, nr_waiters, flags, timo, clockid);
 }
+
+static inline int futex2_wake(volatile void *uaddr, unsigned long mask, int nr, unsigned int flags)
+{
+	return syscall(__NR_futex_wake, uaddr, mask, nr, flags);
+}
+
+static inline int futex2_wait(volatile void *uaddr, unsigned long val, unsigned long mask,
+			      unsigned int flags, struct timespec *timo, clockid_t clockid)
+{
+	return syscall(__NR_futex_wait, uaddr, val, mask, flags, timo, clockid);
+}
+
+static inline int futex2_requeue(struct futex_waitv *futexes, unsigned int flags,
+				 int nr_wake, int nr_requeue)
+{
+	return syscall(__NR_futex_requeue, futexes, flags, nr_wake, nr_requeue);
+}



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 16/15] futex,selftests: Extend the futex selftests for NUMA
  2023-09-22 20:01 ` futex2 numa stuff Peter Zijlstra
  2023-09-22 20:01   ` [PATCH 15/15] futex,selftests: Extend the futex selftests Peter Zijlstra
@ 2023-09-22 20:01   ` Peter Zijlstra
  2023-09-22 20:01   ` [PATCH 17/15] [HACK] futex: Force futex hash collision Peter Zijlstra
  2023-09-28 13:40   ` futex2 numa stuff Davidlohr Bueso
  3 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2023-09-22 20:01 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke, steve.shaw, marko.makela,
	andrei.artemev

XXX

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/testing/selftests/futex/functional/Makefile     |    3 
 tools/testing/selftests/futex/functional/futex_numa.c |  262 ++++++++++++++++++
 2 files changed, 264 insertions(+), 1 deletion(-)

--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -17,7 +17,8 @@ TEST_GEN_PROGS := \
 	futex_wait_private_mapped_file \
 	futex_wait \
 	futex_requeue \
-	futex_waitv
+	futex_waitv \
+	futex_numa
 
 TEST_PROGS := run.sh
 
--- /dev/null
+++ b/tools/testing/selftests/futex/functional/futex_numa.c
@@ -0,0 +1,262 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <pthread.h>
+#include <sys/shm.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <stdbool.h>
+#include <time.h>
+#include <assert.h>
+#include "logging.h"
+#include "futextest.h"
+#include "futex2test.h"
+
+typedef u_int32_t u32;
+typedef int32_t   s32;
+typedef u_int64_t u64;
+
+static int fflags = (FUTEX2_SIZE_U32 | FUTEX2_PRIVATE);
+static int fnode = FUTEX_NO_NODE;
+
+/* fairly stupid test-and-set lock with a waiter flag */
+
+#define N_LOCK		0x0000001
+#define N_WAITERS	0x0001000
+
+struct futex_numa_32 {
+	union {
+		u64 full;
+		struct {
+			u32 val;
+			u32 node;
+		};
+	};
+};
+
+void futex_numa_32_lock(struct futex_numa_32 *lock)
+{
+	for (;;) {
+		struct futex_numa_32 new, old = {
+			.full = __atomic_load_n(&lock->full, __ATOMIC_RELAXED),
+		};
+
+		for (;;) {
+			new = old;
+			if (old.val == 0) {
+				/* no waiter, no lock -> first lock, set no-node */
+				new.node = fnode;
+			}
+			if (old.val & N_LOCK) {
+				/* contention, set waiter */
+				new.val |= N_WAITERS;
+			}
+			new.val |= N_LOCK;
+
+			/* nothing changed, ready to block */
+			if (old.full == new.full)
+				break;
+
+			/*
+			 * Use u64 cmpxchg to set the futex value and node in a
+			 * consistent manner.
+			 */
+			if (__atomic_compare_exchange_n(&lock->full,
+							&old.full, new.full,
+							/* .weak */ false,
+							__ATOMIC_ACQUIRE,
+							__ATOMIC_RELAXED)) {
+
+				/* if we just set N_LOCK, we own it */
+				if (!(old.val & N_LOCK))
+					return;
+
+				/* go block */
+				break;
+			}
+		}
+
+		futex2_wait(lock, new.val, ~0U, fflags, NULL, 0);
+	}
+}
+
+void futex_numa_32_unlock(struct futex_numa_32 *lock)
+{
+	u32 val = __atomic_sub_fetch(&lock->val, N_LOCK, __ATOMIC_RELEASE);
+	assert((s32)val >= 0);
+	if (val & N_WAITERS) {
+		int woken = futex2_wake(lock, ~0U, 1, fflags);
+		assert(val == N_WAITERS);
+		if (!woken) {
+			__atomic_compare_exchange_n(&lock->val, &val, 0U,
+						    false, __ATOMIC_RELAXED,
+						    __ATOMIC_RELAXED);
+		}
+	}
+}
+
+static long nanos = 50000;
+
+struct thread_args {
+	pthread_t tid;
+	volatile int * done;
+	struct futex_numa_32 *lock;
+	int val;
+	int *val1, *val2;
+	int node;
+};
+
+static void *threadfn(void *_arg)
+{
+	struct thread_args *args = _arg;
+	struct timespec ts = {
+		.tv_nsec = nanos,
+	};
+	int node;
+
+	while (!*args->done) {
+
+		futex_numa_32_lock(args->lock);
+		args->val++;
+
+		assert(*args->val1 == *args->val2);
+		(*args->val1)++;
+		nanosleep(&ts, NULL);
+		(*args->val2)++;
+
+		node = args->lock->node;
+		futex_numa_32_unlock(args->lock);
+
+		if (node != args->node) {
+			args->node = node;
+			printf("node: %d\n", node);
+		}
+
+		nanosleep(&ts, NULL);
+	}
+
+	return NULL;
+}
+
+static void *contendfn(void *_arg)
+{
+	struct thread_args *args = _arg;
+
+	while (!*args->done) {
+		/*
+		 * futex2_wait() will take hb-lock, verify *var == val and
+		 * queue/abort.  By knowingly setting val 'wrong' this will
+		 * abort and thereby generate hb-lock contention.
+		 */
+		futex2_wait(&args->lock->val, ~0U, ~0U, fflags, NULL, 0);
+		args->val++;
+	}
+
+	return NULL;
+}
+
+static volatile int done = 0;
+static struct futex_numa_32 lock = { .val = 0, };
+static int val1, val2;
+
+int main(int argc, char *argv[])
+{
+	struct thread_args *tas[512], *cas[512];
+	int c, t, threads = 2, contenders = 0;
+	int sleeps = 10;
+	int total = 0;
+
+	while ((c = getopt(argc, argv, "c:t:s:n:N::")) != -1) {
+		switch (c) {
+		case 'c':
+			contenders = atoi(optarg);
+			break;
+		case 't':
+			threads = atoi(optarg);
+			break;
+		case 's':
+			sleeps = atoi(optarg);
+			break;
+		case 'n':
+			nanos = atoi(optarg);
+			break;
+		case 'N':
+			fflags |= FUTEX2_NUMA;
+			if (optarg)
+				fnode = atoi(optarg);
+			break;
+		default:
+			exit(1);
+			break;
+		}
+	}
+
+	for (t = 0; t < contenders; t++) {
+		struct thread_args *args = calloc(1, sizeof(*args));
+		if (!args) {
+			perror("thread_args");
+			exit(-1);
+		}
+
+		args->done = &done;
+		args->lock = &lock;
+		args->val1 = &val1;
+		args->val2 = &val2;
+		args->node = -1;
+
+		if (pthread_create(&args->tid, NULL, contendfn, args)) {
+			perror("pthread_create");
+			exit(-1);
+		}
+
+		cas[t] = args;
+	}
+
+	for (t = 0; t < threads; t++) {
+		struct thread_args *args = calloc(1, sizeof(*args));
+		if (!args) {
+			perror("thread_args");
+			exit(-1);
+		}
+
+		args->done = &done;
+		args->lock = &lock;
+		args->val1 = &val1;
+		args->val2 = &val2;
+		args->node = -1;
+
+		if (pthread_create(&args->tid, NULL, threadfn, args)) {
+			perror("pthread_create");
+			exit(-1);
+		}
+
+		tas[t] = args;
+	}
+
+	sleep(sleeps);
+
+	done = true;
+
+	for (t = 0; t < threads; t++) {
+		struct thread_args *args = tas[t];
+
+		pthread_join(args->tid, NULL);
+		total += args->val;
+//		printf("tval: %d\n", args->val);
+	}
+	printf("total: %d\n", total);
+
+	if (contenders) {
+		total = 0;
+		for (t = 0; t < contenders; t++) {
+			struct thread_args *args = cas[t];
+
+			pthread_join(args->tid, NULL);
+			total += args->val;
+//			printf("tval: %d\n", args->val);
+		}
+		printf("contenders: %d\n", total);
+	}
+
+	return 0;
+}
+



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 17/15] [HACK] futex: Force futex hash collision
  2023-09-22 20:01 ` futex2 numa stuff Peter Zijlstra
  2023-09-22 20:01   ` [PATCH 15/15] futex,selftests: Extend the futex selftests Peter Zijlstra
  2023-09-22 20:01   ` [PATCH 16/15] futex,selftests: Extend the futex selftests for NUMA Peter Zijlstra
@ 2023-09-22 20:01   ` Peter Zijlstra
  2023-09-28 13:40   ` futex2 numa stuff Davidlohr Bueso
  3 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2023-09-22 20:01 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke, steve.shaw, marko.makela,
	andrei.artemev

If you hate performance -- use this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex/core.c     |    6 ++++++
 kernel/sched/features.h |    2 ++
 2 files changed, 8 insertions(+)

--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -128,6 +128,9 @@ static int futex_put_value(u32 val, u32
 	}
 }
 
+#include <linux/sched/cputime.h>
+#include "../sched/sched.h"
+
 /**
  * futex_hash - Return the hash bucket in the global hash
  * @key:	Pointer to the futex key for which the hash is calculated
@@ -159,6 +162,9 @@ struct futex_hash_bucket *futex_hash(uni
 		}
 	}
 
+	if (sched_feat(FUTEX_SQUASH))
+		hash = 0;
+
 	return &futex_queues[node][hash & (futex_hashsize - 1)];
 }
 
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -89,3 +89,5 @@ SCHED_FEAT(UTIL_EST_FASTUP, true)
 SCHED_FEAT(LATENCY_WARN, false)
 
 SCHED_FEAT(HZ_BW, true)
+
+SCHED_FEAT(FUTEX_SQUASH, false)



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: futex2 numa stuff
  2023-09-22 20:01 ` futex2 numa stuff Peter Zijlstra
                     ` (2 preceding siblings ...)
  2023-09-22 20:01   ` [PATCH 17/15] [HACK] futex: Force futex hash collision Peter Zijlstra
@ 2023-09-28 13:40   ` Davidlohr Bueso
  3 siblings, 0 replies; 36+ messages in thread
From: Davidlohr Bueso @ 2023-09-28 13:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, axboe, linux-kernel, mingo, dvhart, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke, steve.shaw, marko.makela,
	andrei.artemev

On Fri, 22 Sep 2023, Peter Zijlstra wrote:

>Hi!
>
>Updated version of patch 15/15 and a few extra patches for testing the
>FUTEX2_NUMA bits. The last patch (17/15) should never be applied for anything
>you care about and exists purely because I'm too lazy to generate actual
>hash-bucket contention.
>
>On my 2 node IVB-EP:
>
> $ echo FUTEX_SQUASH > /debug/sched/features
>
>Effectively reducing each node to 1 bucket.
>
> $ numactl -m0 -N0 ./futex_numa -c10 -t2 -n0 -N0 &
>   numactl -m1 -N1 ./futex_numa -c10 -t2 -n0 -N0
>
> ...
> contenders: 16154935
> contenders: 16202472
>
> $ numactl -m0 -N0 ./futex_numa -c10 -t2 -n0 -N0 &
>   numactl -m1 -N1 ./futex_numa -c10 -t2 -n0 -N1
>
> contenders: 48584991
> contenders: 48680560
>
>(loop counts, higher is better)
>
>Clearly showing how separating the hashes works.
>
>The first one runs 10 contenders on each node but forces the (numa) futex to
>hash to node 0 for both. This ensures all 20 contenders hash to the same
>bucket and *ouch*.
>
>The second one does the same, except now fully separates the nodes. Performance
>is much improved.
>
>Proving the per-node hashing actually works as advertised.

Very nice.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2023-09-28 13:41 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-21 10:45 [PATCH v3 00/15] futex: More futex2 bits peterz
2023-09-21 10:45 ` [PATCH v3 01/15] futex: Clarify FUTEX2 flags peterz
2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
2023-09-21 10:45 ` [PATCH v3 02/15] futex: Extend the " peterz
2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
2023-09-21 10:45 ` [PATCH v3 03/15] futex: Flag conversion peterz
2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
2023-09-21 10:45 ` [PATCH v3 04/15] futex: Validate futex value against futex size peterz
2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
2023-09-21 10:45 ` [PATCH v3 05/15] futex: Add sys_futex_wake() peterz
2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
2023-09-21 10:45 ` [PATCH v3 06/15] futex: FLAGS_STRICT peterz
2023-09-21 15:14   ` Thomas Gleixner
2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
2023-09-21 10:45 ` [PATCH v3 07/15] futex: Add sys_futex_wait() peterz
2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
2023-09-21 10:45 ` [PATCH v3 08/15] futex: Propagate flags into get_futex_key() peterz
2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
2023-09-21 10:45 ` [PATCH v3 09/15] futex: Add flags2 argument to futex_requeue() peterz
2023-09-21 17:36   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
2023-09-21 10:45 ` [PATCH v3 10/15] futex: Add sys_futex_requeue() peterz
2023-09-21 17:35   ` [tip: locking/core] " tip-bot2 for peterz@infradead.org
2023-09-22  9:35   ` [PATCH v3 10/15] " Ingo Molnar
2023-09-22 11:03     ` Peter Zijlstra
2023-09-22 13:56       ` Jens Axboe
2023-09-21 10:45 ` [PATCH v3 11/15] mm: Add vmalloc_huge_node() peterz
2023-09-21 10:45 ` [PATCH v3 12/15] futex: Implement FUTEX2_NUMA peterz
2023-09-21 10:45 ` [PATCH v3 13/15] futex: Propagate flags into futex_get_value_locked() peterz
2023-09-21 10:45 ` [PATCH v3 14/15] futex: Enable FUTEX2_{8,16} peterz
2023-09-21 10:45 ` [PATCH v3 15/15] futex,selftests: Extend the futex selftests peterz
2023-09-21 11:41   ` Peter Zijlstra
2023-09-22 20:01 ` futex2 numa stuff Peter Zijlstra
2023-09-22 20:01   ` [PATCH 15/15] futex,selftests: Extend the futex selftests Peter Zijlstra
2023-09-22 20:01   ` [PATCH 16/15] futex,selftests: Extend the futex selftests for NUMA Peter Zijlstra
2023-09-22 20:01   ` [PATCH 17/15] [HACK] futex: Force futex hash collision Peter Zijlstra
2023-09-28 13:40   ` futex2 numa stuff Davidlohr Bueso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.