All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 00/14] futex: More futex2 bits
@ 2023-07-21 10:22 Peter Zijlstra
  2023-07-21 10:22 ` [PATCH v1 01/14] futex: Clarify FUTEX2 flags Peter Zijlstra
                   ` (15 more replies)
  0 siblings, 16 replies; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

Hi,

New version of the futex2 patches. These are actually tested and appear to work
as expected.

I'm hoping to get at least the first 3 patches merged such that Jens can base
the io_uring futex patches on them.


Changes since v0:
 - switched over to 'unsigned long' for values (Arnd)
 - unshare vmalloc_huge() (Willy)
 - added wait/requeue syscalls
 - fixed NUMA to support sparse nodemask
 - added FUTEX2_n vs FUTEX2_NUMA check to ensure
   the node_id fits in the futex
 - added selftests
 - fixed a ton of silly bugs

---
 arch/alpha/kernel/syscalls/syscall.tbl             |   3 +
 arch/arm/tools/syscall.tbl                         |   3 +
 arch/arm64/include/asm/unistd32.h                  |   6 +
 arch/ia64/kernel/syscalls/syscall.tbl              |   3 +
 arch/m68k/kernel/syscalls/syscall.tbl              |   3 +
 arch/microblaze/kernel/syscalls/syscall.tbl        |   3 +
 arch/mips/kernel/syscalls/syscall_n32.tbl          |   3 +
 arch/mips/kernel/syscalls/syscall_n64.tbl          |   3 +
 arch/mips/kernel/syscalls/syscall_o32.tbl          |   3 +
 arch/parisc/kernel/syscalls/syscall.tbl            |   3 +
 arch/powerpc/kernel/syscalls/syscall.tbl           |   3 +
 arch/s390/kernel/syscalls/syscall.tbl              |   3 +
 arch/sh/kernel/syscalls/syscall.tbl                |   3 +
 arch/sparc/kernel/syscalls/syscall.tbl             |   3 +
 arch/x86/entry/syscalls/syscall_32.tbl             |   3 +
 arch/x86/entry/syscalls/syscall_64.tbl             |   3 +
 arch/xtensa/kernel/syscalls/syscall.tbl            |   3 +
 include/linux/futex.h                              |  14 +-
 include/linux/syscalls.h                           |  10 +
 include/linux/vmalloc.h                            |   1 +
 include/uapi/asm-generic/unistd.h                  |   9 +-
 include/uapi/linux/futex.h                         |  17 +-
 kernel/futex/core.c                                | 144 +++++++++++---
 kernel/futex/futex.h                               |  96 ++++++++-
 kernel/futex/pi.c                                  |  12 +-
 kernel/futex/requeue.c                             |  18 +-
 kernel/futex/syscalls.c                            | 221 ++++++++++++++++-----
 kernel/futex/waitwake.c                            |  80 ++++----
 kernel/sys_ni.c                                    |   3 +
 mm/vmalloc.c                                       |   7 +
 .../selftests/futex/functional/futex_requeue.c     | 100 +++++++++-
 .../selftests/futex/functional/futex_wait.c        |  56 +++++-
 .../futex/functional/futex_wait_timeout.c          |  14 +-
 .../futex/functional/futex_wait_wouldblock.c       |  28 ++-
 .../selftests/futex/functional/futex_waitv.c       |  15 +-
 tools/testing/selftests/futex/functional/run.sh    |   6 +
 tools/testing/selftests/futex/include/futex2test.h |  39 ++++
 37 files changed, 777 insertions(+), 167 deletions(-)


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v1 01/14] futex: Clarify FUTEX2 flags
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
@ 2023-07-21 10:22 ` Peter Zijlstra
  2023-07-31 16:08   ` Thomas Gleixner
  2023-07-21 10:22 ` [PATCH v1 02/14] futex: Extend the " Peter Zijlstra
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

sys_futex_waitv() is part of the futex2 series (the first and only so
far) of syscalls and has a flags field per futex (as opposed to flags
being encoded in the futex op).

This new flags field has a new namespace, which unfortunately isn't
super explicit. Notably it currently takes FUTEX_32 and
FUTEX_PRIVATE_FLAG.

Introduce the FUTEX2 namespace to clarify this

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/uapi/linux/futex.h |   16 +++++++++++++---
 kernel/futex/syscalls.c    |    7 +++----
 2 files changed, 16 insertions(+), 7 deletions(-)

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -44,10 +44,20 @@
 					 FUTEX_PRIVATE_FLAG)
 
 /*
- * Flags to specify the bit length of the futex word for futex2 syscalls.
- * Currently, only 32 is supported.
+ * Flags for futex2 syscalls.
  */
-#define FUTEX_32		2
+			/*	0x00 */
+			/*	0x01 */
+#define FUTEX2_32		0x02
+			/*	0x04 */
+			/*	0x08 */
+			/*	0x10 */
+			/*	0x20 */
+			/*	0x40 */
+#define FUTEX2_PRIVATE		FUTEX_PRIVATE_FLAG
+
+/* do not use */
+#define FUTEX_32		FUTEX2_32 /* historical accident :-( */
 
 /*
  * Max numbers of elements in a futex_waitv array
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -183,8 +183,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 
-/* Mask of available flags for each futex in futex_waitv list */
-#define FUTEXV_WAITER_MASK (FUTEX_32 | FUTEX_PRIVATE_FLAG)
+#define FUTEX2_MASK (FUTEX2_32 | FUTEX2_PRIVATE)
 
 /**
  * futex_parse_waitv - Parse a waitv array from userspace
@@ -205,10 +204,10 @@ static int futex_parse_waitv(struct fute
 		if (copy_from_user(&aux, &uwaitv[i], sizeof(aux)))
 			return -EFAULT;
 
-		if ((aux.flags & ~FUTEXV_WAITER_MASK) || aux.__reserved)
+		if ((aux.flags & ~FUTEX2_MASK) || aux.__reserved)
 			return -EINVAL;
 
-		if (!(aux.flags & FUTEX_32))
+		if (!(aux.flags & FUTEX2_32))
 			return -EINVAL;
 
 		futexv[i].w.flags = aux.flags;



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
  2023-07-21 10:22 ` [PATCH v1 01/14] futex: Clarify FUTEX2 flags Peter Zijlstra
@ 2023-07-21 10:22 ` Peter Zijlstra
  2023-07-21 15:47   ` Arnd Bergmann
  2023-07-31 16:11   ` Thomas Gleixner
  2023-07-21 10:22 ` [PATCH v1 03/14] futex: Flag conversion Peter Zijlstra
                   ` (13 subsequent siblings)
  15 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

Add the definition for the missing but always intended extra sizes,
and add a NUMA flag for the planned numa extention.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/uapi/linux/futex.h |    7 ++++---
 kernel/futex/syscalls.c    |    9 +++++++--
 2 files changed, 11 insertions(+), 5 deletions(-)

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -46,10 +46,11 @@
 /*
  * Flags for futex2 syscalls.
  */
-			/*	0x00 */
-			/*	0x01 */
+#define FUTEX2_8		0x00
+#define FUTEX2_16		0x01
 #define FUTEX2_32		0x02
-			/*	0x04 */
+#define FUTEX2_64		0x03
+#define FUTEX2_NUMA		0x04
 			/*	0x08 */
 			/*	0x10 */
 			/*	0x20 */
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -183,7 +183,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 
-#define FUTEX2_MASK (FUTEX2_32 | FUTEX2_PRIVATE)
+#define FUTEX2_MASK (FUTEX2_64 | FUTEX2_PRIVATE)
 
 /**
  * futex_parse_waitv - Parse a waitv array from userspace
@@ -207,7 +207,12 @@ static int futex_parse_waitv(struct fute
 		if ((aux.flags & ~FUTEX2_MASK) || aux.__reserved)
 			return -EINVAL;
 
-		if (!(aux.flags & FUTEX2_32))
+		if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
+			if ((aux.flags & FUTEX2_64) == FUTEX2_64)
+				return -EINVAL;
+		}
+
+		if ((aux.flags & FUTEX2_64) != FUTEX2_32)
 			return -EINVAL;
 
 		futexv[i].w.flags = aux.flags;



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v1 03/14] futex: Flag conversion
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
  2023-07-21 10:22 ` [PATCH v1 01/14] futex: Clarify FUTEX2 flags Peter Zijlstra
  2023-07-21 10:22 ` [PATCH v1 02/14] futex: Extend the " Peter Zijlstra
@ 2023-07-21 10:22 ` Peter Zijlstra
  2023-07-31 16:21   ` Thomas Gleixner
  2023-07-21 10:22 ` [PATCH v1 04/14] futex: Validate futex value against futex size Peter Zijlstra
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

Futex has 3 sets of flags:

 - legacy futex op bits
 - futex2 flags
 - internal flags

Add a few helpers to convert from the API flags into the internal
flags.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex/futex.h    |   64 +++++++++++++++++++++++++++++++++++++++++++++---
 kernel/futex/syscalls.c |   24 ++++++------------
 kernel/futex/waitwake.c |    4 +--
 3 files changed, 72 insertions(+), 20 deletions(-)

--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -5,6 +5,7 @@
 #include <linux/futex.h>
 #include <linux/rtmutex.h>
 #include <linux/sched/wake_q.h>
+#include <linux/compat.h>
 
 #ifdef CONFIG_PREEMPT_RT
 #include <linux/rcuwait.h>
@@ -16,8 +17,15 @@
  * Futex flags used to encode options to functions and preserve them across
  * restarts.
  */
+#define FLAGS_SIZE_8		0x00
+#define FLAGS_SIZE_16		0x01
+#define FLAGS_SIZE_32		0x02
+#define FLAGS_SIZE_64		0x03
+
+#define FLAGS_SIZE_MASK		0x03
+
 #ifdef CONFIG_MMU
-# define FLAGS_SHARED		0x01
+# define FLAGS_SHARED		0x10
 #else
 /*
  * NOMMU does not have per process address space. Let the compiler optimize
@@ -25,8 +33,58 @@
  */
 # define FLAGS_SHARED		0x00
 #endif
-#define FLAGS_CLOCKRT		0x02
-#define FLAGS_HAS_TIMEOUT	0x04
+#define FLAGS_CLOCKRT		0x20
+#define FLAGS_HAS_TIMEOUT	0x40
+#define FLAGS_NUMA		0x80
+
+/* FUTEX_ to FLAGS_ */
+static inline unsigned int futex_to_flags(unsigned int op)
+{
+	unsigned int flags = FLAGS_SIZE_32;
+
+	if (!(op & FUTEX_PRIVATE_FLAG))
+		flags |= FLAGS_SHARED;
+
+	if (op & FUTEX_CLOCK_REALTIME)
+		flags |= FLAGS_CLOCKRT;
+
+	return flags;
+}
+
+/* FUTEX2_ to FLAGS_ */
+static inline unsigned int futex2_to_flags(unsigned int flags2)
+{
+	unsigned int flags = flags2 & FUTEX2_64;
+
+	if (!(flags2 & FUTEX2_PRIVATE))
+		flags |= FLAGS_SHARED;
+
+	if (flags2 & FUTEX2_NUMA)
+		flags |= FLAGS_NUMA;
+
+	return flags;
+}
+
+static inline bool futex_flags_valid(unsigned int flags)
+{
+	/* Only 64bit futexes for 64bit code */
+	if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
+		if ((flags & FLAGS_SIZE_MASK) == FLAGS_SIZE_64)
+			return false;
+	}
+
+	/* Only 32bit futexes are implemented -- for now */
+	if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
+		return false;
+
+	return true;
+}
+
+static inline unsigned int futex_size(unsigned int flags)
+{
+	unsigned int size = flags & FLAGS_SIZE_MASK;
+	return 1 << size; /* {0,1,2,3} -> {1,2,4,8} */
+}
 
 #ifdef CONFIG_FAIL_FUTEX
 extern bool should_fail_futex(bool fshared);
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -1,6 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 
-#include <linux/compat.h>
 #include <linux/syscalls.h>
 #include <linux/time_namespace.h>
 
@@ -85,15 +84,12 @@ SYSCALL_DEFINE3(get_robust_list, int, pi
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		u32 __user *uaddr2, u32 val2, u32 val3)
 {
+	unsigned int flags = futex_to_flags(op);
 	int cmd = op & FUTEX_CMD_MASK;
-	unsigned int flags = 0;
 
-	if (!(op & FUTEX_PRIVATE_FLAG))
-		flags |= FLAGS_SHARED;
-
-	if (op & FUTEX_CLOCK_REALTIME) {
-		flags |= FLAGS_CLOCKRT;
-		if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI &&
+	if (flags & FLAGS_CLOCKRT) {
+		if (cmd != FUTEX_WAIT_BITSET &&
+		    cmd != FUTEX_WAIT_REQUEUE_PI &&
 		    cmd != FUTEX_LOCK_PI2)
 			return -ENOSYS;
 	}
@@ -201,21 +197,19 @@ static int futex_parse_waitv(struct fute
 	unsigned int i;
 
 	for (i = 0; i < nr_futexes; i++) {
+		unsigned int flags;
+
 		if (copy_from_user(&aux, &uwaitv[i], sizeof(aux)))
 			return -EFAULT;
 
 		if ((aux.flags & ~FUTEX2_MASK) || aux.__reserved)
 			return -EINVAL;
 
-		if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
-			if ((aux.flags & FUTEX2_64) == FUTEX2_64)
-				return -EINVAL;
-		}
-
-		if ((aux.flags & FUTEX2_64) != FUTEX2_32)
+		flags = futex2_to_flags(aux.flags);
+		if (!futex_flags_valid(flags))
 			return -EINVAL;
 
-		futexv[i].w.flags = aux.flags;
+		futexv[i].w.flags = flags;
 		futexv[i].w.val = aux.val;
 		futexv[i].w.uaddr = aux.uaddr;
 		futexv[i].q = futex_q_init;
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -419,11 +419,11 @@ static int futex_wait_multiple_setup(str
 	 */
 retry:
 	for (i = 0; i < count; i++) {
-		if ((vs[i].w.flags & FUTEX_PRIVATE_FLAG) && retry)
+		if (!(vs[i].w.flags & FLAGS_SHARED) && retry)
 			continue;
 
 		ret = get_futex_key(u64_to_user_ptr(vs[i].w.uaddr),
-				    !(vs[i].w.flags & FUTEX_PRIVATE_FLAG),
+				    vs[i].w.flags & FLAGS_SHARED,
 				    &vs[i].q.key, FUTEX_READ);
 
 		if (unlikely(ret))



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v1 04/14] futex: Validate futex value against futex size
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
                   ` (2 preceding siblings ...)
  2023-07-21 10:22 ` [PATCH v1 03/14] futex: Flag conversion Peter Zijlstra
@ 2023-07-21 10:22 ` Peter Zijlstra
  2023-07-31 17:12   ` Thomas Gleixner
  2023-07-21 10:22 ` [PATCH v1 05/14] futex: Add sys_futex_wake() Peter Zijlstra
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

Ensure the futex value fits in the given futex size. Since this adds a
constraint to an existing syscall, it might possibly change behaviour.

Currently the value would be truncated to a u32 and any high bits
would get silently lost.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex/futex.h    |    8 ++++++++
 kernel/futex/syscalls.c |    3 +++
 2 files changed, 11 insertions(+)

--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -86,6 +86,14 @@ static inline unsigned int futex_size(un
 	return 1 << size; /* {0,1,2,3} -> {1,2,4,8} */
 }
 
+static inline bool futex_validate_input(unsigned int flags, u64 val)
+{
+	int bits = 8 * futex_size(flags);
+	if (bits < 64 && (val >> bits))
+		return false;
+	return true;
+}
+
 #ifdef CONFIG_FAIL_FUTEX
 extern bool should_fail_futex(bool fshared);
 #else
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -209,6 +209,9 @@ static int futex_parse_waitv(struct fute
 		if (!futex_flags_valid(flags))
 			return -EINVAL;
 
+		if (!futex_validate_input(flags, aux.val))
+			return -EINVAL;
+
 		futexv[i].w.flags = flags;
 		futexv[i].w.val = aux.val;
 		futexv[i].w.uaddr = aux.uaddr;



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v1 05/14] futex: Add sys_futex_wake()
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
                   ` (3 preceding siblings ...)
  2023-07-21 10:22 ` [PATCH v1 04/14] futex: Validate futex value against futex size Peter Zijlstra
@ 2023-07-21 10:22 ` Peter Zijlstra
  2023-07-21 15:41   ` Arnd Bergmann
  2023-07-25  7:22   ` Geert Uytterhoeven
  2023-07-21 10:22 ` [PATCH v1 06/14] futex: Add sys_futex_wait() Peter Zijlstra
                   ` (10 subsequent siblings)
  15 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

To complement sys_futex_waitv() add sys_futex_wake(). This syscall
implements what was previously known as FUTEX_WAKE_BITSET except it
uses 'unsigned long' for the bitmask and takes FUTEX2 flags.

The 'unsigned long' allows FUTEX2_64 on 64bit platforms.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/alpha/kernel/syscalls/syscall.tbl      |    1 
 arch/arm/tools/syscall.tbl                  |    1 
 arch/arm64/include/asm/unistd32.h           |    2 +
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 
 arch/s390/kernel/syscalls/syscall.tbl       |    1 
 arch/sh/kernel/syscalls/syscall.tbl         |    1 
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 
 include/linux/syscalls.h                    |    3 ++
 include/uapi/asm-generic/unistd.h           |    5 ++--
 kernel/futex/syscalls.c                     |   30 ++++++++++++++++++++++++++++
 kernel/sys_ni.c                             |    1 
 21 files changed, 55 insertions(+), 2 deletions(-)

--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -491,3 +491,4 @@
 559	common  futex_waitv                     sys_futex_waitv
 560	common	set_mempolicy_home_node		sys_ni_syscall
 561	common	cachestat			sys_cachestat
+562	common	futex_wake			sys_futex_wake
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -465,3 +465,4 @@
 449	common	futex_waitv			sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -909,6 +909,8 @@ __SYSCALL(__NR_futex_waitv, sys_futex_wa
 __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
 #define __NR_cachestat 451
 __SYSCALL(__NR_cachestat, sys_cachestat)
+#define __NR_futex_wake 452
+__SYSCALL(__NR_futex_wake, sys_futex_wake)
 
 /*
  * Please add new compat syscalls above this comment and update
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -372,3 +372,4 @@
 449	common  futex_waitv                     sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -451,3 +451,4 @@
 449	common  futex_waitv                     sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -457,3 +457,4 @@
 449	common  futex_waitv                     sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -390,3 +390,4 @@
 449	n32	futex_waitv			sys_futex_waitv
 450	n32	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	n32	cachestat			sys_cachestat
+452	n32	futex_wake			sys_futex_wake
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -366,3 +366,4 @@
 449	n64	futex_waitv			sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	n64	cachestat			sys_cachestat
+452	n64	futex_wake			sys_futex_wake
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -439,3 +439,4 @@
 449	o32	futex_waitv			sys_futex_waitv
 450	o32	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	o32	cachestat			sys_cachestat
+452	o32	futex_wake			sys_futex_wake
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -450,3 +450,4 @@
 449	common	futex_waitv			sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -538,3 +538,4 @@
 449	common  futex_waitv                     sys_futex_waitv
 450 	nospu	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -454,3 +454,4 @@
 449  common	futex_waitv		sys_futex_waitv			sys_futex_waitv
 450  common	set_mempolicy_home_node	sys_set_mempolicy_home_node	sys_set_mempolicy_home_node
 451  common	cachestat		sys_cachestat			sys_cachestat
+452  common	futex_wake		sys_futex_wake			sys_futex_wake
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -454,3 +454,4 @@
 449	common  futex_waitv                     sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -497,3 +497,4 @@
 449	common  futex_waitv                     sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -456,3 +456,4 @@
 449	i386	futex_waitv		sys_futex_waitv
 450	i386	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	i386	cachestat		sys_cachestat
+452	i386	futex_wake		sys_futex_wake
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -373,6 +373,7 @@
 449	common	futex_waitv		sys_futex_waitv
 450	common	set_mempolicy_home_node	sys_set_mempolicy_home_node
 451	common	cachestat		sys_cachestat
+452	common	futex_wake		sys_futex_wake
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -422,3 +422,4 @@
 449	common  futex_waitv                     sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -563,6 +563,9 @@ asmlinkage long sys_set_robust_list(stru
 asmlinkage long sys_futex_waitv(struct futex_waitv *waiters,
 				unsigned int nr_futexes, unsigned int flags,
 				struct __kernel_timespec __user *timeout, clockid_t clockid);
+
+asmlinkage long sys_futex_wake(void __user *uaddr, unsigned long mask, int nr, unsigned int flags);
+
 asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp,
 			      struct __kernel_timespec __user *rmtp);
 asmlinkage long sys_nanosleep_time32(struct old_timespec32 __user *rqtp,
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -816,12 +816,13 @@ __SYSCALL(__NR_process_mrelease, sys_pro
 __SYSCALL(__NR_futex_waitv, sys_futex_waitv)
 #define __NR_set_mempolicy_home_node 450
 __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
-
 #define __NR_cachestat 451
 __SYSCALL(__NR_cachestat, sys_cachestat)
+#define __NR_futex_wake 452
+__SYSCALL(__NR_futex_wake, sys_futex_wake)
 
 #undef __NR_syscalls
-#define __NR_syscalls 452
+#define __NR_syscalls 453
 
 /*
  * 32 bit systems traditionally used different
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -306,6 +306,36 @@ SYSCALL_DEFINE5(futex_waitv, struct fute
 	return ret;
 }
 
+/*
+ * sys_futex_wake - Wake a number of futexes
+ * @uaddr:	Address of the futex(es) to wake
+ * @mask:	bitmask
+ * @nr:		Number of the futexes to wake
+ * @flags:	FUTEX2 flags
+ *
+ * Identical to the traditional FUTEX_WAKE_BITSET op, except it is part of the
+ * futex2 family of calls.
+ */
+
+SYSCALL_DEFINE4(futex_wake,
+		void __user *, uaddr,
+		unsigned long, mask,
+		int, nr,
+		unsigned int, flags)
+{
+	if (flags & ~FUTEX2_MASK)
+		return -EINVAL;
+
+	flags = futex2_to_flags(flags);
+	if (!futex_flags_valid(flags))
+		return -EINVAL;
+
+	if (!futex_validate_input(flags, mask))
+		return -EINVAL;
+
+	return futex_wake(uaddr, flags, nr, mask);
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE2(set_robust_list,
 		struct compat_robust_list_head __user *, head,
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -87,6 +87,7 @@ COND_SYSCALL_COMPAT(set_robust_list);
 COND_SYSCALL(get_robust_list);
 COND_SYSCALL_COMPAT(get_robust_list);
 COND_SYSCALL(futex_waitv);
+COND_SYSCALL(futex_wake);
 COND_SYSCALL(kexec_load);
 COND_SYSCALL_COMPAT(kexec_load);
 COND_SYSCALL(init_module);



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v1 06/14] futex: Add sys_futex_wait()
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
                   ` (4 preceding siblings ...)
  2023-07-21 10:22 ` [PATCH v1 05/14] futex: Add sys_futex_wake() Peter Zijlstra
@ 2023-07-21 10:22 ` Peter Zijlstra
  2023-07-25  7:22   ` Geert Uytterhoeven
  2023-07-31 16:35   ` Thomas Gleixner
  2023-07-21 10:22 ` [PATCH v1 07/14] futex: Propagate flags into get_futex_key() Peter Zijlstra
                   ` (9 subsequent siblings)
  15 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

To complement sys_futex_waitv()/wake(), add sys_futex_wait(). This
syscall implements what was previously known as FUTEX_WAIT_BITSET
except it uses 'unsigned long' for the value and bitmask arguments,
takes timespec and clockid_t arguments for the absolute timeout and
uses FUTEX2 flags.

The 'unsigned long' allows FUTEX2_64 on 64bit platforms.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/alpha/kernel/syscalls/syscall.tbl      |    1 
 arch/arm/tools/syscall.tbl                  |    1 
 arch/arm64/include/asm/unistd32.h           |    2 
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 
 arch/s390/kernel/syscalls/syscall.tbl       |    1 
 arch/sh/kernel/syscalls/syscall.tbl         |    1 
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 
 include/linux/syscalls.h                    |    4 
 include/uapi/asm-generic/unistd.h           |    4 
 kernel/futex/futex.h                        |    3 
 kernel/futex/syscalls.c                     |  120 +++++++++++++++++++++-------
 kernel/futex/waitwake.c                     |   65 ++++++++-------
 kernel/sys_ni.c                             |    1 
 23 files changed, 156 insertions(+), 59 deletions(-)

--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -492,3 +492,4 @@
 560	common	set_mempolicy_home_node		sys_ni_syscall
 561	common	cachestat			sys_cachestat
 562	common	futex_wake			sys_futex_wake
+563	common	futex_wait			sys_futex_wait
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -466,3 +466,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
+453	common	futex_wait			sys_futex_wait
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -911,6 +911,8 @@ __SYSCALL(__NR_set_mempolicy_home_node,
 __SYSCALL(__NR_cachestat, sys_cachestat)
 #define __NR_futex_wake 452
 __SYSCALL(__NR_futex_wake, sys_futex_wake)
+#define __NR_futex_wait 453
+__SYSCALL(__NR_futex_wait, sys_futex_wait)
 
 /*
  * Please add new compat syscalls above this comment and update
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -373,3 +373,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
+453	common	futex_wait			sys_futex_wait
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -452,3 +452,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
+453	common	futex_wait			sys_futex_wait
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -458,3 +458,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
+453	common	futex_wait			sys_futex_wait
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -391,3 +391,4 @@
 450	n32	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	n32	cachestat			sys_cachestat
 452	n32	futex_wake			sys_futex_wake
+453	n32	futex_wait			sys_futex_wait
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -367,3 +367,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	n64	cachestat			sys_cachestat
 452	n64	futex_wake			sys_futex_wake
+453	n64	futex_wait			sys_futex_wait
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -440,3 +440,4 @@
 450	o32	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	o32	cachestat			sys_cachestat
 452	o32	futex_wake			sys_futex_wake
+453	o32	futex_wait			sys_futex_wait
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -451,3 +451,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
+453	common	futex_wait			sys_futex_wait
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -539,3 +539,4 @@
 450 	nospu	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
+453	common	futex_wait			sys_futex_wait
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -455,3 +455,4 @@
 450  common	set_mempolicy_home_node	sys_set_mempolicy_home_node	sys_set_mempolicy_home_node
 451  common	cachestat		sys_cachestat			sys_cachestat
 452  common	futex_wake		sys_futex_wake			sys_futex_wake
+453  common	futex_wait		sys_futex_wait			sys_futex_wait
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -455,3 +455,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
+453	common	futex_wait			sys_futex_wait
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -498,3 +498,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
+453	common	futex_wait			sys_futex_wait
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -457,3 +457,4 @@
 450	i386	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	i386	cachestat		sys_cachestat
 452	i386	futex_wake		sys_futex_wake
+453	i386	futex_wait		sys_futex_wait
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -374,6 +374,7 @@
 450	common	set_mempolicy_home_node	sys_set_mempolicy_home_node
 451	common	cachestat		sys_cachestat
 452	common	futex_wake		sys_futex_wake
+453	common	futex_wait		sys_futex_wait
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -423,3 +423,4 @@
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
+453	common	futex_wait			sys_futex_wait
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -566,6 +566,10 @@ asmlinkage long sys_futex_waitv(struct f
 
 asmlinkage long sys_futex_wake(void __user *uaddr, unsigned long mask, int nr, unsigned int flags);
 
+asmlinkage long sys_futex_wait(void __user *uaddr, unsigned long val, unsigned long mask,
+			       unsigned int flags, struct __kernel_timespec __user *timespec,
+			       clockid_t clockid);
+
 asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp,
 			      struct __kernel_timespec __user *rmtp);
 asmlinkage long sys_nanosleep_time32(struct old_timespec32 __user *rqtp,
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -820,9 +820,11 @@ __SYSCALL(__NR_set_mempolicy_home_node,
 __SYSCALL(__NR_cachestat, sys_cachestat)
 #define __NR_futex_wake 452
 __SYSCALL(__NR_futex_wake, sys_futex_wake)
+#define __NR_futex_wait 453
+__SYSCALL(__NR_futex_wait, sys_futex_wait)
 
 #undef __NR_syscalls
-#define __NR_syscalls 453
+#define __NR_syscalls 454
 
 /*
  * 32 bit systems traditionally used different
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -330,6 +330,9 @@ extern int futex_requeue(u32 __user *uad
 			 u32 __user *uaddr2, int nr_wake, int nr_requeue,
 			 u32 *cmpval, int requeue_pi);
 
+extern int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
+			struct hrtimer_sleeper *to, u32 bitset);
+
 extern int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
 		      ktime_t *abs_time, u32 bitset);
 
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -221,6 +221,46 @@ static int futex_parse_waitv(struct fute
 	return 0;
 }
 
+static int futex2_setup_timeout(struct __kernel_timespec __user *timeout,
+				clockid_t clockid, struct hrtimer_sleeper *to)
+{
+	int flag_clkid = 0, flag_init = 0;
+	struct timespec64 ts;
+	ktime_t time;
+	int ret;
+
+	if (!timeout)
+		return 0;
+
+	if (clockid == CLOCK_REALTIME) {
+		flag_clkid = FLAGS_CLOCKRT;
+		flag_init = FUTEX_CLOCK_REALTIME;
+	}
+
+	if (clockid != CLOCK_REALTIME && clockid != CLOCK_MONOTONIC)
+		return -EINVAL;
+
+	if (get_timespec64(&ts, timeout))
+		return -EFAULT;
+
+	/*
+	 * Since there's no opcode for futex_waitv, use
+	 * FUTEX_WAIT_BITSET that uses absolute timeout as well
+	 */
+	ret = futex_init_timeout(FUTEX_WAIT_BITSET, flag_init, &ts, &time);
+	if (ret)
+		return ret;
+
+	futex_setup_timer(&time, to, flag_clkid, 0);
+	return 0;
+}
+
+static inline void futex2_destroy_timeout(struct hrtimer_sleeper *to)
+{
+	hrtimer_cancel(&to->timer);
+	destroy_hrtimer_on_stack(&to->timer);
+}
+
 /**
  * sys_futex_waitv - Wait on a list of futexes
  * @waiters:    List of futexes to wait on
@@ -250,8 +290,6 @@ SYSCALL_DEFINE5(futex_waitv, struct fute
 {
 	struct hrtimer_sleeper to;
 	struct futex_vector *futexv;
-	struct timespec64 ts;
-	ktime_t time;
 	int ret;
 
 	/* This syscall supports no flags for now */
@@ -261,30 +299,8 @@ SYSCALL_DEFINE5(futex_waitv, struct fute
 	if (!nr_futexes || nr_futexes > FUTEX_WAITV_MAX || !waiters)
 		return -EINVAL;
 
-	if (timeout) {
-		int flag_clkid = 0, flag_init = 0;
-
-		if (clockid == CLOCK_REALTIME) {
-			flag_clkid = FLAGS_CLOCKRT;
-			flag_init = FUTEX_CLOCK_REALTIME;
-		}
-
-		if (clockid != CLOCK_REALTIME && clockid != CLOCK_MONOTONIC)
-			return -EINVAL;
-
-		if (get_timespec64(&ts, timeout))
-			return -EFAULT;
-
-		/*
-		 * Since there's no opcode for futex_waitv, use
-		 * FUTEX_WAIT_BITSET that uses absolute timeout as well
-		 */
-		ret = futex_init_timeout(FUTEX_WAIT_BITSET, flag_init, &ts, &time);
-		if (ret)
-			return ret;
-
-		futex_setup_timer(&time, &to, flag_clkid, 0);
-	}
+	if (timeout && (ret = futex2_setup_timeout(timeout, clockid, &to)))
+		return ret;
 
 	futexv = kcalloc(nr_futexes, sizeof(*futexv), GFP_KERNEL);
 	if (!futexv) {
@@ -299,10 +315,8 @@ SYSCALL_DEFINE5(futex_waitv, struct fute
 	kfree(futexv);
 
 destroy_timer:
-	if (timeout) {
-		hrtimer_cancel(&to.timer);
-		destroy_hrtimer_on_stack(&to.timer);
-	}
+	if (timeout)
+		futex2_destroy_timeout(&to);
 	return ret;
 }
 
@@ -336,6 +350,52 @@ SYSCALL_DEFINE4(futex_wake,
 	return futex_wake(uaddr, flags, nr, mask);
 }
 
+/*
+ * sys_futex_wait - Wait on a futex
+ * @uaddr:	Address of the futex to wait on
+ * @val:	Value of @uaddr
+ * @mask:	bitmask
+ * @flags:	FUTEX2 flags
+ * @timeout:	Optional absolute timeout
+ * @clockid:	Clock to be used for the timeout, realtime or monotonic
+ *
+ * Identical to the traditional FUTEX_WAIT_BITSET op, except it is part of the
+ * futex2 familiy of calls.
+ */
+
+SYSCALL_DEFINE6(futex_wait,
+		void __user *, uaddr,
+		unsigned long, val,
+		unsigned long, mask,
+		unsigned int, flags,
+		struct __kernel_timespec __user *, timeout,
+		clockid_t, clockid)
+{
+	struct hrtimer_sleeper to;
+	int ret;
+
+	if (flags & ~FUTEX2_MASK)
+		return -EINVAL;
+
+	flags = futex2_to_flags(flags);
+	if (!futex_flags_valid(flags))
+		return -EINVAL;
+
+	if (!futex_validate_input(flags, val) ||
+	    !futex_validate_input(flags, mask))
+		return -EINVAL;
+
+	if (timeout && (ret = futex2_setup_timeout(timeout, clockid, &to)))
+		return ret;
+
+	ret = __futex_wait(uaddr, flags, val, timeout ? &to : NULL, mask);
+
+	if (timeout)
+		futex2_destroy_timeout(&to);
+
+	return ret;
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE2(set_robust_list,
 		struct compat_robust_list_head __user *, head,
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -629,20 +629,18 @@ int futex_wait_setup(u32 __user *uaddr,
 	return ret;
 }
 
-int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, ktime_t *abs_time, u32 bitset)
+int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
+		 struct hrtimer_sleeper *to, u32 bitset)
 {
-	struct hrtimer_sleeper timeout, *to;
-	struct restart_block *restart;
-	struct futex_hash_bucket *hb;
 	struct futex_q q = futex_q_init;
+	struct futex_hash_bucket *hb;
 	int ret;
 
 	if (!bitset)
 		return -EINVAL;
+
 	q.bitset = bitset;
 
-	to = futex_setup_timer(abs_time, &timeout, flags,
-			       current->timer_slack_ns);
 retry:
 	/*
 	 * Prepare to wait on uaddr. On success, it holds hb->lock and q
@@ -650,18 +648,17 @@ int futex_wait(u32 __user *uaddr, unsign
 	 */
 	ret = futex_wait_setup(uaddr, val, flags, &q, &hb);
 	if (ret)
-		goto out;
+		return ret;
 
 	/* futex_queue and wait for wakeup, timeout, or a signal. */
 	futex_wait_queue(hb, &q, to);
 
 	/* If we were woken (and unqueued), we succeeded, whatever. */
-	ret = 0;
 	if (!futex_unqueue(&q))
-		goto out;
-	ret = -ETIMEDOUT;
+		return 0;
+
 	if (to && !to->task)
-		goto out;
+		return -ETIMEDOUT;
 
 	/*
 	 * We expect signal_pending(current), but we might be the
@@ -670,24 +667,36 @@ int futex_wait(u32 __user *uaddr, unsign
 	if (!signal_pending(current))
 		goto retry;
 
-	ret = -ERESTARTSYS;
-	if (!abs_time)
-		goto out;
-
-	restart = &current->restart_block;
-	restart->futex.uaddr = uaddr;
-	restart->futex.val = val;
-	restart->futex.time = *abs_time;
-	restart->futex.bitset = bitset;
-	restart->futex.flags = flags | FLAGS_HAS_TIMEOUT;
-
-	ret = set_restart_fn(restart, futex_wait_restart);
-
-out:
-	if (to) {
-		hrtimer_cancel(&to->timer);
-		destroy_hrtimer_on_stack(&to->timer);
+	return -ERESTARTSYS;
+}
+
+int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, ktime_t *abs_time, u32 bitset)
+{
+	struct hrtimer_sleeper timeout, *to;
+	struct restart_block *restart;
+	int ret;
+
+	to = futex_setup_timer(abs_time, &timeout, flags,
+			       current->timer_slack_ns);
+
+	ret = __futex_wait(uaddr, flags, val, to, bitset);
+	if (!to)
+		return ret;
+
+	hrtimer_cancel(&to->timer);
+	destroy_hrtimer_on_stack(&to->timer);
+
+	if (ret == -ERESTARTSYS) {
+		restart = &current->restart_block;
+		restart->futex.uaddr = uaddr;
+		restart->futex.val = val;
+		restart->futex.time = *abs_time;
+		restart->futex.bitset = bitset;
+		restart->futex.flags = flags | FLAGS_HAS_TIMEOUT;
+
+		return set_restart_fn(restart, futex_wait_restart);
 	}
+
 	return ret;
 }
 
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -88,6 +88,7 @@ COND_SYSCALL(get_robust_list);
 COND_SYSCALL_COMPAT(get_robust_list);
 COND_SYSCALL(futex_waitv);
 COND_SYSCALL(futex_wake);
+COND_SYSCALL(futex_wait);
 COND_SYSCALL(kexec_load);
 COND_SYSCALL_COMPAT(kexec_load);
 COND_SYSCALL(init_module);



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v1 07/14] futex: Propagate flags into get_futex_key()
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
                   ` (5 preceding siblings ...)
  2023-07-21 10:22 ` [PATCH v1 06/14] futex: Add sys_futex_wait() Peter Zijlstra
@ 2023-07-21 10:22 ` Peter Zijlstra
  2023-07-31 16:36   ` Thomas Gleixner
  2023-07-21 10:22 ` [PATCH v1 08/14] futex: Add flags2 argument to futex_requeue() Peter Zijlstra
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

Instead of only passing FLAGS_SHARED as a boolean, pass down flags as
a whole.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex/core.c     |    5 ++++-
 kernel/futex/futex.h    |    2 +-
 kernel/futex/pi.c       |    4 ++--
 kernel/futex/requeue.c  |    6 +++---
 kernel/futex/waitwake.c |   14 +++++++-------
 5 files changed, 17 insertions(+), 14 deletions(-)

--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -217,7 +217,7 @@ static u64 get_inode_sequence_number(str
  *
  * lock_page() might sleep, the caller should not hold a spinlock.
  */
-int get_futex_key(u32 __user *uaddr, bool fshared, union futex_key *key,
+int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *key,
 		  enum futex_access rw)
 {
 	unsigned long address = (unsigned long)uaddr;
@@ -225,6 +225,9 @@ int get_futex_key(u32 __user *uaddr, boo
 	struct page *page, *tail;
 	struct address_space *mapping;
 	int err, ro = 0;
+	bool fshared;
+
+	fshared = flags & FLAGS_SHARED;
 
 	/*
 	 * The futex address must be "naturally" aligned.
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -174,7 +174,7 @@ enum futex_access {
 	FUTEX_WRITE
 };
 
-extern int get_futex_key(u32 __user *uaddr, bool fshared, union futex_key *key,
+extern int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *key,
 			 enum futex_access rw);
 
 extern struct hrtimer_sleeper *
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -945,7 +945,7 @@ int futex_lock_pi(u32 __user *uaddr, uns
 	to = futex_setup_timer(time, &timeout, flags, 0);
 
 retry:
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key, FUTEX_WRITE);
+	ret = get_futex_key(uaddr, flags, &q.key, FUTEX_WRITE);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -1117,7 +1117,7 @@ int futex_unlock_pi(u32 __user *uaddr, u
 	if ((uval & FUTEX_TID_MASK) != vpid)
 		return -EPERM;
 
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, FUTEX_WRITE);
+	ret = get_futex_key(uaddr, flags, &key, FUTEX_WRITE);
 	if (ret)
 		return ret;
 
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -424,10 +424,10 @@ int futex_requeue(u32 __user *uaddr1, un
 	}
 
 retry:
-	ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, FUTEX_READ);
+	ret = get_futex_key(uaddr1, flags, &key1, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
-	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2,
+	ret = get_futex_key(uaddr2, flags, &key2,
 			    requeue_pi ? FUTEX_WRITE : FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
@@ -789,7 +789,7 @@ int futex_wait_requeue_pi(u32 __user *ua
 	 */
 	rt_mutex_init_waiter(&rt_waiter);
 
-	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, FUTEX_WRITE);
+	ret = get_futex_key(uaddr2, flags, &key2, FUTEX_WRITE);
 	if (unlikely(ret != 0))
 		goto out;
 
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -145,13 +145,13 @@ int futex_wake(u32 __user *uaddr, unsign
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
 	union futex_key key = FUTEX_KEY_INIT;
-	int ret;
 	DEFINE_WAKE_Q(wake_q);
+	int ret;
 
 	if (!bitset)
 		return -EINVAL;
 
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, FUTEX_READ);
+	ret = get_futex_key(uaddr, flags, &key, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
 
@@ -245,10 +245,10 @@ int futex_wake_op(u32 __user *uaddr1, un
 	DEFINE_WAKE_Q(wake_q);
 
 retry:
-	ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, FUTEX_READ);
+	ret = get_futex_key(uaddr1, flags, &key1, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
-	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, FUTEX_WRITE);
+	ret = get_futex_key(uaddr2, flags, &key2, FUTEX_WRITE);
 	if (unlikely(ret != 0))
 		return ret;
 
@@ -423,7 +423,7 @@ static int futex_wait_multiple_setup(str
 			continue;
 
 		ret = get_futex_key(u64_to_user_ptr(vs[i].w.uaddr),
-				    vs[i].w.flags & FLAGS_SHARED,
+				    vs[i].w.flags,
 				    &vs[i].q.key, FUTEX_READ);
 
 		if (unlikely(ret))
@@ -435,7 +435,7 @@ static int futex_wait_multiple_setup(str
 	for (i = 0; i < count; i++) {
 		u32 __user *uaddr = (u32 __user *)(unsigned long)vs[i].w.uaddr;
 		struct futex_q *q = &vs[i].q;
-		u32 val = (u32)vs[i].w.val;
+		u32 val = vs[i].w.val;
 
 		hb = futex_q_lock(q);
 		ret = futex_get_value_locked(&uval, uaddr);
@@ -599,7 +599,7 @@ int futex_wait_setup(u32 __user *uaddr,
 	 * while the syscall executes.
 	 */
 retry:
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q->key, FUTEX_READ);
+	ret = get_futex_key(uaddr, flags, &q->key, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
 



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v1 08/14] futex: Add flags2 argument to futex_requeue()
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
                   ` (6 preceding siblings ...)
  2023-07-21 10:22 ` [PATCH v1 07/14] futex: Propagate flags into get_futex_key() Peter Zijlstra
@ 2023-07-21 10:22 ` Peter Zijlstra
  2023-07-31 16:43   ` Thomas Gleixner
  2023-07-21 10:22 ` [PATCH v1 09/14] futex: Add sys_futex_requeue() Peter Zijlstra
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

In order to support mixed size requeue, add a second flags argument to
the internal futex_requeue() function.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex/futex.h    |    5 +++--
 kernel/futex/requeue.c  |   12 +++++++-----
 kernel/futex/syscalls.c |    6 +++---
 3 files changed, 13 insertions(+), 10 deletions(-)

--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -318,8 +318,9 @@ extern int futex_wait_requeue_pi(u32 __u
 				 val, ktime_t *abs_time, u32 bitset, u32 __user
 				 *uaddr2);
 
-extern int futex_requeue(u32 __user *uaddr1, unsigned int flags,
-			 u32 __user *uaddr2, int nr_wake, int nr_requeue,
+extern int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
+			 u32 __user *uaddr2, unsigned int flags2,
+			 int nr_wake, int nr_requeue,
 			 u32 *cmpval, int requeue_pi);
 
 extern int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -346,8 +346,9 @@ futex_proxy_trylock_atomic(u32 __user *p
 /**
  * futex_requeue() - Requeue waiters from uaddr1 to uaddr2
  * @uaddr1:	source futex user address
- * @flags:	futex flags (FLAGS_SHARED, etc.)
+ * @flags1:	futex flags (FLAGS_SHARED, etc.)
  * @uaddr2:	target futex user address
+ * @flags2:	futex flags (FLAGS_SHARED, etc.)
  * @nr_wake:	number of waiters to wake (must be 1 for requeue_pi)
  * @nr_requeue:	number of waiters to requeue (0-INT_MAX)
  * @cmpval:	@uaddr1 expected value (or %NULL)
@@ -361,7 +362,8 @@ futex_proxy_trylock_atomic(u32 __user *p
  *  - >=0 - on success, the number of tasks requeued or woken;
  *  -  <0 - on error
  */
-int futex_requeue(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
+int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
+		  u32 __user *uaddr2, unsigned int flags2,
 		  int nr_wake, int nr_requeue, u32 *cmpval, int requeue_pi)
 {
 	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
@@ -424,10 +426,10 @@ int futex_requeue(u32 __user *uaddr1, un
 	}
 
 retry:
-	ret = get_futex_key(uaddr1, flags, &key1, FUTEX_READ);
+	ret = get_futex_key(uaddr1, flags1, &key1, FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
-	ret = get_futex_key(uaddr2, flags, &key2,
+	ret = get_futex_key(uaddr2, flags2, &key2,
 			    requeue_pi ? FUTEX_WRITE : FUTEX_READ);
 	if (unlikely(ret != 0))
 		return ret;
@@ -459,7 +461,7 @@ int futex_requeue(u32 __user *uaddr1, un
 			if (ret)
 				return ret;
 
-			if (!(flags & FLAGS_SHARED))
+			if (!(flags1 & FLAGS_SHARED))
 				goto retry_private;
 
 			goto retry;
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -106,9 +106,9 @@ long do_futex(u32 __user *uaddr, int op,
 	case FUTEX_WAKE_BITSET:
 		return futex_wake(uaddr, flags, val, val3);
 	case FUTEX_REQUEUE:
-		return futex_requeue(uaddr, flags, uaddr2, val, val2, NULL, 0);
+		return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, NULL, 0);
 	case FUTEX_CMP_REQUEUE:
-		return futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 0);
+		return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, &val3, 0);
 	case FUTEX_WAKE_OP:
 		return futex_wake_op(uaddr, flags, uaddr2, val, val2, val3);
 	case FUTEX_LOCK_PI:
@@ -125,7 +125,7 @@ long do_futex(u32 __user *uaddr, int op,
 		return futex_wait_requeue_pi(uaddr, flags, val, timeout, val3,
 					     uaddr2);
 	case FUTEX_CMP_REQUEUE_PI:
-		return futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
+		return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, &val3, 1);
 	}
 	return -ENOSYS;
 }



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v1 09/14] futex: Add sys_futex_requeue()
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
                   ` (7 preceding siblings ...)
  2023-07-21 10:22 ` [PATCH v1 08/14] futex: Add flags2 argument to futex_requeue() Peter Zijlstra
@ 2023-07-21 10:22 ` Peter Zijlstra
  2023-07-25  7:23   ` Geert Uytterhoeven
  2023-07-31 17:19   ` Thomas Gleixner
  2023-07-21 10:22 ` [PATCH v1 10/14] mm: Add vmalloc_huge_node() Peter Zijlstra
                   ` (6 subsequent siblings)
  15 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

Finish of the 'simple' futex2 syscall group by adding
sys_futex_requeue(). Unlike sys_futex_{wait,wake}() it's arguments are
too numerous to fit into a regular syscall. As such, use struct
futex_waitv to pass the 'source' and 'destination' futexes to the
syscall.

This syscall implements what was previously known as FUTEX_CMP_REQUEUE
and uses {val, uaddr, flags} for source and {uaddr, flags} for
destination.

This design explicitly allows requeueing between different types of
futex by having a different flags word per uaddr.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/alpha/kernel/syscalls/syscall.tbl      |    1 
 arch/arm/tools/syscall.tbl                  |    1 
 arch/arm64/include/asm/unistd32.h           |    2 +
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 
 arch/s390/kernel/syscalls/syscall.tbl       |    1 
 arch/sh/kernel/syscalls/syscall.tbl         |    1 
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 
 include/linux/syscalls.h                    |    3 ++
 include/uapi/asm-generic/unistd.h           |    4 ++
 kernel/futex/syscalls.c                     |   38 ++++++++++++++++++++++++++++
 kernel/sys_ni.c                             |    1 
 21 files changed, 63 insertions(+), 1 deletion(-)

--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -493,3 +493,4 @@
 561	common	cachestat			sys_cachestat
 562	common	futex_wake			sys_futex_wake
 563	common	futex_wait			sys_futex_wait
+564	common	futex_requeue			sys_futex_requeue
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -467,3 +467,4 @@
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
 453	common	futex_wait			sys_futex_wait
+454	common	futex_requeue			sys_futex_requeue
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -913,6 +913,8 @@ __SYSCALL(__NR_cachestat, sys_cachestat)
 __SYSCALL(__NR_futex_wake, sys_futex_wake)
 #define __NR_futex_wait 453
 __SYSCALL(__NR_futex_wait, sys_futex_wait)
+#define __NR_futex_requeue 454
+__SYSCALL(__NR_futex_requeue, sys_futex_requeue)
 
 /*
  * Please add new compat syscalls above this comment and update
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -374,3 +374,4 @@
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
 453	common	futex_wait			sys_futex_wait
+454	common	futex_requeue			sys_futex_requeue
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -453,3 +453,4 @@
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
 453	common	futex_wait			sys_futex_wait
+454	common	futex_requeue			sys_futex_requeue
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -459,3 +459,4 @@
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
 453	common	futex_wait			sys_futex_wait
+454	common	futex_requeue			sys_futex_requeue
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -392,3 +392,4 @@
 451	n32	cachestat			sys_cachestat
 452	n32	futex_wake			sys_futex_wake
 453	n32	futex_wait			sys_futex_wait
+454	n32	futex_requeue			sys_futex_requeue
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -368,3 +368,4 @@
 451	n64	cachestat			sys_cachestat
 452	n64	futex_wake			sys_futex_wake
 453	n64	futex_wait			sys_futex_wait
+454	n64	futex_requeue			sys_futex_requeue
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -441,3 +441,4 @@
 451	o32	cachestat			sys_cachestat
 452	o32	futex_wake			sys_futex_wake
 453	o32	futex_wait			sys_futex_wait
+454	o32	futex_requeue			sys_futex_requeue
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -452,3 +452,4 @@
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
 453	common	futex_wait			sys_futex_wait
+454	common	futex_requeue			sys_futex_requeue
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -540,3 +540,4 @@
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
 453	common	futex_wait			sys_futex_wait
+454	common	futex_requeue			sys_futex_requeue
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -456,3 +456,4 @@
 451  common	cachestat		sys_cachestat			sys_cachestat
 452  common	futex_wake		sys_futex_wake			sys_futex_wake
 453  common	futex_wait		sys_futex_wait			sys_futex_wait
+454  common	futex_requeue		sys_futex_requeue			sys_futex_requeue
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -456,3 +456,4 @@
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
 453	common	futex_wait			sys_futex_wait
+454	common	futex_requeue			sys_futex_requeue
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -499,3 +499,4 @@
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
 453	common	futex_wait			sys_futex_wait
+454	common	futex_requeue			sys_futex_requeue
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -458,3 +458,4 @@
 451	i386	cachestat		sys_cachestat
 452	i386	futex_wake		sys_futex_wake
 453	i386	futex_wait		sys_futex_wait
+454	i386	futex_requeue		sys_futex_requeue
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -375,6 +375,7 @@
 451	common	cachestat		sys_cachestat
 452	common	futex_wake		sys_futex_wake
 453	common	futex_wait		sys_futex_wait
+454	common	futex_requeue		sys_futex_requeue
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -424,3 +424,4 @@
 451	common	cachestat			sys_cachestat
 452	common	futex_wake			sys_futex_wake
 453	common	futex_wait			sys_futex_wait
+454	common	futex_requeue			sys_futex_requeue
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -570,6 +570,9 @@ asmlinkage long sys_futex_wait(void __us
 			       unsigned int flags, struct __kernel_timespec __user *timespec,
 			       clockid_t clockid);
 
+asmlinkage long sys_futex_requeue(struct futex_waitv __user *waiters,
+				  unsigned int flags, int nr_wake, int nr_requeue);
+
 asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp,
 			      struct __kernel_timespec __user *rmtp);
 asmlinkage long sys_nanosleep_time32(struct old_timespec32 __user *rqtp,
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -822,9 +822,11 @@ __SYSCALL(__NR_cachestat, sys_cachestat)
 __SYSCALL(__NR_futex_wake, sys_futex_wake)
 #define __NR_futex_wait 453
 __SYSCALL(__NR_futex_wait, sys_futex_wait)
+#define __NR_futex_requeue 454
+__SYSCALL(__NR_futex_requeue, sys_futex_requeue)
 
 #undef __NR_syscalls
-#define __NR_syscalls 454
+#define __NR_syscalls 455
 
 /*
  * 32 bit systems traditionally used different
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -396,6 +396,44 @@ SYSCALL_DEFINE6(futex_wait,
 	return ret;
 }
 
+/*
+ * sys_futex_requeue - Requeue a waiter from one futex to another
+ * @waiters:	array describing the source and destination futex
+ * @flags:	unused
+ * @nr_wake:	number of futexes to wake
+ * @nr_requeue:	number of futexes to requeue
+ *
+ * Identical to the traditional FUTEX_CMP_REQUEUE op, except it is part of the
+ * futex2 family of calls.
+ */
+
+SYSCALL_DEFINE4(futex_requeue,
+		struct futex_waitv __user *, waiters,
+		unsigned int, flags,
+		int, nr_wake,
+		int, nr_requeue)
+{
+	struct futex_vector futexes[2];
+	u32 cmpval;
+	int ret;
+
+	if (flags)
+		return -EINVAL;
+
+	if (!waiters)
+		return -EINVAL;
+
+	ret = futex_parse_waitv(futexes, waiters, 2);
+	if (ret)
+		return ret;
+
+	cmpval = futexes[0].w.val;
+
+	return futex_requeue(u64_to_user_ptr(futexes[0].w.uaddr), futexes[0].w.flags,
+			     u64_to_user_ptr(futexes[1].w.uaddr), futexes[1].w.flags,
+			     nr_wake, nr_requeue, &cmpval, 0);
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE2(set_robust_list,
 		struct compat_robust_list_head __user *, head,
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -89,6 +89,7 @@ COND_SYSCALL_COMPAT(get_robust_list);
 COND_SYSCALL(futex_waitv);
 COND_SYSCALL(futex_wake);
 COND_SYSCALL(futex_wait);
+COND_SYSCALL(futex_requeue);
 COND_SYSCALL(kexec_load);
 COND_SYSCALL_COMPAT(kexec_load);
 COND_SYSCALL(init_module);



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v1 10/14] mm: Add vmalloc_huge_node()
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
                   ` (8 preceding siblings ...)
  2023-07-21 10:22 ` [PATCH v1 09/14] futex: Add sys_futex_requeue() Peter Zijlstra
@ 2023-07-21 10:22 ` Peter Zijlstra
  2023-07-24 13:46   ` Christoph Hellwig
  2023-07-21 10:22 ` [PATCH v1 11/14] futex: Implement FUTEX2_NUMA Peter Zijlstra
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

To enable node specific hash-tables.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/vmalloc.h |    1 +
 mm/vmalloc.c            |    7 +++++++
 2 files changed, 8 insertions(+)

--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -152,6 +152,7 @@ extern void *__vmalloc_node_range(unsign
 void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
 		int node, const void *caller) __alloc_size(1);
 void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+void *vmalloc_huge_node(unsigned long size, gfp_t gfp_mask, int node) __alloc_size(1);
 
 extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
 extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3416,6 +3416,13 @@ void *vmalloc(unsigned long size)
 }
 EXPORT_SYMBOL(vmalloc);
 
+void *vmalloc_huge_node(unsigned long size, gfp_t gfp_mask, int node)
+{
+	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+				    gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+				    node, __builtin_return_address(0));
+}
+
 /**
  * vmalloc_huge - allocate virtually contiguous memory, allow huge pages
  * @size:      allocation size



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v1 11/14] futex: Implement FUTEX2_NUMA
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
                   ` (9 preceding siblings ...)
  2023-07-21 10:22 ` [PATCH v1 10/14] mm: Add vmalloc_huge_node() Peter Zijlstra
@ 2023-07-21 10:22 ` Peter Zijlstra
  2023-07-21 12:16   ` Peter Zijlstra
  2023-07-31 17:36   ` Thomas Gleixner
  2023-07-21 10:22 ` [PATCH v1 12/14] futex: Propagate flags into futex_get_value_locked() Peter Zijlstra
                   ` (4 subsequent siblings)
  15 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

Extend the futex2 interface to be numa aware.

When FUTEX2_NUMA is specified for a futex, the user value is extended
to two words (of the same size). The first is the user value we all
know, the second one will be the node to place this futex on.

  struct futex_numa_32 {
	u32 val;
	u32 node;
  };

When node is set to ~0, WAIT will set it to the current node_id such
that WAKE knows where to find it. If userspace corrupts the node value
between WAIT and WAKE, the futex will not be found and no wakeup will
happen.

When FUTEX2_NUMA is not set, the node is simply an extention of the
hash, such that traditional futexes are still interleaved over the
nodes.

This is done to avoid having to have a separate !numa hash-table.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/futex.h   |    3 +
 kernel/futex/core.c     |  128 +++++++++++++++++++++++++++++++++++++++---------
 kernel/futex/futex.h    |   26 +++++++--
 kernel/futex/syscalls.c |    2 
 4 files changed, 127 insertions(+), 32 deletions(-)

--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -34,6 +34,7 @@ union futex_key {
 		u64 i_seq;
 		unsigned long pgoff;
 		unsigned int offset;
+		/* unsigned int node; */
 	} shared;
 	struct {
 		union {
@@ -42,11 +43,13 @@ union futex_key {
 		};
 		unsigned long address;
 		unsigned int offset;
+		/* unsigned int node; */
 	} private;
 	struct {
 		u64 ptr;
 		unsigned long word;
 		unsigned int offset;
+		unsigned int node;	/* NOT hashed! */
 	} both;
 };
 
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -34,7 +34,8 @@
 #include <linux/compat.h>
 #include <linux/jhash.h>
 #include <linux/pagemap.h>
-#include <linux/memblock.h>
+#include <linux/gfp.h>
+#include <linux/vmalloc.h>
 #include <linux/fault-inject.h>
 #include <linux/slab.h>
 
@@ -47,12 +48,14 @@
  * reside in the same cacheline.
  */
 static struct {
-	struct futex_hash_bucket *queues;
 	unsigned long            hashsize;
+	unsigned int		 hashshift;
+	struct futex_hash_bucket *queues[MAX_NUMNODES];
 } __futex_data __read_mostly __aligned(2*sizeof(long));
-#define futex_queues   (__futex_data.queues)
-#define futex_hashsize (__futex_data.hashsize)
 
+#define futex_hashsize	(__futex_data.hashsize)
+#define futex_hashshift	(__futex_data.hashshift)
+#define futex_queues	(__futex_data.queues)
 
 /*
  * Fault injections for futexes.
@@ -105,6 +108,26 @@ late_initcall(fail_futex_debugfs);
 
 #endif /* CONFIG_FAIL_FUTEX */
 
+static int futex_get_value(u32 *val, u32 __user *from, unsigned int flags)
+{
+	switch (futex_size(flags)) {
+	case 1: return __get_user(*val, (u8 __user *)from);
+	case 2: return __get_user(*val, (u16 __user *)from);
+	case 4: return __get_user(*val, (u32 __user *)from);
+	default: BUG();
+	}
+}
+
+static int futex_put_value(u32 val, u32 __user *to, unsigned int flags)
+{
+	switch (futex_size(flags)) {
+	case 1: return __put_user(val, (u8 __user *)to);
+	case 2: return __put_user(val, (u16 __user *)to);
+	case 4: return __put_user(val, (u32 __user *)to);
+	default: BUG();
+	}
+}
+
 /**
  * futex_hash - Return the hash bucket in the global hash
  * @key:	Pointer to the futex key for which the hash is calculated
@@ -114,10 +137,29 @@ late_initcall(fail_futex_debugfs);
  */
 struct futex_hash_bucket *futex_hash(union futex_key *key)
 {
-	u32 hash = jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4,
+	u32 hash = jhash2((u32 *)key,
+			  offsetof(typeof(*key), both.offset) / sizeof(u32),
 			  key->both.offset);
+	int node = key->both.node;
 
-	return &futex_queues[hash & (futex_hashsize - 1)];
+	if (node == -1) {
+		/*
+		 * In case of !FLAGS_NUMA, use some unused hash bits to pick a
+		 * node -- this ensures regular futexes are interleaved across
+		 * the nodes and avoids having to allocate multiple
+		 * hash-tables.
+		 *
+		 * NOTE: this isn't perfectly uniform, but it is fast and
+		 * handles sparse node masks.
+		 */
+		node = (hash >> futex_hashshift) % nr_node_ids;
+		if (!node_possible(node)) {
+			node = find_next_bit_wrap(node_possible_map.bits,
+						  nr_node_ids, node);
+		}
+	}
+
+	return &futex_queues[node][hash & (futex_hashsize - 1)];
 }
 
 
@@ -217,32 +259,55 @@ static u64 get_inode_sequence_number(str
  *
  * lock_page() might sleep, the caller should not hold a spinlock.
  */
-int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *key,
+int get_futex_key(void __user *uaddr, unsigned int flags, union futex_key *key,
 		  enum futex_access rw)
 {
 	unsigned long address = (unsigned long)uaddr;
 	struct mm_struct *mm = current->mm;
 	struct page *page, *tail;
 	struct address_space *mapping;
-	int err, ro = 0;
+	int node, err, size, ro = 0;
 	bool fshared;
 
 	fshared = flags & FLAGS_SHARED;
+	size = futex_size(flags);
 
 	/*
 	 * The futex address must be "naturally" aligned.
 	 */
 	key->both.offset = address % PAGE_SIZE;
-	if (unlikely((address % sizeof(u32)) != 0))
+	if (unlikely((address % size) != 0))
 		return -EINVAL;
 	address -= key->both.offset;
 
-	if (unlikely(!access_ok(uaddr, sizeof(u32))))
+	if (flags & FLAGS_NUMA)
+		size *= 2;
+
+	if (unlikely(!access_ok(uaddr, size)))
 		return -EFAULT;
 
 	if (unlikely(should_fail_futex(fshared)))
 		return -EFAULT;
 
+	key->both.node = -1;
+	if (flags & FLAGS_NUMA) {
+		void __user *naddr = uaddr + size/2;
+
+		if (futex_get_value(&node, naddr, flags))
+			return -EFAULT;
+
+		if (node == -1) {
+			node = numa_node_id();
+			if (futex_put_value(node, naddr, flags))
+				return -EFAULT;
+		}
+
+		if (node >= MAX_NUMNODES || !node_possible(node))
+			return -EINVAL;
+
+		key->both.node = node;
+	}
+
 	/*
 	 * PROCESS_PRIVATE futexes are fast.
 	 * As the mm cannot disappear under us and the 'key' only needs
@@ -1125,27 +1190,42 @@ void futex_exit_release(struct task_stru
 
 static int __init futex_init(void)
 {
-	unsigned int futex_shift;
-	unsigned long i;
+	unsigned int order, n;
+	unsigned long size, i;
 
 #if CONFIG_BASE_SMALL
 	futex_hashsize = 16;
 #else
-	futex_hashsize = roundup_pow_of_two(256 * num_possible_cpus());
+	futex_hashsize = 256 * num_possible_cpus();
+	futex_hashsize /= num_possible_nodes();
+	futex_hashsize = roundup_pow_of_two(futex_hashsize);
 #endif
+	futex_hashshift = ilog2(futex_hashsize);
+	size = sizeof(struct futex_hash_bucket) * futex_hashsize;
+	order = get_order(size);
+
+	for_each_node(n) {
+		struct futex_hash_bucket *table;
+
+		if (order > MAX_ORDER)
+			table = vmalloc_huge_node(size, GFP_KERNEL, n);
+		else
+			table = alloc_pages_exact_nid(n, size, GFP_KERNEL);
+
+		BUG_ON(!table);
+
+		for (i = 0; i < futex_hashsize; i++) {
+			atomic_set(&table[i].waiters, 0);
+			spin_lock_init(&table[i].lock);
+			plist_head_init(&table[i].chain);
+		}
 
-	futex_queues = alloc_large_system_hash("futex", sizeof(*futex_queues),
-					       futex_hashsize, 0,
-					       futex_hashsize < 256 ? HASH_SMALL : 0,
-					       &futex_shift, NULL,
-					       futex_hashsize, futex_hashsize);
-	futex_hashsize = 1UL << futex_shift;
-
-	for (i = 0; i < futex_hashsize; i++) {
-		atomic_set(&futex_queues[i].waiters, 0);
-		plist_head_init(&futex_queues[i].chain);
-		spin_lock_init(&futex_queues[i].lock);
+		futex_queues[n] = table;
 	}
+	pr_info("futex hash table, %d nodes, %ld entries (order: %d, %lu bytes)\n",
+		num_possible_nodes(),
+		futex_hashsize, order,
+		sizeof(struct futex_hash_bucket) * futex_hashsize);
 
 	return 0;
 }
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -65,6 +65,12 @@ static inline unsigned int futex2_to_fla
 	return flags;
 }
 
+static inline unsigned int futex_size(unsigned int flags)
+{
+	unsigned int size = flags & FLAGS_SIZE_MASK;
+	return 1 << size; /* {0,1,2,3} -> {1,2,4,8} */
+}
+
 static inline bool futex_flags_valid(unsigned int flags)
 {
 	/* Only 64bit futexes for 64bit code */
@@ -77,13 +83,19 @@ static inline bool futex_flags_valid(uns
 	if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
 		return false;
 
-	return true;
-}
+	/*
+	 * Must be able to represent both NUMA_NO_NODE and every valid nodeid
+	 * in a futex word.
+	 */
+	if (flags & FLAGS_NUMA) {
+		int bits = 8 * futex_size(flags);
+		u64 max = ~0ULL;
+		max >>= 64 - bits;
+		if (nr_node_ids >= max)
+			return false;
+	}
 
-static inline unsigned int futex_size(unsigned int flags)
-{
-	unsigned int size = flags & FLAGS_SIZE_MASK;
-	return 1 << size; /* {0,1,2,3} -> {1,2,4,8} */
+	return true;
 }
 
 static inline bool futex_validate_input(unsigned int flags, u64 val)
@@ -182,7 +194,7 @@ enum futex_access {
 	FUTEX_WRITE
 };
 
-extern int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *key,
+extern int get_futex_key(void __user *uaddr, unsigned int flags, union futex_key *key,
 			 enum futex_access rw);
 
 extern struct hrtimer_sleeper *
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -179,7 +179,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 
-#define FUTEX2_MASK (FUTEX2_64 | FUTEX2_PRIVATE)
+#define FUTEX2_MASK (FUTEX2_64 | FUTEX2_NUMA | FUTEX2_PRIVATE)
 
 /**
  * futex_parse_waitv - Parse a waitv array from userspace



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v1 12/14] futex: Propagate flags into futex_get_value_locked()
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
                   ` (10 preceding siblings ...)
  2023-07-21 10:22 ` [PATCH v1 11/14] futex: Implement FUTEX2_NUMA Peter Zijlstra
@ 2023-07-21 10:22 ` Peter Zijlstra
  2023-07-21 10:22 ` [PATCH v1 13/14] futex: Enable FUTEX2_{8,16} Peter Zijlstra
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

In order to facilitate variable sized futexes propagate the flags into
futex_get_value_locked().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex/core.c     |    4 ++--
 kernel/futex/futex.h    |    2 +-
 kernel/futex/pi.c       |    8 ++++----
 kernel/futex/requeue.c  |    4 ++--
 kernel/futex/waitwake.c |    4 ++--
 5 files changed, 11 insertions(+), 11 deletions(-)

--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -515,12 +515,12 @@ int futex_cmpxchg_value_locked(u32 *curv
 	return ret;
 }
 
-int futex_get_value_locked(u32 *dest, u32 __user *from)
+int futex_get_value_locked(u32 *dest, u32 __user *from, unsigned int flags)
 {
 	int ret;
 
 	pagefault_disable();
-	ret = __get_user(*dest, from);
+	ret = futex_get_value(dest, from, flags);
 	pagefault_enable();
 
 	return ret ? -EFAULT : 0;
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -218,7 +218,7 @@ extern void futex_wake_mark(struct wake_
 
 extern int fault_in_user_writeable(u32 __user *uaddr);
 extern int futex_cmpxchg_value_locked(u32 *curval, u32 __user *uaddr, u32 uval, u32 newval);
-extern int futex_get_value_locked(u32 *dest, u32 __user *from);
+extern int futex_get_value_locked(u32 *dest, u32 __user *from, unsigned int flags);
 extern struct futex_q *futex_top_waiter(struct futex_hash_bucket *hb, union futex_key *key);
 
 extern void __futex_unqueue(struct futex_q *q);
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -239,7 +239,7 @@ static int attach_to_pi_state(u32 __user
 	 * still is what we expect it to be, otherwise retry the entire
 	 * operation.
 	 */
-	if (futex_get_value_locked(&uval2, uaddr))
+	if (futex_get_value_locked(&uval2, uaddr, FLAGS_SIZE_32))
 		goto out_efault;
 
 	if (uval != uval2)
@@ -358,7 +358,7 @@ static int handle_exit_race(u32 __user *
 	 * The same logic applies to the case where the exiting task is
 	 * already gone.
 	 */
-	if (futex_get_value_locked(&uval2, uaddr))
+	if (futex_get_value_locked(&uval2, uaddr, FLAGS_SIZE_32))
 		return -EFAULT;
 
 	/* If the user space value has changed, try again. */
@@ -526,7 +526,7 @@ int futex_lock_pi_atomic(u32 __user *uad
 	 * Read the user space value first so we can validate a few
 	 * things before proceeding further.
 	 */
-	if (futex_get_value_locked(&uval, uaddr))
+	if (futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32))
 		return -EFAULT;
 
 	if (unlikely(should_fail_futex(true)))
@@ -762,7 +762,7 @@ static int __fixup_pi_state_owner(u32 __
 	if (!pi_state->owner)
 		newtid |= FUTEX_OWNER_DIED;
 
-	err = futex_get_value_locked(&uval, uaddr);
+	err = futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32);
 	if (err)
 		goto handle_err;
 
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -273,7 +273,7 @@ futex_proxy_trylock_atomic(u32 __user *p
 	u32 curval;
 	int ret;
 
-	if (futex_get_value_locked(&curval, pifutex))
+	if (futex_get_value_locked(&curval, pifutex, FLAGS_SIZE_32))
 		return -EFAULT;
 
 	if (unlikely(should_fail_futex(true)))
@@ -451,7 +451,7 @@ int futex_requeue(u32 __user *uaddr1, un
 	if (likely(cmpval != NULL)) {
 		u32 curval;
 
-		ret = futex_get_value_locked(&curval, uaddr1);
+		ret = futex_get_value_locked(&curval, uaddr1, FLAGS_SIZE_32);
 
 		if (unlikely(ret)) {
 			double_unlock_hb(hb1, hb2);
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -438,7 +438,7 @@ static int futex_wait_multiple_setup(str
 		u32 val = vs[i].w.val;
 
 		hb = futex_q_lock(q);
-		ret = futex_get_value_locked(&uval, uaddr);
+		ret = futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32);
 
 		if (!ret && uval == val) {
 			/*
@@ -606,7 +606,7 @@ int futex_wait_setup(u32 __user *uaddr,
 retry_private:
 	*hb = futex_q_lock(q);
 
-	ret = futex_get_value_locked(&uval, uaddr);
+	ret = futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32);
 
 	if (ret) {
 		futex_q_unlock(*hb);



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v1 13/14] futex: Enable FUTEX2_{8,16}
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
                   ` (11 preceding siblings ...)
  2023-07-21 10:22 ` [PATCH v1 12/14] futex: Propagate flags into futex_get_value_locked() Peter Zijlstra
@ 2023-07-21 10:22 ` Peter Zijlstra
  2023-07-21 10:22 ` [PATCH v1 14/14] futex,selftests: Extend the futex selftests Peter Zijlstra
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

When futexes are no longer u32 aligned, the lower offset bits are no
longer available to put type info in. However, since offset is the
offset within a page, there are plenty bits available on the top end.

After that, pass flags into futex_get_value_locked() for WAIT and
disallow FUTEX2_64 instead of mandating FUTEX2_32.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/futex.h   |   11 ++++++-----
 kernel/futex/core.c     |    9 +++++++++
 kernel/futex/futex.h    |    4 ++--
 kernel/futex/waitwake.c |    5 +++--
 4 files changed, 20 insertions(+), 9 deletions(-)

--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -16,18 +16,19 @@ struct task_struct;
  * The key type depends on whether it's a shared or private mapping.
  * Don't rearrange members without looking at hash_futex().
  *
- * offset is aligned to a multiple of sizeof(u32) (== 4) by definition.
- * We use the two low order bits of offset to tell what is the kind of key :
+ * offset is the position within a page and is in the range [0, PAGE_SIZE).
+ * The high bits of the offset indicate what kind of key this is:
  *  00 : Private process futex (PTHREAD_PROCESS_PRIVATE)
  *       (no reference on an inode or mm)
  *  01 : Shared futex (PTHREAD_PROCESS_SHARED)
  *	mapped on a file (reference on the underlying inode)
  *  10 : Shared futex (PTHREAD_PROCESS_SHARED)
  *       (but private mapping on an mm, and reference taken on it)
-*/
+ */
 
-#define FUT_OFF_INODE    1 /* We set bit 0 if key has a reference on inode */
-#define FUT_OFF_MMSHARED 2 /* We set bit 1 if key has a reference on mm */
+#define FUT_OFF_INODE    (PAGE_SIZE << 0)
+#define FUT_OFF_MMSHARED (PAGE_SIZE << 1)
+#define FUT_OFF_SIZE	 (PAGE_SIZE << 2)
 
 union futex_key {
 	struct {
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -308,6 +308,15 @@ int get_futex_key(void __user *uaddr, un
 	}
 
 	/*
+	 * Encode the futex size in the offset. This makes cross-size
+	 * wake-wait fail -- see futex_match().
+	 *
+	 * NOTE that cross-size wake-wait is fundamentally broken wrt
+	 * FLAGS_NUMA but could possibly work for !NUMA.
+	 */
+	key->both.offset |= FUT_OFF_SIZE * (flags & FLAGS_SIZE_MASK);
+
+	/*
 	 * PROCESS_PRIVATE futexes are fast.
 	 * As the mm cannot disappear under us and the 'key' only needs
 	 * virtual address, we dont even have to find the underlying vma.
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -79,8 +79,8 @@ static inline bool futex_flags_valid(uns
 			return false;
 	}
 
-	/* Only 32bit futexes are implemented -- for now */
-	if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
+	/* 64bit futexes aren't implemented -- yet */
+	if ((flags & FLAGS_SIZE_MASK) == FLAGS_SIZE_64)
 		return false;
 
 	/*
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -434,11 +434,12 @@ static int futex_wait_multiple_setup(str
 
 	for (i = 0; i < count; i++) {
 		u32 __user *uaddr = (u32 __user *)(unsigned long)vs[i].w.uaddr;
+		unsigned int flags = vs[i].w.flags;
 		struct futex_q *q = &vs[i].q;
 		u32 val = vs[i].w.val;
 
 		hb = futex_q_lock(q);
-		ret = futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32);
+		ret = futex_get_value_locked(&uval, uaddr, flags);
 
 		if (!ret && uval == val) {
 			/*
@@ -606,7 +607,7 @@ int futex_wait_setup(u32 __user *uaddr,
 retry_private:
 	*hb = futex_q_lock(q);
 
-	ret = futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32);
+	ret = futex_get_value_locked(&uval, uaddr, flags);
 
 	if (ret) {
 		futex_q_unlock(*hb);



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v1 14/14] futex,selftests: Extend the futex selftests
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
                   ` (12 preceding siblings ...)
  2023-07-21 10:22 ` [PATCH v1 13/14] futex: Enable FUTEX2_{8,16} Peter Zijlstra
@ 2023-07-21 10:22 ` Peter Zijlstra
  2023-07-21 14:42 ` [PATCH v1 00/14] futex: More futex2 bits Jens Axboe
  2023-07-21 15:49 ` Arnd Bergmann
  15 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 10:22 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

Extend the wait/requeue selftests to also cover the futex2 syscalls.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/testing/selftests/futex/functional/futex_requeue.c         |  100 +++++++++-
 tools/testing/selftests/futex/functional/futex_wait.c            |   56 ++++-
 tools/testing/selftests/futex/functional/futex_wait_timeout.c    |   14 +
 tools/testing/selftests/futex/functional/futex_wait_wouldblock.c |   28 ++
 tools/testing/selftests/futex/functional/futex_waitv.c           |   15 -
 tools/testing/selftests/futex/functional/run.sh                  |    6 
 tools/testing/selftests/futex/include/futex2test.h               |   39 +++
 7 files changed, 229 insertions(+), 29 deletions(-)

--- a/tools/testing/selftests/futex/functional/futex_requeue.c
+++ b/tools/testing/selftests/futex/functional/futex_requeue.c
@@ -7,8 +7,10 @@
 
 #include <pthread.h>
 #include <limits.h>
+#include <stdbool.h>
 #include "logging.h"
 #include "futextest.h"
+#include "futex2test.h"
 
 #define TEST_NAME "futex-requeue"
 #define timeout_ns  30000000
@@ -16,24 +18,58 @@
 
 volatile futex_t *f1;
 
+bool futex2 = 0;
+bool mixed = 0;
+
 void usage(char *prog)
 {
 	printf("Usage: %s\n", prog);
 	printf("  -c	Use color\n");
+	printf("  -n	Use futex2 interface\n");
+	printf("  -x	Use mixed size futex\n");
 	printf("  -h	Display this help message\n");
 	printf("  -v L	Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n",
 	       VQUIET, VCRITICAL, VINFO);
 }
 
-void *waiterfn(void *arg)
+static void *waiterfn(void *arg)
 {
+	unsigned int flags = 0;
 	struct timespec to;
 
-	to.tv_sec = 0;
-	to.tv_nsec = timeout_ns;
+	if (futex2) {
+		unsigned long mask;
+
+		if (clock_gettime(CLOCK_MONOTONIC, &to)) {
+			printf("clock_gettime() failed errno %d", errno);
+			return NULL;
+		}
+
+		to.tv_nsec += timeout_ns;
+		if (to.tv_nsec >= 1000000000) {
+			to.tv_sec++;
+			to.tv_nsec -= 1000000000;
+		}
+
+		if (mixed) {
+			flags |= FUTEX2_16;
+			mask = (unsigned short)(~0U);
+		} else {
+			flags |= FUTEX2_32;
+			mask = (unsigned int)(~0U);
+		}
+
+		if (futex2_wait(f1, *f1, mask, flags,
+				&to, CLOCK_MONOTONIC))
+			printf("waiter failed errno %d\n", errno);
+	} else {
+
+		to.tv_sec = 0;
+		to.tv_nsec = timeout_ns;
 
-	if (futex_wait(f1, *f1, &to, 0))
-		printf("waiter failed errno %d\n", errno);
+		if (futex_wait(f1, *f1, &to, flags))
+			printf("waiter failed errno %d\n", errno);
+	}
 
 	return NULL;
 }
@@ -48,7 +84,7 @@ int main(int argc, char *argv[])
 
 	f1 = &_f1;
 
-	while ((c = getopt(argc, argv, "cht:v:")) != -1) {
+	while ((c = getopt(argc, argv, "xncht:v:")) != -1) {
 		switch (c) {
 		case 'c':
 			log_color(1);
@@ -59,6 +95,12 @@ int main(int argc, char *argv[])
 		case 'v':
 			log_verbosity(atoi(optarg));
 			break;
+		case 'x':
+			mixed=1;
+			/* fallthrough */
+		case 'n':
+			futex2=1;
+			break;
 		default:
 			usage(basename(argv[0]));
 			exit(1);
@@ -79,7 +121,22 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Requeuing 1 futex from f1 to f2\n");
-	res = futex_cmp_requeue(f1, 0, &f2, 0, 1, 0);
+	if (futex2) {
+		struct futex_waitv futexes[2] = {
+			{
+				.val = 0,
+				.uaddr = (unsigned long)f1,
+				.flags = mixed ? FUTEX2_16 : FUTEX2_32,
+			},
+			{
+				.uaddr = (unsigned long)&f2,
+				.flags = FUTEX2_32,
+			},
+		};
+		res = futex2_requeue(futexes, 0, 0, 1);
+	} else {
+		res = futex_cmp_requeue(f1, 0, &f2, 0, 1, 0);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_requeue simple returned: %d %s\n",
 				      res ? errno : res,
@@ -89,7 +146,11 @@ int main(int argc, char *argv[])
 
 
 	info("Waking 1 futex at f2\n");
-	res = futex_wake(&f2, 1, 0);
+	if (futex2) {
+		res = futex2_wake(&f2, ~0U, 1, FUTEX2_32);
+	} else {
+		res = futex_wake(&f2, 1, 0);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_requeue simple returned: %d %s\n",
 				      res ? errno : res,
@@ -112,7 +173,22 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Waking 3 futexes at f1 and requeuing 7 futexes from f1 to f2\n");
-	res = futex_cmp_requeue(f1, 0, &f2, 3, 7, 0);
+	if (futex2) {
+		struct futex_waitv futexes[2] = {
+			{
+				.val = 0,
+				.uaddr = (unsigned long)f1,
+				.flags = mixed ? FUTEX2_16 : FUTEX2_32,
+			},
+			{
+				.uaddr = (unsigned long)&f2,
+				.flags = FUTEX2_32,
+			},
+		};
+		res = futex2_requeue(futexes, 0, 3, 7);
+	} else {
+		res = futex_cmp_requeue(f1, 0, &f2, 3, 7, 0);
+	}
 	if (res != 10) {
 		ksft_test_result_fail("futex_requeue many returned: %d %s\n",
 				      res ? errno : res,
@@ -121,7 +197,11 @@ int main(int argc, char *argv[])
 	}
 
 	info("Waking INT_MAX futexes at f2\n");
-	res = futex_wake(&f2, INT_MAX, 0);
+	if (futex2) {
+		res = futex2_wake(&f2, ~0U, INT_MAX, FUTEX2_32);
+	} else {
+		res = futex_wake(&f2, INT_MAX, 0);
+	}
 	if (res != 7) {
 		ksft_test_result_fail("futex_requeue many returned: %d %s\n",
 				      res ? errno : res,
--- a/tools/testing/selftests/futex/functional/futex_wait.c
+++ b/tools/testing/selftests/futex/functional/futex_wait.c
@@ -9,8 +9,10 @@
 #include <sys/shm.h>
 #include <sys/mman.h>
 #include <fcntl.h>
+#include <stdbool.h>
 #include "logging.h"
 #include "futextest.h"
+#include "futex2test.h"
 
 #define TEST_NAME "futex-wait"
 #define timeout_ns  30000000
@@ -19,10 +21,13 @@
 
 void *futex;
 
+bool futex2 = 0;
+
 void usage(char *prog)
 {
 	printf("Usage: %s\n", prog);
 	printf("  -c	Use color\n");
+	printf("  -n	Use futex2 interface\n");
 	printf("  -h	Display this help message\n");
 	printf("  -v L	Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n",
 	       VQUIET, VCRITICAL, VINFO);
@@ -30,17 +35,35 @@ void usage(char *prog)
 
 static void *waiterfn(void *arg)
 {
-	struct timespec to;
 	unsigned int flags = 0;
+	struct timespec to;
 
 	if (arg)
 		flags = *((unsigned int *) arg);
 
-	to.tv_sec = 0;
-	to.tv_nsec = timeout_ns;
+	if (futex2) {
+		if (clock_gettime(CLOCK_MONOTONIC, &to)) {
+			printf("clock_gettime() failed errno %d", errno);
+			return NULL;
+		}
 
-	if (futex_wait(futex, 0, &to, flags))
-		printf("waiter failed errno %d\n", errno);
+		to.tv_nsec += timeout_ns;
+		if (to.tv_nsec >= 1000000000) {
+			to.tv_sec++;
+			to.tv_nsec -= 1000000000;
+		}
+
+		if (futex2_wait(futex, 0, ~0U, flags | FUTEX2_32,
+				&to, CLOCK_MONOTONIC))
+			printf("waiter failed errno %d\n", errno);
+	} else {
+
+		to.tv_sec = 0;
+		to.tv_nsec = timeout_ns;
+
+		if (futex_wait(futex, 0, &to, flags))
+			printf("waiter failed errno %d\n", errno);
+	}
 
 	return NULL;
 }
@@ -55,7 +78,7 @@ int main(int argc, char *argv[])
 
 	futex = &f_private;
 
-	while ((c = getopt(argc, argv, "cht:v:")) != -1) {
+	while ((c = getopt(argc, argv, "ncht:v:")) != -1) {
 		switch (c) {
 		case 'c':
 			log_color(1);
@@ -66,6 +89,9 @@ int main(int argc, char *argv[])
 		case 'v':
 			log_verbosity(atoi(optarg));
 			break;
+		case 'n':
+			futex2=1;
+			break;
 		default:
 			usage(basename(argv[0]));
 			exit(1);
@@ -84,7 +110,11 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Calling private futex_wake on futex: %p\n", futex);
-	res = futex_wake(futex, 1, FUTEX_PRIVATE_FLAG);
+	if (futex2) {
+		res = futex2_wake(futex, ~0U, 1, FUTEX2_32 | FUTEX2_PRIVATE);
+	} else {
+		res = futex_wake(futex, 1, FUTEX_PRIVATE_FLAG);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_wake private returned: %d %s\n",
 				      errno, strerror(errno));
@@ -112,7 +142,11 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Calling shared (page anon) futex_wake on futex: %p\n", futex);
-	res = futex_wake(futex, 1, 0);
+	if (futex2) {
+		res = futex2_wake(futex, ~0U, 1, FUTEX2_32);
+	} else {
+		res = futex_wake(futex, 1, 0);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_wake shared (page anon) returned: %d %s\n",
 				      errno, strerror(errno));
@@ -151,7 +185,11 @@ int main(int argc, char *argv[])
 	usleep(WAKE_WAIT_US);
 
 	info("Calling shared (file backed) futex_wake on futex: %p\n", futex);
-	res = futex_wake(shm, 1, 0);
+	if (futex2) {
+		res = futex2_wake(shm, ~0U, 1, FUTEX2_32);
+	} else {
+		res = futex_wake(shm, 1, 0);
+	}
 	if (res != 1) {
 		ksft_test_result_fail("futex_wake shared (file backed) returned: %d %s\n",
 				      errno, strerror(errno));
--- a/tools/testing/selftests/futex/functional/futex_wait_timeout.c
+++ b/tools/testing/selftests/futex/functional/futex_wait_timeout.c
@@ -125,7 +125,7 @@ int main(int argc, char *argv[])
 	}
 
 	ksft_print_header();
-	ksft_set_plan(9);
+	ksft_set_plan(11);
 	ksft_print_msg("%s: Block on a futex and wait for timeout\n",
 	       basename(argv[0]));
 	ksft_print_msg("\tArguments: timeout=%ldns\n", timeout_ns);
@@ -194,6 +194,18 @@ int main(int argc, char *argv[])
 	res = futex_waitv(&waitv, 1, 0, &to, CLOCK_REALTIME);
 	test_timeout(res, &ret, "futex_waitv realtime", ETIMEDOUT);
 
+	/* futex2_wait with CLOCK_MONOTONIC */
+	if (futex_get_abs_timeout(CLOCK_MONOTONIC, &to, timeout_ns))
+		return RET_FAIL;
+	res = futex2_wait(&f1, f1, 1, FUTEX2_32, &to, CLOCK_MONOTONIC);
+	test_timeout(res, &ret, "futex2_wait monotonic", ETIMEDOUT);
+
+	/* futex2_wait with CLOCK_REALTIME */
+	if (futex_get_abs_timeout(CLOCK_REALTIME, &to, timeout_ns))
+		return RET_FAIL;
+	res = futex2_wait(&f1, f1, 1, FUTEX2_32, &to, CLOCK_REALTIME);
+	test_timeout(res, &ret, "futex2_wait realtime", ETIMEDOUT);
+
 	ksft_print_cnts();
 	return ret;
 }
--- a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
+++ b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
@@ -46,7 +46,7 @@ int main(int argc, char *argv[])
 	struct futex_waitv waitv = {
 			.uaddr = (uintptr_t)&f1,
 			.val = f1+1,
-			.flags = FUTEX_32,
+			.flags = FUTEX2_32 | FUTEX2_PRIVATE,
 			.__reserved = 0
 		};
 
@@ -68,7 +68,7 @@ int main(int argc, char *argv[])
 	}
 
 	ksft_print_header();
-	ksft_set_plan(2);
+	ksft_set_plan(3);
 	ksft_print_msg("%s: Test the unexpected futex value in FUTEX_WAIT\n",
 	       basename(argv[0]));
 
@@ -106,6 +106,30 @@ int main(int argc, char *argv[])
 		ksft_test_result_pass("futex_waitv\n");
 	}
 
+	if (clock_gettime(CLOCK_MONOTONIC, &to)) {
+		error("clock_gettime failed\n", errno);
+		return errno;
+	}
+
+	to.tv_nsec += timeout_ns;
+
+	if (to.tv_nsec >= 1000000000) {
+		to.tv_sec++;
+		to.tv_nsec -= 1000000000;
+	}
+
+	info("Calling futex2_wait on f1: %u @ %p with val=%u\n", f1, &f1, f1+1);
+	res = futex2_wait(&f1, f1+1, ~0U, FUTEX2_32 | FUTEX2_PRIVATE,
+			  &to, CLOCK_MONOTONIC);
+	if (!res || errno != EWOULDBLOCK) {
+		ksft_test_result_pass("futex2_wait returned: %d %s\n",
+				      res ? errno : res,
+				      res ? strerror(errno) : "");
+		ret = RET_FAIL;
+	} else {
+		ksft_test_result_pass("futex2_wait\n");
+	}
+
 	ksft_print_cnts();
 	return ret;
 }
--- a/tools/testing/selftests/futex/functional/futex_waitv.c
+++ b/tools/testing/selftests/futex/functional/futex_waitv.c
@@ -88,7 +88,7 @@ int main(int argc, char *argv[])
 
 	for (i = 0; i < NR_FUTEXES; i++) {
 		waitv[i].uaddr = (uintptr_t)&futexes[i];
-		waitv[i].flags = FUTEX_32 | FUTEX_PRIVATE_FLAG;
+		waitv[i].flags = FUTEX2_32 | FUTEX2_PRIVATE;
 		waitv[i].val = 0;
 		waitv[i].__reserved = 0;
 	}
@@ -99,7 +99,8 @@ int main(int argc, char *argv[])
 
 	usleep(WAKE_WAIT_US);
 
-	res = futex_wake(u64_to_ptr(waitv[NR_FUTEXES - 1].uaddr), 1, FUTEX_PRIVATE_FLAG);
+	res = futex2_wake(u64_to_ptr(waitv[NR_FUTEXES - 1].uaddr), ~0U, 1,
+			  FUTEX2_PRIVATE | FUTEX2_32);
 	if (res != 1) {
 		ksft_test_result_fail("futex_wake private returned: %d %s\n",
 				      res ? errno : res,
@@ -122,7 +123,7 @@ int main(int argc, char *argv[])
 
 		*shared_data = 0;
 		waitv[i].uaddr = (uintptr_t)shared_data;
-		waitv[i].flags = FUTEX_32;
+		waitv[i].flags = FUTEX2_32;
 		waitv[i].val = 0;
 		waitv[i].__reserved = 0;
 	}
@@ -145,8 +146,8 @@ int main(int argc, char *argv[])
 	for (i = 0; i < NR_FUTEXES; i++)
 		shmdt(u64_to_ptr(waitv[i].uaddr));
 
-	/* Testing a waiter without FUTEX_32 flag */
-	waitv[0].flags = FUTEX_PRIVATE_FLAG;
+	/* Testing a waiter without FUTEX2_32 flag */
+	waitv[0].flags = FUTEX2_PRIVATE;
 
 	if (clock_gettime(CLOCK_MONOTONIC, &to))
 		error("gettime64 failed\n", errno);
@@ -160,11 +161,11 @@ int main(int argc, char *argv[])
 				      res ? strerror(errno) : "");
 		ret = RET_FAIL;
 	} else {
-		ksft_test_result_pass("futex_waitv without FUTEX_32\n");
+		ksft_test_result_pass("futex_waitv without FUTEX2_32\n");
 	}
 
 	/* Testing a waiter with an unaligned address */
-	waitv[0].flags = FUTEX_PRIVATE_FLAG | FUTEX_32;
+	waitv[0].flags = FUTEX2_PRIVATE | FUTEX2_32;
 	waitv[0].uaddr = 1;
 
 	if (clock_gettime(CLOCK_MONOTONIC, &to))
--- a/tools/testing/selftests/futex/functional/run.sh
+++ b/tools/testing/selftests/futex/functional/run.sh
@@ -76,9 +76,15 @@ echo
 
 echo
 ./futex_wait $COLOR
+echo
+./futex_wait -n $COLOR
 
 echo
 ./futex_requeue $COLOR
+echo
+./futex_requeue -n $COLOR
+echo
+./futex_requeue -x $COLOR
 
 echo
 ./futex_waitv $COLOR
--- a/tools/testing/selftests/futex/include/futex2test.h
+++ b/tools/testing/selftests/futex/include/futex2test.h
@@ -8,6 +8,28 @@
 
 #define u64_to_ptr(x) ((void *)(uintptr_t)(x))
 
+#ifndef __NR_futex_wake
+#define __NR_futex_wake 452
+#define __NR_futex_wait 453
+#define __NR_futex_requeue 454
+#endif
+
+#ifndef FUTEX2_8
+/*
+ * Flags for futex2 syscalls.
+ */
+#define FUTEX2_8		0x00
+#define FUTEX2_16		0x01
+#define FUTEX2_32		0x02
+#define FUTEX2_64		0x03
+#define FUTEX2_NUMA		0x04
+			/*	0x08 */
+			/*	0x10 */
+			/*	0x20 */
+			/*	0x40 */
+#define FUTEX2_PRIVATE		FUTEX_PRIVATE_FLAG
+#endif
+
 /**
  * futex_waitv - Wait at multiple futexes, wake on any
  * @waiters:    Array of waiters
@@ -20,3 +42,20 @@ static inline int futex_waitv(volatile s
 {
 	return syscall(__NR_futex_waitv, waiters, nr_waiters, flags, timo, clockid);
 }
+
+static inline int futex2_wake(volatile void *uaddr, unsigned long mask, int nr, unsigned int flags)
+{
+	return syscall(__NR_futex_wake, uaddr, mask, nr, flags);
+}
+
+static inline int futex2_wait(volatile void *uaddr, unsigned long val, unsigned long mask,
+			      unsigned int flags, struct timespec *timo, clockid_t clockid)
+{
+	return syscall(__NR_futex_wait, uaddr, val, mask, flags, timo, clockid);
+}
+
+static inline int futex2_requeue(struct futex_waitv *futexes, unsigned int flags,
+				 int nr_wake, int nr_requeue)
+{
+	return syscall(__NR_futex_requeue, futexes, flags, nr_wake, nr_requeue);
+}



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 11/14] futex: Implement FUTEX2_NUMA
  2023-07-21 10:22 ` [PATCH v1 11/14] futex: Implement FUTEX2_NUMA Peter Zijlstra
@ 2023-07-21 12:16   ` Peter Zijlstra
  2023-07-31 17:36   ` Thomas Gleixner
  1 sibling, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 12:16 UTC (permalink / raw)
  To: tglx, axboe
  Cc: linux-kernel, mingo, dvhart, dave, andrealmeid, Andrew Morton,
	urezki, hch, lstoakes, Arnd Bergmann, linux-api, linux-mm,
	linux-arch, malteskarupke

On Fri, Jul 21, 2023 at 12:22:48PM +0200, Peter Zijlstra wrote:
> @@ -217,32 +259,55 @@ static u64 get_inode_sequence_number(str
>   *
>   * lock_page() might sleep, the caller should not hold a spinlock.
>   */
> -int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *key,
> +int get_futex_key(void __user *uaddr, unsigned int flags, union futex_key *key,
>  		  enum futex_access rw)
>  {
>  	unsigned long address = (unsigned long)uaddr;
>  	struct mm_struct *mm = current->mm;
>  	struct page *page, *tail;
>  	struct address_space *mapping;
> +	int node, err, size, ro = 0;
>  	bool fshared;
>  
>  	fshared = flags & FLAGS_SHARED;
> +	size = futex_size(flags);
>  
>  	/*
>  	 * The futex address must be "naturally" aligned.
>  	 */
>  	key->both.offset = address % PAGE_SIZE;
> +	if (unlikely((address % size) != 0))
>  		return -EINVAL;

This enforces u32 alignment for:

struct futex_numa_32 {
	u32 val;
	u32 node;
};

Or do we want to enfore u64 alignment for that?

>  	address -= key->both.offset;
>  
> +	if (flags & FLAGS_NUMA)
> +		size *= 2;
> +
> +	if (unlikely(!access_ok(uaddr, size)))
>  		return -EFAULT;

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 00/14] futex: More futex2 bits
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
                   ` (13 preceding siblings ...)
  2023-07-21 10:22 ` [PATCH v1 14/14] futex,selftests: Extend the futex selftests Peter Zijlstra
@ 2023-07-21 14:42 ` Jens Axboe
  2023-07-21 15:49 ` Arnd Bergmann
  15 siblings, 0 replies; 52+ messages in thread
From: Jens Axboe @ 2023-07-21 14:42 UTC (permalink / raw)
  To: Peter Zijlstra, tglx
  Cc: linux-kernel, mingo, dvhart, dave, andrealmeid, Andrew Morton,
	urezki, hch, lstoakes, Arnd Bergmann, linux-api, linux-mm,
	linux-arch, malteskarupke

On 7/21/23 4:22?AM, Peter Zijlstra wrote:
> Hi,
> 
> New version of the futex2 patches. These are actually tested and appear to work
> as expected.
> 
> I'm hoping to get at least the first 3 patches merged such that Jens can base
> the io_uring futex patches on them.

First 4 now, as we'll need the validate patch as well!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 05/14] futex: Add sys_futex_wake()
  2023-07-21 10:22 ` [PATCH v1 05/14] futex: Add sys_futex_wake() Peter Zijlstra
@ 2023-07-21 15:41   ` Arnd Bergmann
  2023-07-21 18:54     ` Peter Zijlstra
  2023-07-25  7:22   ` Geert Uytterhoeven
  1 sibling, 1 reply; 52+ messages in thread
From: Arnd Bergmann @ 2023-07-21 15:41 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner, Jens Axboe
  Cc: linux-kernel, Ingo Molnar, Darren Hart, dave, andrealmeid,
	Andrew Morton, urezki, Christoph Hellwig, Lorenzo Stoakes,
	linux-api, linux-mm, Linux-Arch, malteskarupke

On Fri, Jul 21, 2023, at 12:22, Peter Zijlstra wrote:
> --- a/arch/arm/tools/syscall.tbl
> +++ b/arch/arm/tools/syscall.tbl
> @@ -465,3 +465,4 @@
>  449	common	futex_waitv			sys_futex_waitv
>  450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
>  451	common	cachestat			sys_cachestat
> +452	common	futex_wake			sys_futex_wake

This clashes with __NR_fchmodat2 in linux-next, which also wants number 452.

> --- a/arch/arm64/include/asm/unistd32.h
> +++ b/arch/arm64/include/asm/unistd32.h
> @@ -909,6 +909,8 @@ __SYSCALL(__NR_futex_waitv, sys_futex_wa
>  __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
>  #define __NR_cachestat 451
>  __SYSCALL(__NR_cachestat, sys_cachestat)
> +#define __NR_futex_wake 452
> +__SYSCALL(__NR_futex_wake, sys_futex_wake)
> 
>  /*
>   * Please add new compat syscalls above this comment and update

Unfortunately, changing this file still requires updating __NR_compat_syscalls
in arch/arm64/include/asm/unistd.h as well.

> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -87,6 +87,7 @@ COND_SYSCALL_COMPAT(set_robust_list);
>  COND_SYSCALL(get_robust_list);
>  COND_SYSCALL_COMPAT(get_robust_list);
>  COND_SYSCALL(futex_waitv);
> +COND_SYSCALL(futex_wake);
>  COND_SYSCALL(kexec_load);
>  COND_SYSCALL_COMPAT(kexec_load);
>  COND_SYSCALL(init_module);

This is fine for the moment, but I wonder if we should start making
futex mandatory at some point. Right now, sparc32 with CONFIG_SMP
cannot support futex because of the lack of atomics in early
sparc processors, but sparc32 glibc actually requires futexes
and consequently only works on uniprocessor machines, on sparc64
compat mode, or on Leon3 with out of tree patches.

      Arnd

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-21 10:22 ` [PATCH v1 02/14] futex: Extend the " Peter Zijlstra
@ 2023-07-21 15:47   ` Arnd Bergmann
  2023-07-21 18:52     ` Peter Zijlstra
  2023-07-31 16:11   ` Thomas Gleixner
  1 sibling, 1 reply; 52+ messages in thread
From: Arnd Bergmann @ 2023-07-21 15:47 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner, Jens Axboe
  Cc: linux-kernel, Ingo Molnar, Darren Hart, dave, andrealmeid,
	Andrew Morton, urezki, Christoph Hellwig, Lorenzo Stoakes,
	linux-api, linux-mm, Linux-Arch, malteskarupke

On Fri, Jul 21, 2023, at 12:22, Peter Zijlstra wrote:
>   * futex_parse_waitv - Parse a waitv array from userspace
> @@ -207,7 +207,12 @@ static int futex_parse_waitv(struct fute
>  		if ((aux.flags & ~FUTEX2_MASK) || aux.__reserved)
>  			return -EINVAL;
> 
> -		if (!(aux.flags & FUTEX2_32))
> +		if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
> +			if ((aux.flags & FUTEX2_64) == FUTEX2_64)
> +				return -EINVAL;
> +		}
> +
> +		if ((aux.flags & FUTEX2_64) != FUTEX2_32)
>  			return -EINVAL;

This looks slightly confusing, how about defining another
FUTEX2_SIZEMASK (or similar) macro to clarify that
"aux.flags & FUTEX2_64" is a mask operation that can
match the FUTEX2_{8,16,32,64} values?

       Arnd

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 00/14] futex: More futex2 bits
  2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
                   ` (14 preceding siblings ...)
  2023-07-21 14:42 ` [PATCH v1 00/14] futex: More futex2 bits Jens Axboe
@ 2023-07-21 15:49 ` Arnd Bergmann
  15 siblings, 0 replies; 52+ messages in thread
From: Arnd Bergmann @ 2023-07-21 15:49 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner, Jens Axboe
  Cc: linux-kernel, Ingo Molnar, Darren Hart, dave, andrealmeid,
	Andrew Morton, urezki, Christoph Hellwig, Lorenzo Stoakes,
	linux-api, linux-mm, Linux-Arch, malteskarupke

On Fri, Jul 21, 2023, at 12:22, Peter Zijlstra wrote:
> Hi,
>
> New version of the futex2 patches. These are actually tested and appear to work
> as expected.
>
> I'm hoping to get at least the first 3 patches merged such that Jens can base
> the io_uring futex patches on them.
>
>
> Changes since v0:
>  - switched over to 'unsigned long' for values (Arnd)
>  - unshare vmalloc_huge() (Willy)
>  - added wait/requeue syscalls
>  - fixed NUMA to support sparse nodemask
>  - added FUTEX2_n vs FUTEX2_NUMA check to ensure
>    the node_id fits in the futex
>  - added selftests
>  - fixed a ton of silly bugs

The changes look good to me, and the ABI should be fine without
special compat handler now. I sent a couple of minor comments, but
nothing important.

     Arnd

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-21 15:47   ` Arnd Bergmann
@ 2023-07-21 18:52     ` Peter Zijlstra
  0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 18:52 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Thomas Gleixner, Jens Axboe, linux-kernel, Ingo Molnar,
	Darren Hart, dave, andrealmeid, Andrew Morton, urezki,
	Christoph Hellwig, Lorenzo Stoakes, linux-api, linux-mm,
	Linux-Arch, malteskarupke

On Fri, Jul 21, 2023 at 05:47:31PM +0200, Arnd Bergmann wrote:
> On Fri, Jul 21, 2023, at 12:22, Peter Zijlstra wrote:
> >   * futex_parse_waitv - Parse a waitv array from userspace
> > @@ -207,7 +207,12 @@ static int futex_parse_waitv(struct fute
> >  		if ((aux.flags & ~FUTEX2_MASK) || aux.__reserved)
> >  			return -EINVAL;
> > 
> > -		if (!(aux.flags & FUTEX2_32))
> > +		if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
> > +			if ((aux.flags & FUTEX2_64) == FUTEX2_64)
> > +				return -EINVAL;
> > +		}
> > +
> > +		if ((aux.flags & FUTEX2_64) != FUTEX2_32)
> >  			return -EINVAL;
> 
> This looks slightly confusing, how about defining another
> FUTEX2_SIZEMASK (or similar) macro to clarify that
> "aux.flags & FUTEX2_64" is a mask operation that can
> match the FUTEX2_{8,16,32,64} values?

Yeah, I had that in an earlier version, but then reconsidered as I
didn't want to clutter the uabi with that. But perhaps I over-throught
this.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 05/14] futex: Add sys_futex_wake()
  2023-07-21 15:41   ` Arnd Bergmann
@ 2023-07-21 18:54     ` Peter Zijlstra
  2023-07-21 21:23       ` Arnd Bergmann
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-21 18:54 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Thomas Gleixner, Jens Axboe, linux-kernel, Ingo Molnar,
	Darren Hart, dave, andrealmeid, Andrew Morton, urezki,
	Christoph Hellwig, Lorenzo Stoakes, linux-api, linux-mm,
	Linux-Arch, malteskarupke

On Fri, Jul 21, 2023 at 05:41:20PM +0200, Arnd Bergmann wrote:
> On Fri, Jul 21, 2023, at 12:22, Peter Zijlstra wrote:
> > --- a/arch/arm/tools/syscall.tbl
> > +++ b/arch/arm/tools/syscall.tbl
> > @@ -465,3 +465,4 @@
> >  449	common	futex_waitv			sys_futex_waitv
> >  450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
> >  451	common	cachestat			sys_cachestat
> > +452	common	futex_wake			sys_futex_wake
> 
> This clashes with __NR_fchmodat2 in linux-next, which also wants number 452.

Yeah, I fully expected some collisions :/ Unavoidable.

> > --- a/arch/arm64/include/asm/unistd32.h
> > +++ b/arch/arm64/include/asm/unistd32.h
> > @@ -909,6 +909,8 @@ __SYSCALL(__NR_futex_waitv, sys_futex_wa
> >  __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
> >  #define __NR_cachestat 451
> >  __SYSCALL(__NR_cachestat, sys_cachestat)
> > +#define __NR_futex_wake 452
> > +__SYSCALL(__NR_futex_wake, sys_futex_wake)
> > 
> >  /*
> >   * Please add new compat syscalls above this comment and update
> 
> Unfortunately, changing this file still requires updating __NR_compat_syscalls
> in arch/arm64/include/asm/unistd.h as well.

Bah I missed that. I'll go fix it up. Thanks!

> > --- a/kernel/sys_ni.c
> > +++ b/kernel/sys_ni.c
> > @@ -87,6 +87,7 @@ COND_SYSCALL_COMPAT(set_robust_list);
> >  COND_SYSCALL(get_robust_list);
> >  COND_SYSCALL_COMPAT(get_robust_list);
> >  COND_SYSCALL(futex_waitv);
> > +COND_SYSCALL(futex_wake);
> >  COND_SYSCALL(kexec_load);
> >  COND_SYSCALL_COMPAT(kexec_load);
> >  COND_SYSCALL(init_module);
> 
> This is fine for the moment, but I wonder if we should start making
> futex mandatory at some point. Right now, sparc32 with CONFIG_SMP
> cannot support futex because of the lack of atomics in early
> sparc processors, but sparc32 glibc actually requires futexes
> and consequently only works on uniprocessor machines, on sparc64
> compat mode, or on Leon3 with out of tree patches.

PARISC is another 'fun' case. 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 05/14] futex: Add sys_futex_wake()
  2023-07-21 18:54     ` Peter Zijlstra
@ 2023-07-21 21:23       ` Arnd Bergmann
  0 siblings, 0 replies; 52+ messages in thread
From: Arnd Bergmann @ 2023-07-21 21:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Jens Axboe, linux-kernel, Ingo Molnar,
	Darren Hart, dave, andrealmeid, Andrew Morton, urezki,
	Christoph Hellwig, Lorenzo Stoakes, linux-api, linux-mm,
	Linux-Arch, malteskarupke

On Fri, Jul 21, 2023, at 20:54, Peter Zijlstra wrote:
> On Fri, Jul 21, 2023 at 05:41:20PM +0200, Arnd Bergmann wrote:
>> On Fri, Jul 21, 2023, at 12:22, Peter Zijlstra wrote:
>> > --- a/kernel/sys_ni.c
>> > +++ b/kernel/sys_ni.c
>> > @@ -87,6 +87,7 @@ COND_SYSCALL_COMPAT(set_robust_list);
>> >  COND_SYSCALL(get_robust_list);
>> >  COND_SYSCALL_COMPAT(get_robust_list);
>> >  COND_SYSCALL(futex_waitv);
>> > +COND_SYSCALL(futex_wake);
>> >  COND_SYSCALL(kexec_load);
>> >  COND_SYSCALL_COMPAT(kexec_load);
>> >  COND_SYSCALL(init_module);
>> 
>> This is fine for the moment, but I wonder if we should start making
>> futex mandatory at some point. Right now, sparc32 with CONFIG_SMP
>> cannot support futex because of the lack of atomics in early
>> sparc processors, but sparc32 glibc actually requires futexes
>> and consequently only works on uniprocessor machines, on sparc64
>> compat mode, or on Leon3 with out of tree patches.
>
> PARISC is another 'fun' case.

I had to look up how that works, but as far as I can tell, the
parisc code actually has a chance of working, as the userspace
atomics go through the light-weight syscall that shares a hashed
lock with the actual futex syscall. On sparc32 I think it's
worse because userspace assumes that atomic instructions are
supported while the kernel assumes they are not.

        Arnd

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 10/14] mm: Add vmalloc_huge_node()
  2023-07-21 10:22 ` [PATCH v1 10/14] mm: Add vmalloc_huge_node() Peter Zijlstra
@ 2023-07-24 13:46   ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2023-07-24 13:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 05/14] futex: Add sys_futex_wake()
  2023-07-21 10:22 ` [PATCH v1 05/14] futex: Add sys_futex_wake() Peter Zijlstra
  2023-07-21 15:41   ` Arnd Bergmann
@ 2023-07-25  7:22   ` Geert Uytterhoeven
  1 sibling, 0 replies; 52+ messages in thread
From: Geert Uytterhoeven @ 2023-07-25  7:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Fri, Jul 21, 2023 at 1:03 PM Peter Zijlstra <peterz@infradead.org> wrote:
> To complement sys_futex_waitv() add sys_futex_wake(). This syscall
> implements what was previously known as FUTEX_WAKE_BITSET except it
> uses 'unsigned long' for the bitmask and takes FUTEX2 flags.
>
> The 'unsigned long' allows FUTEX2_64 on 64bit platforms.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

>  arch/m68k/kernel/syscalls/syscall.tbl       |    1

For adding the syscall:
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

For the actual number, you have to take the Big Syscall Lock ;-)

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 06/14] futex: Add sys_futex_wait()
  2023-07-21 10:22 ` [PATCH v1 06/14] futex: Add sys_futex_wait() Peter Zijlstra
@ 2023-07-25  7:22   ` Geert Uytterhoeven
  2023-07-31 16:35   ` Thomas Gleixner
  1 sibling, 0 replies; 52+ messages in thread
From: Geert Uytterhoeven @ 2023-07-25  7:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Fri, Jul 21, 2023 at 1:11 PM Peter Zijlstra <peterz@infradead.org> wrote:
> To complement sys_futex_waitv()/wake(), add sys_futex_wait(). This
> syscall implements what was previously known as FUTEX_WAIT_BITSET
> except it uses 'unsigned long' for the value and bitmask arguments,
> takes timespec and clockid_t arguments for the absolute timeout and
> uses FUTEX2 flags.
>
> The 'unsigned long' allows FUTEX2_64 on 64bit platforms.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

>  arch/m68k/kernel/syscalls/syscall.tbl       |    1

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 09/14] futex: Add sys_futex_requeue()
  2023-07-21 10:22 ` [PATCH v1 09/14] futex: Add sys_futex_requeue() Peter Zijlstra
@ 2023-07-25  7:23   ` Geert Uytterhoeven
  2023-07-31 17:19   ` Thomas Gleixner
  1 sibling, 0 replies; 52+ messages in thread
From: Geert Uytterhoeven @ 2023-07-25  7:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Fri, Jul 21, 2023 at 1:11 PM Peter Zijlstra <peterz@infradead.org> wrote:
> Finish of the 'simple' futex2 syscall group by adding
> sys_futex_requeue(). Unlike sys_futex_{wait,wake}() it's arguments are
> too numerous to fit into a regular syscall. As such, use struct
> futex_waitv to pass the 'source' and 'destination' futexes to the
> syscall.
>
> This syscall implements what was previously known as FUTEX_CMP_REQUEUE
> and uses {val, uaddr, flags} for source and {uaddr, flags} for
> destination.
>
> This design explicitly allows requeueing between different types of
> futex by having a different flags word per uaddr.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

>  arch/m68k/kernel/syscalls/syscall.tbl       |    1

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 01/14] futex: Clarify FUTEX2 flags
  2023-07-21 10:22 ` [PATCH v1 01/14] futex: Clarify FUTEX2 flags Peter Zijlstra
@ 2023-07-31 16:08   ` Thomas Gleixner
  0 siblings, 0 replies; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 16:08 UTC (permalink / raw)
  To: Peter Zijlstra, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

ROn Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> sys_futex_waitv() is part of the futex2 series (the first and only so
> far) of syscalls and has a flags field per futex (as opposed to flags
> being encoded in the futex op).
>
> This new flags field has a new namespace, which unfortunately isn't
> super explicit. Notably it currently takes FUTEX_32 and
> FUTEX_PRIVATE_FLAG.
>
> Introduce the FUTEX2 namespace to clarify this
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-21 10:22 ` [PATCH v1 02/14] futex: Extend the " Peter Zijlstra
  2023-07-21 15:47   ` Arnd Bergmann
@ 2023-07-31 16:11   ` Thomas Gleixner
  2023-07-31 16:25     ` Peter Zijlstra
                       ` (2 more replies)
  1 sibling, 3 replies; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 16:11 UTC (permalink / raw)
  To: Peter Zijlstra, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> +#define FUTEX2_8		0x00
> +#define FUTEX2_16		0x01
>  #define FUTEX2_32		0x02
> -			/*	0x04 */
> +#define FUTEX2_64		0x03
> +#define FUTEX2_NUMA		0x04
>  			/*	0x08 */
>  			/*	0x10 */
>  			/*	0x20 */
> --- a/kernel/futex/syscalls.c
> +++ b/kernel/futex/syscalls.c
> @@ -183,7 +183,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
>  	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
>  }
>  
> -#define FUTEX2_MASK (FUTEX2_32 | FUTEX2_PRIVATE)
> +#define FUTEX2_MASK (FUTEX2_64 | FUTEX2_PRIVATE)
>  
>  /**
>   * futex_parse_waitv - Parse a waitv array from userspace
> @@ -207,7 +207,12 @@ static int futex_parse_waitv(struct fute
>  		if ((aux.flags & ~FUTEX2_MASK) || aux.__reserved)
>  			return -EINVAL;

With the above aux.flags with FUTEX2_32 set will result in -EINVAL. I
don't think that's intentional.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 03/14] futex: Flag conversion
  2023-07-21 10:22 ` [PATCH v1 03/14] futex: Flag conversion Peter Zijlstra
@ 2023-07-31 16:21   ` Thomas Gleixner
  2023-07-31 16:26     ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Peter Zijlstra, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> +
> +static inline bool futex_flags_valid(unsigned int flags)
> +{
> +	/* Only 64bit futexes for 64bit code */
> +	if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
> +		if ((flags & FLAGS_SIZE_MASK) == FLAGS_SIZE_64)
> +			return false;
> +	}
> +
> +	/* Only 32bit futexes are implemented -- for now */
> +	if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
> +		return false;
> +
> +	return true;
> +}
> +
> +static inline unsigned int futex_size(unsigned int flags)
> +{
> +	unsigned int size = flags & FLAGS_SIZE_MASK;
> +	return 1 << size; /* {0,1,2,3} -> {1,2,4,8} */

Lacks a new line and the comment is both misplaced and kinda obvious, no?

Other than that, this looks abour right.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-31 16:11   ` Thomas Gleixner
@ 2023-07-31 16:25     ` Peter Zijlstra
  2023-07-31 17:16     ` Thomas Gleixner
  2023-07-31 17:42     ` Thomas Gleixner
  2 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-31 16:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Mon, Jul 31, 2023 at 06:11:27PM +0200, Thomas Gleixner wrote:
> On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> > +#define FUTEX2_8		0x00
> > +#define FUTEX2_16		0x01
> >  #define FUTEX2_32		0x02
> > -			/*	0x04 */
> > +#define FUTEX2_64		0x03
> > +#define FUTEX2_NUMA		0x04
> >  			/*	0x08 */
> >  			/*	0x10 */
> >  			/*	0x20 */
> > --- a/kernel/futex/syscalls.c
> > +++ b/kernel/futex/syscalls.c
> > @@ -183,7 +183,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
> >  	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
> >  }
> >  
> > -#define FUTEX2_MASK (FUTEX2_32 | FUTEX2_PRIVATE)
> > +#define FUTEX2_MASK (FUTEX2_64 | FUTEX2_PRIVATE)
> >  
> >  /**
> >   * futex_parse_waitv - Parse a waitv array from userspace
> > @@ -207,7 +207,12 @@ static int futex_parse_waitv(struct fute
> >  		if ((aux.flags & ~FUTEX2_MASK) || aux.__reserved)
> >  			return -EINVAL;
> 
> With the above aux.flags with FUTEX2_32 set will result in -EINVAL. I
> don't think that's intentional.

FUTEX2_8     0
FUTEX2_16    1
FUTEX2_32    2
FUTEX2_64    3

Therefore FUTEX2_64, when used as a mask, includes then all.

Arnd suggested adding FUTEX2_SIZE_MASK or something. And I initially had
something like that, but pulled it for not wanting to pullute the uabi.

Can easily add back.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 03/14] futex: Flag conversion
  2023-07-31 16:21   ` Thomas Gleixner
@ 2023-07-31 16:26     ` Peter Zijlstra
  0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-31 16:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Mon, Jul 31, 2023 at 06:21:56PM +0200, Thomas Gleixner wrote:
> On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> > +
> > +static inline bool futex_flags_valid(unsigned int flags)
> > +{
> > +	/* Only 64bit futexes for 64bit code */
> > +	if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
> > +		if ((flags & FLAGS_SIZE_MASK) == FLAGS_SIZE_64)
> > +			return false;
> > +	}
> > +
> > +	/* Only 32bit futexes are implemented -- for now */
> > +	if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> > +static inline unsigned int futex_size(unsigned int flags)
> > +{
> > +	unsigned int size = flags & FLAGS_SIZE_MASK;
> > +	return 1 << size; /* {0,1,2,3} -> {1,2,4,8} */
> 
> Lacks a new line and the comment is both misplaced and kinda obvious, no?

Yeah, it was a more complicated transform at some point, I'll fix.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 06/14] futex: Add sys_futex_wait()
  2023-07-21 10:22 ` [PATCH v1 06/14] futex: Add sys_futex_wait() Peter Zijlstra
  2023-07-25  7:22   ` Geert Uytterhoeven
@ 2023-07-31 16:35   ` Thomas Gleixner
  1 sibling, 0 replies; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 16:35 UTC (permalink / raw)
  To: Peter Zijlstra, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> +int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, ktime_t *abs_time, u32 bitset)
> +{
> +	struct hrtimer_sleeper timeout, *to;
> +	struct restart_block *restart;
> +	int ret;
> +
> +	to = futex_setup_timer(abs_time, &timeout, flags,
> +			       current->timer_slack_ns);
> +
> +	ret = __futex_wait(uaddr, flags, val, to, bitset);
> +	if (!to)
> +		return ret;

Can you please put an empty new line and a comment between the __futex_wait()
and the if (!to) check? The original code was less obfuscated.

Other than that: Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 07/14] futex: Propagate flags into get_futex_key()
  2023-07-21 10:22 ` [PATCH v1 07/14] futex: Propagate flags into get_futex_key() Peter Zijlstra
@ 2023-07-31 16:36   ` Thomas Gleixner
  0 siblings, 0 replies; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 16:36 UTC (permalink / raw)
  To: Peter Zijlstra, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> Instead of only passing FLAGS_SHARED as a boolean, pass down flags as
> a whole.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 08/14] futex: Add flags2 argument to futex_requeue()
  2023-07-21 10:22 ` [PATCH v1 08/14] futex: Add flags2 argument to futex_requeue() Peter Zijlstra
@ 2023-07-31 16:43   ` Thomas Gleixner
  0 siblings, 0 replies; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 16:43 UTC (permalink / raw)
  To: Peter Zijlstra, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> In order to support mixed size requeue, add a second flags argument to
> the internal futex_requeue() function.

No user visible changes, right?

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 04/14] futex: Validate futex value against futex size
  2023-07-21 10:22 ` [PATCH v1 04/14] futex: Validate futex value against futex size Peter Zijlstra
@ 2023-07-31 17:12   ` Thomas Gleixner
  0 siblings, 0 replies; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 17:12 UTC (permalink / raw)
  To: Peter Zijlstra, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> +static inline bool futex_validate_input(unsigned int flags, u64 val)
> +{
> +	int bits = 8 * futex_size(flags);
> +	if (bits < 64 && (val >> bits))

New line between declaration and code. 

> +		return false;
> +	return true;
> +}

Other than that::

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-31 16:11   ` Thomas Gleixner
  2023-07-31 16:25     ` Peter Zijlstra
@ 2023-07-31 17:16     ` Thomas Gleixner
  2023-07-31 17:35       ` Peter Zijlstra
  2023-07-31 17:42     ` Thomas Gleixner
  2 siblings, 1 reply; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 17:16 UTC (permalink / raw)
  To: Peter Zijlstra, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Mon, Jul 31 2023 at 18:11, Thomas Gleixner wrote:
> On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
>> -#define FUTEX2_MASK (FUTEX2_32 | FUTEX2_PRIVATE)
>> +#define FUTEX2_MASK (FUTEX2_64 | FUTEX2_PRIVATE)
>>  
>>  /**
>>   * futex_parse_waitv - Parse a waitv array from userspace
>> @@ -207,7 +207,12 @@ static int futex_parse_waitv(struct fute
>>  		if ((aux.flags & ~FUTEX2_MASK) || aux.__reserved)
>>  			return -EINVAL;
>
> With the above aux.flags with FUTEX2_32 set will result in -EINVAL. I
> don't think that's intentional.

Also if you allow 64bit wide futexes, how is that supposed to work with
the existing code, which clearly expects a 32bit uval throughout the
place?

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 09/14] futex: Add sys_futex_requeue()
  2023-07-21 10:22 ` [PATCH v1 09/14] futex: Add sys_futex_requeue() Peter Zijlstra
  2023-07-25  7:23   ` Geert Uytterhoeven
@ 2023-07-31 17:19   ` Thomas Gleixner
  2023-07-31 17:38     ` Peter Zijlstra
  1 sibling, 1 reply; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 17:19 UTC (permalink / raw)
  To: Peter Zijlstra, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> +/*
> + * sys_futex_requeue - Requeue a waiter from one futex to another
> + * @waiters:	array describing the source and destination futex
> + * @flags:	unused
> + * @nr_wake:	number of futexes to wake
> + * @nr_requeue:	number of futexes to requeue
> + *
> + * Identical to the traditional FUTEX_CMP_REQUEUE op, except it is part of the
> + * futex2 family of calls.
> + */
> +
> +SYSCALL_DEFINE4(futex_requeue,
> +		struct futex_waitv __user *, waiters,
> +		unsigned int, flags,
> +		int, nr_wake,
> +		int, nr_requeue)
> +{
> +	struct futex_vector futexes[2];
> +	u32 cmpval;

So this is explictely u32. I'm completely confused vs. the 64 bit futex
size variant enablement earlier in the series by now.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-31 17:16     ` Thomas Gleixner
@ 2023-07-31 17:35       ` Peter Zijlstra
  2023-07-31 20:52         ` Thomas Gleixner
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-31 17:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Mon, Jul 31, 2023 at 07:16:29PM +0200, Thomas Gleixner wrote:
> On Mon, Jul 31 2023 at 18:11, Thomas Gleixner wrote:
> > On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> >> -#define FUTEX2_MASK (FUTEX2_32 | FUTEX2_PRIVATE)
> >> +#define FUTEX2_MASK (FUTEX2_64 | FUTEX2_PRIVATE)
> >>  
> >>  /**
> >>   * futex_parse_waitv - Parse a waitv array from userspace
> >> @@ -207,7 +207,12 @@ static int futex_parse_waitv(struct fute
> >>  		if ((aux.flags & ~FUTEX2_MASK) || aux.__reserved)
> >>  			return -EINVAL;
> >
> > With the above aux.flags with FUTEX2_32 set will result in -EINVAL. I
> > don't think that's intentional.
> 
> Also if you allow 64bit wide futexes, how is that supposed to work with
> the existing code, which clearly expects a 32bit uval throughout the
> place?

Not allowed yet, these patches only allow 8,16,32. I still need to audit
the whole futex core and do 'u32 -> unsigned long' (and everything else
that follows from that), and only when that's done can the futex2
syscalls allow FUTEX2_64 on 64bit archs.

So for now, these patches:

  - add the FUTEX2_64 flag,
  - add 'unsigned long' interface such that
    64bit can potentiall use it,
  - explicitly disallow having it set.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 11/14] futex: Implement FUTEX2_NUMA
  2023-07-21 10:22 ` [PATCH v1 11/14] futex: Implement FUTEX2_NUMA Peter Zijlstra
  2023-07-21 12:16   ` Peter Zijlstra
@ 2023-07-31 17:36   ` Thomas Gleixner
  2023-07-31 18:03     ` Peter Zijlstra
  1 sibling, 1 reply; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 17:36 UTC (permalink / raw)
  To: Peter Zijlstra, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
>  struct futex_hash_bucket *futex_hash(union futex_key *key)
>  {
> -	u32 hash = jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4,
> +	u32 hash = jhash2((u32 *)key,
> +			  offsetof(typeof(*key), both.offset) / sizeof(u32),
>  			  key->both.offset);
> +	int node = key->both.node;
>  
> -	return &futex_queues[hash & (futex_hashsize - 1)];
> +	if (node == -1) {
> +		/*
> +		 * In case of !FLAGS_NUMA, use some unused hash bits to pick a
> +		 * node -- this ensures regular futexes are interleaved across
> +		 * the nodes and avoids having to allocate multiple
> +		 * hash-tables.
> +		 *
> +		 * NOTE: this isn't perfectly uniform, but it is fast and
> +		 * handles sparse node masks.
> +		 */
> +		node = (hash >> futex_hashshift) % nr_node_ids;

Is nr_node_ids guaranteed to be stable after init? It's marked
__read_mostly, but not __ro_after_init.

> +		if (!node_possible(node)) {
> +			node = find_next_bit_wrap(node_possible_map.bits,
> +						  nr_node_ids, node);
> +		}
> +	}
> +
> +	return &futex_queues[node][hash & (futex_hashsize - 1)];
>  }
>  	fshared = flags & FLAGS_SHARED;
> +	size = futex_size(flags);
>  
>  	/*
>  	 * The futex address must be "naturally" aligned.
>  	 */
>  	key->both.offset = address % PAGE_SIZE;
> -	if (unlikely((address % sizeof(u32)) != 0))
> +	if (unlikely((address % size) != 0))
>  		return -EINVAL;

Hmm. Shouldn't that have changed with the allowance of the 1 and 2 byte
futexes?

>  	address -= key->both.offset;
>  
> -	if (unlikely(!access_ok(uaddr, sizeof(u32))))
> +	if (flags & FLAGS_NUMA)
> +		size *= 2;
> +
> +	if (unlikely(!access_ok(uaddr, size)))
>  		return -EFAULT;
>  
>  	if (unlikely(should_fail_futex(fshared)))
>  		return -EFAULT;
>  
> +	key->both.node = -1;

Please put this into an else path.

> +	if (flags & FLAGS_NUMA) {
> +		void __user *naddr = uaddr + size/2;

size / 2;

> +
> +		if (futex_get_value(&node, naddr, flags))
> +			return -EFAULT;
> +
> +		if (node == -1) {
> +			node = numa_node_id();
> +			if (futex_put_value(node, naddr, flags))
> +				return -EFAULT;
> +		}
> +
> +		if (node >= MAX_NUMNODES || !node_possible(node))
> +			return -EINVAL;

That's clearly an else path too. No point in checking whether
numa_node_id() is valid. 

> +		key->both.node = node;
> +	}
>  
> +static inline unsigned int futex_size(unsigned int flags)
> +{
> +	unsigned int size = flags & FLAGS_SIZE_MASK;
> +	return 1 << size; /* {0,1,2,3} -> {1,2,4,8} */
> +}
> +
>  static inline bool futex_flags_valid(unsigned int flags)
>  {
>  	/* Only 64bit futexes for 64bit code */
> @@ -77,13 +83,19 @@ static inline bool futex_flags_valid(uns
>  	if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
>  		return false;
>  
> -	return true;
> -}
> +	/*
> +	 * Must be able to represent both NUMA_NO_NODE and every valid nodeid
> +	 * in a futex word.
> +	 */
> +	if (flags & FLAGS_NUMA) {
> +		int bits = 8 * futex_size(flags);
> +		u64 max = ~0ULL;
> +		max >>= 64 - bits;
Your newline key is broken, right?
> +		if (nr_node_ids >= max)
> +			return false;
> +	}

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 09/14] futex: Add sys_futex_requeue()
  2023-07-31 17:19   ` Thomas Gleixner
@ 2023-07-31 17:38     ` Peter Zijlstra
  0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-31 17:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Mon, Jul 31, 2023 at 07:19:17PM +0200, Thomas Gleixner wrote:
> On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> > +/*
> > + * sys_futex_requeue - Requeue a waiter from one futex to another
> > + * @waiters:	array describing the source and destination futex
> > + * @flags:	unused
> > + * @nr_wake:	number of futexes to wake
> > + * @nr_requeue:	number of futexes to requeue
> > + *
> > + * Identical to the traditional FUTEX_CMP_REQUEUE op, except it is part of the
> > + * futex2 family of calls.
> > + */
> > +
> > +SYSCALL_DEFINE4(futex_requeue,
> > +		struct futex_waitv __user *, waiters,
> > +		unsigned int, flags,
> > +		int, nr_wake,
> > +		int, nr_requeue)
> > +{
> > +	struct futex_vector futexes[2];
> > +	u32 cmpval;
> 
> So this is explictely u32. I'm completely confused vs. the 64 bit futex
> size variant enablement earlier in the series by now.

As per the previous email; these patches only enable the syscall part of
64bit futexes, they do not convert the core, and per futex_flags_valid()
(patches 4 and 13), explicitly disallow having FUTEX2_64 set.


-       /* Only 32bit futexes are implemented -- for now */
-       if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
+       /* 64bit futexes aren't implemented -- yet */
+       if ((flags & FLAGS_SIZE_MASK) == FLAGS_SIZE_64)
		return false;



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-31 16:11   ` Thomas Gleixner
  2023-07-31 16:25     ` Peter Zijlstra
  2023-07-31 17:16     ` Thomas Gleixner
@ 2023-07-31 17:42     ` Thomas Gleixner
  2023-07-31 19:20       ` Peter Zijlstra
  2 siblings, 1 reply; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 17:42 UTC (permalink / raw)
  To: Peter Zijlstra, axboe
  Cc: linux-kernel, peterz, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Mon, Jul 31 2023 at 18:11, Thomas Gleixner wrote:

> On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
>> +#define FUTEX2_8		0x00
>> +#define FUTEX2_16		0x01
>>  #define FUTEX2_32		0x02
>> -			/*	0x04 */
>> +#define FUTEX2_64		0x03
>> +#define FUTEX2_NUMA		0x04
>>  			/*	0x08 */
>>  			/*	0x10 */
>>  			/*	0x20 */
>> --- a/kernel/futex/syscalls.c
>> +++ b/kernel/futex/syscalls.c
>> @@ -183,7 +183,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
>>  	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
>>  }
>>  
>> -#define FUTEX2_MASK (FUTEX2_32 | FUTEX2_PRIVATE)
>> +#define FUTEX2_MASK (FUTEX2_64 | FUTEX2_PRIVATE)
>>  
>>  /**
>>   * futex_parse_waitv - Parse a waitv array from userspace
>> @@ -207,7 +207,12 @@ static int futex_parse_waitv(struct fute
>>  		if ((aux.flags & ~FUTEX2_MASK) || aux.__reserved)
>>  			return -EINVAL;
>
> With the above aux.flags with FUTEX2_32 set will result in -EINVAL. I
> don't think that's intentional.

Aargh. This is really nasty to make FUTEX2_64 0x3 and abuse it to test
the flags for validity. Intuitive and obvious is something else.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 11/14] futex: Implement FUTEX2_NUMA
  2023-07-31 17:36   ` Thomas Gleixner
@ 2023-07-31 18:03     ` Peter Zijlstra
  2023-07-31 21:26       ` Thomas Gleixner
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-31 18:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Mon, Jul 31, 2023 at 07:36:21PM +0200, Thomas Gleixner wrote:
> On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> >  struct futex_hash_bucket *futex_hash(union futex_key *key)
> >  {
> > -	u32 hash = jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4,
> > +	u32 hash = jhash2((u32 *)key,
> > +			  offsetof(typeof(*key), both.offset) / sizeof(u32),
> >  			  key->both.offset);
> > +	int node = key->both.node;
> >  
> > -	return &futex_queues[hash & (futex_hashsize - 1)];
> > +	if (node == -1) {
> > +		/*
> > +		 * In case of !FLAGS_NUMA, use some unused hash bits to pick a
> > +		 * node -- this ensures regular futexes are interleaved across
> > +		 * the nodes and avoids having to allocate multiple
> > +		 * hash-tables.
> > +		 *
> > +		 * NOTE: this isn't perfectly uniform, but it is fast and
> > +		 * handles sparse node masks.
> > +		 */
> > +		node = (hash >> futex_hashshift) % nr_node_ids;
> 
> Is nr_node_ids guaranteed to be stable after init? It's marked
> __read_mostly, but not __ro_after_init.

AFAICT it is only ever written to in setup_nr_node_ids() and that is all
__init code. So I'm thinking this could/should indeed be
__ro_after_init. Esp. so since it is an exported variable.

Mike?

> > +		if (!node_possible(node)) {
> > +			node = find_next_bit_wrap(node_possible_map.bits,
> > +						  nr_node_ids, node);
> > +		}
> > +	}
> > +
> > +	return &futex_queues[node][hash & (futex_hashsize - 1)];
> >  }
> >  	fshared = flags & FLAGS_SHARED;
> > +	size = futex_size(flags);
> >  
> >  	/*
> >  	 * The futex address must be "naturally" aligned.
> >  	 */
> >  	key->both.offset = address % PAGE_SIZE;
> > -	if (unlikely((address % sizeof(u32)) != 0))
> > +	if (unlikely((address % size) != 0))
> >  		return -EINVAL;
> 
> Hmm. Shouldn't that have changed with the allowance of the 1 and 2 byte
> futexes?

That patches comes after this.. :-)

But I do have an open question here; do we want FUTEX2_NUMA futexes
aligned at futex_size or double that? That is, what do we want the
alignment of:

struct futex_numa_32 {
	u32 val;
	u32 node;
};

to be? Having that u64 aligned will guarantee these two values end up in
the same page, having them u32 aligned (as per this patch) allows for
them to be split.

The current paths don't care, we don't hold locks, but perhaps it makes
sense to be conservative.

> >  	address -= key->both.offset;
> >  
> > -	if (unlikely(!access_ok(uaddr, sizeof(u32))))
> > +	if (flags & FLAGS_NUMA)
> > +		size *= 2;
> > +
> > +	if (unlikely(!access_ok(uaddr, size)))
> >  		return -EFAULT;
> >  
> >  	if (unlikely(should_fail_futex(fshared)))
> >  		return -EFAULT;
> >  
> > +	key->both.node = -1;
> 
> Please put this into an else path.

Can do, but I figured the compiler could figure it out through dead
store elimitation or somesuch pass.

> > +	if (flags & FLAGS_NUMA) {
> > +		void __user *naddr = uaddr + size/2;
> 
> size / 2;
> 
> > +
> > +		if (futex_get_value(&node, naddr, flags))
> > +			return -EFAULT;
> > +
> > +		if (node == -1) {
> > +			node = numa_node_id();
> > +			if (futex_put_value(node, naddr, flags))
> > +				return -EFAULT;
> > +		}
> > +
> > +		if (node >= MAX_NUMNODES || !node_possible(node))
> > +			return -EINVAL;
> 
> That's clearly an else path too. No point in checking whether
> numa_node_id() is valid.

No, this also checks if the value we read from userspace is valid.

Only when the value we read from userspace is -1 do we set
numa_node_id(), otherwise we take the value as read, which then must be
a valid value.

> > +		key->both.node = node;
> > +	}
> >  
> > +static inline unsigned int futex_size(unsigned int flags)
> > +{
> > +	unsigned int size = flags & FLAGS_SIZE_MASK;
> > +	return 1 << size; /* {0,1,2,3} -> {1,2,4,8} */
> > +}
> > +
> >  static inline bool futex_flags_valid(unsigned int flags)
> >  {
> >  	/* Only 64bit futexes for 64bit code */
> > @@ -77,13 +83,19 @@ static inline bool futex_flags_valid(uns
> >  	if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
> >  		return false;
> >  
> > -	return true;
> > -}
> > +	/*
> > +	 * Must be able to represent both NUMA_NO_NODE and every valid nodeid
> > +	 * in a futex word.
> > +	 */
> > +	if (flags & FLAGS_NUMA) {
> > +		int bits = 8 * futex_size(flags);
> > +		u64 max = ~0ULL;
> > +		max >>= 64 - bits;
> Your newline key is broken, right?





Yes :-)

> > +		if (nr_node_ids >= max)
> > +			return false;
> > +	}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-31 17:42     ` Thomas Gleixner
@ 2023-07-31 19:20       ` Peter Zijlstra
  2023-07-31 21:14         ` Thomas Gleixner
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-31 19:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Mon, Jul 31, 2023 at 07:42:11PM +0200, Thomas Gleixner wrote:

> Aargh. This is really nasty to make FUTEX2_64 0x3 and abuse it to test
> the flags for validity. Intuitive and obvious is something else.

Like so then?

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -57,6 +57,8 @@
 			/*	0x40 */
 #define FUTEX2_PRIVATE		FUTEX_PRIVATE_FLAG
 
+#define FUTEX2_SIZE_MASK	0x03
+
 /* do not use */
 #define FUTEX_32		FUTEX2_32 /* historical accident :-( */
 
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -183,7 +183,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 
-#define FUTEX2_MASK (FUTEX2_64 | FUTEX2_PRIVATE)
+#define FUTEX2_MASK (FUTEX2_SIZE_MASK | FUTEX2_PRIVATE)
 
 /**
  * futex_parse_waitv - Parse a waitv array from userspace
@@ -208,11 +208,11 @@ static int futex_parse_waitv(struct fute
 			return -EINVAL;
 
 		if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
-			if ((aux.flags & FUTEX2_64) == FUTEX2_64)
+			if ((aux.flags & FUTEX2_SIZE_MASK) == FUTEX2_64)
 				return -EINVAL;
 		}
 
-		if ((aux.flags & FUTEX2_64) != FUTEX2_32)
+		if ((aux.flags & FUTEX2_SIZE_MASK) != FUTEX2_32)
 			return -EINVAL;
 
 		futexv[i].w.flags = aux.flags;

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-31 17:35       ` Peter Zijlstra
@ 2023-07-31 20:52         ` Thomas Gleixner
  0 siblings, 0 replies; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 20:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Mon, Jul 31 2023 at 19:35, Peter Zijlstra wrote:
> On Mon, Jul 31, 2023 at 07:16:29PM +0200, Thomas Gleixner wrote:
>> On Mon, Jul 31 2023 at 18:11, Thomas Gleixner wrote:
>> > On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
>> >> -#define FUTEX2_MASK (FUTEX2_32 | FUTEX2_PRIVATE)
>> >> +#define FUTEX2_MASK (FUTEX2_64 | FUTEX2_PRIVATE)
>> >>  
>> >>  /**
>> >>   * futex_parse_waitv - Parse a waitv array from userspace
>> >> @@ -207,7 +207,12 @@ static int futex_parse_waitv(struct fute
>> >>  		if ((aux.flags & ~FUTEX2_MASK) || aux.__reserved)
>> >>  			return -EINVAL;
>> >
>> > With the above aux.flags with FUTEX2_32 set will result in -EINVAL. I
>> > don't think that's intentional.
>> 
>> Also if you allow 64bit wide futexes, how is that supposed to work with
>> the existing code, which clearly expects a 32bit uval throughout the
>> place?
>
> Not allowed yet, these patches only allow 8,16,32. I still need to audit
> the whole futex core and do 'u32 -> unsigned long' (and everything else
> that follows from that), and only when that's done can the futex2
> syscalls allow FUTEX2_64 on 64bit archs.
>
> So for now, these patches:
>
>   - add the FUTEX2_64 flag,
>   - add 'unsigned long' interface such that
>     64bit can potentiall use it,
>   - explicitly disallow having it set.

I figured that out very late. This flags having a size fields which
claims to be flags had confused the hell out of me.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-31 19:20       ` Peter Zijlstra
@ 2023-07-31 21:14         ` Thomas Gleixner
  2023-07-31 21:33           ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 21:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Mon, Jul 31 2023 at 21:20, Peter Zijlstra wrote:
> -#define FUTEX2_MASK (FUTEX2_64 | FUTEX2_PRIVATE)
> +#define FUTEX2_MASK (FUTEX2_SIZE_MASK | FUTEX2_PRIVATE)

Along with some comment which documents that the size "flags" constitute
a number field and not flags in the sense of binary flags.

And please name these size constants so it really becomes obvious:

#define FUTEX2_SIZE_U32		2

>  /**
>   * futex_parse_waitv - Parse a waitv array from userspace
> @@ -208,11 +208,11 @@ static int futex_parse_waitv(struct fute
>  			return -EINVAL;
>  
>  		if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
> -			if ((aux.flags & FUTEX2_64) == FUTEX2_64)
> +			if ((aux.flags & FUTEX2_SIZE_MASK) == FUTEX2_64)
>  				return -EINVAL;
>  		}

That should be part of the actual 64bit futex enablement, no?
  
> -		if ((aux.flags & FUTEX2_64) != FUTEX2_32)
> +		if ((aux.flags & FUTEX2_SIZE_MASK) != FUTEX2_32)
>  			return -EINVAL;

In hindsight I think it was as mistake just to have this __u32 flags
field in the new interface. Soemthing like the incomplete below might be
retrofittable, no?

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -74,7 +74,12 @@
 struct futex_waitv {
 	__u64 val;
 	__u64 uaddr;
-	__u32 flags;
+	union {
+		__u32	flags;
+		__u32	size	: 2,
+				: 5,
+			private	: 1;
+	};
 	__u32 __reserved;
 };
 
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -204,10 +204,10 @@ static int futex_parse_waitv(struct fute
 		if (copy_from_user(&aux, &uwaitv[i], sizeof(aux)))
 			return -EFAULT;
 
-		if ((aux.flags & ~FUTEX2_MASK) || aux.__reserved)
+		if ((aux.flags & ~FUTEX2_VALID_FLAGBITS) || aux.__reserved)
 			return -EINVAL;
 
-		if (!(aux.flags & FUTEX2_32))
+		if (aux.size != FUTEX2_SIZE_U32)
 			return -EINVAL;
 
 		futexv[i].w.flags = aux.flags;


If this muck already confused me right now, then I don't want to
experience the confusion factor 6 month down the road :)

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 11/14] futex: Implement FUTEX2_NUMA
  2023-07-31 18:03     ` Peter Zijlstra
@ 2023-07-31 21:26       ` Thomas Gleixner
  0 siblings, 0 replies; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 21:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Mon, Jul 31 2023 at 20:03, Peter Zijlstra wrote:
> On Mon, Jul 31, 2023 at 07:36:21PM +0200, Thomas Gleixner wrote:
>> Hmm. Shouldn't that have changed with the allowance of the 1 and 2 byte
>> futexes?
>
> That patches comes after this.. :-)

Futexes are really cursed :)

> But I do have an open question here; do we want FUTEX2_NUMA futexes
> aligned at futex_size or double that? That is, what do we want the
> alignment of:
>
> struct futex_numa_32 {
> 	u32 val;
> 	u32 node;
> };
>
> to be? Having that u64 aligned will guarantee these two values end up in
> the same page, having them u32 aligned (as per this patch) allows for
> them to be split.

Same page and same cacheline.

> The current paths don't care, we don't hold locks, but perhaps it makes
> sense to be conservative.

I think it makes sense.

>> >  	address -= key->both.offset;
>> >  
>> > -	if (unlikely(!access_ok(uaddr, sizeof(u32))))
>> > +	if (flags & FLAGS_NUMA)
>> > +		size *= 2;
>> > +
>> > +	if (unlikely(!access_ok(uaddr, size)))
>> >  		return -EFAULT;
>> >  
>> >  	if (unlikely(should_fail_futex(fshared)))
>> >  		return -EFAULT;
>> >  
>> > +	key->both.node = -1;
>> 
>> Please put this into an else path.
>
> Can do, but I figured the compiler could figure it out through dead
> store elimitation or somesuch pass.

Sure, but taste disagrees and it simply makes the code more obvious.

>> > +	if (flags & FLAGS_NUMA) {
>> > +		void __user *naddr = uaddr + size/2;
>> 
>> size / 2;
>> 
>> > +
>> > +		if (futex_get_value(&node, naddr, flags))
>> > +			return -EFAULT;
>> > +
>> > +		if (node == -1) {
>> > +			node = numa_node_id();
>> > +			if (futex_put_value(node, naddr, flags))
>> > +				return -EFAULT;
>> > +		}
>> > +
>> > +		if (node >= MAX_NUMNODES || !node_possible(node))
>> > +			return -EINVAL;
>> 
>> That's clearly an else path too. No point in checking whether
>> numa_node_id() is valid.
>
> No, this also checks if the value we read from userspace is valid.
>
> Only when the value we read from userspace is -1 do we set
> numa_node_id(), otherwise we take the value as read, which then must be
> a valid value.

Right, but:

	if (node == -1) {
		node = numa_node_id();
		if (futex_put_value(node, naddr, flags))
			return -EFAULT;
	} else if (node >= MAX_NUMNODES || !node_possible(node)) {
		return -EINVAL;
        }

makes it clear that the path where @node read from user space is != -1
needs to be validated, while your version checks the result of

      node = numa_node_id();

too, which does not make sense to me. Yes, it works, but ...

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-31 21:14         ` Thomas Gleixner
@ 2023-07-31 21:33           ` Peter Zijlstra
  2023-07-31 22:43             ` Thomas Gleixner
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-31 21:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Mon, Jul 31, 2023 at 11:14:11PM +0200, Thomas Gleixner wrote:
> On Mon, Jul 31 2023 at 21:20, Peter Zijlstra wrote:
> > -#define FUTEX2_MASK (FUTEX2_64 | FUTEX2_PRIVATE)
> > +#define FUTEX2_MASK (FUTEX2_SIZE_MASK | FUTEX2_PRIVATE)
> 
> Along with some comment which documents that the size "flags" constitute
> a number field and not flags in the sense of binary flags.
> 
> And please name these size constants so it really becomes obvious:
> 
> #define FUTEX2_SIZE_U32		2

So you want them named:

#define FUTEX2_SIZE_U8		0x00
#define FUTEX2_SIZE_U16		0x01
#define FUTEX2_SIZE_U32		0x02
#define FUTEX2_SIZE_U64		0x03

#define FUTEX2_SIZE_MASK	0x03

Sure, can do.

> >  /**
> >   * futex_parse_waitv - Parse a waitv array from userspace
> > @@ -208,11 +208,11 @@ static int futex_parse_waitv(struct fute
> >  			return -EINVAL;
> >  
> >  		if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
> > -			if ((aux.flags & FUTEX2_64) == FUTEX2_64)
> > +			if ((aux.flags & FUTEX2_SIZE_MASK) == FUTEX2_64)
> >  				return -EINVAL;
> >  		}
> 
> That should be part of the actual 64bit futex enablement, no?

The 'unsigned long' thing is part of the syscalls, which is why I had it
now.

>   
> > -		if ((aux.flags & FUTEX2_64) != FUTEX2_32)
> > +		if ((aux.flags & FUTEX2_SIZE_MASK) != FUTEX2_32)
> >  			return -EINVAL;
> 
> In hindsight I think it was as mistake just to have this __u32 flags
> field in the new interface. Soemthing like the incomplete below might be
> retrofittable, no?
> 
> --- a/include/uapi/linux/futex.h
> +++ b/include/uapi/linux/futex.h
> @@ -74,7 +74,12 @@
>  struct futex_waitv {
>  	__u64 val;
>  	__u64 uaddr;
> -	__u32 flags;
> +	union {
> +		__u32	flags;
> +		__u32	size	: 2,
> +				: 5,
> +			private	: 1;
> +	};
>  	__u32 __reserved;
>  };

Durr, I'm not sure I remember if that does the right thing across
architectures -- might just work. But I'm fairly sure this isn't the
only case of a field in a flags thing in our APIs. Although obviously I
can't find another case in a hurry :/

Also, sys_futex_{wake,wait}() have this thing as a syscall argument,
surely you don't want to put this union there as well?

I'd much prefer to just keep the 'unsigned int flags' thing and perhaps
put a comment on-top of the '#define FUTEX2_*' thingies. Note that
having it a field instead of a bunch of flags makes sense, since you can
only have a single size, not a combination of sizes.



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-31 21:33           ` Peter Zijlstra
@ 2023-07-31 22:43             ` Thomas Gleixner
  2023-07-31 22:59               ` Peter Zijlstra
  2023-08-01  6:02               ` Arnd Bergmann
  0 siblings, 2 replies; 52+ messages in thread
From: Thomas Gleixner @ 2023-07-31 22:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Mon, Jul 31 2023 at 23:33, Peter Zijlstra wrote:
> On Mon, Jul 31, 2023 at 11:14:11PM +0200, Thomas Gleixner wrote:
>> --- a/include/uapi/linux/futex.h
>> +++ b/include/uapi/linux/futex.h
>> @@ -74,7 +74,12 @@
>>  struct futex_waitv {
>>  	__u64 val;
>>  	__u64 uaddr;
>> -	__u32 flags;
>> +	union {
>> +		__u32	flags;
>> +		__u32	size	: 2,
>> +				: 5,
>> +			private	: 1;
>> +	};
>>  	__u32 __reserved;
>>  };
>
> Durr, I'm not sure I remember if that does the right thing across
> architectures -- might just work. But I'm fairly sure this isn't the
> only case of a field in a flags thing in our APIs. Although obviously
> I can't find another case in a hurry :/

I know, but that doesn't make these things more readable and neither an
argument against doing it for futex2 :)

> Also, sys_futex_{wake,wait}() have this thing as a syscall argument,
> surely you don't want to put this union there as well?

Why not? The anon union does not break the ABI unless I'm missing
something. Existing user space can still use 'flags' and people who care
about readability can use the bitfield, no?

Its inside struct futex_waitv and not an explicit syscall argument, right?

> I'd much prefer to just keep the 'unsigned int flags' thing and perhaps
> put a comment on-top of the '#define FUTEX2_*' thingies. Note that
> having it a field instead of a bunch of flags makes sense, since you can
> only have a single size, not a combination of sizes.

I'm aware of that by now :)

Still that explicit bitfield does neither need comments nor does it
leave room for interpretation.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-31 22:43             ` Thomas Gleixner
@ 2023-07-31 22:59               ` Peter Zijlstra
  2023-08-01  8:49                 ` Thomas Gleixner
  2023-08-01  6:02               ` Arnd Bergmann
  1 sibling, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2023-07-31 22:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Tue, Aug 01, 2023 at 12:43:24AM +0200, Thomas Gleixner wrote:
> > Also, sys_futex_{wake,wait}() have this thing as a syscall argument,
> > surely you don't want to put this union there as well?
> 
> Why not? The anon union does not break the ABI unless I'm missing
> something. Existing user space can still use 'flags' and people who care
> about readability can use the bitfield, no?
> 
> Its inside struct futex_waitv and not an explicit syscall argument, right?

Nope, see patches 5 and 6, they introduce:

sys_futex_wake(void __user *uaddr, unsigned long mask, int nr, unsigned int flags);
sys_futex_wait(void __user *uaddr, unsigned long val,
               unsigned long mask, unsigned int flags,
	       struct __kernel_timespec __user *timeout, clockid_t clockid);

Using a union, would turn that into:

sys_futex_wake(void __user *uaddr, unsigned long mask, int nr, union futex_flags flags);
sys_futex_wait(void __user *uaddr, unsigned long val,
               unsigned long mask, union futex_flags flags,
	       struct __kernel_timespec __user *timeout, clockid_t clockid);

Which then gets people to write garbage like:

	futex_wake(add, 0xFFFF, 1, (union futex_flags){ .flags = FUTEX2_SIZE_U16 | FUTEX2_PRIVATE));
or
	futex_wake(add, 0xFFFF, 1, (union futex_flags){ .size = FUTEX2_SIZE_U16, private = true, ));

You really want that ?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-31 22:43             ` Thomas Gleixner
  2023-07-31 22:59               ` Peter Zijlstra
@ 2023-08-01  6:02               ` Arnd Bergmann
  1 sibling, 0 replies; 52+ messages in thread
From: Arnd Bergmann @ 2023-08-01  6:02 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: Jens Axboe, linux-kernel, Ingo Molnar, Darren Hart, dave,
	andrealmeid, Andrew Morton, urezki, Christoph Hellwig,
	Lorenzo Stoakes, linux-api, linux-mm, Linux-Arch, malteskarupke

On Tue, Aug 1, 2023, at 00:43, Thomas Gleixner wrote:
> On Mon, Jul 31 2023 at 23:33, Peter Zijlstra wrote:
>> On Mon, Jul 31, 2023 at 11:14:11PM +0200, Thomas Gleixner wrote:
>>> --- a/include/uapi/linux/futex.h
>>> +++ b/include/uapi/linux/futex.h
>>> @@ -74,7 +74,12 @@
>>>  struct futex_waitv {
>>>  	__u64 val;
>>>  	__u64 uaddr;
>>> -	__u32 flags;
>>> +	union {
>>> +		__u32	flags;
>>> +		__u32	size	: 2,
>>> +				: 5,
>>> +			private	: 1;
>>> +	};
>>>  	__u32 __reserved;
>>>  };
>>
>> Durr, I'm not sure I remember if that does the right thing across
>> architectures -- might just work. But I'm fairly sure this isn't the
>> only case of a field in a flags thing in our APIs. Although obviously
>> I can't find another case in a hurry :/
>
> I know, but that doesn't make these things more readable and neither an
> argument against doing it for futex2 :)
...
>
> Still that explicit bitfield does neither need comments nor does it
> leave room for interpretation.

It may be clear to the compiler, but without comments or
looking up psABI documentation I certainly wouldn't know
immediately which bits of the flags word overlay the bitfields
for a given combination of __BIG_ENDIAN/__LITTLE_ENDIAN
and __BIG_ENDIAN_BITFIELD/__LITTLE_ENDIAN_BITFIELD or
architectures with unusual struct alignment requirements
(m68k or arm-oabi).

I'd prefer to completely avoid the bitfield here. Maybe having
exclusive flags for each width would be less confusing, at the
cost of needing two more flag bits and a slightly more complicated
sanity check, or we could take an extra byte out of the __reserved
field to store the length.

       Arnd

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v1 02/14] futex: Extend the FUTEX2 flags
  2023-07-31 22:59               ` Peter Zijlstra
@ 2023-08-01  8:49                 ` Thomas Gleixner
  0 siblings, 0 replies; 52+ messages in thread
From: Thomas Gleixner @ 2023-08-01  8:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: axboe, linux-kernel, mingo, dvhart, dave, andrealmeid,
	Andrew Morton, urezki, hch, lstoakes, Arnd Bergmann, linux-api,
	linux-mm, linux-arch, malteskarupke

On Tue, Aug 01 2023 at 00:59, Peter Zijlstra wrote:
> On Tue, Aug 01, 2023 at 12:43:24AM +0200, Thomas Gleixner wrote:
> Which then gets people to write garbage like:
>
> 	futex_wake(add, 0xFFFF, 1, (union futex_flags){ .flags = FUTEX2_SIZE_U16 | FUTEX2_PRIVATE));
> or
> 	futex_wake(add, 0xFFFF, 1, (union futex_flags){ .size = FUTEX2_SIZE_U16, private = true, ));
>
> You really want that ?

Well, people write garbage no matter what. So just keep the flags and
make the names explicit.

Note to myself: /me shouldn't look at futex patches when tired

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2023-08-01  8:49 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-21 10:22 [PATCH v1 00/14] futex: More futex2 bits Peter Zijlstra
2023-07-21 10:22 ` [PATCH v1 01/14] futex: Clarify FUTEX2 flags Peter Zijlstra
2023-07-31 16:08   ` Thomas Gleixner
2023-07-21 10:22 ` [PATCH v1 02/14] futex: Extend the " Peter Zijlstra
2023-07-21 15:47   ` Arnd Bergmann
2023-07-21 18:52     ` Peter Zijlstra
2023-07-31 16:11   ` Thomas Gleixner
2023-07-31 16:25     ` Peter Zijlstra
2023-07-31 17:16     ` Thomas Gleixner
2023-07-31 17:35       ` Peter Zijlstra
2023-07-31 20:52         ` Thomas Gleixner
2023-07-31 17:42     ` Thomas Gleixner
2023-07-31 19:20       ` Peter Zijlstra
2023-07-31 21:14         ` Thomas Gleixner
2023-07-31 21:33           ` Peter Zijlstra
2023-07-31 22:43             ` Thomas Gleixner
2023-07-31 22:59               ` Peter Zijlstra
2023-08-01  8:49                 ` Thomas Gleixner
2023-08-01  6:02               ` Arnd Bergmann
2023-07-21 10:22 ` [PATCH v1 03/14] futex: Flag conversion Peter Zijlstra
2023-07-31 16:21   ` Thomas Gleixner
2023-07-31 16:26     ` Peter Zijlstra
2023-07-21 10:22 ` [PATCH v1 04/14] futex: Validate futex value against futex size Peter Zijlstra
2023-07-31 17:12   ` Thomas Gleixner
2023-07-21 10:22 ` [PATCH v1 05/14] futex: Add sys_futex_wake() Peter Zijlstra
2023-07-21 15:41   ` Arnd Bergmann
2023-07-21 18:54     ` Peter Zijlstra
2023-07-21 21:23       ` Arnd Bergmann
2023-07-25  7:22   ` Geert Uytterhoeven
2023-07-21 10:22 ` [PATCH v1 06/14] futex: Add sys_futex_wait() Peter Zijlstra
2023-07-25  7:22   ` Geert Uytterhoeven
2023-07-31 16:35   ` Thomas Gleixner
2023-07-21 10:22 ` [PATCH v1 07/14] futex: Propagate flags into get_futex_key() Peter Zijlstra
2023-07-31 16:36   ` Thomas Gleixner
2023-07-21 10:22 ` [PATCH v1 08/14] futex: Add flags2 argument to futex_requeue() Peter Zijlstra
2023-07-31 16:43   ` Thomas Gleixner
2023-07-21 10:22 ` [PATCH v1 09/14] futex: Add sys_futex_requeue() Peter Zijlstra
2023-07-25  7:23   ` Geert Uytterhoeven
2023-07-31 17:19   ` Thomas Gleixner
2023-07-31 17:38     ` Peter Zijlstra
2023-07-21 10:22 ` [PATCH v1 10/14] mm: Add vmalloc_huge_node() Peter Zijlstra
2023-07-24 13:46   ` Christoph Hellwig
2023-07-21 10:22 ` [PATCH v1 11/14] futex: Implement FUTEX2_NUMA Peter Zijlstra
2023-07-21 12:16   ` Peter Zijlstra
2023-07-31 17:36   ` Thomas Gleixner
2023-07-31 18:03     ` Peter Zijlstra
2023-07-31 21:26       ` Thomas Gleixner
2023-07-21 10:22 ` [PATCH v1 12/14] futex: Propagate flags into futex_get_value_locked() Peter Zijlstra
2023-07-21 10:22 ` [PATCH v1 13/14] futex: Enable FUTEX2_{8,16} Peter Zijlstra
2023-07-21 10:22 ` [PATCH v1 14/14] futex,selftests: Extend the futex selftests Peter Zijlstra
2023-07-21 14:42 ` [PATCH v1 00/14] futex: More futex2 bits Jens Axboe
2023-07-21 15:49 ` Arnd Bergmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.