linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/9] NT synchronization primitive driver
@ 2024-01-24  0:40 Elizabeth Figura
  2024-01-24  0:40 ` [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device Elizabeth Figura
                   ` (9 more replies)
  0 siblings, 10 replies; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24  0:40 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra, Elizabeth Figura

This patch series introduces a new char misc driver, /dev/ntsync, which is used
to implement Windows NT synchronization primitives.

== Background ==

The Wine project emulates the Windows API in user space. One particular part of
that API, namely the NT synchronization primitives, have historically been
implemented via RPC to a dedicated "kernel" process. However, more recent
applications use these APIs more strenuously, and the overhead of RPC has become
a bottleneck.

The NT synchronization APIs are too complex to implement on top of existing
primitives without sacrificing correctness. Certain operations, such as
NtPulseEvent() or the "wait-for-all" mode of NtWaitForMultipleObjects(), require
direct control over the underlying wait queue, and implementing a wait queue
sufficiently robust for Wine in user space is not possible. This proposed
driver, therefore, implements the problematic interfaces directly in the Linux
kernel.

This driver was presented at Linux Plumbers Conference 2023. For those further
interested in the history of synchronization in Wine and past attempts to solve
this problem in user space, a recording of the presentation can be viewed here:

    https://www.youtube.com/watch?v=NjU4nyWyhU8


== Performance ==

The gain in performance varies wildly depending on the application in question
and the user's hardware. For some games NT synchronization is not a bottleneck
and no change can be observed, but for others frame rate improvements of 50 to
150 percent are not atypical. The following table lists frame rate measurements
from a variety of games on a variety of hardware, taken by users Dmitry
Skvortsov, FuzzyQuills, OnMars, and myself:

Game				Upstream	ntsync		improvement
===========================================================================
Anger Foot			 69		 99		 43%
Call of Juarez			 99.8		224.1		125%
Dirt 3				110.6		860.7		678%
Forza Horizon 5			108		160		 48%
Lara Croft: Temple of Osiris	141		326		131%
Metro 2033			164.4		199.2		 21%
Resident Evil 2			 26		 77		196%
The Crew			 26		 51		 96%
Tiny Tina's Wonderlands		130		360		177%
Total War Saga: Troy		109		146		 34%
===========================================================================


== Patches ==

This is the first part of a 32-patch series. The series comprises 17 patches
which contain the actual implementation, 13 which provide self-tests, 1 to
update the MAINTAINERS file, and 1 to add API documentation.

The intended semantics of the patches are broadly intended to match those of the
corresponding Windows functions. Since I do not expect familiarity with Windows
syscalls, however, and especially not with some of the more subtle or
unspecified behaviour that they provide, the documentation patch included in the
series also describes the intended behaviour in detail, and can be used as a
specification for the rest of the series.

The entire series can be retrieved or browsed here:

    https://repo.or.cz/linux/zf.git/shortlog/refs/heads/ntsync4

The patches making use of this driver in Wine can be retrieved or browsed here:

    https://repo.or.cz/wine/zf.git/shortlog/refs/heads/ntsync4


== Implementation ==

Some aspects of the implementation may deserve particular comment:

* In the interest of performance, each object is governed only by a single
  spinlock. However, NTSYNC_IOC_WAIT_ALL requires that the state of multiple
  objects be changed as a single atomic operation. In order to achieve this, we
  first take a device-wide lock ("wait_all_lock") any time we are going to lock
  more than one object at a time.

  The maximum number of objects that can be used in a vectored wait, and
  therefore the maximum that can be locked simultaneously, is 64. This number is
  NT's own limit.

  The acquisition of multiple spinlocks will degrade performance. This is a
  conscious choice, however. Wait-for-all is known to be a very rare operation
  in practice, especially with counts that approach the maximum, and it is the
  intent of the ntsync driver to optimize the wait-for-any pattern at the
  expense of the wait-for-all pattern as much as possible.

* NT mutexes are tied to their threads on an OS level, and the kernel includes
  builtin support for "robust" mutexes. In order to keep the ntsync driver
  self-contained and avoid touching more code than necessary, it does not hook
  into task exit nor use pids.

  Instead, the user space emulator is expected to manage thread IDs and pass
  them as an argument to any relevant functions; this is the "owner" field of
  ntsync_wait_args and ntsync_mutex_args.

  When the emulator detects that a thread dies, it should therefore call
  NTSYNC_IOC_KILL_OWNER, which will mark mutexes owned by that thread (if any)
  as abandoned.

* This implementation uses a misc device mostly because it seemed like the
  simplest and least obtrusive option.

  Besides simplicitly of implementation, the only particularly interesting
  advantage is the ability to create an arbitrary number of "contexts"
  (corresponding to Windows virtual machines) which are self-contained and
  shareable across multiple processes; this maps nicely to file descriptions
  (i.e. struct file). This is not impossible with syscalls of course but would
  require an extra argument.

  On the other hand, there is no reason to forbid using ntsync by default from
  user-mode processes, and (as far as I understand) to do so with a char device
  requires explicit configuration by e.g. udev or init. Since this is done with
  e.g. fuse, I assume this is the model to follow, but I may have chosen
  something deprecated.

* ntsync is module-capable mostly because there was nothing preventing it, and
  because it aided development. I am not aware of any reason why being a module
  is required, though.

* The misc minor number has not been reserved with LANANA. I am not sure at what
  point in the process this makes the most sense, but since this is still only
  an RFC I've abstained from doing so yet.


Elizabeth Figura (9):
  ntsync: Introduce the ntsync driver and character device.
  ntsync: Reserve a minor device number and ioctl range.
  ntsync: Introduce NTSYNC_IOC_CREATE_SEM and NTSYNC_IOC_DELETE.
  ntsync: Introduce NTSYNC_IOC_PUT_SEM.
  ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
  ntsync: Introduce NTSYNC_IOC_WAIT_ALL.
  ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX.
  ntsync: Introduce NTSYNC_IOC_PUT_MUTEX.
  ntsync: Introduce NTSYNC_IOC_KILL_OWNER.

 Documentation/admin-guide/devices.txt         |   3 +-
 .../userspace-api/ioctl/ioctl-number.rst      |   2 +
 drivers/misc/Kconfig                          |   9 +
 drivers/misc/Makefile                         |   1 +
 drivers/misc/ntsync.c                         | 916 ++++++++++++++++++
 include/linux/miscdevice.h                    |   1 +
 include/uapi/linux/ntsync.h                   |  53 +
 7 files changed, 984 insertions(+), 1 deletion(-)
 create mode 100644 drivers/misc/ntsync.c
 create mode 100644 include/uapi/linux/ntsync.h


base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
-- 
2.43.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device.
  2024-01-24  0:40 [RFC PATCH 0/9] NT synchronization primitive driver Elizabeth Figura
@ 2024-01-24  0:40 ` Elizabeth Figura
  2024-01-24  7:38   ` Arnd Bergmann
  2024-01-24 21:26   ` Andy Lutomirski
  2024-01-24  0:40 ` [RFC PATCH 2/9] ntsync: Reserve a minor device number and ioctl range Elizabeth Figura
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24  0:40 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra, Elizabeth Figura

ntsync uses a misc device as the simplest and least intrusive uAPI interface.

Each file description on the device represents an isolated NT instance, intended
to correspond to a single NT virtual machine.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
---
 drivers/misc/Kconfig  |  9 ++++++++
 drivers/misc/Makefile |  1 +
 drivers/misc/ntsync.c | 53 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+)
 create mode 100644 drivers/misc/ntsync.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 4fb291f0bf7c..bdd8a71bd853 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -504,6 +504,15 @@ config OPEN_DICE
 	  measured boot flow. Userspace can use CDIs for remote attestation
 	  and sealing.
 
+config NTSYNC
+	tristate "NT synchronization primitive emulation"
+	help
+	  This module provides kernel support for emulation of Windows NT
+	  synchronization primitives. It is not a hardware driver.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called ntsync.
+
 	  If unsure, say N.
 
 config VCPU_STALL_DETECTOR
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index ea6ea5bbbc9c..153a3f4837e8 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_PVPANIC)   	+= pvpanic/
 obj-$(CONFIG_UACCE)		+= uacce/
 obj-$(CONFIG_XILINX_SDFEC)	+= xilinx_sdfec.o
 obj-$(CONFIG_HISI_HIKEY_USB)	+= hisi_hikey_usb.o
+obj-$(CONFIG_NTSYNC)		+= ntsync.o
 obj-$(CONFIG_HI6421V600_IRQ)	+= hi6421v600-irq.o
 obj-$(CONFIG_OPEN_DICE)		+= open-dice.o
 obj-$(CONFIG_GP_PCI1XXXX)	+= mchp_pci1xxxx/
diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
new file mode 100644
index 000000000000..9424c6210e51
--- /dev/null
+++ b/drivers/misc/ntsync.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * ntsync.c - Kernel driver for NT synchronization primitives
+ *
+ * Copyright (C) 2021-2022 Elizabeth Figura
+ */
+
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+
+#define NTSYNC_NAME	"ntsync"
+
+static int ntsync_char_open(struct inode *inode, struct file *file)
+{
+	return nonseekable_open(inode, file);
+}
+
+static int ntsync_char_release(struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+static long ntsync_char_ioctl(struct file *file, unsigned int cmd,
+			      unsigned long parm)
+{
+	switch (cmd) {
+	default:
+		return -ENOIOCTLCMD;
+	}
+}
+
+static const struct file_operations ntsync_fops = {
+	.owner		= THIS_MODULE,
+	.open		= ntsync_char_open,
+	.release	= ntsync_char_release,
+	.unlocked_ioctl	= ntsync_char_ioctl,
+	.compat_ioctl	= ntsync_char_ioctl,
+	.llseek		= no_llseek,
+};
+
+static struct miscdevice ntsync_misc = {
+	.minor		= MISC_DYNAMIC_MINOR,
+	.name		= NTSYNC_NAME,
+	.fops		= &ntsync_fops,
+};
+
+module_misc_device(ntsync_misc);
+
+MODULE_AUTHOR("Elizabeth Figura");
+MODULE_DESCRIPTION("Kernel driver for NT synchronization primitives");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("devname:" NTSYNC_NAME);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 2/9] ntsync: Reserve a minor device number and ioctl range.
  2024-01-24  0:40 [RFC PATCH 0/9] NT synchronization primitive driver Elizabeth Figura
  2024-01-24  0:40 ` [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device Elizabeth Figura
@ 2024-01-24  0:40 ` Elizabeth Figura
  2024-01-24  0:54   ` Greg Kroah-Hartman
  2024-01-24  0:40 ` [RFC PATCH 3/9] ntsync: Introduce NTSYNC_IOC_CREATE_SEM and NTSYNC_IOC_DELETE Elizabeth Figura
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24  0:40 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra, Elizabeth Figura

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
---
 Documentation/admin-guide/devices.txt              | 3 ++-
 Documentation/userspace-api/ioctl/ioctl-number.rst | 2 ++
 drivers/misc/ntsync.c                              | 3 ++-
 include/linux/miscdevice.h                         | 1 +
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/devices.txt b/Documentation/admin-guide/devices.txt
index 94c98be1329a..041404397ee5 100644
--- a/Documentation/admin-guide/devices.txt
+++ b/Documentation/admin-guide/devices.txt
@@ -376,8 +376,9 @@
 		240 = /dev/userio	Serio driver testing device
 		241 = /dev/vhost-vsock	Host kernel driver for virtio vsock
 		242 = /dev/rfkill	Turning off radio transmissions (rfkill)
+		243 = /dev/ntsync	NT synchronization primitive device
 
-		243-254			Reserved for local use
+		244-254			Reserved for local use
 		255			Reserved for MISC_DYNAMIC_MINOR
 
   11 char	Raw keyboard device	(Linux/SPARC only)
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 457e16f06e04..a1326a5bc2e0 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -378,6 +378,8 @@ Code  Seq#    Include File                                           Comments
                                                                      <mailto:thomas@winischhofer.net>
 0xF6  all                                                            LTTng Linux Trace Toolkit Next Generation
                                                                      <mailto:mathieu.desnoyers@efficios.com>
+0xF7  00-1F  uapi/linux/ntsync.h                                     NT synchronization primitives
+                                                                     <mailto:wine-devel@winehq.org>
 0xF8  all    arch/x86/include/uapi/asm/amd_hsmp.h                    AMD HSMP EPYC system management interface driver
                                                                      <mailto:nchatrad@amd.com>
 0xFD  all    linux/dm-ioctl.h
diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index 9424c6210e51..84b498e2b2d5 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -40,7 +40,7 @@ static const struct file_operations ntsync_fops = {
 };
 
 static struct miscdevice ntsync_misc = {
-	.minor		= MISC_DYNAMIC_MINOR,
+	.minor		= NTSYNC_MINOR,
 	.name		= NTSYNC_NAME,
 	.fops		= &ntsync_fops,
 };
@@ -51,3 +51,4 @@ MODULE_AUTHOR("Elizabeth Figura");
 MODULE_DESCRIPTION("Kernel driver for NT synchronization primitives");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("devname:" NTSYNC_NAME);
+MODULE_ALIAS_MISCDEV(NTSYNC_MINOR);
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index c0fea6ca5076..fe5d9366fdf7 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -71,6 +71,7 @@
 #define USERIO_MINOR		240
 #define VHOST_VSOCK_MINOR	241
 #define RFKILL_MINOR		242
+#define NTSYNC_MINOR		243
 #define MISC_DYNAMIC_MINOR	255
 
 struct device;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 3/9] ntsync: Introduce NTSYNC_IOC_CREATE_SEM and NTSYNC_IOC_DELETE.
  2024-01-24  0:40 [RFC PATCH 0/9] NT synchronization primitive driver Elizabeth Figura
  2024-01-24  0:40 ` [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device Elizabeth Figura
  2024-01-24  0:40 ` [RFC PATCH 2/9] ntsync: Reserve a minor device number and ioctl range Elizabeth Figura
@ 2024-01-24  0:40 ` Elizabeth Figura
  2024-01-24  1:14   ` Greg Kroah-Hartman
  2024-01-24  0:40 ` [RFC PATCH 4/9] ntsync: Introduce NTSYNC_IOC_PUT_SEM Elizabeth Figura
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24  0:40 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra, Elizabeth Figura

These correspond to the NT syscalls NtCreateSemaphore() and NtClose().
Unlike those functions, however, these ioctls do not handle object names, or
lookup of existing objects, or handle reference counting, but simply create the
underlying primitive. The user space emulator is expected to implement those
functions if they are required.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
---
 drivers/misc/ntsync.c       | 117 ++++++++++++++++++++++++++++++++++++
 include/uapi/linux/ntsync.h |  25 ++++++++
 2 files changed, 142 insertions(+)
 create mode 100644 include/uapi/linux/ntsync.h

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index 84b498e2b2d5..3287b94be351 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -8,23 +8,140 @@
 #include <linux/fs.h>
 #include <linux/miscdevice.h>
 #include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/xarray.h>
+#include <uapi/linux/ntsync.h>
 
 #define NTSYNC_NAME	"ntsync"
 
+enum ntsync_type {
+	NTSYNC_TYPE_SEM,
+};
+
+struct ntsync_obj {
+	struct rcu_head rhead;
+	struct kref refcount;
+
+	enum ntsync_type type;
+
+	union {
+		struct {
+			__u32 count;
+			__u32 max;
+		} sem;
+	} u;
+};
+
+struct ntsync_device {
+	struct xarray objects;
+};
+
+static void destroy_obj(struct kref *ref)
+{
+	struct ntsync_obj *obj = container_of(ref, struct ntsync_obj, refcount);
+
+	kfree_rcu(obj, rhead);
+}
+
+static void put_obj(struct ntsync_obj *obj)
+{
+	kref_put(&obj->refcount, destroy_obj);
+}
+
 static int ntsync_char_open(struct inode *inode, struct file *file)
 {
+	struct ntsync_device *dev;
+
+	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	xa_init_flags(&dev->objects, XA_FLAGS_ALLOC);
+
+	file->private_data = dev;
 	return nonseekable_open(inode, file);
 }
 
 static int ntsync_char_release(struct inode *inode, struct file *file)
 {
+	struct ntsync_device *dev = file->private_data;
+	struct ntsync_obj *obj;
+	unsigned long id;
+
+	xa_for_each(&dev->objects, id, obj)
+		put_obj(obj);
+
+	xa_destroy(&dev->objects);
+
+	kfree(dev);
+
+	return 0;
+}
+
+static void init_obj(struct ntsync_obj *obj)
+{
+	kref_init(&obj->refcount);
+}
+
+static int ntsync_create_sem(struct ntsync_device *dev, void __user *argp)
+{
+	struct ntsync_sem_args __user *user_args = argp;
+	struct ntsync_sem_args args;
+	struct ntsync_obj *sem;
+	__u32 id;
+	int ret;
+
+	if (copy_from_user(&args, argp, sizeof(args)))
+		return -EFAULT;
+
+	if (args.count > args.max)
+		return -EINVAL;
+
+	sem = kzalloc(sizeof(*sem), GFP_KERNEL);
+	if (!sem)
+		return -ENOMEM;
+
+	init_obj(sem);
+	sem->type = NTSYNC_TYPE_SEM;
+	sem->u.sem.count = args.count;
+	sem->u.sem.max = args.max;
+
+	ret = xa_alloc(&dev->objects, &id, sem, xa_limit_32b, GFP_KERNEL);
+	if (ret < 0) {
+		kfree(sem);
+		return ret;
+	}
+
+	return put_user(id, &user_args->sem);
+}
+
+static int ntsync_delete(struct ntsync_device *dev, void __user *argp)
+{
+	struct ntsync_obj *obj;
+	__u32 id;
+
+	if (get_user(id, (__u32 __user *)argp))
+		return -EFAULT;
+
+	obj = xa_erase(&dev->objects, id);
+	if (!obj)
+		return -EINVAL;
+
+	put_obj(obj);
 	return 0;
 }
 
 static long ntsync_char_ioctl(struct file *file, unsigned int cmd,
 			      unsigned long parm)
 {
+	struct ntsync_device *dev = file->private_data;
+	void __user *argp = (void __user *)parm;
+
 	switch (cmd) {
+	case NTSYNC_IOC_CREATE_SEM:
+		return ntsync_create_sem(dev, argp);
+	case NTSYNC_IOC_DELETE:
+		return ntsync_delete(dev, argp);
 	default:
 		return -ENOIOCTLCMD;
 	}
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
new file mode 100644
index 000000000000..d97afc138dcc
--- /dev/null
+++ b/include/uapi/linux/ntsync.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Kernel support for NT synchronization primitive emulation
+ *
+ * Copyright (C) 2021-2022 Elizabeth Figura
+ */
+
+#ifndef __LINUX_NTSYNC_H
+#define __LINUX_NTSYNC_H
+
+#include <linux/types.h>
+
+struct ntsync_sem_args {
+	__u32 sem;
+	__u32 count;
+	__u32 max;
+};
+
+#define NTSYNC_IOC_BASE 0xf7
+
+#define NTSYNC_IOC_CREATE_SEM		_IOWR(NTSYNC_IOC_BASE, 0, \
+					      struct ntsync_sem_args)
+#define NTSYNC_IOC_DELETE		_IOW (NTSYNC_IOC_BASE, 1, __u32)
+
+#endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 4/9] ntsync: Introduce NTSYNC_IOC_PUT_SEM.
  2024-01-24  0:40 [RFC PATCH 0/9] NT synchronization primitive driver Elizabeth Figura
                   ` (2 preceding siblings ...)
  2024-01-24  0:40 ` [RFC PATCH 3/9] ntsync: Introduce NTSYNC_IOC_CREATE_SEM and NTSYNC_IOC_DELETE Elizabeth Figura
@ 2024-01-24  0:40 ` Elizabeth Figura
  2024-01-25  8:59   ` Nikolay Borisov
  2024-01-24  0:40 ` [RFC PATCH 5/9] ntsync: Introduce NTSYNC_IOC_WAIT_ANY Elizabeth Figura
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24  0:40 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra, Elizabeth Figura

This corresponds to the NT syscall NtReleaseSemaphore().

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
---
 drivers/misc/ntsync.c       | 76 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/ntsync.h |  2 +
 2 files changed, 78 insertions(+)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index 3287b94be351..d1c91c2a4f1a 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -21,9 +21,11 @@ enum ntsync_type {
 struct ntsync_obj {
 	struct rcu_head rhead;
 	struct kref refcount;
+	spinlock_t lock;
 
 	enum ntsync_type type;
 
+	/* The following fields are protected by the object lock. */
 	union {
 		struct {
 			__u32 count;
@@ -36,6 +38,19 @@ struct ntsync_device {
 	struct xarray objects;
 };
 
+static struct ntsync_obj *get_obj(struct ntsync_device *dev, __u32 id)
+{
+	struct ntsync_obj *obj;
+
+	rcu_read_lock();
+	obj = xa_load(&dev->objects, id);
+	if (obj && !kref_get_unless_zero(&obj->refcount))
+		obj = NULL;
+	rcu_read_unlock();
+
+	return obj;
+}
+
 static void destroy_obj(struct kref *ref)
 {
 	struct ntsync_obj *obj = container_of(ref, struct ntsync_obj, refcount);
@@ -48,6 +63,18 @@ static void put_obj(struct ntsync_obj *obj)
 	kref_put(&obj->refcount, destroy_obj);
 }
 
+static struct ntsync_obj *get_obj_typed(struct ntsync_device *dev, __u32 id,
+					enum ntsync_type type)
+{
+	struct ntsync_obj *obj = get_obj(dev, id);
+
+	if (obj && obj->type != type) {
+		put_obj(obj);
+		return NULL;
+	}
+	return obj;
+}
+
 static int ntsync_char_open(struct inode *inode, struct file *file)
 {
 	struct ntsync_device *dev;
@@ -81,6 +108,7 @@ static int ntsync_char_release(struct inode *inode, struct file *file)
 static void init_obj(struct ntsync_obj *obj)
 {
 	kref_init(&obj->refcount);
+	spin_lock_init(&obj->lock);
 }
 
 static int ntsync_create_sem(struct ntsync_device *dev, void __user *argp)
@@ -131,6 +159,52 @@ static int ntsync_delete(struct ntsync_device *dev, void __user *argp)
 	return 0;
 }
 
+/*
+ * Actually change the semaphore state, returning -EOVERFLOW if it is made
+ * invalid.
+ */
+static int put_sem_state(struct ntsync_obj *sem, __u32 count)
+{
+	lockdep_assert_held(&sem->lock);
+
+	if (sem->u.sem.count + count < sem->u.sem.count ||
+	    sem->u.sem.count + count > sem->u.sem.max)
+		return -EOVERFLOW;
+
+	sem->u.sem.count += count;
+	return 0;
+}
+
+static int ntsync_put_sem(struct ntsync_device *dev, void __user *argp)
+{
+	struct ntsync_sem_args __user *user_args = argp;
+	struct ntsync_sem_args args;
+	struct ntsync_obj *sem;
+	__u32 prev_count;
+	int ret;
+
+	if (copy_from_user(&args, argp, sizeof(args)))
+		return -EFAULT;
+
+	sem = get_obj_typed(dev, args.sem, NTSYNC_TYPE_SEM);
+	if (!sem)
+		return -EINVAL;
+
+	spin_lock(&sem->lock);
+
+	prev_count = sem->u.sem.count;
+	ret = put_sem_state(sem, args.count);
+
+	spin_unlock(&sem->lock);
+
+	put_obj(sem);
+
+	if (!ret && put_user(prev_count, &user_args->count))
+		ret = -EFAULT;
+
+	return ret;
+}
+
 static long ntsync_char_ioctl(struct file *file, unsigned int cmd,
 			      unsigned long parm)
 {
@@ -142,6 +216,8 @@ static long ntsync_char_ioctl(struct file *file, unsigned int cmd,
 		return ntsync_create_sem(dev, argp);
 	case NTSYNC_IOC_DELETE:
 		return ntsync_delete(dev, argp);
+	case NTSYNC_IOC_PUT_SEM:
+		return ntsync_put_sem(dev, argp);
 	default:
 		return -ENOIOCTLCMD;
 	}
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index d97afc138dcc..8c610d65f8ef 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -21,5 +21,7 @@ struct ntsync_sem_args {
 #define NTSYNC_IOC_CREATE_SEM		_IOWR(NTSYNC_IOC_BASE, 0, \
 					      struct ntsync_sem_args)
 #define NTSYNC_IOC_DELETE		_IOW (NTSYNC_IOC_BASE, 1, __u32)
+#define NTSYNC_IOC_PUT_SEM		_IOWR(NTSYNC_IOC_BASE, 2, \
+					      struct ntsync_sem_args)
 
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 5/9] ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
  2024-01-24  0:40 [RFC PATCH 0/9] NT synchronization primitive driver Elizabeth Figura
                   ` (3 preceding siblings ...)
  2024-01-24  0:40 ` [RFC PATCH 4/9] ntsync: Introduce NTSYNC_IOC_PUT_SEM Elizabeth Figura
@ 2024-01-24  0:40 ` Elizabeth Figura
  2024-01-24  7:56   ` Arnd Bergmann
  2024-01-24  0:40 ` [RFC PATCH 6/9] ntsync: Introduce NTSYNC_IOC_WAIT_ALL Elizabeth Figura
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24  0:40 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra, Elizabeth Figura

This corresponds to part of the functionality of the NT syscall
NtWaitForMultipleObjects(). Specifically, it implements the behaviour where
the third argument (wait_any) is TRUE, and it does not handle alertable waits.
Those features have been split out into separate patches to ease review.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
---
 drivers/misc/ntsync.c       | 229 ++++++++++++++++++++++++++++++++++++
 include/uapi/linux/ntsync.h |  13 ++
 2 files changed, 242 insertions(+)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index d1c91c2a4f1a..2e8d3c2d51a4 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -23,6 +23,8 @@ struct ntsync_obj {
 	struct kref refcount;
 	spinlock_t lock;
 
+	struct list_head any_waiters;
+
 	enum ntsync_type type;
 
 	/* The following fields are protected by the object lock. */
@@ -34,6 +36,28 @@ struct ntsync_obj {
 	} u;
 };
 
+struct ntsync_q_entry {
+	struct list_head node;
+	struct ntsync_q *q;
+	struct ntsync_obj *obj;
+	__u32 index;
+};
+
+struct ntsync_q {
+	struct task_struct *task;
+	__u32 owner;
+
+	/*
+	 * Protected via atomic_cmpxchg(). Only the thread that wins the
+	 * compare-and-swap may actually change object states and wake this
+	 * task.
+	 */
+	atomic_t signaled;
+
+	__u32 count;
+	struct ntsync_q_entry entries[];
+};
+
 struct ntsync_device {
 	struct xarray objects;
 };
@@ -109,6 +133,26 @@ static void init_obj(struct ntsync_obj *obj)
 {
 	kref_init(&obj->refcount);
 	spin_lock_init(&obj->lock);
+	INIT_LIST_HEAD(&obj->any_waiters);
+}
+
+static void try_wake_any_sem(struct ntsync_obj *sem)
+{
+	struct ntsync_q_entry *entry;
+
+	lockdep_assert_held(&sem->lock);
+
+	list_for_each_entry(entry, &sem->any_waiters, node) {
+		struct ntsync_q *q = entry->q;
+
+		if (!sem->u.sem.count)
+			break;
+
+		if (atomic_cmpxchg(&q->signaled, -1, entry->index) == -1) {
+			sem->u.sem.count--;
+			wake_up_process(q->task);
+		}
+	}
 }
 
 static int ntsync_create_sem(struct ntsync_device *dev, void __user *argp)
@@ -194,6 +238,8 @@ static int ntsync_put_sem(struct ntsync_device *dev, void __user *argp)
 
 	prev_count = sem->u.sem.count;
 	ret = put_sem_state(sem, args.count);
+	if (!ret)
+		try_wake_any_sem(sem);
 
 	spin_unlock(&sem->lock);
 
@@ -205,6 +251,187 @@ static int ntsync_put_sem(struct ntsync_device *dev, void __user *argp)
 	return ret;
 }
 
+static int ntsync_schedule(const struct ntsync_q *q, ktime_t *timeout)
+{
+	int ret = 0;
+
+	do {
+		if (signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (atomic_read(&q->signaled) != -1) {
+			ret = 0;
+			break;
+		}
+		ret = schedule_hrtimeout(timeout, HRTIMER_MODE_ABS);
+	} while (ret < 0);
+	__set_current_state(TASK_RUNNING);
+
+	return ret;
+}
+
+/*
+ * Allocate and initialize the ntsync_q structure, but do not queue us yet.
+ * Also, calculate the relative timeout.
+ */
+static int setup_wait(struct ntsync_device *dev,
+		      const struct ntsync_wait_args *args,
+		      ktime_t *ret_timeout, struct ntsync_q **ret_q)
+{
+	const __u32 count = args->count;
+	struct ntsync_q *q;
+	ktime_t timeout = 0;
+	__u32 *ids;
+	__u32 i, j;
+
+	if (!args->owner || args->pad)
+		return -EINVAL;
+
+	if (args->count > NTSYNC_MAX_WAIT_COUNT)
+		return -EINVAL;
+
+	if (args->timeout) {
+		struct timespec64 to;
+
+		if (get_timespec64(&to, u64_to_user_ptr(args->timeout)))
+			return -EFAULT;
+		if (!timespec64_valid(&to))
+			return -EINVAL;
+
+		timeout = timespec64_to_ns(&to);
+	}
+
+	ids = kmalloc_array(count, sizeof(*ids), GFP_KERNEL);
+	if (!ids)
+		return -ENOMEM;
+	if (copy_from_user(ids, u64_to_user_ptr(args->objs),
+			   array_size(count, sizeof(*ids)))) {
+		kfree(ids);
+		return -EFAULT;
+	}
+
+	q = kmalloc(struct_size(q, entries, count), GFP_KERNEL);
+	if (!q) {
+		kfree(ids);
+		return -ENOMEM;
+	}
+	q->task = current;
+	q->owner = args->owner;
+	atomic_set(&q->signaled, -1);
+	q->count = count;
+
+	for (i = 0; i < count; i++) {
+		struct ntsync_q_entry *entry = &q->entries[i];
+		struct ntsync_obj *obj = get_obj(dev, ids[i]);
+
+		if (!obj)
+			goto err;
+
+		entry->obj = obj;
+		entry->q = q;
+		entry->index = i;
+	}
+
+	kfree(ids);
+
+	*ret_q = q;
+	*ret_timeout = timeout;
+	return 0;
+
+err:
+	for (j = 0; j < i; j++)
+		put_obj(q->entries[j].obj);
+	kfree(ids);
+	kfree(q);
+	return -EINVAL;
+}
+
+static void try_wake_any_obj(struct ntsync_obj *obj)
+{
+	switch (obj->type) {
+	case NTSYNC_TYPE_SEM:
+		try_wake_any_sem(obj);
+		break;
+	}
+}
+
+static int ntsync_wait_any(struct ntsync_device *dev, void __user *argp)
+{
+	struct ntsync_wait_args args;
+	struct ntsync_q *q;
+	ktime_t timeout;
+	int signaled;
+	__u32 i;
+	int ret;
+
+	if (copy_from_user(&args, argp, sizeof(args)))
+		return -EFAULT;
+
+	ret = setup_wait(dev, &args, &timeout, &q);
+	if (ret < 0)
+		return ret;
+
+	/* queue ourselves */
+
+	for (i = 0; i < args.count; i++) {
+		struct ntsync_q_entry *entry = &q->entries[i];
+		struct ntsync_obj *obj = entry->obj;
+
+		spin_lock(&obj->lock);
+		list_add_tail(&entry->node, &obj->any_waiters);
+		spin_unlock(&obj->lock);
+	}
+
+	/* check if we are already signaled */
+
+	for (i = 0; i < args.count; i++) {
+		struct ntsync_obj *obj = q->entries[i].obj;
+
+		if (atomic_read(&q->signaled) != -1)
+			break;
+
+		spin_lock(&obj->lock);
+		try_wake_any_obj(obj);
+		spin_unlock(&obj->lock);
+	}
+
+	/* sleep */
+
+	ret = ntsync_schedule(q, args.timeout ? &timeout : NULL);
+
+	/* and finally, unqueue */
+
+	for (i = 0; i < args.count; i++) {
+		struct ntsync_q_entry *entry = &q->entries[i];
+		struct ntsync_obj *obj = entry->obj;
+
+		spin_lock(&obj->lock);
+		list_del(&entry->node);
+		spin_unlock(&obj->lock);
+
+		put_obj(obj);
+	}
+
+	signaled = atomic_read(&q->signaled);
+	if (signaled != -1) {
+		struct ntsync_wait_args __user *user_args = argp;
+
+		/* even if we caught a signal, we need to communicate success */
+		ret = 0;
+
+		if (put_user(signaled, &user_args->index))
+			ret = -EFAULT;
+	} else if (!ret) {
+		ret = -ETIMEDOUT;
+	}
+
+	kfree(q);
+	return ret;
+}
+
 static long ntsync_char_ioctl(struct file *file, unsigned int cmd,
 			      unsigned long parm)
 {
@@ -218,6 +445,8 @@ static long ntsync_char_ioctl(struct file *file, unsigned int cmd,
 		return ntsync_delete(dev, argp);
 	case NTSYNC_IOC_PUT_SEM:
 		return ntsync_put_sem(dev, argp);
+	case NTSYNC_IOC_WAIT_ANY:
+		return ntsync_wait_any(dev, argp);
 	default:
 		return -ENOIOCTLCMD;
 	}
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index 8c610d65f8ef..10f07da7864e 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -16,6 +16,17 @@ struct ntsync_sem_args {
 	__u32 max;
 };
 
+struct ntsync_wait_args {
+	__u64 timeout;
+	__u64 objs;
+	__u32 count;
+	__u32 owner;
+	__u32 index;
+	__u32 pad;
+};
+
+#define NTSYNC_MAX_WAIT_COUNT 64
+
 #define NTSYNC_IOC_BASE 0xf7
 
 #define NTSYNC_IOC_CREATE_SEM		_IOWR(NTSYNC_IOC_BASE, 0, \
@@ -23,5 +34,7 @@ struct ntsync_sem_args {
 #define NTSYNC_IOC_DELETE		_IOW (NTSYNC_IOC_BASE, 1, __u32)
 #define NTSYNC_IOC_PUT_SEM		_IOWR(NTSYNC_IOC_BASE, 2, \
 					      struct ntsync_sem_args)
+#define NTSYNC_IOC_WAIT_ANY		_IOWR(NTSYNC_IOC_BASE, 3, \
+					      struct ntsync_wait_args)
 
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 6/9] ntsync: Introduce NTSYNC_IOC_WAIT_ALL.
  2024-01-24  0:40 [RFC PATCH 0/9] NT synchronization primitive driver Elizabeth Figura
                   ` (4 preceding siblings ...)
  2024-01-24  0:40 ` [RFC PATCH 5/9] ntsync: Introduce NTSYNC_IOC_WAIT_ANY Elizabeth Figura
@ 2024-01-24  0:40 ` Elizabeth Figura
  2024-01-24  0:40 ` [RFC PATCH 7/9] ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX Elizabeth Figura
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24  0:40 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra, Elizabeth Figura

This corresponds to part of the functionality of the NT syscall
NtWaitForMultipleObjects(). Specifically, it implements the behaviour where
the third argument (wait_any) is FALSE, and it does not yet handle alertable
waits.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
---
 drivers/misc/ntsync.c       | 241 ++++++++++++++++++++++++++++++++++--
 include/uapi/linux/ntsync.h |   2 +
 2 files changed, 235 insertions(+), 8 deletions(-)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index 2e8d3c2d51a4..2685363fae9e 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -23,7 +23,34 @@ struct ntsync_obj {
 	struct kref refcount;
 	spinlock_t lock;
 
+	/*
+	 * any_waiters is protected by the object lock, but all_waiters is
+	 * protected by the device wait_all_lock.
+	 */
 	struct list_head any_waiters;
+	struct list_head all_waiters;
+
+	/*
+	 * Hint describing how many tasks are queued on this object in a
+	 * wait-all operation.
+	 *
+	 * Any time we do a wake, we may need to wake "all" waiters as well as
+	 * "any" waiters. In order to atomically wake "all" waiters, we must
+	 * lock all of the objects, and that means grabbing the wait_all_lock
+	 * below (and, due to lock ordering rules, before locking this object).
+	 * However, wait-all is a rare operation, and grabbing the wait-all
+	 * lock for every wake would create unnecessary contention. Therefore we
+	 * first check whether all_hint is zero, and, if it is, we skip trying
+	 * to wake "all" waiters.
+	 *
+	 * This hint isn't protected by any lock. It might change during the
+	 * course of a wake, but there's no meaningful race there; it's only a
+	 * hint.
+	 *
+	 * Since wait requests must originate from user-space threads, we're
+	 * limited here by PID_MAX_LIMIT, so there's no risk of saturation.
+	 */
+	atomic_t all_hint;
 
 	enum ntsync_type type;
 
@@ -54,11 +81,25 @@ struct ntsync_q {
 	 */
 	atomic_t signaled;
 
+	bool all;
 	__u32 count;
 	struct ntsync_q_entry entries[];
 };
 
 struct ntsync_device {
+	/*
+	 * Wait-all operations must atomically grab all objects, and be totally
+	 * ordered with respect to each other and wait-any operations. If one
+	 * thread is trying to acquire several objects, another thread cannot
+	 * touch the object at the same time.
+	 *
+	 * We achieve this by grabbing multiple object locks at the same time.
+	 * However, this creates a lock ordering problem. To solve that problem,
+	 * wait_all_lock is taken first whenever multiple objects must be locked
+	 * at the same time.
+	 */
+	spinlock_t wait_all_lock;
+
 	struct xarray objects;
 };
 
@@ -107,6 +148,8 @@ static int ntsync_char_open(struct inode *inode, struct file *file)
 	if (!dev)
 		return -ENOMEM;
 
+	spin_lock_init(&dev->wait_all_lock);
+
 	xa_init_flags(&dev->objects, XA_FLAGS_ALLOC);
 
 	file->private_data = dev;
@@ -132,8 +175,81 @@ static int ntsync_char_release(struct inode *inode, struct file *file)
 static void init_obj(struct ntsync_obj *obj)
 {
 	kref_init(&obj->refcount);
+	atomic_set(&obj->all_hint, 0);
 	spin_lock_init(&obj->lock);
 	INIT_LIST_HEAD(&obj->any_waiters);
+	INIT_LIST_HEAD(&obj->all_waiters);
+}
+
+static bool is_signaled(struct ntsync_obj *obj, __u32 owner)
+{
+	lockdep_assert_held(&obj->lock);
+
+	switch (obj->type) {
+	case NTSYNC_TYPE_SEM:
+		return !!obj->u.sem.count;
+	}
+
+	WARN(1, "bad object type %#x\n", obj->type);
+	return false;
+}
+
+/*
+ * "locked_obj" is an optional pointer to an object which is already locked and
+ * should not be locked again. This is necessary so that changing an object's
+ * state and waking it can be a single atomic operation.
+ */
+static void try_wake_all(struct ntsync_device *dev, struct ntsync_q *q,
+			 struct ntsync_obj *locked_obj)
+{
+	__u32 count = q->count;
+	bool can_wake = true;
+	__u32 i;
+
+	lockdep_assert_held(&dev->wait_all_lock);
+	if (locked_obj)
+		lockdep_assert_held(&locked_obj->lock);
+
+	for (i = 0; i < count; i++) {
+		if (q->entries[i].obj != locked_obj)
+			spin_lock_nest_lock(&q->entries[i].obj->lock, &dev->wait_all_lock);
+	}
+
+	for (i = 0; i < count; i++) {
+		if (!is_signaled(q->entries[i].obj, q->owner)) {
+			can_wake = false;
+			break;
+		}
+	}
+
+	if (can_wake && atomic_cmpxchg(&q->signaled, -1, 0) == -1) {
+		for (i = 0; i < count; i++) {
+			struct ntsync_obj *obj = q->entries[i].obj;
+
+			switch (obj->type) {
+			case NTSYNC_TYPE_SEM:
+				obj->u.sem.count--;
+				break;
+			}
+		}
+		wake_up_process(q->task);
+	}
+
+	for (i = 0; i < count; i++) {
+		if (q->entries[i].obj != locked_obj)
+			spin_unlock(&q->entries[i].obj->lock);
+	}
+}
+
+static void try_wake_all_obj(struct ntsync_device *dev, struct ntsync_obj *obj)
+{
+	struct ntsync_q_entry *entry;
+
+	lockdep_assert_held(&dev->wait_all_lock);
+	lockdep_assert_held(&obj->lock);
+
+	list_for_each_entry(entry, &obj->all_waiters, node)
+		try_wake_all(dev, entry->q, obj);
 }
 
 static void try_wake_any_sem(struct ntsync_obj *sem)
@@ -234,14 +350,29 @@ static int ntsync_put_sem(struct ntsync_device *dev, void __user *argp)
 	if (!sem)
 		return -EINVAL;
 
-	spin_lock(&sem->lock);
+	if (atomic_read(&sem->all_hint) > 0) {
+		spin_lock(&dev->wait_all_lock);
+		spin_lock_nest_lock(&sem->lock, &dev->wait_all_lock);
 
-	prev_count = sem->u.sem.count;
-	ret = put_sem_state(sem, args.count);
-	if (!ret)
-		try_wake_any_sem(sem);
+		prev_count = sem->u.sem.count;
+		ret = put_sem_state(sem, args.count);
+		if (!ret) {
+			try_wake_all_obj(dev, sem);
+			try_wake_any_sem(sem);
+		}
 
-	spin_unlock(&sem->lock);
+		spin_unlock(&sem->lock);
+		spin_unlock(&dev->wait_all_lock);
+	} else {
+		spin_lock(&sem->lock);
+
+		prev_count = sem->u.sem.count;
+		ret = put_sem_state(sem, args.count);
+		if (!ret)
+			try_wake_any_sem(sem);
+
+		spin_unlock(&sem->lock);
+	}
 
 	put_obj(sem);
 
@@ -278,7 +409,7 @@ static int ntsync_schedule(const struct ntsync_q *q, ktime_t *timeout)
  * Also, calculate the relative timeout.
  */
 static int setup_wait(struct ntsync_device *dev,
-		      const struct ntsync_wait_args *args,
+		      const struct ntsync_wait_args *args, bool all,
 		      ktime_t *ret_timeout, struct ntsync_q **ret_q)
 {
 	const __u32 count = args->count;
@@ -321,6 +452,7 @@ static int setup_wait(struct ntsync_device *dev,
 	q->task = current;
 	q->owner = args->owner;
 	atomic_set(&q->signaled, -1);
+	q->all = all;
 	q->count = count;
 
 	for (i = 0; i < count; i++) {
@@ -330,6 +462,16 @@ static int setup_wait(struct ntsync_device *dev,
 		if (!obj)
 			goto err;
 
+		if (all) {
+			/* Check that the objects are all distinct. */
+			for (j = 0; j < i; j++) {
+				if (obj == q->entries[j].obj) {
+					put_obj(obj);
+					goto err;
+				}
+			}
+		}
+
 		entry->obj = obj;
 		entry->q = q;
 		entry->index = i;
@@ -370,7 +512,7 @@ static int ntsync_wait_any(struct ntsync_device *dev, void __user *argp)
 	if (copy_from_user(&args, argp, sizeof(args)))
 		return -EFAULT;
 
-	ret = setup_wait(dev, &args, &timeout, &q);
+	ret = setup_wait(dev, &args, false, &timeout, &q);
 	if (ret < 0)
 		return ret;
 
@@ -432,6 +574,87 @@ static int ntsync_wait_any(struct ntsync_device *dev, void __user *argp)
 	return ret;
 }
 
+static int ntsync_wait_all(struct ntsync_device *dev, void __user *argp)
+{
+	struct ntsync_wait_args args;
+	struct ntsync_q *q;
+	ktime_t timeout;
+	int signaled;
+	__u32 i;
+	int ret;
+
+	if (copy_from_user(&args, argp, sizeof(args)))
+		return -EFAULT;
+
+	ret = setup_wait(dev, &args, true, &timeout, &q);
+	if (ret < 0)
+		return ret;
+
+	/* queue ourselves */
+
+	spin_lock(&dev->wait_all_lock);
+
+	for (i = 0; i < args.count; i++) {
+		struct ntsync_q_entry *entry = &q->entries[i];
+		struct ntsync_obj *obj = entry->obj;
+
+		atomic_inc(&obj->all_hint);
+
+		/*
+		 * obj->all_waiters is protected by dev->wait_all_lock rather
+		 * than obj->lock, so there is no need to acquire it here.
+		 */
+		list_add_tail(&entry->node, &obj->all_waiters);
+	}
+
+	/* check if we are already signaled */
+
+	try_wake_all(dev, q, NULL);
+
+	spin_unlock(&dev->wait_all_lock);
+
+	/* sleep */
+
+	ret = ntsync_schedule(q, args.timeout ? &timeout : NULL);
+
+	/* and finally, unqueue */
+
+	spin_lock(&dev->wait_all_lock);
+
+	for (i = 0; i < args.count; i++) {
+		struct ntsync_q_entry *entry = &q->entries[i];
+		struct ntsync_obj *obj = entry->obj;
+
+		/*
+		 * obj->all_waiters is protected by dev->wait_all_lock rather
+		 * than obj->lock, so there is no need to acquire it here.
+		 */
+		list_del(&entry->node);
+
+		atomic_dec(&obj->all_hint);
+
+		put_obj(obj);
+	}
+
+	spin_unlock(&dev->wait_all_lock);
+
+	signaled = atomic_read(&q->signaled);
+	if (signaled != -1) {
+		struct ntsync_wait_args __user *user_args = argp;
+
+		/* even if we caught a signal, we need to communicate success */
+		ret = 0;
+
+		if (put_user(signaled, &user_args->index))
+			ret = -EFAULT;
+	} else if (!ret) {
+		ret = -ETIMEDOUT;
+	}
+
+	kfree(q);
+	return ret;
+}
+
 static long ntsync_char_ioctl(struct file *file, unsigned int cmd,
 			      unsigned long parm)
 {
@@ -445,6 +668,8 @@ static long ntsync_char_ioctl(struct file *file, unsigned int cmd,
 		return ntsync_delete(dev, argp);
 	case NTSYNC_IOC_PUT_SEM:
 		return ntsync_put_sem(dev, argp);
+	case NTSYNC_IOC_WAIT_ALL:
+		return ntsync_wait_all(dev, argp);
 	case NTSYNC_IOC_WAIT_ANY:
 		return ntsync_wait_any(dev, argp);
 	default:
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index 10f07da7864e..a5bed5a39b21 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -36,5 +36,7 @@ struct ntsync_wait_args {
 					      struct ntsync_sem_args)
 #define NTSYNC_IOC_WAIT_ANY		_IOWR(NTSYNC_IOC_BASE, 3, \
 					      struct ntsync_wait_args)
+#define NTSYNC_IOC_WAIT_ALL		_IOWR(NTSYNC_IOC_BASE, 4, \
+					      struct ntsync_wait_args)
 
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 7/9] ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX.
  2024-01-24  0:40 [RFC PATCH 0/9] NT synchronization primitive driver Elizabeth Figura
                   ` (5 preceding siblings ...)
  2024-01-24  0:40 ` [RFC PATCH 6/9] ntsync: Introduce NTSYNC_IOC_WAIT_ALL Elizabeth Figura
@ 2024-01-24  0:40 ` Elizabeth Figura
  2024-01-24  0:40 ` [RFC PATCH 8/9] ntsync: Introduce NTSYNC_IOC_PUT_MUTEX Elizabeth Figura
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24  0:40 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra, Elizabeth Figura

This corresponds to the NT syscall NtCreateMutant().

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
---
 drivers/misc/ntsync.c       | 72 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/ntsync.h |  8 +++++
 2 files changed, 80 insertions(+)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index 2685363fae9e..d48f2ef41341 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -16,6 +16,7 @@
 
 enum ntsync_type {
 	NTSYNC_TYPE_SEM,
+	NTSYNC_TYPE_MUTEX,
 };
 
 struct ntsync_obj {
@@ -60,6 +61,10 @@ struct ntsync_obj {
 			__u32 count;
 			__u32 max;
 		} sem;
+		struct {
+			__u32 count;
+			__u32 owner;
+		} mutex;
 	} u;
 };
 
@@ -188,6 +193,10 @@ static bool is_signaled(struct ntsync_obj *obj, __u32 owner)
 	switch (obj->type) {
 	case NTSYNC_TYPE_SEM:
 		return !!obj->u.sem.count;
+	case NTSYNC_TYPE_MUTEX:
+		if (obj->u.mutex.owner && obj->u.mutex.owner != owner)
+			return false;
+		return obj->u.mutex.count < UINT_MAX;
 	}
 
 	WARN(1, "bad object type %#x\n", obj->type);
@@ -230,6 +239,10 @@ static void try_wake_all(struct ntsync_device *dev, struct ntsync_q *q,
 			case NTSYNC_TYPE_SEM:
 				obj->u.sem.count--;
 				break;
+			case NTSYNC_TYPE_MUTEX:
+				obj->u.mutex.count++;
+				obj->u.mutex.owner = q->owner;
+				break;
 			}
 		}
 		wake_up_process(q->task);
@@ -271,6 +284,28 @@ static void try_wake_any_sem(struct ntsync_obj *sem)
 	}
 }
 
+static void try_wake_any_mutex(struct ntsync_obj *mutex)
+{
+	struct ntsync_q_entry *entry;
+
+	lockdep_assert_held(&mutex->lock);
+
+	list_for_each_entry(entry, &mutex->any_waiters, node) {
+		struct ntsync_q *q = entry->q;
+
+		if (mutex->u.mutex.count == UINT_MAX)
+			break;
+		if (mutex->u.mutex.owner && mutex->u.mutex.owner != q->owner)
+			continue;
+
+		if (atomic_cmpxchg(&q->signaled, -1, entry->index) == -1) {
+			mutex->u.mutex.count++;
+			mutex->u.mutex.owner = q->owner;
+			wake_up_process(q->task);
+		}
+	}
+}
+
 static int ntsync_create_sem(struct ntsync_device *dev, void __user *argp)
 {
 	struct ntsync_sem_args __user *user_args = argp;
@@ -303,6 +338,38 @@ static int ntsync_create_sem(struct ntsync_device *dev, void __user *argp)
 	return put_user(id, &user_args->sem);
 }
 
+static int ntsync_create_mutex(struct ntsync_device *dev, void __user *argp)
+{
+	struct ntsync_mutex_args __user *user_args = argp;
+	struct ntsync_mutex_args args;
+	struct ntsync_obj *mutex;
+	__u32 id;
+	int ret;
+
+	if (copy_from_user(&args, argp, sizeof(args)))
+		return -EFAULT;
+
+	if (!args.owner != !args.count)
+		return -EINVAL;
+
+	mutex = kzalloc(sizeof(*mutex), GFP_KERNEL);
+	if (!mutex)
+		return -ENOMEM;
+
+	init_obj(mutex);
+	mutex->type = NTSYNC_TYPE_MUTEX;
+	mutex->u.mutex.count = args.count;
+	mutex->u.mutex.owner = args.owner;
+
+	ret = xa_alloc(&dev->objects, &id, mutex, xa_limit_32b, GFP_KERNEL);
+	if (ret < 0) {
+		kfree(mutex);
+		return ret;
+	}
+
+	return put_user(id, &user_args->mutex);
+}
+
 static int ntsync_delete(struct ntsync_device *dev, void __user *argp)
 {
 	struct ntsync_obj *obj;
@@ -497,6 +564,9 @@ static void try_wake_any_obj(struct ntsync_obj *obj)
 	case NTSYNC_TYPE_SEM:
 		try_wake_any_sem(obj);
 		break;
+	case NTSYNC_TYPE_MUTEX:
+		try_wake_any_mutex(obj);
+		break;
 	}
 }
 
@@ -662,6 +732,8 @@ static long ntsync_char_ioctl(struct file *file, unsigned int cmd,
 	void __user *argp = (void __user *)parm;
 
 	switch (cmd) {
+	case NTSYNC_IOC_CREATE_MUTEX:
+		return ntsync_create_mutex(dev, argp);
 	case NTSYNC_IOC_CREATE_SEM:
 		return ntsync_create_sem(dev, argp);
 	case NTSYNC_IOC_DELETE:
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index a5bed5a39b21..26d1b3d4847f 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -16,6 +16,12 @@ struct ntsync_sem_args {
 	__u32 max;
 };
 
+struct ntsync_mutex_args {
+	__u32 mutex;
+	__u32 owner;
+	__u32 count;
+};
+
 struct ntsync_wait_args {
 	__u64 timeout;
 	__u64 objs;
@@ -38,5 +44,7 @@ struct ntsync_wait_args {
 					      struct ntsync_wait_args)
 #define NTSYNC_IOC_WAIT_ALL		_IOWR(NTSYNC_IOC_BASE, 4, \
 					      struct ntsync_wait_args)
+#define NTSYNC_IOC_CREATE_MUTEX		_IOWR(NTSYNC_IOC_BASE, 5, \
+					      struct ntsync_mutex_args)
 
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 8/9] ntsync: Introduce NTSYNC_IOC_PUT_MUTEX.
  2024-01-24  0:40 [RFC PATCH 0/9] NT synchronization primitive driver Elizabeth Figura
                   ` (6 preceding siblings ...)
  2024-01-24  0:40 ` [RFC PATCH 7/9] ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX Elizabeth Figura
@ 2024-01-24  0:40 ` Elizabeth Figura
  2024-01-24  7:42   ` Arnd Bergmann
  2024-01-24  0:40 ` [RFC PATCH 9/9] ntsync: Introduce NTSYNC_IOC_KILL_OWNER Elizabeth Figura
  2024-01-24  0:59 ` [RFC PATCH 0/9] NT synchronization primitive driver Greg Kroah-Hartman
  9 siblings, 1 reply; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24  0:40 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra, Elizabeth Figura

This corresponds to the NT syscall NtReleaseMutant().

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
---
 drivers/misc/ntsync.c       | 67 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/ntsync.h |  2 ++
 2 files changed, 69 insertions(+)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index d48f2ef41341..28f43768d1c3 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -449,6 +449,71 @@ static int ntsync_put_sem(struct ntsync_device *dev, void __user *argp)
 	return ret;
 }
 
+/*
+ * Actually change the mutex state, returning -EPERM if not the owner.
+ */
+static int put_mutex_state(struct ntsync_obj *mutex,
+			   const struct ntsync_mutex_args *args)
+{
+	lockdep_assert_held(&mutex->lock);
+
+	if (mutex->u.mutex.owner != args->owner)
+		return -EPERM;
+
+	if (!--mutex->u.mutex.count)
+		mutex->u.mutex.owner = 0;
+	return 0;
+}
+
+static int ntsync_put_mutex(struct ntsync_device *dev, void __user *argp)
+{
+	struct ntsync_mutex_args __user *user_args = argp;
+	struct ntsync_mutex_args args;
+	struct ntsync_obj *mutex;
+	__u32 prev_count;
+	int ret;
+
+	if (copy_from_user(&args, argp, sizeof(args)))
+		return -EFAULT;
+	if (!args.owner)
+		return -EINVAL;
+
+	mutex = get_obj_typed(dev, args.mutex, NTSYNC_TYPE_MUTEX);
+	if (!mutex)
+		return -EINVAL;
+
+	if (atomic_read(&mutex->all_hint) > 0) {
+		spin_lock(&dev->wait_all_lock);
+		spin_lock_nest_lock(&mutex->lock, &dev->wait_all_lock);
+
+		prev_count = mutex->u.mutex.count;
+		ret = put_mutex_state(mutex, &args);
+		if (!ret) {
+			try_wake_all_obj(dev, mutex);
+			try_wake_any_mutex(mutex);
+		}
+
+		spin_unlock(&mutex->lock);
+		spin_unlock(&dev->wait_all_lock);
+	} else {
+		spin_lock(&mutex->lock);
+
+		prev_count = mutex->u.mutex.count;
+		ret = put_mutex_state(mutex, &args);
+		if (!ret)
+			try_wake_any_mutex(mutex);
+
+		spin_unlock(&mutex->lock);
+	}
+
+	put_obj(mutex);
+
+	if (!ret && put_user(prev_count, &user_args->count))
+		ret = -EFAULT;
+
+	return ret;
+}
+
 static int ntsync_schedule(const struct ntsync_q *q, ktime_t *timeout)
 {
 	int ret = 0;
@@ -738,6 +803,8 @@ static long ntsync_char_ioctl(struct file *file, unsigned int cmd,
 		return ntsync_create_sem(dev, argp);
 	case NTSYNC_IOC_DELETE:
 		return ntsync_delete(dev, argp);
+	case NTSYNC_IOC_PUT_MUTEX:
+		return ntsync_put_mutex(dev, argp);
 	case NTSYNC_IOC_PUT_SEM:
 		return ntsync_put_sem(dev, argp);
 	case NTSYNC_IOC_WAIT_ALL:
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index 26d1b3d4847f..2e44e7e77776 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -46,5 +46,7 @@ struct ntsync_wait_args {
 					      struct ntsync_wait_args)
 #define NTSYNC_IOC_CREATE_MUTEX		_IOWR(NTSYNC_IOC_BASE, 5, \
 					      struct ntsync_mutex_args)
+#define NTSYNC_IOC_PUT_MUTEX		_IOWR(NTSYNC_IOC_BASE, 6, \
+					      struct ntsync_mutex_args)
 
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 9/9] ntsync: Introduce NTSYNC_IOC_KILL_OWNER.
  2024-01-24  0:40 [RFC PATCH 0/9] NT synchronization primitive driver Elizabeth Figura
                   ` (7 preceding siblings ...)
  2024-01-24  0:40 ` [RFC PATCH 8/9] ntsync: Introduce NTSYNC_IOC_PUT_MUTEX Elizabeth Figura
@ 2024-01-24  0:40 ` Elizabeth Figura
  2024-01-24  0:59 ` [RFC PATCH 0/9] NT synchronization primitive driver Greg Kroah-Hartman
  9 siblings, 0 replies; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24  0:40 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra, Elizabeth Figura

This does not correspond to any NT syscall, but rather should be called by the
user-space NT emulator when a thread dies. It is responsible for marking any
mutexes owned by that thread as abandoned.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
---
 drivers/misc/ntsync.c       | 80 ++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/ntsync.h |  1 +
 2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index 28f43768d1c3..1173c750c106 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -64,6 +64,7 @@ struct ntsync_obj {
 		struct {
 			__u32 count;
 			__u32 owner;
+			bool ownerdead;
 		} mutex;
 	} u;
 };
@@ -87,6 +88,7 @@ struct ntsync_q {
 	atomic_t signaled;
 
 	bool all;
+	bool ownerdead;
 	__u32 count;
 	struct ntsync_q_entry entries[];
 };
@@ -240,6 +242,9 @@ static void try_wake_all(struct ntsync_device *dev, struct ntsync_q *q,
 				obj->u.sem.count--;
 				break;
 			case NTSYNC_TYPE_MUTEX:
+				if (obj->u.mutex.ownerdead)
+					q->ownerdead = true;
+				obj->u.mutex.ownerdead = false;
 				obj->u.mutex.count++;
 				obj->u.mutex.owner = q->owner;
 				break;
@@ -299,6 +304,9 @@ static void try_wake_any_mutex(struct ntsync_obj *mutex)
 			continue;
 
 		if (atomic_cmpxchg(&q->signaled, -1, entry->index) == -1) {
+			if (mutex->u.mutex.ownerdead)
+				q->ownerdead = true;
+			mutex->u.mutex.ownerdead = false;
 			mutex->u.mutex.count++;
 			mutex->u.mutex.owner = q->owner;
 			wake_up_process(q->task);
@@ -514,6 +522,71 @@ static int ntsync_put_mutex(struct ntsync_device *dev, void __user *argp)
 	return ret;
 }
 
+/*
+ * Actually change the mutex state to mark its owner as dead.
+ */
+static void put_mutex_ownerdead_state(struct ntsync_obj *mutex)
+{
+	lockdep_assert_held(&mutex->lock);
+
+	mutex->u.mutex.ownerdead = true;
+	mutex->u.mutex.owner = 0;
+	mutex->u.mutex.count = 0;
+}
+
+static int ntsync_kill_owner(struct ntsync_device *dev, void __user *argp)
+{
+	struct ntsync_obj *obj;
+	unsigned long id;
+	__u32 owner;
+
+	if (get_user(owner, (__u32 __user *)argp))
+		return -EFAULT;
+	if (!owner)
+		return -EINVAL;
+
+	rcu_read_lock();
+
+	xa_for_each(&dev->objects, id, obj) {
+		if (!kref_get_unless_zero(&obj->refcount))
+			continue;
+
+		if (obj->type != NTSYNC_TYPE_MUTEX) {
+			put_obj(obj);
+			continue;
+		}
+
+		if (atomic_read(&obj->all_hint) > 0) {
+			spin_lock(&dev->wait_all_lock);
+			spin_lock_nest_lock(&obj->lock, &dev->wait_all_lock);
+
+			if (obj->u.mutex.owner == owner) {
+				put_mutex_ownerdead_state(obj);
+				try_wake_all_obj(dev, obj);
+				try_wake_any_mutex(obj);
+			}
+
+			spin_unlock(&obj->lock);
+			spin_unlock(&dev->wait_all_lock);
+		} else {
+			spin_lock(&obj->lock);
+
+			if (obj->u.mutex.owner == owner) {
+				put_mutex_ownerdead_state(obj);
+				try_wake_any_mutex(obj);
+			}
+
+			spin_unlock(&obj->lock);
+		}
+
+		put_obj(obj);
+	}
+
+	rcu_read_unlock();
+
+	return 0;
+}
+
 static int ntsync_schedule(const struct ntsync_q *q, ktime_t *timeout)
 {
 	int ret = 0;
@@ -585,6 +658,7 @@ static int setup_wait(struct ntsync_device *dev,
 	q->owner = args->owner;
 	atomic_set(&q->signaled, -1);
 	q->all = all;
+	q->ownerdead = false;
 	q->count = count;
 
 	for (i = 0; i < count; i++) {
@@ -697,7 +771,7 @@ static int ntsync_wait_any(struct ntsync_device *dev, void __user *argp)
 		struct ntsync_wait_args __user *user_args = argp;
 
 		/* even if we caught a signal, we need to communicate success */
-		ret = 0;
+		ret = q->ownerdead ? -EOWNERDEAD : 0;
 
 		if (put_user(signaled, &user_args->index))
 			ret = -EFAULT;
@@ -778,7 +852,7 @@ static int ntsync_wait_all(struct ntsync_device *dev, void __user *argp)
 		struct ntsync_wait_args __user *user_args = argp;
 
 		/* even if we caught a signal, we need to communicate success */
-		ret = 0;
+		ret = q->ownerdead ? -EOWNERDEAD : 0;
 
 		if (put_user(signaled, &user_args->index))
 			ret = -EFAULT;
@@ -803,6 +877,8 @@ static long ntsync_char_ioctl(struct file *file, unsigned int cmd,
 		return ntsync_create_sem(dev, argp);
 	case NTSYNC_IOC_DELETE:
 		return ntsync_delete(dev, argp);
+	case NTSYNC_IOC_KILL_OWNER:
+		return ntsync_kill_owner(dev, argp);
 	case NTSYNC_IOC_PUT_MUTEX:
 		return ntsync_put_mutex(dev, argp);
 	case NTSYNC_IOC_PUT_SEM:
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index 2e44e7e77776..fec9a3993322 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -48,5 +48,6 @@ struct ntsync_wait_args {
 					      struct ntsync_mutex_args)
 #define NTSYNC_IOC_PUT_MUTEX		_IOWR(NTSYNC_IOC_BASE, 6, \
 					      struct ntsync_mutex_args)
+#define NTSYNC_IOC_KILL_OWNER		_IOW (NTSYNC_IOC_BASE, 7, __u32)
 
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 2/9] ntsync: Reserve a minor device number and ioctl range.
  2024-01-24  0:40 ` [RFC PATCH 2/9] ntsync: Reserve a minor device number and ioctl range Elizabeth Figura
@ 2024-01-24  0:54   ` Greg Kroah-Hartman
  2024-01-24  3:43     ` Elizabeth Figura
  0 siblings, 1 reply; 39+ messages in thread
From: Greg Kroah-Hartman @ 2024-01-24  0:54 UTC (permalink / raw)
  To: Elizabeth Figura
  Cc: Arnd Bergmann, linux-kernel, linux-api, wine-devel,
	André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Tue, Jan 23, 2024 at 06:40:21PM -0600, Elizabeth Figura wrote:
> Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
> ---

Note, we can't take patches without any changelog text, and you don't
want us to :)

>  Documentation/admin-guide/devices.txt              | 3 ++-
>  Documentation/userspace-api/ioctl/ioctl-number.rst | 2 ++
>  drivers/misc/ntsync.c                              | 3 ++-
>  include/linux/miscdevice.h                         | 1 +
>  4 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/devices.txt b/Documentation/admin-guide/devices.txt
> index 94c98be1329a..041404397ee5 100644
> --- a/Documentation/admin-guide/devices.txt
> +++ b/Documentation/admin-guide/devices.txt
> @@ -376,8 +376,9 @@
>  		240 = /dev/userio	Serio driver testing device
>  		241 = /dev/vhost-vsock	Host kernel driver for virtio vsock
>  		242 = /dev/rfkill	Turning off radio transmissions (rfkill)
> +		243 = /dev/ntsync	NT synchronization primitive device
>  
> -		243-254			Reserved for local use
> +		244-254			Reserved for local use

Why do you need a fixed minor number?  Can't your userspace handle
dynamic numbers?  What systems require a static value?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 0/9] NT synchronization primitive driver
  2024-01-24  0:40 [RFC PATCH 0/9] NT synchronization primitive driver Elizabeth Figura
                   ` (8 preceding siblings ...)
  2024-01-24  0:40 ` [RFC PATCH 9/9] ntsync: Introduce NTSYNC_IOC_KILL_OWNER Elizabeth Figura
@ 2024-01-24  0:59 ` Greg Kroah-Hartman
  2024-01-24  1:37   ` Elizabeth Figura
  9 siblings, 1 reply; 39+ messages in thread
From: Greg Kroah-Hartman @ 2024-01-24  0:59 UTC (permalink / raw)
  To: Elizabeth Figura
  Cc: Arnd Bergmann, linux-kernel, linux-api, wine-devel,
	André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Tue, Jan 23, 2024 at 06:40:19PM -0600, Elizabeth Figura wrote:
> == Patches ==
> 
> This is the first part of a 32-patch series. The series comprises 17 patches
> which contain the actual implementation, 13 which provide self-tests, 1 to
> update the MAINTAINERS file, and 1 to add API documentation.

32 patches?  I only see 9 here, why not submit them all?

> The intended semantics of the patches are broadly intended to match those of the
> corresponding Windows functions. Since I do not expect familiarity with Windows
> syscalls, however, and especially not with some of the more subtle or
> unspecified behaviour that they provide, the documentation patch included in the
> series also describes the intended behaviour in detail, and can be used as a
> specification for the rest of the series.
> 
> The entire series can be retrieved or browsed here:
> 
>     https://repo.or.cz/linux/zf.git/shortlog/refs/heads/ntsync4

No one is going to dig elsewhere for kernel changes, sorry.  Please
submit them in email for review, that's the only way we can look at them
and comment.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 3/9] ntsync: Introduce NTSYNC_IOC_CREATE_SEM and NTSYNC_IOC_DELETE.
  2024-01-24  0:40 ` [RFC PATCH 3/9] ntsync: Introduce NTSYNC_IOC_CREATE_SEM and NTSYNC_IOC_DELETE Elizabeth Figura
@ 2024-01-24  1:14   ` Greg Kroah-Hartman
  2024-01-24  3:35     ` Elizabeth Figura
  0 siblings, 1 reply; 39+ messages in thread
From: Greg Kroah-Hartman @ 2024-01-24  1:14 UTC (permalink / raw)
  To: Elizabeth Figura
  Cc: Arnd Bergmann, linux-kernel, linux-api, wine-devel,
	André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Tue, Jan 23, 2024 at 06:40:22PM -0600, Elizabeth Figura wrote:
> +static int ntsync_create_sem(struct ntsync_device *dev, void __user *argp)
> +{
> +	struct ntsync_sem_args __user *user_args = argp;
> +	struct ntsync_sem_args args;
> +	struct ntsync_obj *sem;
> +	__u32 id;
> +	int ret;
> +
> +	if (copy_from_user(&args, argp, sizeof(args)))
> +		return -EFAULT;
> +
> +	if (args.count > args.max)
> +		return -EINVAL;

No bounds checking on count or max?

What's the relationship between count and max?  Some sort of real
documentation is needed here, the changelog needs to explain this.  Or
somewhere, but as-is, this patch series is pretty unreviewable as I
can't figure out how to review it because I don't know what it wants to
do.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 0/9] NT synchronization primitive driver
  2024-01-24  0:59 ` [RFC PATCH 0/9] NT synchronization primitive driver Greg Kroah-Hartman
@ 2024-01-24  1:37   ` Elizabeth Figura
  2024-01-24 12:29     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24  1:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Arnd Bergmann, linux-kernel, linux-api, wine-devel,
	André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Tuesday, 23 January 2024 18:59:35 CST Greg Kroah-Hartman wrote:
> On Tue, Jan 23, 2024 at 06:40:19PM -0600, Elizabeth Figura wrote:
> > == Patches ==
> > 
> > This is the first part of a 32-patch series. The series comprises 17 patches
> > which contain the actual implementation, 13 which provide self-tests, 1 to
> > update the MAINTAINERS file, and 1 to add API documentation.
> 
> 32 patches?  I only see 9 here, why not submit them all?

Because Documentation/process/submitting-patches.rst makes a point of asking people not to submit large patch series (and it matches the expectation of other projects I've worked with—that patches would be submitted and reviewed a few at a time). I suppose I've misunderstood that advice, though.

I'll resend with the entire series. Sorry for the noise.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 3/9] ntsync: Introduce NTSYNC_IOC_CREATE_SEM and NTSYNC_IOC_DELETE.
  2024-01-24  1:14   ` Greg Kroah-Hartman
@ 2024-01-24  3:35     ` Elizabeth Figura
  0 siblings, 0 replies; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24  3:35 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Arnd Bergmann, linux-kernel, linux-api, wine-devel,
	André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Tuesday, 23 January 2024 19:14:17 CST Greg Kroah-Hartman wrote:
> On Tue, Jan 23, 2024 at 06:40:22PM -0600, Elizabeth Figura wrote:
> > +static int ntsync_create_sem(struct ntsync_device *dev, void __user
> > *argp)
> > +{
> > +	struct ntsync_sem_args __user *user_args = argp;
> > +	struct ntsync_sem_args args;
> > +	struct ntsync_obj *sem;
> > +	__u32 id;
> > +	int ret;
> > +
> > +	if (copy_from_user(&args, argp, sizeof(args)))
> > +		return -EFAULT;
> > +
> > +	if (args.count > args.max)
> > +		return -EINVAL;
> 
> No bounds checking on count or max?
> 
> What's the relationship between count and max?  

Indeed, no bounds checking. The counter is just the semaphore's internal value 
and has no meaning other than that.

It's basically like an EFD_SEMAPHORE, except that the maximum is configurable 
rather than always being 2**64-2.

> Some sort of real
> documentation is needed here, the changelog needs to explain this.  Or
> somewhere, but as-is, this patch series is pretty unreviewable as I
> can't figure out how to review it because I don't know what it wants to
> do.

There is some comprehensive documentation in the series, but for ease of 
review I will try to write a basic description of the API in each relevant 
patch in v2.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 2/9] ntsync: Reserve a minor device number and ioctl range.
  2024-01-24  0:54   ` Greg Kroah-Hartman
@ 2024-01-24  3:43     ` Elizabeth Figura
  2024-01-24 12:32       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24  3:43 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Arnd Bergmann, linux-kernel, linux-api, wine-devel,
	André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Tuesday, 23 January 2024 18:54:02 CST Greg Kroah-Hartman wrote:
> On Tue, Jan 23, 2024 at 06:40:21PM -0600, Elizabeth Figura wrote:
> > Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
> > ---
> 
> Note, we can't take patches without any changelog text, and you don't
> want us to :)
> 
> >  Documentation/admin-guide/devices.txt              | 3 ++-
> >  Documentation/userspace-api/ioctl/ioctl-number.rst | 2 ++
> >  drivers/misc/ntsync.c                              | 3 ++-
> >  include/linux/miscdevice.h                         | 1 +
> >  4 files changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/devices.txt
> > b/Documentation/admin-guide/devices.txt index 94c98be1329a..041404397ee5
> > 100644
> > --- a/Documentation/admin-guide/devices.txt
> > +++ b/Documentation/admin-guide/devices.txt
> > @@ -376,8 +376,9 @@
> > 
> >  		240 = /dev/userio	Serio driver testing device
> >  		241 = /dev/vhost-vsock	Host kernel driver for virtio 
vsock
> >  		242 = /dev/rfkill	Turning off radio transmissions 
(rfkill)
> > 
> > +		243 = /dev/ntsync	NT synchronization primitive 
device
> > 
> > -		243-254			Reserved for local use
> > +		244-254			Reserved for local use
> 
> Why do you need a fixed minor number?  Can't your userspace handle
> dynamic numbers?  What systems require a static value?

I believe I added this because it's necessary for MODULE_ALIAS (and, more 
broadly, because I was following the example of vaguely comparable devices 
like /dev/loop-control). I suppose I could instead just remove MODULE_ALIAS 
(or even remove the ability to compile ntsync as a module entirely).

It's a bit difficult to figure out what's the preferred way to organize things 
like this (there not being a lot of precedent for this kind of driver) so I'd 
appreciate any direction.

--Zeb



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device.
  2024-01-24  0:40 ` [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device Elizabeth Figura
@ 2024-01-24  7:38   ` Arnd Bergmann
  2024-01-24 17:51     ` Elizabeth Figura
  2024-01-24 21:26   ` Andy Lutomirski
  1 sibling, 1 reply; 39+ messages in thread
From: Arnd Bergmann @ 2024-01-24  7:38 UTC (permalink / raw)
  To: Elizabeth Figura, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Wed, Jan 24, 2024, at 01:40, Elizabeth Figura wrote:
> ntsync uses a misc device as the simplest and least intrusive uAPI interface.
>
> Each file description on the device represents an isolated NT instance, intended
> to correspond to a single NT virtual machine.
>
> Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>

I'm looking at the ioctl interface to ensure it's well-formed.

Your patches look ok from that perspective, but there are a
few minor things I would check for consistency here:

> +
> +static const struct file_operations ntsync_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= ntsync_char_open,
> +	.release	= ntsync_char_release,
> +	.unlocked_ioctl	= ntsync_char_ioctl,
> +	.compat_ioctl	= ntsync_char_ioctl,
> +	.llseek		= no_llseek,
> +};

The .compat_ioctl pointer should point to compat_ptr_ioctl()
since the actual ioctl commands all take pointers instead
of interpreting the argument as a number.

On x86 and arm64 this won't make a difference as compat_ptr()
is a nop.

     Arnd

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 8/9] ntsync: Introduce NTSYNC_IOC_PUT_MUTEX.
  2024-01-24  0:40 ` [RFC PATCH 8/9] ntsync: Introduce NTSYNC_IOC_PUT_MUTEX Elizabeth Figura
@ 2024-01-24  7:42   ` Arnd Bergmann
  2024-01-24 18:03     ` Elizabeth Figura
  0 siblings, 1 reply; 39+ messages in thread
From: Arnd Bergmann @ 2024-01-24  7:42 UTC (permalink / raw)
  To: Elizabeth Figura, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Wed, Jan 24, 2024, at 01:40, Elizabeth Figura wrote:
> @@ -738,6 +803,8 @@ static long ntsync_char_ioctl(struct file *file, 
> diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
> index 26d1b3d4847f..2e44e7e77776 100644
> --- a/include/uapi/linux/ntsync.h
> +++ b/include/uapi/linux/ntsync.h
> @@ -46,5 +46,7 @@ struct ntsync_wait_args {
>  					      struct ntsync_wait_args)
>  #define NTSYNC_IOC_CREATE_MUTEX		_IOWR(NTSYNC_IOC_BASE, 5, \
>  					      struct ntsync_mutex_args)
> +#define NTSYNC_IOC_PUT_MUTEX		_IOWR(NTSYNC_IOC_BASE, 6, \
> +					      struct ntsync_mutex_args)
> 

In your implementation, this argument is not written back to
user space, so I think this should be _IOW rather than than _IORW.

Again, no practical difference here.

     Arnd

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 5/9] ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
  2024-01-24  0:40 ` [RFC PATCH 5/9] ntsync: Introduce NTSYNC_IOC_WAIT_ANY Elizabeth Figura
@ 2024-01-24  7:56   ` Arnd Bergmann
  2024-01-24 18:02     ` Elizabeth Figura
  0 siblings, 1 reply; 39+ messages in thread
From: Arnd Bergmann @ 2024-01-24  7:56 UTC (permalink / raw)
  To: Elizabeth Figura, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Wed, Jan 24, 2024, at 01:40, Elizabeth Figura wrote:

> +	if (args->timeout) {
> +		struct timespec64 to;
> +
> +		if (get_timespec64(&to, u64_to_user_ptr(args->timeout)))
> +			return -EFAULT;
> +		if (!timespec64_valid(&to))
> +			return -EINVAL;
> +
> +		timeout = timespec64_to_ns(&to);
> +	}

Have you considered just passing the nanosecond value here?
Since you do not appear to write it back, that would avoid
the complexities of dealing with timespec layout differences
and indirection.

> +	ids = kmalloc_array(count, sizeof(*ids), GFP_KERNEL);
> +	if (!ids)
> +		return -ENOMEM;
> +	if (copy_from_user(ids, u64_to_user_ptr(args->objs),
> +			   array_size(count, sizeof(*ids)))) {
> +		kfree(ids);
> +		return -EFAULT;
> +	}

This looks like memdup_user() would be slightly simpler.

      Arnd

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 0/9] NT synchronization primitive driver
  2024-01-24  1:37   ` Elizabeth Figura
@ 2024-01-24 12:29     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 39+ messages in thread
From: Greg Kroah-Hartman @ 2024-01-24 12:29 UTC (permalink / raw)
  To: Elizabeth Figura
  Cc: Arnd Bergmann, linux-kernel, linux-api, wine-devel,
	André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Tue, Jan 23, 2024 at 07:37:01PM -0600, Elizabeth Figura wrote:
> On Tuesday, 23 January 2024 18:59:35 CST Greg Kroah-Hartman wrote:
> > On Tue, Jan 23, 2024 at 06:40:19PM -0600, Elizabeth Figura wrote:
> > > == Patches ==
> > > 
> > > This is the first part of a 32-patch series. The series comprises 17 patches
> > > which contain the actual implementation, 13 which provide self-tests, 1 to
> > > update the MAINTAINERS file, and 1 to add API documentation.
> > 
> > 32 patches?  I only see 9 here, why not submit them all?
> 
> Because Documentation/process/submitting-patches.rst makes a point of asking people not to submit large patch series (and it matches the expectation of other projects I've worked with—that patches would be submitted and reviewed a few at a time). I suppose I've misunderstood that advice, though.

32 patches isn't all that "large", we can handle that easily :)

100+ patches is large, I guess it all depends, so I can understand the
confusion.  You need to send us enough for us to be able to understand
and review the code, this was a bit short for that.

> I'll resend with the entire series. Sorry for the noise.

Ptches are NOT noise, we want to see them!

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 2/9] ntsync: Reserve a minor device number and ioctl range.
  2024-01-24  3:43     ` Elizabeth Figura
@ 2024-01-24 12:32       ` Greg Kroah-Hartman
  2024-01-24 17:59         ` Elizabeth Figura
  0 siblings, 1 reply; 39+ messages in thread
From: Greg Kroah-Hartman @ 2024-01-24 12:32 UTC (permalink / raw)
  To: Elizabeth Figura
  Cc: Arnd Bergmann, linux-kernel, linux-api, wine-devel,
	André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Tue, Jan 23, 2024 at 09:43:09PM -0600, Elizabeth Figura wrote:
> > Why do you need a fixed minor number?  Can't your userspace handle
> > dynamic numbers?  What systems require a static value?
> 
> I believe I added this because it's necessary for MODULE_ALIAS (and, more 
> broadly, because I was following the example of vaguely comparable devices 
> like /dev/loop-control). I suppose I could instead just remove MODULE_ALIAS 
> (or even remove the ability to compile ntsync as a module entirely).

Do you really need MODULE_ALIAS()?  Having it for char devices to be
auto-loaded is not generally considered a good idea anymore as systems
should have the module loaded before userspace goes and asks for it.

It also reduces suprises when any random userspace program can cause
kernel modules to be loaded for no real reason.

> It's a bit difficult to figure out what's the preferred way to organize things 
> like this (there not being a lot of precedent for this kind of driver) so I'd 
> appreciate any direction.

For now, I'd just stick to a dynamic id, no module alias, and if it's
ever needed in the future, it can be added.  But if you add it now, we
can't ever remove it as it's user-visible functionality.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device.
  2024-01-24  7:38   ` Arnd Bergmann
@ 2024-01-24 17:51     ` Elizabeth Figura
  0 siblings, 0 replies; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24 17:51 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, linux-api, Arnd Bergmann
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Wednesday, 24 January 2024 01:38:52 CST Arnd Bergmann wrote:
> On Wed, Jan 24, 2024, at 01:40, Elizabeth Figura wrote:
> > ntsync uses a misc device as the simplest and least intrusive uAPI interface.
> >
> > Each file description on the device represents an isolated NT instance, intended
> > to correspond to a single NT virtual machine.
> >
> > Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
> 
> I'm looking at the ioctl interface to ensure it's well-formed.
> 
> Your patches look ok from that perspective, but there are a
> few minor things I would check for consistency here:
> 
> > +
> > +static const struct file_operations ntsync_fops = {
> > +	.owner		= THIS_MODULE,
> > +	.open		= ntsync_char_open,
> > +	.release	= ntsync_char_release,
> > +	.unlocked_ioctl	= ntsync_char_ioctl,
> > +	.compat_ioctl	= ntsync_char_ioctl,
> > +	.llseek		= no_llseek,
> > +};
> 
> The .compat_ioctl pointer should point to compat_ptr_ioctl()
> since the actual ioctl commands all take pointers instead
> of interpreting the argument as a number.
> 
> On x86 and arm64 this won't make a difference as compat_ptr()
> is a nop.

Thanks; will fix.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 2/9] ntsync: Reserve a minor device number and ioctl range.
  2024-01-24 12:32       ` Greg Kroah-Hartman
@ 2024-01-24 17:59         ` Elizabeth Figura
  0 siblings, 0 replies; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24 17:59 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Arnd Bergmann, linux-kernel, linux-api, wine-devel,
	André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Wednesday, 24 January 2024 06:32:13 CST Greg Kroah-Hartman wrote:
> On Tue, Jan 23, 2024 at 09:43:09PM -0600, Elizabeth Figura wrote:
> > > Why do you need a fixed minor number?  Can't your userspace handle
> > > dynamic numbers?  What systems require a static value?
> > 
> > I believe I added this because it's necessary for MODULE_ALIAS (and, more 
> > broadly, because I was following the example of vaguely comparable devices 
> > like /dev/loop-control). I suppose I could instead just remove MODULE_ALIAS 
> > (or even remove the ability to compile ntsync as a module entirely).
> 
> Do you really need MODULE_ALIAS()?  Having it for char devices to be
> auto-loaded is not generally considered a good idea anymore as systems
> should have the module loaded before userspace goes and asks for it.
> 
> It also reduces suprises when any random userspace program can cause
> kernel modules to be loaded for no real reason.

I think there's no reason we can't make loading the system's problem. I'll remove it in v2.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 5/9] ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
  2024-01-24  7:56   ` Arnd Bergmann
@ 2024-01-24 18:02     ` Elizabeth Figura
  2024-01-24 19:52       ` Arnd Bergmann
  0 siblings, 1 reply; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24 18:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, linux-api, Arnd Bergmann
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Wednesday, 24 January 2024 01:56:52 CST Arnd Bergmann wrote:
> On Wed, Jan 24, 2024, at 01:40, Elizabeth Figura wrote:
> 
> > +	if (args->timeout) {
> > +		struct timespec64 to;
> > +
> > +		if (get_timespec64(&to, u64_to_user_ptr(args->timeout)))
> > +			return -EFAULT;
> > +		if (!timespec64_valid(&to))
> > +			return -EINVAL;
> > +
> > +		timeout = timespec64_to_ns(&to);
> > +	}
> 
> Have you considered just passing the nanosecond value here?
> Since you do not appear to write it back, that would avoid
> the complexities of dealing with timespec layout differences
> and indirection.

That'd be nicer in general. I think there was some documentation that advised
using timespec64 for new ioctl interfaces but it may have been outdated or
misread.

> 
> > +	ids = kmalloc_array(count, sizeof(*ids), GFP_KERNEL);
> > +	if (!ids)
> > +		return -ENOMEM;
> > +	if (copy_from_user(ids, u64_to_user_ptr(args->objs),
> > +			   array_size(count, sizeof(*ids)))) {
> > +		kfree(ids);
> > +		return -EFAULT;
> > +	}
> 
> This looks like memdup_user() would be slightly simpler.

That's useful, thanks.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 8/9] ntsync: Introduce NTSYNC_IOC_PUT_MUTEX.
  2024-01-24  7:42   ` Arnd Bergmann
@ 2024-01-24 18:03     ` Elizabeth Figura
  2024-01-24 19:53       ` Arnd Bergmann
  0 siblings, 1 reply; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24 18:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, linux-api, Arnd Bergmann
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Wednesday, 24 January 2024 01:42:19 CST Arnd Bergmann wrote:
> On Wed, Jan 24, 2024, at 01:40, Elizabeth Figura wrote:
> > @@ -738,6 +803,8 @@ static long ntsync_char_ioctl(struct file *file, 
> > diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
> > index 26d1b3d4847f..2e44e7e77776 100644
> > --- a/include/uapi/linux/ntsync.h
> > +++ b/include/uapi/linux/ntsync.h
> > @@ -46,5 +46,7 @@ struct ntsync_wait_args {
> >  					      struct ntsync_wait_args)
> >  #define NTSYNC_IOC_CREATE_MUTEX		_IOWR(NTSYNC_IOC_BASE, 5, \
> >  					      struct ntsync_mutex_args)
> > +#define NTSYNC_IOC_PUT_MUTEX		_IOWR(NTSYNC_IOC_BASE, 6, \
> > +					      struct ntsync_mutex_args)
> > 
> 
> In your implementation, this argument is not written back to
> user space, so I think this should be _IOW rather than than _IORW.
> 
> Again, no practical difference here.

Hm, but there is a put_user() at the end of the function, or am I missing something?



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 5/9] ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
  2024-01-24 18:02     ` Elizabeth Figura
@ 2024-01-24 19:52       ` Arnd Bergmann
  2024-01-24 22:28         ` Elizabeth Figura
  0 siblings, 1 reply; 39+ messages in thread
From: Arnd Bergmann @ 2024-01-24 19:52 UTC (permalink / raw)
  To: Elizabeth Figura, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Wed, Jan 24, 2024, at 19:02, Elizabeth Figura wrote:
> On Wednesday, 24 January 2024 01:56:52 CST Arnd Bergmann wrote:
>> On Wed, Jan 24, 2024, at 01:40, Elizabeth Figura wrote:
>> 
>> > +	if (args->timeout) {
>> > +		struct timespec64 to;
>> > +
>> > +		if (get_timespec64(&to, u64_to_user_ptr(args->timeout)))
>> > +			return -EFAULT;
>> > +		if (!timespec64_valid(&to))
>> > +			return -EINVAL;
>> > +
>> > +		timeout = timespec64_to_ns(&to);
>> > +	}
>> 
>> Have you considered just passing the nanosecond value here?
>> Since you do not appear to write it back, that would avoid
>> the complexities of dealing with timespec layout differences
>> and indirection.
>
> That'd be nicer in general. I think there was some documentation that advised
> using timespec64 for new ioctl interfaces but it may have been outdated or
> misread.

It's probably something I wrote. It depends a bit on
whether you have an absolute or relative timeout. If
the timeout is relative to the current time as I understand
it is here, a 64-bit number seems more logical to me.

For absolute times, I would usually use a __kernel_timespec,
especially if it's CLOCK_REALTIME. In this case you would
also need to specify the time domain.

      Arnd

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 8/9] ntsync: Introduce NTSYNC_IOC_PUT_MUTEX.
  2024-01-24 18:03     ` Elizabeth Figura
@ 2024-01-24 19:53       ` Arnd Bergmann
  0 siblings, 0 replies; 39+ messages in thread
From: Arnd Bergmann @ 2024-01-24 19:53 UTC (permalink / raw)
  To: Elizabeth Figura, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Wed, Jan 24, 2024, at 19:03, Elizabeth Figura wrote:
> On Wednesday, 24 January 2024 01:42:19 CST Arnd Bergmann wrote:
>> On Wed, Jan 24, 2024, at 01:40, Elizabeth Figura wrote:
>> > @@ -738,6 +803,8 @@ static long ntsync_char_ioctl(struct file *file, 
>> > diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
>> > index 26d1b3d4847f..2e44e7e77776 100644
>> > --- a/include/uapi/linux/ntsync.h
>> > +++ b/include/uapi/linux/ntsync.h
>> > @@ -46,5 +46,7 @@ struct ntsync_wait_args {
>> >  					      struct ntsync_wait_args)
>> >  #define NTSYNC_IOC_CREATE_MUTEX		_IOWR(NTSYNC_IOC_BASE, 5, \
>> >  					      struct ntsync_mutex_args)
>> > +#define NTSYNC_IOC_PUT_MUTEX		_IOWR(NTSYNC_IOC_BASE, 6, \
>> > +					      struct ntsync_mutex_args)
>> > 
>> 
>> In your implementation, this argument is not written back to
>> user space, so I think this should be _IOW rather than than _IORW.
>> 
>> Again, no practical difference here.
>
> Hm, but there is a put_user() at the end of the function, or am I 
> missing something?

No, I was just looking at the wrong thing, your version is good.

     Arnd

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device.
  2024-01-24  0:40 ` [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device Elizabeth Figura
  2024-01-24  7:38   ` Arnd Bergmann
@ 2024-01-24 21:26   ` Andy Lutomirski
  2024-01-24 22:56     ` Elizabeth Figura
  1 sibling, 1 reply; 39+ messages in thread
From: Andy Lutomirski @ 2024-01-24 21:26 UTC (permalink / raw)
  To: Elizabeth Figura
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, linux-api,
	wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Tue, Jan 23, 2024 at 4:59 PM Elizabeth Figura
<zfigura@codeweavers.com> wrote:
>
> ntsync uses a misc device as the simplest and least intrusive uAPI interface.
>
> Each file description on the device represents an isolated NT instance, intended
> to correspond to a single NT virtual machine.

If I understand this text right, and if I understood the code right,
you're saying that each open instance of the device represents an
entire universe of NT synchronization objects, and no security or
isolation is possible between those objects.  For single-process use,
this seems fine.  But fork() will be a bit odd (although NT doesn't
really believe in fork, so maybe this is fine).

Except that NT has *named* semaphores and such.  And I'm pretty sure
I've written GUI programs that use named synchronization objects (IIRC
they were events, and this was a *very* common pattern, regularly
discussed in MSDN, usenet, etc) to detect whether another instance of
the program is running.  And this all works on real Windows because
sessions have sufficiently separated namespaces, and the security all
works out about as any other security on Windows, etc.  But
implementing *that* on top of this
file-description-plus-integer-equals-object will be fundamentally
quite subject to one buggy program completely clobbering someone
else's state.

Would it make sense and scale appropriately for an NT synchronization
*object* to be a Linux open file description?  Then SCM_RIGHTS could
pass them around, an RPC server could manage *named* objects, and
they'd generally work just like other "Object Manager" objects like,
say, files.

--Andy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 5/9] ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
  2024-01-24 19:52       ` Arnd Bergmann
@ 2024-01-24 22:28         ` Elizabeth Figura
  2024-01-25 17:02           ` Arnd Bergmann
  0 siblings, 1 reply; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24 22:28 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, linux-api, Arnd Bergmann
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Wednesday, 24 January 2024 13:52:52 CST Arnd Bergmann wrote:
> On Wed, Jan 24, 2024, at 19:02, Elizabeth Figura wrote:
> > On Wednesday, 24 January 2024 01:56:52 CST Arnd Bergmann wrote:
> >> On Wed, Jan 24, 2024, at 01:40, Elizabeth Figura wrote:
> >> 
> >> > +	if (args->timeout) {
> >> > +		struct timespec64 to;
> >> > +
> >> > +		if (get_timespec64(&to, u64_to_user_ptr(args->timeout)))
> >> > +			return -EFAULT;
> >> > +		if (!timespec64_valid(&to))
> >> > +			return -EINVAL;
> >> > +
> >> > +		timeout = timespec64_to_ns(&to);
> >> > +	}
> >> 
> >> Have you considered just passing the nanosecond value here?
> >> Since you do not appear to write it back, that would avoid
> >> the complexities of dealing with timespec layout differences
> >> and indirection.
> >
> > That'd be nicer in general. I think there was some documentation that advised
> > using timespec64 for new ioctl interfaces but it may have been outdated or
> > misread.
> 
> It's probably something I wrote. It depends a bit on
> whether you have an absolute or relative timeout. If
> the timeout is relative to the current time as I understand
> it is here, a 64-bit number seems more logical to me.
> 
> For absolute times, I would usually use a __kernel_timespec,
> especially if it's CLOCK_REALTIME. In this case you would
> also need to specify the time domain.

Currently the interface does pass it as an absolute time, with the
domain implicitly being MONOTONIC. This particular choice comes from
process/botching-up-ioctls.rst, which is admittedly focused around GPU
ioctls, but the rationale of having easily restartable ioctls applies
here too.

(E.g. Wine does play games with signals, so we do want to be able to
interrupt arbitrary waits with EINTR. The "usual" fast path for ntsync
waits won't hit that, but we want to have it work.)

On the other hand, if we can pass the timeout as relative, and write it
back on exit like ppoll() does [assuming that's not proscribed], that
would presumably be slightly better for performance.

When writing the patch I just picked the recommended option, and didn't
bother doing any micro-optimizations afterward.

What's the rationale for using timespec for absolute or written-back
timeouts, instead of dealing in ns directly? I'm afraid it's not
obvious to me.

--Zeb



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device.
  2024-01-24 21:26   ` Andy Lutomirski
@ 2024-01-24 22:56     ` Elizabeth Figura
  2024-01-25  3:42       ` Elizabeth Figura
  2024-01-25  7:41       ` Alexandre Julliard
  0 siblings, 2 replies; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-24 22:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, linux-api,
	wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra, Alexandre Julliard

On Wednesday, 24 January 2024 15:26:15 CST Andy Lutomirski wrote:
> On Tue, Jan 23, 2024 at 4:59 PM Elizabeth Figura
> <zfigura@codeweavers.com> wrote:
> >
> > ntsync uses a misc device as the simplest and least intrusive uAPI interface.
> >
> > Each file description on the device represents an isolated NT instance, intended
> > to correspond to a single NT virtual machine.
> 
> If I understand this text right, and if I understood the code right,
> you're saying that each open instance of the device represents an
> entire universe of NT synchronization objects, and no security or
> isolation is possible between those objects.  For single-process use,
> this seems fine.  But fork() will be a bit odd (although NT doesn't
> really believe in fork, so maybe this is fine).
> 
> Except that NT has *named* semaphores and such.  And I'm pretty sure
> I've written GUI programs that use named synchronization objects (IIRC
> they were events, and this was a *very* common pattern, regularly
> discussed in MSDN, usenet, etc) to detect whether another instance of
> the program is running.  And this all works on real Windows because
> sessions have sufficiently separated namespaces, and the security all
> works out about as any other security on Windows, etc.  But
> implementing *that* on top of this
> file-description-plus-integer-equals-object will be fundamentally
> quite subject to one buggy program completely clobbering someone
> else's state.
> 
> Would it make sense and scale appropriately for an NT synchronization
> *object* to be a Linux open file description?  Then SCM_RIGHTS could
> pass them around, an RPC server could manage *named* objects, and
> they'd generally work just like other "Object Manager" objects like,
> say, files.

It's a sensible concern. I think when I discussed this with Alexandre
Julliard (the Wine maintainer, CC'd) the conclusion was this wasn't
something we were concerned about.

While the current model *does* allow for processes to arbitrarily mess
with each other, accidentally or not, I think we're not concerned with
the scope of that than we are about implementing a whole scheduler in
user space.

For one, you can't corrupt the wineserver state this way—wineserver
being sort of like a dedicated process that handles many of the things
that a kernel would, and so sometimes needs to set or reset events, or
perform NTSYNC_IOC_KILL_MUTEX, but never relies on ntsync object state.
Whereas trying to implement a scheduler in user space would involve the
wineserver taking locks, and hence other processes could deadlock.

For two, it's probably a lot harder to mess with that internal state
accidentally.

[There is also a potential problem where some broken applications
create a million (literally) sync objects. Making these into files runs
into NOFILE. We did specifically push distributions and systemd to
increase those limits because an older solution *did* use eventfds and
*did* run into those limits. Since that push was successful I don't
know if this is *actually* a concern anymore, but avoiding files is
probably not a bad thing either.]

--Zeb



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device.
  2024-01-24 22:56     ` Elizabeth Figura
@ 2024-01-25  3:42       ` Elizabeth Figura
  2024-01-25 16:47         ` Arnd Bergmann
  2024-01-25 18:55         ` Andy Lutomirski
  2024-01-25  7:41       ` Alexandre Julliard
  1 sibling, 2 replies; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-25  3:42 UTC (permalink / raw)
  To: Andy Lutomirski, wine-devel
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel, linux-api,
	wine-devel, André Almeida, Wolfram Sang, Peter Zijlstra,
	Alexandre Julliard, Elizabeth Figura

On Wednesday, 24 January 2024 16:56:23 CST Elizabeth Figura wrote:
> On Wednesday, 24 January 2024 15:26:15 CST Andy Lutomirski wrote:
> 
> > On Tue, Jan 23, 2024 at 4:59 PM Elizabeth Figura
> > <zfigura@codeweavers.com> wrote:
> > 
> > >
> > >
> > > ntsync uses a misc device as the simplest and least intrusive uAPI
> > > interface.
> >
> > >
> > >
> > > Each file description on the device represents an isolated NT instance,
> > > intended to correspond to a single NT virtual machine.
> > 
> > 
> > If I understand this text right, and if I understood the code right,
> > you're saying that each open instance of the device represents an
> > entire universe of NT synchronization objects, and no security or
> > isolation is possible between those objects.  For single-process use,
> > this seems fine.  But fork() will be a bit odd (although NT doesn't
> > really believe in fork, so maybe this is fine).
> > 
> > Except that NT has *named* semaphores and such.  And I'm pretty sure
> > I've written GUI programs that use named synchronization objects (IIRC
> > they were events, and this was a *very* common pattern, regularly
> > discussed in MSDN, usenet, etc) to detect whether another instance of
> > the program is running.  And this all works on real Windows because
> > sessions have sufficiently separated namespaces, and the security all
> > works out about as any other security on Windows, etc.  But
> > implementing *that* on top of this
> > file-description-plus-integer-equals-object will be fundamentally
> > quite subject to one buggy program completely clobbering someone
> > else's state.
> > 
> > Would it make sense and scale appropriately for an NT synchronization
> > *object* to be a Linux open file description?  Then SCM_RIGHTS could
> > pass them around, an RPC server could manage *named* objects, and
> > they'd generally work just like other "Object Manager" objects like,
> > say, files.
> 
> 
> It's a sensible concern. I think when I discussed this with Alexandre
> Julliard (the Wine maintainer, CC'd) the conclusion was this wasn't
> something we were concerned about.
> 
> While the current model *does* allow for processes to arbitrarily mess
> with each other, accidentally or not, I think we're not concerned with
> the scope of that than we are about implementing a whole scheduler in
> user space.
> 
> For one, you can't corrupt the wineserver state this way—wineserver
> being sort of like a dedicated process that handles many of the things
> that a kernel would, and so sometimes needs to set or reset events, or
> perform NTSYNC_IOC_KILL_MUTEX, but never relies on ntsync object state.
> Whereas trying to implement a scheduler in user space would involve the
> wineserver taking locks, and hence other processes could deadlock.
> 
> For two, it's probably a lot harder to mess with that internal state
> accidentally.
> 
> [There is also a potential problem where some broken applications
> create a million (literally) sync objects. Making these into files runs
> into NOFILE. We did specifically push distributions and systemd to
> increase those limits because an older solution *did* use eventfds and
> *did* run into those limits. Since that push was successful I don't
> know if this is *actually* a concern anymore, but avoiding files is
> probably not a bad thing either.]

Of course, looking at it from a kernel maintainer's perspective, it wouldn't 
be insane to do this anyway. If we at some point do start to care about cross-
process isolation in this way, or if another NT emulator wants to use this 
interface and does care about cross-process isolation, it'll be necessary. At 
least it'd make sense to make them separate files even if we don't implement 
granular permission handling just yet.

The main question is, is NOFILE a realistic concern, and what other problems 
might there be, in terms of making these heavier objects? Besides memory usage 
I can't think of any, but of course I don't have much knowledge of this area.

Alternatively, maybe there's another more lightweight way to store per-process 
data?



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device.
  2024-01-24 22:56     ` Elizabeth Figura
  2024-01-25  3:42       ` Elizabeth Figura
@ 2024-01-25  7:41       ` Alexandre Julliard
  1 sibling, 0 replies; 39+ messages in thread
From: Alexandre Julliard @ 2024-01-25  7:41 UTC (permalink / raw)
  To: Elizabeth Figura
  Cc: Andy Lutomirski, Arnd Bergmann, Greg Kroah-Hartman, linux-kernel,
	linux-api, wine-devel, André Almeida, Wolfram Sang,
	Arkadiusz Hiler, Peter Zijlstra

Elizabeth Figura <zfigura@codeweavers.com> writes:

> On Wednesday, 24 January 2024 15:26:15 CST Andy Lutomirski wrote:
>> On Tue, Jan 23, 2024 at 4:59 PM Elizabeth Figura
>> <zfigura@codeweavers.com> wrote:
>> >
>> > ntsync uses a misc device as the simplest and least intrusive uAPI interface.
>> >
>> > Each file description on the device represents an isolated NT instance, intended
>> > to correspond to a single NT virtual machine.
>> 
>> If I understand this text right, and if I understood the code right,
>> you're saying that each open instance of the device represents an
>> entire universe of NT synchronization objects, and no security or
>> isolation is possible between those objects.  For single-process use,
>> this seems fine.  But fork() will be a bit odd (although NT doesn't
>> really believe in fork, so maybe this is fine).
>> 
>> Except that NT has *named* semaphores and such.  And I'm pretty sure
>> I've written GUI programs that use named synchronization objects (IIRC
>> they were events, and this was a *very* common pattern, regularly
>> discussed in MSDN, usenet, etc) to detect whether another instance of
>> the program is running.  And this all works on real Windows because
>> sessions have sufficiently separated namespaces, and the security all
>> works out about as any other security on Windows, etc.  But
>> implementing *that* on top of this
>> file-description-plus-integer-equals-object will be fundamentally
>> quite subject to one buggy program completely clobbering someone
>> else's state.
>> 
>> Would it make sense and scale appropriately for an NT synchronization
>> *object* to be a Linux open file description?  Then SCM_RIGHTS could
>> pass them around, an RPC server could manage *named* objects, and
>> they'd generally work just like other "Object Manager" objects like,
>> say, files.
>
> It's a sensible concern. I think when I discussed this with Alexandre
> Julliard (the Wine maintainer, CC'd) the conclusion was this wasn't
> something we were concerned about.
>
> While the current model *does* allow for processes to arbitrarily mess
> with each other, accidentally or not, I think we're not concerned with
> the scope of that than we are about implementing a whole scheduler in
> user space.

I may have misunderstood something in that dicussion then, because it
would definitely be a concern. It's OK for a process to be able to mess
up the state of any object that it has an NT handle to, but it shouldn't
be possible to mess up the state of unrelated objects in other processes
simply by passing the wrong integer id.

The concern is not so much about a malicious process going out of its
way to corrupt others, because it could do that through the NT API just
as well. But if a wayward pointer corrupts the client-side handle cache,
that shouldn't take down the entire session.

-- 
Alexandre Julliard
julliard@winehq.org

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 4/9] ntsync: Introduce NTSYNC_IOC_PUT_SEM.
  2024-01-24  0:40 ` [RFC PATCH 4/9] ntsync: Introduce NTSYNC_IOC_PUT_SEM Elizabeth Figura
@ 2024-01-25  8:59   ` Nikolay Borisov
  0 siblings, 0 replies; 39+ messages in thread
From: Nikolay Borisov @ 2024-01-25  8:59 UTC (permalink / raw)
  To: Elizabeth Figura, Arnd Bergmann, Greg Kroah-Hartman,
	linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra



On 24.01.24 г. 2:40 ч., Elizabeth Figura wrote:
> This corresponds to the NT syscall NtReleaseSemaphore().
> 
> Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
> ---
>   drivers/misc/ntsync.c       | 76 +++++++++++++++++++++++++++++++++++++
>   include/uapi/linux/ntsync.h |  2 +
>   2 files changed, 78 insertions(+)
> 
> diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
> index 3287b94be351..d1c91c2a4f1a 100644
> --- a/drivers/misc/ntsync.c
> +++ b/drivers/misc/ntsync.c
> @@ -21,9 +21,11 @@ enum ntsync_type {
>   struct ntsync_obj {
>   	struct rcu_head rhead;
>   	struct kref refcount;
> +	spinlock_t lock;
>   
>   	enum ntsync_type type;
>   
> +	/* The following fields are protected by the object lock. */
>   	union {
>   		struct {
>   			__u32 count;
> @@ -36,6 +38,19 @@ struct ntsync_device {
>   	struct xarray objects;
>   };
>   
> +static struct ntsync_obj *get_obj(struct ntsync_device *dev, __u32 id)
> +{
> +	struct ntsync_obj *obj;
> +
> +	rcu_read_lock();
> +	obj = xa_load(&dev->objects, id);
> +	if (obj && !kref_get_unless_zero(&obj->refcount))
> +		obj = NULL;
> +	rcu_read_unlock();
> +
> +	return obj;
> +}
> +
>   static void destroy_obj(struct kref *ref)
>   {
>   	struct ntsync_obj *obj = container_of(ref, struct ntsync_obj, refcount);
> @@ -48,6 +63,18 @@ static void put_obj(struct ntsync_obj *obj)
>   	kref_put(&obj->refcount, destroy_obj);
>   }
>   
> +static struct ntsync_obj *get_obj_typed(struct ntsync_device *dev, __u32 id,
> +					enum ntsync_type type)
> +{
> +	struct ntsync_obj *obj = get_obj(dev, id);
> +
> +	if (obj && obj->type != type) {
> +		put_obj(obj);
> +		return NULL;
> +	}
> +	return obj;
> +}
> +
>   static int ntsync_char_open(struct inode *inode, struct file *file)
>   {
>   	struct ntsync_device *dev;
> @@ -81,6 +108,7 @@ static int ntsync_char_release(struct inode *inode, struct file *file)
>   static void init_obj(struct ntsync_obj *obj)
>   {
>   	kref_init(&obj->refcount);
> +	spin_lock_init(&obj->lock);
>   }
>   
>   static int ntsync_create_sem(struct ntsync_device *dev, void __user *argp)
> @@ -131,6 +159,52 @@ static int ntsync_delete(struct ntsync_device *dev, void __user *argp)
>   	return 0;
>   }
>   
> +/*
> + * Actually change the semaphore state, returning -EOVERFLOW if it is made
> + * invalid.
> + */
> +static int put_sem_state(struct ntsync_obj *sem, __u32 count)

nit: Just a general observation - those functions that contains the 
specific type in their name could take the exact object i.e struct ntsem 
which will make the code somewhat more clear. Of course, this would mean 
that the struct definition in patch 3 should be changed to also contain 
a tag name.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device.
  2024-01-25  3:42       ` Elizabeth Figura
@ 2024-01-25 16:47         ` Arnd Bergmann
  2024-01-25 18:21           ` Elizabeth Figura
  2024-01-25 18:55         ` Andy Lutomirski
  1 sibling, 1 reply; 39+ messages in thread
From: Arnd Bergmann @ 2024-01-25 16:47 UTC (permalink / raw)
  To: Elizabeth Figura, Andy Lutomirski, wine-devel
  Cc: Greg Kroah-Hartman, linux-kernel, linux-api, André Almeida,
	Wolfram Sang, Peter Zijlstra, Alexandre Julliard

On Thu, Jan 25, 2024, at 04:42, Elizabeth Figura wrote:
> On Wednesday, 24 January 2024 16:56:23 CST Elizabeth Figura wrote:
>> On Wednesday, 24 January 2024 15:26:15 CST Andy Lutomirski wrote:
>> > On Tue, Jan 23, 2024 at 4:59 PM Elizabeth Figura <zfigura@codeweavers.com> wrote:
>> 
>> [There is also a potential problem where some broken applications
>> create a million (literally) sync objects. Making these into files runs
>> into NOFILE. We did specifically push distributions and systemd to
>> increase those limits because an older solution *did* use eventfds and
>> *did* run into those limits. Since that push was successful I don't
>> know if this is *actually* a concern anymore, but avoiding files is
>> probably not a bad thing either.]
>
> Of course, looking at it from a kernel maintainer's perspective, it wouldn't 
> be insane to do this anyway. If we at some point do start to care about cross-
> process isolation in this way, or if another NT emulator wants to use this 
> interface and does care about cross-process isolation, it'll be necessary. At 
> least it'd make sense to make them separate files even if we don't implement 
> granular permission handling just yet.

I can think of a few other possible benefits of going with
per-mutex file descriptors:

- being able to use poll() for waiting on them individually in
  combination with other file descriptor based events (socket,
  signalfd, pidfd, ...)

- replacing your logic around xarray with something a bit
  simpler. As far as I can tell, your code is all correct here,
  but it would be easier to understand if it looked more like
  other code I'm familiar with.

> The main question is, is NOFILE a realistic concern, and what other problems 
> might there be, in terms of making these heavier objects? Besides memory usage 
> I can't think of any, but of course I don't have much knowledge of this area.

I would think that RLIMIT_NOFILE is a sensible way of
managing this, at least this way it's possible to prevent
exhausting memory with too many mutexes, but still raising
the limit if you need more than whatever default one might
come up with.

     Arnd

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 5/9] ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
  2024-01-24 22:28         ` Elizabeth Figura
@ 2024-01-25 17:02           ` Arnd Bergmann
  2024-01-25 18:30             ` Elizabeth Figura
  0 siblings, 1 reply; 39+ messages in thread
From: Arnd Bergmann @ 2024-01-25 17:02 UTC (permalink / raw)
  To: Elizabeth Figura, Greg Kroah-Hartman, linux-kernel, linux-api
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Wed, Jan 24, 2024, at 23:28, Elizabeth Figura wrote:
> On Wednesday, 24 January 2024 13:52:52 CST Arnd Bergmann wrote:
>> On Wed, Jan 24, 2024, at 19:02, Elizabeth Figura wrote:

>> > That'd be nicer in general. I think there was some documentation that advised
>> > using timespec64 for new ioctl interfaces but it may have been outdated or
>> > misread.
>> 
>> It's probably something I wrote. It depends a bit on
>> whether you have an absolute or relative timeout. If
>> the timeout is relative to the current time as I understand
>> it is here, a 64-bit number seems more logical to me.
>> 
>> For absolute times, I would usually use a __kernel_timespec,
>> especially if it's CLOCK_REALTIME. In this case you would
>> also need to specify the time domain.
>
> Currently the interface does pass it as an absolute time, with the
> domain implicitly being MONOTONIC. This particular choice comes from
> process/botching-up-ioctls.rst, which is admittedly focused around GPU
> ioctls, but the rationale of having easily restartable ioctls applies
> here too.

Ok, I was thinking of Documentation/driver-api/ioctl.rst, which
has similar recommendations.

> (E.g. Wine does play games with signals, so we do want to be able to
> interrupt arbitrary waits with EINTR. The "usual" fast path for ntsync
> waits won't hit that, but we want to have it work.)
>
> On the other hand, if we can pass the timeout as relative, and write it
> back on exit like ppoll() does [assuming that's not proscribed], that
> would presumably be slightly better for performance.

I've seen arguments go either way between absolute and relative
times, just pick whatever works best for you here.

> When writing the patch I just picked the recommended option, and didn't
> bother doing any micro-optimizations afterward.
>
> What's the rationale for using timespec for absolute or written-back
> timeouts, instead of dealing in ns directly? I'm afraid it's not
> obvious to me.

There is no hard rule either way, I mainly didn't like the
indirect pointer to the timespec that you have here. For
traditional unix-style interfaces, a timespec with CLOCK_REALTIME
times usually makes sense since that is what user space is
already using elsewhere, but you probably don't need to
worry about that. In theory, the single u64 CLOCK_REALTIME
nanoseconds have the problem of no longer working after year
2262, but with CLOCK_MONOTONIC that is not a concern anyway.

Between embedding a __u64 nanosecond value and embedding
a __kernel_timespec, I would pick whichever avoids converting
a __u64 back into a timespec, as that is an expensive
operation at least on 32-bit code.

       Arnd

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device.
  2024-01-25 16:47         ` Arnd Bergmann
@ 2024-01-25 18:21           ` Elizabeth Figura
  0 siblings, 0 replies; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-25 18:21 UTC (permalink / raw)
  To: Andy Lutomirski, wine-devel, Arnd Bergmann
  Cc: Greg Kroah-Hartman, linux-kernel, linux-api, André Almeida,
	Wolfram Sang, Peter Zijlstra, Alexandre Julliard

On Thursday, 25 January 2024 10:47:49 CST Arnd Bergmann wrote:
> On Thu, Jan 25, 2024, at 04:42, Elizabeth Figura wrote:
> > On Wednesday, 24 January 2024 16:56:23 CST Elizabeth Figura wrote:
> >> On Wednesday, 24 January 2024 15:26:15 CST Andy Lutomirski wrote:
> >> > On Tue, Jan 23, 2024 at 4:59 PM Elizabeth Figura 
<zfigura@codeweavers.com> wrote:
> >> [There is also a potential problem where some broken applications
> >> create a million (literally) sync objects. Making these into files runs
> >> into NOFILE. We did specifically push distributions and systemd to
> >> increase those limits because an older solution *did* use eventfds and
> >> *did* run into those limits. Since that push was successful I don't
> >> know if this is *actually* a concern anymore, but avoiding files is
> >> probably not a bad thing either.]
> > 
> > Of course, looking at it from a kernel maintainer's perspective, it
> > wouldn't be insane to do this anyway. If we at some point do start to
> > care about cross- process isolation in this way, or if another NT
> > emulator wants to use this interface and does care about cross-process
> > isolation, it'll be necessary. At least it'd make sense to make them
> > separate files even if we don't implement granular permission handling
> > just yet.
> 
> I can think of a few other possible benefits of going with
> per-mutex file descriptors:
> 
> - being able to use poll() for waiting on them individually in
>   combination with other file descriptor based events (socket,
>   signalfd, pidfd, ...)

I can say for sure this isn't going to be useful for Wine, at least not with 
the current design.

It also doesn't really mesh well with the NT design in the first place. 
NTSYNC_IOC_WAIT_ANY differs from poll() in two major ways: it consumes state 
of most object types, and (as coded here) it needs the owner thread ID to be 
specifically passed for mutexes.


Anyway, as Alexandre has informed me I clearly have misunderstood our 
requirements, so I'm going to try to put together something using files 
instead.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 5/9] ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
  2024-01-25 17:02           ` Arnd Bergmann
@ 2024-01-25 18:30             ` Elizabeth Figura
  0 siblings, 0 replies; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-25 18:30 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, linux-api, Arnd Bergmann
  Cc: wine-devel, André Almeida, Wolfram Sang, Arkadiusz Hiler,
	Peter Zijlstra

On Thursday, 25 January 2024 11:02:26 CST Arnd Bergmann wrote:
> On Wed, Jan 24, 2024, at 23:28, Elizabeth Figura wrote:
> > On Wednesday, 24 January 2024 13:52:52 CST Arnd Bergmann wrote:
> >> On Wed, Jan 24, 2024, at 19:02, Elizabeth Figura wrote:
> >> > That'd be nicer in general. I think there was some documentation that
> >> > advised using timespec64 for new ioctl interfaces but it may have been
> >> > outdated or misread.
> >> 
> >> It's probably something I wrote. It depends a bit on
> >> whether you have an absolute or relative timeout. If
> >> the timeout is relative to the current time as I understand
> >> it is here, a 64-bit number seems more logical to me.
> >> 
> >> For absolute times, I would usually use a __kernel_timespec,
> >> especially if it's CLOCK_REALTIME. In this case you would
> >> also need to specify the time domain.
> > 
> > Currently the interface does pass it as an absolute time, with the
> > domain implicitly being MONOTONIC. This particular choice comes from
> > process/botching-up-ioctls.rst, which is admittedly focused around GPU
> > ioctls, but the rationale of having easily restartable ioctls applies
> > here too.
> 
> Ok, I was thinking of Documentation/driver-api/ioctl.rst, which
> has similar recommendations.
> 
> > (E.g. Wine does play games with signals, so we do want to be able to
> > interrupt arbitrary waits with EINTR. The "usual" fast path for ntsync
> > waits won't hit that, but we want to have it work.)
> > 
> > On the other hand, if we can pass the timeout as relative, and write it
> > back on exit like ppoll() does [assuming that's not proscribed], that
> > would presumably be slightly better for performance.
> 
> I've seen arguments go either way between absolute and relative
> times, just pick whatever works best for you here.
> 
> > When writing the patch I just picked the recommended option, and didn't
> > bother doing any micro-optimizations afterward.
> > 
> > What's the rationale for using timespec for absolute or written-back
> > timeouts, instead of dealing in ns directly? I'm afraid it's not
> > obvious to me.
> 
> There is no hard rule either way, I mainly didn't like the
> indirect pointer to the timespec that you have here. For
> traditional unix-style interfaces, a timespec with CLOCK_REALTIME
> times usually makes sense since that is what user space is
> already using elsewhere, but you probably don't need to
> worry about that. In theory, the single u64 CLOCK_REALTIME
> nanoseconds have the problem of no longer working after year
> 2262, but with CLOCK_MONOTONIC that is not a concern anyway.
> 
> Between embedding a __u64 nanosecond value and embedding
> a __kernel_timespec, I would pick whichever avoids converting
> a __u64 back into a timespec, as that is an expensive
> operation at least on 32-bit code.

Makes sense. I'll probably switch to using a relative and written-back u64 
then, thanks!



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device.
  2024-01-25  3:42       ` Elizabeth Figura
  2024-01-25 16:47         ` Arnd Bergmann
@ 2024-01-25 18:55         ` Andy Lutomirski
  2024-01-25 21:45           ` Elizabeth Figura
  1 sibling, 1 reply; 39+ messages in thread
From: Andy Lutomirski @ 2024-01-25 18:55 UTC (permalink / raw)
  To: Elizabeth Figura
  Cc: Andy Lutomirski, wine-devel, Arnd Bergmann, Greg Kroah-Hartman,
	linux-kernel, linux-api, André Almeida, Wolfram Sang,
	Peter Zijlstra, Alexandre Julliard

On Wed, Jan 24, 2024 at 7:42 PM Elizabeth Figura
<zfigura@codeweavers.com> wrote:
>
> On Wednesday, 24 January 2024 16:56:23 CST Elizabeth Figura wrote:
> > On Wednesday, 24 January 2024 15:26:15 CST Andy Lutomirski wrote:
> >
> > > On Tue, Jan 23, 2024 at 4:59 PM Elizabeth Figura
> > > <zfigura@codeweavers.com> wrote:
> > >
> > > >
> > > >
> > > > ntsync uses a misc device as the simplest and least intrusive uAPI
> > > > interface.
> > >
> > > >
> > > >
> > > > Each file description on the device represents an isolated NT instance,
> > > > intended to correspond to a single NT virtual machine.
> > >
> > >
> > > If I understand this text right, and if I understood the code right,
> > > you're saying that each open instance of the device represents an
> > > entire universe of NT synchronization objects, and no security or
> > > isolation is possible between those objects.  For single-process use,
> > > this seems fine.  But fork() will be a bit odd (although NT doesn't
> > > really believe in fork, so maybe this is fine).
> > >
> > > Except that NT has *named* semaphores and such.  And I'm pretty sure
> > > I've written GUI programs that use named synchronization objects (IIRC
> > > they were events, and this was a *very* common pattern, regularly
> > > discussed in MSDN, usenet, etc) to detect whether another instance of
> > > the program is running.  And this all works on real Windows because
> > > sessions have sufficiently separated namespaces, and the security all
> > > works out about as any other security on Windows, etc.  But
> > > implementing *that* on top of this
> > > file-description-plus-integer-equals-object will be fundamentally
> > > quite subject to one buggy program completely clobbering someone
> > > else's state.
> > >
> > > Would it make sense and scale appropriately for an NT synchronization
> > > *object* to be a Linux open file description?  Then SCM_RIGHTS could
> > > pass them around, an RPC server could manage *named* objects, and
> > > they'd generally work just like other "Object Manager" objects like,
> > > say, files.
> >
> >
> > It's a sensible concern. I think when I discussed this with Alexandre
> > Julliard (the Wine maintainer, CC'd) the conclusion was this wasn't
> > something we were concerned about.
> >
> > While the current model *does* allow for processes to arbitrarily mess
> > with each other, accidentally or not, I think we're not concerned with
> > the scope of that than we are about implementing a whole scheduler in
> > user space.
> >
> > For one, you can't corrupt the wineserver state this way—wineserver
> > being sort of like a dedicated process that handles many of the things
> > that a kernel would, and so sometimes needs to set or reset events, or
> > perform NTSYNC_IOC_KILL_MUTEX, but never relies on ntsync object state.
> > Whereas trying to implement a scheduler in user space would involve the
> > wineserver taking locks, and hence other processes could deadlock.
> >
> > For two, it's probably a lot harder to mess with that internal state
> > accidentally.
> >
> > [There is also a potential problem where some broken applications
> > create a million (literally) sync objects. Making these into files runs
> > into NOFILE. We did specifically push distributions and systemd to
> > increase those limits because an older solution *did* use eventfds and
> > *did* run into those limits. Since that push was successful I don't
> > know if this is *actually* a concern anymore, but avoiding files is
> > probably not a bad thing either.]
>
> Of course, looking at it from a kernel maintainer's perspective, it wouldn't
> be insane to do this anyway. If we at some point do start to care about cross-
> process isolation in this way, or if another NT emulator wants to use this
> interface and does care about cross-process isolation, it'll be necessary. At
> least it'd make sense to make them separate files even if we don't implement
> granular permission handling just yet.

I'm not convinced that any complexity at all beyond using individual
files is needed for granular permission handling.  Unless something
actually needs permission bits on different files pointing at the same
sync object (which I believe NT supports, but it's sort of an odd
concept and I'm not immediately convinced that anything uses it),
merely having individual files ought to do the trick.  Handling of who
has permission to open a given named object can live in a daemon, and
I'd guess that Wine even already implements this.

And keeping everything together gives me flashbacks of Windows 95 and
Mac OS pre-X.  Sure, in principle the software wasn't malicious, but
there was no shortage whatsoever of buggy crap out there, and systems
were quite unstable.  Even just:

CreateSemaphore();
fork();
sleep a few seconds;
exit();

seems like it could corrupt the shared namespace world.  (Obviously no
one would ever do that, right?)

Also, handle leaks:

while(true) {
  make a subprocess, which creates a semaphore and crashes;
}

>
> The main question is, is NOFILE a realistic concern, and what other problems
> might there be, in terms of making these heavier objects? Besides memory usage
> I can't think of any, but of course I don't have much knowledge of this area.

Years ago there was some discussion of making struct file
lighter-weight for light-weight things that aren't files.  And, in any
case, even the little integer indices in your code aren't free -- they
just aren't accounted as files.

And struct file isn't *that* bad.  I bet it's not dramatically bigger,
or even smaller, than whatever the Windows kernel stores for a
semaphore handle.

--Andy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device.
  2024-01-25 18:55         ` Andy Lutomirski
@ 2024-01-25 21:45           ` Elizabeth Figura
  0 siblings, 0 replies; 39+ messages in thread
From: Elizabeth Figura @ 2024-01-25 21:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, wine-devel, Arnd Bergmann, Greg Kroah-Hartman,
	linux-kernel, linux-api, André Almeida, Wolfram Sang,
	Peter Zijlstra, Alexandre Julliard

On Thursday, 25 January 2024 12:55:04 CST Andy Lutomirski wrote:
> On Wed, Jan 24, 2024 at 7:42 PM Elizabeth Figura
> <zfigura@codeweavers.com> wrote:
> >
> > On Wednesday, 24 January 2024 16:56:23 CST Elizabeth Figura wrote:
> > > On Wednesday, 24 January 2024 15:26:15 CST Andy Lutomirski wrote:
> > >
> > > > On Tue, Jan 23, 2024 at 4:59 PM Elizabeth Figura
> > > > <zfigura@codeweavers.com> wrote:
> > > >
> > > > >
> > > > >
> > > > > ntsync uses a misc device as the simplest and least intrusive uAPI
> > > > > interface.
> > > >
> > > > >
> > > > >
> > > > > Each file description on the device represents an isolated NT instance,
> > > > > intended to correspond to a single NT virtual machine.
> > > >
> > > >
> > > > If I understand this text right, and if I understood the code right,
> > > > you're saying that each open instance of the device represents an
> > > > entire universe of NT synchronization objects, and no security or
> > > > isolation is possible between those objects.  For single-process use,
> > > > this seems fine.  But fork() will be a bit odd (although NT doesn't
> > > > really believe in fork, so maybe this is fine).
> > > >
> > > > Except that NT has *named* semaphores and such.  And I'm pretty sure
> > > > I've written GUI programs that use named synchronization objects (IIRC
> > > > they were events, and this was a *very* common pattern, regularly
> > > > discussed in MSDN, usenet, etc) to detect whether another instance of
> > > > the program is running.  And this all works on real Windows because
> > > > sessions have sufficiently separated namespaces, and the security all
> > > > works out about as any other security on Windows, etc.  But
> > > > implementing *that* on top of this
> > > > file-description-plus-integer-equals-object will be fundamentally
> > > > quite subject to one buggy program completely clobbering someone
> > > > else's state.
> > > >
> > > > Would it make sense and scale appropriately for an NT synchronization
> > > > *object* to be a Linux open file description?  Then SCM_RIGHTS could
> > > > pass them around, an RPC server could manage *named* objects, and
> > > > they'd generally work just like other "Object Manager" objects like,
> > > > say, files.
> > >
> > >
> > > It's a sensible concern. I think when I discussed this with Alexandre
> > > Julliard (the Wine maintainer, CC'd) the conclusion was this wasn't
> > > something we were concerned about.
> > >
> > > While the current model *does* allow for processes to arbitrarily mess
> > > with each other, accidentally or not, I think we're not concerned with
> > > the scope of that than we are about implementing a whole scheduler in
> > > user space.
> > >
> > > For one, you can't corrupt the wineserver state this way—wineserver
> > > being sort of like a dedicated process that handles many of the things
> > > that a kernel would, and so sometimes needs to set or reset events, or
> > > perform NTSYNC_IOC_KILL_MUTEX, but never relies on ntsync object state.
> > > Whereas trying to implement a scheduler in user space would involve the
> > > wineserver taking locks, and hence other processes could deadlock.
> > >
> > > For two, it's probably a lot harder to mess with that internal state
> > > accidentally.
> > >
> > > [There is also a potential problem where some broken applications
> > > create a million (literally) sync objects. Making these into files runs
> > > into NOFILE. We did specifically push distributions and systemd to
> > > increase those limits because an older solution *did* use eventfds and
> > > *did* run into those limits. Since that push was successful I don't
> > > know if this is *actually* a concern anymore, but avoiding files is
> > > probably not a bad thing either.]
> >
> > Of course, looking at it from a kernel maintainer's perspective, it wouldn't
> > be insane to do this anyway. If we at some point do start to care about cross-
> > process isolation in this way, or if another NT emulator wants to use this
> > interface and does care about cross-process isolation, it'll be necessary. At
> > least it'd make sense to make them separate files even if we don't implement
> > granular permission handling just yet.
> 
> I'm not convinced that any complexity at all beyond using individual
> files is needed for granular permission handling.  Unless something
> actually needs permission bits on different files pointing at the same
> sync object (which I believe NT supports, but it's sort of an odd
> concept and I'm not immediately convinced that anything uses it),
> merely having individual files ought to do the trick.  Handling of who
> has permission to open a given named object can live in a daemon, and
> I'd guess that Wine even already implements this.

This is mostly correct. NT has file descriptors and descriptions (the
former is called a "handle"), though unlike Unix access bits are
specific to the *descriptor* (handle). I don't know if anything uses 
it, but we do currently implement that basic functionality, so I can't
say that nothing does either.

So inasmuch as access to someone else's object is a concern, access to
your object with bits you don't have permission for could be a concern
along the same lines. However, from conversation with Alexandre I
believe it'd be fine to just implement those checks in user space.

> And keeping everything together gives me flashbacks of Windows 95 and
> Mac OS pre-X.  Sure, in principle the software wasn't malicious, but
> there was no shortage whatsoever of buggy crap out there, and systems
> were quite unstable.  Even just:
> 
> CreateSemaphore();
> fork();
> sleep a few seconds;
> exit();
> 
> seems like it could corrupt the shared namespace world.  (Obviously no
> one would ever do that, right?)
> 
> Also, handle leaks:
> 
> while(true) {
>   make a subprocess, which creates a semaphore and crashes;
> }

For whatever it's worth, this particular thing wouldn't be a concern;
Wine's "kernel" daemon already has to detect when a process dies and
close all its outstanding handles.



^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2024-01-25 21:45 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-24  0:40 [RFC PATCH 0/9] NT synchronization primitive driver Elizabeth Figura
2024-01-24  0:40 ` [RFC PATCH 1/9] ntsync: Introduce the ntsync driver and character device Elizabeth Figura
2024-01-24  7:38   ` Arnd Bergmann
2024-01-24 17:51     ` Elizabeth Figura
2024-01-24 21:26   ` Andy Lutomirski
2024-01-24 22:56     ` Elizabeth Figura
2024-01-25  3:42       ` Elizabeth Figura
2024-01-25 16:47         ` Arnd Bergmann
2024-01-25 18:21           ` Elizabeth Figura
2024-01-25 18:55         ` Andy Lutomirski
2024-01-25 21:45           ` Elizabeth Figura
2024-01-25  7:41       ` Alexandre Julliard
2024-01-24  0:40 ` [RFC PATCH 2/9] ntsync: Reserve a minor device number and ioctl range Elizabeth Figura
2024-01-24  0:54   ` Greg Kroah-Hartman
2024-01-24  3:43     ` Elizabeth Figura
2024-01-24 12:32       ` Greg Kroah-Hartman
2024-01-24 17:59         ` Elizabeth Figura
2024-01-24  0:40 ` [RFC PATCH 3/9] ntsync: Introduce NTSYNC_IOC_CREATE_SEM and NTSYNC_IOC_DELETE Elizabeth Figura
2024-01-24  1:14   ` Greg Kroah-Hartman
2024-01-24  3:35     ` Elizabeth Figura
2024-01-24  0:40 ` [RFC PATCH 4/9] ntsync: Introduce NTSYNC_IOC_PUT_SEM Elizabeth Figura
2024-01-25  8:59   ` Nikolay Borisov
2024-01-24  0:40 ` [RFC PATCH 5/9] ntsync: Introduce NTSYNC_IOC_WAIT_ANY Elizabeth Figura
2024-01-24  7:56   ` Arnd Bergmann
2024-01-24 18:02     ` Elizabeth Figura
2024-01-24 19:52       ` Arnd Bergmann
2024-01-24 22:28         ` Elizabeth Figura
2024-01-25 17:02           ` Arnd Bergmann
2024-01-25 18:30             ` Elizabeth Figura
2024-01-24  0:40 ` [RFC PATCH 6/9] ntsync: Introduce NTSYNC_IOC_WAIT_ALL Elizabeth Figura
2024-01-24  0:40 ` [RFC PATCH 7/9] ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX Elizabeth Figura
2024-01-24  0:40 ` [RFC PATCH 8/9] ntsync: Introduce NTSYNC_IOC_PUT_MUTEX Elizabeth Figura
2024-01-24  7:42   ` Arnd Bergmann
2024-01-24 18:03     ` Elizabeth Figura
2024-01-24 19:53       ` Arnd Bergmann
2024-01-24  0:40 ` [RFC PATCH 9/9] ntsync: Introduce NTSYNC_IOC_KILL_OWNER Elizabeth Figura
2024-01-24  0:59 ` [RFC PATCH 0/9] NT synchronization primitive driver Greg Kroah-Hartman
2024-01-24  1:37   ` Elizabeth Figura
2024-01-24 12:29     ` Greg Kroah-Hartman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).