From: Elizabeth Figura <zfigura@codeweavers.com>
To: Arnd Bergmann <arnd@arndb.de>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Jonathan Corbet <corbet@lwn.net>, Shuah Khan <shuah@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
wine-devel@winehq.org, "André Almeida" <andrealmeid@igalia.com>,
"Wolfram Sang" <wsa@kernel.org>,
"Arkadiusz Hiler" <ahiler@codeweavers.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Andy Lutomirski" <luto@kernel.org>,
linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
"Randy Dunlap" <rdunlap@infradead.org>,
"Elizabeth Figura" <zfigura@codeweavers.com>
Subject: [PATCH v2 31/31] docs: ntsync: Add documentation for the ntsync uAPI.
Date: Mon, 19 Feb 2024 16:38:33 -0600 [thread overview]
Message-ID: <20240219223833.95710-32-zfigura@codeweavers.com> (raw)
In-Reply-To: <20240219223833.95710-1-zfigura@codeweavers.com>
Add an overall explanation of the driver architecture, and complete and precise
specification for its intended behaviour.
Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
---
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/ntsync.rst | 399 +++++++++++++++++++++++++
2 files changed, 400 insertions(+)
create mode 100644 Documentation/userspace-api/ntsync.rst
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 09f61bd2ac2e..f5a72ed27def 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -34,6 +34,7 @@ place where this information is gathered.
tee
isapnp
dcdbas
+ ntsync
.. only:: subproject and html
diff --git a/Documentation/userspace-api/ntsync.rst b/Documentation/userspace-api/ntsync.rst
new file mode 100644
index 000000000000..202c2350d3af
--- /dev/null
+++ b/Documentation/userspace-api/ntsync.rst
@@ -0,0 +1,399 @@
+===================================
+NT synchronization primitive driver
+===================================
+
+This page documents the user-space API for the ntsync driver.
+
+ntsync is a support driver for emulation of NT synchronization
+primitives by user-space NT emulators. It exists because implementation
+in user-space, using existing tools, cannot match Windows performance
+while offering accurate semantics. It is implemented entirely in
+software, and does not drive any hardware device.
+
+This interface is meant as a compatibility tool only, and should not
+be used for general synchronization. Instead use generic, versatile
+interfaces such as futex(2) and poll(2).
+
+Synchronization primitives
+==========================
+
+The ntsync driver exposes three types of synchronization primitives:
+semaphores, mutexes, and events.
+
+A semaphore holds a single volatile 32-bit counter, and a static 32-bit
+integer denoting the maximum value. It is considered signaled when the
+counter is nonzero. The counter is decremented by one when a wait is
+satisfied. Both the initial and maximum count are established when the
+semaphore is created.
+
+A mutex holds a volatile 32-bit recursion count, and a volatile 32-bit
+identifier denoting its owner. A mutex is considered signaled when its
+owner is zero (indicating that it is not owned). The recursion count is
+incremented when a wait is satisfied, and ownership is set to the given
+identifier.
+
+A mutex also holds an internal flag denoting whether its previous owner
+has died; such a mutex is said to be abandoned. Owner death is not
+tracked automatically based on thread death, but rather must be
+communicated using ``NTSYNC_IOC_MUTEX_KILL``. An abandoned mutex is
+inherently considered unowned.
+
+Except for the "unowned" semantics of zero, the actual value of the
+owner identifier is not interpreted by the ntsync driver at all. The
+intended use is to store a thread identifier; however, the ntsync
+driver does not actually validate that a calling thread provides
+consistent or unique identifiers.
+
+An event holds a volatile boolean state denoting whether it is signaled
+or not. There are two types of events, auto-reset and manual-reset. An
+auto-reset event is designaled when a wait is satisfied; a manual-reset
+event is not. The event type is specified when the event is created.
+
+Unless specified otherwise, all operations on an object are atomic and
+totally ordered with respect to other operations on the same object.
+
+Objects are represented by files. When all file descriptors to an
+object are closed, that object is deleted.
+
+Char device
+===========
+
+The ntsync driver creates a single char device /dev/ntsync. Each file
+description opened on the device represents a unique instance intended
+to back an individual NT virtual machine. Objects created by one ntsync
+instance may only be used with other objects created by the same
+instance.
+
+ioctl reference
+===============
+
+All operations on the device are done through ioctls. There are four
+structures used in ioctl calls::
+
+ struct ntsync_sem_args {
+ __u32 sem;
+ __u32 count;
+ __u32 max;
+ };
+
+ struct ntsync_mutex_args {
+ __u32 mutex;
+ __u32 owner;
+ __u32 count;
+ };
+
+ struct ntsync_event_args {
+ __u32 event;
+ __u32 signaled;
+ __u32 manual;
+ };
+
+ struct ntsync_wait_args {
+ __u64 timeout;
+ __u64 objs;
+ __u32 count;
+ __u32 owner;
+ __u32 index;
+ __u32 alert;
+ __u32 flags;
+ __u32 pad;
+ };
+
+Depending on the ioctl, members of the structure may be used as input,
+output, or not at all. All ioctls return 0 on success.
+
+The ioctls on the device file are as follows:
+
+.. c:macro:: NTSYNC_IOC_CREATE_SEM
+
+ Create a semaphore object. Takes a pointer to struct
+ :c:type:`ntsync_sem_args`, which is used as follows:
+
+ .. list-table::
+
+ * - ``sem``
+ - On output, contains a file descriptor to the created semaphore.
+ * - ``count``
+ - Initial count of the semaphore.
+ * - ``max``
+ - Maximum count of the semaphore.
+
+ Fails with ``EINVAL`` if ``count`` is greater than ``max``.
+
+.. c:macro:: NTSYNC_IOC_CREATE_MUTEX
+
+ Create a mutex object. Takes a pointer to struct
+ :c:type:`ntsync_mutex_args`, which is used as follows:
+
+ .. list-table::
+
+ * - ``mutex``
+ - On output, contains a file descriptor to the created mutex.
+ * - ``count``
+ - Initial recursion count of the mutex.
+ * - ``owner``
+ - Initial owner of the mutex.
+
+ If ``owner`` is nonzero and ``count`` is zero, or if ``owner`` is
+ zero and ``count`` is nonzero, the function fails with ``EINVAL``.
+
+.. c:macro:: NTSYNC_IOC_CREATE_EVENT
+
+ Create an event object. Takes a pointer to struct
+ :c:type:`ntsync_event_args`, which is used as follows:
+
+ .. list-table::
+
+ * - ``event``
+ - On output, contains a file descriptor to the created event.
+ * - ``signaled``
+ - If nonzero, the event is initially signaled, otherwise
+ nonsignaled.
+ * - ``manual``
+ - If nonzero, the event is a manual-reset event, otherwise
+ auto-reset.
+
+The ioctls on the individual objects are as follows:
+
+.. c:macro:: NTSYNC_IOC_SEM_POST
+
+ Post to a semaphore object. Takes a pointer to a 32-bit integer,
+ which on input holds the count to be added to the semaphore, and on
+ output contains its previous count.
+
+ If adding to the semaphore's current count would raise the latter
+ past the semaphore's maximum count, the ioctl fails with
+ ``EOVERFLOW`` and the semaphore is not affected. If raising the
+ semaphore's count causes it to become signaled, eligible threads
+ waiting on this semaphore will be woken and the semaphore's count
+ decremented appropriately.
+
+.. c:macro:: NTSYNC_IOC_MUTEX_UNLOCK
+
+ Release a mutex object. Takes a pointer to struct
+ :c:type:`ntsync_mutex_args`, which is used as follows:
+
+ .. list-table::
+
+ * - ``mutex``
+ - Ignored.
+ * - ``owner``
+ - Specifies the owner trying to release this mutex.
+ * - ``count``
+ - On output, contains the previous recursion count.
+
+ If ``owner`` is zero, the ioctl fails with ``EINVAL``. If ``owner``
+ is not the current owner of the mutex, the ioctl fails with
+ ``EPERM``.
+
+ The mutex's count will be decremented by one. If decrementing the
+ mutex's count causes it to become zero, the mutex is marked as
+ unowned and signaled, and eligible threads waiting on it will be
+ woken as appropriate.
+
+.. c:macro:: NTSYNC_IOC_SET_EVENT
+
+ Signal an event object. Takes a pointer to a 32-bit integer, which on
+ output contains the previous state of the event.
+
+ Eligible threads will be woken, and auto-reset events will be
+ designaled appropriately.
+
+.. c:macro:: NTSYNC_IOC_RESET_EVENT
+
+ Designal an event object. Takes a pointer to a 32-bit integer, which
+ on output contains the previous state of the event.
+
+.. c:macro:: NTSYNC_IOC_PULSE_EVENT
+
+ Wake threads waiting on an event object while leaving it in an
+ unsignaled state. Takes a pointer to a 32-bit integer, which on
+ output contains the previous state of the event.
+
+ A pulse operation can be thought of as a set followed by a reset,
+ performed as a single atomic operation. If two threads are waiting on
+ an auto-reset event which is pulsed, only one will be woken. If two
+ threads are waiting a manual-reset event which is pulsed, both will
+ be woken. However, in both cases, the event will be unsignaled
+ afterwards, and a simultaneous read operation will always report the
+ event as unsignaled.
+
+.. c:macro:: NTSYNC_IOC_READ_SEM
+
+ Read the current state of a semaphore object. Takes a pointer to
+ struct :c:type:`ntsync_sem_args`, which is used as follows:
+
+ .. list-table::
+
+ * - ``sem``
+ - Ignored.
+ * - ``count``
+ - On output, contains the current count of the semaphore.
+ * - ``max``
+ - On output, contains the maximum count of the semaphore.
+
+.. c:macro:: NTSYNC_IOC_READ_MUTEX
+
+ Read the current state of a mutex object. Takes a pointer to struct
+ :c:type:`ntsync_mutex_args`, which is used as follows:
+
+ .. list-table::
+
+ * - ``mutex``
+ - Ignored.
+ * - ``owner``
+ - On output, contains the current owner of the mutex, or zero
+ if the mutex is not currently owned.
+ * - ``count``
+ - On output, contains the current recursion count of the mutex.
+
+ If the mutex is marked as abandoned, the function fails with
+ ``EOWNERDEAD``. In this case, ``count`` and ``owner`` are set to
+ zero.
+
+.. c:macro:: NTSYNC_IOC_READ_EVENT
+
+ Read the current state of an event object. Takes a pointer to struct
+ :c:type:`ntsync_event_args`, which is used as follows:
+
+ .. list-table::
+
+ * - ``event``
+ - Ignored.
+ * - ``signaled``
+ - On output, contains the current state of the event.
+ * - ``manual``
+ - On output, contains 1 if the event is a manual-reset event,
+ and 0 otherwise.
+
+.. c:macro:: NTSYNC_IOC_KILL_OWNER
+
+ Mark a mutex as unowned and abandoned if it is owned by the given
+ owner. Takes an input-only pointer to a 32-bit integer denoting the
+ owner. If the owner is zero, the ioctl fails with ``EINVAL``. If the
+ owner does not own the mutex, the function fails with ``EPERM``.
+
+ Eligible threads waiting on the mutex will be woken as appropriate
+ (and such waits will fail with ``EOWNERDEAD``, as described below).
+
+.. c:macro:: NTSYNC_IOC_WAIT_ANY
+
+ Poll on any of a list of objects, atomically acquiring at most one.
+ Takes a pointer to struct :c:type:`ntsync_wait_args`, which is
+ used as follows:
+
+ .. list-table::
+
+ * - ``timeout``
+ - Absolute timeout in nanoseconds. If ``NTSYNC_WAIT_REALTIME``
+ is set, the timeout is measured against the REALTIME clock;
+ otherwise it is measured against the MONOTONIC clock. If the
+ timeout is equal to or earlier than the current time, the
+ function returns immediately without sleeping. If ``timeout``
+ is U64_MAX, the function will sleep until an object is
+ signaled, and will not fail with ``ETIMEDOUT``.
+ * - ``objs``
+ - Pointer to an array of ``count`` file descriptors
+ (specified as an integer so that the structure has the same
+ size regardless of architecture). If any object is
+ invalid, the function fails with ``EINVAL``.
+ * - ``count``
+ - Number of objects specified in the ``objs`` array.
+ If greater than ``NTSYNC_MAX_WAIT_COUNT``, the function fails
+ with ``EINVAL``.
+ * - ``owner``
+ - Mutex owner identifier. If any object in ``objs`` is a mutex,
+ the ioctl will attempt to acquire that mutex on behalf of
+ ``owner``. If ``owner`` is zero, the ioctl fails with
+ ``EINVAL``.
+ * - ``index``
+ - On success, contains the index (into ``objs``) of the object
+ which was signaled. If ``alert`` was signaled instead,
+ this contains ``count``.
+ * - ``alert``
+ - Optional event object file descriptor. If nonzero, this
+ specifies an "alert" event object which, if signaled, will
+ terminate the wait. If nonzero, the identifier must point to a
+ valid event.
+ * - ``flags``
+ - Zero or more flags. Currently the only flag is
+ ``NTSYNC_WAIT_REALTIME``, which causes the timeout to be
+ measured against the REALTIME clock instead of MONOTONIC.
+ * - ``pad``
+ - Unused, must be set to zero.
+
+ This function attempts to acquire one of the given objects. If unable
+ to do so, it sleeps until an object becomes signaled, subsequently
+ acquiring it, or the timeout expires. In the latter case the ioctl
+ fails with ``ETIMEDOUT``. The function only acquires one object, even
+ if multiple objects are signaled.
+
+ A semaphore is considered to be signaled if its count is nonzero, and
+ is acquired by decrementing its count by one. A mutex is considered
+ to be signaled if it is unowned or if its owner matches the ``owner``
+ argument, and is acquired by incrementing its recursion count by one
+ and setting its owner to the ``owner`` argument. An auto-reset event
+ is acquired by designaling it; a manual-reset event is not affected
+ by acquisition.
+
+ Acquisition is atomic and totally ordered with respect to other
+ operations on the same object. If two wait operations (with different
+ ``owner`` identifiers) are queued on the same mutex, only one is
+ signaled. If two wait operations are queued on the same semaphore,
+ and a value of one is posted to it, only one is signaled. The order
+ in which threads are signaled is not specified.
+
+ If an abandoned mutex is acquired, the ioctl fails with
+ ``EOWNERDEAD``. Although this is a failure return, the function may
+ otherwise be considered successful. The mutex is marked as owned by
+ the given owner (with a recursion count of 1) and as no longer
+ abandoned, and ``index`` is still set to the index of the mutex.
+
+ The ``alert`` argument is an "extra" event which can terminate the
+ wait, independently of all other objects. If members of ``objs`` and
+ ``alert`` are both simultaneously signaled, a member of ``objs`` will
+ always be given priority and acquired first.
+
+ It is valid to pass the same object more than once, including by
+ passing the same event in the ``objs`` array and in ``alert``. If a
+ wakeup occurs due to that object being signaled, ``index`` is set to
+ the lowest index corresponding to that object.
+
+ The function may fail with ``EINTR`` if a signal is received.
+
+.. c:macro:: NTSYNC_IOC_WAIT_ALL
+
+ Poll on a list of objects, atomically acquiring all of them. Takes a
+ pointer to struct :c:type:`ntsync_wait_args`, which is used
+ identically to ``NTSYNC_IOC_WAIT_ANY``, except that ``index`` is
+ always filled with zero on success if not woken via alert.
+
+ This function attempts to simultaneously acquire all of the given
+ objects. If unable to do so, it sleeps until all objects become
+ simultaneously signaled, subsequently acquiring them, or the timeout
+ expires. In the latter case the ioctl fails with ``ETIMEDOUT`` and no
+ objects are modified.
+
+ Objects may become signaled and subsequently designaled (through
+ acquisition by other threads) while this thread is sleeping. Only
+ once all objects are simultaneously signaled does the ioctl acquire
+ them and return. The entire acquisition is atomic and totally ordered
+ with respect to other operations on any of the given objects.
+
+ If an abandoned mutex is acquired, the ioctl fails with
+ ``EOWNERDEAD``. Similarly to ``NTSYNC_IOC_WAIT_ANY``, all objects are
+ nevertheless marked as acquired. Note that if multiple mutex objects
+ are specified, there is no way to know which were marked as
+ abandoned.
+
+ As with "any" waits, the ``alert`` argument is an "extra" event which
+ can terminate the wait. Critically, however, an "all" wait will
+ succeed if all members in ``objs`` are signaled, *or* if ``alert`` is
+ signaled. In the latter case ``index`` will be set to ``count``. As
+ with "any" waits, if both conditions are filled, the former takes
+ priority, and objects in ``objs`` will be acquired.
+
+ Unlike ``NTSYNC_IOC_WAIT_ANY``, it is not valid to pass the same
+ object more than once, nor is it valid to pass the same object in
+ ``objs`` and in ``alert``. If this is attempted, the function fails
+ with ``EINVAL``.
--
2.43.0
prev parent reply other threads:[~2024-02-19 22:39 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-19 22:38 [PATCH v2 00/31] NT synchronization primitive driver Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 01/31] ntsync: Introduce the ntsync driver and character device Elizabeth Figura
2024-02-22 10:56 ` Geert Uytterhoeven
2024-02-22 20:13 ` Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 02/31] ntsync: Introduce NTSYNC_IOC_CREATE_SEM Elizabeth Figura
2024-03-07 22:12 ` Greg Kroah-Hartman
2024-02-19 22:38 ` [PATCH v2 03/31] ntsync: Introduce NTSYNC_IOC_SEM_POST Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 04/31] ntsync: Introduce NTSYNC_IOC_WAIT_ANY Elizabeth Figura
2024-03-11 18:58 ` Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 05/31] ntsync: Introduce NTSYNC_IOC_WAIT_ALL Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 06/31] ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 07/31] ntsync: Introduce NTSYNC_IOC_MUTEX_UNLOCK Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 08/31] ntsync: Introduce NTSYNC_IOC_MUTEX_KILL Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 09/31] ntsync: Introduce NTSYNC_IOC_CREATE_EVENT Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 10/31] ntsync: Introduce NTSYNC_IOC_EVENT_SET Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 11/31] ntsync: Introduce NTSYNC_IOC_EVENT_RESET Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 12/31] ntsync: Introduce NTSYNC_IOC_EVENT_PULSE Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 13/31] ntsync: Introduce NTSYNC_IOC_SEM_READ Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 14/31] ntsync: Introduce NTSYNC_IOC_MUTEX_READ Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 15/31] ntsync: Introduce NTSYNC_IOC_EVENT_READ Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 16/31] ntsync: Introduce alertable waits Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 17/31] ntsync: Allow waits to use the REALTIME clock Elizabeth Figura
2024-02-20 7:01 ` Arnd Bergmann
2024-02-22 0:48 ` Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 18/31] selftests: ntsync: Add some tests for semaphore state Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 19/31] selftests: ntsync: Add some tests for mutex state Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 20/31] selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ANY Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 21/31] selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ALL Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 22/31] selftests: ntsync: Add some tests for wakeup signaling with WINESYNC_IOC_WAIT_ANY Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 23/31] selftests: ntsync: Add some tests for wakeup signaling with WINESYNC_IOC_WAIT_ALL Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 24/31] selftests: ntsync: Add some tests for manual-reset event state Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 25/31] selftests: ntsync: Add some tests for auto-reset " Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 26/31] selftests: ntsync: Add some tests for wakeup signaling with events Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 27/31] selftests: ntsync: Add tests for alertable waits Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 28/31] selftests: ntsync: Add some tests for wakeup signaling via alerts Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 29/31] selftests: ntsync: Add a stress test for contended waits Elizabeth Figura
2024-02-19 22:38 ` [PATCH v2 30/31] maintainers: Add an entry for ntsync Elizabeth Figura
2024-02-19 22:38 ` Elizabeth Figura [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240219223833.95710-32-zfigura@codeweavers.com \
--to=zfigura@codeweavers.com \
--cc=ahiler@codeweavers.com \
--cc=andrealmeid@igalia.com \
--cc=arnd@arndb.de \
--cc=corbet@lwn.net \
--cc=gregkh@linuxfoundation.org \
--cc=linux-api@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=luto@kernel.org \
--cc=peterz@infradead.org \
--cc=rdunlap@infradead.org \
--cc=shuah@kernel.org \
--cc=wine-devel@winehq.org \
--cc=wsa@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).