linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Jonathan Corbet <corbet@lwn.net>
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Scott J Norton <scott.norton@hpe.com>,
	Waiman Long <longman@redhat.com>
Subject: [PATCH-tip v5 18/21] TP-futex, doc: Update TP futexes document on shared locking
Date: Fri,  3 Feb 2017 13:03:51 -0500	[thread overview]
Message-ID: <1486145034-22210-19-git-send-email-longman@redhat.com> (raw)
In-Reply-To: <1486145034-22210-1-git-send-email-longman@redhat.com>

The tp-futex.txt was updated to add description about shared locking
support.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/tp-futex.txt | 163 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 143 insertions(+), 20 deletions(-)

diff --git a/Documentation/tp-futex.txt b/Documentation/tp-futex.txt
index 5324ee0..d4f0ed6 100644
--- a/Documentation/tp-futex.txt
+++ b/Documentation/tp-futex.txt
@@ -68,22 +68,58 @@ the kernel using the FUTEX_LOCK futex(2) syscall.
 Only the optional timeout parameter is being used by this futex(2)
 syscall.
 
-Inside the kernel, a kernel mutex is used for serialization among
-the futex waiters. Only the top lock waiter which is the owner of
-the serialization mutex is allowed to continuously spin and attempt
-to acquire the lock.  Other lock waiters will have one attempt to
-steal the lock before entering the mutex queues.
+A shared lock acquirer, on the other hand, has to do either one of
+the following 2 actions depends on the current value of the futex:
+
+ 1) If current futex value is 0, atomically put (FUTEX_SHARED + 1)
+    into the lower 30 bits of the futex.
+
+ 2) If the FUTEX_SHARED bit is set and the FUTEX_SHARED_UNLOCK bit
+    isn't, atomically increment the reader count in the lower 24 bits.
+    The other unused bits (24-27) can be used for other management purpose
+    and will be ignored by the kernel.
+
+Any failure to both of the above actions will lead to the following
+new FUTEX_LOCK_SHARED futex(2) syscall.
+
+  futex(uaddr, FUTEX_LOCK_SHARED, 0, timeout, NULL, 0);
+
+The special FUTEX_SHARED bit (bit 30) is used to distinguish between a
+shared lock and an exclusive lock. If the FUTEX_SHARED bit isn't set,
+the lower 29 bits contain the TID of the exclusive lock owner. If
+the FUTEX_SHARED bit is set, the lower 29 bits contain the number of
+tasks owning the shared lock.  Therefore, only 29 bits are allowed
+to be used by the TID.
+
+The FUTEX_SHARED_UNLOCK bit can be used by userspace to prevent racing
+and to indicate that a shared unlock is in progress as it will disable
+any shared trylock attempt in the kernel.
+
+Inside the kernel, a kernel mutex is used for serialization among the
+futex waiters. Only the top lock waiter which is the owner of the
+serialization mutex is allowed to continuously spin and attempt to
+acquire the lock.  Other lock waiters will have one or two attempts
+to steal the lock before entering the mutex queues.
 
 When the exclusive futex lock owner is no longer running, the top
 waiter will set the FUTEX_WAITERS bit before going to sleep. This is
 to make sure the futex owner will go into the kernel at unlock time
 to wake up the top waiter.
 
+When the futex is shared by multiple owners, there is no way to
+determine if all the shared owners are running or not. In this case,
+the top waiter will spin for a fixed amount of time if no reader
+activity is observed before giving up and going to sleep. Any change
+in the reader count will be considered reader activities and so will
+reset the timer. However, when the hand-off threshold has been reached,
+reader optimistic spinning timer reset will be disabled automatically.
+
 The return values of the above futex locking syscall, if non-negative,
-are status code that consists of 2 fields - the lock acquisition code
-(bits 0-7) and the number of sleeps (bits 8-30) in the optimistic
-spinning loop before acquiring the futex. A negative returned value
-means an error has happened.
+are status code that consists of 3 fields - the lock acquisition code
+(bits 0-7) and the number of sleeps (bits 16-30) in the optimistic
+spinning loop before acquiring the futex. Bits 8-15 are reserved
+for future extension. A negative returned value means an error has
+happened.
 
 The lock acquisition code can have the following values:
  a) 0   - lock stolen as non-top waiter
@@ -96,14 +132,21 @@ issue a FUTEX_UNLOCK futex(2) syscall to wake up the top waiter.
 
   futex(uaddr, FUTEX_UNLOCK, 0, NULL, NULL, 0);
 
-A return value of 1 from the FUTEX_UNLOCK futex(2) syscall indicates
-a task has been woken up. The syscall returns 0 if no sleeping task
-is woken. A negative value will be returned if an error happens.
+For a shared lock owner, unlock is done by atomically decrementing
+the reader count by one. If the resultant futex value has no reader
+remaining, the process has to atomically change the futex value from
+FUTEX_SHARED to 0. If that fails, it has to issue a FUTEX_UNLOCK_SHARED
+futex(2) syscall to wake up the top waiter.
+
+A return value of 1 from the FUTEX_UNLOCK or FUTEX_UNLOCK_SHARED
+futex(2) syscall indicates a task has been woken up. The syscall
+returns 0 if no sleeping task is woken. A negative value will be
+returned if an error happens.
 
-The error number returned by a FUTEX_UNLOCK syscall on an empty futex
-can be used to decide if the TP futex functionality is implemented
-in the kernel. If it is present, an EPERFM error will be returned.
-Otherwise it will return ENOSYS.
+The error number returned by a FUTEX_UNLOCK or FUTEX_UNLOCK_SHARED
+syscall on an empty futex can be used to decide if the TP futex
+functionality is implemented in the kernel. If it is present, an
+EPERFM error will be returned.  Otherwise it will return ENOSYS.
 
 TP futexes require the kernel to have SMP support as well as support
 for the cmpxchg functionality. For architectures that don't support
@@ -118,6 +161,13 @@ is the PID wrap-around issue where another process with the same PID
 as the real futex owner because of PID wrap-around is mis-identified
 as the owner of a futex.
 
+Shared locking TP futexes, on the other hand, cannot be used with
+robust futexes. The unexpected death of a shared lock owner may
+leave the futex permanently in the shared state leading to deadlock
+of exclusive lock waiters. If necessary, some kind of timeout and
+userspace lock management system have to be used to resolve this
+deadlock problem.
+
 Usage Scenario
 --------------
 
@@ -129,6 +179,24 @@ used when the critical section is long and prone to sleeping. However,
 it may not have the performance gain when compared with a wait-wake
 futex in this case.
 
+A TP futex can also be used to implement a userspace reader-writer
+lock (rwlock) where the writers use the FUTEX_LOCK and FUTEX_UNLOCK
+futex(2) syscalls and the readers used the FUTEX_LOCK_SHARED and
+FUTEX_UNLOCK_SHARED futex(2) syscalls. Beside providing a better
+performance compared with wait-wake futexes, other advantages of
+using the the TP futexes for rwlocks are:
+
+ 1) The simplicity and ease of maintenance of the userspace locking
+    codes. There is only one version of rwlock using TP futex versus
+    a reader-preferring variant and/or a writer-preferring variant
+    when using the wait-wake futexes.
+
+ 2) TP futex is lock-starvation free. That may not be the case
+    with rwlocks implemented with wait-wake futexes especially the
+    reader-preferring variants where the writers can be starved of
+    the lock under certain circumstances. Some writer-preferring
+    rwlock variants may also starve readers of getting the lock.
+
 The wait-wake futexes are more versatile as they can also be used to
 implement other locking primitives like semaphores or conditional
 variables.  So the TP futex is not a direct replacement of the
@@ -138,12 +206,12 @@ futex is likely a better option than the wait-wake futex.
 Sample Code
 -----------
 
-The following are sample code to implement simple mutex lock and
-unlock functions.
+The following are sample code to implement simple mutex or writer
+lock and unlock functions.
 
 __thread int thread_id;
 
-void mutex_lock(int *faddr)
+void write_lock(int *faddr)
 {
 	if (cmpxchg(faddr, 0, thread_id) == 0)
 		return;
@@ -151,7 +219,7 @@ void mutex_lock(int *faddr)
 		;
 }
 
-void mutex_unlock(int *faddr)
+void write_unlock(int *faddr)
 {
 	int old, fval;
 
@@ -159,3 +227,58 @@ void mutex_unlock(int *faddr)
 		return;
 	futex(faddr, FUTEX_UNLOCK, ...);
 }
+
+The following sample code can be used to implement shared or reader
+lock and unlock functions.
+
+void read_lock(int *faddr)
+{
+	int val = cmpxchg(faddr, 0, FUTEX_SHARED + 1);
+
+	if (!val)
+		return;
+	for (;;) {
+		int old = val, new;
+
+		if (!old)
+			new = FUTEX_SHARED + 1;
+		else if (!(old & FUTEX_SHARED) ||
+			 (old & ((~FUTEX_SHARED_TID_MASK)|
+				   FUTEX_SHARED_UNLOCK)))
+			break;
+		else
+			new = old + 1;
+		val = cmpxchg(faddr, old, new);
+		if (val == old)
+			return;
+	}
+	while (futex(faddr, FUTEX_LOCK_SHARED, ...) < 0)
+		;
+}
+
+void read_unlock(int *faddr)
+{
+	int val = xadd(faddr, -1) - 1;
+	int old;
+
+	for (;;) {
+		/*
+		 * Return if not the last reader, not in shared locking
+		 * mode or the unlock bit has been set.
+		 */
+		if (!(val & FUTEX_SHARED) ||
+		     (val & (FUTEX_SHARED_UNLOCK|FUTEX_SCNT_MASK)))
+			return;
+		if (val & ~FUTEX_SHARED_TID_MASK) {
+			old = cmpxchg(faddr, val, val|FUTEX_SHARED_UNLOCK);
+			if (old == val)
+				break;	/* Do FUTEX_UNLOCK_SHARED */
+		} else {
+			old = cmpxchg(faddr, val, 0);
+			if (old == val)
+				return;
+		}
+		val = old;
+	}
+	futex(faddr, FUTEX_UNLOCK_SHARED, ...);
+}
-- 
1.8.3.1

  parent reply	other threads:[~2017-02-03 18:04 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-03 18:03 [PATCH-tip v5 00/21] futex: Introducing throughput-optimized (TP) futexes Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 01/21] perf bench: New microbenchmark for userspace mutex performance Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 02/21] perf bench: New microbenchmark for userspace rwlock performance Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 03/21] futex: Consolidate duplicated timer setup code Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 04/21] futex: Rename futex_pi_state to futex_state Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 05/21] futex: Add helpers to get & cmpxchg futex value without lock Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 06/21] futex: Consolidate pure pi_state_list add & delete codes to helpers Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 07/21] futex: Add a new futex type field into futex_state Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 08/21] futex: Allow direct attachment of futex_state objects to hash bucket Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 09/21] futex: Introduce throughput-optimized (TP) futexes Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 10/21] TP-futex: Enable robust handling Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 11/21] TP-futex: Implement lock handoff to prevent lock starvation Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 12/21] TP-futex: Return status code on FUTEX_LOCK calls Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 13/21] TP-futex: Add timeout support Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 14/21] TP-futex, doc: Add TP futexes documentation Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 15/21] TP-futex: Support userspace reader/writer locks Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 16/21] TP-futex: Enable kernel reader lock stealing Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 17/21] TP-futex: Group readers together in wait queue Waiman Long
2017-02-03 18:23   ` valdis.kletnieks
2017-02-03 18:42     ` Waiman Long
2017-02-03 19:26       ` valdis.kletnieks
2017-02-03 18:03 ` Waiman Long [this message]
2017-02-03 18:03 ` [PATCH-tip v5 19/21] perf bench: Extend mutex/rwlock futex suite to test TP futexes Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 20/21] sched, TP-futex: Make wake_up_q() return wakeup count Waiman Long
2017-02-03 18:03 ` [PATCH-tip v5 21/21] futex: Dump internal futex state via debugfs Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1486145034-22210-19-git-send-email-longman@redhat.com \
    --to=longman@redhat.com \
    --cc=acme@kernel.org \
    --cc=corbet@lwn.net \
    --cc=dave@stgolabs.net \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=scott.norton@hpe.com \
    --cc=tglx@linutronix.de \
    --cc=umgwanakikbuti@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).