All of lore.kernel.org
 help / color / mirror / Atom feed
From: Waiman Long <Waiman.Long@hp.com>
To: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
	Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
	Andi Kleen <andi@firstfloor.org>,
	Michel Lespinasse <walken@google.com>,
	Alok Kataria <akataria@vmware.com>,
	linux-arch@vger.kernel.org, Gleb Natapov <gleb@redhat.com>,
	x86@kernel.org, xen-devel@lists.xenproject.org,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Scott J Norton <scott.norton@hp.com>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Steven Rostedt <rostedt@goodmis.org>,
	Chris Wright <chrisw@sous-sol.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Aswin Chandramouleeswaran <aswin@hp.com>,
	Chegu Vinod <chegu_vinod@hp.com>,
	Waiman Long <Waiman.Long@hp.com>,
	linux-kernel@vger.kernel.org,
	David Vrabel <david.vrabel@citrix.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linu
Subject: [PATCH v6 03/11] qspinlock: More optimized code for smaller NR_CPUS
Date: Wed, 12 Mar 2014 14:54:50 -0400	[thread overview]
Message-ID: <1394650498-30118-4-git-send-email-Waiman.Long@hp.com> (raw)
In-Reply-To: <1394650498-30118-1-git-send-email-Waiman.Long@hp.com>

For architectures that support atomic operations on smaller 8 or
16 bits data types. It is possible to simplify the code and produce
slightly better optimized code at the expense of smaller number of
supported CPUs.

The qspinlock code can support up to a maximum of 4M-1 CPUs. With
less than 16K CPUs, it is possible to squeeze the queue code into a
2-byte short word which can be accessed directly as a 16-bit short
data type. This enables the simplification of the queue code exchange
portion of the slowpath code.

This patch introduces a new macro _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS
which can now be defined in an architecture specific qspinlock.h header
file to indicate its support for smaller atomic operation data types.
This macro triggers the replacement of some of the generic functions
by more optimized versions.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 arch/x86/include/asm/qspinlock.h      |   14 ++++-
 include/asm-generic/qspinlock_types.h |    8 ++-
 kernel/locking/qspinlock.c            |  100 +++++++++++++++++++++++++++++++++
 3 files changed, 120 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 44cefee..acbe155 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -8,11 +8,23 @@
 #define _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS
 
 /*
+ * As the qcode will be accessed as a 16-bit word, no offset is needed
+ */
+#define _QCODE_VAL_OFFSET	0
+
+/*
  * x86-64 specific queue spinlock union structure
+ * Besides the slock and lock fields, the other fields are only
+ * valid with less than 16K CPUs.
  */
 union arch_qspinlock {
 	struct qspinlock slock;
-	u8		 lock;	/* Lock bit	*/
+	struct {
+		u8  lock;	/* Lock bit	*/
+		u8  reserved;
+		u16 qcode;	/* Queue code	*/
+	};
+	u32 qlcode;		/* Complete lock word */
 };
 
 #define	queue_spin_unlock queue_spin_unlock
diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h
index df981d0..3a02a9e 100644
--- a/include/asm-generic/qspinlock_types.h
+++ b/include/asm-generic/qspinlock_types.h
@@ -48,7 +48,13 @@ typedef struct qspinlock {
 	atomic_t	qlcode;	/* Lock + queue code */
 } arch_spinlock_t;
 
-#define _QCODE_OFFSET		8
+#if CONFIG_NR_CPUS >= (1 << 14)
+# define _Q_MANY_CPUS
+# define _QCODE_OFFSET	8
+#else
+# define _QCODE_OFFSET	16
+#endif
+
 #define _QSPINLOCK_LOCKED	1U
 #define	_QSPINLOCK_LOCK_MASK	0xff
 
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index f1a8102..52d3580 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -62,6 +62,10 @@
  * Bits 0-1 : queue node index (4 nodes)
  * Bits 2-23: CPU number + 1   (4M - 1 CPUs)
  *
+ * The 16-bit queue node code is divided into the following 2 fields:
+ * Bits 0-1 : queue node index (4 nodes)
+ * Bits 2-15: CPU number + 1   (16K - 1 CPUs)
+ *
  * A queue node code of 0 indicates that no one is waiting for the lock.
  * As the value 0 cannot be used as a valid CPU number. We need to add
  * 1 to it before putting it into the queue code.
@@ -93,6 +97,101 @@ struct qnode_set {
  */
 static DEFINE_PER_CPU_ALIGNED(struct qnode_set, qnset) = { { { 0 } }, 0 };
 
+/*
+ ************************************************************************
+ * The following optimized codes are for architectures that support:	*
+ *  1) Atomic byte and short data write					*
+ *  2) Byte and short data exchange and compare-exchange instructions	*
+ *									*
+ * For those architectures, their asm/qspinlock.h header file should	*
+ * define the followings in order to use the optimized codes.		*
+ *  1) The _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS macro			*
+ *  2) A "union arch_qspinlock" structure that include the individual	*
+ *     fields of the qspinlock structure, including:			*
+ *      o slock     - the qspinlock structure				*
+ *      o lock      - the lock byte					*
+ *      o qcode     - the queue node code				*
+ *      o qlcode    - the 32-bit qspinlock word				*
+ *									*
+ ************************************************************************
+ */
+#ifdef _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS
+#ifndef _Q_MANY_CPUS
+/*
+ * With less than 16K CPUs, the following optimizations are possible with
+ * architectures that allows atomic 8/16 bit operations:
+ *  1) The 16-bit queue code can be accessed or modified directly as a
+ *     16-bit short value without disturbing the first 2 bytes.
+ */
+#define queue_encode_qcode(cpu, idx)	(((cpu) + 1) << 2 | (idx))
+
+#define queue_code_xchg queue_code_xchg
+/**
+ * queue_code_xchg - exchange a queue code value
+ * @lock : Pointer to queue spinlock structure
+ * @ocode: Old queue code in the lock [OUT]
+ * @ncode: New queue code to be exchanged
+ * Return: 0 is always returned
+ */
+static inline int queue_code_xchg(struct qspinlock *lock, u32 *ocode, u32 ncode)
+{
+	union arch_qspinlock *qlock = (union arch_qspinlock *)lock;
+
+	*ocode = xchg(&qlock->qcode, (u16)ncode);
+	return 0;
+}
+
+#define queue_spin_trylock_and_clr_qcode queue_spin_trylock_and_clr_qcode
+/**
+ * queue_spin_trylock_and_clr_qcode - Try to lock & clear qcode simultaneously
+ * @lock : Pointer to queue spinlock structure
+ * @qcode: The supposedly current qcode value
+ * Return: true if successful, false otherwise
+ */
+static inline int
+queue_spin_trylock_and_clr_qcode(struct qspinlock *lock, u32 qcode)
+{
+	qcode <<= _QCODE_OFFSET;
+	return atomic_cmpxchg(&lock->qlcode, qcode, _QSPINLOCK_LOCKED) == qcode;
+}
+
+#define queue_get_lock_qcode queue_get_lock_qcode
+/**
+ * queue_get_lock_qcode - get the lock & qcode values
+ * @lock  : Pointer to queue spinlock structure
+ * @qcode : Pointer to the returned qcode value
+ * @mycode: My qcode value
+ * Return : != 0 if lock is not available
+ *	     = 0 if lock is free
+ *
+ * It is considered locked when either the lock bit or the wait bit is set.
+ */
+static inline int
+queue_get_lock_qcode(struct qspinlock *lock, u32 *qcode, u32 mycode)
+{
+	u32 qlcode = (u32)atomic_read(&lock->qlcode);
+
+	*qcode = qlcode >> _QCODE_OFFSET;
+	return qlcode & _QSPINLOCK_LOCKED;
+}
+#endif /* _Q_MANY_CPUS */
+
+/**
+ * queue_spin_setlock - try to acquire the lock by setting the lock bit
+ * @lock: Pointer to queue spinlock structure
+ * Return: 1 if lock bit set successfully, 0 if failed
+ */
+static __always_inline int queue_spin_setlock(struct qspinlock *lock)
+{
+	union arch_qspinlock *qlock = (union arch_qspinlock *)lock;
+
+	return cmpxchg(&qlock->lock, 0, _QSPINLOCK_LOCKED) == 0;
+}
+#else /*  _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS  */
+/*
+ * Generic functions for architectures that do not support atomic
+ * byte or short data types.
+ */
 /**
  *_queue_spin_setlock - try to acquire the lock by setting the lock bit
  * @lock: Pointer to queue spinlock structure
@@ -107,6 +206,7 @@ static __always_inline int queue_spin_setlock(struct qspinlock *lock)
 			return 1;
 	return 0;
 }
+#endif /* _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS */
 
 /*
  ************************************************************************
-- 
1.7.1

  parent reply	other threads:[~2014-03-12 18:54 UTC|newest]

Thread overview: 135+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-12 18:54 [PATCH v6 00/11] qspinlock: a 4-byte queue spinlock with PV support Waiman Long
2014-03-12 18:54 ` [PATCH v6 01/11] qspinlock: A generic 4-byte queue spinlock implementation Waiman Long
2014-03-12 18:54 ` Waiman Long
2014-03-12 18:54 ` [PATCH v6 02/11] qspinlock, x86: Enable x86-64 to use queue spinlock Waiman Long
2014-03-12 18:54 ` Waiman Long
2014-03-12 18:54 ` [PATCH v6 03/11] qspinlock: More optimized code for smaller NR_CPUS Waiman Long
2014-03-12 18:54 ` Waiman Long [this message]
2014-03-12 18:54 ` [PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks Waiman Long
2014-03-12 19:08   ` Waiman Long
2014-03-12 19:08     ` Waiman Long
2014-03-13 13:57     ` Peter Zijlstra
2014-03-13 13:57     ` Peter Zijlstra
2014-03-13 13:57       ` Peter Zijlstra
2014-03-17 17:23       ` Waiman Long
2014-03-17 17:23       ` Waiman Long
2014-03-17 17:23         ` Waiman Long
2014-03-12 19:08   ` Waiman Long
2014-03-12 18:54 ` Waiman Long
2014-03-12 18:54 ` [PATCH v6 05/11] pvqspinlock, x86: Allow unfair spinlock in a PV guest Waiman Long
2014-03-13 10:54   ` David Vrabel
2014-03-13 10:54   ` David Vrabel
2014-03-13 10:54     ` David Vrabel
2014-03-13 13:16     ` Paolo Bonzini
2014-03-13 13:16     ` Paolo Bonzini
2014-03-13 13:16       ` Paolo Bonzini
2014-03-13 13:16       ` Paolo Bonzini
2014-03-17 19:05       ` Konrad Rzeszutek Wilk
2014-03-17 19:05       ` Konrad Rzeszutek Wilk
2014-03-17 19:05         ` Konrad Rzeszutek Wilk
2014-03-17 19:05         ` Konrad Rzeszutek Wilk
2014-03-18  8:14         ` Paolo Bonzini
2014-03-18  8:14         ` Paolo Bonzini
2014-03-18  8:14           ` Paolo Bonzini
2014-03-19  3:15           ` Waiman Long
2014-03-19  3:15           ` Waiman Long
2014-03-19  3:15           ` Waiman Long
2014-03-19  3:15             ` Waiman Long
2014-03-19  3:15             ` Waiman Long
2014-03-19 10:07             ` Paolo Bonzini
2014-03-19 16:58               ` Waiman Long
2014-03-19 16:58               ` Waiman Long
2014-03-19 16:58               ` Waiman Long
2014-03-19 17:08                 ` Paolo Bonzini
2014-03-19 17:08                 ` Paolo Bonzini
2014-03-19 17:08                   ` Paolo Bonzini
2014-03-19 10:07             ` Paolo Bonzini
2014-03-19 10:07             ` Paolo Bonzini
2014-03-13 19:03     ` Waiman Long
2014-03-13 19:03     ` Waiman Long
2014-03-13 19:03       ` Waiman Long
2014-03-13 15:15   ` Peter Zijlstra
2014-03-13 15:15     ` Peter Zijlstra
2014-03-13 20:05     ` Waiman Long
2014-03-13 20:05     ` Waiman Long
2014-03-13 20:05       ` Waiman Long
2014-03-14  8:30       ` Peter Zijlstra
2014-03-14  8:30         ` Peter Zijlstra
2014-03-14  8:48         ` Paolo Bonzini
2014-03-14  8:48         ` Paolo Bonzini
2014-03-14  8:48           ` Paolo Bonzini
2014-03-17 17:44         ` Waiman Long
2014-03-17 17:44         ` Waiman Long
2014-03-17 17:44           ` Waiman Long
2014-03-17 18:54           ` Peter Zijlstra
2014-03-17 18:54           ` Peter Zijlstra
2014-03-17 18:54             ` Peter Zijlstra
2014-03-18  8:16             ` Paolo Bonzini
2014-03-18  8:16               ` Paolo Bonzini
2014-03-18  8:16             ` Paolo Bonzini
2014-03-19  3:08             ` Waiman Long
2014-03-19  3:08             ` Waiman Long
2014-03-19  3:08               ` Waiman Long
2014-03-17 19:10           ` Konrad Rzeszutek Wilk
2014-03-17 19:10             ` Konrad Rzeszutek Wilk
2014-03-19  3:11             ` Waiman Long
2014-03-19  3:11             ` Waiman Long
2014-03-19  3:11               ` Waiman Long
2014-03-19 15:25               ` Konrad Rzeszutek Wilk
2014-03-19 15:25               ` Konrad Rzeszutek Wilk
2014-03-19 15:25                 ` Konrad Rzeszutek Wilk
2014-03-17 19:10           ` Konrad Rzeszutek Wilk
2014-03-14  8:30       ` Peter Zijlstra
2014-03-13 15:15   ` Peter Zijlstra
2014-03-12 18:54 ` Waiman Long
2014-03-12 18:54 ` [PATCH v6 06/11] pvqspinlock, x86: Allow unfair queue spinlock in a KVM guest Waiman Long
2014-03-12 18:54 ` Waiman Long
2014-03-12 18:54 ` [PATCH v6 07/11] pvqspinlock, x86: Allow unfair queue spinlock in a XEN guest Waiman Long
2014-03-12 18:54 ` Waiman Long
2014-03-12 18:54 ` [PATCH v6 08/11] pvqspinlock, x86: Rename paravirt_ticketlocks_enabled Waiman Long
2014-03-12 18:54 ` Waiman Long
2014-03-12 18:54 ` [PATCH RFC v6 09/11] pvqspinlock, x86: Add qspinlock para-virtualization support Waiman Long
2014-03-12 18:54 ` Waiman Long
2014-03-13 11:21   ` David Vrabel
2014-03-13 11:21     ` David Vrabel
2014-03-13 13:57     ` Paolo Bonzini
2014-03-13 13:57     ` Paolo Bonzini
2014-03-13 13:57       ` Paolo Bonzini
2014-03-13 13:57       ` Paolo Bonzini
2014-03-13 19:49       ` Waiman Long
2014-03-13 19:49       ` Waiman Long
2014-03-13 19:49       ` Waiman Long
2014-03-13 19:49         ` Waiman Long
2014-03-13 19:49         ` Waiman Long
2014-03-14  9:44         ` Paolo Bonzini
2014-03-14  9:44         ` Paolo Bonzini
2014-03-14  9:44         ` Paolo Bonzini
2014-03-13 13:57     ` Paolo Bonzini
2014-03-13 19:05     ` Waiman Long
2014-03-13 19:05       ` Waiman Long
2014-03-13 19:05     ` Waiman Long
2014-03-13 11:21   ` David Vrabel
2014-03-12 18:54 ` [PATCH RFC v6 10/11] pvqspinlock, x86: Enable qspinlock PV support for KVM Waiman Long
2014-03-12 18:54 ` Waiman Long
2014-03-13 13:59   ` Paolo Bonzini
2014-03-13 13:59     ` Paolo Bonzini
2014-03-13 19:13     ` Waiman Long
2014-03-13 19:13     ` Waiman Long
2014-03-14  8:42       ` Paolo Bonzini
2014-03-14  8:42       ` Paolo Bonzini
2014-03-14  8:42         ` Paolo Bonzini
2014-03-17 17:47         ` Waiman Long
2014-03-17 17:47         ` Waiman Long
2014-03-17 17:47           ` Waiman Long
2014-03-18  8:18           ` Paolo Bonzini
2014-03-18  8:18             ` Paolo Bonzini
2014-03-18  8:18           ` Paolo Bonzini
2014-03-13 13:59   ` Paolo Bonzini
2014-03-13 15:25   ` Peter Zijlstra
2014-03-13 15:25   ` Peter Zijlstra
2014-03-13 15:25     ` Peter Zijlstra
2014-03-13 20:09     ` Waiman Long
2014-03-13 20:09     ` Waiman Long
2014-03-13 20:09       ` Waiman Long
2014-03-12 18:54 ` [PATCH RFC v6 11/11] pvqspinlock, x86: Enable qspinlock PV support for XEN Waiman Long
2014-03-12 18:54 ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1394650498-30118-4-git-send-email-Waiman.Long@hp.com \
    --to=waiman.long@hp.com \
    --cc=akataria@vmware.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=arnd@arndb.de \
    --cc=aswin@hp.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=chegu_vinod@hp.com \
    --cc=chrisw@sous-sol.org \
    --cc=david.vrabel@citrix.com \
    --cc=gleb@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jeremy@goop.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=raghavendra.kt@linux.vnet.ibm.com \
    --cc=rostedt@goodmis.org \
    --cc=rusty@rustcorp.com.au \
    --cc=scott.norton@hp.com \
    --cc=tglx@linutronix.de \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=walken@google.com \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.