From: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
To: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	virtualization@lists.linux-foundation.org,
	linux-s390@vger.kernel.org
Cc: peterz@infradead.org, mingo@redhat.com, mpe@ellerman.id.au,
	paulus@samba.org, benh@kernel.crashing.org,
	paulmck@linux.vnet.ibm.com, waiman.long@hpe.com,
	will.deacon@arm.com, boqun.feng@gmail.com, dave@stgolabs.net,
	schwidefsky@de.ibm.com,
	Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Subject: [PATCH v2 4/4] kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
Date: Tue, 28 Jun 2016 10:43:11 -0400
Message-ID: <1467124991-13164-5-git-send-email-xinhui.pan@linux.vnet.ibm.com>
In-Reply-To: <1467124991-13164-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>

An over-committed guest with more vCPUs than pCPUs spends a large share of
its time in mutex_spin_on_owner() and rwsem_spin_on_owner(). The cause is
lock holder preemption: a waiter keeps spinning although the vCPU of the
lock owner is not running.

The kernel has an interface, bool vcpu_is_preempted(int cpu), that reports
whether a vCPU has been preempted. Break out of the spin loops when it
returns true.
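
As a rough illustration of the check described above, here is a minimal
userspace sketch. It is not the kernel code: struct task, the
vcpu_preempted[] array, NR_VCPUS and both helper names are hypothetical
stand-ins, and the need_resched() test done up front by the real
mutex_can_spin_on_owner() is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_VCPUS 4

/* Hypothetical stand-in for per-vCPU state a hypervisor would publish. */
static bool vcpu_preempted[NR_VCPUS];

/* Userspace model of the interface; the real one is arch-specific. */
static bool vcpu_is_preempted(int cpu)
{
	return vcpu_preempted[cpu];
}

/* Reduced model of the task_struct fields the patch reads. */
struct task {
	int cpu;	/* models task_cpu(owner) */
	bool on_cpu;	/* models owner->on_cpu */
};

/* Models the condition tested inside the two *_spin_on_owner() loops. */
static bool should_stop_spinning(const struct task *owner, bool resched)
{
	return !owner->on_cpu || resched || vcpu_is_preempted(owner->cpu);
}

/* Models mutex_can_spin_on_owner(): with no owner, spinning is allowed. */
static bool can_spin_on_owner(const struct task *owner)
{
	if (owner == NULL)
		return true;
	return owner->on_cpu && !vcpu_is_preempted(owner->cpu);
}
```

In this model, marking the owner's vCPU as preempted makes
should_stop_spinning() return true and can_spin_on_owner() return false,
even though on_cpu is still set from the guest's point of view.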

test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report

before patch:
20.68%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner
 8.45%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 4.12%  sched-messaging  [kernel.vmlinux]  [k] system_call
 3.01%  sched-messaging  [kernel.vmlinux]  [k] system_call_common
 2.83%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
 2.64%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 2.00%  sched-messaging  [kernel.vmlinux]  [k] osq_lock

after patch:
 9.99%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 5.28%  sched-messaging  [unknown]         [H] 0xc0000000000768e0
 4.27%  sched-messaging  [kernel.vmlinux]  [k] __copy_tofrom_user_power7
 3.77%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
 3.24%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.02%  sched-messaging  [kernel.vmlinux]  [k] system_call
 2.69%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task

Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
 kernel/locking/mutex.c      | 15 +++++++++++++--
 kernel/locking/rwsem-xadd.c | 16 +++++++++++++---
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 79d2d76..ef0451b2 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -236,7 +236,13 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 		 */
 		barrier();
 
-		if (!owner->on_cpu || need_resched()) {
+		/*
+		 * Use vcpu_is_preempted() to detect lock holder preemption
+		 * and break. vcpu_is_preempted() is a macro defined as false
+		 * if the arch does not support the vcpu preempted check.
+		 */
+		if (!owner->on_cpu || need_resched() ||
+				vcpu_is_preempted(task_cpu(owner))) {
 			ret = false;
 			break;
 		}
@@ -261,8 +267,13 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 
 	rcu_read_lock();
 	owner = READ_ONCE(lock->owner);
+
+	/*
+	 * Because of lock holder preemption, skip spinning if the owner
+	 * task is not on a cpu or its vcpu is preempted.
+	 */
 	if (owner)
-		retval = owner->on_cpu;
+		retval = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
 	rcu_read_unlock();
 	/*
 	 * if lock->owner is not set, the mutex owner may have just acquired
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 09e30c6..828ca7c 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -319,7 +319,11 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 		goto done;
 	}
 
-	ret = owner->on_cpu;
+	/*
+	 * Because of lock holder preemption, skip spinning if the owner
+	 * task is not on a cpu or its vcpu is preempted.
+	 */
+	ret = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
 done:
 	rcu_read_unlock();
 	return ret;
@@ -340,8 +344,14 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 		 */
 		barrier();
 
-		/* abort spinning when need_resched or owner is not running */
-		if (!owner->on_cpu || need_resched()) {
+		/*
+		 * abort spinning when need_resched or owner is not running or
+		 * owner's cpu is preempted. vcpu_is_preempted() is a macro
+		 * defined as false if the arch does not support the vcpu
+		 * preempted check.
+		 */
+		if (!owner->on_cpu || need_resched() ||
+				vcpu_is_preempted(task_cpu(owner))) {
 			rcu_read_unlock();
 			return false;
 		}
-- 
2.4.11
