linux-kernel.vger.kernel.org archive mirror
* [PATCH -tip 0/2] kernel/smp: Small csd_lock optimizations
@ 2016-03-10  1:55 Davidlohr Bueso
  2016-03-10  1:55 ` [PATCH 1/2] kernel/smp: Explicitly inline csd_lock helpers Davidlohr Bueso
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Davidlohr Bueso @ 2016-03-10  1:55 UTC
  To: mingo; +Cc: peterz, dave, linux-kernel

From: Davidlohr Bueso <dave@stgolabs.net>

Hi,

Justifications are in each patch; there is a slight impact (patch 2)
on some TLB-flush-intensive benchmarks (albeit they use IPI batching
nowadays). Specifically, for the pft benchmark on a 12-core box:

pft faults
                              4.4                         4.4
                          vanilla                         smp
Hmean    faults/cpu-1   801432.1608 (  0.00%)  795719.8859 ( -0.71%)
Hmean    faults/cpu-3   702578.6659 (  0.00%)  752796.6960 (  7.15%)
Hmean    faults/cpu-5   606080.3473 (  0.00%)  595890.0451 ( -1.68%)
Hmean    faults/cpu-7   460369.0724 (  0.00%)  485283.6343 (  5.41%)
Hmean    faults/cpu-12  294445.4701 (  0.00%)  298300.6011 (  1.31%)
Hmean    faults/cpu-18  213156.0860 (  0.00%)  213584.2741 (  0.20%)
Hmean    faults/cpu-24  153104.2995 (  0.00%)  153198.8473 (  0.06%)
Hmean    faults/sec-1   796329.3184 (  0.00%)  614222.4594 (-22.87%)
Hmean    faults/sec-3  1947806.7372 (  0.00%) 2169267.1582 ( 11.37%)
Hmean    faults/sec-5  2611152.0422 (  0.00%) 2544652.6871 ( -2.55%)
Hmean    faults/sec-7  2493705.4668 (  0.00%) 2674847.5270 (  7.26%)
Hmean    faults/sec-12 2583139.7724 (  0.00%) 2614404.6002 (  1.21%)
Hmean    faults/sec-18 2661410.8170 (  0.00%) 2683427.0703 (  0.83%)
Hmean    faults/sec-24 2670463.4814 (  0.00%) 2666221.6332 ( -0.16%)
Stddev   faults/cpu-1    27537.6676 (  0.00%)   25753.4945 (  6.48%)
Stddev   faults/cpu-3    62616.8041 (  0.00%)   44728.0990 ( 28.57%)
Stddev   faults/cpu-5    70976.9184 (  0.00%)   74720.5716 ( -5.27%)
Stddev   faults/cpu-7    47426.5952 (  0.00%)   32758.2705 ( 30.93%)
Stddev   faults/cpu-12    6951.8792 (  0.00%)    9097.0782 (-30.86%)
Stddev   faults/cpu-18    4293.1696 (  0.00%)    5826.9446 (-35.73%)
Stddev   faults/cpu-24    3195.0939 (  0.00%)    3373.7230 ( -5.59%)
Stddev   faults/sec-1    27315.3093 (  0.00%)  148601.7795 (-444.02%)
Stddev   faults/sec-3   271560.5941 (  0.00%)  193681.0177 ( 28.68%)
Stddev   faults/sec-5   429633.7378 (  0.00%)  458426.3306 ( -6.70%)
Stddev   faults/sec-7   338229.0746 (  0.00%)  226146.3450 ( 33.14%)
Stddev   faults/sec-12   57766.4604 (  0.00%)   82734.3638 (-43.22%)
Stddev   faults/sec-18  118572.1909 (  0.00%)  134966.7210 (-13.83%)
Stddev   faults/sec-24   57452.7350 (  0.00%)   57542.7755 ( -0.16%)

                 4.4         4.4
             vanilla         smp
User           11.91       11.85
System        197.11      194.69
Elapsed        44.24       40.26

While the single-thread result is an outlier, overall we don't seem
to do any harm (it is within the noise range). Give or take, the
patches at least make some sense afaict.
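
For readers without the tree at hand, a rough sketch of the data
being guarded (paraphrased from v4.4-era include/linux/smp.h and
kernel/smp.c; check the tree for the exact definitions):

	/* Request slot handed to remote cpus by smp_call_function*(). */
	struct call_single_data {
		struct llist_node llist;   /* entry on target cpu's queue */
		smp_call_func_t func;      /* function to run remotely */
		void *info;                /* argument passed to func */
		unsigned int flags;        /* CSD_FLAG_LOCK lives here */
	};

	/* kernel/smp.c: LOCK is set while the csd is still in flight. */
	enum {
		CSD_FLAG_LOCK		= 0x01,
		CSD_FLAG_SYNCHRONOUS	= 0x02,
	};

csd_lock()/csd_unlock() set and clear CSD_FLAG_LOCK around a
cross-cpu function call, and csd_lock_wait() spins until the bit
clears; that spin is what patch 2 touches.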

Thanks!

Davidlohr Bueso (2):
  kernel/smp: Explicitly inline csd_lock helpers
  kernel/smp: Make csd_lock_wait use smp_cond_acquire

 kernel/smp.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--
2.1.4


* [PATCH 1/2] kernel/smp: Explicitly inline csd_lock helpers
  2016-03-10  1:55 [PATCH -tip 0/2] kernel/smp: Small csd_lock optimizations Davidlohr Bueso
@ 2016-03-10  1:55 ` Davidlohr Bueso
  2016-03-10 11:05   ` [tip:locking/core] locking/csd_lock: Explicitly inline csd_lock*() helpers tip-bot for Davidlohr Bueso
  2016-03-10  1:55 ` [PATCH 2/2] kernel/smp: Make csd_lock_wait use smp_cond_acquire Davidlohr Bueso
  2016-03-10  9:17 ` [PATCH -tip 0/2] kernel/smp: Small csd_lock optimizations Peter Zijlstra
  2 siblings, 1 reply; 6+ messages in thread
From: Davidlohr Bueso @ 2016-03-10  1:55 UTC
  To: mingo; +Cc: peterz, dave, linux-kernel, Davidlohr Bueso

From: Davidlohr Bueso <dave@stgolabs.net>

While the compiler tends to already do this for us (except for
csd_unlock), make it explicit. These helpers mainly deal with
the ->flags, are short-lived, and can be called, for example,
from smp_call_function_many().
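
(For reference: plain "inline" is only a hint that gcc is free to
ignore, while __always_inline forces the matter via a function
attribute. Roughly, from include/linux/compiler-gcc.h -- a sketch,
the exact form may differ per tree:

	/* Inline the function even where gcc would otherwise decline. */
	#define __always_inline	inline __attribute__((always_inline))

so the change below turns the compiler's usual decision into a
guarantee.)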

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/smp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 822ffb1ada3f..c91e00178f8f 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -105,13 +105,13 @@ void __init call_function_init(void)
  * previous function call. For multi-cpu calls its even more interesting
  * as we'll have to ensure no other cpu is observing our csd.
  */
-static void csd_lock_wait(struct call_single_data *csd)
+static __always_inline void csd_lock_wait(struct call_single_data *csd)
 {
 	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
 		cpu_relax();
 }
 
-static void csd_lock(struct call_single_data *csd)
+static __always_inline void csd_lock(struct call_single_data *csd)
 {
 	csd_lock_wait(csd);
 	csd->flags |= CSD_FLAG_LOCK;
@@ -124,7 +124,7 @@ static void csd_lock(struct call_single_data *csd)
 	smp_wmb();
 }
 
-static void csd_unlock(struct call_single_data *csd)
+static __always_inline void csd_unlock(struct call_single_data *csd)
 {
 	WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
 
-- 
2.1.4


* [PATCH 2/2] kernel/smp: Make csd_lock_wait use smp_cond_acquire
  2016-03-10  1:55 [PATCH -tip 0/2] kernel/smp: Small csd_lock optimizations Davidlohr Bueso
  2016-03-10  1:55 ` [PATCH 1/2] kernel/smp: Explicitly inline csd_lock helpers Davidlohr Bueso
@ 2016-03-10  1:55 ` Davidlohr Bueso
  2016-03-10 11:06   ` [tip:locking/core] locking/csd_lock: Use smp_cond_acquire() in csd_lock_wait() tip-bot for Davidlohr Bueso
  2016-03-10  9:17 ` [PATCH -tip 0/2] kernel/smp: Small csd_lock optimizations Peter Zijlstra
  2 siblings, 1 reply; 6+ messages in thread
From: Davidlohr Bueso @ 2016-03-10  1:55 UTC
  To: mingo; +Cc: peterz, dave, linux-kernel, Davidlohr Bueso

From: Davidlohr Bueso <dave@stgolabs.net>

We can micro-optimize this call and mildly relax the barrier
requirements by relying on ctrl + rmb, keeping the acquire
semantics. In addition, this is pretty much now the standard
way of busy-waiting under such constraints.
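
(For context, smp_cond_acquire() at the time was roughly the
following -- v4.5-era include/linux/compiler.h, quoted from memory,
check the tree:

	#define smp_cond_acquire(cond)	do {		\
		while (!(cond))				\
			cpu_relax();			\
		smp_rmb(); /* ctrl + rmb := acquire */	\
	} while (0)

i.e. a plain-load spin whose control dependency, combined with the
trailing smp_rmb(), yields acquire semantics, instead of issuing a
full smp_load_acquire() on every iteration of the loop.)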

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/smp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index c91e00178f8f..74165443c240 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -107,8 +107,7 @@ void __init call_function_init(void)
  */
 static __always_inline void csd_lock_wait(struct call_single_data *csd)
 {
-	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
-		cpu_relax();
+	smp_cond_acquire(!(csd->flags & CSD_FLAG_LOCK));
 }
 
 static __always_inline void csd_lock(struct call_single_data *csd)
-- 
2.1.4


* Re: [PATCH -tip 0/2] kernel/smp: Small csd_lock optimizations
  2016-03-10  1:55 [PATCH -tip 0/2] kernel/smp: Small csd_lock optimizations Davidlohr Bueso
  2016-03-10  1:55 ` [PATCH 1/2] kernel/smp: Explicitly inline csd_lock helpers Davidlohr Bueso
  2016-03-10  1:55 ` [PATCH 2/2] kernel/smp: Make csd_lock_wait use smp_cond_acquire Davidlohr Bueso
@ 2016-03-10  9:17 ` Peter Zijlstra
  2 siblings, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2016-03-10  9:17 UTC
  To: Davidlohr Bueso; +Cc: mingo, dave, linux-kernel

On Wed, Mar 09, 2016 at 05:55:34PM -0800, Davidlohr Bueso wrote:
> From: Davidlohr Bueso <dave@stgolabs.net>
> 
> Hi,
> 
> Justifications are in each patch; there is a slight impact (patch 2)
> on some TLB-flush-intensive benchmarks (albeit they use IPI batching
> nowadays). Specifically, for the pft benchmark on a 12-core box:

>                  4.4         4.4
>              vanilla         smp
> User           11.91       11.85
> System        197.11      194.69
> Elapsed        44.24       40.26
> 
> While the single-thread result is an outlier, overall we don't seem
> to do any harm (it is within the noise range). Give or take, the
> patches at least make some sense afaict.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


* [tip:locking/core] locking/csd_lock: Explicitly inline csd_lock*() helpers
  2016-03-10  1:55 ` [PATCH 1/2] kernel/smp: Explicitly inline csd_lock helpers Davidlohr Bueso
@ 2016-03-10 11:05   ` tip-bot for Davidlohr Bueso
  0 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2016-03-10 11:05 UTC
  To: linux-tip-commits
  Cc: mingo, tglx, peterz, linux-kernel, dbueso, hpa, torvalds, dave

Commit-ID:  90d1098478fb08a1ef166fe91622d8046869e17b
Gitweb:     http://git.kernel.org/tip/90d1098478fb08a1ef166fe91622d8046869e17b
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Wed, 9 Mar 2016 17:55:35 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Mar 2016 10:28:35 +0100

locking/csd_lock: Explicitly inline csd_lock*() helpers

While the compiler tends to already do this for us (except for
csd_unlock), make it explicit. These helpers mainly deal with
the ->flags, are short-lived, and can be called, for example,
from smp_call_function_many().

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1457574936-19065-2-git-send-email-dbueso@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/smp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index d903c02..5099db1 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -105,13 +105,13 @@ void __init call_function_init(void)
  * previous function call. For multi-cpu calls its even more interesting
  * as we'll have to ensure no other cpu is observing our csd.
  */
-static void csd_lock_wait(struct call_single_data *csd)
+static __always_inline void csd_lock_wait(struct call_single_data *csd)
 {
 	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
 		cpu_relax();
 }
 
-static void csd_lock(struct call_single_data *csd)
+static __always_inline void csd_lock(struct call_single_data *csd)
 {
 	csd_lock_wait(csd);
 	csd->flags |= CSD_FLAG_LOCK;
@@ -124,7 +124,7 @@ static void csd_lock(struct call_single_data *csd)
 	smp_wmb();
 }
 
-static void csd_unlock(struct call_single_data *csd)
+static __always_inline void csd_unlock(struct call_single_data *csd)
 {
 	WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
 


* [tip:locking/core] locking/csd_lock: Use smp_cond_acquire() in csd_lock_wait()
  2016-03-10  1:55 ` [PATCH 2/2] kernel/smp: Make csd_lock_wait use smp_cond_acquire Davidlohr Bueso
@ 2016-03-10 11:06   ` tip-bot for Davidlohr Bueso
  0 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2016-03-10 11:06 UTC
  To: linux-tip-commits
  Cc: linux-kernel, tglx, dbueso, hpa, torvalds, mingo, peterz

Commit-ID:  38460a2178d225b39ade5ac66586c3733391cf86
Gitweb:     http://git.kernel.org/tip/38460a2178d225b39ade5ac66586c3733391cf86
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Wed, 9 Mar 2016 17:55:36 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Mar 2016 10:28:35 +0100

locking/csd_lock: Use smp_cond_acquire() in csd_lock_wait()

We can micro-optimize this call and mildly relax the barrier
requirements by relying on ctrl + rmb, keeping the acquire
semantics. In addition, this is pretty much now the standard
way of busy-waiting under such constraints.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1457574936-19065-3-git-send-email-dbueso@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/smp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 5099db1..300d293 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -107,8 +107,7 @@ void __init call_function_init(void)
  */
 static __always_inline void csd_lock_wait(struct call_single_data *csd)
 {
-	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
-		cpu_relax();
+	smp_cond_acquire(!(csd->flags & CSD_FLAG_LOCK));
 }
 
 static __always_inline void csd_lock(struct call_single_data *csd)


end of thread

Thread overview: 6+ messages
2016-03-10  1:55 [PATCH -tip 0/2] kernel/smp: Small csd_lock optimizations Davidlohr Bueso
2016-03-10  1:55 ` [PATCH 1/2] kernel/smp: Explicitly inline csd_lock helpers Davidlohr Bueso
2016-03-10 11:05   ` [tip:locking/core] locking/csd_lock: Explicitly inline csd_lock*() helpers tip-bot for Davidlohr Bueso
2016-03-10  1:55 ` [PATCH 2/2] kernel/smp: Make csd_lock_wait use smp_cond_acquire Davidlohr Bueso
2016-03-10 11:06   ` [tip:locking/core] locking/csd_lock: Use smp_cond_acquire() in csd_lock_wait() tip-bot for Davidlohr Bueso
2016-03-10  9:17 ` [PATCH -tip 0/2] kernel/smp: Small csd_lock optimizations Peter Zijlstra
