From: Hanjun Guo <guohanjun@huawei.com>
To: Jan Glauber <jglauber@marvell.com>, Alex Kogan <alex.kogan@oracle.com>
Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will.deacon@arm.com>,
	Arnd Bergmann <arnd@arndb.de>,
	"longman@redhat.com" <longman@redhat.com>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>, "hpa@zytor.com" <hpa@zytor.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"steven.sistare@oracle.com" <steven.sistare@oracle.com>,
	"daniel.m.jordan@oracle.com" <daniel.m.jordan@oracle.com>,
	"dave.dice@oracle.com" <dave.dice@oracle.com>,
	"rahul.x.yadav@oracle.com" <rahul.x.yadav@oracle.com>
Subject: Re: [PATCH v2 0/5] Add NUMA-awareness to qspinlock
Date: Fri, 12 Jul 2019 16:12:05 +0800	[thread overview]
Message-ID: <95683b80-f694-cf34-73fc-e6ec05462ee0@huawei.com> (raw)
In-Reply-To: <CAEiAFz238Ywgn6iDAz9gM_3PgPhs-YuAVDptehUBv7MRRPx8Cw@mail.gmail.com>

On 2019/7/3 19:58, Jan Glauber wrote:
> Hi Alex,
> I've tried this series on arm64 (ThunderX2 with up to SMT=4 and 224 CPUs)
> with the borderline testcase of accessing a single file from all threads.
> With that testcase the qspinlock slowpath is the top spot in the kernel.
> 
> The results look really promising:
> 
> CPUs    normal    numa-qspinlocks
> ---------------------------------
>  56     149.41     73.90
> 224     576.95    290.31
> 
> Also frontend-stalls are reduced to 50% and interconnect traffic is
> greatly reduced.
> Tested-by: Jan Glauber <jglauber@marvell.com>

Tested this patchset on a Kunpeng 920 ARM64 server (96 cores,
4 NUMA nodes), and with the same test case from Jan I can
see a 150%+ boost! (This needs the additional patch below [1].)

For a real workload such as Nginx I can see about a 10%
performance improvement as well.

Tested-by: Hanjun Guo <guohanjun@huawei.com>

Please cc me on new versions; I'm willing to test them.

Thanks
Hanjun

[1]
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 657bbc5..72c1346 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -792,6 +792,20 @@ config NODES_SHIFT
          Specify the maximum number of NUMA Nodes available on the target
          system.  Increases memory reserved to accommodate various tables.

+config NUMA_AWARE_SPINLOCKS
+ bool "NUMA-aware spinlocks"
+ depends on NUMA
+ default y
+ help
+   Introduce NUMA (Non-Uniform Memory Access) awareness into
+   the slow path of spinlocks.
+
+   The kernel will try to keep the lock on the same node,
+   thus reducing the number of remote cache misses, while
+   trading some short-term fairness for better performance.
+
+   Say N if you want absolute first-come, first-served fairness.
+
+
 config USE_PERCPU_NUMA_NODE_ID
        def_bool y
        depends on NUMA
diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h
index 2994167..be5dd44 100644
--- a/kernel/locking/qspinlock_cna.h
+++ b/kernel/locking/qspinlock_cna.h
@@ -4,7 +4,7 @@
 #endif

 #include <linux/random.h>
-
+#include <linux/topology.h>
 /*
  * Implement a NUMA-aware version of MCS (aka CNA, or compact NUMA-aware lock).
  *
@@ -170,7 +170,7 @@ static __always_inline void cna_init_node(struct mcs_spinlock *node, int cpuid,
                                          u32 tail)
 {
        if (decode_numa_node(node->node_and_count) == -1)
-           store_numa_node(node, numa_cpu_node(cpuid));
+         store_numa_node(node, cpu_to_node(cpuid));
        node->encoded_tail = tail;
 }

