From: Hanjun Guo <guohanjun@huawei.com>
To: Jan Glauber <jglauber@marvell.com>, Alex Kogan <alex.kogan@oracle.com>
Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk>,
	Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>,
	Will Deacon <will.deacon@arm.com>, Arnd Bergmann <arnd@arndb.de>,
	"longman@redhat.com" <longman@redhat.com>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>, Borislav Petkov <bp@alien8.de>,
	"hpa@zytor.com" <hpa@zytor.com>, "x86@kernel.org" <x86@kernel.org>,
	"steven.sistare@oracle.com" <steven.sistare@oracle.com>,
	"daniel.m.jordan@oracle.com" <daniel.m.jordan@oracle.com>,
	"dave.dice@oracle.com" <dave.dice@oracle.com>,
	"rahul.x.yadav@oracle.com" <rahul.x.yadav@oracle.com>
Subject: Re: [PATCH v2 0/5] Add NUMA-awareness to qspinlock
Date: Fri, 12 Jul 2019 16:12:05 +0800
Message-ID: <95683b80-f694-cf34-73fc-e6ec05462ee0@huawei.com>
In-Reply-To: <CAEiAFz238Ywgn6iDAz9gM_3PgPhs-YuAVDptehUBv7MRRPx8Cw@mail.gmail.com>

On 2019/7/3 19:58, Jan Glauber wrote:
> Hi Alex,
> I've tried this series on arm64 (ThunderX2 with up to SMT=4 and 224 CPUs)
> with the borderline testcase of accessing a single file from all
> threads. With that testcase the qspinlock slowpath is the top spot in
> the kernel.
>
> The results look really promising:
>
> CPUs    normal    numa-qspinlocks
> ---------------------------------------------
>   56    149.41     73.90
>  224    576.95    290.31
>
> Also frontend-stalls are reduced to 50% and interconnect traffic is
> greatly reduced.
>
> Tested-by: Jan Glauber <jglauber@marvell.com>

Tested this patchset on a Kunpeng 920 ARM64 server (96 cores, 4 NUMA
nodes); with the same test case from Jan, I can see a 150%+ boost!
(I needed to add the patch below [1].)

For a real workload such as Nginx, I can see about 10% performance
improvement as well.
Tested-by: Hanjun Guo <guohanjun@huawei.com>

Please cc me for new versions and I'm willing to test it.

Thanks
Hanjun

[1]
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 657bbc5..72c1346 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -792,6 +792,20 @@ config NODES_SHIFT
 	  Specify the maximum number of NUMA Nodes available on the target
 	  system.  Increases memory reserved to accommodate various tables.

+config NUMA_AWARE_SPINLOCKS
+	bool "Numa-aware spinlocks"
+	depends on NUMA
+	default y
+	help
+	  Introduce NUMA (Non Uniform Memory Access) awareness into
+	  the slow path of spinlocks.
+
+	  The kernel will try to keep the lock on the same node,
+	  thus reducing the number of remote cache misses, while
+	  trading some of the short term fairness for better performance.
+
+	  Say N if you want absolute first come first serve fairness.
+
 config USE_PERCPU_NUMA_NODE_ID
 	def_bool y
 	depends on NUMA

diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h
index 2994167..be5dd44 100644
--- a/kernel/locking/qspinlock_cna.h
+++ b/kernel/locking/qspinlock_cna.h
@@ -4,7 +4,7 @@
 #endif

 #include <linux/random.h>
-
+#include <linux/topology.h>
 /*
  * Implement a NUMA-aware version of MCS (aka CNA, or compact NUMA-aware lock).
  *
@@ -170,7 +170,7 @@ static __always_inline void cna_init_node(struct mcs_spinlock *node, int cpuid,
 					  u32 tail)
 {
 	if (decode_numa_node(node->node_and_count) == -1)
-		store_numa_node(node, numa_cpu_node(cpuid));
+		store_numa_node(node, cpu_to_node(cpuid));

 	node->encoded_tail = tail;
 }