From: Alex Kogan <alex.kogan@oracle.com>
To: Lihao Liang <lihaoliang@google.com>
Cc: linux@armlinux.org.uk, Peter Zijlstra <peterz@infradead.org>,
    mingo@redhat.com, will.deacon@arm.com, arnd@arndb.de,
    longman@redhat.com, linux-arch@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    tglx@linutronix.de, bp@alien8.de, hpa@zytor.com, x86@kernel.org,
    guohanjun@huawei.com, jglauber@marvell.com, dave.dice@oracle.com,
    steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
    Will Deacon <will@kernel.org>
Subject: Re: [PATCH v9 0/5] Add NUMA-awareness to qspinlock
Date: Wed, 22 Jan 2020 14:29:36 -0500
Message-ID: <4F71A184-42C0-4865-9AAA-79A636743C25@oracle.com>
In-Reply-To: <CAC4j=Y8rCeTX9oKKbh+dCdTP8Ud4hW1ybu+iE7t_nxMSYBOR5w@mail.gmail.com>

Hi, Lihao.

> On Jan 22, 2020, at 6:45 AM, Lihao Liang <lihaoliang@google.com> wrote:
>
> Hi Alex,
>
> On Wed, Jan 22, 2020 at 10:28 AM Alex Kogan <alex.kogan@oracle.com> wrote:
>>
>> Summary
>> -------
>>
>> Lock throughput can be increased by handing a lock to a waiter on the
>> same NUMA node as the lock holder, provided care is taken to avoid
>> starvation of waiters on other NUMA nodes. This patch introduces CNA
>> (compact NUMA-aware lock) as the slow path for qspinlock. It is
>> enabled through a configuration option (NUMA_AWARE_SPINLOCKS).
>>
>
> Thanks for your patches. The experimental results look promising!
>
> I understand that the new CNA qspinlock uses randomization to achieve
> long-term fairness, and provides the numa_spinlock_threshold parameter
> for users to tune.

This was the case in the first versions of the series, but it is no
longer true. That is, long-term fairness is now achieved
deterministically (and you are correct that it is done through the
numa_spinlock_threshold parameter).

> As Linux runs extremely diverse workloads, it is not
> clear how randomization affects its fairness, and how users with
> different requirements are supposed to tune this parameter.
>
> To this end, Will and I consider it beneficial to be able to answer the
> following question:
>
> With different values of numa_spinlock_threshold and
> SHUFFLE_REDUCTION_PROB_ARG, how long do threads running on different
> sockets have to wait to acquire the lock?

The SHUFFLE_REDUCTION_PROB_ARG parameter is intended for performance
optimization only, and *does not* affect long-term fairness (or, at the
very least, does not make it any worse). As Longman correctly pointed
out in his response to this email, the shuffle reduction optimization is
relevant only when the secondary queue is empty. In that case, CNA hands
off the lock exactly as MCS does, i.e., in FIFO order. Note that when
the secondary queue is not empty, we do not call probably().

> This is particularly relevant
> in high contention situations when new threads keep arriving on the same
> socket as the lock holder.

In this case, the lock will stay on the same NUMA node/socket for
2^numa_spinlock_threshold consecutive lock handoffs, which is the worst
case scenario as far as long-term fairness is concerned. And if we have
multiple nodes, it will take up to

    2^numa_spinlock_threshold * (nr_nodes - 1) + nr_cpus_per_node

lock transitions until any given thread acquires the lock (assuming
2^numa_spinlock_threshold > nr_cpus_per_node).

Hopefully, this addresses your concern. Let me know if you have any
further questions.

Best regards,
-- Alex
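
For readers who want to put numbers on the bound quoted in the reply above, here is a minimal sketch that evaluates it. The function name and the sample machine sizes are hypothetical and chosen only for illustration; the formula and its precondition are taken from the message, and nothing here is code from the patch series.

```python
def worst_case_lock_transitions(threshold: int, nr_nodes: int,
                                nr_cpus_per_node: int) -> int:
    """Evaluate the bound stated in the message:

        2^threshold * (nr_nodes - 1) + nr_cpus_per_node

    `threshold` stands in for numa_spinlock_threshold. The function name
    is hypothetical; it is not part of the kernel or of the patches.
    """
    per_node_budget = 1 << threshold  # 2^threshold handoffs kept on one node

    # The message states the bound under this assumption, so refuse to
    # report a number when the assumption does not hold.
    if per_node_budget <= nr_cpus_per_node:
        raise ValueError("bound assumes 2^threshold > nr_cpus_per_node")

    return per_node_budget * (nr_nodes - 1) + nr_cpus_per_node


if __name__ == "__main__":
    # Illustrative machine shapes only; not measurements from the thread.
    for nodes, cpus in [(2, 36), (4, 20)]:
        bound = worst_case_lock_transitions(16, nodes, cpus)
        print(f"threshold=16 nodes={nodes} cpus/node={cpus}: "
              f"up to {bound} lock transitions")
```

With a threshold of 16, the 2^threshold term dominates, so the bound grows roughly linearly with the number of remote nodes and only marginally with the number of CPUs per node.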