From: Waiman Long <longman@redhat.com>
To: paulmck@kernel.org, Alex Kogan <alex.kogan@oracle.com>
Cc: linux@armlinux.org.uk, Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will.deacon@arm.com>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-arch@vger.kernel.org,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	linux-kernel@vger.kernel.org, tglx@linutronix.de, bp@alien8.de,
	hpa@zytor.com, x86@kernel.org, guohanjun@huawei.com,
	jglauber@marvell.com, dave.dice@oracle.com,
	steven.sistare@oracle.com, daniel.m.jordan@oracle.com
Subject: Re: [PATCH v9 0/5] Add NUMA-awareness to qspinlock
Date: Fri, 24 Jan 2020 20:59:28 -0500
Message-ID: <02defadb-217d-7803-88a1-ec72a37eda28@redhat.com>
In-Reply-To: <20200125005713.GZ2935@paulmck-ThinkPad-P72>

On 1/24/20 7:57 PM, Paul E. McKenney wrote:
> On Fri, Jan 24, 2020 at 06:39:02PM -0500, Alex Kogan wrote:
>> Hi, Paul.
>>
>> Thanks for running those experiments!
>>
>>> On Jan 24, 2020, at 5:24 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
>>>
>>> On Tue, Jan 14, 2020 at 10:59:15PM -0500, Alex Kogan wrote:
>>>> Minor changes from v8 based on feedback from Longman:
>>>> -----------------------------------------------------
>>>>
>>>> - Add __init to cna_configure_spin_lock_slowpath().
>>>>
>>>> - Fix the comment for cna_scan_main_queue().
>>>>
>>>> - Change the type of intra_node_handoff_threshold to unsigned int.
>>>>
>>>>
>>>> Summary
>>>> -------
>>>>
>>>> Lock throughput can be increased by handing a lock to a waiter on the
>>>> same NUMA node as the lock holder, provided care is taken to avoid
>>>> starvation of waiters on other NUMA nodes. This patch introduces CNA
>>>> (compact NUMA-aware lock) as the slow path for qspinlock. It is
>>>> enabled through a configuration option (NUMA_AWARE_SPINLOCKS).
>>>>
>>>> CNA is a NUMA-aware version of the MCS lock. Spinning threads are
>>>> organized in two queues, a main queue for threads running on the same
>>>> node as the current lock holder, and a secondary queue for threads
>>>> running on other nodes. Threads store the ID of the node on which
>>>> they are running in their queue nodes. After acquiring the MCS lock and
>>>> before acquiring the spinlock, the lock holder scans the main queue
>>>> looking for a thread running on the same node (pre-scan). If found (call
>>>> it thread T), all threads in the main queue between the current lock
>>>> holder and T are moved to the end of the secondary queue.  If such T
>>>> is not found, we make another scan of the main queue after acquiring 
>>>> the spinlock when unlocking the MCS lock (post-scan), starting at the
>>>> node where pre-scan stopped. If both scans fail to find such T, the
>>>> MCS lock is passed to the first thread in the secondary queue. If the
>>>> secondary queue is empty, the MCS lock is passed to the next thread in the
>>>> main queue. To avoid starvation of threads in the secondary queue, those
>>>> threads are moved back to the head of the main queue after a certain
>>>> number of intra-node lock hand-offs.
>>>>
>>>> More details are available at <https://arxiv.org/abs/1810.05600>.
>>>>
>>>> The series applies on top of v5.5.0-rc6, commit b3a987b026.
>>>> Performance numbers are available in previous revisions
>>>> of the series.
>>>>
>>>> Further comments are welcome and appreciated.
>>> I ran this on a large system with a version of locktorture that was
>>> modified to print out the maximum and minimum per-CPU lock-acquisition
>>> counts, and with CPU hotplug disabled.  I also modified the LOCK01 and
>>> LOCK04 scenarios to use 220 hardware threads.
>>>
>>> Here is what the test ended up with at the end of a one-hour run:
>>>
>>> LOCK01 (exclusive):
>>> Writes:  Total: 1241107333  Max/Min: 9206962/60902 ???  Fail: 0
>>>
>>> LOCK04 (rwlock):
>>> Writes:  Total: 232991963  Max/Min: 2631574/74582 ???  Fail: 0
>>> Reads :  Total: 216935386  Max/Min: 2735939/28665 ???  Fail: 0
>>>
>>> The "???" strings are printed because the ratio of maximum to minimum exceeds
>>> a factor of two.
>> Is this what you expect / have seen with the existing qspinlock?
>>
>>> I also ran 30-minute runs on my laptop, which has 12 hardware threads:
>>>
>>> LOCK01 (exclusive):
>>> Writes:  Total: 3992072782  Max/Min: 259368782/97231961 ???  Fail: 0
>>>
>>> LOCK04 (rwlock):
>>> Writes:  Total: 131063892  Max/Min: 13136206/5876157 ???  Fail: 0
>>> Reads :  Total: 144876801  Max/Min: 19999535/4873442 ???  Fail: 0
>> I assume the system above is multi-socket, but your laptop is probably not?
>>
>> If that’s the case, CNA should not be enabled on your laptop (grep
>> kernel logs for "Enabling CNA spinlock" to be sure).
>>
>>> These also exceed the factor-of-two cutoff, but not as dramatically.
>>> The readers for the reader-writer lock fared worst, with a 4-to-1 ratio.
>>>
>>> These tests did run within guest OSes.
>> So I really wonder if CNA was enabled here, or whether this is what you get
>> with paravirt qspinlock.
>>
>>>  Is that configuration out of
>>> scope for this locking algorithm?  In addition (as might well also have
>>> been the case for the locktorture runs in your paper), these tests run
>>> a pair of stress-test tasks for each hardware thread.
>>>
>>> Is this expected behavior?
>> The results do appear skewed a bit too much, but it would be helpful to know
>> what qspinlock we are looking at, and how they compare to the existing qspinlock,
>> in case it is indeed CNA.
> You called it!  I will play with QEMU's -numa argument to see if I can get
> CNA to run for me.  Please accept my apologies for the false alarm.
>
> 							Thanx, Paul
>
CNA is not currently supported in a VM guest simply because the NUMA
information is not reliable. You will have to run it on bare metal to
test it. Sorry for that.

Regards,
Longman
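
For readers following the quoted summary, the hand-off policy can be
modeled in a few lines of C. This is a minimal single-threaded sketch,
not the patch's code: struct waiter, struct queues, pick_successor()
and the threshold value of 4 are hypothetical names and numbers chosen
for illustration. It folds the pre-scan and post-scan into one scan,
and it assumes that passing the lock to the secondary queue means
splicing that queue in front of the main queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter record; not the kernel's queue node layout. */
struct waiter {
    int numa_node;          /* node the waiting thread runs on */
    struct waiter *next;    /* next waiter in the same queue */
};

struct queues {
    struct waiter *main_head;   /* first waiter behind the lock holder */
    struct waiter *sec_head;    /* waiters set aside from other nodes */
    struct waiter *sec_tail;
    unsigned int intra_node_handoffs;
};

#define HANDOFF_THRESHOLD 4u    /* illustrative value only */

/* Put the whole secondary queue in front of the main queue. */
void splice_secondary_to_front(struct queues *q)
{
    if (!q->sec_head)
        return;
    q->sec_tail->next = q->main_head;
    q->main_head = q->sec_head;
    q->sec_head = q->sec_tail = NULL;
}

/* Choose the next lock holder and remove it from the main queue. */
struct waiter *pick_successor(struct queues *q, int holder_node)
{
    struct waiter *prev = NULL, *t, *succ;

    assert(q != NULL);
    if (q->sec_head && q->intra_node_handoffs >= HANDOFF_THRESHOLD) {
        /* Starvation avoidance: secondary waiters go first. */
        splice_secondary_to_front(q);
        q->intra_node_handoffs = 0;
    } else {
        /* Scan for the first waiter T on the holder's node. */
        for (t = q->main_head; t && t->numa_node != holder_node; t = t->next)
            prev = t;
        if (t && prev) {
            /* Move the waiters ahead of T to the secondary tail. */
            prev->next = NULL;
            if (q->sec_tail)
                q->sec_tail->next = q->main_head;
            else
                q->sec_head = q->main_head;
            q->sec_tail = prev;
            q->main_head = t;
        }
        if (t) {
            q->intra_node_handoffs++;
        } else {
            /* No local waiter: secondary head first, if there is one. */
            splice_secondary_to_front(q);
            q->intra_node_handoffs = 0;
        }
    }
    succ = q->main_head;
    if (succ) {
        q->main_head = succ->next;
        succ->next = NULL;
    }
    return succ;
}
```

With a main queue of waiters on nodes 1, 1, 0, 1 and a holder on node 0,
the first call returns the node-0 waiter and parks the two node-1
waiters ahead of it on the secondary queue. The next call finds no
local waiter and returns the first parked waiter.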

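The "???" marker in the quoted locktorture output flags a ratio of
maximum to minimum per-CPU acquisition counts above two. A minimal
sketch of that check, assuming non-negative integer counts;
counts_skewed() and skew_flag() are hypothetical helpers written for
this note, not functions taken from locktorture:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when the busiest CPU acquired the lock
 * more than twice as often as the least busy one. */
bool counts_skewed(long long max, long long min)
{
    assert(min >= 0 && max >= min);
    return max > 2 * min;
}

/* Marker to print after the Max/Min field of a summary line. */
const char *skew_flag(long long max, long long min)
{
    return counts_skewed(max, min) ? "???" : "";
}
```

With the quoted LOCK01 numbers from the large system,
counts_skewed(9206962, 60902) is true, since the maximum is roughly
151 times the minimum. The laptop LOCK01 run (259368782 versus
97231961) also trips the check, at a ratio of roughly 2.7.
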

