From: Vikram Mulukutla <markivx@codeaurora.org>
To: qiaozhou <qiaozhou@asrmicro.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
John Stultz <john.stultz@linaro.org>,
sboyd@codeaurora.org, LKML <linux-kernel@vger.kernel.org>,
Wang Wilbur <wilburwang@asrmicro.com>,
Marc Zyngier <marc.zyngier@arm.com>,
Will Deacon <will.deacon@arm.com>,
Peter Zijlstra <peterz@infradead.org>,
linux-kernel-owner@vger.kernel.org, sudeep.holla@arm.com
Subject: Re: [Question]: try to fix contention between expire_timers and try_to_del_timer_sync
Date: Thu, 27 Jul 2017 18:10:34 -0700 [thread overview]
Message-ID: <e1cc02c5e7dfd4d6bec937b6dc97bfc7@codeaurora.org> (raw)
In-Reply-To: <dcb18367-c747-96a8-9927-d8ba6954c496@asrmicro.com>
[-- Attachment #1: Type: text/plain, Size: 2969 bytes --]
cc: Sudeep Holla
On 2017-07-26 18:29, qiaozhou wrote:
> On 2017-07-26 22:16, Thomas Gleixner wrote:
>> On Wed, 26 Jul 2017, qiaozhou wrote:
>>
>> Cc'ed ARM folks.
>>
<snip>
>>
>> For that particular timer case we can clear base->running_timer
>> w/o the lock held (see patch below), but this kind of
>>
>> lock -> test -> unlock -> retry
>>
>> loop is all over the place in the kernel, so this is going to hurt
>> you sooner than later in some other place.
> It's true. This is the way spinlocks are normally and widely used in
> the kernel. I'll also ask the ARM experts whether we can do something
> to avoid or reduce the chance of such an issue. ARMv8.1 has a single
> instruction (ldadda) to replace the ldaxr/stxr loop, which will
> hopefully reduce the window.
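(For reference, here is a minimal sketch of the difference qiaozhou
mentions - this is illustrative only, not the kernel's actual spinlock
code, and the LSE variant assumes an ARMv8.1-aware toolchain. The
LL/SC version retries the store-exclusive until it wins, which is
exactly where a faster CPU can starve a slower one; the LSE
instruction completes the update in one shot:)

	/* Assumes <linux/types.h> for u32. */

	/* LL/SC fetch-add with acquire: the retry loop is the issue. */
	static inline u32 fetch_add_acquire_llsc(u32 *p, u32 inc)
	{
		u32 old, newval, status;

		asm volatile(
		"1:	ldaxr	%w0, %3\n"	/* load-acquire exclusive */
		"	add	%w1, %w0, %w4\n"
		"	stxr	%w2, %w1, %3\n"	/* fails if exclusivity was lost */
		"	cbnz	%w2, 1b\n"	/* ... so try again */
		: "=&r" (old), "=&r" (newval), "=&r" (status), "+Q" (*p)
		: "r" (inc)
		: "memory");

		return old;
	}

	/* ARMv8.1 LSE: the same operation in a single instruction. */
	static inline u32 fetch_add_acquire_lse(u32 *p, u32 inc)
	{
		u32 old;

		asm volatile(
		"	ldadda	%w2, %w0, %1\n"
		: "=r" (old), "+Q" (*p)
		: "r" (inc)
		: "memory");

		return old;
	}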
I think we should have this discussion now - I brought this up
earlier [1] and I promised a test case that I completely forgot
about - but here it is (attached). Essentially a Big CPU in an
acquire-check-release loop will have an unfair advantage over a
little CPU concurrently attempting to acquire the same lock, in
spite of the ticket implementation. If the Big CPU needs the little
CPU to make forward progress: livelock.
We've run into the same loop construct in other spots in the kernel.
The reason that a real symptom is so rare is that the retry loop on
the Big CPU needs to be interrupted just once by, say, an IRQ/FIQ,
and the livelock is broken. If the entire retry loop is within an
interrupt-disabled critical section, the odds of livelocking are
much higher.
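To make the construct concrete, here is roughly what these loops
look like, modeled on del_timer_sync() (try_to_del_timer_sync()
takes and drops the timer base lock internally):

	/*
	 * lock -> test -> unlock -> retry, as seen from the caller.
	 * If the whole loop runs with interrupts disabled and this
	 * CPU keeps re-winning the lock, the CPU actually running
	 * the timer callback may never get the lock: livelock.
	 */
	for (;;) {
		int ret = try_to_del_timer_sync(timer);

		if (ret >= 0)
			return ret;
		cpu_relax();
	}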
An example of the problem on a previous kernel is here [2]. Changes
to the workqueue code since then may have fixed this particular
instance.
One solution was to use udelay(1) in such loops instead of
cpu_relax(), but that's not very 'relaxing'. I'm not sure if there's
something we could do within the ticket spinlock implementation to
deal with this.
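(A minimal sketch of the udelay() workaround; some_lock and
check_condition() are hypothetical stand-ins for whatever the real
loop is polling on:)

	for (;;) {
		bool done;

		spin_lock(&some_lock);
		done = check_condition();	/* hypothetical */
		spin_unlock(&some_lock);
		if (done)
			break;
		/*
		 * udelay(1) keeps this CPU away from the lock long
		 * enough for a slower CPU to finally win the
		 * exclusive store; cpu_relax() here can lose
		 * indefinitely.
		 */
		udelay(1);
	}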
Note that I ran my test on a 4.9 kernel, so it doesn't include any
spinlock implementation changes made since then. The test schedules
two threads, one on a big CPU and one on a little CPU. The big CPU
thread does the lock/unlock/retry loop for a full second with
interrupts disabled, while the little CPU attempts to acquire the
same lock, enabling interrupts after every successful lock+unlock.
With unfairness, the little CPU may take up to 1 second (or at least
several milliseconds) just to acquire the lock once. This varies
with the IPC difference and the frequencies of the big and little
ARM64 CPUs:
Big CPU frequency | Little CPU frequency | Max little acquire time
------------------+----------------------+------------------------
2GHz              | 1.5GHz               | 133 microseconds
2GHz              | 300MHz               | 734 milliseconds
Thanks,
Vikram
[1] - https://lkml.org/lkml/2016/11/17/934
[2] - https://goo.gl/uneFjt
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
[-- Attachment #2: 0001-measure-spinlock-fairness-across-differently-capable.patch --]
[-- Type: text/x-diff; name=0001-measure-spinlock-fairness-across-differently-capable.patch, Size: 7285 bytes --]
From 51d6186b620a9e354a0d40af06aef1c1299ca223 Mon Sep 17 00:00:00 2001
From: Vikram Mulukutla <markivx@codeaurora.org>
Date: Thu, 27 Jul 2017 12:14:48 -0700
Subject: [PATCH] measure spinlock fairness across differently capable CPUs
How to run this test:
1) compile and boot
2) echo 1 > /sys/module/test/parameters/run_test
3) sleep 5
4) echo 0 > /sys/module/test/parameters/run_test
The test schedules two threads on two separate CPUs
and attempts to acquire the same spinlock from both
threads with a certain loop construct.
(It assumes cpu0 is 'Little' and cpu4 is 'Big'; this
can be changed in the code.)
After running the test, check these counters:
cat /sys/module/test/parameters/big_time_us
cat /sys/module/test/parameters/little_time_us
If unfairness exists, little_time should be close to 1 second or
a fairly large millisecond value.
test.c has comments that explain why this is so.
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
---
kernel/sched/Makefile | 2 +-
kernel/sched/test.c | 204 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 205 insertions(+), 1 deletion(-)
create mode 100644 kernel/sched/test.c
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index f6cce95..542a1c7 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -15,7 +15,7 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
endif
-obj-y += core.o loadavg.o clock.o cputime.o
+obj-y += core.o loadavg.o clock.o cputime.o test.o
obj-y += idle_task.o fair.o rt.o deadline.o stop_task.o
obj-y += wait.o swait.o completion.o idle.o
obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o energy.o sched_avg.o
diff --git a/kernel/sched/test.c b/kernel/sched/test.c
new file mode 100644
index 0000000..5dd3b0d
--- /dev/null
+++ b/kernel/sched/test.c
@@ -0,0 +1,204 @@
+/* Copyright (c) 2014-2016, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Big.Little spinlock unfairness test by Vikram Mulukutla
+ */
+
+#include <linux/init.h>
+#include <linux/notifier.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/cpufreq.h>
+#include <linux/kthread.h>
+#include <linux/sched.h>
+#include <linux/sched/rt.h>
+#include <linux/module.h>
+#include <linux/delay.h>
+
+static DEFINE_SPINLOCK(mylock);
+
+static int run_test;
+static int big_counter, little_counter;
+module_param(big_counter, int, 0644);
+module_param(little_counter, int, 0644);
+
+static int big_time_us, little_time_us;
+module_param(big_time_us, int, 0644);
+module_param(little_time_us, int, 0644);
+
+static volatile int sync;
+
+struct testdata {
+ int cpu;
+ int *counter;
+ int *time;
+};
+
+struct testdata littledata = {
+ .cpu = 0,
+ .counter = &little_counter,
+ .time = &little_time_us,
+};
+
+struct testdata bigdata = {
+ .cpu = 4,
+ .counter = &big_counter,
+ .time = &big_time_us,
+};
+
+/*
+ * This is the little CPU thread. It attempts to get the lock, disabling
+ * and enabling interrupts before and after the critical section, and
+ * checks how long it took to get the lock. Ideally this time
+ * should be reflective of just one iteration of the critical section of the
+ * big cpu thread but since this little CPU always loses the ticket strex,
+ * it ends up waiting a full second.
+ */
+static int __ref lockunlock(void *data)
+{
+ struct testdata *testdata = data;
+ u64 spin_start_time, spin_stop_time;
+
+ sched_setaffinity(current->pid, cpumask_of(testdata->cpu));
+
+ while (!sync);
+
+ local_irq_disable();
+ spin_lock(&mylock);
+ while (1) {
+ spin_unlock(&mylock);
+ local_irq_enable();
+
+ cpu_relax();
+
+ local_irq_disable();
+
+ spin_start_time = sched_clock();
+ spin_lock(&mylock);
+ spin_stop_time = sched_clock();
+
+ if (*testdata->time < ((spin_stop_time - spin_start_time)/NSEC_PER_USEC))
+ *testdata->time = (spin_stop_time - spin_start_time)/NSEC_PER_USEC;
+ *testdata->counter = *testdata->counter + 1;
+ if (!run_test)
+ break;
+ }
+ spin_unlock(&mylock);
+ local_irq_enable();
+
+ return 0;
+}
+
+/*
+ * This is the big CPU thread. Ideally it should contend and have its ticket
+ * go in after the little CPU. However what ends up happening is
+ * - if it's running fast enough - this big CPU keeps winning the strex every
+ * time and is able to obtain and release the lock for a full second before it
+ * takes a break, at which point the little CPU wins (finally).
+ *
+ * Note that if there exists a loop construct such as this one in the kernel,
+ * and there is no 'break', then we have a potential livelock situation if
+ * this big CPU needs the little CPU to obtain the lock for the former to
+ * break out of the loop.
+ */
+static int __ref lockunlock_noirq(void *data)
+{
+ struct testdata *testdata = data;
+ u64 spin_start_time, spin_stop_time, time_since_start, test_start_time;
+ bool rollover;
+
+ sched_setaffinity(current->pid, cpumask_of(testdata->cpu));
+
+ while (!sync);
+
+ local_irq_disable();
+ spin_lock(&mylock);
+ test_start_time = sched_clock();
+ while (1) {
+ spin_unlock(&mylock);
+
+ time_since_start = sched_clock() - test_start_time;
+ rollover = time_since_start >= NSEC_PER_SEC;
+
+ /*
+ * We take a break after 1 second to allow watchdog etc. and
+ * potentially a user shell to work!
+ */
+ if (rollover) {
+ local_irq_enable();
+ test_start_time = sched_clock();
+ }
+
+ /*
+ * Changing this cpu_relax to a udelay(1) will likely
+ * allow the little CPU enough cycles so it finally wins
+ * the strex battle.
+ */
+ cpu_relax();
+
+ if (rollover)
+ local_irq_disable();
+
+ spin_start_time = sched_clock();
+ spin_lock(&mylock);
+ spin_stop_time = sched_clock();
+
+ if (*testdata->time < ((spin_stop_time - spin_start_time)/NSEC_PER_USEC))
+ *testdata->time = (spin_stop_time - spin_start_time)/NSEC_PER_USEC;
+ *testdata->counter = *testdata->counter + 1;
+ if (!run_test)
+ break;
+ }
+ spin_unlock(&mylock);
+ local_irq_enable();
+
+ return 0;
+}
+
+static int runtest_param_set(const char *val, struct kernel_param *kp)
+{
+ int ret;
+ int old_val = run_test;
+
+ ret = param_set_int(val, kp);
+
+ if (ret)
+ return ret;
+
+	/* Reject values other than 0 and 1. */
+ if (run_test >> 1) {
+ run_test = old_val;
+ return -EINVAL;
+ }
+
+ if (run_test) {
+ sync = 0;
+
+ *littledata.time = 0;
+ *littledata.counter = 0;
+ *bigdata.time = 0;
+ *bigdata.counter = 0;
+
+ kthread_run(lockunlock, (void *) &littledata,
+ "test/%d", littledata.cpu);
+ kthread_run(lockunlock_noirq, (void *) &bigdata,
+ "test/%d", bigdata.cpu);
+	sync = 1;
+ }
+
+ return 0;
+}
+
+module_param_call(run_test, runtest_param_set, param_get_int,
+ &run_test, S_IRUGO|S_IWUSR);
+
+
+
--