linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Clark Williams <williams@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Nick Piggin <nickpiggin@yahoo.com.au>
Subject: [PATCH] sched: Do not release current rq lock on non contended double_lock_balance()
Date: Mon, 13 Jun 2016 12:37:32 -0400	[thread overview]
Message-ID: <20160613123732.3a8ccc57@gandalf.local.home> (raw)

Back in 2008 Gregory Haskins noticed that there was unfair locking when
dealing with double_lock_balance(). The old code (which still exists
when CONFIG_PREEMPT is disabled), on contention, will check the
ordering of the run-queue locks (by address) to determine if it should
release the current lock before taking the second lock.

	if (!spin_trylock(busiest->lock)) {
		if (busiest < this_rq) {
			spin_unlock(&this_rq->lock);
			spin_lock(&busiest->lock);
			spin_lock(&this_rq->lock);
		} else
			spin_lock(&busiest->lock);
	}

This gives unfair access for higher CPUs who are trying to take a
contended lock. For lower numbered CPUs (with lower address of their rq
structure), it wont release its lock when trying to take the next lock.
This can cause an escalation of latency in locking if there's a waiter
on the current 'this_rq->lock', which now must also wait for the
'busiest->lock' to be taken.

The solution was to simply release the current (this_rq) lock and then
take both locks.

	spin_unlock(&this_rq->lock);
	double_rq_lock(this_rq, busiest);

Where double_rq_lock() takes the locks in order of the rq's address.
This way, if there was a waiter on the rq lock, due to the ticket
spinlocks, it would grab the lock immediately, and the current owner
will have to wait for it now that it released the lock.

What I found while testing an 80 CPU core was a large "ping-pong"ing
around of the locks.

Let's say that an RT task on a CPU is woken up from another CPU, and
the CPU that the task was woken on is also running an RT task and tries
to push:


	CPU 0				CPU 1
	-----				-----
    [ wake up ]
				     spin_lock(cpu1_rq->lock);
    spin_lock(cpu1_rq->lock)
				    double_lock_balance()
				    [ release cpu1_rq->lock ]
				    spin_lock(cpu1_rq->lock)
    [due to ticket, now acquires
     cpu1_rq->lock ]

    [goes to push task]
    double_lock_balance()
    [ release cpu1_rq->lock ]
                                   [ acquires lock ]
				   spin_lock(cpu2_rq->lock)
				   [ blocks as cpu2 is using it ]

And then CPU2 would call double_lock_balance() and the ping pong
continues.

What I could not understand about Gregory's patch is that regardless of
contention, the currently held lock is always released, opening up a
window for this ping ponging to occur. When I changed the code to only
release on contention of the second lock, things improved tremendously.

The problem that Gregory was facing was that there was an unfair access
when there was contention. But what his solution did was to always
release the lock even when there was no contention on the second lock,
which happened to cause more contention later on.

I talked with Peter Zijlstra about this, and he pointed me to the
thread where this patch was discussed back in 2008. As I read the
thread, Nick Piggin brought up this very issue. After the little
discussion, it was determined to let the waiting task have the lock
even when there was no contention to be even more "fair". The problem
though, Nick's suggestion was never tested. If it had been back then,
I'm sure that Gregory would have decided to only release the lock if
there was contention on the second lock.

Link: http://lkml.kernel.org/r/20080825201534.23217.14936.stgit@dev.haskins.net

Suggested-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec2e8d23527e..3ed9ef770085 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1548,10 +1548,15 @@ static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
 	__acquires(busiest->lock)
 	__acquires(this_rq->lock)
 {
-	raw_spin_unlock(&this_rq->lock);
-	double_rq_lock(this_rq, busiest);
+	int ret = 0;
+
+	if (unlikely(!raw_spin_trylock(&busiest->lock))) {
+		raw_spin_unlock(&this_rq->lock);
+		double_rq_lock(this_rq, busiest);
+		ret = 1;
+	}
 
-	return 1;
+	return ret;
 }
 
 #else

             reply	other threads:[~2016-06-13 16:37 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-13 16:37 Steven Rostedt [this message]
2016-06-14 11:58 ` Peter Zijlstra
2016-06-14 17:52   ` Steven Rostedt
2016-06-14 18:02   ` Steven Rostedt
2016-06-14 19:42     ` Peter Zijlstra
2016-06-15 11:14     ` Peter Zijlstra
2016-06-15 16:13       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160613123732.3a8ccc57@gandalf.local.home \
    --to=rostedt@goodmis.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=nickpiggin@yahoo.com.au \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=williams@redhat.com \
    --subject='Re: [PATCH] sched: Do not release current rq lock on non contended double_lock_balance()' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).