Re: Performance regression from switching lock to rw-sem for anon-vma tree

From: Ingo Molnar <mingo@kernel.org>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mel Gorman <mgorman@suse.de>, "Shi, Alex" <alex.shi@intel.com>,
	Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michel Lespinasse <walken@google.com>,
	Davidlohr Bueso <davidlohr.bueso@hp.com>,
	"Wilcox, Matthew R" <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm <linux-mm@kvack.org>
Subject: Re: Performance regression from switching lock to rw-sem for anon-vma tree
Date: Fri, 28 Jun 2013 11:38:09 +0200
Message-ID: <20130628093809.GB29205@gmail.com>
In-Reply-To: <1372375873.22432.200.camel@schen9-DESK>

* Tim Chen <tim.c.chen@linux.intel.com> wrote:

> I tried some tweaking that checks sem->count for read owned lock. Even 
> though it reduces the percentage of acquisitions that need sleeping by 
> 8.14% (from 18.6% to 10.46%), it increases the writer acquisition 
> blocked count by 11%. This change still doesn't boost throughput and has 
> a tiny regression for the workload.
> 
> 						Opt Spin Opt Spin
> 							 (with tweak)	
> Writer acquisition blocked count		7359040	8168006
> Blocked by reader				 0.55%	 0.52%
> Lock acquired first attempt (lock stealing)	16.92%	19.70%
> Lock acquired second attempt (1 sleep)	17.60%	 9.32%
> Lock acquired after more than 1 sleep		 1.00%	 1.14%
> Lock acquired with optimistic spin		64.48%	69.84%
> Optimistic spin abort 1 			11.77%	 1.14%
> Optimistic spin abort 2			 6.81%	 9.22%
> Optimistic spin abort 3			 0.02%	 0.10%

So lock stealing + spinning now acquires the lock without sleeping ~90% of 
the time (19.70% + 69.84%). Most of the remaining acquisitions sleep once:

> Lock acquired second attempt (1 sleep)	......	 9.32%

And these sleeps are mostly due to:

> Optimistic spin abort 2			 .....	 9.22%

Right?
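
As a quick sanity check of the arithmetic above, here is a throwaway 
userspace sketch (not kernel code; the struct and function names are made 
up for illustration). It only adds up the percentages from the quoted 
table. In both columns the three abort rows sum to the same value as the 
two sleep rows (18.60% without the tweak, 10.46% with it), which is 
consistent with each spin abort ending in a sleep:

```c
#include <stdio.h>

/* Percentages copied from the quoted table; one struct per column. */
struct acq_pct {
	double steal;         /* lock acquired first attempt (lock stealing) */
	double one_sleep;     /* lock acquired second attempt (1 sleep) */
	double more_sleep;    /* lock acquired after more than 1 sleep */
	double spin;          /* lock acquired with optimistic spin */
	double spin_abort[3]; /* optimistic spin abort 1..3 */
};

static const struct acq_pct opt_spin = {
	16.92, 17.60, 1.00, 64.48, { 11.77, 6.81, 0.02 }
};
static const struct acq_pct opt_spin_tweak = {
	19.70, 9.32, 1.14, 69.84, { 1.14, 9.22, 0.10 }
};

/* Share of acquisitions that never had to sleep. */
static double no_sleep_pct(const struct acq_pct *p)
{
	return p->steal + p->spin;
}

/* Share of acquisitions that slept at least once. */
static double sleep_pct(const struct acq_pct *p)
{
	return p->one_sleep + p->more_sleep;
}

/* Sum of the three optimistic spin abort points. */
static double abort_pct(const struct acq_pct *p)
{
	return p->spin_abort[0] + p->spin_abort[1] + p->spin_abort[2];
}

/* Print one summary line per table column. */
static void print_summary(const char *name, const struct acq_pct *p)
{
	printf("%-22s no-sleep %6.2f%%  sleep %6.2f%%  aborts %6.2f%%\n",
	       name, no_sleep_pct(p), sleep_pct(p), abort_pct(p));
}
```

Calling print_summary("Opt Spin (with tweak)", &opt_spin_tweak) prints the 
three sums side by side for that column.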

So this particular #2 abort point is:

|       preempt_disable();
|       for (;;) {
|               owner = ACCESS_ONCE(sem->owner);
|               if (owner && !rwsem_spin_on_owner(sem, owner))
|                       break;   <--------------------------- abort (2)

The next step would be to investigate why we decide not to spin there: why 
does rwsem_spin_on_owner() fail?

If I got all the patches right, rwsem_spin_on_owner() is this:

+static noinline
+int rwsem_spin_on_owner(struct rw_semaphore *lock, struct task_struct *owner)
+{
+       rcu_read_lock();
+       while (owner_running(lock, owner)) {
+               if (need_resched())
+                       break;
+
+               arch_mutex_cpu_relax();
+       }
+       rcu_read_unlock();
+
+       /*
+        * We break out the loop above on need_resched() and when the
+        * owner changed, which is a sign for heavy contention. Return
+        * success only when lock->owner is NULL.
+        */
+       return lock->owner == NULL;
+}

where owner_running() is similar to its counterpart in the mutex spinning 
code: in the end it checks owner->on_cpu, just as the mutex code does.
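
For reference, the check can be modelled in userspace roughly as below. 
This is a sketch under assumptions, not the code from the patch series: 
struct task_model and struct rwsem_model are hypothetical stand-ins for 
task_struct and rw_semaphore, and the kernel helper additionally needs a 
compiler barrier and RCU protection around the owner dereference, which a 
single-threaded model does not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct: only the field we care about. */
struct task_model {
	int on_cpu;		/* non-zero while the task runs on a CPU */
};

/* Hypothetical stand-in for rw_semaphore with an owner field. */
struct rwsem_model {
	struct task_model *owner;	/* NULL when no writer owns the lock */
};

/*
 * Spinning is only worthwhile while the task we sampled is still the
 * owner and is still running on a CPU.
 */
static bool owner_running_model(const struct rwsem_model *sem,
				const struct task_model *owner)
{
	/* Owner released the lock or someone else took it: stop spinning. */
	if (sem->owner != owner)
		return false;

	/* Same owner: keep spinning only while it is on a CPU. */
	return owner->on_cpu != 0;
}
```

Note that the helper returns false both when the owner changed and when 
the owner went to sleep, so the caller cannot tell the two apart from the 
return value alone.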

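If my analysis is correct so far then it might be useful to add more 
stats for the failure cases. Note that lock->owner == NULL [owner released 
the rwsem] is the success return, so a failure means lock->owner was still 
set when the loop ended: was that because the owner changed to another 
task, because owner_running() failed [owner went to sleep], or because 
need_resched() was set?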
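
One possible shape for such stats, as a userspace model only: classify 
the state seen right after the spin loop ends and bump a per-reason 
counter. Everything here (the enum, the *_model structs, the counter 
array) is hypothetical naming for illustration; in the kernel the 
snapshot would be racy and the counters would have to be per-CPU or 
atomic. Going by the quoted function, only the lock->owner == NULL case 
is returned as success.

```c
#include <stdbool.h>
#include <stddef.h>

struct task_model {
	int on_cpu;			/* non-zero while running on a CPU */
};

struct rwsem_model {
	struct task_model *owner;	/* NULL when no writer owns the lock */
};

/* Reasons for leaving the spin loop (hypothetical names). */
enum spin_exit {
	SPIN_OWNER_RELEASED,	/* sem->owner == NULL: reported as success */
	SPIN_OWNER_CHANGED,	/* another task owns the lock now */
	SPIN_OWNER_SLEPT,	/* same owner, but no longer on a CPU */
	SPIN_NEED_RESCHED,	/* spinner itself has to reschedule */
	SPIN_EXIT_UNKNOWN,	/* state changed again before we looked */
	SPIN_EXIT_NR
};

static unsigned long spin_exit_count[SPIN_EXIT_NR];

/* Classify why the loop ended, from a snapshot taken after it. */
static enum spin_exit classify_spin_exit(const struct rwsem_model *sem,
					 const struct task_model *owner,
					 bool resched_pending)
{
	const struct task_model *cur = sem->owner;

	if (cur == NULL)
		return SPIN_OWNER_RELEASED;
	if (cur != owner)
		return SPIN_OWNER_CHANGED;
	if (!owner->on_cpu)
		return SPIN_OWNER_SLEPT;
	if (resched_pending)
		return SPIN_NEED_RESCHED;
	return SPIN_EXIT_UNKNOWN;
}

/* Bump the counter for one exit reason. */
static void record_spin_exit(enum spin_exit why)
{
	spin_exit_count[why]++;
}
```

With counters like these, the abort (2) percentage could be split by 
reason instead of being reported as a single number.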

Thanks,

	Ingo