From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra
Cc: Tejun Heo, John Stultz, Ingo Molnar, lkml, Dmitry Shmidt, Rom Lemarchand, Colin Cross, Todd Kjos, Oleg Nesterov
Subject: Re: Severe performance regression w/ 4.4+ on Android due to cgroup locking changes
Date: Thu, 14 Jul 2016 10:15:50 -0700
Message-Id: <20160714171550.GX7094@linux.vnet.ibm.com>
In-Reply-To: <20160714164547.GG30154@twins.programming.kicks-ass.net>
List-ID: linux-kernel@vger.kernel.org
On Thu, Jul 14, 2016 at 06:45:47PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 14, 2016 at 09:23:55AM -0700, Paul E. McKenney wrote:
> > Hmmm...  How does this handle the following sequence of events for
> > the case where we are not biased towards the reader?
> >
> > o	The per-CPU rwsem is set up with RCU_NONE and readers_slow
> >	(as opposed to readers_block).  The rcu_sync ->gp_state is
> >	GP_PENDING, which means that rcu_sync_is_idle() will always
> >	return true.
>
> /false/, rcu_sync_is_idle() will always be false, to force us into the
> slowpath.

Ah, got it...

> > o	Task A on CPU 0 runs percpu_down_read() to completion, and remains
> >	within the critical section.  CPU 0's ->refcount is therefore 1.
> >
> > o	Task B on CPU 1 does percpu_down_write(), which write-acquires
> >	->rw_sem, does rcu_sync_enter() (which is a no-op due to
> >	RCU_NONE), sets ->state to readers_block, and is just now going
> >	to wait for readers, which first invokes readers_active_check(),
> >	which starts summing the ->refcount for CPUs 0, 1, and 2,
> >	obtaining the value 1 so far.
> >
> > o	Task C on CPU 2 enters percpu_down_read(), disables preemption,
> >	increments CPU 2's ->refcount, sees rcu_sync_is_idle() return
> >	true (so skips __percpu_down_read()), enables preemption, and
> >	enters its critical section.
>
> false, so does __percpu_down_read()
>
> > o	Task C migrates to CPU 3 and invokes percpu_up_read(), which
> >	disables preemption, sees rcu_sync_is_idle() return true, calls
> >	__this_cpu_dec() on CPU 3's ->refcount, and enables preemption.
> >	The value of CPU 3's ->refcount is thus (unsigned int)-1.
>
> __percpu_up_read()
>
> > o	Task B on CPU 1 continues execution in readers_active_check(), with
> >	the full sum being zero.
> >
> > So it looks to me like we have Task B as a writer at the same time that
> > Task A is a reader, which would not be so good.
> >
> > So what am I missing here?
>
> for RCU_NONE we init rsp->gp_state to !0, which makes:
>
> static inline rcu_sync_is_idle()'s
>
>	return !rsp->gp_state (aka. rsp->gp_state == 0)
>
> return false.
>
> > And a couple of checkpatch nits below.  Yes, I had to apply the patch to
> > figure out what it was doing.  ;-)
>
> Yah, too much churn to read :-)
>
> In any case, rest assured you've already gone over this part of the
> patch several times.  I repurposed an old percpu-rwsem optimization, Oleg
> recognised it.

OK, in that case I will hold off pending John Stultz's performance checks.

							Thanx, Paul