From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 18 Nov 2016 05:27:54 -0800
From: "Paul E. McKenney"
To: Lance Roy
Cc: Lai Jiangshan, LKML, Ingo Molnar, dipankar@in.ibm.com,
	akpm@linux-foundation.org, Mathieu Desnoyers, Josh Triplett,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt, David Howells,
	Eric Dumazet, dvhart@linux.intel.com, Frédéric Weisbecker,
	oleg@redhat.com, pranith kumar
Subject: Re: [PATCH RFC tip/core/rcu] SRCU rewrite
Reply-To: paulmck@linux.vnet.ibm.com
References: <20161114183636.GA28589@linux.vnet.ibm.com> <20161117115304.0ff3f84e@gmail.com>
In-Reply-To: <20161117115304.0ff3f84e@gmail.com>
Message-Id: <20161118132754.GP3612@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 17, 2016 at 11:53:04AM -0800, Lance Roy wrote:
> On Thu, 17 Nov 2016 21:58:34 +0800
> Lai Jiangshan wrote:
> > From the changelog, it sounds like "ULONG_MAX - NR_CPUS" is the limit of
> > the implementations (old or this one), but actually the real maximum
> > number of active readers is much smaller.  I think ULONG_MAX/4 could be
> > used here instead, and that part of the changelog could be removed.
> In the old version, there are two separate limits.  The first is that there
> can be no more than ULONG_MAX nested or parallel readers, as otherwise
> ->c[] would overflow.
> 
> The other limit exists to prevent ->seq[] from overflowing during
> srcu_readers_active_idx_check().  For that to happen, there must be
> ULONG_MAX+1 readers that loaded ->completed before srcu_flip() was run and
> that then increment ->seq[].  The ->seq[] array is supposed to prevent
> srcu_readers_active_idx_check() from completing successfully if any such
> readers increment ->seq[], because otherwise they could decrement ->c[]
> while it is being read, which could cause it to incorrectly report that
> there are no active readers.  If ->seq[] overflows, then there is nothing
> (except how improbable it is) to prevent this from happening.
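
[ For anyone following along without the old source handy, the reader-side
  path that the ->c[] and ->seq[] increments above refer to looks roughly
  like the following.  This is a paraphrase of the pre-rewrite
  kernel/rcu/srcu.c from memory; details and exact naming may differ, so
  consult the actual source rather than trusting it verbatim. ]

	int __srcu_read_lock(struct srcu_struct *sp)
	{
		int idx;

		/* Caller has preemption disabled across this function. */
		idx = READ_ONCE(sp->completed) & 0x1;
		/* Count this reader on the side selected by ->completed... */
		__this_cpu_inc(sp->per_cpu_ref->c[idx]);
		smp_mb();  /* Keep the critical section after the increment. */
		/* ...and bump ->seq[] so the grace-period check can see us. */
		__this_cpu_inc(sp->per_cpu_ref->seq[idx]);
		return idx;
	}
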
> 
> I used to think (because of the previous comment) that there could be at
> most one such increment of ->seq[] per CPU, as they would have to be using
> the old value of ->completed and preemption would be disabled.  This is not
> the case, because there are no barriers around srcu_flip(), so the
> processor is not required to increment ->completed before reading ->seq[]
> the first time, nor is it required to wait until it is done reading ->seq[]
> the second time before incrementing.  This means that the following code
> could cause ->seq[] to be incremented an arbitrarily large number of times
> between the two ->seq[] loads in srcu_readers_active_idx_check():
> 
> 	while (true) {
> 		int idx = srcu_read_lock(sp);
> 		srcu_read_unlock(sp, idx);
> 	}

I also initially thought that there would need to be a memory barrier
immediately after srcu_flip(), but after further thought, I don't believe
that this is the case.  The key point is that the updater does the flip,
sums the unlock counters, does a full memory barrier, and then sums the
lock counters.  We therefore know that if the updater sees an unlock, it is
guaranteed to see the corresponding lock, which prevents negative sums.

However, it is true that the flip and the unlock reads can be interchanged.
This can result in failing to see a count of zero, but it cannot result in
spuriously seeing a count of zero.  More to the point, if an updater fails
to see a lock, then the next time that CPU/task does an srcu_read_lock(),
that CPU/task is guaranteed to see the new value of the index.  This limits
the number of CPUs/tasks that can be using the old value of the index.
Given that preemption is disabled across the fetch of the index and the
increment of the lock count, that number is NR_CPUS-1, given that the
updater has to be running on one of the CPUs (as Mathieu pointed out
earlier in this thread).

Or am I missing something?

							Thanx, Paul
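
PS: To make the ordering I am relying on concrete, the updater-side check
looks roughly like the following.  This is a simplified sketch rather than
the patch itself, and the function and field names (lock_count[],
unlock_count[], per_cpu_ref) are illustrative, so please refer to Lance's
actual patch for the real code:

	/*
	 * Return true if it appears that no readers remain on the counter
	 * pair selected by "idx".  Called after srcu_flip() has steered
	 * new readers onto the other index.
	 */
	static bool srcu_readers_idle_sketch(struct srcu_struct *sp, int idx)
	{
		unsigned long unlocks = 0, locks = 0;
		int cpu;

		/* Sum the unlock counters first... */
		for_each_possible_cpu(cpu)
			unlocks += per_cpu_ptr(sp->per_cpu_ref, cpu)->unlock_count[idx];

		/* ...then force the lock-counter reads to come afterward. */
		smp_mb();

		for_each_possible_cpu(cpu)
			locks += per_cpu_ptr(sp->per_cpu_ref, cpu)->lock_count[idx];

		/*
		 * Any unlock included in "unlocks" has its matching lock
		 * included in "locks", so the difference cannot go negative.
		 * Equality therefore means that no readers remain, apart
		 * from the bounded number of late old-index readers
		 * discussed above.
		 */
		return locks == unlocks;
	}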