Date: Tue, 28 Apr 2015 11:53:21 -0400
From: Chris Metcalf
To: Peter Zijlstra, "Paul E. McKenney"
CC: Manfred Spraul, Oleg Nesterov, Kirill Tkhai, Ingo Molnar, Josh Poimboeuf
Subject: Re: [PATCH 2/2] [PATCH] sched: Add smp_rmb() in task rq locking cycles

On 04/28/2015 10:33 AM, Peter Zijlstra wrote:
> On Sun, Apr 26, 2015 at 03:52:13AM -0700, Paul E. McKenney wrote:
>
>> And then an smp_read_barrier_depends() would be needed either here
>> or embedded in spin_unlock_wait().  But we also need to check the
>> spin_unlock_wait() implementations to see if any are potentially
>> vulnerable to compiler misbehavior due to lack of ACCESS_ONCE(),
>> READ_ONCE(), or other sources of the required volatility:
>>
>> o	tile: For 32-bit, looks like a bug.  Compares ->current_ticket
>>	and ->next_ticket with no obvious protection.  The compiler is
>>	free to load them in either order, so it is possible that the
>>	two fields could compare equal despite never having actually
>>	been equal at any given time.  Needs something like arm, arm64,
>>	mips, or x86 to do single fetch, then compare fields in
>>	quantity fetched.
>>
>>	Except that this appears to be using int on a 32-bit system,
>>	thus might not have a 64-bit load.  If that is the case, the
>>	trick would be to load them in order.  Except that this can be
>>	defeated by overflow.  Are there really 32-bit tile systems with
>>	enough CPUs to overflow an unsigned short?
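The single-fetch approach Paul describes looks roughly like this (an
illustrative sketch only; the word layout, masks, and names are made
up here rather than taken from any of those arches):

/*
 * Sketch of "single fetch, then compare fields": both tickets live
 * in one 32-bit word, so a single atomic load yields a consistent
 * snapshot of the pair, and there is no second load for the
 * compiler to reorder.
 */
static inline int sketch_spin_is_locked(u32 *lock_word)
{
	u32 val = READ_ONCE(*lock_word);	/* one atomic load */
	u16 curr = val & 0xffff;		/* current_ticket half */
	u16 next = val >> 16;			/* next_ticket half */

	return next != curr;
}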
As you surmise, tilepro doesn't have 64-bit loads.  So we are stuck
with 32-bit loads on these two fields.  It's true that
spin_unlock_wait() can therefore falsely claim that the lock is
unlocked, but it should be only a hint anyway, since by the time the
caller tries to act on that information the lock may have been
retaken, right?  If spin_unlock_wait() is really trying to guarantee
that the lock was available at some point in the interval between
when it was called and when it returned, we could use READ_ONCE() to
read the current ticket value first; is that a necessary part of the
semantics?

(Even with READ_ONCE() we would still be exposed to the technical
risk that other cores had taken and released the lock two billion
times between the two loads on the core running spin_unlock_wait(),
without the lock ever actually being free.  Strictly speaking, the
only complete solution is for that core to actually acquire and
release the lock itself, but that seems a bit extreme in practice.)

The reason we use two 32-bit fields on tilepro is that the only
available atomic instruction is tns (test and set), which atomically
writes a 32-bit "1" into the target memory and returns the old 32-bit
value.  So we need to be able to safely "corrupt" the next_ticket
value with a "1", load the current_ticket value, and, if they don't
match, rewrite next_ticket with its old value.  We couldn't safely do
this if next_ticket and current_ticket were 16-bit fields in one
32-bit word, since the tns operation would corrupt the current_ticket
value in that case, making someone waiting on ticket 0 think they
owned the lock.
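To make that concrete, the trylock path looks roughly like this (a
simplified sketch, not the actual implementation in
arch/tile/lib/spinlock_32.c; tns() stands in for the real
test-and-set intrinsic, and the real code must also cope with
fetching a "1" left behind by another core that is in the middle of
this same sequence):

static inline int sketch_spin_trylock(arch_spinlock_t *lock)
{
	/* Atomically "corrupt" next_ticket with 1, fetching its old value. */
	u32 my_ticket = tns(&lock->next_ticket);

	if (my_ticket == lock->current_ticket) {
		/* The lock was free: claim it by advancing next_ticket. */
		lock->next_ticket = my_ticket + 1;
		return 1;
	}

	/* The lock was held: restore the old next_ticket value and fail. */
	lock->next_ticket = my_ticket;
	return 0;
}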
On tilegx we made the atomic instruction situation much, much better :-)

>> For 64-bit, a READ_ONCE() appears to be in order -- no obvious
>> volatility present.
>>

It depends, I guess.  If you are spinning on arch_spin_is_locked(),
then yes, you need to do something to ensure the value is re-read on
each iteration.  But arch_spin_unlock_wait() already calls
delay_backoff(), which calls relax(), which includes a barrier(), so
we're OK there.  But if stylistically the consensus calls for a
READ_ONCE() in arch_spin_is_locked(), I can certainly add that.  What
do folks think?

Assuming the answers to both questions are "change the code", how
does this look?

diff --git a/arch/tile/include/asm/spinlock_32.h b/arch/tile/include/asm/spinlock_32.h
index c0a77b38d39a..7c7b80bd83db 100644
--- a/arch/tile/include/asm/spinlock_32.h
+++ b/arch/tile/include/asm/spinlock_32.h
@@ -41,8 +41,23 @@ static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 	 * to claim the lock is held, since it will be momentarily
 	 * if not already.  There's no need to wait for a "valid"
 	 * lock->next_ticket to become available.
+	 *
+	 * We order the reads here so that if we claim the lock is
+	 * unlocked, we know it actually was for at least a moment.
+	 * Since current_ticket is never incremented above
+	 * next_ticket, by reading current first, then next, and
+	 * finding them equal, we know that during the window between
+	 * the reads the lock was unlocked.
+	 *
+	 * There is a technical risk in this that between reading
+	 * current and reading next, other cores locked and unlocked
+	 * two billion times without the lock ever being unlocked, and
+	 * therefore it looks like the lock was at some point unlocked
+	 * but it never was.  But this seems highly improbable.
 	 */
-	return lock->next_ticket != lock->current_ticket;
+	int curr = READ_ONCE(lock->current_ticket);
+	int next = READ_ONCE(lock->next_ticket);
+	return next != curr;
 }
 
 void arch_spin_lock(arch_spinlock_t *lock);
diff --git a/arch/tile/include/asm/spinlock_64.h b/arch/tile/include/asm/spinlock_64.h
index 9a12b9c7e5d3..b9718fb4e74a 100644
--- a/arch/tile/include/asm/spinlock_64.h
+++ b/arch/tile/include/asm/spinlock_64.h
@@ -18,6 +18,8 @@
 #ifndef _ASM_TILE_SPINLOCK_64_H
 #define _ASM_TILE_SPINLOCK_64_H
 
+#include <linux/compiler.h>
+
 /* Shifts and masks for the various fields in "lock". */
 #define __ARCH_SPIN_CURRENT_SHIFT	17
 #define __ARCH_SPIN_NEXT_MASK		0x7fff
@@ -44,7 +46,8 @@ static inline u32 arch_spin_next(u32 val)
 
 /* The lock is locked if a task would have to wait to get it. */
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
-	u32 val = lock->lock;
+	/* Use READ_ONCE() to ensure that calling this in a loop is OK. */
+	u32 val = READ_ONCE(lock->lock);
 	return arch_spin_current(val) != arch_spin_next(val);
 }

-- 
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com