Date: Tue, 28 Apr 2015 11:53:21 -0400
From: Chris Metcalf
To: Peter Zijlstra, "Paul E. McKenney"
CC: Manfred Spraul, Oleg Nesterov, Kirill Tkhai, Ingo Molnar, Josh Poimboeuf
Subject: Re: [PATCH 2/2] [PATCH] sched: Add smp_rmb() in task rq locking cycles

On 04/28/2015 10:33 AM, Peter Zijlstra wrote:
> On Sun, Apr 26, 2015 at 03:52:13AM -0700, Paul E. McKenney wrote:
>
>> And then an smp_read_barrier_depends() would be needed either here
>> or embedded in spin_unlock_wait().  But we also need to check the
>> spin_unlock_wait() implementations to see if any are potentially
>> vulnerable to compiler misbehavior due to lack of ACCESS_ONCE(),
>> READ_ONCE(), or other sources of the required volatility:
>>
>> o	tile: For 32-bit, looks like a bug.  Compares ->current_ticket
>>	and ->next_ticket with no obvious protection.  The compiler is
>>	free to load them in either order, so it is possible that the
>>	two fields could compare equal despite never having actually
>>	been equal at any given time.  Needs something like arm, arm64,
>>	mips, or x86 to do single fetch, then compare fields in
>>	quantity fetched.
>>
>>	Except that this appears to be using int on a 32-bit system,
>>	thus might not have a 64-bit load.  If that is the case, the
>>	trick would be to load them in order.  Except that this can be
>>	defeated by overflow.  Are there really 32-bit tile systems with
>>	enough CPUs to overflow an unsigned short?
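The single-fetch approach Paul describes looks roughly like this (an
illustrative sketch only; the word layout, masks, and names are made
up here rather than taken from any of those arches):

/*
 * Sketch of "single fetch, then compare fields": both tickets live
 * in one 32-bit word, so a single atomic load yields a consistent
 * snapshot of the pair, and there is no second load for the
 * compiler to reorder.
 */
static inline int sketch_spin_is_locked(u32 *lock_word)
{
	u32 val = READ_ONCE(*lock_word);	/* one atomic load */
	u16 curr = val & 0xffff;		/* current_ticket half */
	u16 next = val >> 16;			/* next_ticket half */

	return next != curr;
}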
As you surmise, tilepro doesn't have 64-bit loads.  So we are stuck
with 32-bit loads on these two fields.  It's true that
spin_unlock_wait() can therefore falsely claim that the lock is
unlocked, but it should be only a hint anyway, since by the time the
caller tries to act on that information the lock may have been
retaken, right?  If spin_unlock_wait() is really trying to guarantee
that the lock was available at some point in the interval between
when it was called and when it returned, we could use READ_ONCE() to
read the current ticket value first; is that a necessary part of the
semantics?

(Even with READ_ONCE() we would still be exposed to the technical
risk that other cores had taken and released the lock two billion
times between the two loads on the core running spin_unlock_wait(),
without the lock ever actually being free.  Strictly speaking, the
only complete solution is for that core to actually acquire and
release the lock itself, but that seems a bit extreme in practice.)

The reason we use two 32-bit fields on tilepro is that the only
available atomic instruction is tns (test and set), which atomically
writes a 32-bit "1" into the target memory and returns the old 32-bit
value.  So we need to be able to safely "corrupt" the next_ticket
value with a "1", load the current_ticket value, and, if they don't
match, rewrite next_ticket with its old value.  We couldn't safely do
this if next_ticket and current_ticket were 16-bit fields in one
32-bit word, since the tns operation would corrupt the current_ticket
value in that case, making someone waiting on ticket 0 think they
owned the lock.
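To make that concrete, the trylock path looks roughly like this (a
simplified sketch, not the actual implementation in
arch/tile/lib/spinlock_32.c; tns() stands in for the real
test-and-set intrinsic, and the real code must also cope with
fetching a "1" left behind by another core that is in the middle of
this same sequence):

static inline int sketch_spin_trylock(arch_spinlock_t *lock)
{
	/* Atomically "corrupt" next_ticket with 1, fetching its old value. */
	u32 my_ticket = tns(&lock->next_ticket);

	if (my_ticket == lock->current_ticket) {
		/* The lock was free: claim it by advancing next_ticket. */
		lock->next_ticket = my_ticket + 1;
		return 1;
	}

	/* The lock was held: restore the old next_ticket value and fail. */
	lock->next_ticket = my_ticket;
	return 0;
}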
On tilegx we made the atomic instruction situation much, much better :-)

>> For 64-bit, a READ_ONCE() appears to be in order -- no obvious
>> volatility present.
>>

It depends, I guess.  If you are spinning on arch_spin_is_locked(),
then yes, you need to do something to ensure the value is re-read on
each iteration.  But arch_spin_unlock_wait() already calls
delay_backoff(), which calls relax(), which includes a barrier(), so
we're OK there.  But if stylistically the consensus calls for a
READ_ONCE() in arch_spin_is_locked(), I can certainly add that.  What
do folks think?

Assuming the answers to both questions are "change the code", how
does this look?

diff --git a/arch/tile/include/asm/spinlock_32.h b/arch/tile/include/asm/spinlock_32.h
index c0a77b38d39a..7c7b80bd83db 100644
--- a/arch/tile/include/asm/spinlock_32.h
+++ b/arch/tile/include/asm/spinlock_32.h
@@ -41,8 +41,23 @@ static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 	 * to claim the lock is held, since it will be momentarily
 	 * if not already.  There's no need to wait for a "valid"
 	 * lock->next_ticket to become available.
+	 *
+	 * We order the reads here so that if we claim the lock is
+	 * unlocked, we know it actually was for at least a moment.
+	 * Since current_ticket is never incremented above
+	 * next_ticket, by reading current first, then next, and
+	 * finding them equal, we know that during the window between
+	 * the reads the lock was unlocked.
+	 *
+	 * There is a technical risk in this that between reading
+	 * current and reading next, other cores locked and unlocked
+	 * two billion times without the lock ever being unlocked, and
+	 * therefore it looks like the lock was at some point unlocked
+	 * but it never was.  But this seems highly improbable.
 	 */
-	return lock->next_ticket != lock->current_ticket;
+	int curr = READ_ONCE(lock->current_ticket);
+	int next = READ_ONCE(lock->next_ticket);
+	return next != curr;
 }
 
 void arch_spin_lock(arch_spinlock_t *lock);
diff --git a/arch/tile/include/asm/spinlock_64.h b/arch/tile/include/asm/spinlock_64.h
index 9a12b9c7e5d3..b9718fb4e74a 100644
--- a/arch/tile/include/asm/spinlock_64.h
+++ b/arch/tile/include/asm/spinlock_64.h
@@ -18,6 +18,8 @@
 #ifndef _ASM_TILE_SPINLOCK_64_H
 #define _ASM_TILE_SPINLOCK_64_H
 
+#include <linux/compiler.h>
+
 /* Shifts and masks for the various fields in "lock". */
 #define __ARCH_SPIN_CURRENT_SHIFT	17
 #define __ARCH_SPIN_NEXT_MASK		0x7fff
@@ -44,7 +46,8 @@ static inline u32 arch_spin_next(u32 val)
 
 /* The lock is locked if a task would have to wait to get it. */
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
-	u32 val = lock->lock;
+	/* Use READ_ONCE() to ensure that calling this in a loop is OK. */
+	u32 val = READ_ONCE(lock->lock);
 	return arch_spin_current(val) != arch_spin_next(val);
 }

-- 
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com