From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758324AbcFAQyP (ORCPT ); Wed, 1 Jun 2016 12:54:15 -0400 Received: from mail-bn1on0115.outbound.protection.outlook.com ([157.56.110.115]:12740 "EHLO na01-bn1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752871AbcFAQyN (ORCPT ); Wed, 1 Jun 2016 12:54:13 -0400 X-Greylist: delayed 75168 seconds by postgrey-1.27 at vger.kernel.org; Wed, 01 Jun 2016 12:54:12 EDT Authentication-Results: infradead.org; dkim=none (message not signed) header.d=none;infradead.org; dmarc=none action=none header.from=hpe.com; Message-ID: <574F1326.5000207@hpe.com> Date: Wed, 1 Jun 2016 12:53:58 -0400 From: Waiman Long User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130109 Thunderbird/10.0.12 MIME-Version: 1.0 To: Peter Zijlstra CC: , , , , , , , , , , , , , , Subject: Re: [PATCH -v3 7/8] locking: Move smp_cond_load_acquire() and friends into asm-generic/barrier.h References: <20160531094134.606249808@infradead.org> <20160531094844.282806055@infradead.org> <574DED82.9080200@hpe.com> <20160601093158.GN3190@twins.programming.kicks-ass.net> In-Reply-To: <20160601093158.GN3190@twins.programming.kicks-ass.net> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [72.71.243.105] X-ClientProxiedBy: SN1PR01CA0020.prod.exchangelabs.com (10.165.224.30) To CS1PR84MB0309.NAMPRD84.PROD.OUTLOOK.COM (10.162.189.158) X-MS-Office365-Filtering-Correlation-Id: 34288bdb-16b1-4784-4697-08d38a3d5a10 X-Microsoft-Exchange-Diagnostics: 1;CS1PR84MB0309;2:5kmzTcJ4r4RlG5NeGHb/c5Yn1P/p/RqvFFG/gtMgzjzQmD8uMQ3y+Bts5JuPCR1pZEyB7U0KeOYIdGMqLP0lknwFRuDgpkQ/ZIMey+1ZiYPmPlkNxG78DJoFHskk0nX+rVbIrpQW/4xvZTzLz+Xv7GDeMII4GR7ZJYdslGx3Jf1b43QpTdHeVwfUBjwkT1c6;3:CV5R9xS+79SaGr+UMXD98mO/BRQVRI0M/P7XcFgmB/1g2Pvasy8eBzbLNNrD1DiFrqSwDLuQ+Njfu2R7jlyEMiH1WfAO/zil+H62vQ05CIoFtGWLMYyM4eEi/9XB35ZQ;25:rQKDozopTwbQLpngnAZD7x7q7pMZvPxRLLdhSe7rth3/njN1DlSz5OZFcrW8uXn/CIm1w/OXgcHQmzmuL4gJBGj7QCHUz+FUDtWzFIijUaue0O7862YzMZ8RPOt1n7JDf+rHkQ23rvA0tbdtPacZowYJja0ei/HDYwQqVjSIYDll8nj2z1fBBQt0B8rC/RkbRa2MN2MhXFyQDPRUMAD84c3ID+NdX3OXUsC8rZTb5b1dPTb+vLt98F3a2pISAiYfaAdZ01LFQPFhJdDaL8LutmiU3traTYGMbLCM3+yRbjfmSuu6A5J8JIZvCIrY2Ux+RvLOe5Dga/1i2h7gHn8JPwTsBTTljJztTNLGVVvdvrUADLISRXRrBnT8oMxMwaYeD8HR/ypg1MiJCj5Br/CainnarRSi03Mud36Hu5paa0w= X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:CS1PR84MB0309; X-Microsoft-Exchange-Diagnostics: 1;CS1PR84MB0309;20:ufMzeO1Z+0aDrabJciMKnJO6zJXTPx6zzUizJLkKXxofMZBFJjeB8KLyA/Te/9zlWSvL5ijHya6Z0EK7JmqI8JbIuIYF5jdeI3GGcpLYtct5PjeLqS+wb9CO9qwsKuHLDEJmOhCSNZNY/VMQuLdOlpPAW1tMwhHgII/cAzjA4qawap2BdSjdODIL7/XS2BYNqlBQxw5Wa3nBqy1jI+Vofa7Iqjeg1CUX6HBMagc7/NO4jeAygP2I2LgJFpEzNia+ZSfJeiep53CZhZ+LPxUqJZRBT0SAN+ULDpt8r59LPwCDnsq8YiTxCEYXYPxAM7TBjs49L4QcUJQc6qEIOklH4acM+i32Setfj9kZ3zdf8vruGCZYA4KDuJuT6pNKQPTHTa4axe6OqEtR5gIslzN9ABPpk0OsxhId1W+wZ/cQDUo=;4:WxT8kLbS4LJgvc3pumCSjxjqvGuwlJT/qOIxPXa9pjiK16OohZ5rLJIdjYEgThkiwnbEtmKuBOMYvnGCcHhM3yV35iyiQB8M3VBUh6EOHDAXODmIbpEipr7S7v/5GK+WA1JT7KQkwVg7bejqKvA7/Qjni3Zp1KbtKPOYI05ZK1Ne840PUZevchD7EFJv9kz1V01L47e4b4J00Fa+uG+caHej+85tisQd34+Y5J7GoIY97eSMC3C32662N7ZTv6975lslUDA4rl9uBTuMh3pGQMHqMJlllrTQiW38qEMnxrW5+5LUd4fd98dWcPkNepE19KmvVekZj0nrpF6sKRwbs6Lv41rjy5fqoYRiud9w8XvziyD13m+7tLJ6xncg2xNj8snsurc4c4nHk/+aLcRe0xQHz0j+Q0IcHkNh/KDGBLxblXfClMQkzegLcB9Ws1vQ X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(788757137089)(17755550239193); X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(601004)(2401047)(8121501046)(5005006)(3002001)(10201501046);SRVR:CS1PR84MB0309;BCL:0;PCL:0;RULEID:;SRVR:CS1PR84MB0309; X-Forefront-PRVS: 096029FF66 X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10019020)(4630300001)(6009001)(6049001)(24454002)(377454003)(93886004)(5008740100001)(36756003)(65956001)(65806001)(66066001)(77096005)(47776003)(42186005)(586003)(6116002)(92566002)(3846002)(2906002)(65816999)(110136002)(54356999)(86362001)(87266999)(83506001)(50986999)(189998001)(81166006)(33656002)(117156001)(4001350100001)(80316001)(5004730100002)(230700001)(23756003)(50466002)(2950100001)(4326007)(8676002)(59896002)(76176999)(41533002);DIR:OUT;SFP:1102;SCL:1;SRVR:CS1PR84MB0309;H:[192.168.142.143];FPR:;SPF:None;MLV:sfv;LANG:en; X-Microsoft-Exchange-Diagnostics: =?iso-8859-1?Q?1;CS1PR84MB0309;23:Z00GMISYbS7nsYmWHjgqgkqQUeoFDQ4brWaHX0o?= =?iso-8859-1?Q?VckktlxbDxwL4lPjgzZrmGlpDBvnRcXfhxoJo505wsTs1gBlt1XbEGbHlc?= =?iso-8859-1?Q?v/rZ+XuJTDEtnaSglO7sG87iJXmb52wBhslgFv3taKzTgwcF6khbGRbOUV?= =?iso-8859-1?Q?j85JgYFf8WnBviY03qS0WrL4w4zv+adW1LgDtVCXTzA1eLtVrvg5Ct9bdo?= =?iso-8859-1?Q?XMdjKRuRCjkQoltSsxN5ilqWVF9hxf7K/mGAuFyUMRWjRla7Tbr8QtTxiY?= =?iso-8859-1?Q?NN+YNINfkjiTk+2IytSvncZPvY/Yx1X/hkK+YCgF1gSg/crtS/OyGpzhTX?= =?iso-8859-1?Q?dnGbF5uJO1Lo+Yu09uDhEfwNXEM/v+Q1leQD0CROvwdZ4XvKuQHWDshTsA?= =?iso-8859-1?Q?OgblDM6OFmED1xx9PD8BXIkG4qmK2J+rCxIlFnXlwzCUg0013PyF8uDIdT?= =?iso-8859-1?Q?syiiLpUMUS8CtJ0G9S2q6kdxJvu3FZj9o9S43JqFtV70mj4YrapGdELo83?= =?iso-8859-1?Q?c5MgA02mVGdzqJTJfpL/K411gjhuiBuFzmZWegRc895f6zLhymVWYqRKkb?= =?iso-8859-1?Q?aVl4yivfT7MCpriEwq8Q1lP1PRPC+yXINH2dbqOrJQjw6OBlRzqbX9lLVi?= =?iso-8859-1?Q?YE3BGhYPa2oqZxsd9BbtUJWIWTBgyA8zB7PeSkobA8kdBGfsTl5vn7BZ35?= =?iso-8859-1?Q?dTpCyp6IgENqyPH6EkUiWjPdIPw8M+byl3Sm62wVV6JpOVtx1Azgrm+eYy?= =?iso-8859-1?Q?2q8gEiIXeURInEqrvdQwUCa3F51AGHbRCsL7F84JBqJHSBdAavz8eNIYsC?= =?iso-8859-1?Q?YLUMz962eW8VSghc3rmAP0ce9V894He5MfrnHRF+dDh4SeimatvHI0yEqQ?= =?iso-8859-1?Q?daBg3JdN+bhyHBOfDpqZwtO/C7CowClZMbjDO4qU9EiB+zeqnH30nokG0f?= =?iso-8859-1?Q?XIMMbPqeORP/WTQpw5jDTs+z5K6QbyB7o3syPNu0HxqSLLfblhkTmzbj3C?= =?iso-8859-1?Q?u7nGrQCNhMrHYbcldvwUsvlM6hcCQa3sNTT1juT4icsNdUbtuXe2UPVT4A?= =?iso-8859-1?Q?ff2Ir1FnF0nLeRL8YjzYkgNOGQN9Q1GVc0DcV0E4anBjnCd/t5zTOETNO2?= =?iso-8859-1?Q?FzIgKlBOcMiLnpdjToJd4112k7A=3D=3D?= X-Microsoft-Exchange-Diagnostics: 1;CS1PR84MB0309;5:YR1fp07YWuRUH9HqBUgQeQH+iNr5q7OavBPXVdb5cQ6EP6CfHVTe29yjw5YbA/fVUvy4GrM5qs8L6SxYWP7p5JU20AfQD8eKkkVXLifHDqDdAs0gk5ydxkW/g58WaCGIX6B26NnNAyozJbWLHyWCbg==;24:/oXqB7DeOUwI3nf+qc/g76wOlbZyCxTOxe40C/xKeHmQvbzWwHQLiBR+hu3rB2quV1Jo2t6nq00vZHcIQ7vz/5YRszWNEQgTq8ZuhoY1zFU=;7:DkKrRFoZddPk1eKSl2xEVfy6boOnd69z1tQMAtfwx/6N4exh4ufYaWeeY+nE3nHkr+pnuUI+U5Odix+Z8l1JpfqM8ZOkFoOvXpCCeRNIz04HUV0U0yGEyQywT3vFFaul/BS9DdTVcUbgWj023U4+A0rrFAafGgGpQAbvm7eiNOE/O2HfPXsLEGcldfEmkitP SpamDiagnosticOutput: 1:23 SpamDiagnosticMetadata: NSPM X-OriginatorOrg: hpe.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Jun 2016 16:54:08.1931 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-Transport-CrossTenantHeadersStamped: CS1PR84MB0309 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/01/2016 05:31 AM, Peter Zijlstra wrote: > On Tue, May 31, 2016 at 04:01:06PM -0400, Waiman Long wrote: >> You are doing two READ_ONCE's in the smp_cond_load_acquire loop. Can we >> change it to do just one READ_ONCE, like >> >> --- a/include/asm-generic/barrier.h >> +++ b/include/asm-generic/barrier.h >> @@ -229,12 +229,18 @@ do { >> * value; some architectures can do this in hardware. >> */ >> #ifndef cmpwait >> +#define cmpwait(ptr, val) ({ \ >> typeof (ptr) __ptr = (ptr); \ >> + typeof (val) __old = (val); \ >> + typeof (val) __new; \ >> + for (;;) { \ >> + __new = READ_ONCE(*__ptr); \ >> + if (__new != __old) \ >> + break; \ >> cpu_relax(); \ >> + } \ >> + __new; \ >> +}) >> #endif >> >> /** >> @@ -251,12 +257,11 @@ do { >> #ifndef smp_cond_load_acquire >> #define smp_cond_load_acquire(ptr, cond_expr) ({ \ >> typeof(ptr) __PTR = (ptr); \ >> + typeof(*ptr) VAL = READ_ONCE(*__PTR); \ >> for (;;) { \ >> if (cond_expr) \ >> break; \ >> + VAL = cmpwait(__PTR, VAL); \ >> } \ >> smp_acquire__after_ctrl_dep(); \ >> VAL; \ > Yes, that generates slightly better code, but now that you made me look > at it, I think we need to kill the cmpwait() in the generic version and > only keep it for arch versions. > > /me ponders... > > So cmpwait() as implemented here has strict semantics; but arch > implementations as previously proposed have less strict semantics; and > the use here follows that less strict variant. > > The difference being that the arch implementations of cmpwait can have > false positives (ie. return early, without a changed value) > smp_cond_load_acquire() can deal with these false positives seeing how > its in a loop and does its own (more specific) comparison. > > Exposing cmpwait(), with the documented semantics, means that arch > versions need an additional loop inside to match these strict semantics, > or we need to weaken the cmpwait() semantics, at which point I'm not > entirely sure its worth keeping as a generic primitive... > > Hmm, so if we can find a use for the weaker cmpwait() outside of > smp_cond_load_acquire() I think we can make a case for keeping it, and > looking at qspinlock.h there's two sites we can replace cpu_relax() with > it. > > Will, since ARM64 seems to want to use this, does the below make sense > to you? > > --- > include/asm-generic/barrier.h | 15 ++++++--------- > kernel/locking/qspinlock.c | 4 ++-- > 2 files changed, 8 insertions(+), 11 deletions(-) > > diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h > index be9222b10d17..05feda5c22e6 100644 > --- a/include/asm-generic/barrier.h > +++ b/include/asm-generic/barrier.h > @@ -221,20 +221,17 @@ do { \ > #endif > > /** > - * cmpwait - compare and wait for a variable to change > + * cmpwait - compare and wait for a variable to 'change' > * @ptr: pointer to the variable to wait on > * @val: the value it should change from > * > - * A simple constuct that waits for a variable to change from a known > - * value; some architectures can do this in hardware. > + * A 'better' cpu_relax(), some architectures can avoid polling and have event > + * based wakeups on variables. Such constructs allow false positives on the > + * 'change' and can return early. Therefore this reduces to cpu_relax() > + * without hardware assist. > */ > #ifndef cmpwait > -#define cmpwait(ptr, val) do { \ > - typeof (ptr) __ptr = (ptr); \ > - typeof (val) __val = (val); \ > - while (READ_ONCE(*__ptr) == __val) \ > - cpu_relax(); \ > -} while (0) > +#define cmpwait(ptr, val) cpu_relax() > #endif > > /** > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c > index e98e5bf679e9..60a811d56406 100644 > --- a/kernel/locking/qspinlock.c > +++ b/kernel/locking/qspinlock.c > @@ -311,7 +311,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) > */ > if (val == _Q_PENDING_VAL) { > while ((val = atomic_read(&lock->val)) == _Q_PENDING_VAL) > - cpu_relax(); > + cmpwait(&lock->val.counter, _Q_PENDING_VAL); > } > > /* > @@ -481,7 +481,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) > */ > if (!next) { > while (!(next = READ_ONCE(node->next))) > - cpu_relax(); > + cmpwait(&node->next, NULL); > } > > arch_mcs_spin_unlock_contended(&next->locked); I think it is a good idea to consider cmpwait as a fancier version of cpu_relax(). It can certainly get used in a lot more places. Cheers, Longman