From: Pan Xinhui
Date: Tue, 28 Feb 2017 01:06:45 +0800
Subject: Re: [PATCH v5] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs
To: Waiman Long, Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Boqun Feng, Andrea Parri
Message-Id: <0165b425-cce6-a5a5-235f-7c3e5c0210b7@linux.vnet.ibm.com>
In-Reply-To: <1487859196-10709-1-git-send-email-longman@redhat.com>
References: <1487859196-10709-1-git-send-email-longman@redhat.com>

On 2017/2/23 22:13, Waiman Long wrote:
> All the locking related cmpxchg's in the following functions are
> replaced with the _acquire variants:
>  - pv_queued_spin_steal_lock()
>  - trylock_clear_pending()
>
> This change should help performance on architectures that use LL/SC.
>
> On a 2-core 16-thread Power8 system with pvqspinlock explicitly
> enabled, the performance of a locking microbenchmark with and without
> this patch on a 4.10-rc8 kernel with Xinhui's PPC qspinlock patch
> were as follows:
>
>   # of thread     w/o patch      with patch     % Change
>   -----------     ---------      ----------     --------
>        4         4053.3 Mop/s   4223.7 Mop/s     +4.2%
>        8         3310.4 Mop/s   3406.0 Mop/s     +2.9%
>       12         2576.4 Mop/s   2674.6 Mop/s     +3.8%
>
> Signed-off-by: Waiman Long
> ---

Works on my side :)

Reviewed-by: Pan Xinhui

> v4->v5:
>  - Correct some grammatical issues in comment.
>
> v3->v4:
>  - Update the comment in pv_kick_node() to mention that the code
>    may not work in some archs.
>
> v2->v3:
>  - Reduce scope by relaxing cmpxchg's in fast path only.
>
> v1->v2:
>  - Add comments in changelog and code for the rationale of the change.
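For readers less familiar with why this helps: on LL/SC machines a fully ordered cmpxchg() generally has to bracket the load-linked/store-conditional loop with heavier barriers, while an acquire-only CAS needs weaker ordering and only on the success path. Below is a rough user-space C11 analogy of the fast-path change, not kernel code; the function names and the mapping onto <stdatomic.h> memory orders are only illustrative assumptions, not part of the patch.

/*
 * Illustration only: a user-space analogue of the lock-stealing fast path.
 * try_steal_full() roughly corresponds to a fully ordered cmpxchg(),
 * try_steal_acquire() to cmpxchg_acquire(); both names are made up.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static _Atomic unsigned char locked;	/* analogue of l->locked */

/* Old behaviour: fully ordered compare-and-swap. */
static bool try_steal_full(void)
{
	unsigned char expected = 0;

	return atomic_compare_exchange_strong_explicit(&locked, &expected, 1,
						       memory_order_seq_cst,
						       memory_order_seq_cst);
}

/* New behaviour: acquire ordering on success, relaxed on failure. */
static bool try_steal_acquire(void)
{
	unsigned char expected = 0;

	return atomic_compare_exchange_strong_explicit(&locked, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

int main(void)
{
	printf("full:    %d\n", try_steal_full());	/* 1: lock was free, now taken */
	printf("acquire: %d\n", try_steal_acquire());	/* 0: lock already held */
	return 0;
}

The semantics of a successful steal are the same in both cases; only the ordering strength (and hence the barrier cost on LL/SC architectures) differs.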
>
>  kernel/locking/qspinlock_paravirt.h | 19 +++++++++++++------
>  1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
> index e6b2f7a..4614e39 100644
> --- a/kernel/locking/qspinlock_paravirt.h
> +++ b/kernel/locking/qspinlock_paravirt.h
> @@ -72,7 +72,7 @@ static inline bool pv_queued_spin_steal_lock(struct qspinlock *lock)
>  	struct __qspinlock *l = (void *)lock;
>
>  	if (!(atomic_read(&lock->val) & _Q_LOCKED_PENDING_MASK) &&
> -	    (cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
> +	    (cmpxchg_acquire(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
>  		qstat_inc(qstat_pv_lock_stealing, true);
>  		return true;
>  	}
> @@ -101,16 +101,16 @@ static __always_inline void clear_pending(struct qspinlock *lock)
>
>  /*
>   * The pending bit check in pv_queued_spin_steal_lock() isn't a memory
> - * barrier. Therefore, an atomic cmpxchg() is used to acquire the lock
> - * just to be sure that it will get it.
> + * barrier. Therefore, an atomic cmpxchg_acquire() is used to acquire the
> + * lock just to be sure that it will get it.
>   */
>  static __always_inline int trylock_clear_pending(struct qspinlock *lock)
>  {
>  	struct __qspinlock *l = (void *)lock;
>
>  	return !READ_ONCE(l->locked) &&
> -	       (cmpxchg(&l->locked_pending, _Q_PENDING_VAL, _Q_LOCKED_VAL)
> -		== _Q_PENDING_VAL);
> +	       (cmpxchg_acquire(&l->locked_pending, _Q_PENDING_VAL,
> +				_Q_LOCKED_VAL) == _Q_PENDING_VAL);
>  }
>  #else /* _Q_PENDING_BITS == 8 */
>  static __always_inline void set_pending(struct qspinlock *lock)
> @@ -138,7 +138,7 @@ static __always_inline int trylock_clear_pending(struct qspinlock *lock)
>  	 */
>  	old = val;
>  	new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
> -	val = atomic_cmpxchg(&lock->val, old, new);
> +	val = atomic_cmpxchg_acquire(&lock->val, old, new);
>
>  	if (val == old)
>  		return 1;
> @@ -361,6 +361,13 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
>  	 * observe its next->locked value and advance itself.
>  	 *
>  	 * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
> +	 *
> +	 * The write to next->locked in arch_mcs_spin_unlock_contended()
> +	 * must be ordered before the read of pn->state in the cmpxchg()
> +	 * below for the code to work correctly. However, this is not
> +	 * guaranteed on all architectures when the cmpxchg() call fails.
> +	 * Both x86 and PPC can provide that guarantee, but other
> +	 * architectures not necessarily.
>  	 */
>  	if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
>  		return;
>
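A side note on the new pv_kick_node() comment: the caveat is that a cmpxchg() that *fails* is not guaranteed to be fully ordered on every architecture, so the earlier store to next->locked may not be ordered before the value the failed cmpxchg() reads back. The sketch below is a purely illustrative user-space C11 restatement of that idea; all names are invented, and the explicit fence is just one way to state the ordering requirement, not what the kernel code does.

/*
 * Illustration only: ordering guarantees of a compare-and-swap that fails
 * are weaker than those of one that succeeds, so code that relies on a
 * failed CAS for ordering against an earlier store needs something
 * explicit.  All names here are invented for the example.
 */
#include <stdatomic.h>
#include <stdio.h>

enum vcpu_state { vcpu_running, vcpu_halted, vcpu_hashed };

static _Atomic int next_locked;		/* analogue of next->locked */
static _Atomic int pn_state;		/* analogue of pn->state    */

static void kick_node_sketch(void)
{
	int expected = vcpu_halted;

	/* Analogue of the earlier write that releases the next waiter. */
	atomic_store_explicit(&next_locked, 1, memory_order_release);

	/*
	 * The failure memory order of the CAS below is deliberately
	 * relaxed, so it provides no ordering against the store above.
	 * If that ordering is required even when the CAS fails, it has
	 * to be enforced explicitly, e.g. with a full fence:
	 */
	atomic_thread_fence(memory_order_seq_cst);

	if (!atomic_compare_exchange_strong_explicit(&pn_state, &expected,
						     vcpu_hashed,
						     memory_order_seq_cst,
						     memory_order_relaxed))
		printf("CAS failed, pn_state=%d\n", expected);
}

int main(void)
{
	atomic_store(&pn_state, vcpu_running);	/* force the failure path */
	kick_node_sketch();
	return 0;
}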