Date: Wed, 8 Feb 2017 14:09:39 +0800
From: Boqun Feng
To: Xinhui Pan
Cc: Waiman Long, Peter Zijlstra, Ingo Molnar, linux-kernel@vger.kernel.org, Pan Xinhui
Subject: Re: [PATCH v2] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs
Message-ID: <20170208060939.GC9178@tardis.cn.ibm.com>
References: <1482697561-23848-1-git-send-email-longman@redhat.com> <20170208040540.GB9178@tardis.cn.ibm.com>
In-Reply-To: <20170208040540.GB9178@tardis.cn.ibm.com>

On Wed, Feb 08, 2017 at 12:05:40PM +0800, Boqun Feng wrote:
> On Wed, Feb 08, 2017 at 11:39:10AM +0800, Xinhui Pan wrote:
> > 2016-12-26 4:26 GMT+08:00 Waiman Long:
> >
> > > A number of cmpxchg calls in qspinlock_paravirt.h were replaced by more
> > > relaxed versions to improve performance on architectures that use LL/SC.
> > >
> > > All the locking related cmpxchg's are replaced with the _acquire
> > > variants:
> > >  - pv_queued_spin_steal_lock()
> > >  - trylock_clear_pending()
> > >
> > > The cmpxchg's related to hashing are replaced by either by the _release
> > > or the _relaxed variants. See the inline comment for details.
> > >
> > > Signed-off-by: Waiman Long
> > >
> > > v1->v2:
> > >  - Add comments in changelog and code for the rationale of the change.
> > >
> > > ---
> > >  kernel/locking/qspinlock_paravirt.h | 50 ++++++++++++++++++++++++-------------
> > >  1 file changed, 33 insertions(+), 17 deletions(-)
> > >
> > >
> > > @@ -323,8 +329,14 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
> > >                  * If pv_kick_node() changed us to vcpu_hashed, retain that
> > >                  * value so that pv_wait_head_or_lock() knows to not also try
> > >                  * to hash this lock.
> > > +                *
> > > +                * The smp_store_mb() and control dependency above will ensure
> > > +                * that state change won't happen before that. Synchronizing
> > > +                * with pv_kick_node() wrt hashing by this waiter or by the
> > > +                * lock holder is done solely by the state variable. There is
> > > +                * no other ordering requirement.
> > >                  */
> > > -               cmpxchg(&pn->state, vcpu_halted, vcpu_running);
> > > +               cmpxchg_relaxed(&pn->state, vcpu_halted, vcpu_running);
> > >
> > >                 /*
> > >                  * If the locked flag is still not set after wakeup, it is a
> > > @@ -360,9 +372,12 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
> > >          * pv_wait_node(). If OTOH this fails, the vCPU was running and will
> > >          * observe its next->locked value and advance itself.
> > >          *
> > > -        * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
> > > +        * Matches with smp_store_mb() and cmpxchg_relaxed() in pv_wait_node().
> > > +        * A release barrier is used here to ensure that node->locked is
> > > +        * always set before changing the state. See comment in pv_wait_node().
> > >          */
> > > -       if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
> > > +       if (cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
> > > +           != vcpu_halted)
> > >                 return;
> > >
> > >
> > hi, Waiman
> > We can't use _release here, a full barrier is needed.
> >
> > There is pv_kick_node vs pv_wait_head_or_lock
> >
> > [w] l->locked = _Q_SLOW_VAL  //reordered here
> >
> >                                    if (READ_ONCE(pn->state) == vcpu_hashed) //False.
> >                                        lp = (struct qspinlock **)1;
> >
> > [STORE] pn->state = vcpu_hashed    lp = pv_hash(lock, pn);
> > pv_hash()                          if (xchg(&l->locked, _Q_SLOW_VAL) == 0) // false, not unhashed.
> >
>
> This analysis is correct, but..
>

Hmm, looking at this again, I don't think this analysis is meaningful.
Say the reordering didn't happen; we would still get (similar to your
case):

                                        if (READ_ONCE(pn->state) == vcpu_hashed) // false.
                                          lp = (struct qspinlock **)1;

cmpxchg(pn->state, vcpu_halted, vcpu_hashed);
                                        if (!lp) {
                                          lp = pv_hash(lock, pn);
WRITE_ONCE(l->locked, _Q_SLOW_VAL);
pv_hash();
                                          if (xchg(&l->locked, _Q_SLOW_VAL) == 0) // false, not unhashed.

, right?

Actually, I think neither this nor your case can happen, because we have

        cmpxchg(pn->state, vcpu_halted, vcpu_running);

in pv_wait_node(), which makes us either observe vcpu_hashed or set
pn->state to vcpu_running before pv_kick_node() tries to do the hash.

I may be missing something subtle, but does switching back to cmpxchg()
fix the RCU stall you observed?

Regards,
Boqun

> > Then the same lock has hashed twice but only unhashed once. So at last as
> > the hash table grows big, we hit RCU stall.
> >
> > I hit RCU stall when I run netperf benchmark
> >
>
> how will a big hash table hit RCU stall? Do you have the call trace for
> your RCU stall?
>
> Regards,
> Boqun
>
> > thanks
> > xinhui
> >
> >
> > > --
> > > 1.8.3.1
> > >
> > >
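
The argument in the reply above is that the two cmpxchg() calls on pn->state (halted -> running in pv_wait_node(), halted -> hashed in pv_kick_node()) cannot both succeed, so the lock should end up hashed exactly once whichever side wins. A minimal user-space sketch of that claim follows. It is not the kernel code: pv_node_model, model_kick(), model_wait(), hash_count and the two hashes_*() helpers are hypothetical names, C11 sequentially consistent atomics stand in for the fully ordered kernel cmpxchg(), and only the two sequential orderings are exercised. It does not model the _relaxed/_release variants or any weak-memory reordering, which is the question still open in this thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-ins for the kernel's vCPU states; the numeric values are illustrative. */
enum vcpu_state { vcpu_running = 0, vcpu_halted, vcpu_hashed };

/* Hypothetical model of a pv_node: the state word plus a hash counter. */
struct pv_node_model {
	_Atomic int state;
	int hash_count;		/* how many times the lock got hashed */
};

/*
 * Kicker side, modelled on pv_kick_node():
 * cmpxchg(&pn->state, vcpu_halted, vcpu_hashed), and hash only on success.
 */
static bool model_kick(struct pv_node_model *pn)
{
	int expected = vcpu_halted;

	if (!atomic_compare_exchange_strong(&pn->state, &expected, vcpu_hashed))
		return false;	/* waiter was already running: do not hash */
	pn->hash_count++;
	return true;
}

/*
 * Waiter side, modelled on the cmpxchg(halted -> running) in pv_wait_node()
 * followed by the vcpu_hashed check in pv_wait_head_or_lock().
 * Simplification: the waiter always hashes if nobody did so before.
 */
static bool model_wait(struct pv_node_model *pn)
{
	int expected = vcpu_halted;

	atomic_compare_exchange_strong(&pn->state, &expected, vcpu_running);
	if (atomic_load(&pn->state) == vcpu_hashed)
		return false;	/* kicker hashed already: skip our own hash */
	pn->hash_count++;
	return true;
}

/* Kicker's cmpxchg runs first, then the waiter's. */
static int hashes_kick_first(void)
{
	struct pv_node_model pn;

	atomic_init(&pn.state, vcpu_halted);
	pn.hash_count = 0;
	model_kick(&pn);
	model_wait(&pn);
	return pn.hash_count;
}

/* Waiter's cmpxchg runs first, then the kicker's. */
static int hashes_wait_first(void)
{
	struct pv_node_model pn;

	atomic_init(&pn.state, vcpu_halted);
	pn.hash_count = 0;
	model_wait(&pn);
	model_kick(&pn);
	return pn.hash_count;
}
```

Calling hashes_kick_first() and hashes_wait_first() from a test harness should report a single hash in both orders, which is the "either observe vcpu_hashed or set pn->state to vcpu_running" outcome described above.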