From mboxrd@z Thu Jan 1 00:00:00 1970
From: Waiman Long
Subject: Re: [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support
 for the qspinlock
Date: Mon, 13 Apr 2015 11:51:36 -0400
Message-ID: <552BE608.7000901__14861.3206147731$1428940414$gmane$org@hp.com>
References: <1428375350-9213-1-git-send-email-Waiman.Long@hp.com>
 <1428375350-9213-10-git-send-email-Waiman.Long@hp.com>
 <20150409181327.GY5029@twins.programming.kicks-ass.net>
 <5526F218.2070909@hp.com>
 <20150413150832.GH5029@twins.programming.kicks-ass.net>
In-Reply-To: <20150413150832.GH5029@twins.programming.kicks-ass.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
To: Peter Zijlstra
Cc: linux-arch@vger.kernel.org, Rik van Riel, Raghavendra K T,
 Oleg Nesterov, kvm@vger.kernel.org, Daniel J Blueman, x86@kernel.org,
 Paolo Bonzini, linux-kernel@vger.kernel.org,
 virtualization@lists.linux-foundation.org, Scott J Norton, Ingo Molnar,
 David Vrabel, "H. Peter Anvin", xen-devel@lists.xenproject.org,
 Thomas Gleixner, "Paul E. McKenney", Linus Torvalds, Boris Ostrovsky,
 Douglas Hatch
List-Id: xen-devel@lists.xenproject.org

On 04/13/2015 11:08 AM, Peter Zijlstra wrote:
> On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:
>
>>>> +static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
>>>> +{
>>>> +	struct __qspinlock *l = (void *)lock;
>>>> +	struct qspinlock **lp = NULL;
>>>> +	struct pv_node *pn = (struct pv_node *)node;
>>>> +	int slow_set = false;
>>>> +	int loop;
>>>> +
>>>> +	for (;;) {
>>>> +		for (loop = SPIN_THRESHOLD; loop; loop--) {
>>>> +			if (!READ_ONCE(l->locked))
>>>> +				return;
>>>> +
>>>> +			cpu_relax();
>>>> +		}
>>>> +
>>>> +		WRITE_ONCE(pn->state, vcpu_halted);
>>>> +		if (!lp)
>>>> +			lp = pv_hash(lock, pn);
>>>> +		/*
>>>> +		 * lp must be set before setting _Q_SLOW_VAL
>>>> +		 *
>>>> +		 * [S] lp = lock                [RmW] l = l->locked = 0
>>>> +		 *     MB                             MB
>>>> +		 * [S] l->locked = _Q_SLOW_VAL  [L]   lp
>>>> +		 *
>>>> +		 * Matches the cmpxchg() in pv_queue_spin_unlock().
>>>> +		 */
>>>> +		if (!slow_set &&
>>>> +		    !cmpxchg(&l->locked, _Q_LOCKED_VAL, _Q_SLOW_VAL)) {
>>>> +			/*
>>>> +			 * The lock is free and _Q_SLOW_VAL has never been
>>>> +			 * set. Need to clear the hash bucket before getting
>>>> +			 * the lock.
>>>> +			 */
>>>> +			WRITE_ONCE(*lp, NULL);
>>>> +			return;
>>>> +		} else if (slow_set && !READ_ONCE(l->locked))
>>>> +			return;
>>>> +		slow_set = true;
>>> I'm somewhat puzzled by the slow_set thing; what is wrong with the thing
>>> I had, namely:
>>>
>>> 	if (!lp) {
>>> 		lp = pv_hash(lock, pn);
>>>
>>> 		/*
>>> 		 * comment
>>> 		 */
>>> 		lv = cmpxchg(&l->locked, _Q_LOCKED_VAL, _Q_SLOW_VAL);
>>> 		if (lv != _Q_LOCKED_VAL) {
>>> 			/* we're woken, unhash and return */
>>> 			WRITE_ONCE(*lp, NULL);
>>> 			return;
>>> 		}
>>> 	}
>>>> +
>>>> +		pv_wait(&l->locked, _Q_SLOW_VAL);
>>> If we get a spurious wakeup (due to device interrupts or random kick)
>>> we'll loop around but ->locked will remain _Q_SLOW_VAL.
>> The purpose of the slow_set flag is not about the lock value. It is to
>> make sure that pv_hash_find() will always find a match. Consider the
>> following scenario:
>>
>> cpu1                   cpu2                   cpu3
>> ----                   ----                   ----
>> pv_wait
>> spurious wakeup
>> loop l->locked
>>
>>                        read _Q_SLOW_VAL
>>                        pv_hash_find()
>>                        unlock
>>
>>                                               pv_hash() <- same entry
>>
>> cmpxchg(&l->locked)
>> clear hash (?)
>>
>> Here, the entry for cpu3 will be removed, leading to a panic when
>> pv_hash_find() cannot find the entry. So the hash entry can only be
>> removed if the other cpu has no chance to see _Q_SLOW_VAL.
> Still confused. Afaict that cannot happen with my code. We only do the
> cmpxchg(, _Q_SLOW_VAL) _once_.
>
> Only on the first time around that loop will we hash the lock and set
> the slow flag. And cpu3 cannot come in on the same entry because we've
> not yet released the lock when we find and unhash.
>

Maybe I was not clear in my illustration. More than one lock can be
hashed to the same value, so cpu3 is accessing a different lock which has
the same hash value as the lock used by cpu1 and cpu2.

Anyway, I have removed the slow_set flag by unrolling the retry loop so
that after pv_wait(), it goes into the 2nd loop instead of going back to
the top. As a result, cmpxchg() and pv_hash() can only be called once.

Cheers,
Longman