Message-ID: <564249A3.3070901@hpe.com>
Date: Tue, 10 Nov 2015 14:46:43 -0500
From: Waiman Long
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130109 Thunderbird/10.0.12
To: Peter Zijlstra
CC: Ingo Molnar, Thomas Gleixner, "H. Peter Anvin", x86@kernel.org,
 linux-kernel@vger.kernel.org, Scott J Norton, Douglas Hatch,
 Davidlohr Bueso
Subject: Re: [PATCH tip/locking/core v10 6/7] locking/pvqspinlock: Allow limited lock stealing
References: <1447114167-47185-1-git-send-email-Waiman.Long@hpe.com>
 <1447114167-47185-7-git-send-email-Waiman.Long@hpe.com>
 <20151110160343.GE17308@twins.programming.kicks-ass.net>
In-Reply-To: <20151110160343.GE17308@twins.programming.kicks-ass.net>

On 11/10/2015 11:03 AM, Peter Zijlstra wrote:
> On Mon, Nov 09, 2015 at 07:09:26PM -0500, Waiman Long wrote:
>> @@ -291,7 +292,7 @@ static __always_inline void __pv_wait_head(struct qspinlock *lock,
>>  void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>>  {
>>          struct mcs_spinlock *prev, *next, *node;
>> -        u32 new, old, tail;
>> +        u32 new, old, tail, locked;
>>          int idx;
>>
>>          BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
>> @@ -431,11 +432,25 @@ queue:
>>           * sequentiality; this is because the set_locked() function below
>>           * does not imply a full barrier.
>>           *
>> +         * The PV pv_wait_head_or_lock function, if active, will acquire
>> +         * the lock and return a non-zero value. So we have to skip the
>> +         * smp_load_acquire() call. As the next PV queue head hasn't been
>> +         * designated yet, there is no way for the locked value to become
>> +         * _Q_SLOW_VAL. So both the set_locked() and the
>> +         * atomic_cmpxchg_relaxed() calls will be safe.
>> +         *
>> +         * If PV isn't active, 0 will be returned instead.
>> +         *
>>           */
>> -        pv_wait_head(lock, node);
>> -        while ((val = smp_load_acquire(&lock->val.counter)) & _Q_LOCKED_PENDING_MASK)
>> +        locked = val = pv_wait_head_or_lock(lock, node);
>> +        if (locked)
>> +                goto reset_tail_or_wait_next;
>> +
>> +        while ((val = smp_load_acquire(&lock->val.counter))
>> +                        & _Q_LOCKED_PENDING_MASK)
>>                  cpu_relax();
>>
>> +reset_tail_or_wait_next:
>>          /*
>>           * claim the lock:
>>           *
>> @@ -447,8 +462,12 @@ queue:
>>           * to grab the lock.
>>           */
>>          for (;;) {
>> -                if (val != tail) {
>> -                        set_locked(lock);
>> +                /*
>> +                 * The lock value may or may not have the _Q_LOCKED_VAL bit set.
>> +                 */
>> +                if ((val & _Q_TAIL_MASK) != tail) {
>> +                        if (!locked)
>> +                                set_locked(lock);
>>                          break;
>>                  }
>>                  /*
> How about this instead? If we've already got _Q_LOCKED_VAL set, issuing
> that store again isn't much of a problem, the cacheline is already hot
> and we own it and it's a regular store, not an atomic.
>
> @@ -432,10 +433,13 @@ void queued_spin_lock_slowpath(struct qs
>           * does not imply a full barrier.
>           *
>           */
> -        pv_wait_head(lock, node);
> +        if ((val = pv_wait_head_or_lock(lock, node)))
> +                goto locked;
> +
>          while ((val = smp_load_acquire(&lock->val.counter)) & _Q_LOCKED_PENDING_MASK)
>                  cpu_relax();
>
> +locked:
>          /*
>           * claim the lock:
>           *
> @@ -447,7 +451,8 @@ void queued_spin_lock_slowpath(struct qs
>           * to grab the lock.
>           */
>          for (;;) {
> -                if (val != tail) {
> +                /* In the PV case we might already have _Q_LOCKED_VAL set */
> +                if ((val & _Q_TAIL_MASK) != tail) {
>                          set_locked(lock);
>                          break;
>                  }
>

That is certainly fine. I was doing that originally, but then changed it
to add the additional if check.

BTW, I have a process question. Should I just resend patch 6, or should I
resend the whole series? I do have a couple of bugs in the
(_Q_PENDING_BITS != 8) part of the patch that I need to fix, too.

Cheers,
Longman
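
P.S. For readers following along outside the kernel tree, below is a minimal
user-space sketch of the claim step in the shape Peter suggests. It uses C11
atomics in place of the kernel's primitives; the constants only mimic the
qspinlock word layout, and claim(), set_locked() and the file name here are
illustrative stand-ins, not the real slowpath code. The point it models is
that re-issuing the locked store after the paravirt path has already taken
the lock is harmless, so the extra "locked" flag can be dropped.

/* build: cc -std=c11 -O2 -o claim-sketch claim-sketch.c */
#include <inttypes.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative constants mimicking the qspinlock word layout. */
#define Q_LOCKED_VAL  1U
#define Q_TAIL_OFFSET 16U
#define Q_TAIL_MASK   (~0U << Q_TAIL_OFFSET)

/*
 * The kernel's set_locked() is a plain byte store; a relaxed OR of the
 * locked bit is the closest portable stand-in here.  Issuing it again
 * when the bit is already set just rewrites a bit we already own.
 */
static void set_locked(_Atomic uint32_t *lock)
{
        atomic_fetch_or_explicit(lock, Q_LOCKED_VAL, memory_order_relaxed);
}

/*
 * Claim step in the shape Peter suggests.  @val is the last observed
 * lock value (it may already contain Q_LOCKED_VAL if the paravirt wait
 * function acquired the lock for us), @tail is this CPU's encoded tail.
 */
static void claim(_Atomic uint32_t *lock, uint32_t val, uint32_t tail)
{
        for (;;) {
                /* In the PV case we might already have Q_LOCKED_VAL set. */
                if ((val & Q_TAIL_MASK) != tail) {
                        set_locked(lock);       /* others queued behind us */
                        return;
                }
                /* Last queued CPU: clear the tail and take the lock at once. */
                if (atomic_compare_exchange_strong_explicit(lock, &val,
                                Q_LOCKED_VAL, memory_order_relaxed,
                                memory_order_relaxed))
                        return;
                /* The failed cmpxchg reloaded val; retry. */
        }
}

int main(void)
{
        _Atomic uint32_t lock;
        uint32_t my_tail = 1U << Q_TAIL_OFFSET;

        /* Model the PV case: the wait function already set the locked bit. */
        atomic_init(&lock, my_tail | Q_LOCKED_VAL);
        claim(&lock, my_tail | Q_LOCKED_VAL, my_tail);
        printf("lock word after claim: 0x%" PRIx32 "\n", atomic_load(&lock));
        return 0;
}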