From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752574AbcEKI26 (ORCPT <rfc822;w@1wt.eu>);
	Wed, 11 May 2016 04:28:58 -0400
Received: from mail-wm0-f66.google.com ([74.125.82.66]:33351 "EHLO
	mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751455AbcEKI24 (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 11 May 2016 04:28:56 -0400
Date: Wed, 11 May 2016 10:28:53 +0200
From: Michal Hocko <mhocko@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
        LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@redhat.com>,
        Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>,
        "David S. Miller" <davem@davemloft.net>,
        Tony Luck <tony.luck@intel.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        Chris Zankel <chris@zankel.net>, Max Filippov <jcmvbkbc@gmail.com>,
        Davidlohr Bueso <dave@stgolabs.net>, Waiman Long <Waiman.Long@hpe.com>
Subject: Re: [PATCH 03/11] locking, rwsem: introduce basis for
 down_write_killable
Message-ID: <20160511082853.GF16677@dhcp22.suse.cz>
References: <1459508695-14915-1-git-send-email-mhocko@kernel.org>
 <1459508695-14915-4-git-send-email-mhocko@kernel.org>
 <8bd03bdc-0373-a3bb-da12-045322efb797@I-love.SAKURA.ne.jp>
 <20160510115320.GJ23576@dhcp22.suse.cz>
 <20160510123806.GB3193@twins.programming.kicks-ass.net>
 <20160511072357.GC16677@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160511072357.GC16677@dhcp22.suse.cz>
User-Agent: Mutt/1.6.0 (2016-04-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 11-05-16 09:23:57, Michal Hocko wrote:
> On Tue 10-05-16 14:38:06, Peter Zijlstra wrote:
[...]
> > However, at the time I was thinking that if we have:
> > 
> > 	reader (owner)
> > 	writer (pending)
> > 	reader (blocked on writer)
> > 
> > and writer would get cancelled, the up_read() would do a wakeup and kick
> > the blocked reader.
> > 
> > But yes, immediately kicking further pending waiters might be better.
> 
> OK, that makes sense. We shouldn't be waiting for the first reader to
> do up_read.

I am still trying to wrap my head around the wake up logic (I managed to
forget all the juicy details). I still do not see the correctness
issue with the current code so the following should be "just an
optimization". But as I've said I might be missing something here.

Does the following look correct/reasonable? This is absolutely untested
and more for a discussion:
---
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index df4dcb883b50..726620c97366 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -516,7 +516,27 @@ EXPORT_SYMBOL(rwsem_down_write_failed);
 __visible struct rw_semaphore * __sched
 rwsem_down_write_failed_killable(struct rw_semaphore *sem)
 {
-	return __rwsem_down_write_failed_common(sem, TASK_KILLABLE);
+	struct rw_semaphore * ret;
+
+	ret = __rwsem_down_write_failed_common(sem, TASK_KILLABLE);
+	if (IS_ERR(ret)) {
+		long count;
+
+		raw_spin_lock_irq(&sem->wait_lock);
+		count = READ_ONCE(sem->count);
+
+		/*
+		 * If there were already threads queued before us and there are
+		 * no active writers, the lock must be read owned; so we try to
+		 * wake any read locks that were queued ahead of us so they
+		 * do not have to wait for up_read to wake up.
+		 */
+		if (count > RWSEM_WAITING_BIAS)
+			sem = __rwsem_do_wake(sem, RWSEM_WAKE_READERS);
+		raw_spin_unlock_irq(&sem->wait_lock);
+	}
+
+	return ret;
 }
 EXPORT_SYMBOL(rwsem_down_write_failed_killable);
 
-- 
Michal Hocko
SUSE Labs