Message-ID: <1468864069.2367.21.camel@j-VirtualBox>
Subject: Re: [RFC] locking/mutex: Fix starvation of sleeping waiters
From: Jason Low
To: Peter Zijlstra
Cc: jason.low2@hpe.com, Imre Deak, linux-kernel@vger.kernel.org,
    Ingo Molnar, Chris Wilson, Daniel Vetter, Ville Syrjälä,
    Waiman Long, Davidlohr Bueso, jason.low2@hp.com
Date: Mon, 18 Jul 2016 10:47:49 -0700
In-Reply-To: <20160718171537.GC6862@twins.programming.kicks-ass.net>
References: <1468858607-20481-1-git-send-email-imre.deak@intel.com>
            <20160718171537.GC6862@twins.programming.kicks-ass.net>

On Mon, 2016-07-18 at 19:15 +0200, Peter Zijlstra wrote:
> On Mon, Jul 18, 2016 at 07:16:47PM +0300, Imre Deak wrote:
> > Currently a thread sleeping on a mutex wait queue can be delayed
> > indefinitely by other threads managing to steal the lock, that is,
> > acquiring the lock out of order, before the sleepers. I noticed this
> > via a testcase (see the Reference: below) where one CPU was
> > unlocking/relocking a mutex in a tight loop while another CPU was
> > delayed indefinitely, trying to wake up and get the lock but losing
> > out to the first CPU and going back to sleep:
> >
> >   CPU0:                        CPU1:
> >   mutex_lock->acquire
> >                                mutex_lock->sleep
> >   mutex_unlock->wake CPU1
> >                                wakeup
> >   mutex_lock->acquire
> >                                trylock fail->sleep
> >   mutex_unlock->wake CPU1
> >                                wakeup
> >   mutex_lock->acquire
> >                                trylock fail->sleep
> >   ...                          ...
> >
> > To fix this we can make sure that CPU1 makes progress by avoiding the
> > fastpath locking, optimistic spinning and trylocking if there is any
> > waiter on the list. The corresponding check can be done without
> > holding wait_lock, since the goal is only to make sure sleepers make
> > progress and not to guarantee that the locking will happen in FIFO
> > order.
>
> I think we went over this before; that will also completely destroy
> performance under a number of workloads.

Yup. Once a thread becomes a waiter, all other threads would need to
follow suit, so this change would effectively disable optimistic spinning
in some workloads.

A few months ago, we worked on patches that allow a waiter to return to
optimistic spinning to help reduce starvation. Longman sent out a version
3 patch set, and it sounded like we were fine with the concept.

Jason
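
P.S. For illustration only, below is a rough userspace-style sketch of the
"skip the lock-stealing paths while anyone is queued" idea described in the
quoted text above. It is not the proposed kernel patch; the names
(toy_mutex, nwaiters, etc.) and the pthread-based slow path are made up
for this example.

  /*
   * Toy mutex: the fast path may grab the lock only when no waiter is
   * queued, so a sleeping waiter cannot be starved indefinitely by
   * fast-path lockers.  (Illustrative sketch, not kernel code.)
   */
  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdbool.h>

  struct toy_mutex {
          atomic_int      owner;     /* 0 = unlocked, 1 = locked */
          atomic_int      nwaiters;  /* threads queued in the slow path */
          pthread_mutex_t wait_lock; /* protects the sleep/wake handshake */
          pthread_cond_t  wait_cond;
  };

  #define TOY_MUTEX_INIT \
          { 0, 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

  static bool toy_trylock(struct toy_mutex *m)
  {
          int expected = 0;
          return atomic_compare_exchange_strong(&m->owner, &expected, 1);
  }

  void toy_lock(struct toy_mutex *m)
  {
          /*
           * Fast path: only try to grab the lock when nobody is queued.
           * The nwaiters check is done without holding wait_lock; it
           * only needs to be conservative enough that sleepers make
           * progress, it does not enforce FIFO order.
           */
          if (atomic_load(&m->nwaiters) == 0 && toy_trylock(m))
                  return;

          /* Slow path: register as a waiter and sleep until we get it. */
          pthread_mutex_lock(&m->wait_lock);
          atomic_fetch_add(&m->nwaiters, 1);
          while (!toy_trylock(m))
                  pthread_cond_wait(&m->wait_cond, &m->wait_lock);
          atomic_fetch_sub(&m->nwaiters, 1);
          pthread_mutex_unlock(&m->wait_lock);
  }

  void toy_unlock(struct toy_mutex *m)
  {
          atomic_store(&m->owner, 0);
          pthread_mutex_lock(&m->wait_lock);
          pthread_cond_signal(&m->wait_cond); /* wake one sleeping waiter */
          pthread_mutex_unlock(&m->wait_lock);
  }

The nwaiters check here does not make the locking FIFO; it only keeps
fast-path lockers from starving a queued sleeper, which matches the goal
stated in the RFC.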