Message-ID: <1468864069.2367.21.camel@j-VirtualBox>
Subject: Re: [RFC] locking/mutex: Fix starvation of sleeping waiters
From: Jason Low
To: Peter Zijlstra
Cc: jason.low2@hpe.com, Imre Deak, linux-kernel@vger.kernel.org,
    Ingo Molnar, Chris Wilson, Daniel Vetter, Ville Syrjälä,
    Waiman Long, Davidlohr Bueso, jason.low2@hp.com
Date: Mon, 18 Jul 2016 10:47:49 -0700
In-Reply-To: <20160718171537.GC6862@twins.programming.kicks-ass.net>
References: <1468858607-20481-1-git-send-email-imre.deak@intel.com>
            <20160718171537.GC6862@twins.programming.kicks-ass.net>

On Mon, 2016-07-18 at 19:15 +0200, Peter Zijlstra wrote:
> On Mon, Jul 18, 2016 at 07:16:47PM +0300, Imre Deak wrote:
> > Currently a thread sleeping on a mutex wait queue can be delayed
> > indefinitely by other threads managing to steal the lock, that is,
> > acquiring the lock out of order, before the sleepers. I noticed this
> > via a testcase (see the Reference: below) where one CPU was
> > unlocking/relocking a mutex in a tight loop while another CPU was
> > delayed indefinitely, trying to wake up and get the lock but losing
> > out to the first CPU and going back to sleep:
> >
> >   CPU0:                        CPU1:
> >   mutex_lock->acquire
> >                                mutex_lock->sleep
> >   mutex_unlock->wake CPU1
> >                                wakeup
> >   mutex_lock->acquire
> >                                trylock fail->sleep
> >   mutex_unlock->wake CPU1
> >                                wakeup
> >   mutex_lock->acquire
> >                                trylock fail->sleep
> >   ...                          ...
> >
> > To fix this we can make sure that CPU1 makes progress by avoiding the
> > fastpath locking, optimistic spinning and trylocking if there is any
> > waiter on the list. The corresponding check can be done without
> > holding wait_lock, since the goal is only to make sure sleepers make
> > progress and not to guarantee that the locking will happen in FIFO
> > order.
>
> I think we went over this before; that will also completely destroy
> performance under a number of workloads.

Yup. Once a thread becomes a waiter, all other threads would need to
follow suit, so this change would effectively disable optimistic spinning
in some workloads.

A few months ago, we worked on patches that allow a waiter to return to
optimistic spinning to help reduce starvation. Longman sent out a version
3 patch set, and it sounded like we were fine with the concept.

Jason
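
P.S. For illustration only, below is a rough userspace-style sketch of the
"skip the lock-stealing paths while anyone is queued" idea described in the
quoted text above. It is not the proposed kernel patch; the names
(toy_mutex, nwaiters, etc.) and the pthread-based slow path are made up
for this example.

  /*
   * Toy mutex: the fast path may grab the lock only when no waiter is
   * queued, so a sleeping waiter cannot be starved indefinitely by
   * fast-path lockers.  (Illustrative sketch, not kernel code.)
   */
  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdbool.h>

  struct toy_mutex {
          atomic_int      owner;     /* 0 = unlocked, 1 = locked */
          atomic_int      nwaiters;  /* threads queued in the slow path */
          pthread_mutex_t wait_lock; /* protects the sleep/wake handshake */
          pthread_cond_t  wait_cond;
  };

  #define TOY_MUTEX_INIT \
          { 0, 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

  static bool toy_trylock(struct toy_mutex *m)
  {
          int expected = 0;
          return atomic_compare_exchange_strong(&m->owner, &expected, 1);
  }

  void toy_lock(struct toy_mutex *m)
  {
          /*
           * Fast path: only try to grab the lock when nobody is queued.
           * The nwaiters check is done without holding wait_lock; it
           * only needs to be conservative enough that sleepers make
           * progress, it does not enforce FIFO order.
           */
          if (atomic_load(&m->nwaiters) == 0 && toy_trylock(m))
                  return;

          /* Slow path: register as a waiter and sleep until we get it. */
          pthread_mutex_lock(&m->wait_lock);
          atomic_fetch_add(&m->nwaiters, 1);
          while (!toy_trylock(m))
                  pthread_cond_wait(&m->wait_cond, &m->wait_lock);
          atomic_fetch_sub(&m->nwaiters, 1);
          pthread_mutex_unlock(&m->wait_lock);
  }

  void toy_unlock(struct toy_mutex *m)
  {
          atomic_store(&m->owner, 0);
          pthread_mutex_lock(&m->wait_lock);
          pthread_cond_signal(&m->wait_cond); /* wake one sleeping waiter */
          pthread_mutex_unlock(&m->wait_lock);
  }

The nwaiters check here does not make the locking FIFO; it only keeps
fast-path lockers from starving a queued sleeper, which matches the goal
stated in the RFC.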