Date: Tue, 22 Mar 2016 14:26:00 +0100
From: Heiko Carstens
To: Peter Zijlstra
Cc: Davidlohr Bueso, tglx@linutronix.de, mingo@kernel.org,
    bigeasy@linutronix.de, umgwanakikbuti@gmail.com,
    paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org,
    kmo@daterainc.com
Subject: Re: [PATCH 4/3] rtmutex: Avoid barrier in rt_mutex_handle_deadlock
Message-ID: <20160322132600.GC3921@osiris>
In-Reply-To: <20160322122050.GM6344@twins.programming.kicks-ass.net>

On Tue, Mar 22, 2016 at 01:20:50PM +0100, Peter Zijlstra wrote:
> On Tue, Mar 22, 2016 at 12:32:21PM +0100, Heiko Carstens wrote:
> > On Tue, Mar 22, 2016 at 11:21:53AM +0100, Peter Zijlstra wrote:
> > > And s390 does something entirely vile, no idea what.
> >
> > For the two s390 usages tsk equals current.
> > So it could be easily replaced with set_current_state().
>
> Hmm indeed, I only saw tsk = find_task_by_pid_ns() and didn't look
> further, but you do indeed have an assertion later that ensures task ==
> current.
>
> I still don't get that code though; why would you set the current task
> state to UNINTERRUPTIBLE, also set need_resched, but then not call
> schedule() at all.
>
> Clearly something magical is going on and its not clear.

The mechanism of our pfault code: if Linux is running as a guest and a
user space process accesses a page that the host has paged out, we get a
pfault interrupt.

This allows us, within the guest, to schedule a different process. Without
this mechanism the host would have to suspend the whole virtual CPU until
the page has been paged in.

So when we get such an interrupt we set the state of the current task to
uninterruptible and also set the need_resched flag. Both happen within
interrupt context(!). When we later want to return to user space, we
recognize the need_resched flag and then call schedule().
It's not very obvious how this works...

Of course we have a lot of additional fun with the completion interrupt
(-> host signals that a page of a process has been paged in and the
process can continue to run). This interrupt can arrive on any cpu and,
since we have virtual cpus, can actually arrive before the interrupt that
signals that a page is missing.
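
To make the flow above a bit more concrete, here is a small user space
model of it. This is only a sketch and not the actual arch/s390 code:
struct task, pfault_initial(), pfault_completion() and return_to_user()
are hypothetical names, and the tri-state pfault_wait field is an
assumption of this sketch for coping with a completion interrupt that
overtakes the initial one.

```c
#include <assert.h>
#include <stdbool.h>

enum task_state { TASK_RUNNING, TASK_UNINTERRUPTIBLE };

/* Hypothetical, heavily simplified stand-in for the kernel's task struct. */
struct task {
	enum task_state state;
	bool need_resched;
	int pfault_wait;     /* 1: waiting for page, -1: completion came first, 0: idle */
	int schedule_calls;  /* counts simulated schedule() invocations */
};

/*
 * Initial pfault interrupt: the host reports that the page is paged out.
 * This models interrupt context, so schedule() is not called here.
 */
void pfault_initial(struct task *cur)
{
	/* The task was executing in user space, so it must be runnable. */
	assert(cur->state == TASK_RUNNING);

	if (cur->pfault_wait == -1) {
		/* Completion overtook us: the page is already back. */
		cur->pfault_wait = 0;
		return;
	}
	cur->pfault_wait = 1;
	cur->state = TASK_UNINTERRUPTIBLE;
	cur->need_resched = true;
}

/* Completion interrupt: the host has paged the page in again. */
void pfault_completion(struct task *tsk)
{
	if (tsk->pfault_wait == 1) {
		tsk->pfault_wait = 0;
		tsk->state = TASK_RUNNING;  /* stands in for waking the task */
	} else {
		/* Arrived before the initial interrupt; remember that. */
		tsk->pfault_wait = -1;
	}
}

/*
 * Exit path to user space: only here is need_resched acted upon.
 * Returns true if the task may continue in user space right away.
 */
bool return_to_user(struct task *cur)
{
	if (cur->need_resched) {
		cur->need_resched = false;
		cur->schedule_calls++;      /* stands in for schedule() */
	}
	return cur->state == TASK_RUNNING;
}
```

In the normal order, pfault_initial() marks the task and the first
return_to_user() ends up in the simulated schedule(); the task only
continues once pfault_completion() has run. In the reversed order the
initial interrupt finds pfault_wait == -1 and leaves the task runnable,
so schedule() is never entered.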