From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755585AbcCVQle (ORCPT );
        Tue, 22 Mar 2016 12:41:34 -0400
Received: from casper.infradead.org ([85.118.1.10]:58375 "EHLO
        casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751408AbcCVQl0 (ORCPT );
        Tue, 22 Mar 2016 12:41:26 -0400
Date: Tue, 22 Mar 2016 17:41:22 +0100
From: Peter Zijlstra
To: Heiko Carstens
Cc: Davidlohr Bueso , tglx@linutronix.de, mingo@kernel.org,
        bigeasy@linutronix.de, umgwanakikbuti@gmail.com,
        paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org,
        kmo@daterainc.com
Subject: Re: [PATCH 4/3] rtmutex: Avoid barrier in rt_mutex_handle_deadlock
Message-ID: <20160322164122.GS6344@twins.programming.kicks-ass.net>
References: <1457461223-4301-1-git-send-email-dave@stgolabs.net>
        <20160308220539.GB4404@linux-uzut.site>
        <20160314134038.GZ6356@twins.programming.kicks-ass.net>
        <20160321181622.GB32012@linux-uzut.site>
        <20160322102153.GL6344@twins.programming.kicks-ass.net>
        <20160322113221.GA3921@osiris>
        <20160322122050.GM6344@twins.programming.kicks-ass.net>
        <20160322132600.GC3921@osiris>
        <20160322135530.GR6344@twins.programming.kicks-ass.net>
        <20160322144537.GE3921@osiris>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160322144537.GE3921@osiris>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 22, 2016 at 03:45:37PM +0100, Heiko Carstens wrote:
> Sure, looks nice and makes a lot of sense. And the text looks a bit familiar
> to me ;)
>
> Could you provide From: and Signed-off-by: lines?

Of course, find below.

---
Subject: s390: Clarify pagefault interrupt
From: Peter Zijlstra

While looking at set_task_state() users I stumbled over the s390 pfault
interrupt code.
Since Heiko provided a great explanation on how it worked, I figured we
ought to preserve this. Also make a few little tweaks to the code to aid
in readability and explicitly comment the unusual blocking scheme.

Based-on-text-by: Heiko Carstens
Signed-off-by: Peter Zijlstra (Intel)
---
 arch/s390/mm/fault.c | 44 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 791a4146052c..52cc8c99e62c 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -629,6 +629,29 @@ void pfault_fini(void)
 static DEFINE_SPINLOCK(pfault_lock);
 static LIST_HEAD(pfault_list);
 
+#define PF_COMPLETE	0x0080
+
+/*
+ * The mechanism of our pfault code: if Linux is running as guest, runs a user
+ * space process and the user space process accesses a page that the host has
+ * paged out we get a pfault interrupt.
+ *
+ * This allows us, within the guest, to schedule a different process. Without
+ * this mechanism the host would have to suspend the whole virtual CPU until
+ * the page has been paged in.
+ *
+ * So when we get such an interrupt then we set the state of the current task
+ * to uninterruptible and also set the need_resched flag. Both happen within
+ * interrupt context(!). If we later on want to return to user space we
+ * recognize the need_resched flag and then call schedule(). It's not very
+ * obvious how this works...
+ *
+ * Of course we have a lot of additional fun with the completion interrupt (->
+ * host signals that a page of a process has been paged in and the process can
+ * continue to run). This interrupt can arrive on any cpu and, since we have
+ * virtual cpus, actually appear before the interrupt that signals that a page
+ * is missing.
+ */
 static void pfault_interrupt(struct ext_code ext_code,
 			     unsigned int param32, unsigned long param64)
 {
@@ -637,14 +660,14 @@ static void pfault_interrupt(struct ext_code ext_code,
 	pid_t pid;
 
 	/*
-	 * Get the external interruption subcode & pfault
-	 * initial/completion signal bit. VM stores this
-	 * in the 'cpu address' field associated with the
-	 * external interrupt.
+	 * Get the external interruption subcode & pfault initial/completion
+	 * signal bit. VM stores this in the 'cpu address' field associated
+	 * with the external interrupt.
 	 */
 	subcode = ext_code.subcode;
 	if ((subcode & 0xff00) != __SUBCODE_MASK)
 		return;
+
 	inc_irq_stat(IRQEXT_PFL);
 	/* Get the token (= pid of the affected task). */
 	pid = param64 & LPP_PFAULT_PID_MASK;
@@ -655,8 +678,9 @@ static void pfault_interrupt(struct ext_code ext_code,
 	rcu_read_unlock();
 	if (!tsk)
 		return;
+
 	spin_lock(&pfault_lock);
-	if (subcode & 0x0080) {
+	if (subcode & PF_COMPLETE) {
 		/* signal bit is set -> a page has been swapped in by VM */
 		if (tsk->thread.pfault_wait == 1) {
 			/* Initial interrupt was faster than the completion
@@ -683,10 +707,10 @@ static void pfault_interrupt(struct ext_code ext_code,
 		/* signal bit not set -> a real page is missing. */
 		if (WARN_ON_ONCE(tsk != current))
 			goto out;
+
 		if (tsk->thread.pfault_wait == 1) {
 			/* Already on the list with a reference: put to sleep */
-			__set_task_state(tsk, TASK_UNINTERRUPTIBLE);
-			set_tsk_need_resched(tsk);
+			goto block;
 		} else if (tsk->thread.pfault_wait == -1) {
 			/* Completion interrupt was faster than the initial
 			 * interrupt (pfault_wait == -1). Set pfault_wait
@@ -701,7 +725,11 @@ static void pfault_interrupt(struct ext_code ext_code,
 			get_task_struct(tsk);
 			tsk->thread.pfault_wait = 1;
 			list_add(&tsk->thread.list, &pfault_list);
-			__set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+block:
+			/* Since this must be a userspace fault, there
+			 * is no kernel task state to trample. Rely on the
+			 * return to userspace schedule() to block */
+			__set_current_state(TASK_UNINTERRUPTIBLE);
 			set_tsk_need_resched(tsk);
 		}
 	}
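
For anyone trying to follow the ordering games between the initial and the completion interrupt, here is a rough user-space model of the pfault_wait transitions. It is only a sketch and not the kernel code: struct model_task, its blocked flag and model_pfault() are made-up names, the list handling, task references and locking are left out, and the branches that fall outside the hunks above (what the completion side does, and that the -1 case resets pfault_wait to zero) are my assumption of the intent rather than something shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define PF_COMPLETE 0x0080	/* completion signal bit, as named in the patch */

/* Hypothetical stand-in for the pfault-related state of a task. */
struct model_task {
	int pfault_wait;	/* 1: initial seen, -1: completion came first, 0: idle */
	bool blocked;		/* models TASK_UNINTERRUPTIBLE plus need_resched */
};

/* Model one pfault interrupt for task t; not the real handler. */
static void model_pfault(struct model_task *t, unsigned int subcode)
{
	if (subcode & PF_COMPLETE) {
		if (t->pfault_wait == 1) {
			/* Initial interrupt came first: the task may run again. */
			t->pfault_wait = 0;
			t->blocked = false;
		} else {
			/* Assumed: remember that the completion overtook us. */
			t->pfault_wait = -1;
		}
		return;
	}

	if (t->pfault_wait == -1) {
		/* Assumed: the page is already back, so do not block at all. */
		t->pfault_wait = 0;
		return;
	}

	/* First or repeated initial interrupt: mark as waiting and block. */
	t->pfault_wait = 1;
	t->blocked = true;
}
```

Feeding it an initial interrupt followed by a completion ends with pfault_wait == 0 and the task runnable; feeding it the two in reverse order ends in the same state without the task ever being blocked, which is the whole point of the -1 value.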