From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Jan Beulich" <JBeulich@suse.com>
Subject: Re: Strange PVM spinlock case revisited (nailed down)
Date: Thu, 14 Feb 2013 11:43:31 +0000
Message-ID: <511CDBF302000078000BE2D7@nat28.tlf.novell.com>
References: <51151ABF.1070007@canonical.com> <511CC4D8.3060203@canonical.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <511CC4D8.3060203@canonical.com>
Content-Disposition: inline
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Stefan Bader <stefan.bader@canonical.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>, Ian Campbell <ian.campbell@citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
List-Id: xen-devel@lists.xenproject.org

>>> On 14.02.13 at 12:04, Stefan Bader <stefan.bader@canonical.com> wrote:
> There was a bit more on the stack but try_to_wake_up seemed a believable 
> current
> path. Even more so after looking into the function:
> 
> #ifdef CONFIG_SMP
>         /*
>          * If the owning (remote) cpu is still in the middle of schedule() with
>          * this task as prev, wait until its done referencing the task.
>          */
>         while (p->on_cpu) {
> #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
>                 /*
>                  * In case the architecture enables interrupts in
>                  * context_switch(), we cannot busy wait, since that
>                  * would lead to deadlocks when an interrupt hits and
>                  * tries to wake up @prev. So bail and do a complete
>                  * remote wakeup.
>                  */
>                 if (ttwu_activate_remote(p, wake_flags))
>                         goto stat;
> #else
>                 cpu_relax();
> #endif
> 
> And prying out the task in question from the stack, it was one currently
> being accounted for VCPU 6 and on_cpu. VCPU 6 is one of the other waiters 
> for
> the runq lock of VCPU 1. Which would get picked up as soon as this callback 
> is
> done. Unfortunately we "stay awhile... stay forever!".

When analyzing a similar problem with our old PV ticket lock
implementation (and the interrupt re-enabling there when possible
before going into polling mode), it was precisely this open coded
lock construct that caused problems for us. Back then I didn't,
however, realize that this would even affect the simplistic byte
locks used by the pv-ops Xen code (or else I would have pointed
this out).

Being relatively certain that with our new implementation we don't
have any such problems, I can only recommend against dropping
the re-enabling of interrupts - this needlessly introduces high
interrupt latencies in a broader range of cases. Instead, the
interactions ought to be fixed properly. And it's time for using
ticket locks in the Xen code too...

Jan