From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752988AbbDAV7o (ORCPT ); Wed, 1 Apr 2015 17:59:44 -0400
Received: from youngberry.canonical.com ([91.189.89.112]:46784 "EHLO
        youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752427AbbDAV7l (ORCPT ); Wed, 1 Apr 2015 17:59:41 -0400
Message-ID: <551C6A48.9060805@canonical.com>
Date: Wed, 01 Apr 2015 16:59:36 -0500
From: Chris J Arges
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
MIME-Version: 1.0
To: Linus Torvalds
CC: Ingo Molnar, Rafael David Tinoco, Peter Anvin, Jiang Liu,
        Peter Zijlstra, LKML, Jens Axboe, Frederic Weisbecker, Gema Gomez,
        the arch/x86 maintainers
Subject: Re: smp_call_function_single lockups
References: <20150331031536.GA9303@canonical.com> <20150331222327.GA12512@canonical.com>
        <20150401124336.GB12841@gmail.com> <20150401161047.GD12730@canonical.com>
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/01/2015 11:14 AM, Linus Torvalds wrote:
> On Wed, Apr 1, 2015 at 9:10 AM, Chris J Arges
> wrote:
>>
>> Even with irqbalance removed from the L0/L1 machines the hang still occurs.
>>
>> This results in no 'apic: vector* or 'ack_APIC*' messages being displayed.
>
> Ok. So the ack_APIC debug patch found *something*, but it seems to be
> unrelated to the hang.
>
> Dang. Oh well. Back to square one.
>
> Linus
>

With my L0 testing I've normally used a 3.13 series kernel, since it tends to
reproduce the hang very quickly with the testcase. Note that we have reproduced
an identical hang with newer kernels (3.19+patches) using the OpenStack Tempest
on OpenStack reproducer, but the timing can vary between hours and days.

Installing a v4.0-rc6+patch kernel on L0 makes the problem very slow to
reproduce, so I am running these tests now, which may take day(s).

Is it worthwhile to do a 'bisect' to see where, on average, it takes longer to
reproduce? Perhaps it will point to a relevant change, or it may be completely
useless.

--chris