From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755034AbcBDMRz (ORCPT ); Thu, 4 Feb 2016 07:17:55 -0500
Received: from mail-wm0-f42.google.com ([74.125.82.42]:35340 "EHLO
	mail-wm0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754431AbcBDMRx (ORCPT ); Thu, 4 Feb 2016 07:17:53 -0500
Message-ID: <1454588264.3407.142.camel@gmail.com>
Subject: Re: crash in 3.12.51 (likely in 3.12.52 as well) in timer code
From: Mike Galbraith
To: Nikolay Borisov , "Linux-Kernel@Vger. Kernel. Org"
Cc: Jiri Slaby , Oleg Nesterov , tglx@linutronix.de, SiteGround Operations
Date: Thu, 04 Feb 2016 13:17:44 +0100
In-Reply-To: <56B33B34.6090508@kyup.com>
References: <56B1DD62.9030900@kyup.com> <1454585550.3407.126.camel@gmail.com> <56B33B34.6090508@kyup.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.16.5
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2016-02-04 at 13:51 +0200, Nikolay Borisov wrote:
>
> On 02/04/2016 01:32 PM, Mike Galbraith wrote:
> > On Wed, 2016-02-03 at 12:58 +0200, Nikolay Borisov wrote:
> > >
> > > So in this case the prev/next entries do not look like corrupted,
> > > whereas when manipulating the list inside detach_timer they do. This
> > > is really odd, any ideas how to further debug this?
> >
> > Suspiciously similar to https://lkml.org/lkml/2016/2/4/247
>
> Right, I've been cursory following this thread but I was left with the
> impression this only occurs on machines where the CPU can go offline,
> currently the server on which this happened should never offline any of
> its CPUs since the power management is disabled (though I will have to
> double check this).

AFAIU, hotplug isn't required, only mod_delayed_work() being called
from a different CPU than where the timer was born, migrating it at a
bad time.

> On a different note - is there a way to safely reproduce this so I can
> test the suggested fix by Thomas?

Hm, write a module to beat mod_delayed_work() to pulp with a NR_CPUS
horde, and run it in a vm where you don't care about shrapnel?

	-Mike