From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752235AbbFSMVf (ORCPT ); Fri, 19 Jun 2015 08:21:35 -0400
Received: from www.linutronix.de ([62.245.132.108]:57789 "EHLO
	Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751363AbbFSMVa (ORCPT );
	Fri, 19 Jun 2015 08:21:30 -0400
Date: Fri, 19 Jun 2015 14:21:30 +0200 (CEST)
From: Thomas Gleixner
To: Sergey Senozhatsky
cc: Jiang Liu, Borislav Petkov, linux-kernel@vger.kernel.org,
	Sergey Senozhatsky
Subject: Re: [-next] !irqd_can_balance() WARNINGs at irq_move_masked_irq()
In-Reply-To:
Message-ID:
References: <20150619071123.GA511@swordfish>
User-Agent: Alpine 2.11 (DEB 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Linutronix-Spam-Score: -1.0
X-Linutronix-Spam-Level: -
X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 19 Jun 2015, Thomas Gleixner wrote:
> On Fri, 19 Jun 2015, Sergey Senozhatsky wrote:
> > [ 0.412291] WARNING: CPU: 0 PID: 0 at kernel/irq/migration.c:21 irq_move_masked_irq+0x57/0xc4()
> > [ 0.412371] Can't balance irq 0 [edge]
>
> Yuck.
>
> > Do you guys want to replace WARN_ON() with WARN_ONCE(), perhaps? This, of course,
> > doesn't fix anything; but at least one can boot the system. (not really a patch,
> > just an idea).
>
> Indeed. We really want to clear the move pending bit before the can
> balance check. Patch below. But that does not explain why this happens
> in the first place.
>
> Can you please send me a full dmesg, kernel config and output of
> /proc/interrupts ? (Private mail is fine, or upload it to some place)

Thanks for providing the data. I think I know what happens.

Something in the kernel (not yet clear what) tries to move the hpet
irq 0 by calling irq_set_affinity().
That's a kernel-internal interface which does not check whether the
NO BALANCE flag is set for the irq. So the call succeeds and triggers
the move-on-next-interrupt machinery, which ends up calling
irq_move_masked_irq(); that function trips over the flag and yells.
That's why I changed the WARN to a pr_warn(): we already know the
call stack.

So the core behaviour is inconsistent. We let the caller of
irq_set_affinity() succeed and yell later because we think it's
wrong.

I'm pretty sure that we must drop the check for NO BALANCE in
irq_move_masked_irq() and only check for the per_cpu bit, but at the
same time I really want to know where that call to
irq_set_affinity(irq0) is coming from.

Can you please collect the output of /proc/timer_list with the
previous patch applied, then replace the previous patch with the one
below and gather all the data again?

Thanks,

	tglx
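
For illustration only, the inconsistency described above can be
modelled in a few lines of userspace C. This is a rough sketch, not
the kernel code: struct model_irq, its fields and the model_*()
functions are hypothetical names made up for this example, and the
check_no_balance switch exists only to compare the current check with
the per_cpu-only check suggested in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for per-irq state; not the kernel's real layout. */
struct model_irq {
	unsigned int irq;
	unsigned int cpu;          /* CPU currently targeted */
	unsigned int pending_cpu;  /* CPU requested by the last set-affinity call */
	bool no_balance;           /* models the NO BALANCE flag */
	bool per_cpu;              /* models the per_cpu bit */
	bool move_pending;         /* models the move pending bit */
	unsigned int complaints;   /* how often the move path yelled */
};

/* Models the internal set-affinity path: no NO BALANCE check, always succeeds. */
static int model_set_affinity(struct model_irq *d, unsigned int cpu)
{
	d->pending_cpu = cpu;
	d->move_pending = true;
	return 0;
}

/*
 * Models the move done from the next interrupt. The pending bit is cleared
 * before any check. With check_no_balance == false only the per_cpu bit
 * blocks the move, which is the direction proposed in the mail.
 */
static void model_move_masked_irq(struct model_irq *d, bool check_no_balance)
{
	if (!d->move_pending)
		return;
	d->move_pending = false;

	if (d->per_cpu || (check_no_balance && d->no_balance)) {
		d->complaints++;
		fprintf(stderr, "Can't balance irq %u\n", d->irq);
		return;
	}
	d->cpu = d->pending_cpu;
}

/* Walks through both variants for an irq 0 marked NO BALANCE. */
static void model_demo(void)
{
	struct model_irq d = { .irq = 0, .no_balance = true };

	assert(model_set_affinity(&d, 1) == 0);  /* caller sees success */
	model_move_masked_irq(&d, true);         /* ... and the move path yells */
	assert(d.complaints == 1 && d.cpu == 0 && !d.move_pending);

	assert(model_set_affinity(&d, 1) == 0);
	model_move_masked_irq(&d, false);        /* per_cpu-only check: move happens */
	assert(d.complaints == 1 && d.cpu == 1);
}
```

Calling model_demo() from a test program walks through both cases:
with the NO BALANCE check in the move path the caller is told
"success" and the complaint only shows up on the next interrupt, while
with the per_cpu-only check the two paths agree.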