Date: Fri, 18 Nov 2016 01:40:43 +0100 (CET)
From: Thomas Gleixner
To: Brian Starkey
cc: Eric Dumazet, LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
    Alexander Potapenko, Steven Rostedt, Sebastian Andrzej Siewior
Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
In-Reply-To: <20161117164200.GA24653@e106950-lin.cambridge.arm.com>
References: <20161116135527.GA5833@e106950-lin.cambridge.arm.com>
    <20161116180156.GA21156@e106950-lin.cambridge.arm.com>
    <20161116210139.GB21156@e106950-lin.cambridge.arm.com>
    <20161117164200.GA24653@e106950-lin.cambridge.arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Brian,

On Thu, 17 Nov 2016, Brian Starkey wrote:

> No joy with this patch :-(
>
> I had to add an ioaddr argument because apparently that macro depends
> on local context (yuck...), but it doesn't help my issue.
>
> FWIW I don't see any timeouts, either with or without the patch.
> (I don't know for sure, but I would guess that the model of the
> network card doesn't model whatever stall that loop is checking for.
> It probably just completes all MMU operations immediately)

Is there a chance that you could enable trace points on the kernel
command line?

  trace_event=sched_wakeup,sched_switch,irq_handler_entry,irq_handler_exit,softirq_raise,softirq_entry,softirq_exit

should be enough for a start.
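
To confirm after boot that the events requested on the command line really
got enabled, something along these lines might help. This is only a sketch:
check_events is a made-up helper name, and it assumes the file it reads
(for example the tracing directory's set_event file) lists one event per
line in the form subsystem:event.

```shell
# Hypothetical helper, not part of any existing tool.
# $1    = file listing events as "subsystem:event", one per line
# $2... = bare event names to look for
check_events() {
    ce_file=$1
    shift
    for ce_ev in "$@"; do
        # Match the event name after the "subsystem:" prefix, at end of line.
        grep -q ":${ce_ev}\$" "$ce_file" || echo "missing: $ce_ev"
    done
}
```

With debugfs mounted, it could be called as
check_events /sys/kernel/debug/tracing/set_event sched_wakeup softirq_raise
and it prints nothing when every listed event is present.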

All we need aside from that is a trigger to stop the trace so we can
actually see the events around the time where things go stale. I assume
that the whole issue is visible throughout the slow progress of init
towards a working system, so for a start it would be sufficient to add
something like this into the startup sequence at some point:

  mount -t debugfs debugfs /sys/kernel/debug
  echo 0 >/sys/kernel/debug/tracing/tracing_on

The only interesting challenge is to get the trace data out of the
system. The trace is accessible via:

  cat /sys/kernel/debug/tracing/trace

So if your ssh works at some point, that might be an option, or you just
try to store it over NFS (which will be slow, but better than nothing).

Maybe you have a better idea :)

Thanks,

	tglx
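
The stop-and-dump steps can be wrapped into one small helper for the
startup sequence. A minimal sketch only: save_trace is a hypothetical
name, and the output path in the usage note is a placeholder, not
something taken from the setup discussed here.

```shell
# Hypothetical helper, not an existing tool.
# $1 = tracing directory (e.g. /sys/kernel/debug/tracing)
# $2 = output file (e.g. somewhere on an NFS mount; assumption)
save_trace() {
    st_dir=$1
    st_out=$2
    # Stop the tracer first so the buffer is not overwritten while copying.
    echo 0 > "$st_dir/tracing_on" || return 1
    # Dump the human-readable trace buffer into the output file.
    cat "$st_dir/trace" > "$st_out"
}
```

After mounting debugfs it would be called as
save_trace /sys/kernel/debug/tracing /mnt/nfs/trace.txt
where /mnt/nfs/trace.txt stands for whatever writable location survives
the boot.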