From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755875AbbLWDgU (ORCPT ); Tue, 22 Dec 2015 22:36:20 -0500 Received: from mail-pa0-f52.google.com ([209.85.220.52]:33306 "EHLO mail-pa0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753147AbbLWDgT (ORCPT ); Tue, 22 Dec 2015 22:36:19 -0500 Date: Wed, 23 Dec 2015 12:37:24 +0900 From: Sergey Senozhatsky To: Sergey Senozhatsky Cc: Jan Kara , Sergey Senozhatsky , Andrew Morton , Petr Mladek , KY Sri nivasan , Steven Rostedt , linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/7] printk: Hand over printing to console if printing too long Message-ID: <20151223033724.GB2008@swordfish> References: <20151210145251.GA540@swordfish> <20151222134730.GD7266@quack.suse.cz> <20151223015420.GA2008@swordfish> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151223015420.GA2008@swordfish> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On (12/23/15 10:54), Sergey Senozhatsky wrote: > On (12/22/15 14:47), Jan Kara wrote: > [..] > > @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level, > > logbuf_cpu = UINT_MAX; > > raw_spin_unlock(&logbuf_lock); > > lockdep_on(); > > + /* > > + * By default we print message to console asynchronously so that kernel > > + * doesn't get stalled due to slow serial console. That can lead to > > + * softlockups, lost interrupts, or userspace timing out under heavy > > + * printing load. > > + * > > + * However we resort to synchronous printing of messages during early > > + * boot, when oops is in progress, or when synchronous printing was > > + * explicitely requested by kernel parameter. > > + */ > > + if (keventd_up() && !oops_in_progress && !sync_print) { > > + __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT); > > + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); > > + } else > > + sync_print = true; oops, didn't have enough coffee... missed that `else sync_print = true' :( -ss > > local_irq_restore(flags); > > can we replace this oops_in_progress check with something more reliable? > > CPU0 CPU1 - CPUN > panic() > local_irq_disable() executing foo() with irqs disabled, > console_verbose() or processing an extremely long irq handler. > bust_spinlocks() > oops_in_progress++ > > smp_send_stop() > > bust_spinlocks() > oops_in_progress-- ok, IPI arrives > dump_stack()/printk()/etc from IPI_CPU_STOP > "while (1) cpu_relax()" with irq/fiq disabled/halt/etc. > > smp_send_stop() wrapped in `oops_in_progress++/oops_in_progress--' is arch specific, > and some platforms don't do any IPI-delivered (e.g. via num_online_cpus()) checks at > all. Some do. For example, arm/arm64: > > void smp_send_stop(void) > ... > /* Wait up to one second for other CPUs to stop */ > timeout = USEC_PER_SEC; > while (num_online_cpus() > 1 && timeout--) > udelay(1); > > if (num_online_cpus() > 1) > pr_warn("SMP: failed to stop secondary CPUs\n"); > ... > > > so there are non-zero chances that IPI will arrive to CPU after 'oops_in_progress--', > and thus dump_stack()/etc. happening on that/those cpu/cpus will be lost. > > > bust_spinlocks(0) does > ... > if (--oops_in_progress == 0) > wake_up_klogd(); > ... > > but local cpu has irqs disabled and `panic_timeout' can be zero. > > How about setting 'sync_print' to 'true' in... > bust_spinlocks() /* only set to true */ > or > console_verbose() /* um... may be... */ > or > having a separate one-liner for that > > void console_panic_mode(void) > { > sync_print = true; > } > > and call it early in panic(), before we send out IPI_STOP. > > -ss >