From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934475AbcLWPJM (ORCPT ); Fri, 23 Dec 2016 10:09:12 -0500 Received: from mail-pg0-f65.google.com ([74.125.83.65]:35683 "EHLO mail-pg0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753941AbcLWPJK (ORCPT ); Fri, 23 Dec 2016 10:09:10 -0500 Date: Sat, 24 Dec 2016 00:08:27 +0900 From: Sergey Senozhatsky To: Petr Mladek Cc: Sergey Senozhatsky , Andrew Morton , Jan Kara , Tejun Heo , Calvin Owens , Steven Rostedt , Ingo Molnar , Peter Zijlstra , Andy Lutomirski , Linus Torvalds , Peter Hurley , linux-kernel@vger.kernel.org, Sergey Senozhatsky Subject: Re: [PATCHv6 5/7] printk: report lost messages in printk safe/nmi contexts Message-ID: <20161223150827.GA7464@tigerII.localdomain> References: <20161221143605.2272-1-sergey.senozhatsky@gmail.com> <20161221143605.2272-6-sergey.senozhatsky@gmail.com> <20161223105421.GC2541@linux.suse> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161223105421.GC2541@linux.suse> User-Agent: Mutt/1.7.2 (2016-11-26) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, On (12/23/16 11:54), Petr Mladek wrote: [..] > There is a potential race: > > CPU0 CPU1 > > printk_safe_log_store() > len = atomic_read(&s->len); > > __printk_save_flush() > > atomic_cmpxchg(&s->len, len, 0) > > report_message_lost(s); > > if (len >= sizeof(s->buffer) - 1) { > atomic_inc(&s->message_lost); > return 0; > > We check the outdated len, account lost message, but it will not > be reported until some other message appears in the log buffer. > > > + > > out: > > I would make sense to move report_message_lost(s) here, after > the out: label. hm, ok. to flush from another CPU we first need to have printk-safe/nmi messages on that CPU, then return from printk-safe/nmi on that CPU, execute per-CPU irq_wor, and then have concurrent printk-safe/nmi messages on current CPU, in addition happening frequent enough to hit this case. I may be wrong, but that's quite unlikely. I can move report_message_lost() to `out' label, no problem. thanks for the report. at some point I was actually considering turning ->message_lost into 'bool' -- "we lost your messages, we are sorry". the precise number of lost messages doesn't help that much: the messages are gone, go and increment CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT; that's all we can say now. and speaking of lost messages. I think I found a regression in console_unlock(). so I'll send out a fix ahead of this series. and, besides, the logs I had a pleasure to look at today contained numerous "%d printk messages dropped" with very accurate numbers, but those numbers meant pretty much nothing to me - the messages were lost. -ss