Date: Mon, 11 Jul 2016 15:35:01 -0700
From: Viresh Kumar
To: Jan Kara, Sergey Senozhatsky
Cc: Tejun Heo, Greg Kroah-Hartman, Linux Kernel Mailing List,
    vlevenetz@mm-sol.com, vaibhav.hiremath@linaro.org, alex.elder@linaro.org,
    johan@kernel.org, akpm@linux-foundation.org, rostedt@goodmis.org,
    Sergey Senozhatsky
Subject: Re: [Query] Preemption (hogging) of the work handler
Message-ID: <20160711223501.GI4695@ubuntu>
References: <20160701165959.GR12473@ubuntu> <20160701172232.GD28719@htj.duckdns.org>
    <20160706182842.GS2671@ubuntu> <20160711102603.GI12410@quack2.suse.cz>
    <20160711154438.GA528@swordfish>
In-Reply-To: <20160711154438.GA528@swordfish>

Hi Sergey and Jan,

On 12-07-16, 00:44, Sergey Senozhatsky wrote:
> right. apart from cases when the existing console_unlock() behaviour can
> simply "block" a process to flush the log_buf to slow serial consoles
> (regardless the process execution context) and make the system less
> responsive, I have around ~10 absolutely different scenarios on my list that
> may cause soft/hard lockups, rcu stalls, oom-s, etc. and console_unlock() is
> the root cause there.
> the simplest ones involve heavy printk() usage, the
> trickier ones do not necessarily have anything that is abusing printk(): a
> moderate printk() pressure coming from other CPUs on the system and more or
> less active tty -> UART can do the trick, because uart interrupt service
> routine and call_console_drivers()->write() have to compete for the same
> uart port spin_lock. soft lockups are probably the most common problems,
> though, it's not all that easy to catch, because watchdog does not ring
> the bell straight after preempt_enable(), but from hrtimer interrupt, that
> happens approx every 4 seconds. by this time CPU can be somewhere far away
> from console_unlock(). I had an idea of doing watchdog soft lockup check
> from preempt_enable(), when it brings preempt_count down to zero, but not
> sure I can recall how well did it go.

Thanks for your feedback guys, and I have one more blocking issue where I need
your help/advice.

So, the excess printing in our case is done in parallel to system suspend. And
that can very much happen after all the non-boot CPUs are offlined.

Sometimes, the platform doesn't come back after suspend. I have tried enabling
no_console_suspend and the last line it prints is:

  Disabling non-boot CPUs

And nothing after that at all. We have to forcefully reboot the phone after
that.

Moving the prints to the synchronous way (using
echo 1 > /sys/module/printk/parameters/synchronous) fixes that issue. So,
asynchronous printing has an issue that only we are hitting.

It looks like all the CPUs are gone except CPU0, and that CPU is hogged by the
printk thread printing stuff while it also has to suspend the system, and
something eventually goes wrong.

I am only using the 3 patches from the V12 version of the series.

--
viresh