From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751611AbcGKWfI (ORCPT <rfc822;w@1wt.eu>);
	Mon, 11 Jul 2016 18:35:08 -0400
Received: from mail-pf0-f181.google.com ([209.85.192.181]:35044 "EHLO
	mail-pf0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751414AbcGKWfE (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Mon, 11 Jul 2016 18:35:04 -0400
Date: Mon, 11 Jul 2016 15:35:01 -0700
From: Viresh Kumar <viresh.kumar@linaro.org>
To: Jan Kara <jack@suse.cz>, Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tejun Heo <tj@kernel.org>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        vlevenetz@mm-sol.com, vaibhav.hiremath@linaro.org,
        alex.elder@linaro.org, johan@kernel.org, akpm@linux-foundation.org,
        rostedt@goodmis.org,
        Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Subject: Re: [Query] Preemption (hogging) of the work handler
Message-ID: <20160711223501.GI4695@ubuntu>
References: <20160701165959.GR12473@ubuntu>
 <20160701172232.GD28719@htj.duckdns.org>
 <20160706182842.GS2671@ubuntu>
 <20160711102603.GI12410@quack2.suse.cz>
 <20160711154438.GA528@swordfish>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160711154438.GA528@swordfish>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Sergey and Jan,

On 12-07-16, 00:44, Sergey Senozhatsky wrote:
> right. apart from cases when the existing console_unlock() behaviour can
> simply "block" a process to flush the log_buf to slow serial consoles
> (regardless the  process execution context) and make the system less
> responsive, I have around ~10 absolutely different scenarios on my list that
> may cause soft/hard lockups, rcu stalls, oom-s, etc. and console_unlock() is
> the root cause there. the simplest ones involve heavy printk() usage, the
> trickier ones do not necessarily have anything that is abusing printk(): a
> moderate printk() pressure coming from other CPUs on the system and more or
> less active tty -> UART can do the trick, because uart interrupt service
> routine and call_console_drivers()->write() have to compete for the same
> uart port spin_lock. soft lockups are probably the most common problems,
> though, it's not all that easy to catch, because watchdog does not ring
> the bell straight after preempt_enable(), but from hrtimer interrupt, that
> happens approx every 4 seconds. by this time CPU can be somewhere far away
> from console_unlock(). I had an idea of doing watchdog soft lockup check
> from preempt_enable(), when it brings preempt_count down to zero, but not
> sure I can recall how well did it go.

Thanks for your feedback, guys. I have one more blocking issue
where I need your help/advice.

So, the excess printing in our case is done in parallel to system
suspend. And that can very much happen after all the non-boot CPUs are
offlined.

Sometimes, the platform doesn't come back after suspend. I have tried
enabling no_console_suspend and the last line it prints is:

        Disabling non-boot CPUs

And nothing after that at all. We have to forcefully reboot the phone
after that. Moving the prints to the synchronous way (using
echo 1 > /sys/module/printk/parameters/synchronous) fixes that issue.

So, the asynchronous printing has an issue that only we are hitting.
It looks like all the CPUs are gone except CPU0, and that CPU is
hogged by the printk thread to print stuff as well as to suspend the
system, and something eventually goes wrong.

I am only using the 3 patches from the V12 version of the series.

-- 
viresh