From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1759403AbcGKPpI (ORCPT <rfc822;w@1wt.eu>);
	Mon, 11 Jul 2016 11:45:08 -0400
Received: from mail-pa0-f41.google.com ([209.85.220.41]:36827 "EHLO
	mail-pa0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755292AbcGKPpF (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Mon, 11 Jul 2016 11:45:05 -0400
Date: Tue, 12 Jul 2016 00:44:38 +0900
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
To: Jan Kara <jack@suse.cz>, Viresh Kumar <viresh.kumar@linaro.org>
Cc: Tejun Heo <tj@kernel.org>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        vlevenetz@mm-sol.com, vaibhav.hiremath@linaro.org,
        alex.elder@linaro.org, johan@kernel.org, akpm@linux-foundation.org,
        rostedt@goodmis.org, Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
        Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Subject: Re: [Query] Preemption (hogging) of the work handler
Message-ID: <20160711154438.GA528@swordfish>
References: <20160701165959.GR12473@ubuntu>
 <20160701172232.GD28719@htj.duckdns.org>
 <20160706182842.GS2671@ubuntu>
 <20160711102603.GI12410@quack2.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160711102603.GI12410@quack2.suse.cz>
User-Agent: Mutt/1.6.2 (2016-07-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

Thanks for Cc-ing.

I'm attending an internal 2-days training now, so I'm a bit
slow at answering emails, sorry.

On (07/11/16 12:26), Jan Kara wrote:
[..]
> > These print messages continue from 2994.918 to 2996.268 (1.35 seconds)
> > and they hog the work-handler for that long, which results in watchdog
> > reboot in our setup. The 3.10 kernel implementation of the printk
> > looks like this (if I am not wrong):
> > 
> >         local_irq_save();
> >         flush-console-buffer(); //console_unlock()
> >         local_irq_restore();
> > 
> > So, the current CPU will try to print all the messages from the
> > buffer, before enabling the interrupts again on the local CPU and so I
> > don't see the hrtimer fire at all for almost a second.
> > 

right. apart from cases when the existing console_unlock() behaviour can
simply "block" a process to flush the log_buf to slow serial consoles
(regardless the  process execution context) and make the system less
responsive, I have around ~10 absolutely different scenarios on my list that
may cause soft/hard lockups, rcu stalls, oom-s, etc. and console_unlock() is
the root cause there. the simplest ones involve heavy printk() usage, the
trickier ones do not necessarily have anything that is abusing printk(): a
moderate printk() pressure coming from other CPUs on the system and more or
less active tty -> UART can do the trick, because uart interrupt service
routine and call_console_drivers()->write() have to compete for the same
uart port spin_lock. soft lockups are probably the most common problems,
though, it's not all that easy to catch, because watchdog does not ring
the bell straight after preempt_enable(), but from hrtimer interrupt, that
happens approx every 4 seconds. by this time CPU can be somewhere far away
from console_unlock(). I had an idea of doing watchdog soft lockup check
from preempt_enable(), when it brings preempt_count down to zero, but not
sure I can recall how well did it go.

> > I tried looking at if something related to this changed between 3.10
> > and mainline, and found few patches at least. One of the important
> > ones is:
> > 
> > commit 5874af2003b1 ("printk: enable interrupts before calling
> > console_trylock_for_printk()")
> > 
> > I wasn't able to backport it cleanly to 3.10 yet to see it makes thing
> > work better though. But it looks like it was targeting similar
> > problems.
> Yes. We have similar problems as you observe on machines when they do a lot
> of printing (usually due to device discovery or similar reasons). The
> problem is not fully solved even upstream as Andrew is reluctant to merge
> the patches. Sergey (added to CC) has the latest version of the series [1].
> If you are interested, I can send you the patches for 3.12 kernel which we
> carry in SLES kernels and which fixes the issue for us. It is significanly
> different from current upstream version but it works good enough for us.

yes, an alternative link /* lkml.org is pretty unreliable sometimes*/
is: http://marc.info/?l=linux-kernel&m=146314209118602

I don't have a backport to 3.10, sorry. I had it some time ago (not the
current version, tho), but I think I lost it by now, don't have to deal
with 3.10 anymore.

I'll re-spin the series in a day or two, I think. A rebased version
(against next-20160711), basically, has only that KERN_CONT patch as
part of 0001 now: http://marc.info/?l=linux-kernel&m=146717692431893

hopefully it will re-fresh the discussion and I'll be able to polish
the series so Andrew will be less sceptical about the whole thing.

	-ss