All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Jan Kara <jack@suse.cz>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Tejun Heo <tj@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	vlevenetz@mm-sol.com, vaibhav.hiremath@linaro.org,
	alex.elder@linaro.org, johan@kernel.org,
	akpm@linux-foundation.org, rostedt@goodmis.org,
	linux-pm@vger.kernel.org, Petr Mladek <pmladek@suse.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [Query] Preemption (hogging) of the work handler
Date: Thu, 14 Jul 2016 16:55:54 +0200	[thread overview]
Message-ID: <20160714145554.GG13151@quack2.suse.cz> (raw)
In-Reply-To: <56498260.j02nbY6ACm@vostro.rjw.lan>

On Thu 14-07-16 16:47:11, Rafael J. Wysocki wrote:
> On Thursday, July 14, 2016 04:39:39 PM Jan Kara wrote:
> > On Thu 14-07-16 16:33:38, Rafael J. Wysocki wrote:
> > > On Thursday, July 14, 2016 04:12:16 PM Jan Kara wrote:
> > > > On Wed 13-07-16 14:45:07, Sergey Senozhatsky wrote:
> > > > > Cc Petr Mladek.
> > > > > 
> > > > > On (07/12/16 16:19), Viresh Kumar wrote:
> > > > > [..]
> > > > > > Okay, we have tracked this BUG and its really interesting.
> > > > > 
> > > > > good find!
> > > > > 
> > > > > > I hacked the platform's serial driver to implement a putchar() routine
> > > > > > that simply writes to the FIFO in polling mode, that helped us in
> > > > > > tracing on where we are going wrong.
> > > > > > 
> > > > > > The problem is that we are running asynchronous printks and we call
> > > > > > wake_up_process() from the last running CPU which has disabled
> > > > > > interrupts. That takes us to: try_to_wake_up().
> > > > > > 
> > > > > > In our case the CPU gets deadlocked on this line in try_to_wake_up().
> > > > > > 
> > > > > >         raw_spin_lock_irqsave(&p->pi_lock, flags);
> > > > > 
> > > > > yeah, printk() can't handle these types of recursion. it can prevent
> > > > > printk() calls issued from within the logbuf_lock spinlock section,
> > > > > with some limitations:
> > > > > 
> > > > > 	if (unlikely(logbuf_cpu == smp_processor_id())) {
> > > > > 		recursion_bug = true;
> > > > > 		return;
> > > > > 	}
> > > > > 
> > > > > 	raw_spin_lock(&logbuf_lock);
> > > > > 	logbuf_cpu = this_cpu;
> > > > > 		...
> > > > > 	logbuf_cpu = UINT_MAX;
> > > > > 	raw_spin_unlock(&logbuf_lock);
> > > > > 
> > > > > so should, for instance, raw_spin_unlock() generate spin_dump(), printk()
> > > > > will blow up (both sync and async), because logbuf_cpu is already reset.
> > > > > it may look that async printk added another source of recursion - wake_up().
> > > > > but, apparently, this is not exactly correct. because there is already a
> > > > > wake_up() call in console_unlock() - up().
> > > > > 
> > > > > 	printk()
> > > > > 	 if (logbuf_cpu == smp_processor_id())
> > > > > 		return;
> > > > > 
> > > > >          raw_spin_lock(&logbuf_lock);
> > > > > 	 logbuf_cpu = this_cpu;
> > > > > 	 ...
> > > > > 	 logbuf_cpu = UINT_MAX;
> > > > >          raw_spin_unlock(&logbuf_lock);
> > > > > 
> > > > > 	 console_trylock()
> > > > > 	   raw_spin_lock_irqsave(&sem->lock)      << ***
> > > > > 	   raw_spin_unlock_irqsave(&sem->lock)    << ***
> > > > > 
> > > > > 	 console_unlock()
> > > > >           up()
> > > > > 	   raw_spin_lock_irqsave(&sem->lock)  << ***
> > > > > 	    __up()
> > > > > 	     wake_up_process()
> > > > > 	      try_to_wake_up()  << *** in may places
> > > > > 
> > > > > 
> > > > > *** a printk() call from here will kill the system. either it will
> > > > > recurse printk(), or spin forever in 'nested' printk() on one of
> > > > > the already taken spin locks.
> > > > 
> > > > Exactly. Calling printk() from certain parts of the kernel (like scheduler
> > > > code or timer code) has been always unsafe because printk itself uses these
> > > > parts and so it can lead to deadlocks. That's why printk_deffered() has
> > > > been introduced as you mention below.
> > > > 
> > > > And with sync printk the above deadlock doesn't trigger only by chance - if
> > > > there happened to be a waiter on console_sem while we suspend, the same
> > > > deadlock would trigger because up(&console_sem) will try to wake him up and
> > > > the warning in timekeeping code will cause recursive printk.
> > > > 
> > > > So I think your patch doesn't really address the real issue - it only
> > > > works around the particular WARN_ON(timekeeping_enabled) warning but if
> > > > there was a different warning in timekeeping code which would trigger, it
> > > > has a potential for causing recursive printk deadlock (and indeed we had
> > > > such issues previously - see e.g. 504d58745c9c "timer: Fix lock inversion
> > > > between hrtimer_bases.lock and scheduler locks").
> > > > 
> > > > So there are IMHO two issues here worth looking at:
> > > > 
> > > > 1) I didn't find how a wakeup would would lead to calling to ktime_get() in
> > > > the current upstream kernel or even current RT kernel. Maybe this is a
> > > > problem specific to the 3.10 kernel you are using? If yes, we don't have to
> > > > do anything for current upstream AFAIU.
> > > > 
> > > > If I just missed how wakeup can call into ktime_get() in current upstream,
> > > > there is another question:
> > > > 
> > > > 2) Is it OK that printk calls wakeup so late during suspend? I believe it
> > > > is but I'm neither scheduler nor suspend expert.
> > > 
> > > I don't think it really is OK.  Nothing will wake up for sure at this point,
> > > so why to do that in the first place?
> > 
> > So that the process is put into a runnable state (currently it is in
> > uninterruptible sleep) and may run after the system resumes?
> 
> Fair enough.
> 
> But calling ktime_get() with suspended timekeeping is dumb at best which
> is why the warning is there.

Agree on that - but that seems to be a problem of a particular wakeup
implementation of the 3.10 kernel Viresh is using, not a problem of the
upstream kernel.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2016-07-14 14:56 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-01 16:59 [Query] Preemption (hogging) of the work handler Viresh Kumar
2016-07-01 17:22 ` Tejun Heo
2016-07-01 17:28   ` Viresh Kumar
2016-07-06 18:28   ` Viresh Kumar
2016-07-06 19:23     ` Steven Rostedt
2016-07-06 19:25       ` Viresh Kumar
2016-07-11 10:26     ` Jan Kara
2016-07-11 15:44       ` Sergey Senozhatsky
2016-07-11 22:35         ` Viresh Kumar
2016-07-11 22:44           ` Rafael J. Wysocki
2016-07-11 22:46             ` Viresh Kumar
2016-07-12 12:24               ` Rafael J. Wysocki
2016-07-12 13:02                 ` Viresh Kumar
2016-07-12 13:56                   ` Petr Mladek
2016-07-12 14:04                     ` Viresh Kumar
2016-07-12  9:38           ` Sergey Senozhatsky
2016-07-12 12:52             ` Petr Mladek
2016-07-12 13:12               ` Viresh Kumar
2016-07-12 17:11                 ` Viresh Kumar
2016-07-12 19:59                   ` Rafael J. Wysocki
2016-07-12 20:08                     ` Viresh Kumar
2016-07-13  7:00                   ` Sergey Senozhatsky
2016-07-13 12:05                     ` Rafael J. Wysocki
2016-07-13 12:57                       ` Sergey Senozhatsky
2016-07-13 13:22                         ` Rafael J. Wysocki
2016-07-12 14:03               ` Sergey Senozhatsky
2016-07-12 14:12                 ` Viresh Kumar
2016-07-14 23:52                 ` Viresh Kumar
2016-07-15 13:11                   ` Sergey Senozhatsky
2016-07-15 15:57                     ` Viresh Kumar
2016-07-12 23:19           ` Viresh Kumar
2016-07-13  0:18             ` Viresh Kumar
2016-07-13  5:45             ` Sergey Senozhatsky
2016-07-13 15:39               ` Viresh Kumar
2016-07-13 23:08                 ` Rafael J. Wysocki
2016-07-13 23:18                   ` Viresh Kumar
2016-07-13 23:38                     ` Greg Kroah-Hartman
2016-07-14  0:55                 ` Sergey Senozhatsky
2016-07-14  1:09                   ` Rafael J. Wysocki
2016-07-14  1:32                     ` Sergey Senozhatsky
2016-07-14 21:57                       ` Viresh Kumar
2016-07-14 21:55                   ` Viresh Kumar
2016-07-14 14:12               ` Jan Kara
2016-07-14 14:33                 ` Rafael J. Wysocki
2016-07-14 14:39                   ` Jan Kara
2016-07-14 14:47                     ` Rafael J. Wysocki
2016-07-14 14:55                       ` Jan Kara [this message]
2016-07-14 22:14                         ` Viresh Kumar
2016-07-14 14:34                 ` Sergey Senozhatsky
2016-07-14 15:03                   ` Jan Kara
2016-07-14 22:12                 ` Viresh Kumar
2016-07-18 11:01                   ` Jan Kara
2016-07-18 11:49                     ` Rafael J. Wysocki
2016-07-29 20:42               ` Viresh Kumar
2016-07-30  2:12                 ` Sergey Senozhatsky
2016-07-11 19:03       ` Viresh Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160714145554.GG13151@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=alex.elder@linaro.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=johan@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=pmladek@suse.com \
    --cc=rjw@rjwysocki.net \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vaibhav.hiremath@linaro.org \
    --cc=viresh.kumar@linaro.org \
    --cc=vlevenetz@mm-sol.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.