linux-kernel.vger.kernel.org archive mirror
From: Mike Galbraith <efault@gmx.de>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Jan Kara <jack@suse.cz>, Jeff Moyer <jmoyer@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Jens Axboe <jaxboe@fusionio.com>,
	mgalbraith@suse.com
Subject: Re: Deadlocks due to per-process plugging
Date: Sat, 14 Jul 2012 13:00:21 +0200
Message-ID: <1342263621.7368.36.camel@marge.simpson.net>
In-Reply-To: <alpine.LFD.2.02.1207131444490.32033@ionos>

I have your patch burning on my 64-core -rt box.  If it survives the
weekend, you should be able to replace my jbd hack with your fix..

Tested-by: Mike Galbraith <mgalbraith@suse.de>

..so here, one each chop in advance.  It wouldn't dare work ;-)

On Fri, 2012-07-13 at 16:25 +0200, Thomas Gleixner wrote: 
> On Fri, 13 Jul 2012, Jan Kara wrote:
> > On Thu 12-07-12 16:15:29, Thomas Gleixner wrote:
> > > >   Ah, I didn't know this. Thanks for the hint. So in the kdump I have, I can
> > > > see requests queued in tsk->plug even though the process is sleeping in
> > > > TASK_UNINTERRUPTIBLE state. So the only way the unplug could have been
> > > > omitted is if tsk_is_pi_blocked() was true. Rummaging through the dump...
> > > > indeed the task has pi_blocked_on = 0xffff8802717d79c8. The dump is from an -rt
> > > > kernel (I didn't originally think that made any difference), so
> > > > actually any mutex is an rtmutex and thus tsk_is_pi_blocked() is true whenever
> > > > we are sleeping on a mutex. So this seems like a bug in the rtmutex code.
> > > 
> > > Well, the reason why this check is there is that the task which is
> > > blocked on a lock can hold another lock which might cause a deadlock
> > > in the flush path.
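A minimal user-space sketch of the pre-patch behaviour Jan describes, assuming (as stated in the thread) that the scheduler skips flushing tsk->plug whenever tsk_is_pi_blocked() is true. Everything here (model_task, model_waiter, model_should_flush_plug) is a hypothetical stand-in, not kernel code; it only illustrates how a task sleeping on an -rt mutex can go to sleep with its plugged requests still queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model, not kernel code. The fields loosely
 * mirror what the thread discusses: whether the task is about to sleep,
 * its pi_blocked_on waiter, and the requests held in its plug.
 */
struct model_waiter {
	bool savestate;		/* unused by the pre-patch rule */
};

struct model_task {
	bool going_to_sleep;			/* task state is not running */
	const struct model_waiter *pi_blocked_on;
	int plugged_requests;			/* requests queued in tsk->plug */
};

/* Pre-patch rule: any rtmutex wait (including a mutex on -rt) counts. */
static bool model_tsk_is_pi_blocked(const struct model_task *t)
{
	return t->pi_blocked_on != NULL;
}

/*
 * Should the plug be flushed before this task sleeps? A pi-blocked task
 * skips the flush, so its plugged requests stay queued while it sleeps.
 */
static bool model_should_flush_plug(const struct model_task *t)
{
	if (!t->going_to_sleep || model_tsk_is_pi_blocked(t))
		return false;
	return t->plugged_requests > 0;
}
```

With pi_blocked_on pointing at a mutex waiter, model_should_flush_plug() returns false even though requests are plugged, which matches the state seen in the kdump; with pi_blocked_on cleared it returns true.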
> >   OK. Let me understand the details. The block layer needs just queue_lock for
> > unplug to succeed. That is a spinlock, but in an RT kernel even a process
> > holding a spinlock can be preempted, if I remember correctly. So that
> > condition is there effectively to avoid unplugging when a task is being scheduled
> > away while holding queue_lock? Did I get it right?
> 
> blk_flush_plug_list() does not take only queue_lock. Other locks can be
> taken in the callbacks, the elevator, ...
> 
> > > > Thomas, you seem to have added that condition... Any idea how to avoid
> > > > the deadlock?
> > > 
> > > Good question. We could do the flush when the blocked task does not
> > > hold a lock itself. Might be worth a try.
> >   Yeah, that should work for avoiding the deadlock as well.
> 
> Though we don't have a held-lock count except when lockdep is enabled,
> which you probably don't want to run on a production system.
> 
> But we only care about stuff being scheduled out while blocked on a
> "sleeping spinlock" - i.e. spinlock, rwlock.
> 
> So the patch below should allow the unplug to take place when blocked
> on mutexes etc.
> 
> Thanks,
> 
> 	tglx
> ----
> Index: linux-stable-rt/include/linux/sched.h
> ===================================================================
> --- linux-stable-rt.orig/include/linux/sched.h
> +++ linux-stable-rt/include/linux/sched.h
> @@ -2145,9 +2145,10 @@ extern unsigned int sysctl_sched_cfs_ban
>  extern int rt_mutex_getprio(struct task_struct *p);
>  extern void rt_mutex_setprio(struct task_struct *p, int prio);
>  extern void rt_mutex_adjust_pi(struct task_struct *p);
> +extern bool pi_blocked_on_rt_lock(struct task_struct *tsk);
>  static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
>  {
> -	return tsk->pi_blocked_on != NULL;
> +	return tsk->pi_blocked_on != NULL && pi_blocked_on_rt_lock(tsk);
>  }
>  #else
>  static inline int rt_mutex_getprio(struct task_struct *p)
> Index: linux-stable-rt/kernel/rtmutex.c
> ===================================================================
> --- linux-stable-rt.orig/kernel/rtmutex.c
> +++ linux-stable-rt/kernel/rtmutex.c
> @@ -699,6 +699,11 @@ static int adaptive_wait(struct rt_mutex
>  # define pi_lock(lock)			raw_spin_lock_irq(lock)
>  # define pi_unlock(lock)		raw_spin_unlock_irq(lock)
>  
> +bool pi_blocked_on_rt_lock(struct task_struct *tsk)
> +{
> +	return tsk->pi_blocked_on && tsk->pi_blocked_on->savestate;
> +}
> +
>  /*
>   * Slow path lock function spin_lock style: this variant is very
>   * careful not to miss any non-lock wakeups.
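The patched predicate can be exercised in isolation. This is a user-space sketch, assuming, as the patch implies, that the -rt waiter's savestate flag is set only for "sleeping spinlock" waits (spinlock, rwlock). model_task and model_waiter are hypothetical stand-ins for task_struct and the rtmutex waiter, not kernel types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the -rt waiter; only savestate matters here. */
struct model_waiter {
	bool savestate;	/* assumed true only for "sleeping spinlock" waits */
};

/* Hypothetical stand-in for task_struct; only pi_blocked_on matters. */
struct model_task {
	const struct model_waiter *pi_blocked_on;
};

/* Mirrors pi_blocked_on_rt_lock() from the patch above. */
static bool model_pi_blocked_on_rt_lock(const struct model_task *tsk)
{
	return tsk->pi_blocked_on && tsk->pi_blocked_on->savestate;
}

/* Mirrors the patched tsk_is_pi_blocked(): only spinlock-style waits count. */
static bool model_tsk_is_pi_blocked(const struct model_task *tsk)
{
	return tsk->pi_blocked_on != NULL && model_pi_blocked_on_rt_lock(tsk);
}
```

Only a wait with savestate set still suppresses the flush; a wait on a mutex now lets the unplug happen. The `!= NULL` test in the patched tsk_is_pi_blocked() duplicates the one inside pi_blocked_on_rt_lock(), so it is redundant but harmless.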



Thread overview: 33+ messages
2012-07-11 13:37 Deadlocks due to per-process plugging Jan Kara
2012-07-11 16:05 ` Jeff Moyer
2012-07-11 20:16   ` Jan Kara
2012-07-11 22:12     ` Thomas Gleixner
2012-07-12  4:12       ` Mike Galbraith
2012-07-13 12:38       ` Jan Kara
2012-07-12  2:07     ` Mike Galbraith
2012-07-12 14:15     ` Thomas Gleixner
2012-07-13 12:33       ` Jan Kara
2012-07-13 14:25         ` Thomas Gleixner
2012-07-13 14:46           ` Jan Kara
2012-07-15  8:59             ` Thomas Gleixner
2012-07-15  9:14               ` Mike Galbraith
2012-07-15  9:51                 ` Thomas Gleixner
2012-07-16  2:22                 ` Mike Galbraith
2012-07-16  8:59                   ` Thomas Gleixner
2012-07-16  9:48                     ` Mike Galbraith
2012-07-16  9:59                       ` Thomas Gleixner
2012-07-16 10:13                         ` Mike Galbraith
2012-07-16 10:08                       ` Mike Galbraith
2012-07-16 10:19                         ` Thomas Gleixner
2012-07-16 10:30                           ` Mike Galbraith
2012-07-16 11:24                           ` Mike Galbraith
2012-07-16 14:35                             ` Mike Galbraith
2012-07-17 13:10                           ` Mike Galbraith
2012-07-18  4:44                             ` Mike Galbraith
2012-07-18  5:30                               ` Mike Galbraith
2012-07-21  7:47                                 ` Mike Galbraith
2012-07-22 18:43                                   ` Mike Galbraith
2012-07-23  9:46                                     ` Mike Galbraith
2012-07-14 11:00           ` Mike Galbraith [this message]
2012-07-14 11:06             ` Mike Galbraith
2012-07-15  7:14             ` Mike Galbraith
