linux-kernel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Jan Kara <jack@suse.cz>, LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Jens Axboe <jaxboe@fusionio.com>,
	mgalbraith@suse.com, Thomas Gleixner <tglx@linutronix.de>
Subject: Re: Deadlocks due to per-process plugging
Date: Wed, 11 Jul 2012 22:16:01 +0200
Message-ID: <20120711201601.GB9779@quack.suse.cz>
In-Reply-To: <x49ehoii8ps.fsf@segfault.boston.devel.redhat.com>

On Wed 11-07-12 12:05:51, Jeff Moyer wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> >   Hello,
> >
> >   we've recently hit a deadlock in our QA runs which is caused by the
> > per-process plugging code. The problem is as follows:
> >   process A					process B (kjournald)
> >   generic_file_aio_write()
> >     blk_start_plug(&plug);
> >     ...
> >     somewhere in here we allocate memory and
> >     direct reclaim submits buffer X for IO
> >     ...
> >     ext3_write_begin()
> >       ext3_journal_start()
> >         we need more space in a journal
> >         so we want to checkpoint old transactions,
> >         we block waiting for kjournald to commit
> >         a currently running transaction.
> > 						journal_commit_transaction()
> > 						  wait for IO on buffer X
> > 						  to complete as it is part
> > 						  of the current transaction
> >
> >   => deadlock since A waits for B and B waits for A to do unplug.
> > BTW: I don't think this is really ext3/ext4 specific. I think other
> > filesystems can get into problems as well when direct reclaim submits some
> > IO and the process subsequently blocks without submitting the IO.
> 
> So, I thought schedule would do the flush.  Checking the code:
> 
> asmlinkage void __sched schedule(void)
> {
>         struct task_struct *tsk = current;
> 
>         sched_submit_work(tsk);
>         __schedule();
> }
> 
> And sched_submit_work looks like this:
> 
> static inline void sched_submit_work(struct task_struct *tsk)
> {
>         if (!tsk->state || tsk_is_pi_blocked(tsk))
>                 return;
>         /*
>          * If we are going to sleep and we have plugged IO queued,
>          * make sure to submit it to avoid deadlocks.
>          */
>         if (blk_needs_flush_plug(tsk))
>                 blk_schedule_flush_plug(tsk);
> }
> 
> This eventually ends in a call to blk_run_queue_async(q) after
> submitting the I/O from the plug list.  Right?  So is the question
> really why doesn't the kblockd workqueue get scheduled?
  Ah, I didn't know this. Thanks for the hint. In the kdump I have, I can
see requests queued in tsk->plug even though the process is sleeping in
TASK_UNINTERRUPTIBLE state. So the only way the unplug could have been
skipped is if tsk_is_pi_blocked() was true. Rummaging through the dump...
indeed, the task has pi_blocked_on = 0xffff8802717d79c8. The dump is from an
-rt kernel (I just didn't originally think that made any difference), so
every mutex is actually an rtmutex and thus tsk_is_pi_blocked() is true
whenever we are sleeping on a mutex. So this seems like a bug in the rtmutex
code. Thomas, you seem to have added that condition... Any idea how to avoid
the deadlock?
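
To make the failure mode concrete, here is a minimal userspace sketch of
the check quoted above. It is not kernel code: the model_* names and the
struct layout are made up for illustration, and the "flush" merely moves a
counter. It only shows that a task which is about to sleep with plugged IO
gets flushed in the normal case, but keeps its plug list when
pi_blocked_on is set, which is the situation seen in the dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; values mirror the kernel's task states. */
#define TASK_RUNNING          0
#define TASK_UNINTERRUPTIBLE  2

/* Hypothetical stand-in for the few task_struct fields that matter here. */
struct model_task {
    long  state;          /* 0 == runnable, nonzero == going to sleep */
    void *pi_blocked_on;  /* non-NULL while blocked on an rtmutex */
    int   plugged;        /* requests still on the per-task plug list */
    int   submitted;      /* requests handed on to the device queue */
};

static bool model_is_pi_blocked(const struct model_task *tsk)
{
    return tsk->pi_blocked_on != NULL;
}

/*
 * Same shape as the quoted sched_submit_work(): bail out early if the
 * task is runnable or PI-blocked, otherwise flush any plugged IO.
 * Returns true if the plug list was flushed.
 */
static bool model_sched_submit_work(struct model_task *tsk)
{
    assert(tsk != NULL);

    if (!tsk->state || model_is_pi_blocked(tsk))
        return false;

    if (tsk->plugged > 0) {
        tsk->submitted += tsk->plugged;
        tsk->plugged = 0;
        return true;
    }
    return false;
}
```

With state = TASK_UNINTERRUPTIBLE, plugged = 1 and pi_blocked_on = NULL,
the model flushes the request. With the same task but pi_blocked_on
pointing at a waiter, it returns early and the request stays plugged, so
anybody waiting for that IO (kjournald in the scenario above) waits
forever.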

									Honza 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

