* Deadlocks due to per-process plugging
@ 2012-07-11 13:37 Jan Kara
  2012-07-11 16:05 ` Jeff Moyer
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2012-07-11 13:37 UTC (permalink / raw)
  To: LKML; +Cc: linux-fsdevel, Tejun Heo, Jens Axboe

  Hello,

  we've recently hit a deadlock in our QA runs which is caused by the
per-process plugging code. The problem is as follows:
  process A					process B (kjournald)
  generic_file_aio_write()
    blk_start_plug(&plug);
    ...
    somewhere in here we allocate memory and
    direct reclaim submits buffer X for IO
    ...
    ext3_write_begin()
      ext3_journal_start()
        we need more space in a journal
        so we want to checkpoint old transactions,
        we block waiting for kjournald to commit
        a currently running transaction.
						journal_commit_transaction()
						  wait for IO on buffer X
						  to complete as it is part
						  of the current transaction

  => deadlock since A waits for B and B waits for A to do unplug.
BTW: I don't think this is really ext3/ext4 specific. I think other
filesystems can get into problems as well when direct reclaim submits some
IO and the process subsequently blocks without submitting the IO.

Effectively the per-process plugging introduces a lock dependency from
buffer_lock to any lock acquired after IO submission but before the process'
queue is unplugged. This certainly creates lots of cycles in the lock
dependency graph...

I'm wondering how best to fix this. A trivial fix would be to flush the
IO plug on every schedule(), not just in io_schedule(), but that can have
some performance implications I guess (the effect of plugging would be very
limited). A better (although more tedious) solution would be to push the
plugs from the higher levels down into the filesystems, where they could be
managed so as not to create problematic lock dependencies (but e.g. for ext3/ext4
that means we would have to unplug after writing each page, so it is effectively
rather similar to unplugging on every schedule()).

Thoughts?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Deadlocks due to per-process plugging
  2012-07-11 13:37 Deadlocks due to per-process plugging Jan Kara
@ 2012-07-11 16:05 ` Jeff Moyer
  2012-07-11 20:16   ` Jan Kara
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff Moyer @ 2012-07-11 16:05 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-fsdevel, Tejun Heo, Jens Axboe

Jan Kara <jack@suse.cz> writes:

>   Hello,
>
>   we've recently hit a deadlock in our QA runs which is caused by the
> per-process plugging code. The problem is as follows:
>   process A					process B (kjournald)
>   generic_file_aio_write()
>     blk_start_plug(&plug);
>     ...
>     somewhere in here we allocate memory and
>     direct reclaim submits buffer X for IO
>     ...
>     ext3_write_begin()
>       ext3_journal_start()
>         we need more space in a journal
>         so we want to checkpoint old transactions,
>         we block waiting for kjournald to commit
>         a currently running transaction.
> 						journal_commit_transaction()
> 						  wait for IO on buffer X
> 						  to complete as it is part
> 						  of the current transaction
>
>   => deadlock since A waits for B and B waits for A to do unplug.
> BTW: I don't think this is really ext3/ext4 specific. I think other
> filesystems can get into problems as well when direct reclaim submits some
> IO and the process subsequently blocks without submitting the IO.

So, I thought schedule would do the flush.  Checking the code:

asmlinkage void __sched schedule(void)
{
        struct task_struct *tsk = current;

        sched_submit_work(tsk);
        __schedule();
}

And sched_submit_work looks like this:

static inline void sched_submit_work(struct task_struct *tsk)
{
        if (!tsk->state || tsk_is_pi_blocked(tsk))
                return;
        /*
         * If we are going to sleep and we have plugged IO queued,
         * make sure to submit it to avoid deadlocks.
         */
        if (blk_needs_flush_plug(tsk))
                blk_schedule_flush_plug(tsk);
}

This eventually ends in a call to blk_run_queue_async(q) after
submitting the I/O from the plug list.  Right?  So is the question
really why doesn't the kblockd workqueue get scheduled?

Cheers,
Jeff


* Re: Deadlocks due to per-process plugging
  2012-07-11 16:05 ` Jeff Moyer
@ 2012-07-11 20:16   ` Jan Kara
  2012-07-11 22:12     ` Thomas Gleixner
                       ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Jan Kara @ 2012-07-11 20:16 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Jan Kara, LKML, linux-fsdevel, Tejun Heo, Jens Axboe, mgalbraith,
	Thomas Gleixner

On Wed 11-07-12 12:05:51, Jeff Moyer wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> >   Hello,
> >
> >   we've recently hit a deadlock in our QA runs which is caused by the
> > per-process plugging code. The problem is as follows:
> >   process A					process B (kjournald)
> >   generic_file_aio_write()
> >     blk_start_plug(&plug);
> >     ...
> >     somewhere in here we allocate memory and
> >     direct reclaim submits buffer X for IO
> >     ...
> >     ext3_write_begin()
> >       ext3_journal_start()
> >         we need more space in a journal
> >         so we want to checkpoint old transactions,
> >         we block waiting for kjournald to commit
> >         a currently running transaction.
> > 						journal_commit_transaction()
> > 						  wait for IO on buffer X
> > 						  to complete as it is part
> > 						  of the current transaction
> >
> >   => deadlock since A waits for B and B waits for A to do unplug.
> > BTW: I don't think this is really ext3/ext4 specific. I think other
> > filesystems can get into problems as well when direct reclaim submits some
> > IO and the process subsequently blocks without submitting the IO.
> 
> So, I thought schedule would do the flush.  Checking the code:
> 
> asmlinkage void __sched schedule(void)
> {
>         struct task_struct *tsk = current;
> 
>         sched_submit_work(tsk);
>         __schedule();
> }
> 
> And sched_submit_work looks like this:
> 
> static inline void sched_submit_work(struct task_struct *tsk)
> {
>         if (!tsk->state || tsk_is_pi_blocked(tsk))
>                 return;
>         /*
>          * If we are going to sleep and we have plugged IO queued,
>          * make sure to submit it to avoid deadlocks.
>          */
>         if (blk_needs_flush_plug(tsk))
>                 blk_schedule_flush_plug(tsk);
> }
> 
> This eventually ends in a call to blk_run_queue_async(q) after
> submitting the I/O from the plug list.  Right?  So is the question
> really why doesn't the kblockd workqueue get scheduled?
  Ah, I didn't know this. Thanks for the hint. So in the kdump I have, I can
see requests queued in tsk->plug even though the process is sleeping in
TASK_UNINTERRUPTIBLE state.  So the only way the unplug could have been
omitted is if tsk_is_pi_blocked() was true. Rummaging through the dump...
indeed the task has pi_blocked_on = 0xffff8802717d79c8. The dump is from an -rt
kernel (I just didn't originally think that makes any difference), so
actually every mutex is an rtmutex and thus tsk_is_pi_blocked() is true whenever
we are sleeping on a mutex. So this seems like a bug in the rtmutex code.
Thomas, you seem to have added that condition... Any idea how to avoid
the deadlock?

									Honza 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Deadlocks due to per-process plugging
  2012-07-11 20:16   ` Jan Kara
@ 2012-07-11 22:12     ` Thomas Gleixner
  2012-07-12  4:12       ` Mike Galbraith
  2012-07-13 12:38       ` Jan Kara
  2012-07-12  2:07     ` Mike Galbraith
  2012-07-12 14:15     ` Thomas Gleixner
  2 siblings, 2 replies; 33+ messages in thread
From: Thomas Gleixner @ 2012-07-11 22:12 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe, mgalbraith

On Wed, 11 Jul 2012, Jan Kara wrote:
> On Wed 11-07-12 12:05:51, Jeff Moyer wrote:
> > This eventually ends in a call to blk_run_queue_async(q) after
> > submitting the I/O from the plug list.  Right?  So is the question
> > really why doesn't the kblockd workqueue get scheduled?
>   Ah, I didn't know this. Thanks for the hint. So in the kdump I have I can
> see requests queued in tsk->plug despite the process is sleeping in
> TASK_UNINTERRUPTIBLE state.  So the only way how unplug could have been
> omitted is if tsk_is_pi_blocked() was true. Rummaging through the dump...
> indeed task has pi_blocked_on = 0xffff8802717d79c8. The dump is from an -rt
> kernel (I just didn't originally thought that makes any difference) so
> actually any mutex is rtmutex and thus tsk_is_pi_blocked() is true whenever
> we are sleeping on a mutex. So this seems like a bug in rtmutex code.
> Thomas, you seemed to have added that condition... Any idea how to avoid
> the deadlock?

Mike has sent out a fix related to the plug stuff, which I just posted
for the rt stable series. Can you verify against that ?

Thanks,

	tglx


* Re: Deadlocks due to per-process plugging
  2012-07-11 20:16   ` Jan Kara
  2012-07-11 22:12     ` Thomas Gleixner
@ 2012-07-12  2:07     ` Mike Galbraith
  2012-07-12 14:15     ` Thomas Gleixner
  2 siblings, 0 replies; 33+ messages in thread
From: Mike Galbraith @ 2012-07-12  2:07 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith, Thomas Gleixner

On Wed, 2012-07-11 at 22:16 +0200, Jan Kara wrote: 
> On Wed 11-07-12 12:05:51, Jeff Moyer wrote:
> > Jan Kara <jack@suse.cz> writes:
> > 
> > >   Hello,
> > >
> > >   we've recently hit a deadlock in our QA runs which is caused by the
> > > per-process plugging code. The problem is as follows:
> > >   process A					process B (kjournald)
> > >   generic_file_aio_write()
> > >     blk_start_plug(&plug);
> > >     ...
> > >     somewhere in here we allocate memory and
> > >     direct reclaim submits buffer X for IO
> > >     ...
> > >     ext3_write_begin()
> > >       ext3_journal_start()
> > >         we need more space in a journal
> > >         so we want to checkpoint old transactions,
> > >         we block waiting for kjournald to commit
> > >         a currently running transaction.
> > > 						journal_commit_transaction()
> > > 						  wait for IO on buffer X
> > > 						  to complete as it is part
> > > 						  of the current transaction
> > >
> > >   => deadlock since A waits for B and B waits for A to do unplug.
> > > BTW: I don't think this is really ext3/ext4 specific. I think other
> > > filesystems can get into problems as well when direct reclaim submits some
> > > IO and the process subsequently blocks without submitting the IO.
> > 
> > So, I thought schedule would do the flush.  Checking the code:
> > 
> > asmlinkage void __sched schedule(void)
> > {
> >         struct task_struct *tsk = current;
> > 
> >         sched_submit_work(tsk);
> >         __schedule();
> > }
> > 
> > And sched_submit_work looks like this:
> > 
> > static inline void sched_submit_work(struct task_struct *tsk)
> > {
> >         if (!tsk->state || tsk_is_pi_blocked(tsk))
> >                 return;
> >         /*
> >          * If we are going to sleep and we have plugged IO queued,
> >          * make sure to submit it to avoid deadlocks.
> >          */
> >         if (blk_needs_flush_plug(tsk))
> >                 blk_schedule_flush_plug(tsk);
> > }
> > 
> > This eventually ends in a call to blk_run_queue_async(q) after
> > submitting the I/O from the plug list.  Right?  So is the question
> > really why doesn't the kblockd workqueue get scheduled?
>   Ah, I didn't know this. Thanks for the hint. So in the kdump I have I can
> see requests queued in tsk->plug despite the process is sleeping in
> TASK_UNINTERRUPTIBLE state.  So the only way how unplug could have been
> omitted is if tsk_is_pi_blocked() was true. Rummaging through the dump...
> indeed task has pi_blocked_on = 0xffff8802717d79c8. The dump is from an -rt
> kernel (I just didn't originally thought that makes any difference) so
> actually any mutex is rtmutex and thus tsk_is_pi_blocked() is true whenever
> we are sleeping on a mutex. So this seems like a bug in rtmutex code.
> Thomas, you seemed to have added that condition... Any idea how to avoid
> the deadlock?

Tsk tsk, I completely overlooked sched_submit_work().

-Mike



* Re: Deadlocks due to per-process plugging
  2012-07-11 22:12     ` Thomas Gleixner
@ 2012-07-12  4:12       ` Mike Galbraith
  2012-07-13 12:38       ` Jan Kara
  1 sibling, 0 replies; 33+ messages in thread
From: Mike Galbraith @ 2012-07-12  4:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Thu, 2012-07-12 at 00:12 +0200, Thomas Gleixner wrote: 
> On Wed, 11 Jul 2012, Jan Kara wrote:
> > On Wed 11-07-12 12:05:51, Jeff Moyer wrote:
> > > This eventually ends in a call to blk_run_queue_async(q) after
> > > submitting the I/O from the plug list.  Right?  So is the question
> > > really why doesn't the kblockd workqueue get scheduled?
> >   Ah, I didn't know this. Thanks for the hint. So in the kdump I have I can
> > see requests queued in tsk->plug despite the process is sleeping in
> > TASK_UNINTERRUPTIBLE state.  So the only way how unplug could have been
> > omitted is if tsk_is_pi_blocked() was true. Rummaging through the dump...
> > indeed task has pi_blocked_on = 0xffff8802717d79c8. The dump is from an -rt
> > kernel (I just didn't originally thought that makes any difference) so
> > actually any mutex is rtmutex and thus tsk_is_pi_blocked() is true whenever
> > we are sleeping on a mutex. So this seems like a bug in rtmutex code.
> > Thomas, you seemed to have added that condition... Any idea how to avoid
> > the deadlock?
> 
> Mike has sent out a fix related to the plug stuff, which I just posted
> for the rt stable series. Can you verify against that ?

btw, I called io_schedule() instead of a plain unplug, thinking we're
going to schedule anyway. But if we unplug and schedule, and we're not
leftmost (non-rt task 'course), then while we're away the likely contended
mutex we're about to take may be released, or at least become less
contended. What we won't be doing is accruing sleep time to help trigger
yet more preemption.  Anyone more deserving can move smartly rightward, and
thus out of our way for a bit.

If we're leftmost or rt, all was for naught, but it seemed worth a shot.

-Mike



* Re: Deadlocks due to per-process plugging
  2012-07-11 20:16   ` Jan Kara
  2012-07-11 22:12     ` Thomas Gleixner
  2012-07-12  2:07     ` Mike Galbraith
@ 2012-07-12 14:15     ` Thomas Gleixner
  2012-07-13 12:33       ` Jan Kara
  2 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2012-07-12 14:15 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe, mgalbraith

On Wed, 11 Jul 2012, Jan Kara wrote:
> On Wed 11-07-12 12:05:51, Jeff Moyer wrote:
> > Jan Kara <jack@suse.cz> writes:
> > 
> > >   Hello,
> > >
> > >   we've recently hit a deadlock in our QA runs which is caused by the
> > > per-process plugging code. The problem is as follows:
> > >   process A					process B (kjournald)
> > >   generic_file_aio_write()
> > >     blk_start_plug(&plug);
> > >     ...
> > >     somewhere in here we allocate memory and
> > >     direct reclaim submits buffer X for IO
> > >     ...
> > >     ext3_write_begin()
> > >       ext3_journal_start()
> > >         we need more space in a journal
> > >         so we want to checkpoint old transactions,
> > >         we block waiting for kjournald to commit
> > >         a currently running transaction.
> > > 						journal_commit_transaction()
> > > 						  wait for IO on buffer X
> > > 						  to complete as it is part
> > > 						  of the current transaction
> > >
> > >   => deadlock since A waits for B and B waits for A to do unplug.
> > > BTW: I don't think this is really ext3/ext4 specific. I think other
> > > filesystems can get into problems as well when direct reclaim submits some
> > > IO and the process subsequently blocks without submitting the IO.
> > 
> > So, I thought schedule would do the flush.  Checking the code:
> > 
> > asmlinkage void __sched schedule(void)
> > {
> >         struct task_struct *tsk = current;
> > 
> >         sched_submit_work(tsk);
> >         __schedule();
> > }
> > 
> > And sched_submit_work looks like this:
> > 
> > static inline void sched_submit_work(struct task_struct *tsk)
> > {
> >         if (!tsk->state || tsk_is_pi_blocked(tsk))
> >                 return;
> >         /*
> >          * If we are going to sleep and we have plugged IO queued,
> >          * make sure to submit it to avoid deadlocks.
> >          */
> >         if (blk_needs_flush_plug(tsk))
> >                 blk_schedule_flush_plug(tsk);
> > }
> > 
> > This eventually ends in a call to blk_run_queue_async(q) after
> > submitting the I/O from the plug list.  Right?  So is the question
> > really why doesn't the kblockd workqueue get scheduled?

>   Ah, I didn't know this. Thanks for the hint. So in the kdump I have I can
> see requests queued in tsk->plug despite the process is sleeping in
> TASK_UNINTERRUPTIBLE state.  So the only way how unplug could have been
> omitted is if tsk_is_pi_blocked() was true. Rummaging through the dump...
> indeed task has pi_blocked_on = 0xffff8802717d79c8. The dump is from an -rt
> kernel (I just didn't originally thought that makes any difference) so
> actually any mutex is rtmutex and thus tsk_is_pi_blocked() is true whenever
> we are sleeping on a mutex. So this seems like a bug in rtmutex code.

Well, the reason why this check is there is that the task which is
blocked on a lock can hold another lock which might cause a deadlock
in the flush path.

> Thomas, you seemed to have added that condition... Any idea how to avoid
> the deadlock?

Good question. We could do the flush when the blocked task does not
hold a lock itself. Might be worth a try.

Thanks,

	tglx



* Re: Deadlocks due to per-process plugging
  2012-07-12 14:15     ` Thomas Gleixner
@ 2012-07-13 12:33       ` Jan Kara
  2012-07-13 14:25         ` Thomas Gleixner
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2012-07-13 12:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Thu 12-07-12 16:15:29, Thomas Gleixner wrote:
> On Wed, 11 Jul 2012, Jan Kara wrote:
> > On Wed 11-07-12 12:05:51, Jeff Moyer wrote:
> > > Jan Kara <jack@suse.cz> writes:
> > > 
> > > >   Hello,
> > > >
> > > >   we've recently hit a deadlock in our QA runs which is caused by the
> > > > per-process plugging code. The problem is as follows:
> > > >   process A					process B (kjournald)
> > > >   generic_file_aio_write()
> > > >     blk_start_plug(&plug);
> > > >     ...
> > > >     somewhere in here we allocate memory and
> > > >     direct reclaim submits buffer X for IO
> > > >     ...
> > > >     ext3_write_begin()
> > > >       ext3_journal_start()
> > > >         we need more space in a journal
> > > >         so we want to checkpoint old transactions,
> > > >         we block waiting for kjournald to commit
> > > >         a currently running transaction.
> > > > 						journal_commit_transaction()
> > > > 						  wait for IO on buffer X
> > > > 						  to complete as it is part
> > > > 						  of the current transaction
> > > >
> > > >   => deadlock since A waits for B and B waits for A to do unplug.
> > > > BTW: I don't think this is really ext3/ext4 specific. I think other
> > > > filesystems can get into problems as well when direct reclaim submits some
> > > > IO and the process subsequently blocks without submitting the IO.
> > > 
> > > So, I thought schedule would do the flush.  Checking the code:
> > > 
> > > asmlinkage void __sched schedule(void)
> > > {
> > >         struct task_struct *tsk = current;
> > > 
> > >         sched_submit_work(tsk);
> > >         __schedule();
> > > }
> > > 
> > > And sched_submit_work looks like this:
> > > 
> > > static inline void sched_submit_work(struct task_struct *tsk)
> > > {
> > >         if (!tsk->state || tsk_is_pi_blocked(tsk))
> > >                 return;
> > >         /*
> > >          * If we are going to sleep and we have plugged IO queued,
> > >          * make sure to submit it to avoid deadlocks.
> > >          */
> > >         if (blk_needs_flush_plug(tsk))
> > >                 blk_schedule_flush_plug(tsk);
> > > }
> > > 
> > > This eventually ends in a call to blk_run_queue_async(q) after
> > > submitting the I/O from the plug list.  Right?  So is the question
> > > really why doesn't the kblockd workqueue get scheduled?
> 
> >   Ah, I didn't know this. Thanks for the hint. So in the kdump I have I can
> > see requests queued in tsk->plug despite the process is sleeping in
> > TASK_UNINTERRUPTIBLE state.  So the only way how unplug could have been
> > omitted is if tsk_is_pi_blocked() was true. Rummaging through the dump...
> > indeed task has pi_blocked_on = 0xffff8802717d79c8. The dump is from an -rt
> > kernel (I just didn't originally thought that makes any difference) so
> > actually any mutex is rtmutex and thus tsk_is_pi_blocked() is true whenever
> > we are sleeping on a mutex. So this seems like a bug in rtmutex code.
> 
> Well, the reason why this check is there is that the task which is
> blocked on a lock can hold another lock which might cause a deadlock
> in the flush path.
  OK. Let me understand the details. The block layer needs just the
queue_lock for the unplug to succeed. That is a spinlock, but in an RT
kernel even a process holding a spinlock can be preempted, if I remember
correctly. So that condition is there effectively to avoid unplugging when a
task is being scheduled away while holding queue_lock? Did I get it right?

> > Thomas, you seemed to have added that condition... Any idea how to avoid
> > the deadlock?
> 
> Good question. We could do the flush when the blocked task does not
> hold a lock itself. Might be worth a try.
  Yeah, that should work for avoiding the deadlock as well.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Deadlocks due to per-process plugging
  2012-07-11 22:12     ` Thomas Gleixner
  2012-07-12  4:12       ` Mike Galbraith
@ 2012-07-13 12:38       ` Jan Kara
  1 sibling, 0 replies; 33+ messages in thread
From: Jan Kara @ 2012-07-13 12:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Thu 12-07-12 00:12:44, Thomas Gleixner wrote:
> On Wed, 11 Jul 2012, Jan Kara wrote:
> > On Wed 11-07-12 12:05:51, Jeff Moyer wrote:
> > > This eventually ends in a call to blk_run_queue_async(q) after
> > > submitting the I/O from the plug list.  Right?  So is the question
> > > really why doesn't the kblockd workqueue get scheduled?
> >   Ah, I didn't know this. Thanks for the hint. So in the kdump I have I can
> > see requests queued in tsk->plug despite the process is sleeping in
> > TASK_UNINTERRUPTIBLE state.  So the only way how unplug could have been
> > omitted is if tsk_is_pi_blocked() was true. Rummaging through the dump...
> > indeed task has pi_blocked_on = 0xffff8802717d79c8. The dump is from an -rt
> > kernel (I just didn't originally thought that makes any difference) so
> > actually any mutex is rtmutex and thus tsk_is_pi_blocked() is true whenever
> > we are sleeping on a mutex. So this seems like a bug in rtmutex code.
> > Thomas, you seemed to have added that condition... Any idea how to avoid
> > the deadlock?
> 
> Mike has sent out a fix related to the plug stuff, which I just posted
> for the rt stable series. Can you verify against that ?
  Yeah, that fix from Mike makes us unable to reproduce the problem. But
frankly it is a hack and I wouldn't bet a penny that there isn't another
similar problem hiding elsewhere in the code. It would just need different
timing / load to trigger. So I think a better solution needs to be found
(advice from the JBD maintainer TM ;).

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Deadlocks due to per-process plugging
  2012-07-13 12:33       ` Jan Kara
@ 2012-07-13 14:25         ` Thomas Gleixner
  2012-07-13 14:46           ` Jan Kara
  2012-07-14 11:00           ` Mike Galbraith
  0 siblings, 2 replies; 33+ messages in thread
From: Thomas Gleixner @ 2012-07-13 14:25 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe, mgalbraith

On Fri, 13 Jul 2012, Jan Kara wrote:
> On Thu 12-07-12 16:15:29, Thomas Gleixner wrote:
> > >   Ah, I didn't know this. Thanks for the hint. So in the kdump I have I can
> > > see requests queued in tsk->plug despite the process is sleeping in
> > > TASK_UNINTERRUPTIBLE state.  So the only way how unplug could have been
> > > omitted is if tsk_is_pi_blocked() was true. Rummaging through the dump...
> > > indeed task has pi_blocked_on = 0xffff8802717d79c8. The dump is from an -rt
> > > kernel (I just didn't originally thought that makes any difference) so
> > > actually any mutex is rtmutex and thus tsk_is_pi_blocked() is true whenever
> > > we are sleeping on a mutex. So this seems like a bug in rtmutex code.
> > 
> > Well, the reason why this check is there is that the task which is
> > blocked on a lock can hold another lock which might cause a deadlock
> > in the flush path.
>   OK. Let me understand the details. Block layer needs just queue_lock for
> unplug to succeed. That is a spinlock but in RT kernel, even a process
> holding a spinlock can be preempted if I remember correctly. So that
> condition is there effectively to not unplug when a task is being scheduled
> away while holding queue_lock? Did I get it right?

blk_flush_plug_list() is not only queue_lock. There can be other locks
taken in the callbacks, elevator ...

> > > Thomas, you seemed to have added that condition... Any idea how to avoid
> > > the deadlock?
> > 
> > Good question. We could do the flush when the blocked task does not
> > hold a lock itself. Might be worth a try.
>   Yeah, that should work for avoiding the deadlock as well.

Though we don't have a lock-held count except when lockdep is enabled,
which you probably don't want to run on a production system.

But we only care about stuff being scheduled out while blocked on a
"sleeping spinlock" - i.e. spinlock, rwlock.

So the patch below should allow the unplug to take place when blocked
on mutexes etc.

Thanks,

	tglx
----
Index: linux-stable-rt/include/linux/sched.h
===================================================================
--- linux-stable-rt.orig/include/linux/sched.h
+++ linux-stable-rt/include/linux/sched.h
@@ -2145,9 +2145,10 @@ extern unsigned int sysctl_sched_cfs_ban
 extern int rt_mutex_getprio(struct task_struct *p);
 extern void rt_mutex_setprio(struct task_struct *p, int prio);
 extern void rt_mutex_adjust_pi(struct task_struct *p);
+extern bool pi_blocked_on_rt_lock(struct task_struct *tsk);
 static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
 {
-	return tsk->pi_blocked_on != NULL;
+	return tsk->pi_blocked_on != NULL && pi_blocked_on_rt_lock(tsk);
 }
 #else
 static inline int rt_mutex_getprio(struct task_struct *p)
Index: linux-stable-rt/kernel/rtmutex.c
===================================================================
--- linux-stable-rt.orig/kernel/rtmutex.c
+++ linux-stable-rt/kernel/rtmutex.c
@@ -699,6 +699,11 @@ static int adaptive_wait(struct rt_mutex
 # define pi_lock(lock)			raw_spin_lock_irq(lock)
 # define pi_unlock(lock)		raw_spin_unlock_irq(lock)
 
+bool pi_blocked_on_rt_lock(struct task_struct *tsk)
+{
+	return tsk->pi_blocked_on && tsk->pi_blocked_on->savestate;
+}
+
 /*
  * Slow path lock function spin_lock style: this variant is very
  * careful not to miss any non-lock wakeups.


* Re: Deadlocks due to per-process plugging
  2012-07-13 14:25         ` Thomas Gleixner
@ 2012-07-13 14:46           ` Jan Kara
  2012-07-15  8:59             ` Thomas Gleixner
  2012-07-14 11:00           ` Mike Galbraith
  1 sibling, 1 reply; 33+ messages in thread
From: Jan Kara @ 2012-07-13 14:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Fri 13-07-12 16:25:05, Thomas Gleixner wrote:
> On Fri, 13 Jul 2012, Jan Kara wrote:
> > On Thu 12-07-12 16:15:29, Thomas Gleixner wrote:
> > > >   Ah, I didn't know this. Thanks for the hint. So in the kdump I have I can
> > > > see requests queued in tsk->plug despite the process is sleeping in
> > > > TASK_UNINTERRUPTIBLE state.  So the only way how unplug could have been
> > > > omitted is if tsk_is_pi_blocked() was true. Rummaging through the dump...
> > > > indeed task has pi_blocked_on = 0xffff8802717d79c8. The dump is from an -rt
> > > > kernel (I just didn't originally thought that makes any difference) so
> > > > actually any mutex is rtmutex and thus tsk_is_pi_blocked() is true whenever
> > > > we are sleeping on a mutex. So this seems like a bug in rtmutex code.
> > > 
> > > Well, the reason why this check is there is that the task which is
> > > blocked on a lock can hold another lock which might cause a deadlock
> > > in the flush path.
> >   OK. Let me understand the details. Block layer needs just queue_lock for
> > unplug to succeed. That is a spinlock but in RT kernel, even a process
> > holding a spinlock can be preempted if I remember correctly. So that
> > condition is there effectively to not unplug when a task is being scheduled
> > away while holding queue_lock? Did I get it right?
> 
> blk_flush_plug_list() is not only queue_lock. There can be other locks
> taken in the callbacks, elevator ...
  Yeah, right.

> > > > Thomas, you seemed to have added that condition... Any idea how to avoid
> > > > the deadlock?
> > > 
> > > Good question. We could do the flush when the blocked task does not
> > > hold a lock itself. Might be worth a try.
> >   Yeah, that should work for avoiding the deadlock as well.
> 
> Though we don't have a lock held count except when lockdep is enabled,
> which you probably don't want to do when running a production system.
  Agreed :).

> But we only care about stuff being scheduled out while blocked on a
> "sleeping spinlock" - i.e. spinlock, rwlock.
> 
> So the patch below should allow the unplug to take place when blocked
> on mutexes etc.
  Thanks for the patch! Mike will give it some testing.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Deadlocks due to per-process plugging
  2012-07-13 14:25         ` Thomas Gleixner
  2012-07-13 14:46           ` Jan Kara
@ 2012-07-14 11:00           ` Mike Galbraith
  2012-07-14 11:06             ` Mike Galbraith
  2012-07-15  7:14             ` Mike Galbraith
  1 sibling, 2 replies; 33+ messages in thread
From: Mike Galbraith @ 2012-07-14 11:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

I have your patch burning on my 64 core rt box.  If it survives the
weekend, you should be able to replace my jbd hack with your fix..

Tested-by: Mike Galbraith <mgalbraith@suse.de>

..so here, one each chop in advance.  It wouldn't dare work ;-)

On Fri, 2012-07-13 at 16:25 +0200, Thomas Gleixner wrote: 
> On Fri, 13 Jul 2012, Jan Kara wrote:
> > On Thu 12-07-12 16:15:29, Thomas Gleixner wrote:
> > > >   Ah, I didn't know this. Thanks for the hint. So in the kdump I have I can
> > > > see requests queued in tsk->plug despite the process is sleeping in
> > > > TASK_UNINTERRUPTIBLE state.  So the only way how unplug could have been
> > > > omitted is if tsk_is_pi_blocked() was true. Rummaging through the dump...
> > > > indeed task has pi_blocked_on = 0xffff8802717d79c8. The dump is from an -rt
> > > > kernel (I just didn't originally thought that makes any difference) so
> > > > actually any mutex is rtmutex and thus tsk_is_pi_blocked() is true whenever
> > > > we are sleeping on a mutex. So this seems like a bug in rtmutex code.
> > > 
> > > Well, the reason why this check is there is that the task which is
> > > blocked on a lock can hold another lock which might cause a deadlock
> > > in the flush path.
> >   OK. Let me understand the details. Block layer needs just queue_lock for
> > unplug to succeed. That is a spinlock but in RT kernel, even a process
> > holding a spinlock can be preempted if I remember correctly. So that
> > condition is there effectively to not unplug when a task is being scheduled
> > away while holding queue_lock? Did I get it right?
> 
> blk_flush_plug_list() is not only queue_lock. There can be other locks
> taken in the callbacks, elevator ...
> 
> > > > Thomas, you seemed to have added that condition... Any idea how to avoid
> > > > the deadlock?
> > > 
> > > Good question. We could do the flush when the blocked task does not
> > > hold a lock itself. Might be worth a try.
> >   Yeah, that should work for avoiding the deadlock as well.
> 
> Though we don't have a lock held count except when lockdep is enabled,
> which you probably don't want to do when running a production system.
> 
> But we only care about stuff being scheduled out while blocked on a
> "sleeping spinlock" - i.e. spinlock, rwlock.
> 
> So the patch below should allow the unplug to take place when blocked
> on mutexes etc.
> 
> Thanks,
> 
> 	tglx
> ----
> Index: linux-stable-rt/include/linux/sched.h
> ===================================================================
> --- linux-stable-rt.orig/include/linux/sched.h
> +++ linux-stable-rt/include/linux/sched.h
> @@ -2145,9 +2145,10 @@ extern unsigned int sysctl_sched_cfs_ban
>  extern int rt_mutex_getprio(struct task_struct *p);
>  extern void rt_mutex_setprio(struct task_struct *p, int prio);
>  extern void rt_mutex_adjust_pi(struct task_struct *p);
> +extern bool pi_blocked_on_rt_lock(struct task_struct *tsk);
>  static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
>  {
> -	return tsk->pi_blocked_on != NULL;
> +	return tsk->pi_blocked_on != NULL && pi_blocked_on_rt_lock(tsk);
>  }
>  #else
>  static inline int rt_mutex_getprio(struct task_struct *p)
> Index: linux-stable-rt/kernel/rtmutex.c
> ===================================================================
> --- linux-stable-rt.orig/kernel/rtmutex.c
> +++ linux-stable-rt/kernel/rtmutex.c
> @@ -699,6 +699,11 @@ static int adaptive_wait(struct rt_mutex
>  # define pi_lock(lock)			raw_spin_lock_irq(lock)
>  # define pi_unlock(lock)		raw_spin_unlock_irq(lock)
>  
> +bool pi_blocked_on_rt_lock(struct task_struct *tsk)
> +{
> +	return tsk->pi_blocked_on && tsk->pi_blocked_on->savestate;
> +}
> +
>  /*
>   * Slow path lock function spin_lock style: this variant is very
>   * careful not to miss any non-lock wakeups.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/




* Re: Deadlocks due to per-process plugging
  2012-07-14 11:00           ` Mike Galbraith
@ 2012-07-14 11:06             ` Mike Galbraith
  2012-07-15  7:14             ` Mike Galbraith
  1 sibling, 0 replies; 33+ messages in thread
From: Mike Galbraith @ 2012-07-14 11:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Sat, 2012-07-14 at 13:00 +0200, Mike Galbraith wrote: 
> I have your patch burning on my 64 core rt box.  If it survives the
> weekend, you should be able to replace my jbd hack with your fix..
> 
> Tested-by: Mike Galbraith <mgalbraith@suse.de>
> 
> ..so here, one each chop in advance.  It wouldn't dare work ;-)
                                                        ^not



* Re: Deadlocks due to per-process plugging
  2012-07-14 11:00           ` Mike Galbraith
  2012-07-14 11:06             ` Mike Galbraith
@ 2012-07-15  7:14             ` Mike Galbraith
  1 sibling, 0 replies; 33+ messages in thread
From: Mike Galbraith @ 2012-07-15  7:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Sat, 2012-07-14 at 13:00 +0200, Mike Galbraith wrote: 
> I have your patch burning on my 64 core rt box.  If it survives the
> weekend, you should be able to replace my jbd hack with your fix.

As expected, box is still going strong.  It would have died by now if
the problem were still lurking.

-Mike



* Re: Deadlocks due to per-process plugging
  2012-07-13 14:46           ` Jan Kara
@ 2012-07-15  8:59             ` Thomas Gleixner
  2012-07-15  9:14               ` Mike Galbraith
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2012-07-15  8:59 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe, mgalbraith

On Fri, 13 Jul 2012, Jan Kara wrote:
> On Fri 13-07-12 16:25:05, Thomas Gleixner wrote:
> > So the patch below should allow the unplug to take place when blocked
> > on mutexes etc.
>   Thanks for the patch! Mike will give it some testing.

I just found out that this patch will explode nicely when the unplug
code runs into a contended lock. Then we try to block on that lock and
make the rtmutex code unhappy as we are already blocked on something
else.

So no, it's not a solution to the problem. Sigh.

Can you figure out on which lock the stuck thread which did not unplug
due to tsk_is_pi_blocked was blocked?

Thanks,

	tglx


* Re: Deadlocks due to per-process plugging
  2012-07-15  8:59             ` Thomas Gleixner
@ 2012-07-15  9:14               ` Mike Galbraith
  2012-07-15  9:51                 ` Thomas Gleixner
  2012-07-16  2:22                 ` Mike Galbraith
  0 siblings, 2 replies; 33+ messages in thread
From: Mike Galbraith @ 2012-07-15  9:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Sun, 2012-07-15 at 10:59 +0200, Thomas Gleixner wrote: 
> On Fri, 13 Jul 2012, Jan Kara wrote:
> > On Fri 13-07-12 16:25:05, Thomas Gleixner wrote:
> > > So the patch below should allow the unplug to take place when blocked
> > > on mutexes etc.
> >   Thanks for the patch! Mike will give it some testing.
> 
> I just found out that this patch will explode nicely when the unplug
> code runs into a contended lock. Then we try to block on that lock and
> make the rtmutex code unhappy as we are already blocked on something
> else.

Kinda like so?  My x3550 M3 just exploded.  Aw poo. 

[ 6669.133081] Kernel panic - not syncing: rt_mutex_real_waiter(task->pi_blocked_on) lock: 0xffff880175dfd588 waiter: 0xffff880121fc2d58
[ 6669.133083] 
[ 6669.133086] Pid: 28240, comm: bonnie++ Tainted: G           N  3.0.35-rt56-rt #20
[ 6669.133088] Call Trace:
[ 6669.133102]  [<ffffffff81004562>] dump_trace+0x82/0x2e0
[ 6669.133109]  [<ffffffff8154d1ee>] dump_stack+0x69/0x6f
[ 6669.133114]  [<ffffffff8154d295>] panic+0xa1/0x1e5
[ 6669.133121]  [<ffffffff81095289>] task_blocks_on_rt_mutex+0x279/0x2c0
[ 6669.133127]  [<ffffffff8154f5d5>] rt_spin_lock_slowlock+0xb5/0x290
[ 6669.133134]  [<ffffffff8131d7e4>] blk_flush_plug_list+0x164/0x200
[ 6669.133139]  [<ffffffff8154dffe>] schedule+0x5e/0xb0
[ 6669.133143]  [<ffffffff8154f1ab>] __rt_mutex_slowlock+0x4b/0xd0
[ 6669.133148]  [<ffffffff8154f39b>] rt_mutex_slowlock+0xeb/0x210
[ 6669.133154]  [<ffffffff81127bce>] page_referenced_file+0x4e/0x190
[ 6669.133160]  [<ffffffff8112954a>] page_referenced+0x6a/0x230
[ 6669.133166]  [<ffffffff8110b5e4>] shrink_active_list+0x214/0x3d0
[ 6669.133170]  [<ffffffff8110b874>] shrink_list+0xd4/0x120
[ 6669.133176]  [<ffffffff8110bc3c>] shrink_zone+0x9c/0x1d0
[ 6669.133180]  [<ffffffff8110c07f>] shrink_zones+0x7f/0x1f0
[ 6669.133185]  [<ffffffff8110c27d>] do_try_to_free_pages+0x8d/0x370
[ 6669.133189]  [<ffffffff8110c8ba>] try_to_free_pages+0xea/0x210
[ 6669.133197]  [<ffffffff810ff5e3>] __alloc_pages_nodemask+0x5b3/0x9f0
[ 6669.133205]  [<ffffffff81138294>] alloc_pages_current+0xc4/0x150
[ 6669.133211]  [<ffffffff810f6296>] find_or_create_page+0x46/0xb0
[ 6669.133217]  [<ffffffff81296cc6>] alloc_extent_buffer+0x226/0x4b0
[ 6669.133225]  [<ffffffff8126f6b9>] readahead_tree_block+0x19/0x50
[ 6669.133231]  [<ffffffff8124f4bf>] reada_for_search+0x1cf/0x230
[ 6669.133237]  [<ffffffff81252faa>] read_block_for_search+0x18a/0x200
[ 6669.133242]  [<ffffffff8125525a>] btrfs_search_slot+0x25a/0x7e0
[ 6669.133248]  [<ffffffff81269144>] btrfs_lookup_csum+0x74/0x180
[ 6669.133254]  [<ffffffff8126940f>] __btrfs_lookup_bio_sums+0x1bf/0x3b0
[ 6669.133260]  [<ffffffff812775c8>] btrfs_submit_bio_hook+0x158/0x1a0
[ 6669.133270]  [<ffffffff81291216>] submit_one_bio+0x66/0xa0
[ 6669.133274]  [<ffffffff81295017>] submit_extent_page+0x107/0x220
[ 6669.133278]  [<ffffffff81295629>] __extent_read_full_page+0x4b9/0x6e0
[ 6669.133284]  [<ffffffff8129669f>] extent_readpages+0xbf/0x100
[ 6669.133289]  [<ffffffff811020fe>] __do_page_cache_readahead+0x1ae/0x250
[ 6669.133295]  [<ffffffff811024dc>] ra_submit+0x1c/0x30
[ 6669.133299]  [<ffffffff810f67eb>] do_generic_file_read.clone.0+0x27b/0x450
[ 6669.133305]  [<ffffffff810f7a9b>] generic_file_aio_read+0x1fb/0x2a0
[ 6669.133313]  [<ffffffff8115454f>] do_sync_read+0xbf/0x100
[ 6669.133319]  [<ffffffff81154e03>] vfs_read+0xc3/0x180
[ 6669.133323]  [<ffffffff81154f11>] sys_read+0x51/0xa0
[ 6669.133329]  [<ffffffff81557092>] system_call_fastpath+0x16/0x1b
[ 6669.133347]  [<00007ff8b95bb370>] 0x7ff8b95bb36f

> So no, it's not a solution to the problem. Sigh.
> 
> Can you figure out on which lock the stuck thread which did not unplug
> due to tsk_is_pi_blocked was blocked?

I'll take a peek.

-Mike



* Re: Deadlocks due to per-process plugging
  2012-07-15  9:14               ` Mike Galbraith
@ 2012-07-15  9:51                 ` Thomas Gleixner
  2012-07-16  2:22                 ` Mike Galbraith
  1 sibling, 0 replies; 33+ messages in thread
From: Thomas Gleixner @ 2012-07-15  9:51 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Sun, 15 Jul 2012, Mike Galbraith wrote:

> On Sun, 2012-07-15 at 10:59 +0200, Thomas Gleixner wrote: 
> > On Fri, 13 Jul 2012, Jan Kara wrote:
> > > On Fri 13-07-12 16:25:05, Thomas Gleixner wrote:
> > > > So the patch below should allow the unplug to take place when blocked
> > > > on mutexes etc.
> > >   Thanks for the patch! Mike will give it some testing.
> > 
> > I just found out that this patch will explode nicely when the unplug
> > code runs into a contended lock. Then we try to block on that lock and
> > make the rtmutex code unhappy as we are already blocked on something
> > else.
> 
> Kinda like so?  My x3550 M3 just exploded.  Aw poo. 

Yep. Would have surprised me if it never triggered.

> > So no, it's not a solution to the problem. Sigh.
> > 
> > Can you figure out on which lock the stuck thread which did not unplug
> > due to tsk_is_pi_blocked was blocked?
> 
> I'll take a peek.

Thanks!


* Re: Deadlocks due to per-process plugging
  2012-07-15  9:14               ` Mike Galbraith
  2012-07-15  9:51                 ` Thomas Gleixner
@ 2012-07-16  2:22                 ` Mike Galbraith
  2012-07-16  8:59                   ` Thomas Gleixner
  1 sibling, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2012-07-16  2:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Sun, 2012-07-15 at 11:14 +0200, Mike Galbraith wrote: 
> On Sun, 2012-07-15 at 10:59 +0200, Thomas Gleixner wrote: 

> > Can you figure out on which lock the stuck thread which did not unplug
> > due to tsk_is_pi_blocked was blocked?
> 
> I'll take a peek.

Sorry for late reply, took a half day away from box.  Jan had already
done the full ext3 IO deadlock analysis:

Again kjournald is waiting for buffer IO on block 4367635 (sector
78364838) to finish.  Now it is dbench thread 0xffff88026f330e70 which
has submitted this buffer for IO and is still holding this buffer behind
its plug (request for sector 78364822..78364846). The dbench thread is
waiting on j_checkpoint mutex (apparently it has successfully got the
mutex in the past, checkpointed some buffers, released the mutex and
hung when trying to acquire it again in the next loop of
__log_wait_for_space()). 

-Mike



* Re: Deadlocks due to per-process plugging
  2012-07-16  2:22                 ` Mike Galbraith
@ 2012-07-16  8:59                   ` Thomas Gleixner
  2012-07-16  9:48                     ` Mike Galbraith
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2012-07-16  8:59 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Mon, 16 Jul 2012, Mike Galbraith wrote:
> On Sun, 2012-07-15 at 11:14 +0200, Mike Galbraith wrote: 
> > On Sun, 2012-07-15 at 10:59 +0200, Thomas Gleixner wrote: 
> 
> > > Can you figure out on which lock the stuck thread which did not unplug
> > > due to tsk_is_pi_blocked was blocked?
> > 
> > I'll take a peek.
> 
> Sorry for late reply, took a half day away from box.  Jan had already
> done the full ext3 IO deadlock analysis:
> 
> Again kjournald is waiting for buffer IO on block 4367635 (sector
> 78364838) to finish.  Now it is dbench thread 0xffff88026f330e70 which
> has submitted this buffer for IO and is still holding this buffer behind
> its plug (request for sector 78364822..78364846). The dbench thread is
> waiting on j_checkpoint mutex (apparently it has successfully got the
> mutex in the past, checkpointed some buffers, released the mutex and
> hung when trying to acquire it again in the next loop of
> __log_wait_for_space()). 

And what's holding j_checkpoint mutex and not making progress?

Thanks,

	tglx


* Re: Deadlocks due to per-process plugging
  2012-07-16  8:59                   ` Thomas Gleixner
@ 2012-07-16  9:48                     ` Mike Galbraith
  2012-07-16  9:59                       ` Thomas Gleixner
  2012-07-16 10:08                       ` Mike Galbraith
  0 siblings, 2 replies; 33+ messages in thread
From: Mike Galbraith @ 2012-07-16  9:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Mon, 2012-07-16 at 10:59 +0200, Thomas Gleixner wrote: 
> On Mon, 16 Jul 2012, Mike Galbraith wrote:
> > On Sun, 2012-07-15 at 11:14 +0200, Mike Galbraith wrote: 
> > > On Sun, 2012-07-15 at 10:59 +0200, Thomas Gleixner wrote: 
> > 
> > > > Can you figure out on which lock the stuck thread which did not unplug
> > > > due to tsk_is_pi_blocked was blocked?
> > > 
> > > I'll take a peek.
> > 
> > Sorry for late reply, took a half day away from box.  Jan had already
> > done the full ext3 IO deadlock analysis:
> > 
> > Again kjournald is waiting for buffer IO on block 4367635 (sector
> > 78364838) to finish.  Now it is dbench thread 0xffff88026f330e70 which
> > has submitted this buffer for IO and is still holding this buffer behind
> > its plug (request for sector 78364822..78364846). The dbench thread is
> > waiting on j_checkpoint mutex (apparently it has successfully got the
> > mutex in the past, checkpointed some buffers, released the mutex and
> > hung when trying to acquire it again in the next loop of
> > __log_wait_for_space()). 
> 
> And what's holding j_checkpoint mutex and not making progress?

Waiting for wakeup from kjournald.

crash> bt 0xffff880189dd9560
PID: 33382  TASK: ffff880189dd9560  CPU: 3   COMMAND: "dbench"
#0 [ffff880274b61898] schedule at ffffffff8145178e
#1 [ffff880274b61a00] log_wait_commit at ffffffffa0174205 [jbd]
#2 [ffff880274b61a80] __process_buffer at ffffffffa017291b [jbd]
#3 [ffff880274b61ab0] log_do_checkpoint at ffffffffa0172bba [jbd]
#4 [ffff880274b61d20] __log_wait_for_space at ffffffffa0172dcf [jbd]
#5 [ffff880274b61d70] start_this_handle at ffffffffa016ebdf [jbd]
#6 [ffff880274b61e10] journal_start at ffffffffa016f11e [jbd]
#7 [ffff880274b61e40] ext3_unlink at ffffffffa01af757 [ext3]
#8 [ffff880274b61e80] vfs_unlink at ffffffff8115febc
#9 [ffff880274b61ea0] do_unlinkat at ffffffff811645ad
#10 [ffff880274b61f80] system_call_fastpath at ffffffff8145ad92
    RIP: 00007f811338dc37  RSP: 00007fffe247ef78  RFLAGS: 00010216
    RAX: 0000000000000057  RBX: ffffffff8145ad92  RCX: 000000000000000a
    RDX: 0000000000000000  RSI: 00007fffe247eef0  RDI: 0000000000608a10
    RBP: 00007fffe247f830   R8: 0000000000000006   R9: 0000000000000010
    R10: 0000000000000000  R11: 0000000000000206  R12: 00007f811384fc70
    R13: 00007fffe247f17c  R14: 0000000000608a10  R15: 0000000000000000
    ORIG_RAX: 0000000000000057  CS: 0033  SS: 002b




* Re: Deadlocks due to per-process plugging
  2012-07-16  9:48                     ` Mike Galbraith
@ 2012-07-16  9:59                       ` Thomas Gleixner
  2012-07-16 10:13                         ` Mike Galbraith
  2012-07-16 10:08                       ` Mike Galbraith
  1 sibling, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2012-07-16  9:59 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Mon, 16 Jul 2012, Mike Galbraith wrote:
> On Mon, 2012-07-16 at 10:59 +0200, Thomas Gleixner wrote: 
> > On Mon, 16 Jul 2012, Mike Galbraith wrote:
> > > On Sun, 2012-07-15 at 11:14 +0200, Mike Galbraith wrote: 
> > > > On Sun, 2012-07-15 at 10:59 +0200, Thomas Gleixner wrote: 
> > > 
> > > > > Can you figure out on which lock the stuck thread which did not unplug
> > > > > due to tsk_is_pi_blocked was blocked?
> > > > 
> > > > I'll take a peek.
> > > 
> > > Sorry for late reply, took a half day away from box.  Jan had already
> > > done the full ext3 IO deadlock analysis:
> > > 
> > > Again kjournald is waiting for buffer IO on block 4367635 (sector
> > > 78364838) to finish.  Now it is dbench thread 0xffff88026f330e70 which
> > > has submitted this buffer for IO and is still holding this buffer behind
> > > its plug (request for sector 78364822..78364846). The dbench thread is
> > > waiting on j_checkpoint mutex (apparently it has successfully got the
> > > mutex in the past, checkpointed some buffers, released the mutex and
> > > hung when trying to acquire it again in the next loop of
> > > __log_wait_for_space()). 
> > 
> > And what's holding j_checkpoint mutex and not making progress?
> 
> Waiting for wakeup from kjournald.

So kicking the io in __log_wait_for_space()

                spin_unlock(&journal->j_state_lock);

		--> HERE

                mutex_lock(&journal->j_checkpoint_mutex);

should fix the issue, right ?

Thanks,

	tglx
 


* Re: Deadlocks due to per-process plugging
  2012-07-16  9:48                     ` Mike Galbraith
  2012-07-16  9:59                       ` Thomas Gleixner
@ 2012-07-16 10:08                       ` Mike Galbraith
  2012-07-16 10:19                         ` Thomas Gleixner
  1 sibling, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2012-07-16 10:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

Hm, wonder how bad this sucks.. and if I should go hide under a big
sturdy rock after I poke xmit :)

---
 block/blk-core.c |    1 +
 kernel/rtmutex.c |   11 +++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2782,6 +2782,7 @@ void blk_flush_plug_list(struct blk_plug
 	if (q)
 		queue_unplugged(q, depth, from_schedule);
 }
+EXPORT_SYMBOL(blk_flush_plug_list);
 
 void blk_finish_plug(struct blk_plug *plug)
 {
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -20,6 +20,7 @@
 #include <linux/module.h>
 #include <linux/sched.h>
 #include <linux/timer.h>
+#include <linux/blkdev.h>
 
 #include "rtmutex_common.h"
 
@@ -647,8 +648,11 @@ static inline void rt_spin_lock_fastlock
 
 	if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
 		rt_mutex_deadlock_account_lock(lock, current);
-	else
+	else {
+		if (blk_needs_flush_plug(current))
+			blk_schedule_flush_plug(current);
 		slowfn(lock);
+	}
 }
 
 static inline void rt_spin_lock_fastunlock(struct rt_mutex *lock,
@@ -1104,8 +1108,11 @@ rt_mutex_fastlock(struct rt_mutex *lock,
 	if (!detect_deadlock && likely(rt_mutex_cmpxchg(lock, NULL, current))) {
 		rt_mutex_deadlock_account_lock(lock, current);
 		return 0;
-	} else
+	} else {
+		if (blk_needs_flush_plug(current))
+			blk_schedule_flush_plug(current);
 		return slowfn(lock, state, NULL, detect_deadlock);
+	}
 }
 
 static inline int




* Re: Deadlocks due to per-process plugging
  2012-07-16  9:59                       ` Thomas Gleixner
@ 2012-07-16 10:13                         ` Mike Galbraith
  0 siblings, 0 replies; 33+ messages in thread
From: Mike Galbraith @ 2012-07-16 10:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Mon, 2012-07-16 at 11:59 +0200, Thomas Gleixner wrote: 
> On Mon, 16 Jul 2012, Mike Galbraith wrote:
> > On Mon, 2012-07-16 at 10:59 +0200, Thomas Gleixner wrote: 
> > > On Mon, 16 Jul 2012, Mike Galbraith wrote:
> > > > On Sun, 2012-07-15 at 11:14 +0200, Mike Galbraith wrote: 
> > > > > On Sun, 2012-07-15 at 10:59 +0200, Thomas Gleixner wrote: 
> > > > 
> > > > > > Can you figure out on which lock the stuck thread which did not unplug
> > > > > > due to tsk_is_pi_blocked was blocked?
> > > > > 
> > > > > I'll take a peek.
> > > > 
> > > > Sorry for late reply, took a half day away from box.  Jan had already
> > > > done the full ext3 IO deadlock analysis:
> > > > 
> > > > Again kjournald is waiting for buffer IO on block 4367635 (sector
> > > > 78364838) to finish.  Now it is dbench thread 0xffff88026f330e70 which
> > > > has submitted this buffer for IO and is still holding this buffer behind
> > > > its plug (request for sector 78364822..78364846). The dbench thread is
> > > > waiting on j_checkpoint mutex (apparently it has successfully got the
> > > > mutex in the past, checkpointed some buffers, released the mutex and
> > > > hung when trying to acquire it again in the next loop of
> > > > __log_wait_for_space()). 
> > > 
> > > And what's holding j_checkpoint mutex and not making progress?
> > 
> > Waiting for wakeup from kjournald.
> 
> So kicking the io in __log_wait_for_space()
> 
>                 spin_unlock(&journal->j_state_lock);
> 
> 		--> HERE
> 
>                 mutex_lock(&journal->j_checkpoint_mutex);
> 
> should fix the issue, right ?

Should for this scenario.  Jan said he could think of ways to likely
make the same happen in xfs.

-Mike



* Re: Deadlocks due to per-process plugging
  2012-07-16 10:08                       ` Mike Galbraith
@ 2012-07-16 10:19                         ` Thomas Gleixner
  2012-07-16 10:30                           ` Mike Galbraith
                                             ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Thomas Gleixner @ 2012-07-16 10:19 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Mon, 16 Jul 2012, Mike Galbraith wrote:

> Hm, wonder how bad this sucks.. and if I should go hide under a big
> sturdy rock after I poke xmit :)
> 
> ---
>  block/blk-core.c |    1 +
>  kernel/rtmutex.c |   11 +++++++++--
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2782,6 +2782,7 @@ void blk_flush_plug_list(struct blk_plug
>  	if (q)
>  		queue_unplugged(q, depth, from_schedule);
>  }
> +EXPORT_SYMBOL(blk_flush_plug_list);

You don't need that one. blk-core and rtmutex are builtin

>  void blk_finish_plug(struct blk_plug *plug)
>  {
> --- a/kernel/rtmutex.c
> +++ b/kernel/rtmutex.c
> @@ -20,6 +20,7 @@
>  #include <linux/module.h>
>  #include <linux/sched.h>
>  #include <linux/timer.h>
> +#include <linux/blkdev.h>
>  
>  #include "rtmutex_common.h"
>  
> @@ -647,8 +648,11 @@ static inline void rt_spin_lock_fastlock
>  
>  	if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
>  		rt_mutex_deadlock_account_lock(lock, current);
> -	else
> +	else {
> +		if (blk_needs_flush_plug(current))
> +			blk_schedule_flush_plug(current);
>  		slowfn(lock);
> +	}

That should do the trick.



* Re: Deadlocks due to per-process plugging
  2012-07-16 10:19                         ` Thomas Gleixner
@ 2012-07-16 10:30                           ` Mike Galbraith
  2012-07-16 11:24                           ` Mike Galbraith
  2012-07-17 13:10                           ` Mike Galbraith
  2 siblings, 0 replies; 33+ messages in thread
From: Mike Galbraith @ 2012-07-16 10:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

(hohum, gmx server went down, back to random address mode;)

On Mon, 2012-07-16 at 12:19 +0200, Thomas Gleixner wrote: 
> 
> That should do the trick.

I'll put it to work on 64 core box, and see if it survives.  It better.

-Mike



* Re: Deadlocks due to per-process plugging
  2012-07-16 10:19                         ` Thomas Gleixner
  2012-07-16 10:30                           ` Mike Galbraith
@ 2012-07-16 11:24                           ` Mike Galbraith
  2012-07-16 14:35                             ` Mike Galbraith
  2012-07-17 13:10                           ` Mike Galbraith
  2 siblings, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2012-07-16 11:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Mon, 2012-07-16 at 12:19 +0200, Thomas Gleixner wrote: 
> On Mon, 16 Jul 2012, Mike Galbraith wrote:

> > ---
> >  block/blk-core.c |    1 +
> >  kernel/rtmutex.c |   11 +++++++++--
> >  2 files changed, 10 insertions(+), 2 deletions(-)
> > 
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -2782,6 +2782,7 @@ void blk_flush_plug_list(struct blk_plug
> >  	if (q)
> >  		queue_unplugged(q, depth, from_schedule);
> >  }
> > +EXPORT_SYMBOL(blk_flush_plug_list);
> 
> You don't need that one. blk-core and rtmutex are builtin

Box disagrees.

Waiting for device /dev/cciss/c0d0p6 to appear:  ok
fsck from util-linux 2.19.1
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/cciss/c0d0p6 
SLES11: clean, 141001/305824 files, 1053733/1222940 blocks
fsck succeeded. Mounting root device read-write.
Mounting root /dev/cciss/c0d0p6
mount -o rw,acl,user_xattr -t [   27.558934] jbd: disagrees about
version of symbol rt_spin_trylock
ext3 /dev/cciss/[   27.558938] jbd: Unknown symbol rt_spin_trylock (err
-22)
c0d0p6 /root
[   27.558959] jbd: disagrees about version of symbol
__rt_spin_lock_init
[   27.558961] jbd: Unknown symbol __rt_spin_lock_init (err -22)
[   27.558986] jbd: disagrees about version of symbol rt_spin_unlock
[   27.558988] jbd: Unknown symbol rt_spin_unlock (err -22)
[   27.558990] jbd: disagrees about version of symbol rt_spin_lock
[   27.558992] jbd: Unknown symbol rt_spin_lock (err -22)
[   27.559004] jbd: disagrees about version of symbol __rt_mutex_init
[   27.559006] jbd: Unknown symbol __rt_mutex_init (err -22)
modprobe: FATAL: Error inserting ext3
(/lib/modules/3.0.35-rt56-rt/extra/ext3.ko): Invalid argument

mount: unknown filesystem type 'ext3'
could not mount root filesystem -- exiting to /bin/sh
$




* Re: Deadlocks due to per-process plugging
  2012-07-16 11:24                           ` Mike Galbraith
@ 2012-07-16 14:35                             ` Mike Galbraith
  0 siblings, 0 replies; 33+ messages in thread
From: Mike Galbraith @ 2012-07-16 14:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Mon, 2012-07-16 at 13:24 +0200, Mike Galbraith wrote:

> Box disagrees.
> 
> Waiting for device /dev/cciss/c0d0p6 to appear:  ok
> fsck from util-linux 2.19.1
> [/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/cciss/c0d0p6 
> SLES11: clean, 141001/305824 files, 1053733/1222940 blocks
> fsck succeeded. Mounting root device read-write.
> Mounting root /dev/cciss/c0d0p6
> mount -o rw,acl,user_xattr -t [   27.558934] jbd: disagrees about
> version of symbol rt_spin_trylock
> ext3 /dev/cciss/[   27.558938] jbd: Unknown symbol rt_spin_trylock (err
> -22)
> c0d0p6 /root
> [   27.558959] jbd: disagrees about version of symbol
> __rt_spin_lock_init
> [   27.558961] jbd: Unknown symbol __rt_spin_lock_init (err -22)
> [   27.558986] jbd: disagrees about version of symbol rt_spin_unlock
> [   27.558988] jbd: Unknown symbol rt_spin_unlock (err -22)
> [   27.558990] jbd: disagrees about version of symbol rt_spin_lock
> [   27.558992] jbd: Unknown symbol rt_spin_lock (err -22)
> [   27.559004] jbd: disagrees about version of symbol __rt_mutex_init
> [   27.559006] jbd: Unknown symbol __rt_mutex_init (err -22)
> modprobe: FATAL: Error inserting ext3
> (/lib/modules/3.0.35-rt56-rt/extra/ext3.ko): Invalid argument
> 
> mount: unknown filesystem type 'ext3'
> could not mount root filesystem -- exiting to /bin/sh
> $

Weird.  Make mrproper didn't help.  Whacking every trace of the damn
tree, kernel and modules did.  Box is grinding away at the ext3 dbench
testcase.  At 7 hours per run, it'll take a while.  This time tomorrow,
box should still be happily grinding away.

-Mike



* Re: Deadlocks due to per-process plugging
  2012-07-16 10:19                         ` Thomas Gleixner
  2012-07-16 10:30                           ` Mike Galbraith
  2012-07-16 11:24                           ` Mike Galbraith
@ 2012-07-17 13:10                           ` Mike Galbraith
  2012-07-18  4:44                             ` Mike Galbraith
  2 siblings, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2012-07-17 13:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith

On Mon, 2012-07-16 at 12:19 +0200, Thomas Gleixner wrote:

> > @@ -647,8 +648,11 @@ static inline void rt_spin_lock_fastlock
> >  
> >  	if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
> >  		rt_mutex_deadlock_account_lock(lock, current);
> > -	else
> > +	else {
> > +		if (blk_needs_flush_plug(current))
> > +			blk_schedule_flush_plug(current);
> >  		slowfn(lock);
> > +	}
> 
> That should do the trick.

Box has been grinding away long enough now to agree that it did.

rt: pull your plug before blocking

Queued IO can lead to IO deadlock should a task require wakeup from a task
which is blocked on that queued IO.

ext3: dbench1 queues a buffer, blocks on journal mutex, its plug is not
pulled.  dbench2 mutex owner is waiting for kjournald, who is waiting for
the buffer queued by dbench1.  Game over.

Signed-off-by: Mike Galbraith <efault@gmx.de>

diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index d58db99..39140a5 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -20,6 +20,7 @@
 #include <linux/module.h>
 #include <linux/sched.h>
 #include <linux/timer.h>
+#include <linux/blkdev.h>
 
 #include "rtmutex_common.h"
 
@@ -647,8 +648,11 @@ static inline void rt_spin_lock_fastlock(struct rt_mutex *lock,
 
 	if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
 		rt_mutex_deadlock_account_lock(lock, current);
-	else
+	else {
+		if (blk_needs_flush_plug(current))
+			blk_schedule_flush_plug(current);
 		slowfn(lock);
+	}
 }
 
 static inline void rt_spin_lock_fastunlock(struct rt_mutex *lock,
@@ -1104,8 +1108,11 @@ rt_mutex_fastlock(struct rt_mutex *lock, int state,
 	if (!detect_deadlock && likely(rt_mutex_cmpxchg(lock, NULL, current))) {
 		rt_mutex_deadlock_account_lock(lock, current);
 		return 0;
-	} else
+	} else {
+		if (blk_needs_flush_plug(current))
+			blk_schedule_flush_plug(current);
 		return slowfn(lock, state, NULL, detect_deadlock);
+	}
 }
 
 static inline int



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: Deadlocks due to per-process plugging
  2012-07-17 13:10                           ` Mike Galbraith
@ 2012-07-18  4:44                             ` Mike Galbraith
  2012-07-18  5:30                               ` Mike Galbraith
  0 siblings, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2012-07-18  4:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith, Steven Rostedt

(adds rather important missing Cc)

On Tue, 2012-07-17 at 15:10 +0200, Mike Galbraith wrote: 
> On Mon, 2012-07-16 at 12:19 +0200, Thomas Gleixner wrote:
> 
> > > @@ -647,8 +648,11 @@ static inline void rt_spin_lock_fastlock
> > >  
> > >  	if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
> > >  		rt_mutex_deadlock_account_lock(lock, current);
> > > -	else
> > > +	else {
> > > +		if (blk_needs_flush_plug(current))
> > > +			blk_schedule_flush_plug(current);
> > >  		slowfn(lock);
> > > +	}
> > 
> > That should do the trick.
> 
> Box has been grinding away long enough now to agree that it did.
> 
> rt: pull your plug before blocking

Hm.  x3550 seems to have lost interest in the nearly-instant-gratification
ext4 deadlock testcase: taskset -c 3 dbench -t 30 -s 8 in enterprise.
Previously, it _might_ have survived one 30 second test, but never for
minutes, much less several minutes of very many threads, so it appears
to have been another flavor of IO dependency deadlock. 

I just tried virgin 3.4.4-rt13, and it too happily churned away.. until
I tried dbench -t 300 -s 500 that is.  That (seemingly 100% repeatably)
triggers an RCU stall that doesn't get to the serial console, nor will my virgin
source/config setup crash dump.  Damn.  Enterprise kernel will dump, but
won't stall, so I guess I'd better check out the other virgin 3.x-rt
trees to at least narrow down where stall started.

Whatever, RCU stall is a different problem.  Revert unplug patchlet, and
ext4 deadlock is back in virgin 3.4-rt, so methinks it's sufficiently
verified that either we need some form of unplug before blocking, or we
need a pull-your-plug point in at least two filesystems, maybe more.

-Mike

The patch in question for missing Cc.  Maybe should be only mutex, but I
see no reason why IO dependency can only possibly exist for mutexes...

rt: pull your plug before blocking

Queued IO can lead to IO deadlock should a task require wakeup from a task
which is blocked on that queued IO.

ext3: dbench1 queues a buffer, blocks on journal mutex, its plug is not
pulled.  dbench2 mutex owner is waiting for kjournald, who is waiting for
the buffer queued by dbench1.  Game over.

Signed-off-by: Mike Galbraith <efault@gmx.de>

diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 3bff726..3f6ae32 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -20,6 +20,7 @@
 #include <linux/export.h>
 #include <linux/sched.h>
 #include <linux/timer.h>
+#include <linux/blkdev.h>
 
 #include "rtmutex_common.h"
 
@@ -647,8 +648,11 @@ static inline void rt_spin_lock_fastlock(struct rt_mutex *lock,
 
 	if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
 		rt_mutex_deadlock_account_lock(lock, current);
-	else
+	else {
+		if (blk_needs_flush_plug(current))
+			blk_schedule_flush_plug(current);
 		slowfn(lock);
+	}
 }
 
 static inline void rt_spin_lock_fastunlock(struct rt_mutex *lock,
@@ -1104,8 +1108,11 @@ rt_mutex_fastlock(struct rt_mutex *lock, int state,
 	if (!detect_deadlock && likely(rt_mutex_cmpxchg(lock, NULL, current))) {
 		rt_mutex_deadlock_account_lock(lock, current);
 		return 0;
-	} else
+	} else {
+		if (blk_needs_flush_plug(current))
+			blk_schedule_flush_plug(current);
 		return slowfn(lock, state, NULL, detect_deadlock);
+	}
 }
 
 static inline int




^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: Deadlocks due to per-process plugging
  2012-07-18  4:44                             ` Mike Galbraith
@ 2012-07-18  5:30                               ` Mike Galbraith
  2012-07-21  7:47                                 ` Mike Galbraith
  0 siblings, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2012-07-18  5:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith, Steven Rostedt

On Wed, 2012-07-18 at 06:44 +0200, Mike Galbraith wrote:

> The patch in question for missing Cc.  Maybe should be only mutex, but I
> see no reason why IO dependency can only possibly exist for mutexes...

Well that was easy, box quickly said "nope, mutex only does NOT cut it".

-Mike


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Deadlocks due to per-process plugging
  2012-07-18  5:30                               ` Mike Galbraith
@ 2012-07-21  7:47                                 ` Mike Galbraith
  2012-07-22 18:43                                   ` Mike Galbraith
  0 siblings, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2012-07-21  7:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith, Steven Rostedt

On Wed, 2012-07-18 at 07:30 +0200, Mike Galbraith wrote: 
> On Wed, 2012-07-18 at 06:44 +0200, Mike Galbraith wrote:
> 
> > The patch in question for missing Cc.  Maybe should be only mutex, but I
> > see no reason why IO dependency can only possibly exist for mutexes...
> 
> Well that was easy, box quickly said "nope, mutex only does NOT cut it".

And I also learned (ouch) that doing both doesn't cut it either.  Ksoftirqd
(or sirq-blk) being nailed by q->lock in blk_done_softirq() is.. not
particularly wonderful.  As long as that doesn't happen, IO deadlock
doesn't happen, troublesome filesystems just work.  If it does happen
though, you've instantly got a problem.

-Mike


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Deadlocks due to per-process plugging
  2012-07-21  7:47                                 ` Mike Galbraith
@ 2012-07-22 18:43                                   ` Mike Galbraith
  2012-07-23  9:46                                     ` Mike Galbraith
  0 siblings, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2012-07-22 18:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith, Steven Rostedt

On Sat, 2012-07-21 at 09:47 +0200, Mike Galbraith wrote: 
> On Wed, 2012-07-18 at 07:30 +0200, Mike Galbraith wrote: 
> > On Wed, 2012-07-18 at 06:44 +0200, Mike Galbraith wrote:
> > 
> > > The patch in question for missing Cc.  Maybe should be only mutex, but I
> > > see no reason why IO dependency can only possibly exist for mutexes...
> > 
> > Well that was easy, box quickly said "nope, mutex only does NOT cut it".
> 
> And I also learned (ouch) that both doesn't cut it either.  Ksoftirqd
> (or sirq-blk) being nailed by q->lock in blk_done_softirq() is.. not
> particularly wonderful.  As long as that doesn't happen, IO deadlock
> doesn't happen, troublesome filesystems just work.  If it does happen
> though, you've instantly got a problem.

That problem being slab_lock in practice btw, though I suppose it could
do the same with any number of others.  In the encountered case, ksoftirqd
(or sirq-blk) blocks on slab_lock while holding q->queue_lock, while a
userspace task (dbench) blocks on q->queue_lock while holding slab_lock
on the same cpu.  Game over.

Odd is that it doesn't seem to materialize if you have the rt_mutex
deadlock detector enabled, not that that matters.  My 64 core box beat on
ext3 for 35 hours without ever hitting it with no deadlock detector (this
time.. other long runs on top thereof, totaling lots of hours), and my
x3550 beat crap out of several fs for a very long week without hitting
it with the deadlock detector, but hits it fairly easily without.

Hohum, regardless of the fickle timing gods' mood of the moment, deadlocks
are most definitely possible, and will happen, which leaves us with at
least two filesystems needing strategically placed -rt unplug points,
with no guarantee that this is really solving anything at all (other
than empirical evidence that the bad thing ain't happening 'course).

-Mike


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Deadlocks due to per-process plugging
  2012-07-22 18:43                                   ` Mike Galbraith
@ 2012-07-23  9:46                                     ` Mike Galbraith
  0 siblings, 0 replies; 33+ messages in thread
From: Mike Galbraith @ 2012-07-23  9:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan Kara, Jeff Moyer, LKML, linux-fsdevel, Tejun Heo, Jens Axboe,
	mgalbraith, Steven Rostedt

On Sun, 2012-07-22 at 20:43 +0200, Mike Galbraith wrote: 
> On Sat, 2012-07-21 at 09:47 +0200, Mike Galbraith wrote: 
> > On Wed, 2012-07-18 at 07:30 +0200, Mike Galbraith wrote: 
> > > On Wed, 2012-07-18 at 06:44 +0200, Mike Galbraith wrote:
> > > 
> > > > The patch in question for missing Cc.  Maybe should be only mutex, but I
> > > > see no reason why IO dependency can only possibly exist for mutexes...
> > > 
> > > Well that was easy, box quickly said "nope, mutex only does NOT cut it".
> > 
> > And I also learned (ouch) that both doesn't cut it either.  Ksoftirqd
> > (or sirq-blk) being nailed by q->lock in blk_done_softirq() is.. not
> > particularly wonderful.  As long as that doesn't happen, IO deadlock
> > doesn't happen, troublesome filesystems just work.  If it does happen
> > though, you've instantly got a problem.
> 
> That problem being slab_lock in practice btw, though I suppose it could
> do the same with any number of others.  In encountered case, ksoftirqd
> (or sirq-blk) blocks on slab_lock while holding q->queue_lock, while a
> userspace task (dbench) blocks on q->queue_lock while holding slab_lock
> on the same cpu.  Game over.

Hello vacationing rt wizards' mail boxen (and others so bored they're
actually reading about obscure -rt IO troubles;).

ext4 is still alive, which is a positive sign, and box hasn't yet
deadlocked either, another sign.  Now all I have to do is (sigh) grind
filesystems to fine powder for a few days.. again.

---
 kernel/rtmutex.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -649,7 +649,14 @@ static inline void rt_spin_lock_fastlock
 	if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
 		rt_mutex_deadlock_account_lock(lock, current);
 	else {
-		if (blk_needs_flush_plug(current))
+		/*
+		 * We can't pull the plug if we're already holding a lock
+		 * else we can deadlock.  eg, if we're holding slab_lock,
+		 * ksoftirqd can block while processing BLOCK_SOFTIRQ after
+		 * having acquired q->queue_lock.  If _we_ then block on
+		 * that q->queue_lock while flushing our plug, deadlock.
+		 */
+		if (__migrate_disabled(current) < 2 && blk_needs_flush_plug(current))
 			blk_schedule_flush_plug(current);
 		slowfn(lock);
 	}



^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2012-07-23  9:47 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-11 13:37 Deadlocks due to per-process plugging Jan Kara
2012-07-11 16:05 ` Jeff Moyer
2012-07-11 20:16   ` Jan Kara
2012-07-11 22:12     ` Thomas Gleixner
2012-07-12  4:12       ` Mike Galbraith
2012-07-13 12:38       ` Jan Kara
2012-07-12  2:07     ` Mike Galbraith
2012-07-12 14:15     ` Thomas Gleixner
2012-07-13 12:33       ` Jan Kara
2012-07-13 14:25         ` Thomas Gleixner
2012-07-13 14:46           ` Jan Kara
2012-07-15  8:59             ` Thomas Gleixner
2012-07-15  9:14               ` Mike Galbraith
2012-07-15  9:51                 ` Thomas Gleixner
2012-07-16  2:22                 ` Mike Galbraith
2012-07-16  8:59                   ` Thomas Gleixner
2012-07-16  9:48                     ` Mike Galbraith
2012-07-16  9:59                       ` Thomas Gleixner
2012-07-16 10:13                         ` Mike Galbraith
2012-07-16 10:08                       ` Mike Galbraith
2012-07-16 10:19                         ` Thomas Gleixner
2012-07-16 10:30                           ` Mike Galbraith
2012-07-16 11:24                           ` Mike Galbraith
2012-07-16 14:35                             ` Mike Galbraith
2012-07-17 13:10                           ` Mike Galbraith
2012-07-18  4:44                             ` Mike Galbraith
2012-07-18  5:30                               ` Mike Galbraith
2012-07-21  7:47                                 ` Mike Galbraith
2012-07-22 18:43                                   ` Mike Galbraith
2012-07-23  9:46                                     ` Mike Galbraith
2012-07-14 11:00           ` Mike Galbraith
2012-07-14 11:06             ` Mike Galbraith
2012-07-15  7:14             ` Mike Galbraith
