Problem in log_do_checkpoint()?

* Problem in log_do_checkpoint()?
@ 2005-04-04  9:04 Jan Kara
  2005-04-08 14:06 ` Stephen C. Tweedie
  2005-04-08 17:14 ` Badari Pulavarty
  0 siblings, 2 replies; 7+ messages in thread
From: Jan Kara @ 2005-04-04  9:04 UTC (permalink / raw)
  To: linux-kernel; +Cc: sct, jeffm

  Hello,

  I've been looking through the JBD code when trying to understand the
assertion failure in log_do_checkpoint() (it was on old SUSE 2.6.5 kernel
though the reporter claims to be able to get the failure even with the
Stephen's patch fixing a race with journal_put_journal_head()) and I've
spotted one place where I think could be a race (the code around there
seems to be the same in latest kernels):
  In log_do_checkpoint() we go through the t_checkpoint_list of a
transaction and call __flush_buffer() on each buffer. Suppose there is
just one buffer on the list and it is dirty. __flush_buffer() sees it and
puts it to an array of buffers for flushing. Then the loop finishes,
retry=0, drop_count=0, batch_count=1. So __flush_batch() is called - we
drop all locks and sleep. While we are sleeping somebody else comes and
makes the buffer dirty again (OK, that is not probable, but I think it
could be possible). Now we wake up and call __cleanup_transaction().
It's not able to do anything and returns 0. And we fail on the assertion
J_ASSERT(drop_count != 0 || cleanup_ret != 0).
  Am I missing something? In my opinion we should set retry=1 after we
call __flush_batch() even if we call it outside of the "__flush_buffer-loop"...

								Honza

-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread