linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christopher Li <ext3-user@chrisli.org>
To: Neil Brown <neilb@cse.unsw.edu.au>
Cc: ext3-users@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related?
Date: Mon, 27 Oct 2003 01:20:02 -0500	[thread overview]
Message-ID: <20031027062002.GA3544@64m.dyndns.org> (raw)
In-Reply-To: <16284.46552.776018.358472@notabene.cse.unsw.edu.au>

On Mon, Oct 27, 2003 at 05:06:16PM +1100, Neil Brown wrote:
> 
> The related kjournald is at:
>     kjournald Call Trace:    [sleep_on+75/124]
>          [journal_commit_transaction+357/4044] [do_IRQ+221/236]
>          [.text.lock.sched+131/471] [kjournald+326/540]
>          [commit_timeout+0/12] [arch_kernel_thread+40/56] 
> 
> This sleep_on is at line 87 in commit.c (journal_commit_transaction)
> where it is waiting for t_updates to be 0.  At this point, 
> t_state is T_LOCKED, so presumably those nfsd threads above are
> waiting on kjournald.  But what is kjournald really waiting for?

kjournald is wait for the current pending transaction to stop.

> My first though was the two nfsd threads  in:
>      nfsd Call Trace:    [sleep_on+75/124]
> 	  [log_wait_commit+74/136] [journal_stop+408/432]
> 	  [journal_force_commit+78/128] [ext3_force_commit+66/112]
> 	  [ext3_sync_file+128/144] [nfsd_sync_dir+49/72]
> 	  [nfsd_unlink+455/480] [nfsd_proc_remove+122/140]
> 	  [nfsd_dispatch+207/406] [svc_process+655/1264]
> 	  [nfsd+566/944] [arch_kernel_thread+40/56] 
> 
> that are waiting on j_wait_done_commit.  However they are doing that
> from journal_stop *after* journal_stop has decremented t_updates, so
> it doesn't seem likely that kjournald is waiting on that.

That is right.

> 
> Outside of nfsd, there is an rquotad program (locally written, not the
> standard one) that is :
> 
>       rquotad Call Trace:    [sleep_on+75/124]
>             [start_this_handle+205/368] [journal_start+149/196]
>             [ext3_dirty_inode+116/268] [__mark_inode_dirty+50/168]
>             [update_atime+75/80] [do_generic_file_read+1158/1172]
>             [generic_file_read+147/400] [file_read_actor+0/224]
>             [do_get_write_access+1382/1420] [v1_read_dqblk+121/196]
>             [read_dqblk+76/128] [dqget+344/484] [vfs_get_dqblk+21/64]
>             [v1_get_dqblk+39/172] [link_path_walk+2680/2956]
>             [do_compat_quotactl+417/688] [resolve_dev+89/108]
>             [sys_quotactl+166/275] [system_call+51/56] 
> 
> So it is trying to start a transaction to update the atime on the
> quota file, and has a lock on some quota structures thanks to
> "read_dqblk".

This guy is waiting the journal commit to be finished, seems harmless
to me.

> 
> At the same time, "sync" is running:
> 
>          sync Call Trace:    [__down+109/208] [__down_failed+8/12]
>              [.text.lock.dquot+73/286] [ext3_sync_dquot+337/462]
>              [vfs_quota_sync+102/372] [sync_dquots_dev+194/260]
>              [fsync_dev+66/128] [sys_sync+7/16] [system_call+51/56] 
> 
> and has started an ext3 transaction (in ext3_sync_dquot) and is trying
> to get the lock that rquotad has.

That seems wrong to me. It should get the lock before it start the
transasction. For the same reason that you can't lock_page inside
journal transasction, it is a ranking error. BTW, it seems that
current bk tree, truncate still do lock_page inside journal
transasction.

> 
> Presumably the transaction that sync has started is keeping t_updates
> greater than 0, thus preventing kjournald from progressing, and this
> preventing anyone else, including rquotad, from starting a new
> transaction.  Hence a deadlock.

That is right.

> 
> My guess is that ext3_sync_dquot doesn't need  ext3_journal_start at
> all but that isn't a well-informed guess.

I think you want to put ext3_sync_dquot to be atomic on power failure.  
The journal handle can get from ext3_current_journal_handle, which
used by writepage etc.

Chris

  reply	other threads:[~2003-10-27  9:22 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-10-27  6:06 EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related? Neil Brown
2003-10-27  6:20 ` Christopher Li [this message]
2003-10-27 10:01 ` Andrew Morton
2003-10-29  0:02   ` Jan Kara
2003-11-10 11:11   ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20031027062002.GA3544@64m.dyndns.org \
    --to=ext3-user@chrisli.org \
    --cc=ext3-users@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@cse.unsw.edu.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).