linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related?
@ 2003-10-27  6:06 Neil Brown
  2003-10-27  6:20 ` Christopher Li
  2003-10-27 10:01 ` Andrew Morton
  0 siblings, 2 replies; 5+ messages in thread
From: Neil Brown @ 2003-10-27  6:06 UTC (permalink / raw)
  To: ext3-users, linux-kernel


Hi all, and particularly Andrew and Stephen,

 I recently "upgraded' one of my NFS fileservers from (patched)2.4.18
 to 2.4.23-pre7 (in order to resolve a HIMEM related memory pressure
 problem).

 Unfortunately I have experienced what appears to be a deadlock.
 The one I will describe was experienced while running 2.4.23-pre7,
 though I had a very similar problem in 2.4.22 (but didn't collect
 enough data to be able to give a useful report at the time).

 An ext3 filesystem is mounted with data=journal, with an external
 journal, and accessed mostly via NFS.  v1 quotas are enabled.

 The stacktrace lines below are from a lightly editted 
   "echo t > /proc/sysrq-trigger"
 output.


 A substantial number of nfsd threads a waiting to "start a handle":
    nfsd Call Trace:    [sleep_on+75/124]
         [start_this_handle+205/368] [journal_start+149/196]
         [ext3_dirty_inode+116/268] [__mark_inode_dirty+50/168]
         [update_atime+75/80] [do_generic_file_read+1158/1172]
         [generic_file_read+147/400] [file_read_actor+0/224]
         [nfsd_read+477/636] [nfsd3_proc_read+295/388]
         [nfsd_dispatch+207/406] [svc_process+655/1264] [nfsd+566/944]
         [arch_kernel_thread+40/56] 

This sleep_on is the one at line 136 of
transcation.c(start_this_handle), where it is waiting for t_state to
stop being T_LOCKED. 

The related kjournald is at:
    kjournald Call Trace:    [sleep_on+75/124]
         [journal_commit_transaction+357/4044] [do_IRQ+221/236]
         [.text.lock.sched+131/471] [kjournald+326/540]
         [commit_timeout+0/12] [arch_kernel_thread+40/56] 

This sleep_on is at line 87 in commit.c (journal_commit_transaction)
where it is waiting for t_updates to be 0.  At this point, 
t_state is T_LOCKED, so presumably those nfsd threads above are
waiting on kjournald.  But what is kjournald really waiting for?
My first though was the two nfsd threads  in:
     nfsd Call Trace:    [sleep_on+75/124]
	  [log_wait_commit+74/136] [journal_stop+408/432]
	  [journal_force_commit+78/128] [ext3_force_commit+66/112]
	  [ext3_sync_file+128/144] [nfsd_sync_dir+49/72]
	  [nfsd_unlink+455/480] [nfsd_proc_remove+122/140]
	  [nfsd_dispatch+207/406] [svc_process+655/1264]
	  [nfsd+566/944] [arch_kernel_thread+40/56] 

that are waiting on j_wait_done_commit.  However they are doing that
from journal_stop *after* journal_stop has decremented t_updates, so
it doesn't seem likely that kjournald is waiting on that.

Outside of nfsd, there is an rquotad program (locally written, not the
standard one) that is :

      rquotad Call Trace:    [sleep_on+75/124]
            [start_this_handle+205/368] [journal_start+149/196]
            [ext3_dirty_inode+116/268] [__mark_inode_dirty+50/168]
            [update_atime+75/80] [do_generic_file_read+1158/1172]
            [generic_file_read+147/400] [file_read_actor+0/224]
            [do_get_write_access+1382/1420] [v1_read_dqblk+121/196]
            [read_dqblk+76/128] [dqget+344/484] [vfs_get_dqblk+21/64]
            [v1_get_dqblk+39/172] [link_path_walk+2680/2956]
            [do_compat_quotactl+417/688] [resolve_dev+89/108]
            [sys_quotactl+166/275] [system_call+51/56] 

So it is trying to start a transaction to update the atime on the
quota file, and has a lock on some quota structures thanks to
"read_dqblk".

At the same time, "sync" is running:

         sync Call Trace:    [__down+109/208] [__down_failed+8/12]
             [.text.lock.dquot+73/286] [ext3_sync_dquot+337/462]
             [vfs_quota_sync+102/372] [sync_dquots_dev+194/260]
             [fsync_dev+66/128] [sys_sync+7/16] [system_call+51/56] 

and has started an ext3 transaction (in ext3_sync_dquot) and is trying
to get the lock that rquotad has.

Presumably the transaction that sync has started is keeping t_updates
greater than 0, thus preventing kjournald from progressing, and this
preventing anyone else, including rquotad, from starting a new
transaction.  Hence a deadlock.

The incomplete trace that I have from the first time this problem
happened also has sync blocks in the same place, and an nfsd thread,
rather than rquotad, holding a lock in read_dqblk.

I note that ext3_sync_dquot didn't exist in 2.4.18 and does now, so
that correlated well with the observation that this problem has been
introduced since 2.4.18.

If my analysis is correct, I can largely avoid the problem by not
running "sync" as much (I currently run it every 5 minutes which isn't
needed any longer (it's a long story:-)) but that doesn't fix the bug.

My guess is that ext3_sync_dquot doesn't need  ext3_journal_start at
all but that isn't a well-informed guess.

Help appreciated.

NeilBrown

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related?
  2003-10-27  6:06 EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related? Neil Brown
@ 2003-10-27  6:20 ` Christopher Li
  2003-10-27 10:01 ` Andrew Morton
  1 sibling, 0 replies; 5+ messages in thread
From: Christopher Li @ 2003-10-27  6:20 UTC (permalink / raw)
  To: Neil Brown; +Cc: ext3-users, linux-kernel

On Mon, Oct 27, 2003 at 05:06:16PM +1100, Neil Brown wrote:
> 
> The related kjournald is at:
>     kjournald Call Trace:    [sleep_on+75/124]
>          [journal_commit_transaction+357/4044] [do_IRQ+221/236]
>          [.text.lock.sched+131/471] [kjournald+326/540]
>          [commit_timeout+0/12] [arch_kernel_thread+40/56] 
> 
> This sleep_on is at line 87 in commit.c (journal_commit_transaction)
> where it is waiting for t_updates to be 0.  At this point, 
> t_state is T_LOCKED, so presumably those nfsd threads above are
> waiting on kjournald.  But what is kjournald really waiting for?

kjournald is wait for the current pending transaction to stop.

> My first though was the two nfsd threads  in:
>      nfsd Call Trace:    [sleep_on+75/124]
> 	  [log_wait_commit+74/136] [journal_stop+408/432]
> 	  [journal_force_commit+78/128] [ext3_force_commit+66/112]
> 	  [ext3_sync_file+128/144] [nfsd_sync_dir+49/72]
> 	  [nfsd_unlink+455/480] [nfsd_proc_remove+122/140]
> 	  [nfsd_dispatch+207/406] [svc_process+655/1264]
> 	  [nfsd+566/944] [arch_kernel_thread+40/56] 
> 
> that are waiting on j_wait_done_commit.  However they are doing that
> from journal_stop *after* journal_stop has decremented t_updates, so
> it doesn't seem likely that kjournald is waiting on that.

That is right.

> 
> Outside of nfsd, there is an rquotad program (locally written, not the
> standard one) that is :
> 
>       rquotad Call Trace:    [sleep_on+75/124]
>             [start_this_handle+205/368] [journal_start+149/196]
>             [ext3_dirty_inode+116/268] [__mark_inode_dirty+50/168]
>             [update_atime+75/80] [do_generic_file_read+1158/1172]
>             [generic_file_read+147/400] [file_read_actor+0/224]
>             [do_get_write_access+1382/1420] [v1_read_dqblk+121/196]
>             [read_dqblk+76/128] [dqget+344/484] [vfs_get_dqblk+21/64]
>             [v1_get_dqblk+39/172] [link_path_walk+2680/2956]
>             [do_compat_quotactl+417/688] [resolve_dev+89/108]
>             [sys_quotactl+166/275] [system_call+51/56] 
> 
> So it is trying to start a transaction to update the atime on the
> quota file, and has a lock on some quota structures thanks to
> "read_dqblk".

This guy is waiting the journal commit to be finished, seems harmless
to me.

> 
> At the same time, "sync" is running:
> 
>          sync Call Trace:    [__down+109/208] [__down_failed+8/12]
>              [.text.lock.dquot+73/286] [ext3_sync_dquot+337/462]
>              [vfs_quota_sync+102/372] [sync_dquots_dev+194/260]
>              [fsync_dev+66/128] [sys_sync+7/16] [system_call+51/56] 
> 
> and has started an ext3 transaction (in ext3_sync_dquot) and is trying
> to get the lock that rquotad has.

That seems wrong to me. It should get the lock before it start the
transasction. For the same reason that you can't lock_page inside
journal transasction, it is a ranking error. BTW, it seems that
current bk tree, truncate still do lock_page inside journal
transasction.

> 
> Presumably the transaction that sync has started is keeping t_updates
> greater than 0, thus preventing kjournald from progressing, and this
> preventing anyone else, including rquotad, from starting a new
> transaction.  Hence a deadlock.

That is right.

> 
> My guess is that ext3_sync_dquot doesn't need  ext3_journal_start at
> all but that isn't a well-informed guess.

I think you want to put ext3_sync_dquot to be atomic on power failure.  
The journal handle can get from ext3_current_journal_handle, which
used by writepage etc.

Chris

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related?
  2003-10-27  6:06 EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related? Neil Brown
  2003-10-27  6:20 ` Christopher Li
@ 2003-10-27 10:01 ` Andrew Morton
  2003-10-29  0:02   ` Jan Kara
  2003-11-10 11:11   ` Jan Kara
  1 sibling, 2 replies; 5+ messages in thread
From: Andrew Morton @ 2003-10-27 10:01 UTC (permalink / raw)
  To: Neil Brown; +Cc: ext3-users, linux-kernel, Jan Kara

Neil Brown <neilb@cse.unsw.edu.au> wrote:
>
> 
> Hi all, and particularly Andrew and Stephen,
> 
>  I recently "upgraded' one of my NFS fileservers from (patched)2.4.18
>  to 2.4.23-pre7 (in order to resolve a HIMEM related memory pressure
>  problem).
> 
>  Unfortunately I have experienced what appears to be a deadlock.
>

I don't recall a time when quotas on ext3 were not deadlocky :(

> ...
> 
>       rquotad Call Trace:    [sleep_on+75/124]
>             [start_this_handle+205/368] [journal_start+149/196]
>             [ext3_dirty_inode+116/268] [__mark_inode_dirty+50/168]
>             [update_atime+75/80] [do_generic_file_read+1158/1172]
>             [generic_file_read+147/400] [file_read_actor+0/224]
>             [do_get_write_access+1382/1420] [v1_read_dqblk+121/196]
>             [read_dqblk+76/128] [dqget+344/484] [vfs_get_dqblk+21/64]
>             [v1_get_dqblk+39/172] [link_path_walk+2680/2956]
>             [do_compat_quotactl+417/688] [resolve_dev+89/108]
>             [sys_quotactl+166/275] [system_call+51/56] 

read_dqblk() took dqio_sem, then ext3_dirty_inode() did journal_start().

> At the same time, "sync" is running:
> 
>          sync Call Trace:    [__down+109/208] [__down_failed+8/12]
>              [.text.lock.dquot+73/286] [ext3_sync_dquot+337/462]
>              [vfs_quota_sync+102/372] [sync_dquots_dev+194/260]
>              [fsync_dev+66/128] [sys_sync+7/16] [system_call+51/56] 

ext3_sync_dquot() did journal_start(), then someone (commit_dqblk?) tried
to take dqio_sem.

Probably it is not too hard to flip the ordering in the latter case, but all
other code paths need to be checked, and perhaps even documented.

Jan, didn't we fix this very problem in 2.6 recently?



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related?
  2003-10-27 10:01 ` Andrew Morton
@ 2003-10-29  0:02   ` Jan Kara
  2003-11-10 11:11   ` Jan Kara
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Kara @ 2003-10-29  0:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Neil Brown, ext3-users, linux-kernel, Jan Kara

> Neil Brown <neilb@cse.unsw.edu.au> wrote:
> >
> > 
> > Hi all, and particularly Andrew and Stephen,
> > 
> >  I recently "upgraded' one of my NFS fileservers from (patched)2.4.18
> >  to 2.4.23-pre7 (in order to resolve a HIMEM related memory pressure
> >  problem).
> > 
> >  Unfortunately I have experienced what appears to be a deadlock.
> >
> 
> I don't recall a time when quotas on ext3 were not deadlocky :(
  Sadly yes... The problems with journal_begin/end are constant..

> > ...
> > 
> >       rquotad Call Trace:    [sleep_on+75/124]
> >             [start_this_handle+205/368] [journal_start+149/196]
> >             [ext3_dirty_inode+116/268] [__mark_inode_dirty+50/168]
> >             [update_atime+75/80] [do_generic_file_read+1158/1172]
> >             [generic_file_read+147/400] [file_read_actor+0/224]
> >             [do_get_write_access+1382/1420] [v1_read_dqblk+121/196]
> >             [read_dqblk+76/128] [dqget+344/484] [vfs_get_dqblk+21/64]
> >             [v1_get_dqblk+39/172] [link_path_walk+2680/2956]
> >             [do_compat_quotactl+417/688] [resolve_dev+89/108]
> >             [sys_quotactl+166/275] [system_call+51/56] 
> 
> read_dqblk() took dqio_sem, then ext3_dirty_inode() did journal_start().
> 
> > At the same time, "sync" is running:
> > 
> >          sync Call Trace:    [__down+109/208] [__down_failed+8/12]
> >              [.text.lock.dquot+73/286] [ext3_sync_dquot+337/462]
> >              [vfs_quota_sync+102/372] [sync_dquots_dev+194/260]
> >              [fsync_dev+66/128] [sys_sync+7/16] [system_call+51/56] 
> 
> ext3_sync_dquot() did journal_start(), then someone (commit_dqblk?) tried
> to take dqio_sem.
> 
> Probably it is not too hard to flip the ordering in the latter case, but all
> other code paths need to be checked, and perhaps even documented.
  Actually I think we need to flip the first part - ext3_sync_dquot() is
there exactly because we needed to handle the sync() case separately..

> Jan, didn't we fix this very problem in 2.6 recently?
  If I remember right last 2.6 problem was a bit different - the problem
was with getting some locks when handling quota files which should not
be acquired...

							Honza

--
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related?
  2003-10-27 10:01 ` Andrew Morton
  2003-10-29  0:02   ` Jan Kara
@ 2003-11-10 11:11   ` Jan Kara
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Kara @ 2003-11-10 11:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Neil Brown, ext3-users, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2403 bytes --]

  Hi,

  attached patch for 2.4.22 (should apply to -pre kernels also) should
fix the problem... The fix contains two parts - first it disables ATIME
on quota files so we know that reading file doesn't cause a write to
filesystem (can I rely on this?) and second it ensures that whenever we
write some quota structure the transaction is already started and hence
we should not deadlock. I was thinking about the problem and I didn't
find better solution. Please test the patch (I gave it reasonable
testing under UML but anyway).

								Honza

> Neil Brown <neilb@cse.unsw.edu.au> wrote:
> >
> > 
> > Hi all, and particularly Andrew and Stephen,
> > 
> >  I recently "upgraded' one of my NFS fileservers from (patched)2.4.18
> >  to 2.4.23-pre7 (in order to resolve a HIMEM related memory pressure
> >  problem).
> > 
> >  Unfortunately I have experienced what appears to be a deadlock.
> >
> 
> I don't recall a time when quotas on ext3 were not deadlocky :(
> 
> > ...
> > 
> >       rquotad Call Trace:    [sleep_on+75/124]
> >             [start_this_handle+205/368] [journal_start+149/196]
> >             [ext3_dirty_inode+116/268] [__mark_inode_dirty+50/168]
> >             [update_atime+75/80] [do_generic_file_read+1158/1172]
> >             [generic_file_read+147/400] [file_read_actor+0/224]
> >             [do_get_write_access+1382/1420] [v1_read_dqblk+121/196]
> >             [read_dqblk+76/128] [dqget+344/484] [vfs_get_dqblk+21/64]
> >             [v1_get_dqblk+39/172] [link_path_walk+2680/2956]
> >             [do_compat_quotactl+417/688] [resolve_dev+89/108]
> >             [sys_quotactl+166/275] [system_call+51/56] 
> 
> read_dqblk() took dqio_sem, then ext3_dirty_inode() did journal_start().
> 
> > At the same time, "sync" is running:
> > 
> >          sync Call Trace:    [__down+109/208] [__down_failed+8/12]
> >              [.text.lock.dquot+73/286] [ext3_sync_dquot+337/462]
> >              [vfs_quota_sync+102/372] [sync_dquots_dev+194/260]
> >              [fsync_dev+66/128] [sys_sync+7/16] [system_call+51/56] 
> 
> ext3_sync_dquot() did journal_start(), then someone (commit_dqblk?) tried
> to take dqio_sem.
> 
> Probably it is not too hard to flip the ordering in the latter case, but all
> other code paths need to be checked, and perhaps even documented.
> 
> Jan, didn't we fix this very problem in 2.6 recently?
> 
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

[-- Attachment #2: quota-2.4.22-deadlockfix.diff --]
[-- Type: text/plain, Size: 3219 bytes --]

diff -ruX ../kerndiffexclude linux-2.4.22-fixstat/fs/dquot.c linux-2.4.22-deadlock/fs/dquot.c
--- linux-2.4.22-fixstat/fs/dquot.c	Wed Nov  5 21:21:58 2003
+++ linux-2.4.22-deadlock/fs/dquot.c	Mon Nov 10 11:48:47 2003
@@ -396,7 +396,7 @@
 		if (dquot->dq_flags & DQ_LOCKED)
 			wait_on_dquot(dquot);
 		if (dquot_dirty(dquot))
-			sb->dq_op->sync_dquot(dquot);
+			sb->dq_op->write_dquot(dquot);
 		dqput(dquot);
 		goto restart;
 	}
@@ -543,7 +543,7 @@
 		return;
 	}
 	if (dquot_dirty(dquot)) {
-		commit_dqblk(dquot);
+		dquot->dq_sb->dq_op->write_dquot(dquot);
 		goto we_slept;
 	}
 
@@ -1219,7 +1219,7 @@
 	free_space:	dquot_free_space,
 	free_inode:	dquot_free_inode,
 	transfer:	dquot_transfer,
-	sync_dquot:	commit_dqblk
+	write_dquot:	commit_dqblk
 };
 
 /* Function used by filesystems for initializing the dquot_operations structure */
@@ -1331,9 +1331,9 @@
 	error = -EINVAL;
 	if (!fmt->qf_ops->check_quota_file(sb, type))
 		goto out_f;
-	/* We don't want quota on quota files */
+	/* We don't want quota and atime on quota files */
 	dquot_drop(inode);
-	inode->i_flags |= S_NOQUOTA;
+	inode->i_flags |= S_NOQUOTA | S_NOATIME;
 
 	dqopt->ops[type] = fmt->qf_ops;
 	dqopt->info[type].dqi_format = fmt;
diff -ruX ../kerndiffexclude linux-2.4.22-fixstat/fs/ext3/super.c linux-2.4.22-deadlock/fs/ext3/super.c
--- linux-2.4.22-fixstat/fs/ext3/super.c	Mon Aug 25 13:44:43 2003
+++ linux-2.4.22-deadlock/fs/ext3/super.c	Mon Nov 10 11:43:32 2003
@@ -449,7 +449,7 @@
 }
 
 static struct dquot_operations ext3_qops;
-static int (*old_sync_dquot)(struct dquot *dquot);
+static int (*old_write_dquot)(struct dquot *dquot);
 
 static struct super_operations ext3_sops = {
 	read_inode:	ext3_read_inode,	/* BKL held */
@@ -1779,7 +1779,7 @@
 /* Blocks: quota info + (4 pointer blocks + 1 entry block) * (3 indirect + 1 descriptor + 1 bitmap) + superblock */
 #define EXT3_V0_QFMT_BLOCKS 27
 
-static int ext3_sync_dquot(struct dquot *dquot)
+static int ext3_write_dquot(struct dquot *dquot)
 {
 	int nblocks, ret;
 	handle_t *handle;
@@ -1804,7 +1804,7 @@
 		return PTR_ERR(handle);
 	}
 	unlock_kernel();
-	ret = old_sync_dquot(dquot);
+	ret = old_write_dquot(dquot);
 	lock_kernel();
 	ret = ext3_journal_stop(handle, qinode);
 	unlock_kernel();
@@ -1818,8 +1818,8 @@
 {
 #ifdef CONFIG_QUOTA
 	init_dquot_operations(&ext3_qops);
-	old_sync_dquot = ext3_qops.sync_dquot;
-	ext3_qops.sync_dquot = ext3_sync_dquot;
+	old_write_dquot = ext3_qops.write_dquot;
+	ext3_qops.write_dquot = ext3_write_dquot;
 #endif
         return register_filesystem(&ext3_fs_type);
 }
diff -ruX ../kerndiffexclude linux-2.4.22-fixstat/include/linux/quota.h linux-2.4.22-deadlock/include/linux/quota.h
--- linux-2.4.22-fixstat/include/linux/quota.h	Wed Nov  5 21:27:44 2003
+++ linux-2.4.22-deadlock/include/linux/quota.h	Mon Nov 10 11:40:06 2003
@@ -249,7 +249,7 @@
 	void (*free_space) (struct inode *, qsize_t);
 	void (*free_inode) (const struct inode *, unsigned long);
 	int (*transfer) (struct inode *, struct iattr *);
-	int (*sync_dquot) (struct dquot *);
+	int (*write_dquot) (struct dquot *);
 };
 
 /* Operations handling requests from userspace */
Binary files linux-2.4.22-fixstat/linux and linux-2.4.22-deadlock/linux differ

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2003-11-10 11:11 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-10-27  6:06 EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related? Neil Brown
2003-10-27  6:20 ` Christopher Li
2003-10-27 10:01 ` Andrew Morton
2003-10-29  0:02   ` Jan Kara
2003-11-10 11:11   ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).