All of lore.kernel.org
 help / color / mirror / Atom feed
* Debug of xfstest 234 hang
@ 2011-11-10 21:16 Chandra Seetharaman
  2011-11-10 22:19 ` Dave Chinner
  0 siblings, 1 reply; 3+ messages in thread
From: Chandra Seetharaman @ 2011-11-10 21:16 UTC (permalink / raw)
  To: Dave Chinner, Alex Elder; +Cc: XFS Mailing List

Hi Dave, Alex,

Debugging using trace, crash and systemtap, I found that the hang
happens when xfs_sync_worker() (thru kworker) gets stuck in xlog_wait()
while reserving a transaction log buffer for the dummy log.

I also found that even though xfsaild_push() keeps getting invoked, it
doesn't do anything to push the log to the disk, since the
ailp->xa_target has not been changed since it has been called from the
process stack a while back.

So, I thought, resetting the target to the max value would help nudge
the flow of ail to the disk. So, I added the following code. 
------------------
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index ed9252b..f59fd9f 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -534,6 +534,10 @@ out_done:
 		ailp->xa_last_pushed_lsn = 0;
 	}
 
+	lsn = xfs_ail_max_lsn(ailp);
+	smp_wmb();
+	xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &lsn);
+	smp_wmb();
 	return tout;
 }
--------------------

and it seem to do the magic.

With this change, test 234 runs fine.

Is this a good fix, bad fix, overkill... ?

Please let me know.

regards

chandra
 


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: Debug of xfstest 234 hang
  2011-11-10 21:16 Debug of xfstest 234 hang Chandra Seetharaman
@ 2011-11-10 22:19 ` Dave Chinner
  2011-11-18 15:39   ` Chandra Seetharaman
  0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2011-11-10 22:19 UTC (permalink / raw)
  To: Chandra Seetharaman; +Cc: XFS Mailing List, Alex Elder

On Thu, Nov 10, 2011 at 03:16:33PM -0600, Chandra Seetharaman wrote:
> Hi Dave, Alex,
> 
> Debugging using trace, crash and systemtap, I found that the hang
> happens when xfs_sync_worker() (thru kworker) gets stuck in xlog_wait()
> while reserving a transaction log buffer for the dummy log.

Yes, that much I already knew from stack traces posted.

> I also found that even though xfsaild_push() keeps getting invoked, it
> doesn't do anything to push the log to the disk, since the
> ailp->xa_target has not been changed since it has been called from the
> process stack a while back.

Which is normal behaviour - if the log head/tail have't moved, then the
calculated push target (25 % of log space free) will not change
during the transaction reservation.

> So, I thought, resetting the target to the max value would help nudge
> the flow of ail to the disk. So, I added the following code. 
> ------------------
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index ed9252b..f59fd9f 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -534,6 +534,10 @@ out_done:
>  		ailp->xa_last_pushed_lsn = 0;
>  	}
>  
> +	lsn = xfs_ail_max_lsn(ailp);
> +	smp_wmb();
> +	xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &lsn);
> +	smp_wmb();
>  	return tout;
>  }
> --------------------
> 
> and it seem to do the magic.

which means anyone who pushes on the AIL *always empties it*. That's
extremely bad from a performance persepctive because it means we no
longer have any concept of delaying metadata writeback until it is
really necessary.

Basically all this does is prevent the log space usage from hanging
around the threshold point that causes the hang. i.e. instead of
keeping the log roughly 25% free, this change means we get to 75%
full then try to empty it completely. The hang is occurring due to
some as yet unknown flaw in the threshold logic, so all your change
does is make it harder to hit because the filesysetm spends less
time at the threshold.

A better way to acheive this woulxp dbe to push the AIL in
xfs_sync_worker() before trying to cover the log. i.e. change the
code to do this:

 
 	if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
+		/* start pushing all the metadata that is currently dirty */
+		xfs_ail_push_all(mp->m_ail);
+
 		/* dgc: errors ignored here */
 		if (mp->m_super->s_frozen == SB_UNFROZEN &&
 		    xfs_log_need_covered(mp))
 			error = xfs_fs_log_dummy(mp);
 		else
 			xfs_log_force(mp, 0);
 		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
-
-		/* start pushing all the metadata that is currently dirty */
-		xfs_ail_push_all(mp->m_ail);
 	}
 
 	/* queue us up again */

This will guarantee the push target is moved before the sync worker
tries to do a transaction reservation and potentially hangs waiting
for space.

However, this is still just a workaround for whatever threshold
value problem we have that is the root cause of the problem. Hence
what I'd really like to know is where the grant head/log tail values
are at this point when the hang occurs (the log event tracing has
all the necessary values) so we can get an idea of the condition
that we are stuck in. Being able to view the progression of log
reservations and releases leading up to the hang (once again from
the event trace) might point out exactly what condition we are
getting stuck in, and that's what we nee dto understand first before
trying to fix the problem.

The log reserve tracepoints give output like this:

           <...>-8903  [002] 241733.544531: xfs_log_reserve:      dev 253:16 type QM_SETQLIM t_ocnt 1 t_cnt 1 t_curr_res 2332 t_unit_res 2332 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 1 grant_reserve_bytes 105116 grant_write_cycle 1 grant_write_bytes 105116 curr_cycle 1 curr_block 23 tail_cycle 1 tail_block 2
           <...>-8903  [002] 241733.544531: xfs_log_grant_enter:  dev 253:16 type QM_SETQLIM t_ocnt 1 t_cnt 1 t_curr_res 2332 t_unit_res 2332 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 1 grant_reserve_bytes 105116 grant_write_cycle 1 grant_write_bytes 105116 curr_cycle 1 curr_block 23 tail_cycle 1 tail_block 2
           <...>-8903  [002] 241733.544531: xfs_log_grant_exit:   dev 253:16 type QM_SETQLIM t_ocnt 1 t_cnt 1 t_curr_res 2332 t_unit_res 2332 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 1 grant_reserve_bytes 107448 grant_write_cycle 1 grant_write_bytes 107448 curr_cycle 1 curr_block 23 tail_cycle 1 tail_block 2

which show all the information about the grant/write head locations,
the current tail location, the transaction type and reservation,
etc. When the transaction is committed we then see:


           <...>-8894  [002] 241733.544536: xfs_log_done_nonperm: dev 253:16 type QM_SETQLIM t_ocnt 1 t_cnt 1 t_curr_res 2180 t_unit_res 2332 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 1 grant_reserve_bytes 107448 grant_write_cycle 1 grant_write_bytes 107448 curr_cycle 1 curr_block 23 tail_cycle 1 tail_block 2
           <...>-8894  [002] 241733.544536: xfs_log_ungrant_enter: dev 253:16 type QM_SETQLIM t_ocnt 1 t_cnt 0 t_curr_res 2180 t_unit_res 2332 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 1 grant_reserve_bytes 107448 grant_write_cycle 1 grant_write_bytes 107448 curr_cycle 1 curr_block 23 tail_cycle 1 tail_block 2
           <...>-8894  [002] 241733.544536: xfs_log_ungrant_sub:  dev 253:16 type QM_SETQLIM t_ocnt 1 t_cnt 0 t_curr_res 2180 t_unit_res 2332 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 1 grant_reserve_bytes 107448 grant_write_cycle 1 grant_write_bytes 107448 curr_cycle 1 curr_block 23 tail_cycle 1 tail_block 2
           <...>-8894  [002] 241733.544537: xfs_log_ungrant_exit: dev 253:16 type QM_SETQLIM t_ocnt 1 t_cnt 0 t_curr_res 2180 t_unit_res 2332 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 1 grant_reserve_bytes 105268 grant_write_cycle 1 grant_write_bytes 105268 curr_cycle 1 curr_block 23 tail_cycle 1 tail_block 2

Where the remaining reservation is returned to the log.

>From these traces, we can see how much space is being used by each
transaction (t_unit_res - t_curr_res), whether the grant head/write
head change apprpropriately, whether there are queued waiters, where
the tail is, where the last iclog was written to, etc. By looking at
the various values, the threshold problem might become obvious.  I
tend to write quick throw-away awk scripts to extract the necessary
fields into a plottable formats and run those through something like
gnuplot or kst to see if there is any visible artifacts that might
be the cause of the problem...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Debug of xfstest 234 hang
  2011-11-10 22:19 ` Dave Chinner
@ 2011-11-18 15:39   ` Chandra Seetharaman
  0 siblings, 0 replies; 3+ messages in thread
From: Chandra Seetharaman @ 2011-11-18 15:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: XFS Mailing List, Alex Elder

Hi Dave,

I found the problem that is causing this issue. The logic around the
threshold calculation works as expected.

I saw the problem even when there is lot of space left and
xlog_grant_push_ail() returns with free space available.

The problem is in the way the l_reserveq and l_writeq are handled.

When we wake the processes that are sleeping on l_reserveq and l_writeq
thru wake_up(), we do not remove them from the queue, we expect the
process to remove themselves from the list (and we drop the lock). But,
before the woken up process gets a chance to remove itself, some other
process p1 comes in, checks that the queue is not empty and puts itself
at the end of the queue. All the woken up processes remove themselves
from the queue and move on. Whereas, the process p1 just gets stuck in
the queue. Any new process that comes in gets back at the end of the
queue and all of them gets stuck.

The problem doesn't happen if there is lot of activities, which makes
some process calls xfs_log_move_tail() or push the ail (thru
xlog_grant_push_ail()). But, with no activity, all these processes are
never woken up.

IMO, the right solution is to remove the item from the list when we wake
them up. I tried the change and it works as expected.

Will send the patch to the list shortly.

Regards,

Chandra


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2011-11-18 15:40 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-10 21:16 Debug of xfstest 234 hang Chandra Seetharaman
2011-11-10 22:19 ` Dave Chinner
2011-11-18 15:39   ` Chandra Seetharaman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.