All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] xfs: fixes for 3.13-rc4
@ 2013-12-12  5:34 Dave Chinner
  2013-12-12  5:34 ` [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error Dave Chinner
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Dave Chinner @ 2013-12-12  5:34 UTC (permalink / raw)
  To: xfs

Hi folks,

Theses are all the kernel fixes I've accumulated over the past
few weeks. There are a couple of IO error handling fixes, log grant
head accounting fixes to avod spurious warnings, a swalloc alignment
fix, and a patch to fix all the leaks in
xlog_recover_process_data().

Cheers,

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error.
  2013-12-12  5:34 [PATCH 0/6] xfs: fixes for 3.13-rc4 Dave Chinner
@ 2013-12-12  5:34 ` Dave Chinner
  2013-12-12  9:30   ` Jeff Liu
                     ` (2 more replies)
  2013-12-12  5:34 ` [PATCH 2/6] xfs: prevent spurious "head behind tail" warnings Dave Chinner
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 26+ messages in thread
From: Dave Chinner @ 2013-12-12  5:34 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

fsstress failed during a shutdown with the following assert:

XFS: Assertion failed: xfs_buf_islocked(bp), file: fs/xfs/xfs_buf.c, line: 143
.....
 xfs_buf_stale+0x3f/0xf0
 xfs_bioerror_relse+0x2d/0x90
 xfsbdstrat+0x51/0xa0
 xfs_zero_remaining_bytes+0x1d1/0x2d0
 xfs_free_file_space+0x5d0/0x600
 xfs_change_file_space+0x251/0x3a0
 xfs_ioc_space+0xcc/0x130
.....

xfs_zero_remaining_bytes() works with uncached buffers, and hence if
we are preventing IO due to a shutdown, we should not be marking it
stale as that is only for cached buffers. Instead, just mark it with
an error and make sure it gets to the caller.

[ 7732.193441] XFS: Assertion failed: xfs_buf_islocked(bp), file: fs/xfs/xfs_buf.c, line: 96
[ 7732.195036] ------------[ cut here ]------------
[ 7732.195890] kernel BUG at fs/xfs/xfs_message.c:107!
[ 7732.196018] invalid opcode: 0000 [#1] SMP
[ 7732.196018] Modules linked in:
[ 7732.196018] CPU: 0 PID: 2899 Comm: fsstress Not tainted 3.12.0-rc7-dgc+ #47
[ 7732.196018] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 7732.196018] task: ffff88003a83ada0 ti: ffff88002c0ae000 task.ti: ffff88002c0ae000
[ 7732.196018] RIP: 0010:[<ffffffff81488462>]  [<ffffffff81488462>] assfail+0x22/0x30
[ 7732.196018] RSP: 0000:ffff88002c0afae8  EFLAGS: 00010292
[ 7732.196018] RAX: 000000000000004d RBX: ffff880002e59600 RCX: 0000000000000000
[ 7732.196018] RDX: ffff88003fc0ed68 RSI: ffff88003fc0d3f8 RDI: 0000000000000246
[ 7732.196018] RBP: ffff88002c0afae8 R08: 0000000000000096 R09: 00000000000012ec
[ 7732.196018] R10: 0000000000000000 R11: 00000000000012eb R12: 0000000000100002
[ 7732.196018] R13: ffffffff81473523 R14: 0000000000009fff R15: 0000000000009fff
[ 7732.196018] FS:  00007f8017e4a700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 7732.196018] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7732.196018] CR2: 00007ff8d020a000 CR3: 0000000009b43000 CR4: 00000000000006f0
[ 7732.196018] Stack:
[ 7732.196018]  ffff88002c0afb08 ffffffff8147476e ffff880002e59600 0000000000100002
[ 7732.196018]  ffff88002c0afb28 ffffffff814762cd ffff880002e59600 ffff880002e59600
[ 7732.196018]  ffff88002c0afb58 ffffffff81476841 0000000000009f89 ffff880002e59600
[ 7732.196018] Call Trace:
[ 7732.196018]  [<ffffffff8147476e>] xfs_buf_stale+0x2e/0xb0
[ 7732.196018]  [<ffffffff814762cd>] xfs_bioerror_relse+0x2d/0x90
[ 7732.196018]  [<ffffffff81476841>] xfsbdstrat+0x51/0x90
[ 7732.196018]  [<ffffffff81473523>] xfs_zero_remaining_bytes+0x1d3/0x2d0
[ 7732.196018]  [<ffffffff81473ba6>] xfs_free_file_space+0x586/0x5b0
[ 7732.196018]  [<ffffffff81190002>] ? slabs_cpu_partial_show+0xd2/0x120
[ 7732.196018]  [<ffffffff811b9399>] ? mntput_no_expire+0x49/0x160
[ 7732.196018]  [<ffffffff811b8812>] ? mnt_clone_write+0x12/0x30
[ 7732.196018]  [<ffffffff81aba3f6>] ? down_write+0x16/0x40
[ 7732.196018]  [<ffffffff81481c14>] xfs_ioc_space+0x2d4/0x450
[ 7732.196018]  [<ffffffff811a700b>] ? path_lookupat+0x6b/0x760
[ 7732.196018]  [<ffffffff8148f57e>] ? xfs_trans_free+0x6e/0x80
[ 7732.196018]  [<ffffffff81192a81>] ? kmem_cache_alloc+0x31/0x150
[ 7732.196018]  [<ffffffff8148326b>] xfs_file_ioctl+0x48b/0xae0
[ 7732.196018]  [<ffffffff811a60d6>] ? final_putname+0x26/0x50
[ 7732.196018]  [<ffffffff8110b6a2>] ? from_kgid+0x12/0x20
[ 7732.196018]  [<ffffffff8110b6be>] ? from_kgid_munged+0xe/0x20
[ 7732.196018]  [<ffffffff8119f856>] ? cp_new_stat+0x146/0x160
[ 7732.196018]  [<ffffffff811ace02>] do_vfs_ioctl+0x452/0x530
[ 7732.196018]  [<ffffffff8119fd35>] ? SYSC_newfstat+0x25/0x30
[ 7732.196018]  [<ffffffff811acf71>] SyS_ioctl+0x91/0xb0
[ 7732.196018]  [<ffffffff81ac6029>] system_call_fastpath+0x16/0x1b


Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf.c | 47 ++++++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index ce01c1a..27dc152 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1076,7 +1076,14 @@ xfs_bioerror(
 	 */
 	XFS_BUF_UNREAD(bp);
 	XFS_BUF_UNDONE(bp);
-	xfs_buf_stale(bp);
+
+	/*
+	 * we might be handling an uncached buffer here, in which case the
+	 * stale buffer handling is irrelevant as is doing IO with the buffer
+	 * locked. Hence we don't mark them stale.
+	 */
+	if (bp->b_pag)
+		xfs_buf_stale(bp);
 
 	xfs_buf_ioend(bp, 0);
 
@@ -1094,25 +1101,20 @@ xfs_bioerror_relse(
 	struct xfs_buf	*bp)
 {
 	int64_t		fl = bp->b_flags;
+
 	/*
-	 * No need to wait until the buffer is unpinned.
-	 * We aren't flushing it.
-	 *
-	 * chunkhold expects B_DONE to be set, whether
-	 * we actually finish the I/O or not. We don't want to
-	 * change that interface.
+	 * No need to wait until the buffer is unpinned. We aren't flushing it.
 	 */
 	XFS_BUF_UNREAD(bp);
 	XFS_BUF_DONE(bp);
 	xfs_buf_stale(bp);
 	bp->b_iodone = NULL;
+
+	/*
+	 * There's no reason to mark error for ASYNC buffers as there is no-one
+	 * waiting to collect the error.
+	 */
 	if (!(fl & XBF_ASYNC)) {
-		/*
-		 * Mark b_error and B_ERROR _both_.
-		 * Lot's of chunkcache code assumes that.
-		 * There's no reason to mark error for
-		 * ASYNC buffers.
-		 */
 		xfs_buf_ioerror(bp, EIO);
 		complete(&bp->b_iowait);
 	} else {
@@ -1129,14 +1131,13 @@ xfs_bdstrat_cb(
 	if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
 		trace_xfs_bdstrat_shut(bp, _RET_IP_);
 		/*
-		 * Metadata write that didn't get logged but
-		 * written delayed anyway. These aren't associated
-		 * with a transaction, and can be ignored.
+		 * If this is a cached write, then it is likely to be a delayed
+		 * write metadata buffer that can be ignored because the
+		 * contents are logged.
 		 */
 		if (!bp->b_iodone && !XFS_BUF_ISREAD(bp))
 			return xfs_bioerror_relse(bp);
-		else
-			return xfs_bioerror(bp);
+		return xfs_bioerror(bp);
 	}
 
 	xfs_buf_iorequest(bp);
@@ -1176,7 +1177,15 @@ xfsbdstrat(
 {
 	if (XFS_FORCED_SHUTDOWN(mp)) {
 		trace_xfs_bdstrat_shut(bp, _RET_IP_);
-		xfs_bioerror_relse(bp);
+		/*
+		 * we could be handling uncached IO here, in which case there is
+		 * always a caller waiting to collect the error and releas the
+		 * buffer.
+		 */
+		if (bp->b_pag)
+			xfs_bioerror_relse(bp);
+		else
+			xfs_bioerror(bp);
 		return;
 	}
 
-- 
1.8.4.rc3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/6] xfs: prevent spurious "head behind tail" warnings
  2013-12-12  5:34 [PATCH 0/6] xfs: fixes for 3.13-rc4 Dave Chinner
  2013-12-12  5:34 ` [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error Dave Chinner
@ 2013-12-12  5:34 ` Dave Chinner
  2013-12-12  5:34 ` [PATCH 3/6] xfs: prevent spurious "space > BBTOB(tail_blocks)" warnings Dave Chinner
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Dave Chinner @ 2013-12-12  5:34 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When xlog_space_left() cracks the grant head and the log tail, it
does so without locking to synchronise the sampling of the
variables. It samples the grant head first, so if there is a delay
before it smaples the log tail, there is a window where the log tail
could have moved onwards and be moved past the sampled value of the
grant head. This then leads to the "xlog_space_left: head behind
tail" warning message.

To avoid spurious output in this situation, swap the order in which
the variables are cracked and disable preemption to minimise the
potential delays between them being sampled. This means that the
head may grant head may move if there is a delay but the log tail
will be stable, hence ensuring the tail does not jump the head
accidentally. The code also handles this condition without problems
or warnings.

While there, update the code with detailed comments on the cases
that it is handling so it is easier to understand in future.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c | 99 +++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 66 insertions(+), 33 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 8497a00..97b2705 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1095,55 +1095,88 @@ xlog_assign_tail_lsn(
 }
 
 /*
- * Return the space in the log between the tail and the head.  The head
- * is passed in the cycle/bytes formal parms.  In the special case where
- * the reserve head has wrapped passed the tail, this calculation is no
- * longer valid.  In this case, just return 0 which means there is no space
- * in the log.  This works for all places where this function is called
- * with the reserve head.  Of course, if the write head were to ever
- * wrap the tail, we should blow up.  Rather than catch this case here,
- * we depend on other ASSERTions in other parts of the code.   XXXmiken
+ * Return the space in the log between the tail and the head.
  *
- * This code also handles the case where the reservation head is behind
- * the tail.  The details of this case are described below, but the end
- * result is that we return the size of the log as the amount of space left.
+ * This has two special cases where head and tail overlap
+ *	- head wraps passed the tail; the log is considered full
+ *	- tail overtakes the head;  the log is considered empty.
+ *	  This should never happen, so warn when it does.
  */
 STATIC int
 xlog_space_left(
 	struct xlog	*log,
 	atomic64_t	*head)
 {
-	int		free_bytes;
 	int		tail_bytes;
 	int		tail_cycle;
 	int		head_cycle;
 	int		head_bytes;
 
+	/*
+	 * Sample the tail before head to avoid spurious "head behind tail"
+	 * warnings due to racing tail updates. We disable preemption and dump a
+	 * memory barrier here to make sure we pick up the latest values with as
+	 * little latency between the samples as possible.
+	 */
+	preempt_disable();
+	smp_mb();
+	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle,
+			      &tail_bytes);
 	xlog_crack_grant_head(head, &head_cycle, &head_bytes);
-	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_bytes);
+	preempt_enable();
+
 	tail_bytes = BBTOB(tail_bytes);
-	if (tail_cycle == head_cycle && head_bytes >= tail_bytes)
-		free_bytes = log->l_logsize - (head_bytes - tail_bytes);
-	else if (tail_cycle + 1 < head_cycle)
+
+	/*
+	 * If the tail cycle is not within one count of head, we are in grant
+	 * space overcommit situation and so there is no log space available to
+	 * anyone right now.
+	 */
+	if (tail_cycle + 1 < head_cycle)
 		return 0;
-	else if (tail_cycle < head_cycle) {
+
+	/*
+	 * If the tail is in previous cycle, space available is the region
+	 * between the tail and the head.
+	 *
+	 *	     H		 T
+	 *    +2222222111111111111111+
+	 *            +-- free --+
+	 */
+	if (tail_cycle < head_cycle) {
 		ASSERT(tail_cycle == (head_cycle - 1));
-		free_bytes = tail_bytes - head_bytes;
-	} else {
-		/*
-		 * The reservation head is behind the tail.
-		 * In this case we just want to return the size of the
-		 * log as the amount of space left.
-		 */
-		xfs_alert(log->l_mp,
-			"xlog_space_left: head behind tail\n"
-			"  tail_cycle = %d, tail_bytes = %d\n"
-			"  GH   cycle = %d, GH   bytes = %d",
-			tail_cycle, tail_bytes, head_cycle, head_bytes);
-		ASSERT(0);
-		free_bytes = log->l_logsize;
-	}
-	return free_bytes;
+		return tail_bytes - head_bytes;
+	}
+
+	/*
+	 * If the tail is in the same cycle, space available is the regions
+	 * outside the range between the tail and the head:
+	 *
+	 *	   T	       H
+	 *    +2222222222222222211111+
+	 *          +-- used --+
+	 *    +free+		+free+
+	 *
+	 * If the head and tail are the same, then the whole log is free.
+	 */
+	if (tail_cycle == head_cycle && head_bytes >= tail_bytes)
+		return log->l_logsize - (head_bytes - tail_bytes);
+
+	/*
+	 * The head is behind the tail. That's not supposed to happen, but we
+	 * need to handle it. In this case, it's the equivalent of the entire
+	 * log being empty and available for use.
+	 */
+	xfs_alert(log->l_mp,
+		  "%s: Head behind log tail, caller %pF\n"
+		  "  tail_cycle = %d, tail_bytes = %d\n"
+		  "  GH   cycle = %d, GH   bytes = %d",
+		  __func__, (void *)_RET_IP_,
+		  tail_cycle, tail_bytes, head_cycle, head_bytes);
+#ifdef DEBUG
+	dump_stack();
+#endif
+	return log->l_logsize;
 }
 
 
-- 
1.8.4.rc3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/6] xfs: prevent spurious "space > BBTOB(tail_blocks)" warnings
  2013-12-12  5:34 [PATCH 0/6] xfs: fixes for 3.13-rc4 Dave Chinner
  2013-12-12  5:34 ` [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error Dave Chinner
  2013-12-12  5:34 ` [PATCH 2/6] xfs: prevent spurious "head behind tail" warnings Dave Chinner
@ 2013-12-12  5:34 ` Dave Chinner
  2013-12-12  5:34 ` [PATCH 4/6] xfs: swalloc doesn't align allocations properly Dave Chinner
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Dave Chinner @ 2013-12-12  5:34 UTC (permalink / raw)
  To: xfs

When xlog_verify_grant_tail() cracks the grant head and the log
tail, it does so without locking to synchronise the sampling of the
variables. A delay between samples can give the wrong results,
sometimes leading to false warnings being emitted.

To avoid spurious output in this situation, disable preemption to
minimise the potential delays between them being sampled. This means
that the log tail may still move, but it won't trigger warnings
unless it moves beyond the current head cycle. If we see delays like
that we probably should be throwing a warning, anyway.

Further, the write head can validly pass the tail in certain
circumstances. Document those circumstances in the commentsx, and
remove the xlog_verify_grant_tail() call in xfs_log_reserve() that
is guaranteed to emit false positive warnings due it being the
source of temporary overcommit conditions.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c | 94 +++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 69 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 97b2705..f4ccc10 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -472,7 +472,6 @@ xfs_log_reserve(
 	xlog_grant_add_space(log, &log->l_reserve_head.grant, need_bytes);
 	xlog_grant_add_space(log, &log->l_write_head.grant, need_bytes);
 	trace_xfs_log_reserve_exit(log, tic);
-	xlog_verify_grant_tail(log);
 	return 0;
 
 out_error:
@@ -3651,36 +3650,81 @@ xlog_verify_dest_ptr(
  * the cycles are the same, we can't be overlapping.  Otherwise, make sure that
  * the cycles differ by exactly one and check the byte count.
  *
- * This check is run unlocked, so can give false positives. Rather than assert
- * on failures, use a warn-once flag and a panic tag to allow the admin to
- * determine if they want to panic the machine when such an error occurs. For
- * debug kernels this will have the same effect as using an assert but, unlinke
- * an assert, it can be turned off at runtime.
+ * We only do this check when regranting write space because xfs_log_reserve can
+ * race with a regrant and a reserve space overcommit can result in a write
+ * space overcommit from xfs_log_reserve() once log space becomes available
+ * again. This is only a temporary situation, but it means that checking the
+ * write grant head and log tail from xfs_log_reserve gives false positives.
+ *
+ * If we only call from xfs_log_regrant(), we are only called after overcommit
+ * situations have been resolved by sleeping. If the head and tail overlap at
+ * this point, then we have a problem.
+ *
+ * Because we always want to know about this issue if there is a filesystem hang
+ * due to log space availability, use a warn-once flag and a panic tag to allow
+ * the admin to determine if they want to panic the machine when such an error
+ * occurs. For debug kernels this will have the same effect as using an assert
+ * but, unlike an assert, it can be turned off at runtime.
  */
 STATIC void
 xlog_verify_grant_tail(
 	struct xlog	*log)
 {
-	int		tail_cycle, tail_blocks;
-	int		cycle, space;
-
-	xlog_crack_grant_head(&log->l_write_head.grant, &cycle, &space);
-	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_blocks);
-	if (tail_cycle != cycle) {
-		if (cycle - 1 != tail_cycle &&
-		    !(log->l_flags & XLOG_TAIL_WARN)) {
-			xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES,
-				"%s: cycle - 1 != tail_cycle", __func__);
-			log->l_flags |= XLOG_TAIL_WARN;
-		}
+	int		tail_cycle;
+	int		tail_blocks;
+	int		head_cycle;
+	int		head_bytes;
 
-		if (space > BBTOB(tail_blocks) &&
-		    !(log->l_flags & XLOG_TAIL_WARN)) {
-			xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES,
-				"%s: space > BBTOB(tail_blocks)", __func__);
-			log->l_flags |= XLOG_TAIL_WARN;
-		}
-	}
+	/*
+	 * If we've already detected an problem here, don't bother checking
+	 * again.
+	 */
+	if (log->l_flags & XLOG_TAIL_WARN)
+		return;
+
+	/*
+	 * Sample the tail before head to avoid spurious warnings due to racing
+	 * tail updates. We disable preemption and dump a memory barrier here to
+	 * make sure we pick up the latest values with as little latency between
+	 * the samples as possible.
+	 */
+	preempt_disable();
+	smp_mb();
+	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle,
+			      &tail_blocks);
+	xlog_crack_grant_head(&log->l_write_head.grant, &head_cycle,
+			      &head_bytes);
+	preempt_enable();
+
+	/*
+	 * if the cycles are the same, the head and tail can't be
+	 * overlapping, so everything is ok and we are done.
+	 */
+	if (tail_cycle == head_cycle)
+		return;
+
+	/*
+	 * if the tail is on the previous cycle to the head and the head
+	 * is before the tail, then all is good.
+	 */
+	if (tail_cycle == head_cycle - 1 &&
+	    BBTOB(tail_blocks) >= head_bytes)
+		return;
+
+	/*
+	 * OK, we're in trouble now - the head and tail are out of sync. Time to
+	 * issue a warning about it
+	 */
+	xfs_alert(log->l_mp,
+		  "%s: Write head overlaps the tail, caller %pF\n"
+		  "  tail_cycle = %d, tail_bytes = %d\n"
+		  "  GH   cycle = %d, GH   bytes = %d",
+		  __func__, (void *)_RET_IP_,
+		  tail_cycle, BBTOB(tail_blocks), head_cycle, head_bytes);
+#ifdef DEBUG
+	dump_stack();
+#endif
+	log->l_flags |= XLOG_TAIL_WARN;
 }
 
 /* check if it will fit */
-- 
1.8.4.rc3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/6] xfs: swalloc doesn't align allocations properly
  2013-12-12  5:34 [PATCH 0/6] xfs: fixes for 3.13-rc4 Dave Chinner
                   ` (2 preceding siblings ...)
  2013-12-12  5:34 ` [PATCH 3/6] xfs: prevent spurious "space > BBTOB(tail_blocks)" warnings Dave Chinner
@ 2013-12-12  5:34 ` Dave Chinner
  2013-12-13 12:01   ` Christoph Hellwig
  2013-12-12  5:34 ` [PATCH 5/6] xfs: xlog_recover_process_data leaks like a sieve Dave Chinner
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2013-12-12  5:34 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When swalloc is specified as a mount option, allocations are
supposed to be aligned to the stripe width rather than the stripe
unit of the underlying filesystem. However, it does not do this.

What the implementation does is round up the allocation size to a
stripe width, hence ensuring that all allocations span a full stripe
width. It does not, however, ensure that that allocation is aligned
to a stripe width, and hence the allocations can span multiple
underlying stripes and so still see RMW cycles for things like
direct IO on MD RAID.

So, if the swalloc mount option is set, change the allocation
alignment in xfs_bmap_btalloc() to use the stripe width rather than
the stripe unit.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bmap.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 8401f11..3b2c14b 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -3648,10 +3648,19 @@ xfs_bmap_btalloc(
 	int		isaligned;
 	int		tryagain;
 	int		error;
+	int		stripe_align;
 
 	ASSERT(ap->length);
 
 	mp = ap->ip->i_mount;
+
+	/* stripe alignment for allocation is determined by mount parameters */
+	stripe_align = 0;
+	if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC))
+		stripe_align = mp->m_swidth;
+	else if (mp->m_dalign)
+		stripe_align = mp->m_dalign;
+
 	align = ap->userdata ? xfs_get_extsz_hint(ap->ip) : 0;
 	if (unlikely(align)) {
 		error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev,
@@ -3660,6 +3669,8 @@ xfs_bmap_btalloc(
 		ASSERT(!error);
 		ASSERT(ap->length);
 	}
+
+
 	nullfb = *ap->firstblock == NULLFSBLOCK;
 	fb_agno = nullfb ? NULLAGNUMBER : XFS_FSB_TO_AGNO(mp, *ap->firstblock);
 	if (nullfb) {
@@ -3735,7 +3746,7 @@ xfs_bmap_btalloc(
 	 */
 	if (!ap->flist->xbf_low && ap->aeof) {
 		if (!ap->offset) {
-			args.alignment = mp->m_dalign;
+			args.alignment = stripe_align;
 			atype = args.type;
 			isaligned = 1;
 			/*
@@ -3760,13 +3771,13 @@ xfs_bmap_btalloc(
 			 * of minlen+alignment+slop doesn't go up
 			 * between the calls.
 			 */
-			if (blen > mp->m_dalign && blen <= args.maxlen)
-				nextminlen = blen - mp->m_dalign;
+			if (blen > stripe_align && blen <= args.maxlen)
+				nextminlen = blen - stripe_align;
 			else
 				nextminlen = args.minlen;
-			if (nextminlen + mp->m_dalign > args.minlen + 1)
+			if (nextminlen + stripe_align > args.minlen + 1)
 				args.minalignslop =
-					nextminlen + mp->m_dalign -
+					nextminlen + stripe_align -
 					args.minlen - 1;
 			else
 				args.minalignslop = 0;
@@ -3788,7 +3799,7 @@ xfs_bmap_btalloc(
 		 */
 		args.type = atype;
 		args.fsbno = ap->blkno;
-		args.alignment = mp->m_dalign;
+		args.alignment = stripe_align;
 		args.minlen = nextminlen;
 		args.minalignslop = 0;
 		isaligned = 1;
-- 
1.8.4.rc3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/6] xfs: xlog_recover_process_data leaks like a sieve
  2013-12-12  5:34 [PATCH 0/6] xfs: fixes for 3.13-rc4 Dave Chinner
                   ` (3 preceding siblings ...)
  2013-12-12  5:34 ` [PATCH 4/6] xfs: swalloc doesn't align allocations properly Dave Chinner
@ 2013-12-12  5:34 ` Dave Chinner
  2013-12-13 12:32   ` Christoph Hellwig
  2013-12-12  5:34 ` [PATCH 6/6] xfs: abort metadata writeback on permanent errors Dave Chinner
  2013-12-17 16:02 ` [PATCH 0/6] xfs: fixes for 3.13-rc4 Ben Myers
  6 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2013-12-12  5:34 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Fix the double free of the transaction structure introduced by
commit 2a84108 ("xfs: free the list of recovery items on error").
In the process, make the freeing of the trans structure on error or
completion of processing consistent - i.e. the responsibility of the
the function that detected the error or completes processing. Add
comments to document this behaviour so it can be maintained more
easily in future.

Fix the rest of the memory leaks of the transaction structure used
by xlog_recover_process_data() that commit 2a84108 didn't address.

Fix potential use-after-free of the trans structure by ensuring
they are removed from the transaction recoovery-in-progress hash
table before they are freed.

Remove all the shouty XFS_ERROR() macros that are used directly
after ASSERT(0) calls as they are entirely redundant and make the
code harder to read.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 155 +++++++++++++++++++++++++++--------------------
 1 file changed, 88 insertions(+), 67 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 07ab52c..517f7ee 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1483,6 +1483,32 @@ xlog_recover_add_item(
 	list_add_tail(&item->ri_list, head);
 }
 
+/*
+ * Free up any resources allocated by the transaction
+ *
+ * Remember that EFIs, EFDs, and IUNLINKs are handled later.
+ */
+STATIC void
+xlog_recover_free_trans(
+	struct xlog_recover	*trans)
+{
+	xlog_recover_item_t	*item, *n;
+	int			i;
+
+	hlist_del_init(&trans->r_list);
+	list_for_each_entry_safe(item, n, &trans->r_itemq, ri_list) {
+		/* Free the regions in the item. */
+		list_del(&item->ri_list);
+		for (i = 0; i < item->ri_cnt; i++)
+			kmem_free(item->ri_buf[i].i_addr);
+		/* Free the item itself */
+		kmem_free(item->ri_buf);
+		kmem_free(item);
+	}
+	/* Free the transaction recover structure */
+	kmem_free(trans);
+}
+
 STATIC int
 xlog_recover_add_to_cont_trans(
 	struct xlog		*log,
@@ -1548,7 +1574,7 @@ xlog_recover_add_to_trans(
 			xfs_warn(log->l_mp, "%s: bad header magic number",
 				__func__);
 			ASSERT(0);
-			return XFS_ERROR(EIO);
+			goto out_eio;
 		}
 		if (len == sizeof(xfs_trans_header_t))
 			xlog_recover_add_item(&trans->r_itemq);
@@ -1577,8 +1603,8 @@ xlog_recover_add_to_trans(
 		"bad number of regions (%d) in inode log format",
 				  in_f->ilf_size);
 			ASSERT(0);
-			kmem_free(ptr);
-			return XFS_ERROR(EIO);
+			goto out_free_eio;
+
 		}
 
 		item->ri_total = in_f->ilf_size;
@@ -1593,6 +1619,17 @@ xlog_recover_add_to_trans(
 	item->ri_cnt++;
 	trace_xfs_log_recover_item_add(log, trans, item, 0);
 	return 0;
+
+out_free_eio:
+	kmem_free(ptr);
+out_eio:
+	/*
+	 * This transaction is now unrecoverable, so we need to remove it from
+	 * the transaction hash so nobody else can find it and free it.  The
+	 * error we return will abort further recovery processing.
+	 */
+	xlog_recover_free_trans(trans);
+	return EIO;
 }
 
 /*
@@ -1699,7 +1736,7 @@ xlog_recover_reorder_trans(
 			 */
 			if (!list_empty(&sort_list))
 				list_splice_init(&sort_list, &trans->r_itemq);
-			error = XFS_ERROR(EIO);
+			error = EIO;
 			goto out;
 		}
 	}
@@ -1713,6 +1750,15 @@ out:
 		list_splice_tail(&inode_buffer_list, &trans->r_itemq);
 	if (!list_empty(&cancel_list))
 		list_splice_tail(&cancel_list, &trans->r_itemq);
+
+	/*
+	 * If we failed to reorder the transaction, it is now unrecoverable so
+	 * we need to remove it from the transaction hash so nobody else can
+	 * find it and free it.  The error we return will abort further recovery
+	 * processing.
+	 */
+	if (error)
+		xlog_recover_free_trans(trans);
 	return error;
 }
 
@@ -3235,31 +3281,6 @@ xlog_recover_do_icreate_pass2(
 	return 0;
 }
 
-/*
- * Free up any resources allocated by the transaction
- *
- * Remember that EFIs, EFDs, and IUNLINKs are handled later.
- */
-STATIC void
-xlog_recover_free_trans(
-	struct xlog_recover	*trans)
-{
-	xlog_recover_item_t	*item, *n;
-	int			i;
-
-	list_for_each_entry_safe(item, n, &trans->r_itemq, ri_list) {
-		/* Free the regions in the item. */
-		list_del(&item->ri_list);
-		for (i = 0; i < item->ri_cnt; i++)
-			kmem_free(item->ri_buf[i].i_addr);
-		/* Free the item itself */
-		kmem_free(item->ri_buf);
-		kmem_free(item);
-	}
-	/* Free the transaction recover structure */
-	kmem_free(trans);
-}
-
 STATIC void
 xlog_recover_buffer_ra_pass2(
 	struct xlog                     *log,
@@ -3384,7 +3405,7 @@ xlog_recover_commit_pass1(
 		xfs_warn(log->l_mp, "%s: invalid item type (%d)",
 			__func__, ITEM_TYPE(item));
 		ASSERT(0);
-		return XFS_ERROR(EIO);
+		return EIO;
 	}
 }
 
@@ -3420,7 +3441,7 @@ xlog_recover_commit_pass2(
 		xfs_warn(log->l_mp, "%s: invalid item type (%d)",
 			__func__, ITEM_TYPE(item));
 		ASSERT(0);
-		return XFS_ERROR(EIO);
+		return EIO;
 	}
 }
 
@@ -3467,7 +3488,7 @@ xlog_recover_commit_trans(
 
 	#define XLOG_RECOVER_COMMIT_QUEUE_MAX 100
 
-	hlist_del(&trans->r_list);
+	hlist_del_init(&trans->r_list);
 
 	error = xlog_recover_reorder_trans(log, trans, pass);
 	if (error)
@@ -3488,17 +3509,17 @@ xlog_recover_commit_trans(
 				list_splice_tail_init(&ra_list, &done_list);
 				items_queued = 0;
 			}
-
 			break;
 		default:
 			ASSERT(0);
+			error = ERANGE;
+			break;
 		}
 
 		if (error)
-			goto out;
+			break;
 	}
 
-out:
 	if (!list_empty(&ra_list)) {
 		if (!error)
 			error = xlog_recover_items_pass2(log, trans,
@@ -3509,22 +3530,17 @@ out:
 	if (!list_empty(&done_list))
 		list_splice_init(&done_list, &trans->r_itemq);
 
+	/*
+	 * We've already removed the trans structure from the hash, so nobody
+	 * else will ever find this structure again. Hence we must free it here
+	 * regardless of whether we processed it successfully or not.
+	 */
 	xlog_recover_free_trans(trans);
 
 	error2 = xfs_buf_delwri_submit(&buffer_list);
 	return error ? error : error2;
 }
 
-STATIC int
-xlog_recover_unmount_trans(
-	struct xlog		*log,
-	struct xlog_recover	*trans)
-{
-	/* Do nothing now */
-	xfs_warn(log->l_mp, "%s: Unmount LR", __func__);
-	return 0;
-}
-
 /*
  * There are two valid states of the r_state field.  0 indicates that the
  * transaction structure is in a normal state.  We have either seen the
@@ -3545,9 +3561,9 @@ xlog_recover_process_data(
 	xfs_caddr_t		lp;
 	int			num_logops;
 	xlog_op_header_t	*ohead;
-	xlog_recover_t		*trans;
+	xlog_recover_t		*trans = NULL;
 	xlog_tid_t		tid;
-	int			error;
+	int			error = 0;
 	unsigned long		hash;
 	uint			flags;
 
@@ -3556,7 +3572,7 @@ xlog_recover_process_data(
 
 	/* check the log format matches our own - else we can't recover */
 	if (xlog_header_check_recover(log->l_mp, rhead))
-		return (XFS_ERROR(EIO));
+		return XFS_ERROR(EIO);
 
 	while ((dp < lp) && num_logops) {
 		ASSERT(dp + sizeof(xlog_op_header_t) <= lp);
@@ -3564,10 +3580,13 @@ xlog_recover_process_data(
 		dp += sizeof(xlog_op_header_t);
 		if (ohead->oh_clientid != XFS_TRANSACTION &&
 		    ohead->oh_clientid != XFS_LOG) {
-			xfs_warn(log->l_mp, "%s: bad clientid 0x%x",
+			xfs_warn(log->l_mp,
+				"%s: bad transaction opheader clientid 0x%x",
 					__func__, ohead->oh_clientid);
 			ASSERT(0);
-			return (XFS_ERROR(EIO));
+			if (trans)
+				xlog_recover_free_trans(trans);
+			return EIO;
 		}
 		tid = be32_to_cpu(ohead->oh_tid);
 		hash = XLOG_RHASH(tid);
@@ -3578,11 +3597,14 @@ xlog_recover_process_data(
 					be64_to_cpu(rhead->h_lsn));
 		} else {
 			if (dp + be32_to_cpu(ohead->oh_len) > lp) {
-				xfs_warn(log->l_mp, "%s: bad length 0x%x",
+				xfs_warn(log->l_mp,
+				"%s: bad transaction opheader length 0x%x",
 					__func__, be32_to_cpu(ohead->oh_len));
 				WARN_ON(1);
-				return (XFS_ERROR(EIO));
+				xlog_recover_free_trans(trans);
+				return EIO;
 			}
+
 			flags = ohead->oh_flags & ~XLOG_END_TRANS;
 			if (flags & XLOG_WAS_CONT_TRANS)
 				flags &= ~XLOG_CONTINUE_TRANS;
@@ -3591,36 +3613,35 @@ xlog_recover_process_data(
 				error = xlog_recover_commit_trans(log,
 								trans, pass);
 				break;
-			case XLOG_UNMOUNT_TRANS:
-				error = xlog_recover_unmount_trans(log, trans);
-				break;
 			case XLOG_WAS_CONT_TRANS:
 				error = xlog_recover_add_to_cont_trans(log,
 						trans, dp,
 						be32_to_cpu(ohead->oh_len));
 				break;
-			case XLOG_START_TRANS:
-				xfs_warn(log->l_mp, "%s: bad transaction",
-					__func__);
-				ASSERT(0);
-				error = XFS_ERROR(EIO);
-				break;
 			case 0:
 			case XLOG_CONTINUE_TRANS:
 				error = xlog_recover_add_to_trans(log, trans,
 						dp, be32_to_cpu(ohead->oh_len));
 				break;
+			case XLOG_UNMOUNT_TRANS:
+				xfs_warn(log->l_mp, "%s: Unmount LR", __func__);
+				break;
+			case XLOG_START_TRANS:
 			default:
-				xfs_warn(log->l_mp, "%s: bad flag 0x%x",
+				xfs_warn(log->l_mp,
+				"%s: bad transaction opheader flag 0x%x",
 					__func__, flags);
 				ASSERT(0);
-				error = XFS_ERROR(EIO);
-				break;
-			}
-			if (error) {
 				xlog_recover_free_trans(trans);
-				return error;
+				return EIO;
 			}
+			/*
+			 * If there's been an error, the trans structure has
+			 * already been freed. So there's nothing for us to do
+			 * but abort the recovery process.
+			 */
+			if (error)
+				return error;
 		}
 		dp += be32_to_cpu(ohead->oh_len);
 		num_logops--;
-- 
1.8.4.rc3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 6/6] xfs: abort metadata writeback on permanent errors
  2013-12-12  5:34 [PATCH 0/6] xfs: fixes for 3.13-rc4 Dave Chinner
                   ` (4 preceding siblings ...)
  2013-12-12  5:34 ` [PATCH 5/6] xfs: xlog_recover_process_data leaks like a sieve Dave Chinner
@ 2013-12-12  5:34 ` Dave Chinner
  2013-12-13 12:33   ` Christoph Hellwig
  2013-12-17 16:02 ` [PATCH 0/6] xfs: fixes for 3.13-rc4 Ben Myers
  6 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2013-12-12  5:34 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

If we are doing aysnc writeback of metadata, we can get write errors
but have nobody to report them to. At the moment, we simply attempt
to reissue the write from io completion in the hope that it's a
transient error.

When it's not a transient error, the buffer is stuck forever in
this loop, and we cannot break out of it. Eventually, unmount will
hang because the AIL cannot be emptied and everything goes downhill
from them.

To solve this problem, only retry the write IO once before aborting
it. We don't throw the buffer away because some transient errors can
last minutes (e.g.  FC path failover) or even hours (thin
provisioned devices that have run out of backing space) before they
go away. Hence we really want to keep trying until we can't try any
more.

Because the buffer was not cleaned, however, it does not get removed
from the AIL and hence the next pass across the AIL will start IO on
it again. As such, we still get the "retry forever" semantics that
we currently have, but we allow other access to the buffer in the
mean time. Meanwhile the filesystem can continue to modify the
buffer and relog it, so the IO errors won't hang the log or the
filesystem.

Now when we are pushing the AIL, we can see all these "permanent IO
error" buffers and we can issue a warning about failures before we
retry the IO. We can also catch these buffers when unmounting an
issue a corruption warning, too.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf.c      | 10 ++++++++--
 fs/xfs/xfs_buf.h      |  6 +++++-
 fs/xfs/xfs_buf_item.c | 21 +++++++++++++++++++--
 3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 27dc152..402e050 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1153,7 +1153,7 @@ xfs_bwrite(
 	ASSERT(xfs_buf_islocked(bp));
 
 	bp->b_flags |= XBF_WRITE;
-	bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q);
+	bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q | XBF_WRITE_FAIL);
 
 	xfs_bdstrat_cb(bp);
 
@@ -1525,6 +1525,12 @@ xfs_wait_buftarg(
 			struct xfs_buf *bp;
 			bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
 			list_del_init(&bp->b_lru);
+			if (bp->b_flags & XBF_WRITE_FAIL) {
+				xfs_alert(btp->bt_mount,
+"Corruption Alert: Buffer at block 0x%llx had permanent write failures!\n"
+"Please run xfs_repair to determine the extent of the problem.",
+					(long long)bp->b_bn);
+			}
 			xfs_buf_rele(bp);
 		}
 		if (loop++ != 0)
@@ -1798,7 +1804,7 @@ __xfs_buf_delwri_submit(
 
 	blk_start_plug(&plug);
 	list_for_each_entry_safe(bp, n, io_list, b_list) {
-		bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC);
+		bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC | XBF_WRITE_FAIL);
 		bp->b_flags |= XBF_WRITE;
 
 		if (!wait) {
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index e656833..614097e 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -45,6 +45,7 @@ typedef enum {
 #define XBF_ASYNC	 (1 << 4) /* initiator will not wait for completion */
 #define XBF_DONE	 (1 << 5) /* all pages in the buffer uptodate */
 #define XBF_STALE	 (1 << 6) /* buffer has been staled, do not find it */
+#define XBF_WRITE_FAIL	 (1 << 24)/* async writes have failed on this buffer */
 
 /* I/O hints for the BIO layer */
 #define XBF_SYNCIO	 (1 << 10)/* treat this buffer as synchronous I/O */
@@ -70,6 +71,7 @@ typedef unsigned int xfs_buf_flags_t;
 	{ XBF_ASYNC,		"ASYNC" }, \
 	{ XBF_DONE,		"DONE" }, \
 	{ XBF_STALE,		"STALE" }, \
+	{ XBF_WRITE_FAIL,	"WRITE_FAIL" }, \
 	{ XBF_SYNCIO,		"SYNCIO" }, \
 	{ XBF_FUA,		"FUA" }, \
 	{ XBF_FLUSH,		"FLUSH" }, \
@@ -80,6 +82,7 @@ typedef unsigned int xfs_buf_flags_t;
 	{ _XBF_DELWRI_Q,	"DELWRI_Q" }, \
 	{ _XBF_COMPOUND,	"COMPOUND" }
 
+
 /*
  * Internal state flags.
  */
@@ -301,7 +304,8 @@ extern void xfs_buf_terminate(void);
 
 #define XFS_BUF_ZEROFLAGS(bp) \
 	((bp)->b_flags &= ~(XBF_READ|XBF_WRITE|XBF_ASYNC| \
-			    XBF_SYNCIO|XBF_FUA|XBF_FLUSH))
+			    XBF_SYNCIO|XBF_FUA|XBF_FLUSH| \
+			    XBF_WRITE_FAIL))
 
 void xfs_buf_stale(struct xfs_buf *bp);
 #define XFS_BUF_UNSTALE(bp)	((bp)->b_flags &= ~XBF_STALE)
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index a64f67b..2227b9b 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -496,6 +496,14 @@ xfs_buf_item_unpin(
 	}
 }
 
+/*
+ * Buffer IO error rate limiting. Limit it to no more than 10 messages per 30
+ * seconds so as to not spam logs too much on repeated detection of the same
+ * buffer being bad..
+ */
+
+DEFINE_RATELIMIT_STATE(xfs_buf_write_fail_rl_state, 30 * HZ, 10);
+
 STATIC uint
 xfs_buf_item_push(
 	struct xfs_log_item	*lip,
@@ -524,6 +532,14 @@ xfs_buf_item_push(
 
 	trace_xfs_buf_item_push(bip);
 
+	/* has a previous flush failed due to IO errors? */
+	if ((bp->b_flags & XBF_WRITE_FAIL) &&
+	    ___ratelimit(&xfs_buf_write_fail_rl_state, "XFS:")) {
+		xfs_warn(bp->b_target->bt_mount,
+"Detected failing async write on buffer block 0x%llx. Retrying async write.\n",
+			 (long long)bp->b_bn);
+	}
+
 	if (!xfs_buf_delwri_queue(bp, buffer_list))
 		rval = XFS_ITEM_FLUSHING;
 	xfs_buf_unlock(bp);
@@ -1096,8 +1112,9 @@ xfs_buf_iodone_callbacks(
 
 		xfs_buf_ioerror(bp, 0); /* errno of 0 unsets the flag */
 
-		if (!XFS_BUF_ISSTALE(bp)) {
-			bp->b_flags |= XBF_WRITE | XBF_ASYNC | XBF_DONE;
+		if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
+			bp->b_flags |= XBF_WRITE | XBF_ASYNC |
+				       XBF_DONE | XBF_WRITE_FAIL;
 			xfs_buf_iorequest(bp);
 		} else {
 			xfs_buf_relse(bp);
-- 
1.8.4.rc3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error.
  2013-12-12  5:34 ` [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error Dave Chinner
@ 2013-12-12  9:30   ` Jeff Liu
  2013-12-12 10:09     ` Dave Chinner
  2013-12-12 16:36   ` Christoph Hellwig
  2013-12-13 13:02   ` Christoph Hellwig
  2 siblings, 1 reply; 26+ messages in thread
From: Jeff Liu @ 2013-12-12  9:30 UTC (permalink / raw)
  To: Dave Chinner, xfs

On 12/12 2013 13:34, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> fsstress failed during a shutdown with the following assert:
Looks fsstress is always a good aid to make file system cry... :-P

I wonder if here the shutdown is simulated via xfstests/src/godown or not,
but I can trigger another hang up issue with this patch via a test case
below(80% reproducible, I also ran the test against the left patches in
this series, this problem still exists):

#!/bin/bash

mkfs.xfs -f /dev/sda7

for ((i=0;i<10;i++))
do
    mount /dev/sda7 /xfs
    fsstress -d /xfs -n 1000 -p 100 >/dev/null 2>&1 &
    sleep 10
    godown /xfs
    wait
    killall -q fsstress
    umount /xfs
done

It seems there is no such kind of test cases in xfstestes for now, I'd
write one if required.

The backtraces were shown as following:

[  365.987493] INFO: task fsstress:3215 blocked for more than 120 seconds.
[  365.987499]       Tainted: PF          O 3.13.0-rc2+ #13
[  365.987500] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  365.987502] fsstress        D ffff88026f254440     0  3215   3142 0x00000000
[  365.987507]  ffff880253f19de0 0000000000000086 ffff880242071800 ffff880253f19fd8
[  365.987512]  0000000000014440 0000000000014440 ffff880242071800 ffff880073694c00
[  365.987515]  ffff880073694c80 ffff880073694c90 ffffffffffffffff 0000000000000292
[  365.987519] Call Trace:
[  365.987528]  [<ffffffff81718779>] schedule+0x29/0x70
[  365.987560]  [<ffffffffa0c2a49d>] xlog_cil_force_lsn+0x18d/0x1e0 [xfs]
[  365.987565]  [<ffffffff81097210>] ? wake_up_state+0x20/0x20
[  365.987570]  [<ffffffff811e8770>] ? do_fsync+0x80/0x80
[  365.987594]  [<ffffffffa0c28921>] _xfs_log_force+0x61/0x270 [xfs]
[  365.987599]  [<ffffffff812b0610>] ? jbd2_log_wait_commit+0x110/0x180
[  365.987603]  [<ffffffff810a83f0>] ? prepare_to_wait_event+0x100/0x100
[  365.987607]  [<ffffffff811e8770>] ? do_fsync+0x80/0x80
[  365.987629]  [<ffffffffa0c28b56>] xfs_log_force+0x26/0x80 [xfs]
[  365.987648]  [<ffffffffa0bcf35d>] xfs_fs_sync_fs+0x2d/0x50 [xfs]
[  365.987652]  [<ffffffff811e8790>] sync_fs_one_sb+0x20/0x30
[  365.987656]  [<ffffffff811bcc32>] iterate_supers+0xb2/0x110
[  365.987660]  [<ffffffff811e88c2>] sys_sync+0x62/0xa0
[  365.987665]  [<ffffffff81724ced>] system_call_fastpath+0x1a/0x1f
[  372.225302] XFS (sda7): xfs_log_force: error 5 returned.
[  402.275608] XFS (sda7): xfs_log_force: error 5 returned.
[  432.325929] XFS (sda7): xfs_log_force: error 5 returned.
[  462.376239] XFS (sda7): xfs_log_force: error 5 returned.
[  485.869059] INFO: task fsstress:3215 blocked for more than 120 seconds.
[  485.869065]       Tainted: PF          O 3.13.0-rc2+ #13
[  485.869066] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  485.869068] fsstress        D ffff88026f254440     0  3215   3142 0x00000000
[  485.869073]  ffff880253f19de0 0000000000000086 ffff880242071800 ffff880253f19fd8
[  485.869077]  0000000000014440 0000000000014440 ffff880242071800 ffff880073694c00
[  485.869081]  ffff880073694c80 ffff880073694c90 ffffffffffffffff 0000000000000292
[  485.869085] Call Trace:
[  485.869093]  [<ffffffff81718779>] schedule+0x29/0x70
[  485.869123]  [<ffffffffa0c2a49d>] xlog_cil_force_lsn+0x18d/0x1e0 [xfs]
[  485.869129]  [<ffffffff81097210>] ? wake_up_state+0x20/0x20
[  485.869135]  [<ffffffff811e8770>] ? do_fsync+0x80/0x80
[  485.869160]  [<ffffffffa0c28921>] _xfs_log_force+0x61/0x270 [xfs]
[  485.869167]  [<ffffffff812b0610>] ? jbd2_log_wait_commit+0x110/0x180
[  485.869172]  [<ffffffff810a83f0>] ? prepare_to_wait_event+0x100/0x100
[  485.869177]  [<ffffffff811e8770>] ? do_fsync+0x80/0x80
[  485.869199]  [<ffffffffa0c28b56>] xfs_log_force+0x26/0x80 [xfs]
[  485.869219]  [<ffffffffa0bcf35d>] xfs_fs_sync_fs+0x2d/0x50 [xfs]
[  485.869224]  [<ffffffff811e8790>] sync_fs_one_sb+0x20/0x30
[  485.869229]  [<ffffffff811bcc32>] iterate_supers+0xb2/0x110
[  485.869232]  [<ffffffff811e88c2>] sys_sync+0x62/0xa0
[  485.869237]  [<ffffffff81724ced>] system_call_fastpath+0x1a/0x1f
[  492.426514] XFS (sda7): xfs_log_force: error 5 returned.
[  522.476867] XFS (sda7): xfs_log_force: error 5 returned.
[  552.527131] XFS (sda7): xfs_log_force: error 5 returned.
[  582.577433] XFS (sda7): xfs_log_force: error 5 returned.


Thanks,
-Jeff
> 
> XFS: Assertion failed: xfs_buf_islocked(bp), file: fs/xfs/xfs_buf.c, line: 143
> .....
>  xfs_buf_stale+0x3f/0xf0
>  xfs_bioerror_relse+0x2d/0x90 
>  xfsbdstrat+0x51/0xa0
>  xfs_zero_remaining_bytes+0x1d1/0x2d0
>  xfs_free_file_space+0x5d0/0x600
>  xfs_change_file_space+0x251/0x3a0
>  xfs_ioc_space+0xcc/0x130
> .....
> 
> xfs_zero_remaining_bytes() works with uncached buffers, and hence if
> we are preventing IO due to a shutdown, we should not be marking it
> stale as that is only for cached buffers. Instead, just mark it with
> an error and make sure it gets to the caller.
> 
> [ 7732.193441] XFS: Assertion failed: xfs_buf_islocked(bp), file: fs/xfs/xfs_buf.c, line: 96
> [ 7732.195036] ------------[ cut here ]------------
> [ 7732.195890] kernel BUG at fs/xfs/xfs_message.c:107!
> [ 7732.196018] invalid opcode: 0000 [#1] SMP
> [ 7732.196018] Modules linked in:
> [ 7732.196018] CPU: 0 PID: 2899 Comm: fsstress Not tainted 3.12.0-rc7-dgc+ #47
> [ 7732.196018] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 7732.196018] task: ffff88003a83ada0 ti: ffff88002c0ae000 task.ti: ffff88002c0ae000
> [ 7732.196018] RIP: 0010:[<ffffffff81488462>]  [<ffffffff81488462>] assfail+0x22/0x30
> [ 7732.196018] RSP: 0000:ffff88002c0afae8  EFLAGS: 00010292
> [ 7732.196018] RAX: 000000000000004d RBX: ffff880002e59600 RCX: 0000000000000000
> [ 7732.196018] RDX: ffff88003fc0ed68 RSI: ffff88003fc0d3f8 RDI: 0000000000000246
> [ 7732.196018] RBP: ffff88002c0afae8 R08: 0000000000000096 R09: 00000000000012ec
> [ 7732.196018] R10: 0000000000000000 R11: 00000000000012eb R12: 0000000000100002
> [ 7732.196018] R13: ffffffff81473523 R14: 0000000000009fff R15: 0000000000009fff
> [ 7732.196018] FS:  00007f8017e4a700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
> [ 7732.196018] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 7732.196018] CR2: 00007ff8d020a000 CR3: 0000000009b43000 CR4: 00000000000006f0
> [ 7732.196018] Stack:
> [ 7732.196018]  ffff88002c0afb08 ffffffff8147476e ffff880002e59600 0000000000100002
> [ 7732.196018]  ffff88002c0afb28 ffffffff814762cd ffff880002e59600 ffff880002e59600
> [ 7732.196018]  ffff88002c0afb58 ffffffff81476841 0000000000009f89 ffff880002e59600
> [ 7732.196018] Call Trace:
> [ 7732.196018]  [<ffffffff8147476e>] xfs_buf_stale+0x2e/0xb0
> [ 7732.196018]  [<ffffffff814762cd>] xfs_bioerror_relse+0x2d/0x90
> [ 7732.196018]  [<ffffffff81476841>] xfsbdstrat+0x51/0x90
> [ 7732.196018]  [<ffffffff81473523>] xfs_zero_remaining_bytes+0x1d3/0x2d0
> [ 7732.196018]  [<ffffffff81473ba6>] xfs_free_file_space+0x586/0x5b0
> [ 7732.196018]  [<ffffffff81190002>] ? slabs_cpu_partial_show+0xd2/0x120
> [ 7732.196018]  [<ffffffff811b9399>] ? mntput_no_expire+0x49/0x160
> [ 7732.196018]  [<ffffffff811b8812>] ? mnt_clone_write+0x12/0x30
> [ 7732.196018]  [<ffffffff81aba3f6>] ? down_write+0x16/0x40
> [ 7732.196018]  [<ffffffff81481c14>] xfs_ioc_space+0x2d4/0x450
> [ 7732.196018]  [<ffffffff811a700b>] ? path_lookupat+0x6b/0x760
> [ 7732.196018]  [<ffffffff8148f57e>] ? xfs_trans_free+0x6e/0x80
> [ 7732.196018]  [<ffffffff81192a81>] ? kmem_cache_alloc+0x31/0x150
> [ 7732.196018]  [<ffffffff8148326b>] xfs_file_ioctl+0x48b/0xae0
> [ 7732.196018]  [<ffffffff811a60d6>] ? final_putname+0x26/0x50
> [ 7732.196018]  [<ffffffff8110b6a2>] ? from_kgid+0x12/0x20
> [ 7732.196018]  [<ffffffff8110b6be>] ? from_kgid_munged+0xe/0x20
> [ 7732.196018]  [<ffffffff8119f856>] ? cp_new_stat+0x146/0x160
> [ 7732.196018]  [<ffffffff811ace02>] do_vfs_ioctl+0x452/0x530
> [ 7732.196018]  [<ffffffff8119fd35>] ? SYSC_newfstat+0x25/0x30
> [ 7732.196018]  [<ffffffff811acf71>] SyS_ioctl+0x91/0xb0
> [ 7732.196018]  [<ffffffff81ac6029>] system_call_fastpath+0x16/0x1b
> 
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_buf.c | 47 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 28 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index ce01c1a..27dc152 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1076,7 +1076,14 @@ xfs_bioerror(
>  	 */
>  	XFS_BUF_UNREAD(bp);
>  	XFS_BUF_UNDONE(bp);
> -	xfs_buf_stale(bp);
> +
> +	/*
> +	 * we might be handling an uncached buffer here, in which case the
> +	 * stale buffer handling is irrelevant as is doing IO with the buffer
> +	 * locked. Hence we don't mark them stale.
> +	 */
> +	if (bp->b_pag)
> +		xfs_buf_stale(bp);
>  
>  	xfs_buf_ioend(bp, 0);
>  
> @@ -1094,25 +1101,20 @@ xfs_bioerror_relse(
>  	struct xfs_buf	*bp)
>  {
>  	int64_t		fl = bp->b_flags;
> +
>  	/*
> -	 * No need to wait until the buffer is unpinned.
> -	 * We aren't flushing it.
> -	 *
> -	 * chunkhold expects B_DONE to be set, whether
> -	 * we actually finish the I/O or not. We don't want to
> -	 * change that interface.
> +	 * No need to wait until the buffer is unpinned. We aren't flushing it.
>  	 */
>  	XFS_BUF_UNREAD(bp);
>  	XFS_BUF_DONE(bp);
>  	xfs_buf_stale(bp);
>  	bp->b_iodone = NULL;
> +
> +	/*
> +	 * There's no reason to mark error for ASYNC buffers as there is no-one
> +	 * waiting to collect the error.
> +	 */
>  	if (!(fl & XBF_ASYNC)) {
> -		/*
> -		 * Mark b_error and B_ERROR _both_.
> -		 * Lot's of chunkcache code assumes that.
> -		 * There's no reason to mark error for
> -		 * ASYNC buffers.
> -		 */
>  		xfs_buf_ioerror(bp, EIO);
>  		complete(&bp->b_iowait);
>  	} else {
> @@ -1129,14 +1131,13 @@ xfs_bdstrat_cb(
>  	if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
>  		trace_xfs_bdstrat_shut(bp, _RET_IP_);
>  		/*
> -		 * Metadata write that didn't get logged but
> -		 * written delayed anyway. These aren't associated
> -		 * with a transaction, and can be ignored.
> +		 * If this is a cached write, then it is likely to be a delayed
> +		 * write metadata buffer that can be ignored because the
> +		 * contents are logged.
>  		 */
>  		if (!bp->b_iodone && !XFS_BUF_ISREAD(bp))
>  			return xfs_bioerror_relse(bp);
> -		else
> -			return xfs_bioerror(bp);
> +		return xfs_bioerror(bp);
>  	}
>  
>  	xfs_buf_iorequest(bp);
> @@ -1176,7 +1177,15 @@ xfsbdstrat(
>  {
>  	if (XFS_FORCED_SHUTDOWN(mp)) {
>  		trace_xfs_bdstrat_shut(bp, _RET_IP_);
> -		xfs_bioerror_relse(bp);
> +		/*
> +		 * we could be handling uncached IO here, in which case there is
> +		 * always a caller waiting to collect the error and releas the
> +		 * buffer.
> +		 */
> +		if (bp->b_pag)
> +			xfs_bioerror_relse(bp);
> +		else
> +			xfs_bioerror(bp);
>  		return;
>  	}
>  
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error.
  2013-12-12  9:30   ` Jeff Liu
@ 2013-12-12 10:09     ` Dave Chinner
  2013-12-13  4:47       ` Jeff Liu
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2013-12-12 10:09 UTC (permalink / raw)
  To: Jeff Liu; +Cc: xfs

On Thu, Dec 12, 2013 at 05:30:14PM +0800, Jeff Liu wrote:
> On 12/12 2013 13:34, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > fsstress failed during a shutdown with the following assert:
> Looks fsstress is always a good aid to make file system cry... :-P

Heh. That it is. :)


> I wonder if here the shutdown is simulated via xfstests/src/godown or not,
> but I can trigger another hang up issue with this patch via a test case
> below(80% reproducible, I also ran the test against the left patches in
> this series, this problem still exists):

I occassionally get that hang, too. It's a "once a month" type hang,
and I've never beenable to reproduce it reliably enough to debug.

> #!/bin/bash
> 
> mkfs.xfs -f /dev/sda7
> 
> for ((i=0;i<10;i++))
> do
>     mount /dev/sda7 /xfs
>     fsstress -d /xfs -n 1000 -p 100 >/dev/null 2>&1 &
>     sleep 10
>     godown /xfs
>     wait
>     killall -q fsstress
>     umount /xfs
> done
> 
> It seems there is no such kind of test cases in xfstestes for now, I'd
> write one if required.

nothing quite that generic - xfs/087 does a loop like that over
different log configurations, but that's testing log recovery more
than shutdown sanity. Adding that test would be a good idea - it's a
shame no other filesystem supports a shutdown like XFS does....

> The backtraces were shown as following:
> 
> [  365.987493] INFO: task fsstress:3215 blocked for more than 120 seconds.
> [  365.987499]       Tainted: PF          O 3.13.0-rc2+ #13
> [  365.987500] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  365.987502] fsstress        D ffff88026f254440     0  3215   3142 0x00000000
> [  365.987507]  ffff880253f19de0 0000000000000086 ffff880242071800 ffff880253f19fd8
> [  365.987512]  0000000000014440 0000000000014440 ffff880242071800 ffff880073694c00
> [  365.987515]  ffff880073694c80 ffff880073694c90 ffffffffffffffff 0000000000000292
> [  365.987519] Call Trace:
> [  365.987528]  [<ffffffff81718779>] schedule+0x29/0x70
> [  365.987560]  [<ffffffffa0c2a49d>] xlog_cil_force_lsn+0x18d/0x1e0 [xfs]
> [  365.987565]  [<ffffffff81097210>] ? wake_up_state+0x20/0x20
> [  365.987570]  [<ffffffff811e8770>] ? do_fsync+0x80/0x80
> [  365.987594]  [<ffffffffa0c28921>] _xfs_log_force+0x61/0x270 [xfs]
> [  365.987599]  [<ffffffff812b0610>] ? jbd2_log_wait_commit+0x110/0x180
> [  365.987603]  [<ffffffff810a83f0>] ? prepare_to_wait_event+0x100/0x100
> [  365.987607]  [<ffffffff811e8770>] ? do_fsync+0x80/0x80
> [  365.987629]  [<ffffffffa0c28b56>] xfs_log_force+0x26/0x80 [xfs]
> [  365.987648]  [<ffffffffa0bcf35d>] xfs_fs_sync_fs+0x2d/0x50 [xfs]
> [  365.987652]  [<ffffffff811e8790>] sync_fs_one_sb+0x20/0x30
> [  365.987656]  [<ffffffff811bcc32>] iterate_supers+0xb2/0x110
> [  365.987660]  [<ffffffff811e88c2>] sys_sync+0x62/0xa0
> [  365.987665]  [<ffffffff81724ced>] system_call_fastpath+0x1a/0x1f
> [  372.225302] XFS (sda7): xfs_log_force: error 5 returned.
> [  402.275608] XFS (sda7): xfs_log_force: error 5 returned.
> [  432.325929] XFS (sda7): xfs_log_force: error 5 returned.
> [  462.376239] XFS (sda7): xfs_log_force: error 5 returned.

So what we see here is that there is a race condition somewhere in
the shutdown code. The shutdown is supposed to wake everyone waiting
of the ic_force_wait wait queue on each iclog, but for some reason
that hasn't happened. The sleepers check for XLOG_STATE_IOERROR
(which is set during the force shutdown before we wake ic_force_wait
sleepers) before they go to sleep, so whatever the race is it isn't
immediately obvious to me.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error.
  2013-12-12  5:34 ` [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error Dave Chinner
  2013-12-12  9:30   ` Jeff Liu
@ 2013-12-12 16:36   ` Christoph Hellwig
  2013-12-12 22:24     ` Dave Chinner
  2013-12-13 13:02   ` Christoph Hellwig
  2 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2013-12-12 16:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

I really don't like how this makes even more of a mess out of the
already convoluted xfs_bioerror/xfs_bioerror_else maze.  Can we
maybe first merge them and document the difference before adding
even more special case branches?

Also most uses of uncached buffers use xfsbdstrat, where we can do
error handling straight in the caller instead of playing with all
the flags manipulation mess.  In all these cases no one but the
caller can find these buffers anyway, so doing all this on an
I/O error is pointless.

The only buffer where any of this matters is the superblock one,
and given that we re-read it on mount anyway I wonder if we should
just make it a regular buffer again and let all this mess just
disappear.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error.
  2013-12-12 16:36   ` Christoph Hellwig
@ 2013-12-12 22:24     ` Dave Chinner
  2013-12-13 11:01       ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2013-12-12 22:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Dec 12, 2013 at 08:36:29AM -0800, Christoph Hellwig wrote:
> I really don't like how this makes even more of a mess out of the
> already convoluted xfs_bioerror/xfs_bioerror_else maze.  Can we
> maybe first merge them and document the difference before adding
> even more special case branches?
> 
> Also most uses of uncached buffers use xfsbdstrat, where we can do
> error handling straight in the caller instead of playing with all
> the flags manipulation mess.  In all these cases no one but the
> caller can find these buffers anyway, so doing all this on an
> I/O error is pointless.
> 
> The only buffer where any of this matters is the superblock one,
> and given that we re-read it on mount anyway I wonder if we should
> just make it a regular buffer again and let all this mess just
> disappear.

Ok, I agree it is a bit messy, but that code is already pretty ugly.
I'd like to get this fix in first, because it's causing oopses in
roughly 30% of my local xfstests runs on a couple of VMs, so I'd
prefer to get the fix out there now and do the cleanup as a separate
patch series. Would that be an acceptible approach to take here from
your perspective?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error.
  2013-12-12 10:09     ` Dave Chinner
@ 2013-12-13  4:47       ` Jeff Liu
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Liu @ 2013-12-13  4:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


On 12/12 2013 18:09 PM, Dave Chinner wrote:
> On Thu, Dec 12, 2013 at 05:30:14PM +0800, Jeff Liu wrote:
>> On 12/12 2013 13:34, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner@redhat.com>
<snip>
>>
>> It seems there is no such kind of test cases in xfstestes for now, I'd
>> write one if required.
> 
> nothing quite that generic - xfs/087 does a loop like that over
> different log configurations, but that's testing log recovery more
> than shutdown sanity. Adding that test would be a good idea - it's a
> shame no other filesystem supports a shutdown like XFS does....
This is really an unique feature of us :), I'll write a case so.
>
>> The backtraces were shown as following:
>>
>> [  365.987493] INFO: task fsstress:3215 blocked for more than 120 seconds.
>> [  365.987499]       Tainted: PF          O 3.13.0-rc2+ #13
>> [  365.987500] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [  365.987502] fsstress        D ffff88026f254440     0  3215   3142 0x00000000
>> [  365.987507]  ffff880253f19de0 0000000000000086 ffff880242071800 ffff880253f19fd8
>> [  365.987512]  0000000000014440 0000000000014440 ffff880242071800 ffff880073694c00
>> [  365.987515]  ffff880073694c80 ffff880073694c90 ffffffffffffffff 0000000000000292
>> [  365.987519] Call Trace:
>> [  365.987528]  [<ffffffff81718779>] schedule+0x29/0x70
>> [  365.987560]  [<ffffffffa0c2a49d>] xlog_cil_force_lsn+0x18d/0x1e0 [xfs]
>> [  365.987565]  [<ffffffff81097210>] ? wake_up_state+0x20/0x20
>> [  365.987570]  [<ffffffff811e8770>] ? do_fsync+0x80/0x80
>> [  365.987594]  [<ffffffffa0c28921>] _xfs_log_force+0x61/0x270 [xfs]
>> [  365.987599]  [<ffffffff812b0610>] ? jbd2_log_wait_commit+0x110/0x180
>> [  365.987603]  [<ffffffff810a83f0>] ? prepare_to_wait_event+0x100/0x100
>> [  365.987607]  [<ffffffff811e8770>] ? do_fsync+0x80/0x80
>> [  365.987629]  [<ffffffffa0c28b56>] xfs_log_force+0x26/0x80 [xfs]
>> [  365.987648]  [<ffffffffa0bcf35d>] xfs_fs_sync_fs+0x2d/0x50 [xfs]
>> [  365.987652]  [<ffffffff811e8790>] sync_fs_one_sb+0x20/0x30
>> [  365.987656]  [<ffffffff811bcc32>] iterate_supers+0xb2/0x110
>> [  365.987660]  [<ffffffff811e88c2>] sys_sync+0x62/0xa0
>> [  365.987665]  [<ffffffff81724ced>] system_call_fastpath+0x1a/0x1f
>> [  372.225302] XFS (sda7): xfs_log_force: error 5 returned.
>> [  402.275608] XFS (sda7): xfs_log_force: error 5 returned.
>> [  432.325929] XFS (sda7): xfs_log_force: error 5 returned.
>> [  462.376239] XFS (sda7): xfs_log_force: error 5 returned.
> 
> So what we see here is that there is a race condition somewhere in
> the shutdown code. The shutdown is supposed to wake everyone waiting
> of the ic_force_wait wait queue on each iclog, but for some reason
> that hasn't happened. The sleepers check for XLOG_STATE_IOERROR
> (which is set during the force shutdown before we wake ic_force_wait
> sleepers) before they go to sleep, so whatever the race is it isn't
> immediately obvious to me.
Now I basically can always reproducing this problem on SSD, so I'm
going to get involved in tracing down it.

Thanks,
-Jeff

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error.
  2013-12-12 22:24     ` Dave Chinner
@ 2013-12-13 11:01       ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2013-12-13 11:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Fri, Dec 13, 2013 at 09:24:38AM +1100, Dave Chinner wrote:
> Ok, I agree it is a bit messy, but that code is already pretty ugly.
> I'd like to get this fix in first, because it's causing oopses in
> roughly 30% of my local xfstests runs on a couple of VMs, so I'd
> prefer to get the fix out there now and do the cleanup as a separate
> patch series. Would that be an acceptible approach to take here from
> your perspective?

Ok, let's put the minimal fix in for now and then sort it out.  I'll
re-review it carefully.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] xfs: swalloc doesn't align allocations properly
  2013-12-12  5:34 ` [PATCH 4/6] xfs: swalloc doesn't align allocations properly Dave Chinner
@ 2013-12-13 12:01   ` Christoph Hellwig
  2013-12-16 23:14     ` Ben Myers
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2013-12-13 12:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks good.

Reviewed-by: Christoph Hellwig <hch@lst.de>

Two very minor nitpicks below:

> +	int		stripe_align;
>  
>  	ASSERT(ap->length);
>  
>  	mp = ap->ip->i_mount;
> +
> +	/* stripe alignment for allocation is determined by mount parameters */
> +	stripe_align = 0;
> +	if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC))
> +		stripe_align = mp->m_swidth;
> +	else if (mp->m_dalign)
> +		stripe_align = mp->m_dalign;

nipick: I'd either initialize the variable to zero at the point of the
declaration or do if .. else if .. else here.

>  	}
> +
> +
>  	nullfb = *ap->firstblock == NULLFSBLOCK;

Two newlines seem odd here.  I'd support one even if that's an unrelated
change :)

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] xfs: xlog_recover_process_data leaks like a sieve
  2013-12-12  5:34 ` [PATCH 5/6] xfs: xlog_recover_process_data leaks like a sieve Dave Chinner
@ 2013-12-13 12:32   ` Christoph Hellwig
  2013-12-13 22:11     ` Dave Chinner
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2013-12-13 12:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Dec 12, 2013 at 04:34:37PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Fix the double free of the transaction structure introduced by
> commit 2a84108 ("xfs: free the list of recovery items on error").
> In the process, make the freeing of the trans structure on error or
> completion of processing consistent - i.e. the responsibility of the
> the function that detected the error or completes processing. Add
> comments to document this behaviour so it can be maintained more
> easily in future.

I don't really understand why we'd want to push the freeing into
more low-level functions.

e.g. keeping it in xlog_recover_process_data vs the low-level
functions called by it not only reduces the amount of code, but also
is way more logical as we lookup trans there, so freeing it seems
more logical as well.

> +			if (trans)
> +				xlog_recover_free_trans(trans);

goto out_free_trans;

>  			if (dp + be32_to_cpu(ohead->oh_len) > lp) {
> -				xfs_warn(log->l_mp, "%s: bad length 0x%x",
> +				xfs_warn(log->l_mp,
> +				"%s: bad transaction opheader length 0x%x",
>  					__func__, be32_to_cpu(ohead->oh_len));
>  				WARN_ON(1);
> -				return (XFS_ERROR(EIO));
> +				xlog_recover_free_trans(trans);

goto out_free_trans;

> +			/*
> +			 * If there's been an error, the trans structure has
> +			 * already been freed. So there's nothing for us to do
> +			 * but abort the recovery process.
> +			 */
> +			if (error)
> +				return error;

To me it seems we'd be better off doing a goto out_free_trans here
aswell, then remove the existing call to xlog_recover_free_trans in
xlog_recover_commit_trans for the error case, and keep it out of
xlog_recover_add_to_trans.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] xfs: abort metadata writeback on permanent errors
  2013-12-12  5:34 ` [PATCH 6/6] xfs: abort metadata writeback on permanent errors Dave Chinner
@ 2013-12-13 12:33   ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2013-12-13 12:33 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error.
  2013-12-12  5:34 ` [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error Dave Chinner
  2013-12-12  9:30   ` Jeff Liu
  2013-12-12 16:36   ` Christoph Hellwig
@ 2013-12-13 13:02   ` Christoph Hellwig
  2013-12-16 22:44     ` Ben Myers
  2 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2013-12-13 13:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

I think the real problem here is not nessecarily marking uncached
buffers as stale, but marking unlocked buffers as stale.  This kinda
overlaps because we only really ever do I/O on unlocked buffers if they
are uncached too as it would be safe otherwise.

I think this is easily fixable by never calling xfsbdstrat on unlocked
buffers, and as an extension simply killing xfsbdstrat as it's already
fairly useless.  The patch below replaces all calls of xfsbdstrat with
trivial error handling for the callers that have local uncached buffers,
and opencodes it in the one remaining other caller.  There's a lot left
to be cleaned up in this area, but this seems like the least invasive
patch that doesn't cause more of a mess.

---
From: Christoph Hellwig <hch@infradead.org>
Subject: [PATCH] xfs: remove xfsbdstrat

The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that
handles the case of a shut down filesystem.  Most of the users have private,
uncached buffers that can just be freed in this case, but the complex error
handling in xfs_bioerror_relse messes up the case when it's called without
a locked buffer.

Remove xfsbdstrat and opencode the error handling in the callers.  All but
one can simply return an error and don't need to deal with buffer state,
and the one caller that cares about the buffer state could do with a major
cleanup as well, but we'll defer that to later.

Signed-off-by: Christoph Hellwig <hch@lst.de>

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 5887e41..1394106 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1187,7 +1187,12 @@ xfs_zero_remaining_bytes(
 		XFS_BUF_UNWRITE(bp);
 		XFS_BUF_READ(bp);
 		XFS_BUF_SET_ADDR(bp, xfs_fsb_to_db(ip, imap.br_startblock));
-		xfsbdstrat(mp, bp);
+
+		if (XFS_FORCED_SHUTDOWN(mp)) {
+			error = XFS_ERROR(EIO);
+			break;
+		}
+		xfs_buf_iorequest(bp);
 		error = xfs_buf_iowait(bp);
 		if (error) {
 			xfs_buf_ioerror_alert(bp,
@@ -1200,7 +1205,12 @@ xfs_zero_remaining_bytes(
 		XFS_BUF_UNDONE(bp);
 		XFS_BUF_UNREAD(bp);
 		XFS_BUF_WRITE(bp);
-		xfsbdstrat(mp, bp);
+
+		if (XFS_FORCED_SHUTDOWN(mp)) {
+			error = XFS_ERROR(EIO);
+			break;
+		}
+		xfs_buf_iorequest(bp);
 		error = xfs_buf_iowait(bp);
 		if (error) {
 			xfs_buf_ioerror_alert(bp,
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index ce01c1a..544315e 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -698,7 +698,11 @@ xfs_buf_read_uncached(
 	bp->b_flags |= XBF_READ;
 	bp->b_ops = ops;
 
-	xfsbdstrat(target->bt_mount, bp);
+	if (XFS_FORCED_SHUTDOWN(target->bt_mount)) {
+		xfs_buf_relse(bp);
+		return NULL;
+	}
+	xfs_buf_iorequest(bp);
 	xfs_buf_iowait(bp);
 	return bp;
 }
@@ -1089,7 +1093,7 @@ xfs_bioerror(
  * This is meant for userdata errors; metadata bufs come with
  * iodone functions attached, so that we can track down errors.
  */
-STATIC int
+int
 xfs_bioerror_relse(
 	struct xfs_buf	*bp)
 {
@@ -1164,25 +1168,6 @@ xfs_bwrite(
 	return error;
 }
 
-/*
- * Wrapper around bdstrat so that we can stop data from going to disk in case
- * we are shutting down the filesystem.  Typically user data goes thru this
- * path; one of the exceptions is the superblock.
- */
-void
-xfsbdstrat(
-	struct xfs_mount	*mp,
-	struct xfs_buf		*bp)
-{
-	if (XFS_FORCED_SHUTDOWN(mp)) {
-		trace_xfs_bdstrat_shut(bp, _RET_IP_);
-		xfs_bioerror_relse(bp);
-		return;
-	}
-
-	xfs_buf_iorequest(bp);
-}
-
 STATIC void
 _xfs_buf_ioend(
 	xfs_buf_t		*bp,
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index e656833..7e41b08 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -269,9 +269,6 @@ extern void xfs_buf_unlock(xfs_buf_t *);
 
 /* Buffer Read and Write Routines */
 extern int xfs_bwrite(struct xfs_buf *bp);
-
-extern void xfsbdstrat(struct xfs_mount *, struct xfs_buf *);
-
 extern void xfs_buf_ioend(xfs_buf_t *,	int);
 extern void xfs_buf_ioerror(xfs_buf_t *, int);
 extern void xfs_buf_ioerror_alert(struct xfs_buf *, const char *func);
@@ -282,6 +279,8 @@ extern void xfs_buf_iomove(xfs_buf_t *, size_t, size_t, void *,
 #define xfs_buf_zero(bp, off, len) \
 	    xfs_buf_iomove((bp), (off), (len), NULL, XBRW_ZERO)
 
+extern int xfs_bioerror_relse(struct xfs_buf *);
+
 static inline int xfs_buf_geterror(xfs_buf_t *bp)
 {
 	return bp ? bp->b_error : ENOMEM;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 07ab52c..73c1493 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -193,7 +193,10 @@ xlog_bread_noalign(
 	bp->b_io_length = nbblks;
 	bp->b_error = 0;
 
-	xfsbdstrat(log->l_mp, bp);
+	if (XFS_FORCED_SHUTDOWN(log->l_mp))
+		return XFS_ERROR(EIO);
+
+	xfs_buf_iorequest(bp);
 	error = xfs_buf_iowait(bp);
 	if (error)
 		xfs_buf_ioerror_alert(bp, __func__);
@@ -4408,7 +4411,11 @@ xlog_do_recover(
 	XFS_BUF_READ(bp);
 	XFS_BUF_UNASYNC(bp);
 	bp->b_ops = &xfs_sb_buf_ops;
-	xfsbdstrat(log->l_mp, bp);
+
+	if (XFS_FORCED_SHUTDOWN(log->l_mp))
+		return XFS_ERROR(EIO);
+
+	xfs_buf_iorequest(bp);
 	error = xfs_buf_iowait(bp);
 	if (error) {
 		xfs_buf_ioerror_alert(bp, __func__);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index c035d11..647b6f1 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -314,7 +314,18 @@ xfs_trans_read_buf_map(
 			ASSERT(bp->b_iodone == NULL);
 			XFS_BUF_READ(bp);
 			bp->b_ops = ops;
-			xfsbdstrat(tp->t_mountp, bp);
+
+			/*
+			 * XXX(hch): clean up the error handling here to be less
+			 * of a mess..
+			 */
+			if (XFS_FORCED_SHUTDOWN(mp)) {
+				trace_xfs_bdstrat_shut(bp, _RET_IP_);
+				xfs_bioerror_relse(bp);
+			} else {
+				xfs_buf_iorequest(bp);
+			}
+
 			error = xfs_buf_iowait(bp);
 			if (error) {
 				xfs_buf_ioerror_alert(bp, __func__);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] xfs: xlog_recover_process_data leaks like a sieve
  2013-12-13 12:32   ` Christoph Hellwig
@ 2013-12-13 22:11     ` Dave Chinner
  2013-12-16 15:23       ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2013-12-13 22:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Dec 13, 2013 at 04:32:05AM -0800, Christoph Hellwig wrote:
> On Thu, Dec 12, 2013 at 04:34:37PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Fix the double free of the transaction structure introduced by
> > commit 2a84108 ("xfs: free the list of recovery items on error").
> > In the process, make the freeing of the trans structure on error or
> > completion of processing consistent - i.e. the responsibility of the
> > the function that detected the error or completes processing. Add
> > comments to document this behaviour so it can be maintained more
> > easily in future.
> 
> I don't really understand why we'd want to push the freeing into
> more low-level functions.
> 
> e.g. keeping it in xlog_recover_process_data vs the low-level
> functions called by it not only reduces the amount of code, but also
> is way more logical as we lookup trans there, so freeing it seems
> more logical as well.
> 
> > +			if (trans)
> > +				xlog_recover_free_trans(trans);
> 
> goto out_free_trans;
> 
> >  			if (dp + be32_to_cpu(ohead->oh_len) > lp) {
> > -				xfs_warn(log->l_mp, "%s: bad length 0x%x",
> > +				xfs_warn(log->l_mp,
> > +				"%s: bad transaction opheader length 0x%x",
> >  					__func__, be32_to_cpu(ohead->oh_len));
> >  				WARN_ON(1);
> > -				return (XFS_ERROR(EIO));
> > +				xlog_recover_free_trans(trans);
> 
> goto out_free_trans;
> 
> > +			/*
> > +			 * If there's been an error, the trans structure has
> > +			 * already been freed. So there's nothing for us to do
> > +			 * but abort the recovery process.
> > +			 */
> > +			if (error)
> > +				return error;
> 
> To me it seems we'd be better off doing a goto out_free_trans here
> aswell, then remove the existing call to xlog_recover_free_trans in
> xlog_recover_commit_trans for the error case, and keep it out of
> xlog_recover_add_to_trans.

I'll rework it, but hte main issue is that it has to be freed
regardless of the error value in commit record processing, so it's
not as simple as just freeing it on error....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] xfs: xlog_recover_process_data leaks like a sieve
  2013-12-13 22:11     ` Dave Chinner
@ 2013-12-16 15:23       ` Christoph Hellwig
  2013-12-17 17:58         ` Mark Tinguely
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2013-12-16 15:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Sat, Dec 14, 2013 at 09:11:02AM +1100, Dave Chinner wrote:
> I'll rework it, but hte main issue is that it has to be freed
> regardless of the error value in commit record processing, so it's
> not as simple as just freeing it on error....

Indeed, but think that's another reason to move the freeing to
xlog_recover_process_data.  Right now or with the proposed patch
xlog_recover_commit_trans frees trans, but there's nothing that
breaks out of the loop in xlog_recover_process_data after that case.

By moving the freeing there for all cases we can add a corruption
check for that case, and we have a single function that controls
the lifetime of the xlog_recover structure.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error.
  2013-12-13 13:02   ` Christoph Hellwig
@ 2013-12-16 22:44     ` Ben Myers
  2013-12-17  8:03       ` [PATCH v2] xfs: remove xfsbdstrat error Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Myers @ 2013-12-16 22:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Dec 13, 2013 at 05:02:30AM -0800, Christoph Hellwig wrote:
> I think the real problem here is not nessecarily marking uncached
> buffers as stale, but marking unlocked buffers as stale.  This kinda
> overlaps because we only really ever do I/O on unlocked buffers if they
> are uncached too as it would be safe otherwise.
> 
> I think this is easily fixable by never calling xfsbdstrat on unlocked
> buffers, and as an extension simply killing xfsbdstrat as it's already
> fairly useless.  The patch below replaces all calls of xfsbdstrat with
> trivial error handling for the callers that have local uncached buffers,
> and opencodes it in the one remaining other caller.  There's a lot left
> to be cleaned up in this area, but this seems like the least invasive
> patch that doesn't cause more of a mess.
> 
> ---
> From: Christoph Hellwig <hch@infradead.org>
> Subject: [PATCH] xfs: remove xfsbdstrat
> 
> The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that
> handles the case of a shut down filesystem.  Most of the users have private,
> uncached buffers that can just be freed in this case, but the complex error
> handling in xfs_bioerror_relse messes up the case when it's called without
> a locked buffer.
> 
> Remove xfsbdstrat and opencode the error handling in the callers.  All but
> one can simply return an error and don't need to deal with buffer state,
> and the one caller that cares about the buffer state could do with a major
> cleanup as well, but we'll defer that to later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 5887e41..1394106 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1187,7 +1187,12 @@ xfs_zero_remaining_bytes(
>  		XFS_BUF_UNWRITE(bp);
>  		XFS_BUF_READ(bp);
>  		XFS_BUF_SET_ADDR(bp, xfs_fsb_to_db(ip, imap.br_startblock));
> -		xfsbdstrat(mp, bp);
> +
> +		if (XFS_FORCED_SHUTDOWN(mp)) {
> +			error = XFS_ERROR(EIO);
> +			break;
> +		}
> +		xfs_buf_iorequest(bp);
>  		error = xfs_buf_iowait(bp);
>  		if (error) {
>  			xfs_buf_ioerror_alert(bp,
> @@ -1200,7 +1205,12 @@ xfs_zero_remaining_bytes(
>  		XFS_BUF_UNDONE(bp);
>  		XFS_BUF_UNREAD(bp);
>  		XFS_BUF_WRITE(bp);
> -		xfsbdstrat(mp, bp);
> +
> +		if (XFS_FORCED_SHUTDOWN(mp)) {
> +			error = XFS_ERROR(EIO);
> +			break;
> +		}
> +		xfs_buf_iorequest(bp);
>  		error = xfs_buf_iowait(bp);
>  		if (error) {
>  			xfs_buf_ioerror_alert(bp,
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index ce01c1a..544315e 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -698,7 +698,11 @@ xfs_buf_read_uncached(
>  	bp->b_flags |= XBF_READ;
>  	bp->b_ops = ops;
>  
> -	xfsbdstrat(target->bt_mount, bp);
> +	if (XFS_FORCED_SHUTDOWN(target->bt_mount)) {
> +		xfs_buf_relse(bp);
> +		return NULL;
> +	}
> +	xfs_buf_iorequest(bp);
>  	xfs_buf_iowait(bp);
>  	return bp;
>  }
> @@ -1089,7 +1093,7 @@ xfs_bioerror(
>   * This is meant for userdata errors; metadata bufs come with
>   * iodone functions attached, so that we can track down errors.
>   */
> -STATIC int
> +int
>  xfs_bioerror_relse(
>  	struct xfs_buf	*bp)
>  {
> @@ -1164,25 +1168,6 @@ xfs_bwrite(
>  	return error;
>  }
>  
> -/*
> - * Wrapper around bdstrat so that we can stop data from going to disk in case
> - * we are shutting down the filesystem.  Typically user data goes thru this
> - * path; one of the exceptions is the superblock.
> - */
> -void
> -xfsbdstrat(
> -	struct xfs_mount	*mp,
> -	struct xfs_buf		*bp)
> -{
> -	if (XFS_FORCED_SHUTDOWN(mp)) {
> -		trace_xfs_bdstrat_shut(bp, _RET_IP_);
> -		xfs_bioerror_relse(bp);
> -		return;
> -	}
> -
> -	xfs_buf_iorequest(bp);
> -}
> -
>  STATIC void
>  _xfs_buf_ioend(
>  	xfs_buf_t		*bp,
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index e656833..7e41b08 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -269,9 +269,6 @@ extern void xfs_buf_unlock(xfs_buf_t *);
>  
>  /* Buffer Read and Write Routines */
>  extern int xfs_bwrite(struct xfs_buf *bp);
> -
> -extern void xfsbdstrat(struct xfs_mount *, struct xfs_buf *);
> -
>  extern void xfs_buf_ioend(xfs_buf_t *,	int);
>  extern void xfs_buf_ioerror(xfs_buf_t *, int);
>  extern void xfs_buf_ioerror_alert(struct xfs_buf *, const char *func);
> @@ -282,6 +279,8 @@ extern void xfs_buf_iomove(xfs_buf_t *, size_t, size_t, void *,
>  #define xfs_buf_zero(bp, off, len) \
>  	    xfs_buf_iomove((bp), (off), (len), NULL, XBRW_ZERO)
>  
> +extern int xfs_bioerror_relse(struct xfs_buf *);
> +
>  static inline int xfs_buf_geterror(xfs_buf_t *bp)
>  {
>  	return bp ? bp->b_error : ENOMEM;
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 07ab52c..73c1493 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -193,7 +193,10 @@ xlog_bread_noalign(
>  	bp->b_io_length = nbblks;
>  	bp->b_error = 0;
>  
> -	xfsbdstrat(log->l_mp, bp);
> +	if (XFS_FORCED_SHUTDOWN(log->l_mp))
> +		return XFS_ERROR(EIO);
> +
> +	xfs_buf_iorequest(bp);
>  	error = xfs_buf_iowait(bp);
>  	if (error)
>  		xfs_buf_ioerror_alert(bp, __func__);
> @@ -4408,7 +4411,11 @@ xlog_do_recover(
>  	XFS_BUF_READ(bp);
>  	XFS_BUF_UNASYNC(bp);
>  	bp->b_ops = &xfs_sb_buf_ops;
> -	xfsbdstrat(log->l_mp, bp);
> +
> +	if (XFS_FORCED_SHUTDOWN(log->l_mp))
> +		return XFS_ERROR(EIO);
> +

I think we need a

xfs_buf_relse(bp);

before returning.  Looks good otherwise.

Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] xfs: swalloc doesn't align allocations properly
  2013-12-13 12:01   ` Christoph Hellwig
@ 2013-12-16 23:14     ` Ben Myers
  2013-12-17  3:39       ` Dave Chinner
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Myers @ 2013-12-16 23:14 UTC (permalink / raw)
  To: Christoph Hellwig, Dave Chinner; +Cc: xfs

Hi Gents,

On Fri, Dec 13, 2013 at 04:01:23AM -0800, Christoph Hellwig wrote:
> Looks good.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Two very minor nitpicks below:
> 
> > +	int		stripe_align;
> >  
> >  	ASSERT(ap->length);
> >  
> >  	mp = ap->ip->i_mount;
> > +
> > +	/* stripe alignment for allocation is determined by mount parameters */
> > +	stripe_align = 0;
> > +	if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC))
> > +		stripe_align = mp->m_swidth;
> > +	else if (mp->m_dalign)
> > +		stripe_align = mp->m_dalign;
> 
> nipick: I'd either initialize the variable to zero at the point of the
> declaration or do if .. else if .. else here.
> 
> >  	}
> > +
> > +
> >  	nullfb = *ap->firstblock == NULLFSBLOCK;
> 
> Two newlines seem odd here.  I'd support one even if that's an unrelated
> change :)

This is probably not the right thing to do for small files.  They will all end
up in the first stripe unit.

Quoting jpk from
http://lwn.net/Articles/87526/                                                                                                                                                                                   
                                                                                                                                                                                                                 
"  o [XFS] Add support for allocating additional file space in stripe width
sized chunks. A new fstab/mount option, "swalloc" has been defined. If
specified when mounting a striped file system, allocation requests will be
rounded up to a stripe width if the file size is >= stripe width, and the data
is being appended to eof. The 'swalloc' option is "off" by default. "


This feature was likely designed with volume stripes in mind as opposed to to
raid stripes.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] xfs: swalloc doesn't align allocations properly
  2013-12-16 23:14     ` Ben Myers
@ 2013-12-17  3:39       ` Dave Chinner
  2013-12-17 14:59         ` Ben Myers
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2013-12-17  3:39 UTC (permalink / raw)
  To: Ben Myers; +Cc: Christoph Hellwig, xfs

On Mon, Dec 16, 2013 at 05:14:14PM -0600, Ben Myers wrote:
> Hi Gents,
> 
> On Fri, Dec 13, 2013 at 04:01:23AM -0800, Christoph Hellwig wrote:
> > Looks good.
> > 
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > 
> > Two very minor nitpicks below:
> > 
> > > +	int		stripe_align;
> > >  
> > >  	ASSERT(ap->length);
> > >  
> > >  	mp = ap->ip->i_mount;
> > > +
> > > +	/* stripe alignment for allocation is determined by mount parameters */
> > > +	stripe_align = 0;
> > > +	if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC))
> > > +		stripe_align = mp->m_swidth;
> > > +	else if (mp->m_dalign)
> > > +		stripe_align = mp->m_dalign;
> > 
> > nipick: I'd either initialize the variable to zero at the point of the
> > declaration or do if .. else if .. else here.
> > 
> > >  	}
> > > +
> > > +
> > >  	nullfb = *ap->firstblock == NULLFSBLOCK;
> > 
> > Two newlines seem odd here.  I'd support one even if that's an unrelated
> > change :)
> 
> This is probably not the right thing to do for small files.  They will all end
> up in the first stripe unit.

You're right, it's not the right thing to do for small files. And we
don't, because the ap->aeof that triggers aligned allocation only
when:

	/*
	 * Only want to do the alignment at the eof if it is userdata and
	 * allocation length is larger than a stripe unit.
	 */
	if (mp->m_dalign && bma->length >= mp->m_dalign &&
	    !(bma->flags & XFS_BMAPI_METADATA) && whichfork == XFS_DATA_FORK) {
		error = xfs_bmap_isaeof(bma, whichfork);
		if (error)
			return error;
	}

The requested allocation length is greater than the stripe unit that
is configured.

So, we never align small files, regardless of the mount option....

> Quoting jpk from
> http://lwn.net/Articles/87526/
> "  o [XFS] Add support for allocating additional file space in stripe width
> sized chunks. A new fstab/mount option, "swalloc" has been defined. If
> specified when mounting a striped file system, allocation requests will be
> rounded up to a stripe width if the file size is >= stripe width, and the data
> is being appended to eof. The 'swalloc' option is "off" by default. "

You can find the actual commit in the oss archive tree:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=2da28d20c986a1a16e266f9c3e90dd967267f2bd

> This feature was likely designed with volume stripes in mind as
> opposed to to raid stripes.

I'd say go look up PV783527 and see what the problem it was solving
was... ;)

But regardless, if someone has askked for swalloc, it's because they
have an application doing large sequential IO, and when that happens
there's no reason not to stripe width align the allocation.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2] xfs: remove xfsbdstrat error.
  2013-12-16 22:44     ` Ben Myers
@ 2013-12-17  8:03       ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2013-12-17  8:03 UTC (permalink / raw)
  To: Ben Myers; +Cc: Christoph Hellwig, xfs

The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that
handles the case of a shut down filesystem.  Most of the users have private,
uncached buffers that can just be freed in this case, but the complex error
handling in xfs_bioerror_relse messes up the case when it's called without
a locked buffer.

Remove xfsbdstrat and opencode the error handling in the callers.  All but
one can simply return an error and don't need to deal with buffer state,
and the one caller that cares about the buffer state could do with a major
cleanup as well, but we'll defer that to later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ben Myers <bpm@sgi.com>

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 6fc6bbe..16646e4 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1209,7 +1209,12 @@ xfs_zero_remaining_bytes(
 		XFS_BUF_UNWRITE(bp);
 		XFS_BUF_READ(bp);
 		XFS_BUF_SET_ADDR(bp, xfs_fsb_to_db(ip, imap.br_startblock));
-		xfsbdstrat(mp, bp);
+
+		if (XFS_FORCED_SHUTDOWN(mp)) {
+			error = XFS_ERROR(EIO);
+			break;
+		}
+		xfs_buf_iorequest(bp);
 		error = xfs_buf_iowait(bp);
 		if (error) {
 			xfs_buf_ioerror_alert(bp,
@@ -1222,7 +1227,12 @@ xfs_zero_remaining_bytes(
 		XFS_BUF_UNDONE(bp);
 		XFS_BUF_UNREAD(bp);
 		XFS_BUF_WRITE(bp);
-		xfsbdstrat(mp, bp);
+
+		if (XFS_FORCED_SHUTDOWN(mp)) {
+			error = XFS_ERROR(EIO);
+			break;
+		}
+		xfs_buf_iorequest(bp);
 		error = xfs_buf_iowait(bp);
 		if (error) {
 			xfs_buf_ioerror_alert(bp,
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index ce01c1a..544315e 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -698,7 +698,11 @@ xfs_buf_read_uncached(
 	bp->b_flags |= XBF_READ;
 	bp->b_ops = ops;
 
-	xfsbdstrat(target->bt_mount, bp);
+	if (XFS_FORCED_SHUTDOWN(target->bt_mount)) {
+		xfs_buf_relse(bp);
+		return NULL;
+	}
+	xfs_buf_iorequest(bp);
 	xfs_buf_iowait(bp);
 	return bp;
 }
@@ -1089,7 +1093,7 @@ xfs_bioerror(
  * This is meant for userdata errors; metadata bufs come with
  * iodone functions attached, so that we can track down errors.
  */
-STATIC int
+int
 xfs_bioerror_relse(
 	struct xfs_buf	*bp)
 {
@@ -1164,25 +1168,6 @@ xfs_bwrite(
 	return error;
 }
 
-/*
- * Wrapper around bdstrat so that we can stop data from going to disk in case
- * we are shutting down the filesystem.  Typically user data goes thru this
- * path; one of the exceptions is the superblock.
- */
-void
-xfsbdstrat(
-	struct xfs_mount	*mp,
-	struct xfs_buf		*bp)
-{
-	if (XFS_FORCED_SHUTDOWN(mp)) {
-		trace_xfs_bdstrat_shut(bp, _RET_IP_);
-		xfs_bioerror_relse(bp);
-		return;
-	}
-
-	xfs_buf_iorequest(bp);
-}
-
 STATIC void
 _xfs_buf_ioend(
 	xfs_buf_t		*bp,
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index e656833..7e41b08 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -269,9 +269,6 @@ extern void xfs_buf_unlock(xfs_buf_t *);
 
 /* Buffer Read and Write Routines */
 extern int xfs_bwrite(struct xfs_buf *bp);
-
-extern void xfsbdstrat(struct xfs_mount *, struct xfs_buf *);
-
 extern void xfs_buf_ioend(xfs_buf_t *,	int);
 extern void xfs_buf_ioerror(xfs_buf_t *, int);
 extern void xfs_buf_ioerror_alert(struct xfs_buf *, const char *func);
@@ -282,6 +279,8 @@ extern void xfs_buf_iomove(xfs_buf_t *, size_t, size_t, void *,
 #define xfs_buf_zero(bp, off, len) \
 	    xfs_buf_iomove((bp), (off), (len), NULL, XBRW_ZERO)
 
+extern int xfs_bioerror_relse(struct xfs_buf *);
+
 static inline int xfs_buf_geterror(xfs_buf_t *bp)
 {
 	return bp ? bp->b_error : ENOMEM;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 07ab52c..f6c5ede 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -193,7 +193,10 @@ xlog_bread_noalign(
 	bp->b_io_length = nbblks;
 	bp->b_error = 0;
 
-	xfsbdstrat(log->l_mp, bp);
+	if (XFS_FORCED_SHUTDOWN(log->l_mp))
+		return XFS_ERROR(EIO);
+
+	xfs_buf_iorequest(bp);
 	error = xfs_buf_iowait(bp);
 	if (error)
 		xfs_buf_ioerror_alert(bp, __func__);
@@ -4408,7 +4411,13 @@ xlog_do_recover(
 	XFS_BUF_READ(bp);
 	XFS_BUF_UNASYNC(bp);
 	bp->b_ops = &xfs_sb_buf_ops;
-	xfsbdstrat(log->l_mp, bp);
+
+	if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
+		xfs_buf_relse(bp);
+		return XFS_ERROR(EIO);
+	}
+
+	xfs_buf_iorequest(bp);
 	error = xfs_buf_iowait(bp);
 	if (error) {
 		xfs_buf_ioerror_alert(bp, __func__);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index c035d11..647b6f1 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -314,7 +314,18 @@ xfs_trans_read_buf_map(
 			ASSERT(bp->b_iodone == NULL);
 			XFS_BUF_READ(bp);
 			bp->b_ops = ops;
-			xfsbdstrat(tp->t_mountp, bp);
+
+			/*
+			 * XXX(hch): clean up the error handling here to be less
+			 * of a mess..
+			 */
+			if (XFS_FORCED_SHUTDOWN(mp)) {
+				trace_xfs_bdstrat_shut(bp, _RET_IP_);
+				xfs_bioerror_relse(bp);
+			} else {
+				xfs_buf_iorequest(bp);
+			}
+
 			error = xfs_buf_iowait(bp);
 			if (error) {
 				xfs_buf_ioerror_alert(bp, __func__);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] xfs: swalloc doesn't align allocations properly
  2013-12-17  3:39       ` Dave Chinner
@ 2013-12-17 14:59         ` Ben Myers
  0 siblings, 0 replies; 26+ messages in thread
From: Ben Myers @ 2013-12-17 14:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Tue, Dec 17, 2013 at 02:39:41PM +1100, Dave Chinner wrote:
> On Mon, Dec 16, 2013 at 05:14:14PM -0600, Ben Myers wrote:
> > On Fri, Dec 13, 2013 at 04:01:23AM -0800, Christoph Hellwig wrote:
> > > Looks good.
> > > 
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > 
> > > Two very minor nitpicks below:
> > > 
> > > > +	int		stripe_align;
> > > >  
> > > >  	ASSERT(ap->length);
> > > >  
> > > >  	mp = ap->ip->i_mount;
> > > > +
> > > > +	/* stripe alignment for allocation is determined by mount parameters */
> > > > +	stripe_align = 0;
> > > > +	if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC))
> > > > +		stripe_align = mp->m_swidth;
> > > > +	else if (mp->m_dalign)
> > > > +		stripe_align = mp->m_dalign;
> > > 
> > > nipick: I'd either initialize the variable to zero at the point of the
> > > declaration or do if .. else if .. else here.
> > > 
> > > >  	}
> > > > +
> > > > +
> > > >  	nullfb = *ap->firstblock == NULLFSBLOCK;
> > > 
> > > Two newlines seem odd here.  I'd support one even if that's an unrelated
> > > change :)
> > 
> > This is probably not the right thing to do for small files.  They will all end
> > up in the first stripe unit.
> 
> You're right, it's not the right thing to do for small files. And we
> don't, because the ap->aeof that triggers aligned allocation only
> when:
> 
> 	/*
> 	 * Only want to do the alignment at the eof if it is userdata and
> 	 * allocation length is larger than a stripe unit.
> 	 */
> 	if (mp->m_dalign && bma->length >= mp->m_dalign &&
> 	    !(bma->flags & XFS_BMAPI_METADATA) && whichfork == XFS_DATA_FORK) {
> 		error = xfs_bmap_isaeof(bma, whichfork);
> 		if (error)
> 			return error;
> 	}
> 
> The requested allocation length is greater than the stripe unit that
> is configured.
> 
> So, we never align small files, regardless of the mount option....

That addresses my concerns, thanks.  ;)

Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] xfs: fixes for 3.13-rc4
  2013-12-12  5:34 [PATCH 0/6] xfs: fixes for 3.13-rc4 Dave Chinner
                   ` (5 preceding siblings ...)
  2013-12-12  5:34 ` [PATCH 6/6] xfs: abort metadata writeback on permanent errors Dave Chinner
@ 2013-12-17 16:02 ` Ben Myers
  6 siblings, 0 replies; 26+ messages in thread
From: Ben Myers @ 2013-12-17 16:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Dec 12, 2013 at 04:34:32PM +1100, Dave Chinner wrote:
> Theses are all the kernel fixes I've accumulated over the past
> few weeks. There are a couple of IO error handling fixes, log grant
> head accounting fixes to avod spurious warnings, a swalloc alignment
> fix, and a patch to fix all the leaks in
> xlog_recover_process_data().

Applied Christoph's version of patch 1, as well as 4, and 6.  Cherry-picked
some stuff that I feel is appropriate for -rc5 and pushed into
xfs-for-linus-v3.13-rc5.

I'm not certain about the the two 'spurious messages' patches just yet.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] xfs: xlog_recover_process_data leaks like a sieve
  2013-12-16 15:23       ` Christoph Hellwig
@ 2013-12-17 17:58         ` Mark Tinguely
  0 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2013-12-17 17:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On 12/16/13 09:23, Christoph Hellwig wrote:
> On Sat, Dec 14, 2013 at 09:11:02AM +1100, Dave Chinner wrote:
>> >  I'll rework it, but hte main issue is that it has to be freed
>> >  regardless of the error value in commit record processing, so it's
>> >  not as simple as just freeing it on error....
> Indeed, but think that's another reason to move the freeing to
> xlog_recover_process_data.  Right now or with the proposed patch
> xlog_recover_commit_trans frees trans, but there's nothing that
> breaks out of the loop in xlog_recover_process_data after that case.
>
> By moving the freeing there for all cases we can add a corruption
> check for that case, and we have a single function that controls
> the lifetime of the xlog_recover structure.

I like that idea of doing the frees in xlog_recover_process_data().
Each loop allocates a new trans, so each loop is self contained.

--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2013-12-17 17:59 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-12  5:34 [PATCH 0/6] xfs: fixes for 3.13-rc4 Dave Chinner
2013-12-12  5:34 ` [PATCH 1/6] xfs: don't try to mark uncached buffers stale on error Dave Chinner
2013-12-12  9:30   ` Jeff Liu
2013-12-12 10:09     ` Dave Chinner
2013-12-13  4:47       ` Jeff Liu
2013-12-12 16:36   ` Christoph Hellwig
2013-12-12 22:24     ` Dave Chinner
2013-12-13 11:01       ` Christoph Hellwig
2013-12-13 13:02   ` Christoph Hellwig
2013-12-16 22:44     ` Ben Myers
2013-12-17  8:03       ` [PATCH v2] xfs: remove xfsbdstrat error Christoph Hellwig
2013-12-12  5:34 ` [PATCH 2/6] xfs: prevent spurious "head behind tail" warnings Dave Chinner
2013-12-12  5:34 ` [PATCH 3/6] xfs: prevent spurious "space > BBTOB(tail_blocks)" warnings Dave Chinner
2013-12-12  5:34 ` [PATCH 4/6] xfs: swalloc doesn't align allocations properly Dave Chinner
2013-12-13 12:01   ` Christoph Hellwig
2013-12-16 23:14     ` Ben Myers
2013-12-17  3:39       ` Dave Chinner
2013-12-17 14:59         ` Ben Myers
2013-12-12  5:34 ` [PATCH 5/6] xfs: xlog_recover_process_data leaks like a sieve Dave Chinner
2013-12-13 12:32   ` Christoph Hellwig
2013-12-13 22:11     ` Dave Chinner
2013-12-16 15:23       ` Christoph Hellwig
2013-12-17 17:58         ` Mark Tinguely
2013-12-12  5:34 ` [PATCH 6/6] xfs: abort metadata writeback on permanent errors Dave Chinner
2013-12-13 12:33   ` Christoph Hellwig
2013-12-17 16:02 ` [PATCH 0/6] xfs: fixes for 3.13-rc4 Ben Myers

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.