All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] xfs: fixes for 3.7-rc6
@ 2012-11-12 11:09 Dave Chinner
  2012-11-12 11:09 ` [PATCH 1/3] xfs: fix attr tree double split corruption Dave Chinner
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Dave Chinner @ 2012-11-12 11:09 UTC (permalink / raw)
  To: xfs

Hi folks,

These are the remaining fixes I have in my tree for the 3.7 cycle.
One is a corruption fix, the other two are problems I'm seeing
during shutdown and recovery testing. That testing lasts 400-500
cycles (3-4 hours) now before it fails with another, as yet unknown
buffer/perag reference assert failure. Without these fixes, it lasts
3-4 cycles before assert failing or triggering a BUG.

I'd like these fixes to be in the 3.7 cycle if at all possible, with
both patch 1 and 2 being stable kernel candidates.

Cheers,

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] xfs: fix attr tree double split corruption
  2012-11-12 11:09 [PATCH 0/3] xfs: fixes for 3.7-rc6 Dave Chinner
@ 2012-11-12 11:09 ` Dave Chinner
  2012-11-12 19:10   ` Mark Tinguely
  2012-11-12 11:09 ` [PATCH 2/3] xfs: fix broken error handling in xfs_vm_writepage Dave Chinner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2012-11-12 11:09 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

In certain circumstances, a double split of an attribute tree is
needed to insert or replace an attribute. In rare situations, this
can go wrong, leaving the attribute tree corrupted. In this case,
the attr being replaced is the last attr in a leaf node, and the
replacement is larger so doesn't fit in the same leaf node.
When we have the initial condition of a node format attribute
btree with two leaves at index 1 and 2. Call them L1 and L2.  The
leaf L1 is completely full, there is not a single byte of free space
in it. L2 is mostly empty.  The attribute being replaced - call it X
- is the last attribute in L1.

The way an attribute replace is executed is that the replacement
attribute - call it Y - is first inserted into the tree, but has an
INCOMPLETE flag set on it so that list traversals ignore it. Once
this transaction is committed, a second transaction it run to
atomically mark Y as COMPLETE and X as INCOMPLETE, so that a
traversal will now find Y and skip X. Once that transaction is
committed, attribute X is then removed.

So, the initial condition is:

     +--------+     +--------+
     |   L1   |     |   L2   |
     | fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |
     | fsp: 0 |     | fsp: N |
     |--------|     |--------|
     | attr A |     | attr 1 |
     |--------|     |--------|
     | attr B |     | attr 2 |
     |--------|     |--------|
     ..........     ..........
     |--------|     |--------|
     | attr X |     | attr n |
     +--------+     +--------+


So now we go to replace X, and see that L1:fsp = 0 - it is full so
we can't insert Y in the same leaf. So we record the the location of
attribute X so we can track it for later use, then we split L1 into
L1 and L3 and reblance across the two leafs. We end with:


     +--------+     +--------+     +--------+
     |   L1   |     |   L3   |     |   L2   |
     | fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |<----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|
     | attr A |     | attr X |     | attr 1 |
     |--------|     +--------+     |--------|
     | attr B |                    | attr 2 |
     |--------|                    |--------|
     ..........                    ..........
     |--------|                    |--------|
     | attr W |                    | attr n |
     +--------+                    +--------+


And we track that the original attribute is now at L3:0.

We then try to insert Y into L1 again, and find that there isn't
enough room because the new attribute is larger than the old one.
Hence we have to split again to make room for Y. We end up with
this:


     +--------+     +--------+     +--------+     +--------+
     |   L1   |     |   L4   |     |   L3   |     |   L2   |
     | fwd: 4 |---->| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |<----| bwd: 4 |<----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|     |--------|
     | attr A |     | attr Y |     | attr X |     | attr 1 |
     |--------|     + INCOMP +     +--------+     |--------|
     | attr B |     +--------+                    | attr 2 |
     |--------|                                   |--------|
     ..........                                   ..........
     |--------|                                   |--------|
     | attr W |                                   | attr n |
     +--------+                                   +--------+

And now we have the new (incomplete) attribute @ L4:0, and the
original attribute at L3:0. At this point, the first transaction is
committed, and we move to the flipping of the flags.

This is where we are supposed to end up with this:

     +--------+     +--------+     +--------+     +--------+
     |   L1   |     |   L4   |     |   L3   |     |   L2   |
     | fwd: 4 |---->| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |<----| bwd: 4 |<----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|     |--------|
     | attr A |     | attr Y |     | attr X |     | attr 1 |
     |--------|     +--------+     + INCOMP +     |--------|
     | attr B |                    +--------+     | attr 2 |
     |--------|                                   |--------|
     ..........                                   ..........
     |--------|                                   |--------|
     | attr W |                                   | attr n |
     +--------+                                   +--------+

But that doesn't happen properly - the attribute tracking indexes
are not pointing to the right locations. What we end up with is both
the old attribute to be removed pointing at L4:0 and the new
attribute at L4:1.  On a debug kernel, this assert fails like so:

XFS: Assertion failed: args->index2 < be16_to_cpu(leaf2->hdr.count), file: fs/xfs/xfs_attr_leaf.c, line: 2725

because the new attribute location does not exist. On a production
kernel, this goes unnoticed and the code proceeds ahead merrily and
removes L4 because it thinks that is the block that is no longer
needed. This leaves the hash index node pointing to entries
L1, L4 and L2, but only blocks L1, L3 and L2 to exist. Further, the
leaf level sibling list is L1 <-> L4 <-> L2, but L4 is now free
space, and so everything is busted. This corruption is caused by the
removal of the old attribute triggering a join - it joins everything
correctly but then frees the wrong block.

xfs_repair will report something like:

bad sibling back pointer for block 4 in attribute fork for inode 131
problem with attribute contents in inode 131
would clear attr fork
bad nblocks 8 for inode 131, would reset to 3
bad anextents 4 for inode 131, would reset to 0

The problem lies in the assignment of the old/new blocks for
tracking purposes when the double leaf split occurs. The first split
tries to place the new attribute inside the current leaf (i.e.
"inleaf == true") and moves the old attribute (X) to the new block.
This sets up the old block/index to L1:X, and newly allocated
block to L3:0. It then moves attr X to the new block and tries to
insert attr Y at the old index. That fails, so it splits again.

With the second split, the rebalance ends up placing the new attr in
the second new block - L4:0 - and this is where the code goes wrong.
What is does is it sets both the new and old block index to the
second new block. Hence it inserts attr Y at the right place (L4:0)
but overwrites the current location of the attr to replace that is
held in the new block index (currently L3:0). It over writes it with
L4:1 - the index we later assert fail on.

Hopefully this table will show this in a foramt that is a bit easier
to understand:

Split		old attr index		new attr index
		vanilla	patched		vanilla	patched
before 1st	L1:26	L1:26		N/A	N/A
after 1st	L3:0	L3:0		L1:26	L1:26
after 2nd	L4:0	L3:0		L4:1	L4:0
                ^^^^			^^^^
		wrong			wrong

The fix is surprisingly simple, for all this analysis - just stop
the rebalance on the out-of leaf case from overwriting the new attr
index - it's already correct for the double split case.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_leaf.c |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index d330111..70eec18 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -1291,6 +1291,7 @@ xfs_attr_leaf_rebalance(xfs_da_state_t *state, xfs_da_state_blk_t *blk1,
 	leaf2 = blk2->bp->b_addr;
 	ASSERT(leaf1->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC));
 	ASSERT(leaf2->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC));
+	ASSERT(leaf2->hdr.count == 0);
 	args = state->args;
 
 	trace_xfs_attr_leaf_rebalance(args);
@@ -1361,6 +1362,7 @@ xfs_attr_leaf_rebalance(xfs_da_state_t *state, xfs_da_state_blk_t *blk1,
 		 * I assert that since all callers pass in an empty
 		 * second buffer, this code should never execute.
 		 */
+		ASSERT(0);
 
 		/*
 		 * Figure the total bytes to be added to the destination leaf.
@@ -1422,10 +1424,24 @@ xfs_attr_leaf_rebalance(xfs_da_state_t *state, xfs_da_state_blk_t *blk1,
 			args->index2 = 0;
 			args->blkno2 = blk2->blkno;
 		} else {
+			/*
+			 * On a double leaf split, the original attr location
+			 * is already stored in blkno2/index2, so don't
+			 * overwrite it overwise we corrupt the tree.
+			 */
 			blk2->index = blk1->index
 				    - be16_to_cpu(leaf1->hdr.count);
-			args->index = args->index2 = blk2->index;
-			args->blkno = args->blkno2 = blk2->blkno;
+			args->index = blk2->index;
+			args->blkno = blk2->blkno;
+			if (!state->extravalid) {
+				/*
+				 * set the new attr location to match the old
+				 * one and let the higher level split code
+				 * decide where in the leaf to place it.
+				 */
+				args->index2 = blk2->index;
+				args->blkno2 = blk2->blkno;
+			}
 		}
 	} else {
 		ASSERT(state->inleaf == 1);
-- 
1.7.10

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] xfs: fix broken error handling in xfs_vm_writepage
  2012-11-12 11:09 [PATCH 0/3] xfs: fixes for 3.7-rc6 Dave Chinner
  2012-11-12 11:09 ` [PATCH 1/3] xfs: fix attr tree double split corruption Dave Chinner
@ 2012-11-12 11:09 ` Dave Chinner
  2012-11-13 14:27   ` Mark Tinguely
  2012-11-12 11:09 ` [PATCH 3/3] xfs: drop buffer io reference when a bad bio is built Dave Chinner
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2012-11-12 11:09 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When we shut down the filesystem, it might first be detected in
writeback when we are allocating a inode size transaction. This
happens after we have moved all the pages into the writeback state
and unlocked them. Unfortunately, if we fail to set up the
transaction we then abort writeback and try to invalidate the
current page. This then triggers are BUG() in block_invalidatepage()
because we are trying to invalidate an unlocked page.

Fixing this is a bit of a chicken and egg problem - we can't
allocate the transaction until we've clustered all the pages into
the IO and we know the size of it (i.e. whether the last block of
the IO is beyond the current EOF or not). However, we don't want to
hold pages locked for long periods of time, especially while we lock
other pages to cluster them into the write.

To fix this, we need to make a clear delineation in writeback where
errors can only be handled by IO completion processing. That is,
once we have marked a page for writeback and unlocked it, we have to
report errors via IO completion because we've already started the
IO. We may not have submitted any IO, but we've changed the page
state to indicate that it is under IO so we must now use the IO
completion path to report errors.

To do this, add an error field to xfs_submit_ioend() to pass it the
error that occurred during the building on the ioend chain. When
this is non-zero, mark each ioend with the error and call
xfs_finish_ioend() directly rather than building bios. This will
immediately push the ioends through completion processing with the
error that has occurred.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c |   54 ++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e562dd4..e57e2da 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -481,11 +481,17 @@ static inline int bio_add_buffer(struct bio *bio, struct buffer_head *bh)
  *
  * The fix is two passes across the ioend list - one to start writeback on the
  * buffer_heads, and then submit them for I/O on the second pass.
+ *
+ * If @fail is non-zero, it means that we have a situation where some part of
+ * the submission process has failed after we have marked paged for writeback
+ * and unlocked them. In this situation, we need to fail the ioend chain rather
+ * than submit it to IO. This typically only happens on a filesystem shutdown.
  */
 STATIC void
 xfs_submit_ioend(
 	struct writeback_control *wbc,
-	xfs_ioend_t		*ioend)
+	xfs_ioend_t		*ioend,
+	int			fail)
 {
 	xfs_ioend_t		*head = ioend;
 	xfs_ioend_t		*next;
@@ -506,6 +512,18 @@ xfs_submit_ioend(
 		next = ioend->io_list;
 		bio = NULL;
 
+		/*
+		 * If we are failing the IO now, just mark the ioend with an
+		 * error and finish it. This will run IO completion immediately
+		 * as there is only one reference to the ioend at this point in
+		 * time.
+		 */
+		if (fail) {
+			ioend->io_error = -fail;
+			xfs_finish_ioend(ioend);
+			continue;
+		}
+
 		for (bh = ioend->io_buffer_head; bh; bh = bh->b_private) {
 
 			if (!bio) {
@@ -1060,7 +1078,18 @@ xfs_vm_writepage(
 
 	xfs_start_page_writeback(page, 1, count);
 
-	if (ioend && imap_valid) {
+	/* if there is no IO to be submitted for this page, we are done */
+	if (!ioend)
+		return 0;
+
+	ASSERT(iohead);
+
+	/*
+	 * Any errors from this point onwards need tobe reported through the IO
+	 * completion path as we have marked the initial page as under writeback
+	 * and unlocked it.
+	 */
+	if (imap_valid) {
 		xfs_off_t		end_index;
 
 		end_index = imap.br_startoff + imap.br_blockcount;
@@ -1079,20 +1108,15 @@ xfs_vm_writepage(
 				  wbc, end_index);
 	}
 
-	if (iohead) {
-		/*
-		 * Reserve log space if we might write beyond the on-disk
-		 * inode size.
-		 */
-		if (ioend->io_type != XFS_IO_UNWRITTEN &&
-		    xfs_ioend_is_append(ioend)) {
-			err = xfs_setfilesize_trans_alloc(ioend);
-			if (err)
-				goto error;
-		}
 
-		xfs_submit_ioend(wbc, iohead);
-	}
+	/*
+	 * Reserve log space if we might write beyond the on-disk inode size.
+	 */
+	err = 0;
+	if (ioend->io_type != XFS_IO_UNWRITTEN && xfs_ioend_is_append(ioend))
+		err = xfs_setfilesize_trans_alloc(ioend);
+
+	xfs_submit_ioend(wbc, iohead, err);
 
 	return 0;
 
-- 
1.7.10

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] xfs: drop buffer io reference when a bad bio is built
  2012-11-12 11:09 [PATCH 0/3] xfs: fixes for 3.7-rc6 Dave Chinner
  2012-11-12 11:09 ` [PATCH 1/3] xfs: fix attr tree double split corruption Dave Chinner
  2012-11-12 11:09 ` [PATCH 2/3] xfs: fix broken error handling in xfs_vm_writepage Dave Chinner
@ 2012-11-12 11:09 ` Dave Chinner
  2012-11-12 20:51   ` Mark Tinguely
  2012-11-13  2:01 ` [PATCH 0/3] xfs: fixes for 3.7-rc6 Ben Myers
  2012-11-13 21:20 ` Ben Myers
  4 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2012-11-12 11:09 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Error handling in xfs_buf_ioapply_map() does not handle IO reference
counts correctly. We increment the b_io_remaining count before
building the bio, but then fail to decrement it in the failure case.
This leads to the buffer never running IO completion and releasing
the reference that the IO holds, so at unmount we can leak the
buffer. This leak is captured by this assert failure during unmount:

XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 273

This is not a new bug - the b_io_remaining accounting has had this
problem for a long, long time - it's just very hard to get a
zero length bio being built by this code...

Further, the buffer IO error can be overwritten on a multi-segment
buffer by subsequent bio completions for partial sections of the
buffer. Hence we should only set the buffer error status if the
buffer is not already carrying an error status. This ensures that a
partial IO error on a multi-segment buffer will not be lost. This
part of the problem is a regression, however.

cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 933b793..4b0b8dd 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1197,9 +1197,14 @@ xfs_buf_bio_end_io(
 {
 	xfs_buf_t		*bp = (xfs_buf_t *)bio->bi_private;
 
-	xfs_buf_ioerror(bp, -error);
+	/*
+	 * don't overwrite existing errors - otherwise we can lose errors on
+	 * buffers that require multiple bios to complete.
+	 */
+	if (!bp->b_error)
+		xfs_buf_ioerror(bp, -error);
 
-	if (!error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
+	if (!bp->b_error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
 		invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
 
 	_xfs_buf_ioend(bp, 1);
@@ -1279,6 +1284,11 @@ next_chunk:
 		if (size)
 			goto next_chunk;
 	} else {
+		/*
+		 * This is guaranteed not to be the last io reference count
+		 * because the caller (xfs_buf_iorequest) holds a count itself.
+		 */
+		atomic_dec(&bp->b_io_remaining);
 		xfs_buf_ioerror(bp, EIO);
 		bio_put(bio);
 	}
-- 
1.7.10

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] xfs: fix attr tree double split corruption
  2012-11-12 11:09 ` [PATCH 1/3] xfs: fix attr tree double split corruption Dave Chinner
@ 2012-11-12 19:10   ` Mark Tinguely
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Tinguely @ 2012-11-12 19:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 11/12/12 05:09, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> In certain circumstances, a double split of an attribute tree is
> needed to insert or replace an attribute. In rare situations, this
> can go wrong, leaving the attribute tree corrupted. In this case,
> the attr being replaced is the last attr in a leaf node, and the
> replacement is larger so doesn't fit in the same leaf node.


...

> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---

Looks great.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] xfs: drop buffer io reference when a bad bio is built
  2012-11-12 11:09 ` [PATCH 3/3] xfs: drop buffer io reference when a bad bio is built Dave Chinner
@ 2012-11-12 20:51   ` Mark Tinguely
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Tinguely @ 2012-11-12 20:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 11/12/12 05:09, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> Error handling in xfs_buf_ioapply_map() does not handle IO reference
> counts correctly. We increment the b_io_remaining count before
> building the bio, but then fail to decrement it in the failure case.
> This leads to the buffer never running IO completion and releasing
> the reference that the IO holds, so at unmount we can leak the
> buffer. This leak is captured by this assert failure during unmount:
>
> XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 273
>
> This is not a new bug - the b_io_remaining accounting has had this
> problem for a long, long time - it's just very hard to get a
> zero length bio being built by this code...
>
> Further, the buffer IO error can be overwritten on a multi-segment
> buffer by subsequent bio completions for partial sections of the
> buffer. Hence we should only set the buffer error status if the
> buffer is not already carrying an error status. This ensures that a
> partial IO error on a multi-segment buffer will not be lost. This
> part of the problem is a regression, however.
>
> cc:<stable@vger.kernel.org>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---

Looks good.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] xfs: fixes for 3.7-rc6
  2012-11-12 11:09 [PATCH 0/3] xfs: fixes for 3.7-rc6 Dave Chinner
                   ` (2 preceding siblings ...)
  2012-11-12 11:09 ` [PATCH 3/3] xfs: drop buffer io reference when a bad bio is built Dave Chinner
@ 2012-11-13  2:01 ` Ben Myers
  2012-11-13  3:05   ` Dave Chinner
  2012-11-13 21:20 ` Ben Myers
  4 siblings, 1 reply; 12+ messages in thread
From: Ben Myers @ 2012-11-13  2:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave,

On Mon, Nov 12, 2012 at 10:09:43PM +1100, Dave Chinner wrote:
> These are the remaining fixes I have in my tree for the 3.7 cycle.
> One is a corruption fix, the other two are problems I'm seeing
> during shutdown and recovery testing. That testing lasts 400-500
> cycles (3-4 hours) now before it fails with another, as yet unknown
> buffer/perag reference assert failure. Without these fixes, it lasts
> 3-4 cycles before assert failing or triggering a BUG.
> 
> I'd like these fixes to be in the 3.7 cycle if at all possible, with
> both patch 1 and 2 being stable kernel candidates.

I think we can make that happen.  I noticed that you posted patches 2 and 3
earlier today and had to chuckle, but I understand.  We'll put this 3.7 work at
the top of the priority queue, followed by your 3.8 queue, and then the few
items left for the userspace release.

Thanks for reposting your 3.8 queue, BTW.  Sometimes I think that frequent
reposts are inconvenient, but in this case I think it will be helpful.

Thanks,
	Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] xfs: fixes for 3.7-rc6
  2012-11-13  2:01 ` [PATCH 0/3] xfs: fixes for 3.7-rc6 Ben Myers
@ 2012-11-13  3:05   ` Dave Chinner
  2012-11-13  7:11     ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2012-11-13  3:05 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Mon, Nov 12, 2012 at 08:01:50PM -0600, Ben Myers wrote:
> Hi Dave,
> 
> On Mon, Nov 12, 2012 at 10:09:43PM +1100, Dave Chinner wrote:
> > These are the remaining fixes I have in my tree for the 3.7 cycle.
> > One is a corruption fix, the other two are problems I'm seeing
> > during shutdown and recovery testing. That testing lasts 400-500
> > cycles (3-4 hours) now before it fails with another, as yet unknown
> > buffer/perag reference assert failure. Without these fixes, it lasts
> > 3-4 cycles before assert failing or triggering a BUG.
> > 
> > I'd like these fixes to be in the 3.7 cycle if at all possible, with
> > both patch 1 and 2 being stable kernel candidates.
> 
> I think we can make that happen.  I noticed that you posted patches 2 and 3
> earlier today and had to chuckle, but I understand.

Fixing a nasty bug or regression bug and asking it be merged into
the current -rc cycle after initial testing is not unusual, so I'm
not sure why you'd chuckle at that....

> We'll put this 3.7 work at
> the top of the priority queue, followed by your 3.8 queue, and then the few
> items left for the userspace release.

Just FYI, the first thing that needs to be done after the userspace
release is to resync the libxfs/libxlog code with the kernel code.
That needs to be done before any of the CRC changes can be merged,
so be prepared for a pretty major update that'll need review soon
after the release.

IIRC, the last kernel/user resync I did was about 15,000 lines of
change I don't expect this to be quite as big (the previous one
contained the btree consolidation changes), but I'd expect it to be
in the order of 2-3000 lines of change to be needed.

> Thanks for reposting your 3.8 queue, BTW.  Sometimes I think that frequent
> reposts are inconvenient, but in this case I think it will be helpful.

When I'm having trouble tracking of several of my own patch sets for
the same release, it's time for re-organisation and a repost.
Besides, it's a habit Christoph and I got into a while back when we
had lots of work going on and we needed to make sure nothing
slipped through the cracks..... :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] xfs: fixes for 3.7-rc6
  2012-11-13  3:05   ` Dave Chinner
@ 2012-11-13  7:11     ` Christoph Hellwig
  2012-11-13 19:46       ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2012-11-13  7:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Ben Myers, xfs

On Tue, Nov 13, 2012 at 02:05:07PM +1100, Dave Chinner wrote:
> Just FYI, the first thing that needs to be done after the userspace
> release is to resync the libxfs/libxlog code with the kernel code.
> That needs to be done before any of the CRC changes can be merged,
> so be prepared for a pretty major update that'll need review soon
> after the release.
> 
> IIRC, the last kernel/user resync I did was about 15,000 lines of
> change I don't expect this to be quite as big (the previous one
> contained the btree consolidation changes), but I'd expect it to be
> in the order of 2-3000 lines of change to be needed.

Do we have anyone who has signed up for that work yet?

One thing I'd really love to do is to make the resyncs simpler, e.g.
make sure userspace can use the same #include statements as the kernel
code, and make sure all files are either used entirely in userspace or
not at all.  Things like the split of xfs_extent_busy.c are going to
help with that, but I suspect there's more work to do.

> > Thanks for reposting your 3.8 queue, BTW.  Sometimes I think that frequent
> > reposts are inconvenient, but in this case I think it will be helpful.
> 
> When I'm having trouble tracking of several of my own patch sets for
> the same release, it's time for re-organisation and a repost.
> Besides, it's a habit Christoph and I got into a while back when we
> had lots of work going on and we needed to make sure nothing
> slipped through the cracks..... :)

For anything more than one or two simple patches a complete repost is
they way to spell a ping.  Once you're having dozends of patches it's
almost guaranteed there are a few small fixes or context changes by the
time of the ping, and the repost takes care of having an uptodate
version.

And in case anyone wonder, yes I plan to get back to reviewing your bits
soon.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] xfs: fix broken error handling in xfs_vm_writepage
  2012-11-12 11:09 ` [PATCH 2/3] xfs: fix broken error handling in xfs_vm_writepage Dave Chinner
@ 2012-11-13 14:27   ` Mark Tinguely
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Tinguely @ 2012-11-13 14:27 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 11/12/12 05:09, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> When we shut down the filesystem, it might first be detected in
> writeback when we are allocating a inode size transaction. This
> happens after we have moved all the pages into the writeback state
> and unlocked them. Unfortunately, if we fail to set up the
> transaction we then abort writeback and try to invalidate the
> current page. This then triggers are BUG() in block_invalidatepage()
> because we are trying to invalidate an unlocked page.
>
> Fixing this is a bit of a chicken and egg problem - we can't
> allocate the transaction until we've clustered all the pages into
> the IO and we know the size of it (i.e. whether the last block of
> the IO is beyond the current EOF or not). However, we don't want to
> hold pages locked for long periods of time, especially while we lock
> other pages to cluster them into the write.
>
> To fix this, we need to make a clear delineation in writeback where
> errors can only be handled by IO completion processing. That is,
> once we have marked a page for writeback and unlocked it, we have to
> report errors via IO completion because we've already started the
> IO. We may not have submitted any IO, but we've changed the page
> state to indicate that it is under IO so we must now use the IO
> completion path to report errors.
>
> To do this, add an error field to xfs_submit_ioend() to pass it the
> error that occurred during the building on the ioend chain. When
> this is non-zero, mark each ioend with the error and call
> xfs_finish_ioend() directly rather than building bios. This will
> immediately push the ioends through completion processing with the
> error that has occurred.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---

I could not recreate the problem with my test machines, which is not
surprising - it is a matter of timing.

The patch looks sound to me.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] xfs: fixes for 3.7-rc6
  2012-11-13  7:11     ` Christoph Hellwig
@ 2012-11-13 19:46       ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2012-11-13 19:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Ben Myers, xfs

On Tue, Nov 13, 2012 at 02:11:47AM -0500, Christoph Hellwig wrote:
> On Tue, Nov 13, 2012 at 02:05:07PM +1100, Dave Chinner wrote:
> > Just FYI, the first thing that needs to be done after the userspace
> > release is to resync the libxfs/libxlog code with the kernel code.
> > That needs to be done before any of the CRC changes can be merged,
> > so be prepared for a pretty major update that'll need review soon
> > after the release.
> > 
> > IIRC, the last kernel/user resync I did was about 15,000 lines of
> > change I don't expect this to be quite as big (the previous one
> > contained the btree consolidation changes), but I'd expect it to be
> > in the order of 2-3000 lines of change to be needed.
> 
> Do we have anyone who has signed up for that work yet?

I think it will be end up being me because I'm the one who needs it
done in the next month or so...

> One thing I'd really love to do is to make the resyncs simpler, e.g.
> make sure userspace can use the same #include statements as the kernel
> code, and make sure all files are either used entirely in userspace or
> not at all.  Things like the split of xfs_extent_busy.c are going to
> help with that, but I suspect there's more work to do.

There's a lot more work to get to that point, I think. I don't plan
on tackling that right now, though. Stuff like the way you split up
the dir2 headers is also a step in the right direction...

> > > Thanks for reposting your 3.8 queue, BTW.  Sometimes I think that frequent
> > > reposts are inconvenient, but in this case I think it will be helpful.
> > 
> > When I'm having trouble tracking of several of my own patch sets for
> > the same release, it's time for re-organisation and a repost.
> > Besides, it's a habit Christoph and I got into a while back when we
> > had lots of work going on and we needed to make sure nothing
> > slipped through the cracks..... :)
> 
> For anything more than one or two simple patches a complete repost is
> they way to spell a ping.  Once you're having dozends of patches it's
> almost guaranteed there are a few small fixes or context changes by the
> time of the ping, and the repost takes care of having an uptodate
> version.
> 
> And in case anyone wonder, yes I plan to get back to reviewing your bits
> soon.

Cool.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] xfs: fixes for 3.7-rc6
  2012-11-12 11:09 [PATCH 0/3] xfs: fixes for 3.7-rc6 Dave Chinner
                   ` (3 preceding siblings ...)
  2012-11-13  2:01 ` [PATCH 0/3] xfs: fixes for 3.7-rc6 Ben Myers
@ 2012-11-13 21:20 ` Ben Myers
  4 siblings, 0 replies; 12+ messages in thread
From: Ben Myers @ 2012-11-13 21:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave,

On Mon, Nov 12, 2012 at 10:09:43PM +1100, Dave Chinner wrote:
> These are the remaining fixes I have in my tree for the 3.7 cycle.
> One is a corruption fix, the other two are problems I'm seeing
> during shutdown and recovery testing. That testing lasts 400-500
> cycles (3-4 hours) now before it fails with another, as yet unknown
> buffer/perag reference assert failure. Without these fixes, it lasts
> 3-4 cycles before assert failing or triggering a BUG.
> 
> I'd like these fixes to be in the 3.7 cycle if at all possible, with
> both patch 1 and 2 being stable kernel candidates.

committed to git://oss.sgi.com/xfs/xfs.git, master and for-next branches.

I'll send a pull request tomorrow.

Regards,
	Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-11-13 21:18 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-12 11:09 [PATCH 0/3] xfs: fixes for 3.7-rc6 Dave Chinner
2012-11-12 11:09 ` [PATCH 1/3] xfs: fix attr tree double split corruption Dave Chinner
2012-11-12 19:10   ` Mark Tinguely
2012-11-12 11:09 ` [PATCH 2/3] xfs: fix broken error handling in xfs_vm_writepage Dave Chinner
2012-11-13 14:27   ` Mark Tinguely
2012-11-12 11:09 ` [PATCH 3/3] xfs: drop buffer io reference when a bad bio is built Dave Chinner
2012-11-12 20:51   ` Mark Tinguely
2012-11-13  2:01 ` [PATCH 0/3] xfs: fixes for 3.7-rc6 Ben Myers
2012-11-13  3:05   ` Dave Chinner
2012-11-13  7:11     ` Christoph Hellwig
2012-11-13 19:46       ` Dave Chinner
2012-11-13 21:20 ` Ben Myers

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.