All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: fix broken multi-fsb buffer logging
@ 2016-05-25 19:31 Brian Foster
  2016-05-30 14:08 ` Brian Foster
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Brian Foster @ 2016-05-25 19:31 UTC (permalink / raw)
  To: xfs

Multi-block buffers are logged based on buffer offset in
xfs_trans_log_buf(). xfs_buf_item_log() ultimately walks each mapping in
the buffer and marks the associated range to be logged in the
xfs_buf_log_format bitmap for that mapping. This code is broken,
however, in that it marks the actual buffer offsets of the associated
range in each bitmap rather than shifting to the byte range for that
particular mapping.

For example, on a 4k fsb fs, buffer offset 4096 refers to the first byte
of the second mapping in the buffer. This means byte 0 of the second log
format bitmap should be tagged as dirty. Instead, the current code marks
byte offset 4096 of the second log format bitmap, which is invalid and
potentially out of range of the mapping.

As a result of this, the log item format code invoked at transaction
commit time is not be able to correctly identify what parts of the
buffer to copy into log vectors. This can lead to NULL log vector
pointer dereferences in CIL push context if the item format code was not
able to locate any dirty ranges at all. This crash has been reproduced
on a 4k FSB filesystem using 16k directory blocks where an unlink
operation happened not to log anything in the first block of the
mapping. The logged offsets were all over 4k, marked as such in the
subsequent log format mappings, and thus left the transaction with an
xfs_log_item that is marked DIRTY but without any logged regions.

Further, even when the logged regions are marked correctly in the buffer
log format bitmaps, the format code doesn't copy the correct ranges of
the buffer into the log. This means that any logged region beyond the
first block of a multi-block buffer is subject to corruption after a
crash and log recovery sequence. This is due to a failure to convert the
mapping bm_len field from basic blocks to bytes in the buffer offset
tracking code in xfs_buf_item_format().

Update xfs_buf_item_log() to convert buffer offsets to segment relative
offsets when logging multi-block buffers. This ensures that the modified
regions of a buffer are logged correctly and avoids the aforementioned
crash. Also update xfs_buf_item_format() to correctly track the source
offset into the buffer for the log vector formatting code. This ensures
that the correct data is copied into the log.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

Note that this has only been tested so far in a particular filesystem
environment that reproduces the crash/corruption discussed in the commit
log description. Generic testing still required, posting just to get a
jump on review...

Brian

 fs/xfs/xfs_buf_item.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 3425799..2e95ad0 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -359,7 +359,7 @@ xfs_buf_item_format(
 	for (i = 0; i < bip->bli_format_count; i++) {
 		xfs_buf_item_format_segment(bip, lv, &vecp, offset,
 					    &bip->bli_formats[i]);
-		offset += bp->b_maps[i].bm_len;
+		offset += BBTOB(bp->b_maps[i].bm_len);
 	}
 
 	/*
@@ -915,20 +915,28 @@ xfs_buf_item_log(
 	for (i = 0; i < bip->bli_format_count; i++) {
 		if (start > last)
 			break;
-		end = start + BBTOB(bp->b_maps[i].bm_len);
+		end = start + BBTOB(bp->b_maps[i].bm_len) - 1;
+
+		/* skip to the map that includes the first byte to log */
 		if (first > end) {
 			start += BBTOB(bp->b_maps[i].bm_len);
 			continue;
 		}
+
+		/*
+		 * Trim the range to this segment and mark it in the bitmap.
+		 * Note that we must convert buffer offsets to segment relative
+		 * offsets (e.g., the first byte of each segment is byte 0 of
+		 * that segment).
+		 */
 		if (first < start)
 			first = start;
 		if (end > last)
 			end = last;
-
-		xfs_buf_item_log_segment(first, end,
+		xfs_buf_item_log_segment(first - start, end - start,
 					 &bip->bli_formats[i].blf_data_map[0]);
 
-		start += bp->b_maps[i].bm_len;
+		start += BBTOB(bp->b_maps[i].bm_len);
 	}
 }
 
-- 
2.4.11

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: fix broken multi-fsb buffer logging
  2016-05-25 19:31 [PATCH] xfs: fix broken multi-fsb buffer logging Brian Foster
@ 2016-05-30 14:08 ` Brian Foster
  2016-05-30 22:08   ` Eric Sandeen
  2016-05-31  8:32 ` Dave Chinner
  2016-05-31 18:29 ` Eric Sandeen
  2 siblings, 1 reply; 6+ messages in thread
From: Brian Foster @ 2016-05-30 14:08 UTC (permalink / raw)
  To: xfs

On Wed, May 25, 2016 at 03:31:07PM -0400, Brian Foster wrote:
> Multi-block buffers are logged based on buffer offset in
> xfs_trans_log_buf(). xfs_buf_item_log() ultimately walks each mapping in
> the buffer and marks the associated range to be logged in the
> xfs_buf_log_format bitmap for that mapping. This code is broken,
> however, in that it marks the actual buffer offsets of the associated
> range in each bitmap rather than shifting to the byte range for that
> particular mapping.
> 
> For example, on a 4k fsb fs, buffer offset 4096 refers to the first byte
> of the second mapping in the buffer. This means byte 0 of the second log
> format bitmap should be tagged as dirty. Instead, the current code marks
> byte offset 4096 of the second log format bitmap, which is invalid and
> potentially out of range of the mapping.
> 
> As a result of this, the log item format code invoked at transaction
> commit time is not be able to correctly identify what parts of the
> buffer to copy into log vectors. This can lead to NULL log vector
> pointer dereferences in CIL push context if the item format code was not
> able to locate any dirty ranges at all. This crash has been reproduced
> on a 4k FSB filesystem using 16k directory blocks where an unlink
> operation happened not to log anything in the first block of the
> mapping. The logged offsets were all over 4k, marked as such in the
> subsequent log format mappings, and thus left the transaction with an
> xfs_log_item that is marked DIRTY but without any logged regions.
> 
> Further, even when the logged regions are marked correctly in the buffer
> log format bitmaps, the format code doesn't copy the correct ranges of
> the buffer into the log. This means that any logged region beyond the
> first block of a multi-block buffer is subject to corruption after a
> crash and log recovery sequence. This is due to a failure to convert the
> mapping bm_len field from basic blocks to bytes in the buffer offset
> tracking code in xfs_buf_item_format().
> 
> Update xfs_buf_item_log() to convert buffer offsets to segment relative
> offsets when logging multi-block buffers. This ensures that the modified
> regions of a buffer are logged correctly and avoids the aforementioned
> crash. Also update xfs_buf_item_format() to correctly track the source
> offset into the buffer for the log vector formatting code. This ensures
> that the correct data is copied into the log.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> Note that this has only been tested so far in a particular filesystem
> environment that reproduces the crash/corruption discussed in the commit
> log description. Generic testing still required, posting just to get a
> jump on review...
> 

This has passed my tests without any obvious problems or regressions.
Any other thoughts/comments or reviews? I'd like to get a backport going
for this, but can't do so until there is at least some kind of
acknowledgement this is acceptable.

Brian

> Brian
> 
>  fs/xfs/xfs_buf_item.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 3425799..2e95ad0 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -359,7 +359,7 @@ xfs_buf_item_format(
>  	for (i = 0; i < bip->bli_format_count; i++) {
>  		xfs_buf_item_format_segment(bip, lv, &vecp, offset,
>  					    &bip->bli_formats[i]);
> -		offset += bp->b_maps[i].bm_len;
> +		offset += BBTOB(bp->b_maps[i].bm_len);
>  	}
>  
>  	/*
> @@ -915,20 +915,28 @@ xfs_buf_item_log(
>  	for (i = 0; i < bip->bli_format_count; i++) {
>  		if (start > last)
>  			break;
> -		end = start + BBTOB(bp->b_maps[i].bm_len);
> +		end = start + BBTOB(bp->b_maps[i].bm_len) - 1;
> +
> +		/* skip to the map that includes the first byte to log */
>  		if (first > end) {
>  			start += BBTOB(bp->b_maps[i].bm_len);
>  			continue;
>  		}
> +
> +		/*
> +		 * Trim the range to this segment and mark it in the bitmap.
> +		 * Note that we must convert buffer offsets to segment relative
> +		 * offsets (e.g., the first byte of each segment is byte 0 of
> +		 * that segment).
> +		 */
>  		if (first < start)
>  			first = start;
>  		if (end > last)
>  			end = last;
> -
> -		xfs_buf_item_log_segment(first, end,
> +		xfs_buf_item_log_segment(first - start, end - start,
>  					 &bip->bli_formats[i].blf_data_map[0]);
>  
> -		start += bp->b_maps[i].bm_len;
> +		start += BBTOB(bp->b_maps[i].bm_len);
>  	}
>  }
>  
> -- 
> 2.4.11
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: fix broken multi-fsb buffer logging
  2016-05-30 14:08 ` Brian Foster
@ 2016-05-30 22:08   ` Eric Sandeen
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Sandeen @ 2016-05-30 22:08 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs



> On May 30, 2016, at 9:08 AM, Brian Foster <bfoster@redhat.com> wrote:
> 
>> ...
> 
> This has passed my tests without any obvious problems or regressions.
> Any other thoughts/comments or reviews? I'd like to get a backport going
> for this, but can't do so until there is at least some kind of
> acknowledgement this is acceptable.
> 
> Brian
> 
Sorry Brian, I meant to review this last Friday, but got wrapped up in other things… My first pass over it looks fine, I just want to give it a little more in depth review, and I will do my best to get that done tomorrow.

Eric

>> Brian

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: fix broken multi-fsb buffer logging
  2016-05-25 19:31 [PATCH] xfs: fix broken multi-fsb buffer logging Brian Foster
  2016-05-30 14:08 ` Brian Foster
@ 2016-05-31  8:32 ` Dave Chinner
  2016-05-31 22:39   ` Dave Chinner
  2016-05-31 18:29 ` Eric Sandeen
  2 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2016-05-31  8:32 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, May 25, 2016 at 03:31:07PM -0400, Brian Foster wrote:
> Multi-block buffers are logged based on buffer offset in
> xfs_trans_log_buf(). xfs_buf_item_log() ultimately walks each mapping in
> the buffer and marks the associated range to be logged in the
> xfs_buf_log_format bitmap for that mapping. This code is broken,
> however, [....]

[snip description I've not read, and go look at the code changes]

> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 3425799..2e95ad0 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -359,7 +359,7 @@ xfs_buf_item_format(
>  	for (i = 0; i < bip->bli_format_count; i++) {
>  		xfs_buf_item_format_segment(bip, lv, &vecp, offset,
>  					    &bip->bli_formats[i]);
> -		offset += bp->b_maps[i].bm_len;
> +		offset += BBTOB(bp->b_maps[i].bm_len);
>  	}

Ok, that means the offset into the multi-fsb buffer is wrong for the
second and subsequent chunks, resulting in copying the wrong range
into the log vector. Offset should be in bytes, so this is a valid
fix all by itself.


>  
>  	/*
> @@ -915,20 +915,28 @@ xfs_buf_item_log(
>  	for (i = 0; i < bip->bli_format_count; i++) {
>  		if (start > last)
>  			break;
> -		end = start + BBTOB(bp->b_maps[i].bm_len);
> +		end = start + BBTOB(bp->b_maps[i].bm_len) - 1;

Ok, buffer starts at offset 0, last byte in buffer is length - 1.
Yup, there's an off-by-one there - end points to the offset of the
first byte of the next buffer. This will screw up the range trimming
done below. I think this is probably harmless as it doesn't impact
the next iteration of the loop, but still good to get it right.

> +
> +		/* skip to the map that includes the first byte to log */
>  		if (first > end) {
>  			start += BBTOB(bp->b_maps[i].bm_len);
>  			continue;
>  		}
> +
> +		/*
> +		 * Trim the range to this segment and mark it in the bitmap.
> +		 * Note that we must convert buffer offsets to segment relative
> +		 * offsets (e.g., the first byte of each segment is byte 0 of
> +		 * that segment).
> +		 */
>  		if (first < start)
>  			first = start;
>  		if (end > last)
>  			end = last;
> -
> -		xfs_buf_item_log_segment(first, end,
> +		xfs_buf_item_log_segment(first - start, end - start,
>  					 &bip->bli_formats[i].blf_data_map[0]);

Ok so we pass offsets first and end as the range that needs to be
logged in the segment, and start is the byte offset of the first
byte of data in the current buffer segment.

/me looks at xfs_buf_item_log_segment()

It sets bits in the format item based on offsets passed in.
That means if the start is non-zero (i.e. we're in the second
buffer range) we'll set bits in the dirty mask that map to first/end
rather than 0/buffer len.

/me looks at xfs_buf_item_size, follows blf_map_size to it's
initialisation, sees that first/end bits will be beyond blf_map_size
for second+ buffer range and so never seen by xfs_buf_item_size().

Hence subtracting start normalises the range being logged to 
offsets within the buffer range, and so now xfs_buf_item_size (and
hence xfs_buf_item_format) will see them correctly as dirty.

Yes, that change looks correct.

>  
> -		start += bp->b_maps[i].bm_len;
> +		start += BBTOB(bp->b_maps[i].bm_len);

Yup, same as the first hunk in the patch.

So from the code perspective the change looks correct. I've looked
over all the other users of bp->b_maps[i].bm_len and loops over
bli_format_count and I can't see any other obvious problems. I'm
going to leave this under test overnight and see if anything pops
up...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: fix broken multi-fsb buffer logging
  2016-05-25 19:31 [PATCH] xfs: fix broken multi-fsb buffer logging Brian Foster
  2016-05-30 14:08 ` Brian Foster
  2016-05-31  8:32 ` Dave Chinner
@ 2016-05-31 18:29 ` Eric Sandeen
  2 siblings, 0 replies; 6+ messages in thread
From: Eric Sandeen @ 2016-05-31 18:29 UTC (permalink / raw)
  To: xfs

On 5/25/16 2:31 PM, Brian Foster wrote:
> Multi-block buffers are logged based on buffer offset in
> xfs_trans_log_buf(). xfs_buf_item_log() ultimately walks each mapping in
> the buffer and marks the associated range to be logged in the
> xfs_buf_log_format bitmap for that mapping. This code is broken,
> however, in that it marks the actual buffer offsets of the associated
> range in each bitmap rather than shifting to the byte range for that
> particular mapping.
> 
> For example, on a 4k fsb fs, buffer offset 4096 refers to the first byte
> of the second mapping in the buffer. This means byte 0 of the second log
> format bitmap should be tagged as dirty. Instead, the current code marks
> byte offset 4096 of the second log format bitmap, which is invalid and
> potentially out of range of the mapping.
> 
> As a result of this, the log item format code invoked at transaction
> commit time is not be able to correctly identify what parts of the
> buffer to copy into log vectors. This can lead to NULL log vector
> pointer dereferences in CIL push context if the item format code was not
> able to locate any dirty ranges at all. This crash has been reproduced
> on a 4k FSB filesystem using 16k directory blocks where an unlink
> operation happened not to log anything in the first block of the
> mapping. The logged offsets were all over 4k, marked as such in the
> subsequent log format mappings, and thus left the transaction with an
> xfs_log_item that is marked DIRTY but without any logged regions.
> 
> Further, even when the logged regions are marked correctly in the buffer
> log format bitmaps, the format code doesn't copy the correct ranges of
> the buffer into the log. This means that any logged region beyond the
> first block of a multi-block buffer is subject to corruption after a
> crash and log recovery sequence. This is due to a failure to convert the
> mapping bm_len field from basic blocks to bytes in the buffer offset
> tracking code in xfs_buf_item_format().
> 
> Update xfs_buf_item_log() to convert buffer offsets to segment relative
> offsets when logging multi-block buffers. This ensures that the modified
> regions of a buffer are logged correctly and avoids the aforementioned
> crash. Also update xfs_buf_item_format() to correctly track the source
> offset into the buffer for the log vector formatting code. This ensures
> that the correct data is copied into the log.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> Note that this has only been tested so far in a particular filesystem
> environment that reproduces the crash/corruption discussed in the commit
> log description. Generic testing still required, posting just to get a
> jump on review...
> 
> Brian
> 
>  fs/xfs/xfs_buf_item.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 3425799..2e95ad0 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -359,7 +359,7 @@ xfs_buf_item_format(
>  	for (i = 0; i < bip->bli_format_count; i++) {
>  		xfs_buf_item_format_segment(bip, lv, &vecp, offset,
>  					    &bip->bli_formats[i]);
> -		offset += bp->b_maps[i].bm_len;
> +		offset += BBTOB(bp->b_maps[i].bm_len);

Yep this and the other instance in the other hunk look obviously
correct.

>  	}
>  
>  	/*
> @@ -915,20 +915,28 @@ xfs_buf_item_log(
>  	for (i = 0; i < bip->bli_format_count; i++) {
>  		if (start > last)
>  			break;
> -		end = start + BBTOB(bp->b_maps[i].bm_len);
> +		end = start + BBTOB(bp->b_maps[i].bm_len) - 1;
> +

So let's say start is 0, and len is 8, just making up nice numbers.
That means a range of 0 through 7, with the next one starting at 8.
7 is the last block, so if "first" is beyond that, we need to skip.
Makes sense.

> +		/* skip to the map that includes the first byte to log */
>  		if (first > end) {
>  			start += BBTOB(bp->b_maps[i].bm_len);
>  			continue;
>  		}
> +
> +		/*
> +		 * Trim the range to this segment and mark it in the bitmap.
> +		 * Note that we must convert buffer offsets to segment relative
> +		 * offsets (e.g., the first byte of each segment is byte 0 of
> +		 * that segment).
> +		 */
>  		if (first < start)
>  			first = start;
>  		if (end > last)
>  			end = last;
> -
> -		xfs_buf_item_log_segment(first, end,
> +		xfs_buf_item_log_segment(first - start, end - start,
>  					 &bip->bli_formats[i].blf_data_map[0]);

I guess this is the trickiest part.

in the called function, we do:

        /*
         * Get a pointer to the first word in the bitmap
         * to set a bit in.
         */
        word_num = first_bit >> BIT_TO_WORD_SHIFT;
        wordp = &map[word_num];

where first_bit is derived from the passed-in "first" argument
(in bytes).  So we're marking bits in map[]; this skips us out into
the word_num index in map[].  But what's map?

The map[] we passed in is &bip->bli_formats[i].blf_data_map[0], where
'i' is the i'th segment.  So yes, this needs to be segment-relative
bytes, whereas start (and therefore end) are offsets in the whole
buffer.  So yes, this makes sense to me as well.


>  
> -		start += bp->b_maps[i].bm_len;
> +		start += BBTOB(bp->b_maps[i].bm_len);

Again obviously correct.

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

>  	}
>  }
>  
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: fix broken multi-fsb buffer logging
  2016-05-31  8:32 ` Dave Chinner
@ 2016-05-31 22:39   ` Dave Chinner
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2016-05-31 22:39 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Tue, May 31, 2016 at 06:32:52PM +1000, Dave Chinner wrote:
> On Wed, May 25, 2016 at 03:31:07PM -0400, Brian Foster wrote:
> > Multi-block buffers are logged based on buffer offset in
> > xfs_trans_log_buf(). xfs_buf_item_log() ultimately walks each mapping in
> > the buffer and marks the associated range to be logged in the
> > xfs_buf_log_format bitmap for that mapping. This code is broken,
> > however, [....]
> 
> [snip description I've not read, and go look at the code changes]

....

> So from the code perspective the change looks correct. I've looked
> over all the other users of bp->b_maps[i].bm_len and loops over
> bli_format_count and I can't see any other obvious problems. I'm
> going to leave this under test overnight and see if anything pops
> up...

Passes all the tests I've thrown at it. multiple xfstests runs on
multiple machines with dir block sizes from 16k to 64k, fsmark runs
with 16k and 64k block sizes, dbench stress loops with up to 500
processes running at once, etc.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

I'll commit this when I get to updating the xfs trees now that
the merge window has closed and 4.7-rc1 is out. Hopefully that will
be later today....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-05-31 22:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-25 19:31 [PATCH] xfs: fix broken multi-fsb buffer logging Brian Foster
2016-05-30 14:08 ` Brian Foster
2016-05-30 22:08   ` Eric Sandeen
2016-05-31  8:32 ` Dave Chinner
2016-05-31 22:39   ` Dave Chinner
2016-05-31 18:29 ` Eric Sandeen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.