* [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
@ 2017-04-13 18:45 Eric Sandeen
  2017-04-13 19:09 ` Carlos Maiolino
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Eric Sandeen @ 2017-04-13 18:45 UTC (permalink / raw)
  To: linux-xfs, Carlos Maiolino

Carlos had a case where "find" seemed to start spinning
forever and never return.

This was on a filesystem with non-default multi-fsb (8k)
directory blocks, and a fragmented directory with extents
like this:

0:[0,133646,2,0]
1:[2,195888,1,0]
2:[3,195890,1,0]
3:[4,195892,1,0]
4:[5,195894,1,0]
5:[6,195896,1,0]
6:[7,195898,1,0]
7:[8,195900,1,0]
8:[9,195902,1,0]
9:[10,195908,1,0]
10:[11,195910,1,0]
11:[12,195912,1,0]
12:[13,195914,1,0]
...

i.e. the first extent is a contiguous 2-fsb dir block, but
after that it is fragmented into 1 block extents.

At the top of the readdir path, we allocate a mapping array
which (for this filesystem geometry) can hold 10 extents; see
the assignment to map_info->map_size.  During readdir, we are
therefore able to map extents 0 through 9 above into the array
for readahead purposes.  If we count by 2, we see that the last
mapped index (9) is the first block of a 2-fsb directory block.
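
Spelled out (8k dir blocks on 4k fsbs, so geo->fsbcount == 2), the 10
mapped extents line up with directory blocks roughly like this:

  extent 0      -> dir offsets 0-1   (dir block 0, whole)
  extents 1-2   -> dir offsets 2-3   (dir block 1)
  extents 3-4   -> dir offsets 4-5   (dir block 2)
  extents 5-6   -> dir offsets 6-7   (dir block 3)
  extents 7-8   -> dir offsets 8-9   (dir block 4)
  extent 9      -> dir offset 10     (dir block 5, first fsb only)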

At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
more readahead; the outer loop assumes one full dir block is
processed each loop iteration, and an inner loop that ensures
that this is so by advancing to the next extent until a full
directory block is mapped.

The problem is that this inner loop may step past the last
extent in the mapping array as it tries to reach the end of
the directory block.  This will read garbage for the extent
length, and as a result the loop control variable 'j' may
become corrupted and never fail the loop conditional.
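
Spelling that out with the extent list above: when we start on the dir
block at offset 10 we have ra_index == 9, ra_offset == 0, and the inner
loop does roughly:

  j = 0: length = min(2, 1 - 0) = 1; ra_offset hits br_blockcount, so
         we advance to the next mapping: ra_index = 10, ra_offset = 0
  j = 1: length = min(2, map[10].br_blockcount - 0) -- map[10] is one
         past the end of the 10-entry array, so 'length' is whatever
         garbage sits there; if it comes back as 0 (or wraps 'j'),
         'j' never reaches fsbcount and the loop never terminates.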

The number of valid mappings we have in our array is stored
in map->map_valid, so stop this inner loop based on that limit.

There is an ASSERT at the top of the outer loop for this
same condition, but we never made it out of the inner loop,
so the ASSERT never fired.

Huge appreciation for Carlos for debugging and isolating
the problem.

Debugged-and-analyzed-by: Carlos Maiolino <cmaiolin@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---


diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
index ad9396e..c45de72 100644
--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -394,6 +394,7 @@ struct xfs_dir2_leaf_map_info {
 
 	/*
 	 * Do we need more readahead?
+	 * Each loop tries to process 1 full dir blk; last may be partial.
 	 */
 	blk_start_plug(&plug);
 	for (mip->ra_index = mip->ra_offset = i = 0;
@@ -425,9 +426,14 @@ struct xfs_dir2_leaf_map_info {
 		}
 
 		/*
-		 * Advance offset through the mapping table.
+		 * Advance offset through the mapping table, processing a full
+		 * dir block even if it is fragmented into several extents.
+		 * But stop if we have consumed all valid mappings, even if
+		 * it's not yet a full directory block.
 		 */
-		for (j = 0; j < geo->fsbcount; j += length ) {
+		for (j = 0;
+		     j < geo->fsbcount && mip->ra_index < mip->map_valid;
+		     j += length ) {
 			/*
 			 * The rest of this extent but not more than a dir
 			 * block.




* Re: [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-04-13 18:45 [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf() Eric Sandeen
@ 2017-04-13 19:09 ` Carlos Maiolino
  2017-04-13 22:12 ` Darrick J. Wong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Carlos Maiolino @ 2017-04-13 19:09 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs, Carlos Maiolino

On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote:
> Carlos had a case where "find" seemed to start spinning
> forever and never return.
> 
> This was on a filesystem with non-default multi-fsb (8k)
> directory blocks, and a fragmented directory with extents
> like this:
> 
> 0:[0,133646,2,0]
> 1:[2,195888,1,0]
> 2:[3,195890,1,0]
> 3:[4,195892,1,0]
> 4:[5,195894,1,0]
> 5:[6,195896,1,0]
> 6:[7,195898,1,0]
> 7:[8,195900,1,0]
> 8:[9,195902,1,0]
> 9:[10,195908,1,0]
> 10:[11,195910,1,0]
> 11:[12,195912,1,0]
> 12:[13,195914,1,0]
> ...
> 
> i.e. the first extent is a contiguous 2-fsb dir block, but
> after that it is fragmented into 1 block extents.
> 
> Huge appreciation for Carlos for debugging and isolating
> the problem.

Thank you :)
> 
> Debugged-and-analyzed-by: Carlos Maiolino <cmaiolin@redhat.com>
Darrick, if this goes in, could you change the email to <cmaiolino@redhat.com>?


The patch looks good to me, and I've tested it with the broken image I've been
working with, so, you can add:

Tested-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>



Thanks Eric
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> 
> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
> index ad9396e..c45de72 100644
> --- a/fs/xfs/xfs_dir2_readdir.c
> +++ b/fs/xfs/xfs_dir2_readdir.c
> @@ -394,6 +394,7 @@ struct xfs_dir2_leaf_map_info {
>  
>  	/*
>  	 * Do we need more readahead?
> +	 * Each loop tries to process 1 full dir blk; last may be partial.
>  	 */
>  	blk_start_plug(&plug);
>  	for (mip->ra_index = mip->ra_offset = i = 0;
> @@ -425,9 +426,14 @@ struct xfs_dir2_leaf_map_info {
>  		}
>  
>  		/*
> -		 * Advance offset through the mapping table.
> +		 * Advance offset through the mapping table, processing a full
> +		 * dir block even if it is fragmented into several extents.
> +		 * But stop if we have consumed all valid mappings, even if
> +		 * it's not yet a full directory block.
>  		 */
> -		for (j = 0; j < geo->fsbcount; j += length ) {
> +		for (j = 0;
> +		     j < geo->fsbcount && mip->ra_index < mip->map_valid;
> +		     j += length ) {
>  			/*
>  			 * The rest of this extent but not more than a dir
>  			 * block.
> 
> 

-- 
Carlos


* Re: [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-04-13 18:45 [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf() Eric Sandeen
  2017-04-13 19:09 ` Carlos Maiolino
@ 2017-04-13 22:12 ` Darrick J. Wong
  2017-04-13 22:24 ` Bill O'Donnell
  2017-04-17 20:57 ` Brian Foster
  3 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2017-04-13 22:12 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs, Carlos Maiolino

On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote:
> Carlos had a case where "find" seemed to start spinning
> forever and never return.
> 
> This was on a filesystem with non-default multi-fsb (8k)
> directory blocks, and a fragmented directory with extents
> like this:
> 
> 0:[0,133646,2,0]
> 1:[2,195888,1,0]
> 2:[3,195890,1,0]
> 3:[4,195892,1,0]
> 4:[5,195894,1,0]
> 5:[6,195896,1,0]
> 6:[7,195898,1,0]
> 7:[8,195900,1,0]
> 8:[9,195902,1,0]
> 9:[10,195908,1,0]
> 10:[11,195910,1,0]
> 11:[12,195912,1,0]
> 12:[13,195914,1,0]
> ...
> 
> i.e. the first extent is a contiguous 2-fsb dir block, but
> after that it is fragmented into 1 block extents.
> 
> At the top of the readdir path, we allocate a mapping array
> which (for this filesystem geometry) can hold 10 extents; see
> the assignment to map_info->map_size.  During readdir, we are
> therefore able to map extents 0 through 9 above into the array
> for readahead purposes.  If we count by 2, we see that the last
> mapped index (9) is the first block of a 2-fsb directory block.
> 
> At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
> more readahead; the outer loop assumes one full dir block is
> processed each loop iteration, and an inner loop that ensures
> that this is so by advancing to the next extent until a full
> directory block is mapped.
> 
> The problem is that this inner loop may step past the last
> extent in the mapping array as it tries to reach the end of
> the directory block.  This will read garbage for the extent
> length, and as a result the loop control variable 'j' may
> become corrupted and never fail the loop conditional.
> 
> The number of valid mappings we have in our array is stored
> in map->map_valid, so stop this inner loop based on that limit.
> 
> There is an ASSERT at the top of the outer loop for this
> same condition, but we never made it out of the inner loop,
> so the ASSERT never fired.
> 
> Huge appreciation for Carlos for debugging and isolating
> the problem.
> 
> Debugged-and-analyzed-by: Carlos Maiolino <cmaiolin@redhat.com>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> 
> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
> index ad9396e..c45de72 100644
> --- a/fs/xfs/xfs_dir2_readdir.c
> +++ b/fs/xfs/xfs_dir2_readdir.c
> @@ -394,6 +394,7 @@ struct xfs_dir2_leaf_map_info {
>  
>  	/*
>  	 * Do we need more readahead?
> +	 * Each loop tries to process 1 full dir blk; last may be partial.
>  	 */
>  	blk_start_plug(&plug);
>  	for (mip->ra_index = mip->ra_offset = i = 0;
> @@ -425,9 +426,14 @@ struct xfs_dir2_leaf_map_info {
>  		}
>  
>  		/*
> -		 * Advance offset through the mapping table.
> +		 * Advance offset through the mapping table, processing a full
> +		 * dir block even if it is fragmented into several extents.
> +		 * But stop if we have consumed all valid mappings, even if
> +		 * it's not yet a full directory block.
>  		 */
> -		for (j = 0; j < geo->fsbcount; j += length ) {
> +		for (j = 0;
> +		     j < geo->fsbcount && mip->ra_index < mip->map_valid;
> +		     j += length ) {

Seems ok for a bug fix, but /me wonders if it's worth all the loopy
effort if ultimately _dabuf_map can sort through compound (i.e. non
contiguous) directory blocks, which means that we could just
_dir3_data_readahead(dabno + i, -1) until we've scheduled readahead on
32k worth of da blocks.  We've already got the extents loaded, so it
ought to be safe to bmapi_read...
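
(Roughly this shape -- untested, and 'ra_want_blocks'/'next_ra_dablk'
are made-up names just to sketch the idea:

	for (i = 0; i < ra_want_blocks; i++)
		xfs_dir3_data_readahead(dp,
				next_ra_dablk + i * geo->fsbcount, -1);

with -1 as the daddr so the da layer maps each block itself.)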

<shrug>  Well now that there's a test I have something to keep the CPUs
warm for a while...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

>  			/*
>  			 * The rest of this extent but not more than a dir
>  			 * block.
> 
> 


* Re: [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-04-13 18:45 [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf() Eric Sandeen
  2017-04-13 19:09 ` Carlos Maiolino
  2017-04-13 22:12 ` Darrick J. Wong
@ 2017-04-13 22:24 ` Bill O'Donnell
  2017-04-17 20:57 ` Brian Foster
  3 siblings, 0 replies; 15+ messages in thread
From: Bill O'Donnell @ 2017-04-13 22:24 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs, Carlos Maiolino

On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote:
> Carlos had a case where "find" seemed to start spinning
> forever and never return.
> 
> This was on a filesystem with non-default multi-fsb (8k)
> directory blocks, and a fragmented directory with extents
> like this:
> 
> 0:[0,133646,2,0]
> 1:[2,195888,1,0]
> 2:[3,195890,1,0]
> 3:[4,195892,1,0]
> 4:[5,195894,1,0]
> 5:[6,195896,1,0]
> 6:[7,195898,1,0]
> 7:[8,195900,1,0]
> 8:[9,195902,1,0]
> 9:[10,195908,1,0]
> 10:[11,195910,1,0]
> 11:[12,195912,1,0]
> 12:[13,195914,1,0]
> ...
> 
> i.e. the first extent is a contiguous 2-fsb dir block, but
> after that it is fragmented into 1 block extents.
> 
> At the top of the readdir path, we allocate a mapping array
> which (for this filesystem geometry) can hold 10 extents; see
> the assignment to map_info->map_size.  During readdir, we are
> therefore able to map extents 0 through 9 above into the array
> for readahead purposes.  If we count by 2, we see that the last
> mapped index (9) is the first block of a 2-fsb directory block.
> 
> At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
> more readahead; the outer loop assumes one full dir block is
> processed each loop iteration, and an inner loop that ensures
> that this is so by advancing to the next extent until a full
> directory block is mapped.
> 
> The problem is that this inner loop may step past the last
> extent in the mapping array as it tries to reach the end of
> the directory block.  This will read garbage for the extent
> length, and as a result the loop control variable 'j' may
> become corrupted and never fail the loop conditional.
> 
> The number of valid mappings we have in our array is stored
> in map->map_valid, so stop this inner loop based on that limit.
> 
> There is an ASSERT at the top of the outer loop for this
> same condition, but we never made it out of the inner loop,
> so the ASSERT never fired.
> 
> Huge appreciation for Carlos for debugging and isolating
> the problem.
> 
> Debugged-and-analyzed-by: Carlos Maiolino <cmaiolin@redhat.com>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>

Looks fine.
-Bill

Reviewed-by: Bill O'Donnell <billodo@redhat.com>

> ---
> 
> 
> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
> index ad9396e..c45de72 100644
> --- a/fs/xfs/xfs_dir2_readdir.c
> +++ b/fs/xfs/xfs_dir2_readdir.c
> @@ -394,6 +394,7 @@ struct xfs_dir2_leaf_map_info {
>  
>  	/*
>  	 * Do we need more readahead?
> +	 * Each loop tries to process 1 full dir blk; last may be partial.
>  	 */
>  	blk_start_plug(&plug);
>  	for (mip->ra_index = mip->ra_offset = i = 0;
> @@ -425,9 +426,14 @@ struct xfs_dir2_leaf_map_info {
>  		}
>  
>  		/*
> -		 * Advance offset through the mapping table.
> +		 * Advance offset through the mapping table, processing a full
> +		 * dir block even if it is fragmented into several extents.
> +		 * But stop if we have consumed all valid mappings, even if
> +		 * it's not yet a full directory block.
>  		 */
> -		for (j = 0; j < geo->fsbcount; j += length ) {
> +		for (j = 0;
> +		     j < geo->fsbcount && mip->ra_index < mip->map_valid;
> +		     j += length ) {
>  			/*
>  			 * The rest of this extent but not more than a dir
>  			 * block.
> 
> 


* Re: [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-04-13 18:45 [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf() Eric Sandeen
                   ` (2 preceding siblings ...)
  2017-04-13 22:24 ` Bill O'Donnell
@ 2017-04-17 20:57 ` Brian Foster
  2017-04-17 22:12   ` Eric Sandeen
  3 siblings, 1 reply; 15+ messages in thread
From: Brian Foster @ 2017-04-17 20:57 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs, Carlos Maiolino

On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote:
> Carlos had a case where "find" seemed to start spinning
> forever and never return.
> 
> This was on a filesystem with non-default multi-fsb (8k)
> directory blocks, and a fragmented directory with extents
> like this:
> 
> 0:[0,133646,2,0]
> 1:[2,195888,1,0]
> 2:[3,195890,1,0]
> 3:[4,195892,1,0]
> 4:[5,195894,1,0]
> 5:[6,195896,1,0]
> 6:[7,195898,1,0]
> 7:[8,195900,1,0]
> 8:[9,195902,1,0]
> 9:[10,195908,1,0]
> 10:[11,195910,1,0]
> 11:[12,195912,1,0]
> 12:[13,195914,1,0]
> ...
> 

This fix seems fine to me, but I'm wondering if this code may have
issues with other kinds of misalignment between the directory blocks and
underlying bmap extents as well. For example, what happens if we end up
with something like the following on an 8k dir fsb fs?

 0:[0,xxx,3,0]
 1:[3,xxx,1,0]

... or ...

 0:[0,xxx,3,0]
 1:[3,xxx,3,0]
 ...
 N:[...]

Am I following correctly that we may end up assuming the wrong mapping
for the second dir fsb and/or possibly skipping blocks?

Brian

> i.e. the first extent is a contiguous 2-fsb dir block, but
> after that it is fragmented into 1 block extents.
> 
> At the top of the readdir path, we allocate a mapping array
> which (for this filesystem geometry) can hold 10 extents; see
> the assignment to map_info->map_size.  During readdir, we are
> therefore able to map extents 0 through 9 above into the array
> for readahead purposes.  If we count by 2, we see that the last
> mapped index (9) is the first block of a 2-fsb directory block.
> 
> At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
> more readahead; the outer loop assumes one full dir block is
> processed each loop iteration, and an inner loop that ensures
> that this is so by advancing to the next extent until a full
> directory block is mapped.
> 
> The problem is that this inner loop may step past the last
> extent in the mapping array as it tries to reach the end of
> the directory block.  This will read garbage for the extent
> length, and as a result the loop control variable 'j' may
> become corrupted and never fail the loop conditional.
> 
> The number of valid mappings we have in our array is stored
> in map->map_valid, so stop this inner loop based on that limit.
> 
> There is an ASSERT at the top of the outer loop for this
> same condition, but we never made it out of the inner loop,
> so the ASSERT never fired.
> 
> Huge appreciation for Carlos for debugging and isolating
> the problem.
> 
> Debugged-and-analyzed-by: Carlos Maiolino <cmaiolin@redhat.com>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> 
> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
> index ad9396e..c45de72 100644
> --- a/fs/xfs/xfs_dir2_readdir.c
> +++ b/fs/xfs/xfs_dir2_readdir.c
> @@ -394,6 +394,7 @@ struct xfs_dir2_leaf_map_info {
>  
>  	/*
>  	 * Do we need more readahead?
> +	 * Each loop tries to process 1 full dir blk; last may be partial.
>  	 */
>  	blk_start_plug(&plug);
>  	for (mip->ra_index = mip->ra_offset = i = 0;
> @@ -425,9 +426,14 @@ struct xfs_dir2_leaf_map_info {
>  		}
>  
>  		/*
> -		 * Advance offset through the mapping table.
> +		 * Advance offset through the mapping table, processing a full
> +		 * dir block even if it is fragmented into several extents.
> +		 * But stop if we have consumed all valid mappings, even if
> +		 * it's not yet a full directory block.
>  		 */
> -		for (j = 0; j < geo->fsbcount; j += length ) {
> +		for (j = 0;
> +		     j < geo->fsbcount && mip->ra_index < mip->map_valid;
> +		     j += length ) {
>  			/*
>  			 * The rest of this extent but not more than a dir
>  			 * block.
> 
> 


* Re: [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-04-17 20:57 ` Brian Foster
@ 2017-04-17 22:12   ` Eric Sandeen
  2017-04-18 16:55     ` Brian Foster
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2017-04-17 22:12 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, Carlos Maiolino

On 4/17/17 3:57 PM, Brian Foster wrote:
> On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote:
>> Carlos had a case where "find" seemed to start spinning
>> forever and never return.
>>
>> This was on a filesystem with non-default multi-fsb (8k)
>> directory blocks, and a fragmented directory with extents
>> like this:
>>
>> 0:[0,133646,2,0]
>> 1:[2,195888,1,0]
>> 2:[3,195890,1,0]
>> 3:[4,195892,1,0]
>> 4:[5,195894,1,0]
>> 5:[6,195896,1,0]
>> 6:[7,195898,1,0]
>> 7:[8,195900,1,0]
>> 8:[9,195902,1,0]
>> 9:[10,195908,1,0]
>> 10:[11,195910,1,0]
>> 11:[12,195912,1,0]
>> 12:[13,195914,1,0]
>> ...
>>
> 
> This fix seems fine to me, but I'm wondering if this code may have
> issues with other kinds of misalignment between the directory blocks and
> underlying bmap extents as well. For example, what happens if we end up
> with something like the following on an 8k dir fsb fs?
> 
>  0:[0,xxx,3,0]
>  1:[3,xxx,1,0]
> 
> ... or ...
> 
>  0:[0,xxx,3,0]
>  1:[3,xxx,3,0]

Well, as far as that goes it won't be an issue; for 8k dir block sizes
we will allocate an extent map with room for 10 extents, and we'll go
well beyond the above extents which cross directory block boundaries.

>  ...
>  N:[...]
> 
> Am I following correctly that we may end up assuming the wrong mapping
> for the second dir fsb and/or possibly skipping blocks?

As far as I can tell, this code is only managing the read-ahead state
by looking at these cached extents. We keep track of our position within
that allocated array of mappings - this bug just stepped off the end
while doing so.

Stopping at the correct point should keep all of the state consistent
and correct.

But yeah, it's kind of hairy & hard to read, IMHO.

Also as far as I can tell, we handle such discontiguities correctly,
other than the bug I found.  If you see something that looks suspicious,
I'm sure I could tweak my test case to craft a specific situation if
there's something you'd like to see tested...

-Eric

> Brian




* Re: [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-04-17 22:12   ` Eric Sandeen
@ 2017-04-18 16:55     ` Brian Foster
  2017-04-18 17:00       ` Brian Foster
  2017-04-19  0:09       ` Darrick J. Wong
  0 siblings, 2 replies; 15+ messages in thread
From: Brian Foster @ 2017-04-18 16:55 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs, Carlos Maiolino

On Mon, Apr 17, 2017 at 05:12:36PM -0500, Eric Sandeen wrote:
> On 4/17/17 3:57 PM, Brian Foster wrote:
> > On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote:
...
> > 
> > This fix seems fine to me, but I'm wondering if this code may have
> > issues with other kinds of misalignment between the directory blocks and
> > underlying bmap extents as well. For example, what happens if we end up
> > with something like the following on an 8k dir fsb fs?
> > 
> >  0:[0,xxx,3,0]
> >  1:[3,xxx,1,0]
> > 
> > ... or ...
> > 
> >  0:[0,xxx,3,0]
> >  1:[3,xxx,3,0]
> 
> Well, as far as that goes it won't be an issue; for 8k dir block sizes
> we will allocate an extent map with room for 10 extents, and we'll go
> well beyond the above extents which cross directory block boundaries.
> 
> >  ...
> >  N:[...]
> > 
> > Am I following correctly that we may end up assuming the wrong mapping
> > for the second dir fsb and/or possibly skipping blocks?
> 
> As far as I can tell, this code is only managing the read-ahead state
> by looking at these cached extents. We keep track of our position within
> that allocated array of mappings - this bug just stepped off the end
> while doing so.
> 
> Stopping at the correct point should keep all of the state consistent
> and correct.
> 
> But yeah, it's kind of hairy & hard to read, IMHO.
> 
> Also as far as I can tell, we handle such discontiguities correctly,
> other than the bug I found.  If you see something that looks suspicious,
> I'm sure I could tweak my test case to craft a specific situation if
> there's something you'd like to see tested...
> 

Background: Eric and I chatted a bit on irc to rectify that what I'm
calling out above is a different issue from what is fixed by this patch.

Eric,

I managed to construct a directory that looks like this:

 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..7]:          88..95            0 (88..95)             8
   1: [8..15]:         80..87            0 (80..87)             8
   2: [16..39]:        168..191          0 (168..191)          24
   3: [40..63]:        5242952..5242975  1 (72..95)            24

The fs has 8k directory fsbs. Dir fsb offset 0 spans extents 0 and 1,
offset 1 (which corresponds to the 512b range 16-31 above) is covered
completely by extent 2 and dir offset 2 (range 32-47) spans extents 2
and 3. An ls of this directory produces this:

 XFS (dm-3): Metadata corruption detected at xfs_dir3_data_reada_verify+0x42/0x80 [xfs], xfs_dir3_data_reada block 0x500058
 XFS (dm-3): Unmount and run xfs_repair
 XFS (dm-3): First 64 bytes of corrupted metadata buffer:
 ffffbcb901c44000: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
 ffffbcb901c44010: 78 78 78 78 2e 38 38 36 01 00 00 00 00 00 10 00  xxxx.886........
 ffffbcb901c44020: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
 ffffbcb901c44030: 78 78 78 78 2e 38 38 37 01 00 00 00 00 00 10 20  xxxx.887....... 

... which is yelling about block 184 (dir fsb 2). The fs is otherwise
clean according to xfs_repair. 

I _think_ something like the appended diff deals with it, but this is
lightly tested only and could definitely use more eyes.

Brian

--- 8< ---

diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
index ad9396e..9fa379d 100644
--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -404,7 +404,8 @@ xfs_dir2_leaf_readbuf(
 		 * Read-ahead a contiguous directory block.
 		 */
 		if (i > mip->ra_current &&
-		    map[mip->ra_index].br_blockcount >= geo->fsbcount) {
+		    (map[mip->ra_index].br_blockcount - mip->ra_offset) >=
+		     geo->fsbcount) {
 			xfs_dir3_data_readahead(dp,
 				map[mip->ra_index].br_startoff + mip->ra_offset,
 				XFS_FSB_TO_DADDR(dp->i_mount,
@@ -432,7 +433,7 @@ xfs_dir2_leaf_readbuf(
 			 * The rest of this extent but not more than a dir
 			 * block.
 			 */
-			length = min_t(int, geo->fsbcount,
+			length = min_t(int, geo->fsbcount - j,
 					map[mip->ra_index].br_blockcount -
 							mip->ra_offset);
 			mip->ra_offset += length;
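
To make the second hunk concrete, here is a rough userspace-only sketch
of just the ra_index/ra_offset bookkeeping (not kernel code; extent
lengths taken from the layout above, comparing the old and new 'length'
calculation):

/*
 * Sketch only: model the inner-loop bookkeeping for extents of
 * 1, 1, 3, 3 fsbs with 8k dir blocks (fsbcount == 2).
 */
#include <stdio.h>

#define MIN(a, b)	((a) < (b) ? (a) : (b))

int main(void)
{
	int	ext[] = { 1, 1, 3, 3 };	/* br_blockcount per mapping */
	int	fsbcount = 2;		/* fsbs per dir block */
	int	fixed;

	for (fixed = 0; fixed <= 1; fixed++) {
		int	ra_index = 0, ra_offset = 0;
		int	dirblk, j, length;

		for (dirblk = 0; dirblk < 3; dirblk++) {
			for (j = 0; j < fsbcount; j += length) {
				/* rest of this extent, capped at (what is
				 * left of) a dir block */
				length = MIN(fixed ? fsbcount - j : fsbcount,
					     ext[ra_index] - ra_offset);
				ra_offset += length;
				if (ra_offset == ext[ra_index]) {
					ra_offset = 0;
					ra_index++;
				}
			}
			printf("%s: after dir block %d: ra_index=%d ra_offset=%d\n",
			       fixed ? "fixed" : "orig ", dirblk,
			       ra_index, ra_offset);
		}
	}
	return 0;
}

The two only diverge on the dir block that spans extents 2 and 3: the
old calculation leaves ra_offset one fsb too far into extent 3, so any
further readahead would presumably be aimed at the wrong block.  The
first hunk is the same idea applied to the contiguity check: once
ra_offset is nonzero, br_blockcount alone no longer says whether the
rest of the dir block still fits inside this extent.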


* Re: [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-04-18 16:55     ` Brian Foster
@ 2017-04-18 17:00       ` Brian Foster
  2017-04-19  0:09       ` Darrick J. Wong
  1 sibling, 0 replies; 15+ messages in thread
From: Brian Foster @ 2017-04-18 17:00 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs, Carlos Maiolino

On Tue, Apr 18, 2017 at 12:55:44PM -0400, Brian Foster wrote:
> On Mon, Apr 17, 2017 at 05:12:36PM -0500, Eric Sandeen wrote:
> > On 4/17/17 3:57 PM, Brian Foster wrote:
> > > On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote:
> ...
> > > 
> > > This fix seems fine to me, but I'm wondering if this code may have
> > > issues with other kinds of misalignment between the directory blocks and
> > > underlying bmap extents as well. For example, what happens if we end up
> > > with something like the following on an 8k dir fsb fs?
> > > 
> > >  0:[0,xxx,3,0]
> > >  1:[3,xxx,1,0]
> > > 
> > > ... or ...
> > > 
> > >  0:[0,xxx,3,0]
> > >  1:[3,xxx,3,0]
> > 
> > Well, as far as that goes it won't be an issue; for 8k dir block sizes
> > we will allocate an extent map with room for 10 extents, and we'll go
> > well beyond the above extents which cross directory block boundaries.
> > 
> > >  ...
> > >  N:[...]
> > > 
> > > Am I following correctly that we may end up assuming the wrong mapping
> > > for the second dir fsb and/or possibly skipping blocks?
> > 
> > As far as I can tell, this code is only managing the read-ahead state
> > by looking at these cached extents. We keep track of our position within
> > that allocated array of mappings - this bug just stepped off the end
> > while doing so.
> > 
> > Stopping at the correct point should keep all of the state consistent
> > and correct.
> > 
> > But yeah, it's kind of hairy & hard to read, IMHO.
> > 
> > Also as far as I can tell, we handle such discontiguities correctly,
> > other than the bug I found.  If you see something that looks suspicious,
> > I'm sure I could tweak my test case to craft a specific situation if
> > there's something you'd like to see tested...
> > 
> 
> Background: Eric and I chatted a bit on irc to rectify that what I'm
> calling out above is a different issue from what is fixed by this patch.
> 
> Eric,
> 
> I managed to construct a directory that looks like this:
> 
>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
>    0: [0..7]:          88..95            0 (88..95)             8
>    1: [8..15]:         80..87            0 (80..87)             8
>    2: [16..39]:        168..191          0 (168..191)          24
>    3: [40..63]:        5242952..5242975  1 (72..95)            24
> 
> The fs has 8k directory fsbs. Dir fsb offset 0 spans extents 0 and 1,
> offset 1 (which corresponds to the 512b range 16-31 above) is covered
> completely by extent 2 and dir offset 2 (range 32-47) spans extents 2
> and 3. An ls of this directory produces this:
> 
>  XFS (dm-3): Metadata corruption detected at xfs_dir3_data_reada_verify+0x42/0x80 [xfs], xfs_dir3_data_reada block 0x500058
>  XFS (dm-3): Unmount and run xfs_repair
>  XFS (dm-3): First 64 bytes of corrupted metadata buffer:
>  ffffbcb901c44000: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
>  ffffbcb901c44010: 78 78 78 78 2e 38 38 36 01 00 00 00 00 00 10 00  xxxx.886........
>  ffffbcb901c44020: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
>  ffffbcb901c44030: 78 78 78 78 2e 38 38 37 01 00 00 00 00 00 10 20  xxxx.887....... 
> 

Oops, snipped the wrong part. This is the full output:

 XFS (dm-3): Metadata CRC error detected at xfs_dir3_data_read_verify+0x5e/0x100 [xfs], xfs_dir3_data block 0xb8
 XFS (dm-3): Unmount and run xfs_repair
 XFS (dm-3): First 64 bytes of corrupted metadata buffer:
 ffffbcb901c44000: 58 44 44 33 bb bf aa 94 00 00 00 00 00 00 00 b8  XDD3............
 ffffbcb901c44010: 00 00 00 01 00 00 04 95 00 5b 7c 62 71 fb 47 92  .........[|bq.G.
 ffffbcb901c44020: b4 49 02 c4 e8 1a 75 2f 00 00 00 00 00 00 00 63  .I....u/.......c
 ffffbcb901c44030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 XFS (dm-3): Metadata corruption detected at xfs_dir3_data_reada_verify+0x42/0x80 [xfs], xfs_dir3_data_reada block 0x500058
 XFS (dm-3): Metadata CRC error detected at xfs_dir3_data_read_verify+0x5e/0x100 [xfs], xfs_dir3_data block 0xb8
 XFS (dm-3): Unmount and run xfs_repair
 XFS (dm-3): First 64 bytes of corrupted metadata buffer:
 ffffbcb901c44000: 58 44 44 33 bb bf aa 94 00 00 00 00 00 00 00 b8  XDD3............
 ffffbcb901c44010: 00 00 00 01 00 00 04 95 00 5b 7c 62 71 fb 47 92  .........[|bq.G.
 ffffbcb901c44020: b4 49 02 c4 e8 1a 75 2f 00 00 00 00 00 00 00 63  .I....u/.......c
 ffffbcb901c44030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 XFS (dm-3): metadata I/O error: block 0xb8 ("xfs_trans_read_buf_map") error 74 numblks 16
 XFS (dm-3): Unmount and run xfs_repair
 XFS (dm-3): First 64 bytes of corrupted metadata buffer:
 ffffbcb901c46000: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
 ffffbcb901c46010: 78 78 78 78 2e 38 38 36 01 00 00 00 00 00 10 00  xxxx.886........
 ffffbcb901c46020: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
 ffffbcb901c46030: 78 78 78 78 2e 38 38 37 01 00 00 00 00 00 10 20  xxxx.887....... 

Brian

> ... which is yelling about block 184 (dir fsb 2). The fs is otherwise
> clean according to xfs_repair. 
> 
> I _think_ something like the appended diff deals with it, but this is
> lightly tested only and could definitely use more eyes.
> 
> Brian
> 
> --- 8< ---
> 
> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
> index ad9396e..9fa379d 100644
> --- a/fs/xfs/xfs_dir2_readdir.c
> +++ b/fs/xfs/xfs_dir2_readdir.c
> @@ -404,7 +404,8 @@ xfs_dir2_leaf_readbuf(
>  		 * Read-ahead a contiguous directory block.
>  		 */
>  		if (i > mip->ra_current &&
> -		    map[mip->ra_index].br_blockcount >= geo->fsbcount) {
> +		    (map[mip->ra_index].br_blockcount - mip->ra_offset) >=
> +		     geo->fsbcount) {
>  			xfs_dir3_data_readahead(dp,
>  				map[mip->ra_index].br_startoff + mip->ra_offset,
>  				XFS_FSB_TO_DADDR(dp->i_mount,
> @@ -432,7 +433,7 @@ xfs_dir2_leaf_readbuf(
>  			 * The rest of this extent but not more than a dir
>  			 * block.
>  			 */
> -			length = min_t(int, geo->fsbcount,
> +			length = min_t(int, geo->fsbcount - j,
>  					map[mip->ra_index].br_blockcount -
>  							mip->ra_offset);
>  			mip->ra_offset += length;


* Re: [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-04-18 16:55     ` Brian Foster
  2017-04-18 17:00       ` Brian Foster
@ 2017-04-19  0:09       ` Darrick J. Wong
  2017-04-19  0:14         ` Eric Sandeen
  1 sibling, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2017-04-19  0:09 UTC (permalink / raw)
  To: Brian Foster; +Cc: Eric Sandeen, linux-xfs, Carlos Maiolino

On Tue, Apr 18, 2017 at 12:55:44PM -0400, Brian Foster wrote:
> On Mon, Apr 17, 2017 at 05:12:36PM -0500, Eric Sandeen wrote:
> > On 4/17/17 3:57 PM, Brian Foster wrote:
> > > On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote:
> ...
> > > 
> > > This fix seems fine to me, but I'm wondering if this code may have
> > > issues with other kinds of misalignment between the directory blocks and
> > > underlying bmap extents as well. For example, what happens if we end up
> > > with something like the following on an 8k dir fsb fs?
> > > 
> > >  0:[0,xxx,3,0]
> > >  1:[3,xxx,1,0]
> > > 
> > > ... or ...
> > > 
> > >  0:[0,xxx,3,0]
> > >  1:[3,xxx,3,0]
> > 
> > Well, as far as that goes it won't be an issue; for 8k dir block sizes
> > we will allocate an extent map with room for 10 extents, and we'll go
> > well beyond the above extents which cross directory block boundaries.
> > 
> > >  ...
> > >  N:[...]
> > > 
> > > Am I following correctly that we may end up assuming the wrong mapping
> > > for the second dir fsb and/or possibly skipping blocks?
> > 
> > As far as I can tell, this code is only managing the read-ahead state
> > by looking at these cached extents. We keep track of our position within
> > that allocated array of mappings - this bug just stepped off the end
> > while doing so.
> > 
> > Stopping at the correct point should keep all of the state consistent
> > and correct.
> > 
> > But yeah, it's kind of hairy & hard to read, IMHO.
> > 
> > Also as far as I can tell, we handle such discontiguities correctly,
> > other than the bug I found.  If you see something that looks suspicious,
> > I'm sure I could tweak my test case to craft a specific situation if
> > there's something you'd like to see tested...
> > 
> 
> Background: Eric and I chatted a bit on irc to rectify that what I'm
> calling out above is a different issue from what is fixed by this patch.
> 
> Eric,
> 
> I managed to construct a directory that looks like this:
> 
>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
>    0: [0..7]:          88..95            0 (88..95)             8
>    1: [8..15]:         80..87            0 (80..87)             8
>    2: [16..39]:        168..191          0 (168..191)          24
>    3: [40..63]:        5242952..5242975  1 (72..95)            24
> 
> The fs has 8k directory fsbs. Dir fsb offset 0 spans extents 0 and 1,
> offset 1 (which corresponds to the 512b range 16-31 above) is covered
> completely by extent 2 and dir offset 2 (range 32-47) spans extents 2
> and 3. An ls of this directory produces this:
> 
>  XFS (dm-3): Metadata corruption detected at xfs_dir3_data_reada_verify+0x42/0x80 [xfs], xfs_dir3_data_reada block 0x500058
>  XFS (dm-3): Unmount and run xfs_repair
>  XFS (dm-3): First 64 bytes of corrupted metadata buffer:
>  ffffbcb901c44000: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
>  ffffbcb901c44010: 78 78 78 78 2e 38 38 36 01 00 00 00 00 00 10 00  xxxx.886........
>  ffffbcb901c44020: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
>  ffffbcb901c44030: 78 78 78 78 2e 38 38 37 01 00 00 00 00 00 10 20  xxxx.887....... 
> 
> ... which is yelling about block 184 (dir fsb 2). The fs is otherwise
> clean according to xfs_repair. 
> 
> I _think_ something like the appended diff deals with it, but this is
> lightly tested only and could definitely use more eyes.
> 
> Brian
> 
> --- 8< ---
> 
> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
> index ad9396e..9fa379d 100644
> --- a/fs/xfs/xfs_dir2_readdir.c
> +++ b/fs/xfs/xfs_dir2_readdir.c
> @@ -404,7 +404,8 @@ xfs_dir2_leaf_readbuf(
>  		 * Read-ahead a contiguous directory block.
>  		 */
>  		if (i > mip->ra_current &&
> -		    map[mip->ra_index].br_blockcount >= geo->fsbcount) {
> +		    (map[mip->ra_index].br_blockcount - mip->ra_offset) >=
> +		     geo->fsbcount) {
>  			xfs_dir3_data_readahead(dp,
>  				map[mip->ra_index].br_startoff + mip->ra_offset,
>  				XFS_FSB_TO_DADDR(dp->i_mount,
> @@ -432,7 +433,7 @@ xfs_dir2_leaf_readbuf(
>  			 * The rest of this extent but not more than a dir
>  			 * block.
>  			 */
> -			length = min_t(int, geo->fsbcount,
> +			length = min_t(int, geo->fsbcount - j,

Looks ok to me to make Eric's bugfix complete.

I will, however, post a cleanup patch to remove the persistent shadow
bmap and have readahead issued directly off the inode fork contents.

--D

>  					map[mip->ra_index].br_blockcount -
>  							mip->ra_offset);
>  			mip->ra_offset += length;


* Re: [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-04-19  0:09       ` Darrick J. Wong
@ 2017-04-19  0:14         ` Eric Sandeen
  2017-04-19  0:29           ` Darrick J. Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2017-04-19  0:14 UTC (permalink / raw)
  To: Darrick J. Wong, Brian Foster; +Cc: linux-xfs, Carlos Maiolino

On 4/18/17 7:09 PM, Darrick J. Wong wrote:
> On Tue, Apr 18, 2017 at 12:55:44PM -0400, Brian Foster wrote:
>> On Mon, Apr 17, 2017 at 05:12:36PM -0500, Eric Sandeen wrote:
>>> On 4/17/17 3:57 PM, Brian Foster wrote:
>>>> On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote:
>> ...
>>>>
>>>> This fix seems fine to me, but I'm wondering if this code may have
>>>> issues with other kinds of misalignment between the directory blocks and
>>>> underlying bmap extents as well. For example, what happens if we end up
>>>> with something like the following on an 8k dir fsb fs?
>>>>
>>>>  0:[0,xxx,3,0]
>>>>  1:[3,xxx,1,0]
>>>>
>>>> ... or ...
>>>>
>>>>  0:[0,xxx,3,0]
>>>>  1:[3,xxx,3,0]
>>>
>>> Well, as far as that goes it won't be an issue; for 8k dir block sizes
>>> we will allocate an extent map with room for 10 extents, and we'll go
>>> well beyond the above extents which cross directory block boundaries.
>>>
>>>>  ...
>>>>  N:[...]
>>>>
>>>> Am I following correctly that we may end up assuming the wrong mapping
>>>> for the second dir fsb and/or possibly skipping blocks?
>>>
>>> As far as I can tell, this code is only managing the read-ahead state
>>> by looking at these cached extents. We keep track of our position within
>>> that allocated array of mappings - this bug just stepped off the end
>>> while doing so.
>>>
>>> Stopping at the correct point should keep all of the state consistent
>>> and correct.
>>>
>>> But yeah, it's kind of hairy & hard to read, IMHO.
>>>
>>> Also as far as I can tell, we handle such discontiguities correctly,
>>> other than the bug I found.  If you see something that looks suspicious,
>>> I'm sure I could tweak my test case to craft a specific situation if
>>> there's something you'd like to see tested...
>>>
>>
>> Background: Eric and I chatted a bit on irc to rectify that what I'm
>> calling out above is a different issue from what is fixed by this patch.
>>
>> Eric,
>>
>> I managed to construct a directory that looks like this:
>>
>>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
>>    0: [0..7]:          88..95            0 (88..95)             8
>>    1: [8..15]:         80..87            0 (80..87)             8
>>    2: [16..39]:        168..191          0 (168..191)          24
>>    3: [40..63]:        5242952..5242975  1 (72..95)            24
>>
>> The fs has 8k directory fsbs. Dir fsb offset 0 spans extents 0 and 1,
>> offset 1 (which corresponds to the 512b range 16-31 above) is covered
>> completely by extent 2 and dir offset 2 (range 32-47) spans extents 2
>> and 3. An ls of this directory produces this:
>>
>>  XFS (dm-3): Metadata corruption detected at xfs_dir3_data_reada_verify+0x42/0x80 [xfs], xfs_dir3_data_reada block 0x500058
>>  XFS (dm-3): Unmount and run xfs_repair
>>  XFS (dm-3): First 64 bytes of corrupted metadata buffer:
>>  ffffbcb901c44000: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
>>  ffffbcb901c44010: 78 78 78 78 2e 38 38 36 01 00 00 00 00 00 10 00  xxxx.886........
>>  ffffbcb901c44020: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
>>  ffffbcb901c44030: 78 78 78 78 2e 38 38 37 01 00 00 00 00 00 10 20  xxxx.887....... 
>>
>> ... which is yelling about block 184 (dir fsb 2). The fs is otherwise
>> clean according to xfs_repair. 
>>
>> I _think_ something like the appended diff deals with it, but this is
>> lightly tested only and could definitely use more eyes.
>>
>> Brian
>>
>> --- 8< ---
>>
>> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
>> index ad9396e..9fa379d 100644
>> --- a/fs/xfs/xfs_dir2_readdir.c
>> +++ b/fs/xfs/xfs_dir2_readdir.c
>> @@ -404,7 +404,8 @@ xfs_dir2_leaf_readbuf(
>>  		 * Read-ahead a contiguous directory block.
>>  		 */
>>  		if (i > mip->ra_current &&
>> -		    map[mip->ra_index].br_blockcount >= geo->fsbcount) {
>> +		    (map[mip->ra_index].br_blockcount - mip->ra_offset) >=
>> +		     geo->fsbcount) {
>>  			xfs_dir3_data_readahead(dp,
>>  				map[mip->ra_index].br_startoff + mip->ra_offset,
>>  				XFS_FSB_TO_DADDR(dp->i_mount,
>> @@ -432,7 +433,7 @@ xfs_dir2_leaf_readbuf(
>>  			 * The rest of this extent but not more than a dir
>>  			 * block.
>>  			 */
>> -			length = min_t(int, geo->fsbcount,
>> +			length = min_t(int, geo->fsbcount - j,
> 
> Looks ok to me to make Eric's bugfix complete.

Brian, thanks for crafting the image to expose this.  :)

I've been otherwise occupied today, sorry - technically this fixes a separate
issue, yes? So 2 patches, 2 bugfixes AFAICT?

Thanks,
-Eric

> I will, however, post a cleanup patch to remove the persistent shadow
> bmap and have readahead issued directly off the inode fork contents.
> 
> --D
> 
>>  					map[mip->ra_index].br_blockcount -
>>  							mip->ra_offset);
>>  			mip->ra_offset += length;



* Re: [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-04-19  0:14         ` Eric Sandeen
@ 2017-04-19  0:29           ` Darrick J. Wong
  2017-04-19 12:32             ` Brian Foster
  0 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2017-04-19  0:29 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Brian Foster, linux-xfs, Carlos Maiolino

On Tue, Apr 18, 2017 at 07:14:03PM -0500, Eric Sandeen wrote:
> On 4/18/17 7:09 PM, Darrick J. Wong wrote:
> > On Tue, Apr 18, 2017 at 12:55:44PM -0400, Brian Foster wrote:
> >> On Mon, Apr 17, 2017 at 05:12:36PM -0500, Eric Sandeen wrote:
> >>> On 4/17/17 3:57 PM, Brian Foster wrote:
> >>>> On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote:
> >> ...
> >>>>
> >>>> This fix seems fine to me, but I'm wondering if this code may have
> >>>> issues with other kinds of misalignment between the directory blocks and
> >>>> underlying bmap extents as well. For example, what happens if we end up
> >>>> with something like the following on an 8k dir fsb fs?
> >>>>
> >>>>  0:[0,xxx,3,0]
> >>>>  1:[3,xxx,1,0]
> >>>>
> >>>> ... or ...
> >>>>
> >>>>  0:[0,xxx,3,0]
> >>>>  1:[3,xxx,3,0]
> >>>
> >>> Well, as far as that goes it won't be an issue; for 8k dir block sizes
> >>> we will allocate an extent map with room for 10 extents, and we'll go
> >>> well beyond the above extents which cross directory block boundaries.
> >>>
> >>>>  ...
> >>>>  N:[...]
> >>>>
> >>>> Am I following correctly that we may end up assuming the wrong mapping
> >>>> for the second dir fsb and/or possibly skipping blocks?
> >>>
> >>> As far as I can tell, this code is only managing the read-ahead state
> >>> by looking at these cached extents. We keep track of our position within
> >>> that allocated array of mappings - this bug just stepped off the end
> >>> while doing so.
> >>>
> >>> Stopping at the correct point should keep all of the state consistent
> >>> and correct.
> >>>
> >>> But yeah, it's kind of hairy & hard to read, IMHO.
> >>>
> >>> Also as far as I can tell, we handle such discontiguities correctly,
> >>> other than the bug I found.  If you see something that looks suspicious,
> >>> I'm sure I could tweak my test case to craft a specific situation if
> >>> there's something you'd like to see tested...
> >>>
> >>
> >> Background: Eric and I chatted a bit on irc to rectify that what I'm
> >> calling out above is a different issue from what is fixed by this patch.
> >>
> >> Eric,
> >>
> >> I managed to construct a directory that looks like this:
> >>
> >>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
> >>    0: [0..7]:          88..95            0 (88..95)             8
> >>    1: [8..15]:         80..87            0 (80..87)             8
> >>    2: [16..39]:        168..191          0 (168..191)          24
> >>    3: [40..63]:        5242952..5242975  1 (72..95)            24
> >>
> >> The fs has 8k directory fsbs. Dir fsb offset 0 spans extents 0 and 1,
> >> offset 1 (which corresponds to the 512b range 16-31 above) is covered
> >> completely by extent 2 and dir offset 2 (range 32-47) spans extents 2
> >> and 3. An ls of this directory produces this:
> >>
> >>  XFS (dm-3): Metadata corruption detected at xfs_dir3_data_reada_verify+0x42/0x80 [xfs], xfs_dir3_data_reada block 0x500058
> >>  XFS (dm-3): Unmount and run xfs_repair
> >>  XFS (dm-3): First 64 bytes of corrupted metadata buffer:
> >>  ffffbcb901c44000: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
> >>  ffffbcb901c44010: 78 78 78 78 2e 38 38 36 01 00 00 00 00 00 10 00  xxxx.886........
> >>  ffffbcb901c44020: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
> >>  ffffbcb901c44030: 78 78 78 78 2e 38 38 37 01 00 00 00 00 00 10 20  xxxx.887....... 
> >>
> >> ... which is yelling about block 184 (dir fsb 2). The fs is otherwise
> >> clean according to xfs_repair. 
> >>
> >> I _think_ something like the appended diff deals with it, but this is
> >> lightly tested only and could definitely use more eyes.
> >>
> >> Brian
> >>
> >> --- 8< ---
> >>
> >> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
> >> index ad9396e..9fa379d 100644
> >> --- a/fs/xfs/xfs_dir2_readdir.c
> >> +++ b/fs/xfs/xfs_dir2_readdir.c
> >> @@ -404,7 +404,8 @@ xfs_dir2_leaf_readbuf(
> >>  		 * Read-ahead a contiguous directory block.
> >>  		 */
> >>  		if (i > mip->ra_current &&
> >> -		    map[mip->ra_index].br_blockcount >= geo->fsbcount) {
> >> +		    (map[mip->ra_index].br_blockcount - mip->ra_offset) >=
> >> +		     geo->fsbcount) {
> >>  			xfs_dir3_data_readahead(dp,
> >>  				map[mip->ra_index].br_startoff + mip->ra_offset,
> >>  				XFS_FSB_TO_DADDR(dp->i_mount,
> >> @@ -432,7 +433,7 @@ xfs_dir2_leaf_readbuf(
> >>  			 * The rest of this extent but not more than a dir
> >>  			 * block.
> >>  			 */
> >> -			length = min_t(int, geo->fsbcount,
> >> +			length = min_t(int, geo->fsbcount - j,
> > 
> > Looks ok to me to make Eric's bugfix complete.
> 
> Brian, thanks for crafting the image to expose this.  :)
> 
> I've been otherwise occupied today, sorry - technically this fixes a separate
> issue, yes? So 2 patches, 2 bugfixes AFAICT?

Ok.  I don't mind taking two patches since you're right, the first patch
saves us from running off the end of the shadow bmap array, and the
second one fixes bookkeeping problems when a dir extent starts midway
through a large dirblock.

--D

> 
> Thanks,
> -Eric
> 
> > I will, however, post a cleanup patch to remove the persistent shadow
> > bmap and have readahead issued directly off the inode fork contents.
> > 
> > --D
> > 
> >>  					map[mip->ra_index].br_blockcount -
> >>  							mip->ra_offset);
> >>  			mip->ra_offset += length;


* Re: [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-04-19  0:29           ` Darrick J. Wong
@ 2017-04-19 12:32             ` Brian Foster
  2017-04-19 14:40               ` Eric Sandeen
  0 siblings, 1 reply; 15+ messages in thread
From: Brian Foster @ 2017-04-19 12:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Eric Sandeen, linux-xfs, Carlos Maiolino

On Tue, Apr 18, 2017 at 05:29:27PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 18, 2017 at 07:14:03PM -0500, Eric Sandeen wrote:
> > On 4/18/17 7:09 PM, Darrick J. Wong wrote:
> > > On Tue, Apr 18, 2017 at 12:55:44PM -0400, Brian Foster wrote:
> > >> On Mon, Apr 17, 2017 at 05:12:36PM -0500, Eric Sandeen wrote:
> > >>> On 4/17/17 3:57 PM, Brian Foster wrote:
> > >>>> On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote:
> > >> ...
> > >>>>
> > >>>> This fix seems fine to me, but I'm wondering if this code may have
> > >>>> issues with other kinds of misalignment between the directory blocks and
> > >>>> underlying bmap extents as well. For example, what happens if we end up
> > >>>> with something like the following on an 8k dir fsb fs?
> > >>>>
> > >>>>  0:[0,xxx,3,0]
> > >>>>  1:[3,xxx,1,0]
> > >>>>
> > >>>> ... or ...
> > >>>>
> > >>>>  0:[0,xxx,3,0]
> > >>>>  1:[3,xxx,3,0]
> > >>>
> > >>> Well, as far as that goes it won't be an issue; for 8k dir block sizes
> > >>> we will allocate an extent map with room for 10 extents, and we'll go
> > >>> well beyond the above extents which cross directory block boundaries.
> > >>>
> > >>>>  ...
> > >>>>  N:[...]
> > >>>>
> > >>>> Am I following correctly that we may end up assuming the wrong mapping
> > >>>> for the second dir fsb and/or possibly skipping blocks?
> > >>>
> > >>> As far as I can tell, this code is only managing the read-ahead state
> > >>> by looking at these cached extents. We keep track of our position within
> > >>> that allocated array of mappings - this bug just stepped off the end
> > >>> while doing so.
> > >>>
> > >>> Stopping at the correct point should keep all of the state consistent
> > >>> and correct.
> > >>>
> > >>> But yeah, it's kind of hairy & hard to read, IMHO.
> > >>>
> > >>> Also as far as I can tell, we handle such discontiguities correctly,
> > >>> other than the bug I found.  If you see something that looks suspicious,
> > >>> I'm sure I could tweak my test case to craft a specific situation if
> > >>> there's something you'd like to see tested...
> > >>>
> > >>
> > >> Background: Eric and I chatted a bit on irc to rectify that what I'm
> > >> calling out above is a different issue from what is fixed by this patch.
> > >>
> > >> Eric,
> > >>
> > >> I managed to construct a directory that looks like this:
> > >>
> > >>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
> > >>    0: [0..7]:          88..95            0 (88..95)             8
> > >>    1: [8..15]:         80..87            0 (80..87)             8
> > >>    2: [16..39]:        168..191          0 (168..191)          24
> > >>    3: [40..63]:        5242952..5242975  1 (72..95)            24
> > >>
> > >> The fs has 8k directory fsbs. Dir fsb offset 0 spans extents 0 and 1,
> > >> offset 1 (which corresponds to the 512b range 16-31 above) is covered
> > >> completely by extent 2 and dir offset 2 (range 32-47) spans extents 2
> > >> and 3. An ls of this directory produces this:
> > >>
> > >>  XFS (dm-3): Metadata corruption detected at xfs_dir3_data_reada_verify+0x42/0x80 [xfs], xfs_dir3_data_reada block 0x500058
> > >>  XFS (dm-3): Unmount and run xfs_repair
> > >>  XFS (dm-3): First 64 bytes of corrupted metadata buffer:
> > >>  ffffbcb901c44000: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
> > >>  ffffbcb901c44010: 78 78 78 78 2e 38 38 36 01 00 00 00 00 00 10 00  xxxx.886........
> > >>  ffffbcb901c44020: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
> > >>  ffffbcb901c44030: 78 78 78 78 2e 38 38 37 01 00 00 00 00 00 10 20  xxxx.887....... 
> > >>
> > >> ... which is yelling about block 184 (dir fsb 2). The fs is otherwise
> > >> clean according to xfs_repair. 
> > >>
> > >> I _think_ something like the appended diff deals with it, but this is
> > >> lightly tested only and could definitely use more eyes.
> > >>
> > >> Brian
> > >>
> > >> --- 8< ---
> > >>
> > >> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
> > >> index ad9396e..9fa379d 100644
> > >> --- a/fs/xfs/xfs_dir2_readdir.c
> > >> +++ b/fs/xfs/xfs_dir2_readdir.c
> > >> @@ -404,7 +404,8 @@ xfs_dir2_leaf_readbuf(
> > >>  		 * Read-ahead a contiguous directory block.
> > >>  		 */
> > >>  		if (i > mip->ra_current &&
> > >> -		    map[mip->ra_index].br_blockcount >= geo->fsbcount) {
> > >> +		    (map[mip->ra_index].br_blockcount - mip->ra_offset) >=
> > >> +		     geo->fsbcount) {
> > >>  			xfs_dir3_data_readahead(dp,
> > >>  				map[mip->ra_index].br_startoff + mip->ra_offset,
> > >>  				XFS_FSB_TO_DADDR(dp->i_mount,
> > >> @@ -432,7 +433,7 @@ xfs_dir2_leaf_readbuf(
> > >>  			 * The rest of this extent but not more than a dir
> > >>  			 * block.
> > >>  			 */
> > >> -			length = min_t(int, geo->fsbcount,
> > >> +			length = min_t(int, geo->fsbcount - j,
> > > 
> > > Looks ok to me to make Eric's bugfix complete.
> > 
> > Brian, thanks for crafting the image to expose this.  :)
> > 
> > I've been otherwise occupied today, sorry - technically this fixes a separate
> > issue, yes? So 2 patches, 2 bugfixes AFAICT?
> 
> Ok.  I don't mind taking two patches since you're right, the first patch
> saves us from running off the end of the shadow bmap array, and the
> > second one fixes bookkeeping problems when a dir extent starts midway
> through a large dirblock.
> 

Yeah, these are definitely separate issues and Eric's original patch
here is probably higher priority on its own.

But I'm not clear on the plan here.. do we want this additional patch or
are we planning just to rewrite the readahead bits (or do we want to do
both)? I guess having a bugfix patch in the tree is useful from a stable
perspective... hm?

Brian

> --D
> 
> > 
> > Thanks,
> > -Eric
> > 
> > > I will, however, post a cleanup patch to remove the persistent shadow
> > > bmap and have readahead issued directly off the inode fork contents.
> > > 
> > > --D
> > > 
> > >>  					map[mip->ra_index].br_blockcount -
> > >>  							mip->ra_offset);
> > >>  			mip->ra_offset += length;


* Re: [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-04-19 12:32             ` Brian Foster
@ 2017-04-19 14:40               ` Eric Sandeen
  2017-04-19 16:08                 ` Darrick J. Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2017-04-19 14:40 UTC (permalink / raw)
  To: Brian Foster, Darrick J. Wong; +Cc: linux-xfs, Carlos Maiolino

On 4/19/17 7:32 AM, Brian Foster wrote:
> On Tue, Apr 18, 2017 at 05:29:27PM -0700, Darrick J. Wong wrote:
>> On Tue, Apr 18, 2017 at 07:14:03PM -0500, Eric Sandeen wrote:
>>> On 4/18/17 7:09 PM, Darrick J. Wong wrote:
>>>> On Tue, Apr 18, 2017 at 12:55:44PM -0400, Brian Foster wrote:
>>>>> On Mon, Apr 17, 2017 at 05:12:36PM -0500, Eric Sandeen wrote:
>>>>>> On 4/17/17 3:57 PM, Brian Foster wrote:
>>>>>>> On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote:
>>>>> ...
>>>>>>>
>>>>>>> This fix seems fine to me, but I'm wondering if this code may have
>>>>>>> issues with other kinds of misalignment between the directory blocks and
>>>>>>> underlying bmap extents as well. For example, what happens if we end up
>>>>>>> with something like the following on an 8k dir fsb fs?
>>>>>>>
>>>>>>>  0:[0,xxx,3,0]
>>>>>>>  1:[3,xxx,1,0]
>>>>>>>
>>>>>>> ... or ...
>>>>>>>
>>>>>>>  0:[0,xxx,3,0]
>>>>>>>  1:[3,xxx,3,0]
>>>>>>
>>>>>> Well, as far as that goes it won't be an issue; for 8k dir block sizes
>>>>>> we will allocate an extent map with room for 10 extents, and we'll go
>>>>>> well beyond the above extents which cross directory block boundaries.
>>>>>>
>>>>>>>  ...
>>>>>>>  N:[...]
>>>>>>>
>>>>>>> Am I following correctly that we may end up assuming the wrong mapping
>>>>>>> for the second dir fsb and/or possibly skipping blocks?
>>>>>>
>>>>>> As far as I can tell, this code is only managing the read-ahead state
>>>>>> by looking at these cached extents. We keep track of our position within
>>>>>> that allocated array of mappings - this bug just stepped off the end
>>>>>> while doing so.
>>>>>>
>>>>>> Stopping at the correct point should keep all of the state consistent
>>>>>> and correct.
>>>>>>
>>>>>> But yeah, it's kind of hairy & hard to read, IMHO.
>>>>>>
>>>>>> Also as far as I can tell, we handle such discontiguities correctly,
>>>>>> other than the bug I found.  If you see something that looks suspicious,
>>>>>> I'm sure I could tweak my test case to craft a specific situation if
>>>>>> there's something you'd like to see tested...
>>>>>>
>>>>>
>>>>> Background: Eric and I chatted a bit on irc to confirm that what I'm
>>>>> calling out above is a different issue from what is fixed by this patch.
>>>>>
>>>>> Eric,
>>>>>
>>>>> I managed to construct a directory that looks like this:
>>>>>
>>>>>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
>>>>>    0: [0..7]:          88..95            0 (88..95)             8
>>>>>    1: [8..15]:         80..87            0 (80..87)             8
>>>>>    2: [16..39]:        168..191          0 (168..191)          24
>>>>>    3: [40..63]:        5242952..5242975  1 (72..95)            24
>>>>>
>>>>> The fs has 8k directory fsbs. Dir fsb offset 0 spans extents 0 and 1,
>>>>> offset 1 (which corresponds to the 512b range 16-31 above) is covered
>>>>> completely by extent 2 and dir offset 2 (range 32-47) spans extents 2
>>>>> and 3. An ls of this directory produces this:
>>>>>
>>>>>  XFS (dm-3): Metadata corruption detected at xfs_dir3_data_reada_verify+0x42/0x80 [xfs], xfs_dir3_data_reada block 0x500058
>>>>>  XFS (dm-3): Unmount and run xfs_repair
>>>>>  XFS (dm-3): First 64 bytes of corrupted metadata buffer:
>>>>>  ffffbcb901c44000: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
>>>>>  ffffbcb901c44010: 78 78 78 78 2e 38 38 36 01 00 00 00 00 00 10 00  xxxx.886........
>>>>>  ffffbcb901c44020: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
>>>>>  ffffbcb901c44030: 78 78 78 78 2e 38 38 37 01 00 00 00 00 00 10 20  xxxx.887....... 
>>>>>
>>>>> ... which is yelling about block 184 (dir fsb 2). The fs is otherwise
>>>>> clean according to xfs_repair. 
>>>>>
>>>>> I _think_ something like the appended diff deals with it, but this is
>>>>> lightly tested only and could definitely use more eyes.
>>>>>
>>>>> Brian
>>>>>
>>>>> --- 8< ---
>>>>>
>>>>> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
>>>>> index ad9396e..9fa379d 100644
>>>>> --- a/fs/xfs/xfs_dir2_readdir.c
>>>>> +++ b/fs/xfs/xfs_dir2_readdir.c
>>>>> @@ -404,7 +404,8 @@ xfs_dir2_leaf_readbuf(
>>>>>  		 * Read-ahead a contiguous directory block.
>>>>>  		 */
>>>>>  		if (i > mip->ra_current &&
>>>>> -		    map[mip->ra_index].br_blockcount >= geo->fsbcount) {
>>>>> +		    (map[mip->ra_index].br_blockcount - mip->ra_offset) >=
>>>>> +		     geo->fsbcount) {
>>>>>  			xfs_dir3_data_readahead(dp,
>>>>>  				map[mip->ra_index].br_startoff + mip->ra_offset,
>>>>>  				XFS_FSB_TO_DADDR(dp->i_mount,
>>>>> @@ -432,7 +433,7 @@ xfs_dir2_leaf_readbuf(
>>>>>  			 * The rest of this extent but not more than a dir
>>>>>  			 * block.
>>>>>  			 */
>>>>> -			length = min_t(int, geo->fsbcount,
>>>>> +			length = min_t(int, geo->fsbcount - j,
>>>>
>>>> Looks ok to me to make Eric's bugfix complete.
>>>
>>> Brian, thanks for crafting the image to expose this.  :)
>>>
>>> I've been otherwise occupied today, sorry - technically this fixes a separate
>>> issue, yes? So 2 patches, 2 bugfixes AFAICT?
>>
>> Ok.  I don't mind taking two patches since you're right, the first patch
>> saves us from running off the end of the shadow bmap array, and the
>> second one fixes bookkeeping problems when a dir extent starts midway
>> through a large dirblock.
>>
> 
> Yeah, these are definitely separate issues and Eric's original patch
> here is probably higher priority on its own.
> 
> But I'm not clear on the plan here.. do we want this additional patch or
> are we planning just to rewrite the readahead bits (or do we want to do
> both)? I guess having a bugfix patch in the tree is useful from a stable
> perspective... hm?
> 
> Brian

Yeah.

There's 20 years of the old code in the field with these bugs - I think 2
straightforward fixes have value even if we might want to rewrite it in the
future.

-Eric
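
As a numeric cross-check on the layout quoted above, here is a small
user-space sketch that maps the xfs_bmap ranges (512-byte basic blocks) onto
8k directory blocks and reports which ones straddle an extent boundary.  The
basic-block ranges are taken from Brian's mail; the structure and helper are
made up for the example.

/* Illustrative only: which 8k dir blocks of the reproducer directory span
 * more than one extent?  Units are 512-byte basic blocks, as in xfs_bmap.
 */
#include <stdio.h>

#define DIRBLK_BB	16	/* one 8k dir block = 16 basic blocks */

struct ext {
	int foff;		/* file offset, basic blocks */
	int len;		/* length, basic blocks */
};

static const struct ext exts[] = {
	{  0,  8 },		/* 0: [0..7]   */
	{  8,  8 },		/* 1: [8..15]  */
	{ 16, 24 },		/* 2: [16..39] */
	{ 40, 24 },		/* 3: [40..63] */
};
#define NR_EXTS		(int)(sizeof(exts) / sizeof(exts[0]))

int main(void)
{
	int blk;

	for (blk = 0; blk < 4; blk++) {
		int start = blk * DIRBLK_BB;
		int end = start + DIRBLK_BB - 1;
		int first = -1, last = -1, i;

		for (i = 0; i < NR_EXTS; i++) {
			int efirst = exts[i].foff;
			int elast = exts[i].foff + exts[i].len - 1;

			if (start <= elast && end >= efirst) {
				if (first < 0)
					first = i;
				last = i;
			}
		}
		printf("dir fsb %d (bb %d..%d): extents %d..%d%s\n",
		       blk, start, end, first, last,
		       first != last ? "  <-- spans extents" : "");
	}
	return 0;
}

This reports directory blocks 0 and 2 as the straddling cases, matching the
description above; block 2 is the one the read-ahead verifier ends up
complaining about.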


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-04-19 14:40               ` Eric Sandeen
@ 2017-04-19 16:08                 ` Darrick J. Wong
  2017-04-19 18:01                   ` Brian Foster
  0 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2017-04-19 16:08 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Brian Foster, linux-xfs, Carlos Maiolino

On Wed, Apr 19, 2017 at 09:40:20AM -0500, Eric Sandeen wrote:
> On 4/19/17 7:32 AM, Brian Foster wrote:
> > On Tue, Apr 18, 2017 at 05:29:27PM -0700, Darrick J. Wong wrote:
> >> On Tue, Apr 18, 2017 at 07:14:03PM -0500, Eric Sandeen wrote:
> >>> On 4/18/17 7:09 PM, Darrick J. Wong wrote:
> >>>> On Tue, Apr 18, 2017 at 12:55:44PM -0400, Brian Foster wrote:
> >>>>> On Mon, Apr 17, 2017 at 05:12:36PM -0500, Eric Sandeen wrote:
> >>>>>> On 4/17/17 3:57 PM, Brian Foster wrote:
> >>>>>>> On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote:
> >>>>> ...
> >>>>>>>
> >>>>>>> This fix seems fine to me, but I'm wondering if this code may have
> >>>>>>> issues with other kinds of misalignment between the directory blocks and
> >>>>>>> underlying bmap extents as well. For example, what happens if we end up
> >>>>>>> with something like the following on an 8k dir fsb fs?
> >>>>>>>
> >>>>>>>  0:[0,xxx,3,0]
> >>>>>>>  1:[3,xxx,1,0]
> >>>>>>>
> >>>>>>> ... or ...
> >>>>>>>
> >>>>>>>  0:[0,xxx,3,0]
> >>>>>>>  1:[3,xxx,3,0]
> >>>>>>
> >>>>>> Well, as far as that goes it won't be an issue; for 8k dir block sizes
> >>>>>> we will allocate an extent map with room for 10 extents, and we'll go
> >>>>>> well beyond the above extents which cross directory block boundaries.
> >>>>>>
> >>>>>>>  ...
> >>>>>>>  N:[...]
> >>>>>>>
> >>>>>>> Am I following correctly that we may end up assuming the wrong mapping
> >>>>>>> for the second dir fsb and/or possibly skipping blocks?
> >>>>>>
> >>>>>> As far as I can tell, this code is only managing the read-ahead state
> >>>>>> by looking at these cached extents. We keep track of our position within
> >>>>>> that allocated array of mappings - this bug just stepped off the end
> >>>>>> while doing so.
> >>>>>>
> >>>>>> Stopping at the correct point should keep all of the state consistent
> >>>>>> and correct.
> >>>>>>
> >>>>>> But yeah, it's kind of hairy & hard to read, IMHO.
> >>>>>>
> >>>>>> Also as far as I can tell, we handle such discontiguities correctly,
> >>>>>> other than the bug I found.  If you see something that looks suspicious,
> >>>>>> I'm sure I could tweak my test case to craft a specific situation if
> >>>>>> there's something you'd like to see tested...
> >>>>>>
> >>>>>
> >>>>> Background: Eric and I chatted a bit on irc to confirm that what I'm
> >>>>> calling out above is a different issue from what is fixed by this patch.
> >>>>>
> >>>>> Eric,
> >>>>>
> >>>>> I managed to construct a directory that looks like this:
> >>>>>
> >>>>>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
> >>>>>    0: [0..7]:          88..95            0 (88..95)             8
> >>>>>    1: [8..15]:         80..87            0 (80..87)             8
> >>>>>    2: [16..39]:        168..191          0 (168..191)          24
> >>>>>    3: [40..63]:        5242952..5242975  1 (72..95)            24
> >>>>>
> >>>>> The fs has 8k directory fsbs. Dir fsb offset 0 spans extents 0 and 1,
> >>>>> offset 1 (which corresponds to the 512b range 16-31 above) is covered
> >>>>> completely by extent 2 and dir offset 2 (range 32-47) spans extents 2
> >>>>> and 3. An ls of this directory produces this:
> >>>>>
> >>>>>  XFS (dm-3): Metadata corruption detected at xfs_dir3_data_reada_verify+0x42/0x80 [xfs], xfs_dir3_data_reada block 0x500058
> >>>>>  XFS (dm-3): Unmount and run xfs_repair
> >>>>>  XFS (dm-3): First 64 bytes of corrupted metadata buffer:
> >>>>>  ffffbcb901c44000: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
> >>>>>  ffffbcb901c44010: 78 78 78 78 2e 38 38 36 01 00 00 00 00 00 10 00  xxxx.886........
> >>>>>  ffffbcb901c44020: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
> >>>>>  ffffbcb901c44030: 78 78 78 78 2e 38 38 37 01 00 00 00 00 00 10 20  xxxx.887....... 
> >>>>>
> >>>>> ... which is yelling about block 184 (dir fsb 2). The fs is otherwise
> >>>>> clean according to xfs_repair. 
> >>>>>
> >>>>> I _think_ something like the appended diff deals with it, but this is
> >>>>> lightly tested only and could definitely use more eyes.
> >>>>>
> >>>>> Brian
> >>>>>
> >>>>> --- 8< ---
> >>>>>
> >>>>> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
> >>>>> index ad9396e..9fa379d 100644
> >>>>> --- a/fs/xfs/xfs_dir2_readdir.c
> >>>>> +++ b/fs/xfs/xfs_dir2_readdir.c
> >>>>> @@ -404,7 +404,8 @@ xfs_dir2_leaf_readbuf(
> >>>>>  		 * Read-ahead a contiguous directory block.
> >>>>>  		 */
> >>>>>  		if (i > mip->ra_current &&
> >>>>> -		    map[mip->ra_index].br_blockcount >= geo->fsbcount) {
> >>>>> +		    (map[mip->ra_index].br_blockcount - mip->ra_offset) >=
> >>>>> +		     geo->fsbcount) {
> >>>>>  			xfs_dir3_data_readahead(dp,
> >>>>>  				map[mip->ra_index].br_startoff + mip->ra_offset,
> >>>>>  				XFS_FSB_TO_DADDR(dp->i_mount,
> >>>>> @@ -432,7 +433,7 @@ xfs_dir2_leaf_readbuf(
> >>>>>  			 * The rest of this extent but not more than a dir
> >>>>>  			 * block.
> >>>>>  			 */
> >>>>> -			length = min_t(int, geo->fsbcount,
> >>>>> +			length = min_t(int, geo->fsbcount - j,
> >>>>
> >>>> Looks ok to me to make Eric's bugfix complete.
> >>>
> >>> Brian, thanks for crafting the image to expose this.  :)
> >>>
> >>> I've been otherwise occupied today, sorry - technically this fixes a separate
> >>> issue, yes? So 2 patches, 2 bugfixes AFAICT?
> >>
> >> Ok.  I don't mind taking two patches since you're right, the first patch
> >> saves us from running off the end of the shadow bmap array, and the
> >> second one fixes bookkeeping problems when a dir extent starts midway
> >> through a large dirblock.
> >>
> > 
> > Yeah, these are definitely separate issues and Eric's original patch
> > here is probably higher priority on its own.
> > 
> > But I'm not clear on the plan here.. do we want this additional patch or
> > are we planning just to rewrite the readahead bits (or do we want to do
> > both)? I guess having a bugfix patch in the tree is useful from a stable
> > perspective... hm?
> > 
> > Brian
> 
> Yeah.
> 
> There's 20 years of the old code in the field with these bugs - I think 2
> straightforward fixes have value even if we might want to rewrite it in the
> future.

Assuming 4.11 is released on Sunday, I was figuring 4.12 for the two
fixes + backporting to older kernels, and explore the rewrite for 4.13.

--D
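
One way to read the first hunk of Brian's patch: the old test ignores
mip->ra_offset, so a long-enough extent looks like it can still satisfy a
whole contiguous directory block even when that block starts near the
extent's tail.  A tiny sketch of just the blockcount part of the test,
evaluated with the reproducer's numbers (extent 2 is 3 fs blocks long,
directory block 2 starts 2 fs blocks into it, a directory block is 2 fs
blocks); the helper names are hypothetical:

/* Illustrative only: the "contiguous dir block" readahead test before and
 * after Brian's first hunk, with the reproducer's values.
 */
#include <stdio.h>

static int contig_ra_old(int br_blockcount, int ra_offset, int fsbcount)
{
	(void)ra_offset;		/* old test never looks at the offset */
	return br_blockcount >= fsbcount;
}

static int contig_ra_new(int br_blockcount, int ra_offset, int fsbcount)
{
	return br_blockcount - ra_offset >= fsbcount;
}

int main(void)
{
	int br_blockcount = 3, ra_offset = 2, fsbcount = 2;

	printf("old check: %d (would issue a full 2-fsb contiguous RA)\n",
	       contig_ra_old(br_blockcount, ra_offset, fsbcount));
	printf("new check: %d (only 1 fsb of this extent remains)\n",
	       contig_ra_new(br_blockcount, ra_offset, fsbcount));
	return 0;
}

Presumably the contiguous path derives the disk address from that single
extent, so taking it here would read past the extent's physical end; with the
new test this case falls through to the non-contiguous handling instead.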

> 
> -Eric
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
  2017-04-19 16:08                 ` Darrick J. Wong
@ 2017-04-19 18:01                   ` Brian Foster
  0 siblings, 0 replies; 15+ messages in thread
From: Brian Foster @ 2017-04-19 18:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Eric Sandeen, linux-xfs, Carlos Maiolino

On Wed, Apr 19, 2017 at 09:08:54AM -0700, Darrick J. Wong wrote:
> On Wed, Apr 19, 2017 at 09:40:20AM -0500, Eric Sandeen wrote:
> > On 4/19/17 7:32 AM, Brian Foster wrote:
> > > On Tue, Apr 18, 2017 at 05:29:27PM -0700, Darrick J. Wong wrote:
> > >> On Tue, Apr 18, 2017 at 07:14:03PM -0500, Eric Sandeen wrote:
> > >>> On 4/18/17 7:09 PM, Darrick J. Wong wrote:
> > >>>> On Tue, Apr 18, 2017 at 12:55:44PM -0400, Brian Foster wrote:
> > >>>>> On Mon, Apr 17, 2017 at 05:12:36PM -0500, Eric Sandeen wrote:
> > >>>>>> On 4/17/17 3:57 PM, Brian Foster wrote:
> > >>>>>>> On Thu, Apr 13, 2017 at 01:45:43PM -0500, Eric Sandeen wrote:
> > >>>>> ...
> > >>>>>>>
> > >>>>>>> This fix seems fine to me, but I'm wondering if this code may have
> > >>>>>>> issues with other kinds of misalignment between the directory blocks and
> > >>>>>>> underlying bmap extents as well. For example, what happens if we end up
> > >>>>>>> with something like the following on an 8k dir fsb fs?
> > >>>>>>>
> > >>>>>>>  0:[0,xxx,3,0]
> > >>>>>>>  1:[3,xxx,1,0]
> > >>>>>>>
> > >>>>>>> ... or ...
> > >>>>>>>
> > >>>>>>>  0:[0,xxx,3,0]
> > >>>>>>>  1:[3,xxx,3,0]
> > >>>>>>
> > >>>>>> Well, as far as that goes it won't be an issue; for 8k dir block sizes
> > >>>>>> we will allocate an extent map with room for 10 extents, and we'll go
> > >>>>>> well beyond the above extents which cross directory block boundaries.
> > >>>>>>
> > >>>>>>>  ...
> > >>>>>>>  N:[...]
> > >>>>>>>
> > >>>>>>> Am I following correctly that we may end up assuming the wrong mapping
> > >>>>>>> for the second dir fsb and/or possibly skipping blocks?
> > >>>>>>
> > >>>>>> As far as I can tell, this code is only managing the read-ahead state
> > >>>>>> by looking at these cached extents. We keep track of our position within
> > >>>>>> that allocated array of mappings - this bug just stepped off the end
> > >>>>>> while doing so.
> > >>>>>>
> > >>>>>> Stopping at the correct point should keep all of the state consistent
> > >>>>>> and correct.
> > >>>>>>
> > >>>>>> But yeah, it's kind of hairy & hard to read, IMHO.
> > >>>>>>
> > >>>>>> Also as far as I can tell, we handle such discontiguities correctly,
> > >>>>>> other than the bug I found.  If you see something that looks suspicious,
> > >>>>>> I'm sure I could tweak my test case to craft a specific situation if
> > >>>>>> there's something you'd like to see tested...
> > >>>>>>
> > >>>>>
> > >>>>> Background: Eric and I chatted a bit on irc to confirm that what I'm
> > >>>>> calling out above is a different issue from what is fixed by this patch.
> > >>>>>
> > >>>>> Eric,
> > >>>>>
> > >>>>> I managed to construct a directory that looks like this:
> > >>>>>
> > >>>>>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
> > >>>>>    0: [0..7]:          88..95            0 (88..95)             8
> > >>>>>    1: [8..15]:         80..87            0 (80..87)             8
> > >>>>>    2: [16..39]:        168..191          0 (168..191)          24
> > >>>>>    3: [40..63]:        5242952..5242975  1 (72..95)            24
> > >>>>>
> > >>>>> The fs has 8k directory fsbs. Dir fsb offset 0 spans extents 0 and 1,
> > >>>>> offset 1 (which corresponds to the 512b range 16-31 above) is covered
> > >>>>> completely by extent 2 and dir offset 2 (range 32-47) spans extents 2
> > >>>>> and 3. An ls of this directory produces this:
> > >>>>>
> > >>>>>  XFS (dm-3): Metadata corruption detected at xfs_dir3_data_reada_verify+0x42/0x80 [xfs], xfs_dir3_data_reada block 0x500058
> > >>>>>  XFS (dm-3): Unmount and run xfs_repair
> > >>>>>  XFS (dm-3): First 64 bytes of corrupted metadata buffer:
> > >>>>>  ffffbcb901c44000: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
> > >>>>>  ffffbcb901c44010: 78 78 78 78 2e 38 38 36 01 00 00 00 00 00 10 00  xxxx.886........
> > >>>>>  ffffbcb901c44020: 00 00 00 00 00 00 00 64 0f 78 78 78 78 78 78 78  .......d.xxxxxxx
> > >>>>>  ffffbcb901c44030: 78 78 78 78 2e 38 38 37 01 00 00 00 00 00 10 20  xxxx.887....... 
> > >>>>>
> > >>>>> ... which is yelling about block 184 (dir fsb 2). The fs is otherwise
> > >>>>> clean according to xfs_repair. 
> > >>>>>
> > >>>>> I _think_ something like the appended diff deals with it, but this is
> > >>>>> lightly tested only and could definitely use more eyes.
> > >>>>>
> > >>>>> Brian
> > >>>>>
> > >>>>> --- 8< ---
> > >>>>>
> > >>>>> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
> > >>>>> index ad9396e..9fa379d 100644
> > >>>>> --- a/fs/xfs/xfs_dir2_readdir.c
> > >>>>> +++ b/fs/xfs/xfs_dir2_readdir.c
> > >>>>> @@ -404,7 +404,8 @@ xfs_dir2_leaf_readbuf(
> > >>>>>  		 * Read-ahead a contiguous directory block.
> > >>>>>  		 */
> > >>>>>  		if (i > mip->ra_current &&
> > >>>>> -		    map[mip->ra_index].br_blockcount >= geo->fsbcount) {
> > >>>>> +		    (map[mip->ra_index].br_blockcount - mip->ra_offset) >=
> > >>>>> +		     geo->fsbcount) {
> > >>>>>  			xfs_dir3_data_readahead(dp,
> > >>>>>  				map[mip->ra_index].br_startoff + mip->ra_offset,
> > >>>>>  				XFS_FSB_TO_DADDR(dp->i_mount,
> > >>>>> @@ -432,7 +433,7 @@ xfs_dir2_leaf_readbuf(
> > >>>>>  			 * The rest of this extent but not more than a dir
> > >>>>>  			 * block.
> > >>>>>  			 */
> > >>>>> -			length = min_t(int, geo->fsbcount,
> > >>>>> +			length = min_t(int, geo->fsbcount - j,
> > >>>>
> > >>>> Looks ok to me to make Eric's bugfix complete.
> > >>>
> > >>> Brian, thanks for crafting the image to expose this.  :)
> > >>>
> > >>> I've been otherwise occupied today, sorry - technically this fixes a separate
> > >>> issue, yes? So 2 patches, 2 bugfixes AFAICT?
> > >>
> > >> Ok.  I don't mind taking two patches since you're right, the first patch
> > >> saves us from running off the end of the shadow bmap array, and the
> > >> second one fixes bookkeeping problems when a dir extent starts midway
> > >> through a large dirblock.
> > >>
> > > 
> > > Yeah, these are definitely separate issues and Eric's original patch
> > > here is probably higher priority on its own.
> > > 
> > > But I'm not clear on the plan here.. do we want this additional patch or
> > > are we planning just to rewrite the readahead bits (or do we want to do
> > > both)? I guess having a bugfix patch in the tree is useful from a stable
> > > perspective... hm?
> > > 
> > > Brian
> > 
> > Yeah.
> > 
> > There's 20 years of the old code in the field with these bugs - I think 2
> > straightforward fixes have value even if we might want to rewrite it in the
> > future.
> 
> Assuming 4.11 is released on Sunday, I was figuring 4.12 for the two
> fixes + backporting to older kernels, and explore the rewrite for 4.13.
> 

Ok, sounds good to me. I'll get this one posted once it survives a bit
more testing. Thanks.

Brian

> --D
> 
> > 
> > -Eric
> > 

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-04-19 18:01 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-13 18:45 [PATCH] xfs: handle array index overrun in xfs_dir2_leaf_readbuf() Eric Sandeen
2017-04-13 19:09 ` Carlos Maiolino
2017-04-13 22:12 ` Darrick J. Wong
2017-04-13 22:24 ` Bill O'Donnell
2017-04-17 20:57 ` Brian Foster
2017-04-17 22:12   ` Eric Sandeen
2017-04-18 16:55     ` Brian Foster
2017-04-18 17:00       ` Brian Foster
2017-04-19  0:09       ` Darrick J. Wong
2017-04-19  0:14         ` Eric Sandeen
2017-04-19  0:29           ` Darrick J. Wong
2017-04-19 12:32             ` Brian Foster
2017-04-19 14:40               ` Eric Sandeen
2017-04-19 16:08                 ` Darrick J. Wong
2017-04-19 18:01                   ` Brian Foster
