linux-xfs.vger.kernel.org archive mirror
* Questions about XFS abnormal img mount test
@ 2020-02-10  3:02 zhengbin (A)
  2020-02-10  3:59 ` Eric Sandeen
  2020-02-11  1:15 ` Dave Chinner
  0 siblings, 2 replies; 5+ messages in thread
From: zhengbin (A) @ 2020-02-10  3:02 UTC (permalink / raw)
  To: Darrick J. Wong, Dave Chinner, sandeen, linux-xfs; +Cc: renxudong1, zhangyi (F)

### question
We recently used a fuzzer (hydra) to test XFS on the 4.19 stable kernel. It automatically generated tmp.img (XFS v5 format, but with some corrupted metadata).

Test as follows:
mount tmp.img tmpdir
cp file tmpdir
sync  --> stuck

### cause analysis
This is because tmp.img (which has only 1 AG) contains corrupted metadata. xfs_repair reports the following:

agf_freeblks 0, counted 3224 in ag 0
agf_longest 536874136, counted 3224 in ag 0 
sb_fdblocks 613, counted 3228

sync is blocked in the following call chain:
xfs_vm_writepages (xfs_address_space_operations->writepages)
  write_cache_pages
    xfs_do_writepage
      xfs_writepage_map
        xfs_map_blocks
          allocate_blocks:
            error = xfs_iomap_write_allocate

xfs_iomap_write_allocate
  while (count_fsb != 0) {
    nimaps = 0;
    while (nimaps == 0) { --> endless loop
      nimaps = 1;
      error = xfs_bmapi_write(..., &nimaps) --> nimaps becomes 0 again

xfs_bmapi_write
  xfs_bmap_alloc
    xfs_bmap_btalloc
      xfs_alloc_vextent
        xfs_alloc_fix_freelist
          xfs_alloc_space_available --> less space than needed
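
The retry pattern in the traces above can be modelled in user space. This is only a sketch, not kernel code: model_bmapi_write() is a hypothetical stand-in for xfs_bmapi_write() that reports success while mapping nothing, and max_tries exists only so that the model terminates.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for xfs_bmapi_write(): the allocator finds no
 * usable space, so no extent is mapped, yet no error is reported.
 */
static int model_bmapi_write(int *nimaps)
{
	*nimaps = 0;
	return 0;
}

/*
 * Models the inner retry loop. Returns the number of attempts made.
 * max_tries is an assumption of this model; the loop in the trace
 * has no such bound, which is why sync never returns.
 */
static int model_write_allocate(int max_tries)
{
	int nimaps = 0;
	int tries = 0;

	while (nimaps == 0 && tries < max_tries) {
		nimaps = 1;
		if (model_bmapi_write(&nimaps))
			break;	/* an error would end the loop */
		tries++;
	}
	return tries;
}
```

Calling model_write_allocate(1000) uses up all 1000 attempts: as long as the allocation keeps returning zero mappings without an error, only the artificial bound stops the loop.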

xfs_alloc_space_available
  alloc_len = args->minlen + (args->alignment - 1) + args->minalignslop;
  longest = xfs_alloc_longest_free_extent(pag, min_free, reservation);
  if (longest < alloc_len)
    return false;

  /* do we have enough free space remaining for the allocation? */
  available = (int)(pag->pagf_freeblks + pag->pagf_flcount -
                    reservation - min_free - args->minleft);
  if (available < (int)max(args->total, alloc_len))
    return false;
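
To see how the corrupted counters interact with this check, here is a simplified user-space model. It is a sketch under assumptions: the struct and function names are hypothetical, the raw agf_longest value is used instead of the result of xfs_alloc_longest_free_extent(), and 64-bit arithmetic replaces the (int) casts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the per-AG counters used by the check. */
struct model_pag {
	uint32_t freeblks;	/* agf_freeblks as read from disk */
	uint32_t flcount;	/* blocks on the AG free list */
	uint32_t longest;	/* agf_longest as read from disk */
};

/* Hypothetical subset of the allocation arguments. */
struct model_args {
	uint32_t minlen, alignment, minalignslop, minleft, total;
};

static bool model_space_available(const struct model_pag *pag,
				  const struct model_args *args,
				  uint32_t min_free, uint32_t reservation)
{
	uint32_t alloc_len = args->minlen + (args->alignment - 1) +
			     args->minalignslop;
	uint32_t needed = args->total > alloc_len ? args->total : alloc_len;
	int64_t available;

	/* First test: is the longest free extent long enough? */
	if (pag->longest < alloc_len)
		return false;

	/* Second test: is enough free space left after the allocation? */
	available = (int64_t)pag->freeblks + pag->flcount -
		    reservation - min_free - args->minleft;
	return available >= (int64_t)needed;
}
```

With the on-disk values (agf_freeblks 0, agf_longest 536874136) and illustrative values flcount = 4 and min_free = 4, a one-block request passes the first test because of the bogus agf_longest, then fails the second because the available count is 0. With the values counted by xfs_repair (3224 for both), the same request succeeds.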

### possible solutions
1. Could the above metadata corruption be detected when mounting XFS?
   agf_freeblks 0, counted 3224 in ag 0
   agf_longest 536874136, counted 3224 in ag 0
   sb_fdblocks 613, counted 3228

2. Could xfs_repair be run at system boot, and the mount refused if xfs_repair fails?




* Re: Questions about XFS abnormal img mount test
  2020-02-10  3:02 Questions about XFS abnormal img mount test zhengbin (A)
@ 2020-02-10  3:59 ` Eric Sandeen
  2020-02-11  1:15 ` Dave Chinner
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Sandeen @ 2020-02-10  3:59 UTC (permalink / raw)
  To: zhengbin (A), Darrick J. Wong, Dave Chinner, sandeen, linux-xfs
  Cc: renxudong1, zhangyi (F)

On 2/9/20 9:02 PM, zhengbin (A) wrote:
> ### question
> We recently used fuzz(hydra) to test 4.19 stable XFS and automatically generate tmp.img (XFS v5 format, but some metadata is wrong)

Since you are testing a stable series kernel, the first thing to do
would be to test an upstream kernel and see if the problem persists.
If it does not, you can bisect to find the fix.

If the problem persists in the current upstream kernel, please let us know
and we can look into it further.

> Test as follows:
> mount tmp.img tmpdir
> cp file tmpdir
> sync  --> stuck
> 
> ### cause analysis
> This is because tmp.img (only 1 AG) has some problems. Using xfs_repair detect information as follows:
> 
> agf_freeblks 0, counted 3224 in ag 0
> agf_longest 536874136, counted 3224 in ag 0 
> sb_fdblocks 613, counted 3228
> 
> The reason sync is blocked is :
> xfs_vm_writepages(xfs_address_space_operations--writepages)
>   write_cache_pages
>     xfs_do_writepage
>       xfs_writepage_map
> 	xfs_map_blocks
>           allocate_blocks:
> 	    error = xfs_iomap_write_allocate
> 			
> xfs_iomap_write_allocate
>   while (count_fsb != 0) {
>     nimaps = 0;
>       while (nimaps == 0) { --> endless loop
> 	nimaps = 1;
> 	error = xfs_bmapi_write(..., &nimaps) --> nimaps becomes 0 again
> 
> xfs_bmapi_write
>   xfs_bmap_alloc
>     xfs_bmap_btalloc
>       xfs_alloc_vextent
> 	xfs_alloc_fix_freelist
>           xfs_alloc_space_available --> less space than needed
> 
> xfs_alloc_space_available
>   alloc_len = args->minlen + (args->alignment - 1) + args->minalignslop;
>     longest = xfs_alloc_longest_free_extent(pag, min_free, reservation);
>     if (longest < alloc_len)
>        return false;
> 
>     /* do we have enough free space remaining for the allocation? */
>     available = (int)(pag->pagf_freeblks + pag->pagf_flcount -
>                         reservation - min_free - args->minleft);
>     if (available < (int)max(args->total, alloc_len))
>       return false;
> 
> ### solve
> 1. Detect the above metadata corruption when mounting XFS?
>    agf_freeblks 0, counted 3224 in ag 0
>    agf_longest 536874136, counted 3224 in ag 0 
>    sb_fdblocks 613, counted 3228
> 
> 2. xfs_repair detection at system boot? If xfs_repair fails, refuse to mount XFS

No, we won't be running repair at every boot.

-Eric


* Re: Questions about XFS abnormal img mount test
  2020-02-10  3:02 Questions about XFS abnormal img mount test zhengbin (A)
  2020-02-10  3:59 ` Eric Sandeen
@ 2020-02-11  1:15 ` Dave Chinner
  2020-02-13  8:33   ` zhengbin (A)
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2020-02-11  1:15 UTC (permalink / raw)
  To: zhengbin (A); +Cc: Darrick J. Wong, sandeen, linux-xfs, renxudong1, zhangyi (F)

On Mon, Feb 10, 2020 at 11:02:08AM +0800, zhengbin (A) wrote:
> ### question
> We recently used fuzz(hydra) to test 4.19 stable XFS and automatically generate tmp.img (XFS v5 format, but some metadata is wrong)

So you create impossible situations in the on-disk format, then
recalculate the CRC to make it appear valid to the filesystem?

> Test as follows:
> mount tmp.img tmpdir
> cp file tmpdir
> sync  --> stuck
> 
> ### cause analysis
> This is because tmp.img (only 1 AG) has some problems. Using xfs_repair detect information as follows:

Please use at least 2 AGs for your fuzzer images. There's no point
in testing single AG filesystems because:
	a) they are not supported
	b) there is no redundant information in the filesystem to
	   be able to detect a vast range of potential corruptions.

> agf_freeblks 0, counted 3224 in ag 0
> agf_longest 536874136, counted 3224 in ag 0 
> sb_fdblocks 613, counted 3228

So the AGF verifier is missing these checks:

a) agf_longest < agf_freeblks
b) agf_freeblks < sb_dblocks / sb_agcount
c) agf_freeblks < sb_fdblocks

and probably some other things as well. Can you please add these
checks to xfs_agf_verify() (and any other obvious bounds tests that
are missing) and submit the patch for inclusion?
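
As a rough illustration of the kind of bounds tests meant here, a user-space sketch follows. It is not the actual xfs_agf_verify() code: the struct and function names are hypothetical, and the inequalities are treated as non-strict because xfs_repair counted 3224 for both agf_freeblks and agf_longest in this image. In place of (b), the sketch bounds agf_freeblks by the AG's own length (agf_length), which is an assumption of the sketch, since a per-AG average does not bound an individual AG when AG sizes differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the AGF fields involved. */
struct model_agf {
	uint32_t length;	/* agf_length: size of this AG in blocks */
	uint32_t freeblks;	/* agf_freeblks */
	uint32_t longest;	/* agf_longest */
};

/* Hypothetical, simplified view of the superblock fields involved. */
struct model_sb {
	uint64_t dblocks;	/* sb_dblocks */
	uint64_t fdblocks;	/* sb_fdblocks */
};

/* Returns true when the counters are mutually consistent. */
static bool model_agf_bounds_ok(const struct model_agf *agf,
				const struct model_sb *sb)
{
	/* (a) the longest free extent cannot exceed the free block count */
	if (agf->longest > agf->freeblks)
		return false;
	/* assumed replacement for (b): free blocks fit inside this AG */
	if (agf->freeblks > agf->length)
		return false;
	/* an AG cannot be larger than the whole filesystem */
	if (agf->length > sb->dblocks)
		return false;
	/* (c) per-AG free blocks cannot exceed the filesystem-wide count */
	if (agf->freeblks > sb->fdblocks)
		return false;
	return true;
}
```

With the fuzzed values (agf_freeblks 0, agf_longest 536874136), test (a) already rejects the AGF. With the values counted by xfs_repair (3224, 3224, sb_fdblocks 3228) and any AG length of at least 3224 blocks, all tests pass.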

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Questions about XFS abnormal img mount test
  2020-02-11  1:15 ` Dave Chinner
@ 2020-02-13  8:33   ` zhengbin (A)
  2020-02-13 17:11     ` Darrick J. Wong
  0 siblings, 1 reply; 5+ messages in thread
From: zhengbin (A) @ 2020-02-13  8:33 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, sandeen, linux-xfs, renxudong1, zhangyi (F)


On 2020/2/11 9:15, Dave Chinner wrote:
> On Mon, Feb 10, 2020 at 11:02:08AM +0800, zhengbin (A) wrote:
>> ### question
>> We recently used fuzz(hydra) to test 4.19 stable XFS and automatically generate tmp.img (XFS v5 format, but some metadata is wrong)
> So you create impossible situations in the on-disk format, then
> recalculate the CRC to make it appear valid to the filesystem?
>
>> Test as follows:
>> mount tmp.img tmpdir
>> cp file tmpdir
>> sync  --> stuck
>>
>> ### cause analysis
>> This is because tmp.img (only 1 AG) has some problems. Using xfs_repair detect information as follows:
> Please use at least 2 AGs for your fuzzer images. There's no point
> in testing single AG filesystems because:
> 	a) they are not supported
Maybe we can add a check at mount time that refuses to mount when there is only 1 AG?
> 	b) there is no redundant information in the filesystem to
> 	   be able to detect a vast range of potential corruptions.
>
>> agf_freeblks 0, counted 3224 in ag 0
>> agf_longest 536874136, counted 3224 in ag 0 
>> sb_fdblocks 613, counted 3228
> So the AGF verifier is missing these checks:
>
> a) agf_longest < agf_freeblks
> b) agf_freeblks < sb_dblocks / sb_agcount
> c) agf_freeblks < sb_fdblocks

Check (b) does not hold in general.

For example: the disk is 10G and is formatted with mkfs.xfs -d agsize=3G, so there are 4 AGs, and the last AG is only 1G.

sb_dblocks is 10G, while the first AG's agf_freeblks can be close to 3G, which is greater than 10G/4 = 2.5G.
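
The arithmetic can be checked with a small user-space sketch. The helper names are hypothetical, and a 4 KiB block size is assumed, so 10G is 2621440 blocks and 3G is 786432 blocks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of AGs needed to cover dblocks; the last AG may be short. */
static uint64_t model_agcount(uint64_t dblocks, uint64_t agblocks)
{
	return (dblocks + agblocks - 1) / agblocks;
}

/* Size in blocks of the last AG. */
static uint64_t model_last_ag_blocks(uint64_t dblocks, uint64_t agblocks)
{
	uint64_t rem = dblocks % agblocks;

	return rem ? rem : agblocks;
}

/* Check (b) as proposed: agf_freeblks < sb_dblocks / sb_agcount. */
static bool model_check_b(uint64_t freeblks, uint64_t dblocks,
			  uint64_t agcount)
{
	return freeblks < dblocks / agcount;
}
```

model_agcount(2621440, 786432) gives 4 and model_last_ag_blocks(2621440, 786432) gives 262144 blocks (1G). The average AG size is 2621440 / 4 = 655360 blocks (2.5G), so model_check_b() rejects a valid, nearly empty first AG with, say, 786000 free blocks.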

>
> and probably some other things as well. Can you please add these
> checks to xfs_agf_verify() (and any other obvious bounds tests that
> are missing) and submit the patch for inclusion?
I will send a patch
> Cheers,
>
> Dave.



* Re: Questions about XFS abnormal img mount test
  2020-02-13  8:33   ` zhengbin (A)
@ 2020-02-13 17:11     ` Darrick J. Wong
  0 siblings, 0 replies; 5+ messages in thread
From: Darrick J. Wong @ 2020-02-13 17:11 UTC (permalink / raw)
  To: zhengbin (A); +Cc: Dave Chinner, sandeen, linux-xfs, renxudong1, zhangyi (F)

On Thu, Feb 13, 2020 at 04:33:38PM +0800, zhengbin (A) wrote:
> 
> On 2020/2/11 9:15, Dave Chinner wrote:
> > On Mon, Feb 10, 2020 at 11:02:08AM +0800, zhengbin (A) wrote:
> >> ### question
> >> We recently used fuzz(hydra) to test 4.19 stable XFS and automatically generate tmp.img (XFS v5 format, but some metadata is wrong)
> > So you create impossible situations in the on-disk format, then
> > recalculate the CRC to make it appear valid to the filesystem?
> >
> >> Test as follows:
> >> mount tmp.img tmpdir
> >> cp file tmpdir
> >> sync  --> stuck
> >>
> >> ### cause analysis
> >> This is because tmp.img (only 1 AG) has some problems. Using xfs_repair detect information as follows:
> > Please use at least 2 AGs for your fuzzer images. There's no point
> > in testing single AG filesystems because:
> > 	a) they are not supported
> Maybe we can add a check in mount? If there is only 1 AG, refuse to mount?

No, that will break existing users.  Single AG filesystems exist in a
weird gray area where they're not supported but they're not explicitly
prohibited either.

--D

> > 	b) there is no redundant information in the filesystem to
> > 	   be able to detect a vast range of potential corruptions.
> >
> >> agf_freeblks 0, counted 3224 in ag 0
> >> agf_longest 536874136, counted 3224 in ag 0 
> >> sb_fdblocks 613, counted 3228
> > So the AGF verifier is missing these checks:
> >
> > a) agf_longest < agf_freeblks
> > b) agf_freeblks < sb_dblocks / sb_agcount
> > c) agf_freeblks < sb_fdblocks
> 
> b is not ok,
> 
> ie: disk is 10G, mkfs.xfs -d agsize=3G, so there will be 4 AG, while the last AG is 1G.
> 
> sb_dblocks is 10G, while the first AG's  agf_freeblks is 3G > 10G/4=2.5G
> 
> >
> > and probably some other things as well. Can you please add these
> > checks to xfs_agf_verify() (and any other obvious bounds tests that
> > are missing) and submit the patch for inclusion?
> I will send a patch
> > Cheers,
> >
> > Dave.
> 

