* Re: [LTP] LTP test df01.sh detected different size of loop device in v5.19 [not found] ` <YvZUfq+3HYwXEncw@pevik> @ 2022-08-12 14:00 ` Petr Vorel 0 siblings, 0 replies; 13+ messages in thread From: Petr Vorel @ 2022-08-12 14:00 UTC (permalink / raw) To: linux-block, Jens Axboe, Jan Kara, Hannes Reinecke, linux-xfs; +Cc: ltp [ Cc LTP ML, sorry for the noise ] Petr > > Hi all, > > LTP test df01.sh found different size of loop device in v5.19. > > Test uses loop device formatted on various file systems, only XFS fails. > > It randomly fails during verifying that loop size usage changes: > > grep ${TST_DEVICE} output | grep -q "${total}.*${used}" [1] > > How to reproduce: > > # PATH="/opt/ltp/testcases/bin:$PATH" df01.sh -f xfs # it needs several tries to hit > > df saved output: > > Filesystem 1024-blocks Used Available Capacity Mounted on > > ... > > /dev/loop0 256672 16208 240464 7% /tmp/LTP_df01.1kRwoUCCR7/mntpoint > > df output: > > Filesystem 1024-blocks Used Available Capacity Mounted on > > ... > > tmpfs 201780 0 201780 0% /run/user/0 > > /dev/loop0 256672 15160 241512 6% /tmp/LTP_df01.1kRwoUCCR7/mntpoint > > => different size > > df01 4 TFAIL: 'df -k -P' failed, not expected. > > Also 'df -T -P' fails. > > It might be a false positive / bug in the test, but it's at least a changed behavior. > > I was able to reproduce it on v5.19 distro kernels (openSUSE, Debian). > > I haven't bisected (yet), nor checked Jens' git tree (maybe it has been fixed). > Forget to note dmesg "operation not supported error" warning on *each* run (even > successful) on affected v5.19: > [ 5097.594021] loop0: detected capacity change from 0 to 524288 > [ 5097.658201] operation not supported error, dev loop0, sector 262192 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 > [ 5097.675670] XFS (loop0): Mounting V5 Filesystem > [ 5097.681668] XFS (loop0): Ending clean mount > [ 5097.956445] XFS (loop0): Unmounting Filesystem > Kind regards, > Petr > > Kind regards, > > Petr > > [1] https://github.com/linux-test-project/ltp/blob/f42f6f3b4671f447b743afe8612917ba4362b8a6/testcases/commands/df/df01.sh#L103-L110 -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LTP] LTP test df01.sh detected different size of loop device in v5.19 [not found] <YvZTpQFinpkB06p9@pevik> [not found] ` <YvZUfq+3HYwXEncw@pevik> @ 2022-08-14 22:44 ` Dave Chinner 2022-08-15 9:31 ` Petr Vorel 1 sibling, 1 reply; 13+ messages in thread From: Dave Chinner @ 2022-08-14 22:44 UTC (permalink / raw) To: Petr Vorel Cc: Jens Axboe, linux-xfs, Jan Kara, linux-block, Hannes Reinecke, ltp On Fri, Aug 12, 2022 at 03:20:37PM +0200, Petr Vorel wrote: > Hi all, > > LTP test df01.sh found different size of loop device in v5.19. > Test uses loop device formatted on various file systems, only XFS fails. > It randomly fails during verifying that loop size usage changes: > > grep ${TST_DEVICE} output | grep -q "${total}.*${used}" [1] > > How to reproduce: > # PATH="/opt/ltp/testcases/bin:$PATH" df01.sh -f xfs # it needs several tries to hit > > df saved output: > Filesystem 1024-blocks Used Available Capacity Mounted on > ... > /dev/loop0 256672 16208 240464 7% /tmp/LTP_df01.1kRwoUCCR7/mntpoint > df output: > Filesystem 1024-blocks Used Available Capacity Mounted on > ... > tmpfs 201780 0 201780 0% /run/user/0 > /dev/loop0 256672 15160 241512 6% /tmp/LTP_df01.1kRwoUCCR7/mntpoint > => different size > df01 4 TFAIL: 'df -k -P' failed, not expected. Yup, most likely because we changed something in XFS related to internal block reservation spaces. That is, the test is making fundamentally flawed assumptions about filesystem used space accounting. It is wrong to assuming that the available capacity of a given empty filesystem will never change. Assumptions like this have been invalid for decades because the available space can change based on the underlying configuration or the filesystem. e.g. different versions of mkfs.xfs set different default parameters and so simply changing the version of xfsprogs you use between the two comparision tests will make it fail.... And, well, XFS also has XFS_IOC_{GS}ET_RESBLKS ioctls that allow userspace to change the amount of reserved blocks. They were introduced in 1997, and since then we've changed the default reservation the filesystem takes at least a dozen times. > > It might be a false positive / bug in the test, but it's at least a changed behavior. Yup, any test that assumes "available space" does not change from kernel version to kernel version is flawed. There is no guarantee that this ever stays the same, nor that it needs to stay the same. > > I was able to reproduce it on v5.19 distro kernels (openSUSE, Debian). > > I haven't bisected (yet), nor checked Jens' git tree (maybe it has been fixed). > > Forget to note dmesg "operation not supported error" warning on *each* run (even > successful) on affected v5.19: > [ 5097.594021] loop0: detected capacity change from 0 to 524288 > [ 5097.658201] operation not supported error, dev loop0, sector 262192 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 > [ 5097.675670] XFS (loop0): Mounting V5 Filesystem > [ 5097.681668] XFS (loop0): Ending clean mount > [ 5097.956445] XFS (loop0): Unmounting Filesystem That warning is from mkfs attempting to use fallocate(ZERO_RANGE) to offload the zeroing of the journal to the block device. It would seem that the loop device image file is being hosted on a filesystem that does not support the fallocate() ZERO_RANGE (or maybe PUNCH_HOLE) operation. That warning should simply be removed - it serves no useful purpose to a user... CHeers, Dave. -- Dave Chinner david@fromorbit.com -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LTP] LTP test df01.sh detected different size of loop device in v5.19 2022-08-14 22:44 ` Dave Chinner @ 2022-08-15 9:31 ` Petr Vorel 2022-08-15 20:09 ` Eric Sandeen 0 siblings, 1 reply; 13+ messages in thread From: Petr Vorel @ 2022-08-15 9:31 UTC (permalink / raw) To: Dave Chinner Cc: Jens Axboe, linux-xfs, Jan Kara, linux-block, Hannes Reinecke, ltp Hi Dave, > On Fri, Aug 12, 2022 at 03:20:37PM +0200, Petr Vorel wrote: > > Hi all, > > LTP test df01.sh found different size of loop device in v5.19. > > Test uses loop device formatted on various file systems, only XFS fails. > > It randomly fails during verifying that loop size usage changes: > > grep ${TST_DEVICE} output | grep -q "${total}.*${used}" [1] > > How to reproduce: > > # PATH="/opt/ltp/testcases/bin:$PATH" df01.sh -f xfs # it needs several tries to hit > > df saved output: > > Filesystem 1024-blocks Used Available Capacity Mounted on > > ... > > /dev/loop0 256672 16208 240464 7% /tmp/LTP_df01.1kRwoUCCR7/mntpoint > > df output: > > Filesystem 1024-blocks Used Available Capacity Mounted on > > ... > > tmpfs 201780 0 201780 0% /run/user/0 > > /dev/loop0 256672 15160 241512 6% /tmp/LTP_df01.1kRwoUCCR7/mntpoint > > => different size > > df01 4 TFAIL: 'df -k -P' failed, not expected. > Yup, most likely because we changed something in XFS related to > internal block reservation spaces. That is, the test is making > fundamentally flawed assumptions about filesystem used space > accounting. > It is wrong to assuming that the available capacity of a given empty > filesystem will never change. Assumptions like this have been > invalid for decades because the available space can change based on > the underlying configuration or the filesystem. e.g. different > versions of mkfs.xfs set different default parameters and so simply > changing the version of xfsprogs you use between the two comparision > tests will make it fail.... > And, well, XFS also has XFS_IOC_{GS}ET_RESBLKS ioctls that allow > userspace to change the amount of reserved blocks. They were > introduced in 1997, and since then we've changed the default > reservation the filesystem takes at least a dozen times. Thanks a lot for valuable info. > > > It might be a false positive / bug in the test, but it's at least a changed behavior. > Yup, any test that assumes "available space" does not change from > kernel version to kernel version is flawed. There is no guarantee > that this ever stays the same, nor that it needs to stay the same. I'm sorry I was not clear. Test [1] does not measure "available space" between kernel releases. It just run df command with parameters, saves it's output and compares "1024-blocks" and "Used" columns of df output with stat output: local total=$(stat -f mntpoint --printf=%b) local free=$(stat -f mntpoint --printf=%f) local used=$((total-free)) local bsize=$(stat -f mntpoint --printf=%s) total=$((($total * $bsize + 512)/ 1024)) used=$((($used * $bsize + 512) / 1024)) And comparison with "$used" is what sometimes fails. BTW this happens on both distros when loop device is on tmpfs. I'm trying to trigger it on ext4 and btrfs, not successful so far. Looks like it's tmpfs related. If that's really expected, we might remove this check for used for XFS (not sure if check only for total makes sense). > > > I was able to reproduce it on v5.19 distro kernels (openSUSE, Debian). > > > I haven't bisected (yet), nor checked Jens' git tree (maybe it has been fixed). > > Forget to note dmesg "operation not supported error" warning on *each* run (even > > successful) on affected v5.19: > > [ 5097.594021] loop0: detected capacity change from 0 to 524288 > > [ 5097.658201] operation not supported error, dev loop0, sector 262192 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 > > [ 5097.675670] XFS (loop0): Mounting V5 Filesystem > > [ 5097.681668] XFS (loop0): Ending clean mount > > [ 5097.956445] XFS (loop0): Unmounting Filesystem > That warning is from mkfs attempting to use fallocate(ZERO_RANGE) to > offload the zeroing of the journal to the block device. It would > seem that the loop device image file is being hosted on a filesystem > that does not support the fallocate() ZERO_RANGE (or maybe > PUNCH_HOLE) operation. That warning should simply be removed - it > serves no useful purpose to a user... Interesting. Which one of these two is not supported on tmpfs? Kind regards, Petr > CHeers, > Dave. [1] https://github.com/linux-test-project/ltp/blob/f42f6f3b4671f447b743afe8612917ba4362b8a6/testcases/commands/df/df01.sh -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LTP] LTP test df01.sh detected different size of loop device in v5.19 2022-08-15 9:31 ` Petr Vorel @ 2022-08-15 20:09 ` Eric Sandeen 2022-08-18 15:25 ` Petr Vorel 0 siblings, 1 reply; 13+ messages in thread From: Eric Sandeen @ 2022-08-15 20:09 UTC (permalink / raw) To: Petr Vorel, Dave Chinner Cc: Jens Axboe, linux-xfs, Jan Kara, linux-block, Hannes Reinecke, ltp On 8/15/22 4:31 AM, Petr Vorel wrote: > Hi Dave, > >> On Fri, Aug 12, 2022 at 03:20:37PM +0200, Petr Vorel wrote: >>> Hi all, > >>> LTP test df01.sh found different size of loop device in v5.19. >>> Test uses loop device formatted on various file systems, only XFS fails. >>> It randomly fails during verifying that loop size usage changes: > >>> grep ${TST_DEVICE} output | grep -q "${total}.*${used}" [1] > >>> How to reproduce: >>> # PATH="/opt/ltp/testcases/bin:$PATH" df01.sh -f xfs # it needs several tries to hit > >>> df saved output: >>> Filesystem 1024-blocks Used Available Capacity Mounted on >>> ... >>> /dev/loop0 256672 16208 240464 7% /tmp/LTP_df01.1kRwoUCCR7/mntpoint >>> df output: >>> Filesystem 1024-blocks Used Available Capacity Mounted on >>> ... >>> tmpfs 201780 0 201780 0% /run/user/0 >>> /dev/loop0 256672 15160 241512 6% /tmp/LTP_df01.1kRwoUCCR7/mntpoint >>> => different size >>> df01 4 TFAIL: 'df -k -P' failed, not expected. > >> Yup, most likely because we changed something in XFS related to >> internal block reservation spaces. That is, the test is making >> fundamentally flawed assumptions about filesystem used space >> accounting. > >> It is wrong to assuming that the available capacity of a given empty >> filesystem will never change. Assumptions like this have been >> invalid for decades because the available space can change based on >> the underlying configuration or the filesystem. e.g. different >> versions of mkfs.xfs set different default parameters and so simply >> changing the version of xfsprogs you use between the two comparision >> tests will make it fail.... > >> And, well, XFS also has XFS_IOC_{GS}ET_RESBLKS ioctls that allow >> userspace to change the amount of reserved blocks. They were >> introduced in 1997, and since then we've changed the default >> reservation the filesystem takes at least a dozen times. > > Thanks a lot for valuable info. > >>>> It might be a false positive / bug in the test, but it's at least a changed behavior. > >> Yup, any test that assumes "available space" does not change from >> kernel version to kernel version is flawed. There is no guarantee >> that this ever stays the same, nor that it needs to stay the same. > I'm sorry I was not clear. Test [1] does not measure "available space" between > kernel releases. It just run df command with parameters, saves it's output > and compares "1024-blocks" and "Used" columns of df output with stat output: Annotating what these do... > local total=$(stat -f mntpoint --printf=%b) # number of blocks allocated > local free=$(stat -f mntpoint --printf=%f) # free blocks in filesystem > local used=$((total-free)) # (number of blocks free) > > local bsize=$(stat -f mntpoint --printf=%s) # block size ("for faster transfers") > total=$((($total * $bsize + 512)/ 1024)) # number of 1k blocks allocated? > used=$((($used * $bsize + 512) / 1024)) # number of 1k blocks used? > > And comparison with "$used" is what sometimes fails. > > BTW this happens on both distros when loop device is on tmpfs. I'm trying to > trigger it on ext4 and btrfs, not successful so far. Looks like it's tmpfs > related. > > If that's really expected, we might remove this check for used for XFS > (not sure if check only for total makes sense). It's kind of hard to follow this test, but it seems to be trying to ensure that df output is consistent with du (statfs) numbers, before and after creating and removing a 1MB file. I guess it's literally testing df itself, i.e. that it actually presents the numbers it obtained from statfs. AFAICT the difference in the failure is 1024 1K blocks, which is the size the file that's been created and removed during the test. My best guess is that this is xfs inode deferred inode inactivation hanging onto the space a little longer than the test expects. This is probably because the new-ish xfs inode inactivation no longer blocks on inode garbage collection during statfs(). IOWS, I think the test expects that free space is reflected in statfs numbers immediately after a file is removed, and that's no longer the case here. They change in between the df check and the statfs check. (The test isn't just checking that the values are correct, it is checking that the values are /immediately/ correct.) Putting a "sleep 1" after the "rm -f" in the test seems to fix it; IIRC the max time to wait for inodegc is 1s. This does slow the test down a bit. -Eric -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LTP] LTP test df01.sh detected different size of loop device in v5.19 2022-08-15 20:09 ` Eric Sandeen @ 2022-08-18 15:25 ` Petr Vorel 2022-08-18 16:05 ` Eric Sandeen 0 siblings, 1 reply; 13+ messages in thread From: Petr Vorel @ 2022-08-18 15:25 UTC (permalink / raw) To: Eric Sandeen Cc: Jens Axboe, linux-xfs, Jan Kara, Dave Chinner, linux-block, Hannes Reinecke, ltp Hi Eric, all, > On 8/15/22 4:31 AM, Petr Vorel wrote: > > Hi Dave, > >> On Fri, Aug 12, 2022 at 03:20:37PM +0200, Petr Vorel wrote: > >>> Hi all, > >>> LTP test df01.sh found different size of loop device in v5.19. > >>> Test uses loop device formatted on various file systems, only XFS fails. > >>> It randomly fails during verifying that loop size usage changes: > >>> grep ${TST_DEVICE} output | grep -q "${total}.*${used}" [1] > >>> How to reproduce: > >>> # PATH="/opt/ltp/testcases/bin:$PATH" df01.sh -f xfs # it needs several tries to hit > >>> df saved output: > >>> Filesystem 1024-blocks Used Available Capacity Mounted on > >>> ... > >>> /dev/loop0 256672 16208 240464 7% /tmp/LTP_df01.1kRwoUCCR7/mntpoint > >>> df output: > >>> Filesystem 1024-blocks Used Available Capacity Mounted on > >>> ... > >>> tmpfs 201780 0 201780 0% /run/user/0 > >>> /dev/loop0 256672 15160 241512 6% /tmp/LTP_df01.1kRwoUCCR7/mntpoint > >>> => different size > >>> df01 4 TFAIL: 'df -k -P' failed, not expected. > >> Yup, most likely because we changed something in XFS related to > >> internal block reservation spaces. That is, the test is making > >> fundamentally flawed assumptions about filesystem used space > >> accounting. > >> It is wrong to assuming that the available capacity of a given empty > >> filesystem will never change. Assumptions like this have been > >> invalid for decades because the available space can change based on > >> the underlying configuration or the filesystem. e.g. different > >> versions of mkfs.xfs set different default parameters and so simply > >> changing the version of xfsprogs you use between the two comparision > >> tests will make it fail.... > >> And, well, XFS also has XFS_IOC_{GS}ET_RESBLKS ioctls that allow > >> userspace to change the amount of reserved blocks. They were > >> introduced in 1997, and since then we've changed the default > >> reservation the filesystem takes at least a dozen times. > > Thanks a lot for valuable info. > >>>> It might be a false positive / bug in the test, but it's at least a changed behavior. > >> Yup, any test that assumes "available space" does not change from > >> kernel version to kernel version is flawed. There is no guarantee > >> that this ever stays the same, nor that it needs to stay the same. > > I'm sorry I was not clear. Test [1] does not measure "available space" between > > kernel releases. It just run df command with parameters, saves it's output > > and compares "1024-blocks" and "Used" columns of df output with stat output: > Annotating what these do... > > local total=$(stat -f mntpoint --printf=%b) # number of blocks allocated > > local free=$(stat -f mntpoint --printf=%f) # free blocks in filesystem > > local used=$((total-free)) # (number of blocks free) > > local bsize=$(stat -f mntpoint --printf=%s) # block size ("for faster transfers") > > total=$((($total * $bsize + 512)/ 1024)) # number of 1k blocks allocated? > > used=$((($used * $bsize + 512) / 1024)) # number of 1k blocks used? > > And comparison with "$used" is what sometimes fails. > > BTW this happens on both distros when loop device is on tmpfs. I'm trying to > > trigger it on ext4 and btrfs, not successful so far. Looks like it's tmpfs > > related. > > If that's really expected, we might remove this check for used for XFS > > (not sure if check only for total makes sense). > It's kind of hard to follow this test, but it seems to be trying to > ensure that df output is consistent with du (statfs) numbers, before > and after creating and removing a 1MB file. I guess it's literally > testing df itself, i.e. that it actually presents the numbers it obtained > from statfs. > AFAICT the difference in the failure is 1024 1K blocks, which is the size > the file that's been created and removed during the test. > My best guess is that this is xfs inode deferred inode inactivation hanging > onto the space a little longer than the test expects. > This is probably because the new-ish xfs inode inactivation no longer blocks > on inode garbage collection during statfs(). > IOWS, I think the test expects that free space is reflected in statfs numbers > immediately after a file is removed, and that's no longer the case here. They > change in between the df check and the statfs check. > (The test isn't just checking that the values are correct, it is checking that > the values are /immediately/ correct.) > Putting a "sleep 1" after the "rm -f" in the test seems to fix it; IIRC > the max time to wait for inodegc is 1s. This does slow the test down a bit. Sure, it looks like we can sleep just 50ms on my hw (although better might be to poll for the result [1]), I just wanted to make sure there is no bug/regression before hiding it with sleep. Thanks for your input! Kind regards, Petr [1] https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests > -Eric +++ testcases/commands/df/df01.sh @@ -63,6 +63,10 @@ df_test() tst_res TFAIL "'$cmd' failed." fi + if [ "$DF_FS_TYPE" = xfs ]; then + tst_sleep 50ms + fi + ROD_SILENT rm -rf mntpoint/testimg # flush file system buffers, then we can get the actual sizes. -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LTP] LTP test df01.sh detected different size of loop device in v5.19 2022-08-18 15:25 ` Petr Vorel @ 2022-08-18 16:05 ` Eric Sandeen 2022-08-18 16:27 ` Darrick J. Wong 2022-08-18 21:02 ` Dave Chinner 0 siblings, 2 replies; 13+ messages in thread From: Eric Sandeen @ 2022-08-18 16:05 UTC (permalink / raw) To: Petr Vorel Cc: Jens Axboe, linux-xfs, Jan Kara, Dave Chinner, linux-block, Hannes Reinecke, ltp On 8/18/22 10:25 AM, Petr Vorel wrote: > Hi Eric, all, > ... > >> IOWS, I think the test expects that free space is reflected in statfs numbers >> immediately after a file is removed, and that's no longer the case here. They >> change in between the df check and the statfs check. > >> (The test isn't just checking that the values are correct, it is checking that >> the values are /immediately/ correct.) > >> Putting a "sleep 1" after the "rm -f" in the test seems to fix it; IIRC >> the max time to wait for inodegc is 1s. This does slow the test down a bit. > > Sure, it looks like we can sleep just 50ms on my hw (although better might be to > poll for the result [1]), I just wanted to make sure there is no bug/regression > before hiding it with sleep. > > Thanks for your input! > > Kind regards, > Petr > > [1] https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests > >> -Eric > > +++ testcases/commands/df/df01.sh > @@ -63,6 +63,10 @@ df_test() > tst_res TFAIL "'$cmd' failed." > fi > > + if [ "$DF_FS_TYPE" = xfs ]; then > + tst_sleep 50ms > + fi > + Probably worth at least a comment as to why ... Dave / Darrick / Brian - I'm not sure how long it might take to finish inodegc? A too-short sleep will let the flakiness remain ... -Eric > ROD_SILENT rm -rf mntpoint/testimg > > # flush file system buffers, then we can get the actual sizes. > -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LTP] LTP test df01.sh detected different size of loop device in v5.19 2022-08-18 16:05 ` Eric Sandeen @ 2022-08-18 16:27 ` Darrick J. Wong 2022-08-18 17:01 ` Petr Vorel 2022-08-18 21:02 ` Dave Chinner 1 sibling, 1 reply; 13+ messages in thread From: Darrick J. Wong @ 2022-08-18 16:27 UTC (permalink / raw) To: Eric Sandeen Cc: Jens Axboe, linux-xfs, Jan Kara, Dave Chinner, linux-block, Hannes Reinecke, ltp On Thu, Aug 18, 2022 at 11:05:33AM -0500, Eric Sandeen wrote: > On 8/18/22 10:25 AM, Petr Vorel wrote: > > Hi Eric, all, > > > > ... > > > > >> IOWS, I think the test expects that free space is reflected in statfs numbers > >> immediately after a file is removed, and that's no longer the case here. They > >> change in between the df check and the statfs check. > > > >> (The test isn't just checking that the values are correct, it is checking that > >> the values are /immediately/ correct.) > > > >> Putting a "sleep 1" after the "rm -f" in the test seems to fix it; IIRC > >> the max time to wait for inodegc is 1s. This does slow the test down a bit. > > > > Sure, it looks like we can sleep just 50ms on my hw (although better might be to > > poll for the result [1]), I just wanted to make sure there is no bug/regression > > before hiding it with sleep. > > > > Thanks for your input! > > > > Kind regards, > > Petr > > > > [1] https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests > > > >> -Eric > > > > +++ testcases/commands/df/df01.sh > > @@ -63,6 +63,10 @@ df_test() > > tst_res TFAIL "'$cmd' failed." > > fi > > > > + if [ "$DF_FS_TYPE" = xfs ]; then > > + tst_sleep 50ms > > + fi > > + > > Probably worth at least a comment as to why ... > > Dave / Darrick / Brian - I'm not sure how long it might take to finish inodegc? > A too-short sleep will let the flakiness remain ... A fsfreeze -f / fsfreeze -u cycle will force all the background garbage collection to run to completion when precise free space accounting is being tested. --D > -Eric > > > ROD_SILENT rm -rf mntpoint/testimg > > > > # flush file system buffers, then we can get the actual sizes. > > -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LTP] LTP test df01.sh detected different size of loop device in v5.19 2022-08-18 16:27 ` Darrick J. Wong @ 2022-08-18 17:01 ` Petr Vorel 2022-08-18 21:19 ` Eric Sandeen 0 siblings, 1 reply; 13+ messages in thread From: Petr Vorel @ 2022-08-18 17:01 UTC (permalink / raw) To: Darrick J. Wong Cc: Jens Axboe, linux-xfs, Jan Kara, Eric Sandeen, Dave Chinner, linux-block, Hannes Reinecke, ltp > On Thu, Aug 18, 2022 at 11:05:33AM -0500, Eric Sandeen wrote: > > On 8/18/22 10:25 AM, Petr Vorel wrote: > > > Hi Eric, all, > > ... > > >> IOWS, I think the test expects that free space is reflected in statfs numbers > > >> immediately after a file is removed, and that's no longer the case here. They > > >> change in between the df check and the statfs check. > > >> (The test isn't just checking that the values are correct, it is checking that > > >> the values are /immediately/ correct.) > > >> Putting a "sleep 1" after the "rm -f" in the test seems to fix it; IIRC > > >> the max time to wait for inodegc is 1s. This does slow the test down a bit. > > > Sure, it looks like we can sleep just 50ms on my hw (although better might be to > > > poll for the result [1]), I just wanted to make sure there is no bug/regression > > > before hiding it with sleep. > > > Thanks for your input! > > > Kind regards, > > > Petr > > > [1] https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests > > >> -Eric > > > +++ testcases/commands/df/df01.sh > > > @@ -63,6 +63,10 @@ df_test() > > > tst_res TFAIL "'$cmd' failed." > > > fi > > > + if [ "$DF_FS_TYPE" = xfs ]; then > > > + tst_sleep 50ms > > > + fi > > > + > > Probably worth at least a comment as to why ... Sure, that was just to document possible fix. BTW even 200ms was not reliable in the long run => not a good solution. > > Dave / Darrick / Brian - I'm not sure how long it might take to finish inodegc? > > A too-short sleep will let the flakiness remain ... > A fsfreeze -f / fsfreeze -u cycle will force all the background garbage > collection to run to completion when precise free space accounting is > being tested. Thanks for a hint, do you mean to put it into df_test after creating file with dd to wrap second df_verify (calls df) and df_check (runs stat and compare values)? Because that does not help - it fails when running in the loop (managed to break after 5th run). Kind regards, Petr df_test() { local cmd="$1 -P" df_verify $cmd if [ $? -ne 0 ]; then return fi df_check $cmd if [ $? -ne 0 ]; then tst_res TFAIL "'$cmd' failed, not expected." return fi ROD_SILENT dd if=/dev/zero of=mntpoint/testimg bs=1024 count=1024 if [ "$DF_FS_TYPE" = xfs ]; then fsfreeze -f $TST_MNTPOINT fi df_verify $cmd df_check $cmd if [ $? -eq 0 ]; then tst_res TPASS "'$cmd' passed." else tst_res TFAIL "'$cmd' failed." fi if [ "$DF_FS_TYPE" = xfs ]; then fsfreeze -u $TST_MNTPOINT fi ROD_SILENT rm -rf mntpoint/testimg # flush file system buffers, then we can get the actual sizes. sync } df_verify() { $@ >output 2>&1 if [ $? -ne 0 ]; then grep -q -E "unrecognized option | invalid option" output if [ $? -eq 0 ]; then tst_res TCONF "'$@' not supported." return 32 else tst_res TFAIL "'$@' failed." cat output return 1 fi fi } df_check() { if [ "$(echo $@)" = "df -i -P" ]; then local total=$(stat -f mntpoint --printf=%c) local free=$(stat -f mntpoint --printf=%d) local used=$((total-free)) else local total=$(stat -f mntpoint --printf=%b) local free=$(stat -f mntpoint --printf=%f) local used=$((total-free)) local bsize=$(stat -f mntpoint --printf=%s) total=$((($total * $bsize + 512)/ 1024)) used=$((($used * $bsize + 512) / 1024)) fi grep ${TST_DEVICE} output | grep -q "${total}.*${used}" if [ $? -ne 0 ]; then echo "total: ${total}, used: ${used}" echo "df saved output:" cat output echo "df output:" $@ return 1 fi } > --D > > -Eric > > > ROD_SILENT rm -rf mntpoint/testimg > > > # flush file system buffers, then we can get the actual sizes. -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LTP] LTP test df01.sh detected different size of loop device in v5.19 2022-08-18 17:01 ` Petr Vorel @ 2022-08-18 21:19 ` Eric Sandeen 2022-08-19 16:00 ` Petr Vorel 0 siblings, 1 reply; 13+ messages in thread From: Eric Sandeen @ 2022-08-18 21:19 UTC (permalink / raw) To: Petr Vorel, Darrick J. Wong Cc: Jens Axboe, linux-xfs, Jan Kara, Eric Sandeen, Dave Chinner, linux-block, Hannes Reinecke, ltp On 8/18/22 12:01 PM, Petr Vorel wrote: >> On Thu, Aug 18, 2022 at 11:05:33AM -0500, Eric Sandeen wrote: >>> On 8/18/22 10:25 AM, Petr Vorel wrote: >>>> Hi Eric, all, > > >>> ... > > >>>>> IOWS, I think the test expects that free space is reflected in statfs numbers >>>>> immediately after a file is removed, and that's no longer the case here. They >>>>> change in between the df check and the statfs check. > >>>>> (The test isn't just checking that the values are correct, it is checking that >>>>> the values are /immediately/ correct.) > >>>>> Putting a "sleep 1" after the "rm -f" in the test seems to fix it; IIRC >>>>> the max time to wait for inodegc is 1s. This does slow the test down a bit. > >>>> Sure, it looks like we can sleep just 50ms on my hw (although better might be to >>>> poll for the result [1]), I just wanted to make sure there is no bug/regression >>>> before hiding it with sleep. > >>>> Thanks for your input! > >>>> Kind regards, >>>> Petr > >>>> [1] https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests > >>>>> -Eric > >>>> +++ testcases/commands/df/df01.sh >>>> @@ -63,6 +63,10 @@ df_test() >>>> tst_res TFAIL "'$cmd' failed." >>>> fi > >>>> + if [ "$DF_FS_TYPE" = xfs ]; then >>>> + tst_sleep 50ms >>>> + fi >>>> + > >>> Probably worth at least a comment as to why ... > > Sure, that was just to document possible fix. BTW even 200ms was not reliable in > the long run => not a good solution. > >>> Dave / Darrick / Brian - I'm not sure how long it might take to finish inodegc? >>> A too-short sleep will let the flakiness remain ... > >> A fsfreeze -f / fsfreeze -u cycle will force all the background garbage >> collection to run to completion when precise free space accounting is >> being tested. > Thanks for a hint, do you mean to put it into df_test after creating file with > dd to wrap second df_verify (calls df) and df_check (runs stat and compare values)? > Because that does not help - it fails when running in the loop (managed to break after 5th run). I think it would go after you remove the file, to ensure that no space usage changes are pending when you check. <tests> This seems to work fine (pseudopatch): ROD_SILENT rm -rf mntpoint/testimg + # Ensure free space change can be seen by statfs + fsfreeze -f $TST_MNTPOINT + fsfreeze -u $TST_MNTPOINT # flush file system buffers, then we can get the actual sizes. sync (although: what's the difference between $TST_MNTPOINT and mountpoint/ ?) You just don't want to accidentally freeze the root filesystem ;) -Eric -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LTP] LTP test df01.sh detected different size of loop device in v5.19 2022-08-18 21:19 ` Eric Sandeen @ 2022-08-19 16:00 ` Petr Vorel 2022-08-19 16:06 ` Bird, Tim 0 siblings, 1 reply; 13+ messages in thread From: Petr Vorel @ 2022-08-19 16:00 UTC (permalink / raw) To: Eric Sandeen Cc: Jens Axboe, linux-xfs, Eric Sandeen, Jan Kara, Darrick J. Wong, Dave Chinner, linux-block, Hannes Reinecke, ltp > On 8/18/22 12:01 PM, Petr Vorel wrote: > >> On Thu, Aug 18, 2022 at 11:05:33AM -0500, Eric Sandeen wrote: > >>> On 8/18/22 10:25 AM, Petr Vorel wrote: > >>>> Hi Eric, all, > >>> ... > >>>>> IOWS, I think the test expects that free space is reflected in statfs numbers > >>>>> immediately after a file is removed, and that's no longer the case here. They > >>>>> change in between the df check and the statfs check. > >>>>> (The test isn't just checking that the values are correct, it is checking that > >>>>> the values are /immediately/ correct.) > >>>>> Putting a "sleep 1" after the "rm -f" in the test seems to fix it; IIRC > >>>>> the max time to wait for inodegc is 1s. This does slow the test down a bit. > >>>> Sure, it looks like we can sleep just 50ms on my hw (although better might be to > >>>> poll for the result [1]), I just wanted to make sure there is no bug/regression > >>>> before hiding it with sleep. > >>>> Thanks for your input! > >>>> Kind regards, > >>>> Petr > >>>> [1] https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests > >>>>> -Eric > >>>> +++ testcases/commands/df/df01.sh > >>>> @@ -63,6 +63,10 @@ df_test() > >>>> tst_res TFAIL "'$cmd' failed." > >>>> fi > >>>> + if [ "$DF_FS_TYPE" = xfs ]; then > >>>> + tst_sleep 50ms > >>>> + fi > >>>> + > >>> Probably worth at least a comment as to why ... > > Sure, that was just to document possible fix. BTW even 200ms was not reliable in > > the long run => not a good solution. > >>> Dave / Darrick / Brian - I'm not sure how long it might take to finish inodegc? > >>> A too-short sleep will let the flakiness remain ... > >> A fsfreeze -f / fsfreeze -u cycle will force all the background garbage > >> collection to run to completion when precise free space accounting is > >> being tested. > > Thanks for a hint, do you mean to put it into df_test after creating file with > > dd to wrap second df_verify (calls df) and df_check (runs stat and compare values)? > > Because that does not help - it fails when running in the loop (managed to break after 5th run). > I think it would go after you remove the file, to ensure that no space usage > changes are pending when you check. > <tests> > This seems to work fine (pseudopatch): > ROD_SILENT rm -rf mntpoint/testimg > + # Ensure free space change can be seen by statfs > + fsfreeze -f $TST_MNTPOINT > + fsfreeze -u $TST_MNTPOINT It looks like it works. We might add small binary which just calls these 2 ioctl (FIFREEZE and FITHAW), just to be friendly to people on embedded environment with minimal dependencies (yes, some people might not install util-linux). > # flush file system buffers, then we can get the actual sizes. > sync > (although: what's the difference between $TST_MNTPOINT and mountpoint/ ?) Thanks for a report, fixed in 96ae882d3 ("df01.sh: Use $TST_MNTPOINT") > You just don't want to accidentally freeze the root filesystem ;) Sure :) Kind regards, Petr > -Eric -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LTP] LTP test df01.sh detected different size of loop device in v5.19 2022-08-19 16:00 ` Petr Vorel @ 2022-08-19 16:06 ` Bird, Tim 2022-08-19 19:30 ` Petr Vorel 0 siblings, 1 reply; 13+ messages in thread From: Bird, Tim @ 2022-08-19 16:06 UTC (permalink / raw) To: Petr Vorel, Eric Sandeen Cc: Jens Axboe, linux-block, Jan Kara, Dave Chinner, Eric Sandeen, linux-xfs, Hannes Reinecke, Darrick J. Wong, ltp > -----Original Message----- > From: ltp <ltp-bounces+tim.bird=sony.com@lists.linux.it> On Behalf Of Petr Vorel > > > On 8/18/22 12:01 PM, Petr Vorel wrote: > > >> On Thu, Aug 18, 2022 at 11:05:33AM -0500, Eric Sandeen wrote: > > >>> On 8/18/22 10:25 AM, Petr Vorel wrote: > > >>>> Hi Eric, all, > > > > >>> ... > > > > >>>>> IOWS, I think the test expects that free space is reflected in statfs numbers > > >>>>> immediately after a file is removed, and that's no longer the case here. They > > >>>>> change in between the df check and the statfs check. > > > >>>>> (The test isn't just checking that the values are correct, it is checking that > > >>>>> the values are /immediately/ correct.) > > > >>>>> Putting a "sleep 1" after the "rm -f" in the test seems to fix it; IIRC > > >>>>> the max time to wait for inodegc is 1s. This does slow the test down a bit. > > > >>>> Sure, it looks like we can sleep just 50ms on my hw (although better might be to > > >>>> poll for the result [1]), I just wanted to make sure there is no bug/regression > > >>>> before hiding it with sleep. > > > >>>> Thanks for your input! > > > >>>> Kind regards, > > >>>> Petr > > > >>>> [1] https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests > > > >>>>> -Eric > > > >>>> +++ testcases/commands/df/df01.sh > > >>>> @@ -63,6 +63,10 @@ df_test() > > >>>> tst_res TFAIL "'$cmd' failed." > > >>>> fi > > > >>>> + if [ "$DF_FS_TYPE" = xfs ]; then > > >>>> + tst_sleep 50ms > > >>>> + fi > > >>>> + > > > >>> Probably worth at least a comment as to why ... > > > > Sure, that was just to document possible fix. BTW even 200ms was not reliable in > > > the long run => not a good solution. > > > >>> Dave / Darrick / Brian - I'm not sure how long it might take to finish inodegc? > > >>> A too-short sleep will let the flakiness remain ... > > > >> A fsfreeze -f / fsfreeze -u cycle will force all the background garbage > > >> collection to run to completion when precise free space accounting is > > >> being tested. > > > Thanks for a hint, do you mean to put it into df_test after creating file with > > > dd to wrap second df_verify (calls df) and df_check (runs stat and compare values)? > > > Because that does not help - it fails when running in the loop (managed to break after 5th run). > > > I think it would go after you remove the file, to ensure that no space usage > > changes are pending when you check. > > > <tests> > > > This seems to work fine (pseudopatch): > > > ROD_SILENT rm -rf mntpoint/testimg > > > + # Ensure free space change can be seen by statfs > > + fsfreeze -f $TST_MNTPOINT > > + fsfreeze -u $TST_MNTPOINT > It looks like it works. We might add small binary which just calls these 2 > ioctl (FIFREEZE and FITHAW), just to be friendly to people on embedded > environment with minimal dependencies (yes, some people might not install > util-linux). Thank you!! It's good to know that small embedded systems are still considered, and the consideration is much appreciated! :-) Let me know if you'd like me to try writing the utility. -- Tim -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LTP] LTP test df01.sh detected different size of loop device in v5.19 2022-08-19 16:06 ` Bird, Tim @ 2022-08-19 19:30 ` Petr Vorel 0 siblings, 0 replies; 13+ messages in thread From: Petr Vorel @ 2022-08-19 19:30 UTC (permalink / raw) To: Bird, Tim Cc: Jens Axboe, Eric Sandeen, Dave Chinner, Jan Kara, Darrick J. Wong, Eric Sandeen, linux-xfs, linux-block, Hannes Reinecke, ltp > > -----Original Message----- > > From: ltp <ltp-bounces+tim.bird=sony.com@lists.linux.it> On Behalf Of Petr Vorel > > > On 8/18/22 12:01 PM, Petr Vorel wrote: > > > >> On Thu, Aug 18, 2022 at 11:05:33AM -0500, Eric Sandeen wrote: > > > >>> On 8/18/22 10:25 AM, Petr Vorel wrote: > > > >>>> Hi Eric, all, > > > >>> ... > > > >>>>> IOWS, I think the test expects that free space is reflected in statfs numbers > > > >>>>> immediately after a file is removed, and that's no longer the case here. They > > > >>>>> change in between the df check and the statfs check. > > > >>>>> (The test isn't just checking that the values are correct, it is checking that > > > >>>>> the values are /immediately/ correct.) > > > >>>>> Putting a "sleep 1" after the "rm -f" in the test seems to fix it; IIRC > > > >>>>> the max time to wait for inodegc is 1s. This does slow the test down a bit. > > > >>>> Sure, it looks like we can sleep just 50ms on my hw (although better might be to > > > >>>> poll for the result [1]), I just wanted to make sure there is no bug/regression > > > >>>> before hiding it with sleep. > > > >>>> Thanks for your input! > > > >>>> Kind regards, > > > >>>> Petr > > > >>>> [1] https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests > > > >>>>> -Eric > > > >>>> +++ testcases/commands/df/df01.sh > > > >>>> @@ -63,6 +63,10 @@ df_test() > > > >>>> tst_res TFAIL "'$cmd' failed." > > > >>>> fi > > > >>>> + if [ "$DF_FS_TYPE" = xfs ]; then > > > >>>> + tst_sleep 50ms > > > >>>> + fi > > > >>>> + > > > >>> Probably worth at least a comment as to why ... > > > > Sure, that was just to document possible fix. BTW even 200ms was not reliable in > > > > the long run => not a good solution. > > > >>> Dave / Darrick / Brian - I'm not sure how long it might take to finish inodegc? > > > >>> A too-short sleep will let the flakiness remain ... > > > >> A fsfreeze -f / fsfreeze -u cycle will force all the background garbage > > > >> collection to run to completion when precise free space accounting is > > > >> being tested. > > > > Thanks for a hint, do you mean to put it into df_test after creating file with > > > > dd to wrap second df_verify (calls df) and df_check (runs stat and compare values)? > > > > Because that does not help - it fails when running in the loop (managed to break after 5th run). > > > I think it would go after you remove the file, to ensure that no space usage > > > changes are pending when you check. > > > <tests> > > > This seems to work fine (pseudopatch): > > > ROD_SILENT rm -rf mntpoint/testimg > > > + # Ensure free space change can be seen by statfs > > > + fsfreeze -f $TST_MNTPOINT > > > + fsfreeze -u $TST_MNTPOINT > > It looks like it works. We might add small binary which just calls these 2 > > ioctl (FIFREEZE and FITHAW), just to be friendly to people on embedded > > environment with minimal dependencies (yes, some people might not install > > util-linux). > Thank you!! It's good to know that small embedded systems are still > considered, and the consideration is much appreciated! :-) > Let me know if you'd like me to try writing the utility. Thank you, Tim! I'll Cc you when sending this patch (likely next week). You might also appreciate our effort to lower down loop device size (used for all_filesystems): https://lore.kernel.org/ltp/Yv%2FkVXSK0xJGb3RO@pevik/ Kind regards, Petr > -- Tim -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LTP] LTP test df01.sh detected different size of loop device in v5.19 2022-08-18 16:05 ` Eric Sandeen 2022-08-18 16:27 ` Darrick J. Wong @ 2022-08-18 21:02 ` Dave Chinner 1 sibling, 0 replies; 13+ messages in thread From: Dave Chinner @ 2022-08-18 21:02 UTC (permalink / raw) To: Eric Sandeen Cc: Jens Axboe, linux-xfs, Jan Kara, linux-block, Hannes Reinecke, ltp On Thu, Aug 18, 2022 at 11:05:33AM -0500, Eric Sandeen wrote: > On 8/18/22 10:25 AM, Petr Vorel wrote: > > Hi Eric, all, > > > > ... > > > > >> IOWS, I think the test expects that free space is reflected in statfs numbers > >> immediately after a file is removed, and that's no longer the case here. They > >> change in between the df check and the statfs check. > > > >> (The test isn't just checking that the values are correct, it is checking that > >> the values are /immediately/ correct.) > > > >> Putting a "sleep 1" after the "rm -f" in the test seems to fix it; IIRC > >> the max time to wait for inodegc is 1s. This does slow the test down a bit. > > > > Sure, it looks like we can sleep just 50ms on my hw (although better might be to > > poll for the result [1]), I just wanted to make sure there is no bug/regression > > before hiding it with sleep. > > > > Thanks for your input! > > > > Kind regards, > > Petr > > > > [1] https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests > > > >> -Eric > > > > +++ testcases/commands/df/df01.sh > > @@ -63,6 +63,10 @@ df_test() > > tst_res TFAIL "'$cmd' failed." > > fi > > > > + if [ "$DF_FS_TYPE" = xfs ]; then > > + tst_sleep 50ms > > + fi > > + > > Probably worth at least a comment as to why ... > > Dave / Darrick / Brian - I'm not sure how long it might take to finish inodegc? > A too-short sleep will let the flakiness remain ... For a single 1MB file? inodegc is delayed 1 jiffie (max 10ms). If it's processed immediately because everything else is idle and nothing blocks, it will be done in 250-500 microsends. Of course, the moment you put such tests in a VM on a host that is running other VMs nothing is ever idle and inodegc delays are outside the control of the guest kernel/filesystem.... Cheers, Dave. -- Dave Chinner david@fromorbit.com -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2022-08-19 19:31 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <YvZTpQFinpkB06p9@pevik> [not found] ` <YvZUfq+3HYwXEncw@pevik> 2022-08-12 14:00 ` [LTP] LTP test df01.sh detected different size of loop device in v5.19 Petr Vorel 2022-08-14 22:44 ` Dave Chinner 2022-08-15 9:31 ` Petr Vorel 2022-08-15 20:09 ` Eric Sandeen 2022-08-18 15:25 ` Petr Vorel 2022-08-18 16:05 ` Eric Sandeen 2022-08-18 16:27 ` Darrick J. Wong 2022-08-18 17:01 ` Petr Vorel 2022-08-18 21:19 ` Eric Sandeen 2022-08-19 16:00 ` Petr Vorel 2022-08-19 16:06 ` Bird, Tim 2022-08-19 19:30 ` Petr Vorel 2022-08-18 21:02 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).