generic/204 failure due to e88b64e xfs: use generic percpu counters for free inode counter

* generic/204 failure due to e88b64e xfs: use generic percpu counters for free inode counter
@ 2015-04-28 16:56 Eryu Guan
  2015-04-28 20:49 ` Dave Chinner
  0 siblings, 1 reply; 3+ messages in thread
From: Eryu Guan @ 2015-04-28 16:56 UTC (permalink / raw)
  To: xfs; +Cc: xuw2015

Hi,

I was testing the v4.1-rc1 kernel and hit a generic/204 failure on 512b
block size v4 xfs and 1k block size v5 xfs. This seems to be a regression
since v4.0.

[root@dhcp-66-86-11 xfstests]# MKFS_OPTIONS="-b size=512" ./check generic/204
FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/x86_64 dhcp-66-86-11 4.0.0-rc1+
MKFS_OPTIONS  -- -f -b size=512 /dev/sda6
MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 /dev/sda6 /mnt/testarea/scratch

generic/204 8s ... - output mismatch (see /root/xfstests/results//generic/204.out.bad)
    --- tests/generic/204.out   2014-12-11 00:28:13.409000000 +0800
    +++ /root/xfstests/results//generic/204.out.bad     2015-04-29 00:36:43.232000000 +0800
    @@ -1,2 +1,37664 @@
     QA output created by 204
    +./tests/generic/204: line 83: /mnt/testarea/scratch/108670: No space left on device
    +./tests/generic/204: line 84: /mnt/testarea/scratch/108670: No space left on device
    ...

I bisected to this commit

e88b64e xfs: use generic percpu counters for free inode counter

This looks like the same issue that the patch below tries to fix, but the
test still fails after applying it.

[PATCH v2] xfs: use percpu_counter_read_positive for mp->m_icount
http://oss.sgi.com/archives/xfs/2015-04/msg00195.html

I'm not sure whether this is expected behavior or a known issue, but I'm
reporting it to the list anyway.

Thanks,
Eryu

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: generic/204 failure due to e88b64e xfs: use generic percpu counters for free inode counter
  2015-04-28 16:56 generic/204 failure due to e88b64e xfs: use generic percpu counters for free inode counter Eryu Guan
@ 2015-04-28 20:49 ` Dave Chinner
  2015-04-30  6:57   ` Eryu Guan
  0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2015-04-28 20:49 UTC (permalink / raw)
  To: Eryu Guan; +Cc: xuw2015, xfs

On Wed, Apr 29, 2015 at 12:56:34AM +0800, Eryu Guan wrote:
> Hi,
> 
> I was testing v4.1-rc1 kernel and hit generic/204 failure on 512b block
> size v4 xfs and 1k block size v5 xfs. And this seems to be a regression
> since v4.0

Firstly, knowing your exact test machine and xfstests configuration
is important here, so:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

> [root@dhcp-66-86-11 xfstests]# MKFS_OPTIONS="-b size=512" ./check generic/204
> FSTYP         -- xfs (non-debug)
> PLATFORM      -- Linux/x86_64 dhcp-66-86-11 4.0.0-rc1+
> MKFS_OPTIONS  -- -f -b size=512 /dev/sda6
> MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 /dev/sda6 /mnt/testarea/scratch
> 
> generic/204 8s ... - output mismatch (see /root/xfstests/results//generic/204.out.bad)
>     --- tests/generic/204.out   2014-12-11 00:28:13.409000000 +0800
>     +++ /root/xfstests/results//generic/204.out.bad     2015-04-29 00:36:43.232000000 +0800
>     @@ -1,2 +1,37664 @@
>      QA output created by 204
>     +./tests/generic/204: line 83: /mnt/testarea/scratch/108670: No space left on device
>     +./tests/generic/204: line 84: /mnt/testarea/scratch/108670: No space left on device
>     ...
> I bisected to this commit
>
> e88b64e xfs: use generic percpu counters for free inode counter

I don't think that this is the actual cause of the issue, because I
have records of generic/204 failing on 1k v5 filesystems every so
often going back to the start of the log file I have for my v5/1k
test config:

$ grep "Failures\|EST" results/check.log |grep -B 1 generic/204
Wed Jun 19 11:26:35 EST 2013
Failures: generic/204 generic/225 generic/231 generic/263 generic/306
Wed Jun 19 12:49:08 EST 2013
Failures: generic/204 generic/225 generic/231 generic/263 generic/270
--
Mon Jul  8 17:23:44 EST 2013
Failures: generic/204
Mon Jul  8 20:37:50 EST 2013
Failures: generic/204 generic/225 generic/231 generic/263 generic/306
--
Thu Jul 18 16:55:26 EST 2013
Failures: generic/015 generic/077 generic/193 generic/204
--
Mon Jul 29 19:42:49 EST 2013
Failures: generic/193 generic/204 generic/225 generic/230 generic/231
Mon Aug 12 19:40:53 EST 2013
Failures: generic/193 generic/204 generic/225 generic/230 generic/23
....

> Seems like the same issue this patch tries to fix, but test still fails
> after applying this patch.
> 
> [PATCH v2] xfs: use percpu_counter_read_positive for mp->m_icount
> http://oss.sgi.com/archives/xfs/2015-04/msg00195.html
> 
> Not sure if it's the expected behavior/a known issue, report it to the
> list anyway.

Repeating the test on v4/512b, I get the same result as you.

$ cat results/generic/204.full
files 127500, resvblks 1024
reserved blocks = 1024
available reserved blocks = 1024
$

Ok, those numbers add up to exactly 97,920,000 bytes, as per the
test config.
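
As a rough cross-check of that figure, here is a minimal sketch. It assumes
the test derives its file count as space / (inode size + block size); that
formula is an assumption inferred from the numbers, not quoted from the test
source. The 256-byte inode size for v4/512b is also assumed; the 512-byte
inode size for v5/1k is taken from the mkfs output in this thread.

```python
# Cross-check of the "files" value printed in 204.full.
# Assumption: files = space // (inode size + block size).
SPACE_BYTES = 97_920_000  # byte figure quoted in the message


def file_count(inode_size: int, block_size: int) -> int:
    """Files that fit if each one costs one inode plus one data block."""
    return SPACE_BYTES // (inode_size + block_size)


v4_512 = file_count(inode_size=256, block_size=512)   # isize=256 is assumed
v5_1k = file_count(inode_size=512, block_size=1024)   # isize=512 per mkfs output

print(v4_512, v5_1k)  # 127500 63750
```

Both results match the "files" values reported in 204.full for the two
configurations in this thread.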

$ sudo mount /dev/vdb /mnt/scratch
$ df -h /mnt/scratch
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb         99M   87M   13M  88% /mnt/scratch
$ df -i /mnt/scratch
Filesystem     Inodes  IUsed  IFree IUse% Mounted on
/dev/vdb       108608 108608      0  100% /mnt/scratch
$

And for v5/1k:

$ sudo mkfs.xfs -f -m crc=1,finobt=1 -b size=1k -d size=$((106 * 1024 * 1024)) -l size=7m /dev/vdb
meta-data=/dev/vdb               isize=512    agcount=4, agsize=27136 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1
data     =                       bsize=1024   blocks=108544, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=1024   blocks=7168, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mount /dev/vdb /mnt/scratch
$ df -i /mnt/scratch
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/vdb        54272     3 54269    1% /mnt/scratch
$ 

Yup, it's clear *why* it is failing, too. There aren't enough free
inodes configured by mkfs.  That means it's the mkfs imaxpct config
that is the issue here, not the commit that made the max inode
threshold more accurate...

Adding "-i maxpct=50" to the mkfs command allows the test to pass on
both v4/512 and v5/1k filesystems.  IOWs, it does not appear to be a
code problem but a test config problem...
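
The inode cap implied by imaxpct can be sketched roughly as below. This is a
simplified model under stated assumptions, not the kernel's exact
calculation: it ignores any rounding to inode chunk boundaries. The geometry
is taken from the v5/1k mkfs output above, and the 63750 files needed is the
count 204.full reports for this geometry elsewhere in the thread.

```python
# Simplified model of the imaxpct inode cap (ignores inode chunk rounding).
def max_inodes(data_blocks: int, block_size: int, inode_size: int, imaxpct: int) -> int:
    """Inodes that fit in imaxpct percent of the data blocks."""
    inode_blocks = data_blocks * imaxpct // 100
    return inode_blocks * (block_size // inode_size)


FILES_NEEDED = 63750  # files the test wants to create on v5/1k

cap_25 = max_inodes(108544, 1024, 512, imaxpct=25)  # mkfs default shown above
cap_50 = max_inodes(108544, 1024, 512, imaxpct=50)  # with "-i maxpct=50"

print(cap_25, cap_25 >= FILES_NEEDED)  # 54272 False
print(cap_50, cap_50 >= FILES_NEEDED)  # 108544 True
```

The 25% result agrees with the 54272 inodes that df -i reports, which is
below what the test needs; doubling imaxpct lifts the cap above it.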

Can you send a patch to fstests@vger.kernel.org that fixes the test
for these configs?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: generic/204 failure due to e88b64e xfs: use generic percpu counters for free inode counter
  2015-04-28 20:49 ` Dave Chinner
@ 2015-04-30  6:57   ` Eryu Guan
  0 siblings, 0 replies; 3+ messages in thread
From: Eryu Guan @ 2015-04-30  6:57 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xuw2015, xfs

On Wed, Apr 29, 2015 at 06:49:25AM +1000, Dave Chinner wrote:
> On Wed, Apr 29, 2015 at 12:56:34AM +0800, Eryu Guan wrote:
> > Hi,
> >
> > I was testing v4.1-rc1 kernel and hit generic/204 failure on 512b block
> > size v4 xfs and 1k block size v5 xfs. And this seems to be a regression
> > since v4.0
>
> Firstly, knowing your exact test machine and xfstests configuration
> is important here, so:
>
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

Thanks, I'll follow it next time. (I know about this link, but I hit the
issue on different hosts, both VM and bare metal, so I thought it was not
hardware related. I still should have included the test configs.)

>
> > [root@dhcp-66-86-11 xfstests]# MKFS_OPTIONS="-b size=512" ./check generic/204
> > FSTYP         -- xfs (non-debug)
> > PLATFORM      -- Linux/x86_64 dhcp-66-86-11 4.0.0-rc1+
> > MKFS_OPTIONS  -- -f -b size=512 /dev/sda6
> > MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 /dev/sda6 /mnt/testarea/scratch
> >
> > generic/204 8s ... - output mismatch (see /root/xfstests/results//generic/204.out.bad)
> >     --- tests/generic/204.out   2014-12-11 00:28:13.409000000 +0800
> >     +++ /root/xfstests/results//generic/204.out.bad     2015-04-29 00:36:43.232000000 +0800
> >     @@ -1,2 +1,37664 @@
> >      QA output created by 204
> >     +./tests/generic/204: line 83: /mnt/testarea/scratch/108670: No space left on device
> >     +./tests/generic/204: line 84: /mnt/testarea/scratch/108670: No space left on device
> >     ...
> > I bisected to this commit
> >
> > e88b64e xfs: use generic percpu counters for free inode counter

Sorry, I pasted the wrong commit (again); it should be

501ab32 xfs: use generic percpu counters for inode counter

>
> I don't think that this is the actual cause of the issue, because I
> have records of generic/204 failing on 1k v5 filesystems every so
> often going back to the start of the log file I have for my v5/1k
> test config:
>
> $ grep "Failures\|EST" results/check.log |grep -B 1 generic/204
> Wed Jun 19 11:26:35 EST 2013
> Failures: generic/204 generic/225 generic/231 generic/263 generic/306
> Wed Jun 19 12:49:08 EST 2013
> Failures: generic/204 generic/225 generic/231 generic/263 generic/270
> --
> Mon Jul  8 17:23:44 EST 2013
> Failures: generic/204
> Mon Jul  8 20:37:50 EST 2013
> Failures: generic/204 generic/225 generic/231 generic/263 generic/306
> --
> Thu Jul 18 16:55:26 EST 2013
> Failures: generic/015 generic/077 generic/193 generic/204
> --
> Mon Jul 29 19:42:49 EST 2013
> Failures: generic/193 generic/204 generic/225 generic/230 generic/231
> Mon Aug 12 19:40:53 EST 2013
> Failures: generic/193 generic/204 generic/225 generic/230 generic/23
> ....

I noticed that those failures are quite old; generic/204 was updated
several times in 2014 to make it pass, notably by this commit

31a50c7 generic/204: tweak reserve pool size (Mon Apr 28 10:54:27 2014)

The commit log says

'This makes the test pass on a filesystem made with MKFS_OPTIONS="-b
size=1024 -m crc=1".'

So I think it's a new failure since v4.0

>
> > Seems like the same issue this patch tries to fix, but test still fails
> > after applying this patch.
> >
> > [PATCH v2] xfs: use percpu_counter_read_positive for mp->m_icount
> > http://oss.sgi.com/archives/xfs/2015-04/msg00195.html
> >
> > Not sure if it's the expected behavior/a known issue, report it to the
> > list anyway.
>
> Repeating the test on v4/512b, I get the same result as you.
>
> $ cat results/generic/204.full
> files 127500, resvblks 1024
> reserved blocks = 1024
> available reserved blocks = 1024
> $
>
> Ok, those numbers add up to exactly 97,920,000 bytes, as per the
> test config.
>
> $ sudo mount /dev/vdb /mnt/scratch
> $ df -h /mnt/scratch
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/vdb         99M   87M   13M  88% /mnt/scratch
> $ df -i /mnt/scratch
> Filesystem     Inodes  IUsed  IFree IUse% Mounted on
> /dev/vdb       108608 108608      0  100% /mnt/scratch
> $
>
> And for v5/1k:
>
> $ sudo mkfs.xfs -f -m crc=1,finobt=1 -b size=1k -d size=$((106 * 1024 * 1024)) -l size=7m /dev/vdb
> meta-data=/dev/vdb               isize=512    agcount=4, agsize=27136 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1
> data     =                       bsize=1024   blocks=108544, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal log           bsize=1024   blocks=7168, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> $ sudo mount /dev/vdb /mnt/scratch
> $ df -i /mnt/scratch
> Filesystem     Inodes IUsed IFree IUse% Mounted on
> /dev/vdb        54272     3 54269    1% /mnt/scratch
> $
>
> Yup, it's clear *why* it is failing, too. There aren't enough free
> inodes configured by mkfs.  That means it's the mkfs imaxpct config
> that is the issue here, not the commit that made the max inode
> threshold more accurate...

I compared the "good" and "bad" kernels (output of xfs_info, df -i, df -h
and 204.full after the test); here is the diff

[root@dhcp-66-86-11 xfstests]# diff -Nu 204.good 204.bad
--- 204.good    2015-04-29 22:00:13.274000000 +0800
+++ 204.bad     2015-04-29 19:51:15.195000000 +0800
@@ -10,10 +10,10 @@
 realtime =none                   extsz=4096   blocks=0, rtextents=0
 [root@dhcp-66-86-11 xfstests]# df -i /mnt/scratch
 Filesystem     Inodes IUsed IFree IUse% Mounted on
-/dev/sda6       63808 63753    55  100% /mnt/scratch
+/dev/sda6       54528 54528     0  100% /mnt/scratch
 [root@dhcp-66-86-11 xfstests]# df -h /mnt/scratch
 Filesystem      Size  Used Avail Use% Mounted on
-/dev/sda6        99M   99M     0 100% /mnt/scratch
+/dev/sda6        99M   88M   12M  89% /mnt/scratch
 [root@dhcp-66-86-11 xfstests]# cat results/generic/204.full
 files 63750, resvblks 1024
 reserved blocks = 1024

So the only difference is the max inode count: the "bad" kernel has a
lower upper limit on the inode count.

More experiments show that the icount is more accurate on the "bad" kernel.

fs/xfs/libxfs/xfs_ialloc.c:1343
        if (mp->m_maxicount &&
            percpu_counter_read(&mp->m_icount) + mp->m_ialloc_inos >
                                                        mp->m_maxicount) {
                noroom = 1;
                okalloc = 0;
        }

"Good" kernel uses mp->m_sb.sb_icount, which is not accurate during the
test (256), so it never hits the "noroom" condition. The "bad" kernel uses
the percpu counter, and mp->m_icount holds a more accurate number (54000+),
so it hits "noroom" in the test.
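
To illustrate the difference, here is a toy model of that check. It is only
a sketch under loud assumptions, not kernel code: inodes are allocated in
chunks of 64 (standing in for m_ialloc_inos), the cap is taken to be 54528
(the inode total df -i shows on the "bad" kernel), and the stale counter is
modelled as never reading above 256. The function name create_files is
hypothetical.

```python
# Toy model of the "noroom" check with a stale versus an accurate inode count.
def create_files(files, maxicount, chunk=64, accurate=True, stale_value=256):
    """Return (files created, inodes allocated) before the cap check stops us."""
    icount = 0   # inodes actually allocated
    created = 0  # files created so far
    while created < files:
        if created == icount:  # no free inodes left, a new chunk is needed
            # The stale counter lags behind; model it as capped at stale_value.
            observed = icount if accurate else min(icount, stale_value)
            if maxicount and observed + chunk > maxicount:
                break  # corresponds to noroom = 1
            icount += chunk
        created += 1
    return created, icount


bad = create_files(63750, maxicount=54528, accurate=True)
good = create_files(63750, maxicount=54528, accurate=False)
print(bad)   # (54528, 54528)
print(good)  # (63750, 63808)
```

Under these assumptions the model reproduces the inode totals in the diff
above: 54528 with the accurate counter and 63808 with the stale one.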

>
> Adding "-i maxpct=50" to the mkfs command allows the test to pass on
> both v4/512 and v5/1k filesystems.  IOWs, it does not appear to be
> code problem but is a test config problem...

I agree it's not a code problem; I think it's expected behavior. I also
confirmed that adding "-i maxpct=50" makes the test pass again.

>
> Can you send a patch to fstests@vger.kernel.org that fixes the test
> for these configs?

Sure, will do.

Thanks for the explanation!

Eryu
