Re: [RFC-PATCH] xfs: do not update sunit/swidth in the superblock to match those provided during mount

From: Alex Lyakas <alex@zadara.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Eric Sandeen <sandeen@sandeen.net>,
	Brian Foster <bfoster@redhat.com>,
	linux-xfs@vger.kernel.org
Subject: Re: [RFC-PATCH] xfs: do not update sunit/swidth in the superblock to match those provided during mount
Date: Mon, 2 Dec 2019 10:07:42 +0200	[thread overview]
Message-ID: <CAOcd+r2gaoZz4=xWAh+d=-TNTOtBDvbEezN9OH3oks2p1TNhBg@mail.gmail.com> (raw)
In-Reply-To: <20191201215732.GB2695@dread.disaster.area>

Hi Dave,

On Sun, Dec 1, 2019 at 11:57 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Sun, Dec 01, 2019 at 11:00:32AM +0200, Alex Lyakas wrote:
> > Hi Dave,
> >
> > Thank you for your response.
> >
> > On Sat, Nov 30, 2019 at 10:28 PM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Wed, Nov 27, 2019 at 06:19:29AM -0800, Christoph Hellwig wrote:
> > > > Can we all take a little step back and think about the implications
> > > > of the original patch from Alex?  Because I think there is very little.
> > > > And updated sunit/swidth is just a little performance optimization,
> > > > and anyone who really cares about changing that after the fact can
> > > > trivially add those to fstab.
> > > >
> > > > So I think something like his original patch plus a message during
> > > > mount that the new values are not persisted should be perfectly fine.
> > >
> > > Well, the original purpose of the mount options was to persist a new
> > > sunit/swidth to the superblock...
> > >
> > > Let's ignore the fact that it was a result of a CXFS client mount
> > > bug trashing the existing sunit/swidth values, and instead focus on
> > > the fact we've been telling people for years that you "only need to
> > > set these once after a RAID reshape" and so we have a lot of users
> > > out there expecting it to persist the new values...
> > >
> > > I don't think we can just redefine the documented and expected
> > > behaviour of a mount option like this.
> > >
> > > With that in mind, the xfs(5) man page explicitly states this:
> > >
> > >         The sunit and swidth parameters specified must be compatible
> > >         with the existing filesystem alignment characteristics.  In
> > >         general,  that  means  the  only  valid changes to sunit are
> > >         increasing it by a power-of-2 multiple. Valid swidth values
> > >         are any integer multiple of a valid sunit value.
> > >
> > > Note the comment about changes to sunit? What is being done here -
> > > halving the sunit from 64 to 32 blocks is invalid, documented as
> > > invalid, but the kernel does not enforce this. We should fix the
> > > kernel code to enforce the alignment rules that the mount option
> > > is documented to require.
> > >
> > > If we want to change the alignment characteristics after mkfs, then
> > > use su=1,sw=1 as the initial values, then the first mount can use
> > > the options to change it to whatever is present after mkfs has run.
> >
> > If I understand your response correctly:
> > - some sunit/swidth changes during mount are legal and some aren't
> > - the legal changes should be persisted in the superblock
>
> Yup.
>
> > What about the repair? Even if user performs a legal change, it still
> > breaks the repairability of the file system.
>
> It is not a legal ichange if it moves the root inode to a new
> location. IOWs, if the alignment mods will result in the root inode
> changing location, then it should be rejected by the kernel.
>
> Anyway, we need more details about your test environment, because
> the example you gave:
>
> | # mkfs
> | mkfs.xfs -f -K -p /etc/zadara/xfs.protofile -d sunit=64,swidth=64 -l sunit=32 /dev/vda
> |
> | #mount with a different sunit/swidth:
> | mount -onoatime,sync,nouuid,sunit=32,swidth=32 /dev/vda /mnt/xfs
> |
> | #umount
> | umount /mnt/xfs
> |
> | #xfs_repair
> | xfs_repair -n /dev/vda
> | # reports false corruption and eventually segfaults[1]
>
> Does not reproduce the reported failure on my workstation running
> v5.3.0 kernel and a v5.0.0 xfsprogs:
>
> $ sudo mkfs.xfs -f -d file,name=1t.img,size=1t,sunit=64,swidth=64 -l sunit=32
> meta-data=1t.img                 isize=512    agcount=32, agsize=8388608 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=0
>          =                       reflink=0
> data     =                       bsize=4096   blocks=268435456, imaxpct=5
>          =                       sunit=8      swidth=8 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=131072, version=2
>          =                       sectsz=512   sunit=4 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0

I believe in one of earlier emails in this thread, Brian mentioned
that this is related to "sparse inodes" being enabled:

"I couldn't reproduce this at first because sparse inodes is enabled
by default and that introduces more strict inode alignment requirements."

Can you please try mkfs'ing with spare inodes disabled?

Thanks,
Alex.

> $ sudo mount -o loop -o sunit=32,swidth=32 1t.img /mnt/1t
> $ sudo xfs_info /mnt/1t
> ....
> data     =                       bsize=4096   blocks=268435456, imaxpct=5
>          =                       sunit=4      swidth=4 blks
> ....
> $ sudo umount /mnt/1t
> $ sudo xfs_repair -f 1t.img
> Phase 1 - find and verify superblock...
>         - reporting progress in intervals of 15 minutes
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - 08:42:14: scanning filesystem freespace - 32 of 32 allocation groups done
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
> ....
> Phase 7 - verify and correct link counts...
>         - 08:42:18: verify and correct link counts - 32 of 32 allocation groups done
> done
> $ echo $?
> 0
> $ sudo mount -o loop 1t.img /mnt/1t
> $ sudo xfs_info /mnt/1t
> ....
> data     =                       bsize=4096   blocks=268435456, imaxpct=5
>          =                       sunit=4      swidth=4 blks
> ....
> $
>
> So reducing the sunit doesn't necessarily change the root inode
> location, and so in some cases reducing the sunit doesn't change
> the root inode location, either.
>
> > For now, we made a local change to not persist sunit/swidth updates in
> > the superblock. Because we must have a working repair, and our kernel
> > (4.14 stable) allows any sunit/swidth changes.
>
> From the above, it's not clear where the problem lies - it may be
> that there's a bug in repair we've fixed since whatever version you
> are using....
>
> > We can definitely adhere to the recommended behavior of setting
> > sunit/swidth=1 during mkfs, provided the repair still works after
> > mounting with different sunit/swidth.
>
> ... hence I'd suggest that more investigation needs to be done
> before you do anything permanent...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com