linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Alex Lyakas <alex@zadara.com>
Cc: linux-xfs@vger.kernel.org, Dave Chinner <david@fromorbit.com>
Subject: Re: [RFC-PATCH] xfs: do not update sunit/swidth in the superblock to match those provided during mount
Date: Mon, 25 Nov 2019 08:07:44 -0500	[thread overview]
Message-ID: <20191125130744.GA44777@bfoster> (raw)
In-Reply-To: <CAOcd+r3_gKYBv4vtM7nfPEPvkVp-FgHKvgQQx-_zMDt+QZ9z+g@mail.gmail.com>

On Sun, Nov 24, 2019 at 11:13:09AM +0200, Alex Lyakas wrote:
> Hi Brian,
> 
> Thank you for your response.
> 
> On Fri, Nov 22, 2019 at 5:43 PM Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Thu, Nov 21, 2019 at 08:08:19PM +0200, Alex Lyakas wrote:
> > > We are hitting the following issue: if XFS is mounted with sunit/swidth different from those
> > > specified during mkfs, then xfs_repair reports false corruption and eventually segfaults.
> > >
> > > Example:
> > >
> > > # mkfs
> > > mkfs.xfs -f -K -p /etc/zadara/xfs.protofile -d sunit=64,swidth=64 -l sunit=32 /dev/vda
> > >
> > > #mount with a different sunit/swidth:
> > > mount -onoatime,sync,nouuid,sunit=32,swidth=32 /dev/vda /mnt/xfs
> > >
> >
> > FYI, I couldn't reproduce this at first because sparse inodes is enabled
> > by default and that introduces more strict inode alignment requirements.
> > I'm assuming that sparse inodes is disabled in your example, but it
> > would be more helpful if you included the exact configuration and mkfs
> > output in such reports.
> Providing more details about configuration:
> 
> kernel: 4.14.99
> xfsprogs: 4.9.0+nmu1ubuntu2
> content of /etc/zadara/xfs.protofile is captured in [1]
> mkfs output is captured in [2]
> 
> >
> > > #umount
> > > umount /mnt/xfs
> > >
> > ...
> > >
> > > Looking at the kernel code of XFS, there seems to be no need to update the superblock sunit/swidth if the mount-provided sunit/swidth are different.
> > > The superblock values are not used during runtime.
> > >
> >
> > I'm not really sure what the right answer is here. On one hand, this
> > patch seems fundamentally reasonable to me. I find it kind of odd for
> > mount options to override and persist configuration set in the
> > superblock like this. OTOH, this changes a historical behavior which may
> > (or may not) cause disruption for users. I also think it's somewhat
> > unfortunate to change kernel mount option behavior to accommodate
> > repair,
> This is very critical for us to have a working repair in production. I
> presume the same is true for most users.
> 

Of course.

> > but I think the mount option behavior being odd argument stands
> > on its own regardless.
> >
> > What is your actual use case for changing the stripe unit/width at mount
> > time like this?
> Our use case is like this: during mkfs the overall system does not
> know yet the exact sunit/swidth to be used. Also, the underlying
> storage can change its sunit/swidth alignment as part of some storage
> migration scenarios. During mount we already know the proper
> sunit/swidth. But the problem is that in order to specify sunit/swidth
> during mount, XFS superblock must be marked as "supporting data
> alignment", i.e., XFS_SB_VERSION_DALIGNBIT has to be set. Otherwise,
> XFS refuses to mount and says:
> 
>             xfs_warn(mp,
>     "cannot change alignment: superblock does not support data alignment");
>             return -EINVAL;
> 
> In order for the superblock to be marked like this, *some*
> sunit/swidth need to be specified during mkfs.
> 

What do you mean by "the overall system doesn't know the stripe unit
values" at mkfs time? Shouldn't these values originate from the
underlying storage? Is there some additional configuration at play in
this particular use case?

The migration use case certainly sounds valid, though my understanding
is that if an existing stripe configuration is in place, it's usually
best for the new storage configuration to take that into consideration
(and use some multiple of stripe unit or something, for example).

> >
> > > With the suggested patch, xfs repair is working properly also when mount-provided sunit/swidth are different.
> > >
> > > However, I am not sure whether this is the proper approach. Otherwise, should we not allow specifying different sunit/swidth during mount?
> > >
> > ...
> > >
> > > Signed-off-by: Alex Lyakas <alex@zadara.com>
> > > ---
> > >  fs/xfs/xfs_mount.c | 18 ++++++------------
> > >  1 file changed, 6 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > index ba5b6f3..e8263b4 100644
> > > --- a/fs/xfs/xfs_mount.c
> > > +++ b/fs/xfs/xfs_mount.c
> > > @@ -399,19 +399,13 @@
> > >               }
> > >
> > >               /*
> > > -              * Update superblock with new values
> > > -              * and log changes
> > > +              * If sunit/swidth specified during mount do not match
> > > +              * those in the superblock, use the mount-specified values,
> > > +              * but do not update the superblock.
> > > +              * Otherwise, xfs_repair reports false corruption.
> > > +              * Here, only verify that superblock supports data alignment.
> > >                */
> > > -             if (xfs_sb_version_hasdalign(sbp)) {
> > > -                     if (sbp->sb_unit != mp->m_dalign) {
> > > -                             sbp->sb_unit = mp->m_dalign;
> > > -                             mp->m_update_sb = true;
> > > -                     }
> > > -                     if (sbp->sb_width != mp->m_swidth) {
> > > -                             sbp->sb_width = mp->m_swidth;
> > > -                             mp->m_update_sb = true;
> > > -                     }
> > > -             } else {
> > > +             if (!xfs_sb_version_hasdalign(sbp)) {
> >
> > Would this change xfs_info behavior on a filesystem mounted with
> > different runtime fields from the superblock? I haven't tested it, but
> > it looks like we pull the fields from the superblock.
> xfs_info uses XFS_IOC_FSGEOMETRY to get the values, and this pulls the
> values from the run-time copy of the superblock:
> 
> int
> xfs_fs_geometry(
>     xfs_mount_t        *mp,
>     xfs_fsop_geom_t        *geo,
>     int            new_version)
> ...
>     if (new_version >= 2) {
>         geo->sunit = mp->m_sb.sb_unit;
>         geo->swidth = mp->m_sb.sb_width;
>     }
> 
> So if during mount we have updated the superblock, we will pull the
> updated values. If we do not update (as the proposed patch), we will
> report the values stored in the superblock. Perhaps we need to update
> the geomtery ioctl to report runtime values?
> 

Ok, thanks. I'd argue that we should return the runtime results if we
took the proposed approach, but we should probably settle on a solution
first..

Brian

> Thanks,
> Alex.
> 
> >
> > Brian
> >
> > >                       xfs_warn(mp,
> > >       "cannot change alignment: superblock does not support data alignment");
> > >                       return -EINVAL;
> > > --
> > > 1.9.1
> > >
> >
> 
> [1]
> root@vc-20-01-48-dev:~# cat /etc/zadara/xfs.protofile
> dummy                   : bootfilename, not used, backward compatibility
> 0 0                             : numbers of blocks and inodes, not
> used, backward compatibility
> d--777 0 0              : set 777 perms for the root dir
> $
> $
> 
> [2]
> root@vc-20-01-48-dev:~# mkfs.xfs -f -K -p /etc/zadara/xfs.protofile -d
> sunit=64,swidth=64 -l sunit=32 /dev/vda
> meta-data=/dev/vda               isize=512    agcount=16, agsize=163832 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=0,
> rmapbt=0, reflink=0
> data     =                       bsize=4096   blocks=2621312, imaxpct=25
>          =                       sunit=8      swidth=8 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal log           bsize=4096   blocks=2560, version=2
>          =                       sectsz=512   sunit=4 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 


  parent reply	other threads:[~2019-11-25 13:07 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-21 18:08 [RFC-PATCH] xfs: do not update sunit/swidth in the superblock to match those provided during mount Alex Lyakas
2019-11-22 15:43 ` Brian Foster
2019-11-24  9:13   ` Alex Lyakas
2019-11-24 16:40     ` Darrick J. Wong
2019-11-24 17:38       ` Eric Sandeen
2019-11-25 13:07         ` Brian Foster
2019-11-26  8:50           ` Alex Lyakas
2019-11-25 13:07     ` Brian Foster [this message]
2019-11-26  8:49       ` Alex Lyakas
2019-11-26 11:54         ` Brian Foster
2019-11-26 13:37           ` Alex Lyakas
2019-11-26 16:53             ` Eric Sandeen
2019-11-27 14:19               ` Christoph Hellwig
2019-11-27 15:19                 ` Brian Foster
2019-11-30 20:28                 ` Dave Chinner
2019-12-01  9:00                   ` Alex Lyakas
2019-12-01 21:57                     ` Dave Chinner
2019-12-02  8:07                       ` Alex Lyakas
2019-12-01 23:29                     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191125130744.GA44777@bfoster \
    --to=bfoster@redhat.com \
    --cc=alex@zadara.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).