From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Eric Sandeen <sandeen@sandeen.net>,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier
Date: Tue, 21 Aug 2018 06:56:45 -0400
Message-ID: <20180821105645.GA14228@bfoster>
In-Reply-To: <20180820221506.GG31495@dastard>

On Tue, Aug 21, 2018 at 08:15:06AM +1000, Dave Chinner wrote:
> On Mon, Aug 20, 2018 at 11:59:18AM -0400, Brian Foster wrote:
> > On Mon, Aug 20, 2018 at 08:36:26AM -0700, Darrick J. Wong wrote:
> > > On Mon, Aug 20, 2018 at 10:27:42AM -0500, Eric Sandeen wrote:
> > > > On 8/20/18 10:06 AM, Brian Foster wrote:
> > > > > On Tue, Jul 24, 2018 at 09:43:46AM -0700, Darrick J. Wong wrote:
> > > > >> On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote:
> > > > >>> On 6/4/18 11:24 PM, Dave Chinner wrote:
> > > > >>>> From: Dave Chinner <dchinner@redhat.com>
> > > > >>>>
> > > > > >>>> There are rules for valid extent size hints. We enforce them when
> > > > >>>> applications set them, but fuzzers violate those rules and that
> > > > >>>> screws us over.
> > > > >>>>
> > > > >>>> This results in alignment assertion failures when setting up
> > > > >>>> allocations such as this in direct IO:
> > > > >>>>
> > > > >>>> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
> > > > >>>> ....
> > > > >>>> Call Trace:
> > > > >>>>  xfs_bmap_btalloc+0x415/0x910
> > > > >>>>  xfs_bmapi_write+0x71c/0x12e0
> > > > >>>>  xfs_iomap_write_direct+0x2a9/0x420
> > > > >>>>  xfs_file_iomap_begin+0x4dc/0xa70
> > > > >>>>  iomap_apply+0x43/0x100
> > > > >>>>  iomap_file_buffered_write+0x62/0x90
> > > > >>>>  xfs_file_buffered_aio_write+0xba/0x300
> > > > >>>>  __vfs_write+0xd5/0x150
> > > > >>>>  vfs_write+0xb6/0x180
> > > > >>>>  ksys_write+0x45/0xa0
> > > > >>>>  do_syscall_64+0x5a/0x180
> > > > >>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > >>>>
> > > > >>>> And from xfs_db:
> > > > >>>>
> > > > >>>> core.extsize = 10380288
> > > > >>>>
> > > > >>>> Which is not an integer multiple of the block size, and so violates
> > > > >>>> Rule #7 for setting extent size hints. Validate extent size hint
> > > > >>>> rules in the inode verifier to catch this.
> > > > >>>
> > > > >>> So, I think that if I do:
> > > > >>>
> > > > >>> # mkfs.xfs -f -m crc=0 $TEST_DEV
> > > > >>> # ./check xfs/229
> > > > >>> # ./check xfs/229
> > > > >>>
> > > > > >>> I trip the verifier, because I end up with freed inodes on disk with
> > > > > >>> extent size hints but zeroed flags.
> > > > >>>
> > > > > >>> xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize
> > > > > >>> says if extsize != 0 and the hint flag is not set, it fails
> > > > >>>
> > > > >>> Anyone else see this?
> > > > >>
> > > > >> Yeah, I think I just hit this on the TEST_DEV in xfs/242.
> > > > >>
> > > > >> git blame says I lifted the code from the scrub code, and I probably
> > > > >> wrote the code having read the ioctl code (which clears the extsize
> > > > >> field if the iflag isn't set).
> > > > >>
> > > > > >>> (crc=0 needed because that causes us to actually reread the inode chunks
> > > > > >>> in xfs_iread vs. /* shortcut IO on inode allocation if possible */)
> > > > >>
> > > > >> Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when
> > > > >> creating an inode.  It looks like we do that (instead of zeroing the
> > > > >> incore inode and setting a random i_generation) to preserve the existing
> > > > >> generation number?
> > > > >>
> > > > >> In any case, it's pretty clear that kernels have been writing out freed
> > > > >> inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some
> > > > >> number) so we clearly can't have that in the verifier.  It looks like we
> > > > > >> only examine di_extsize if either EXTSZ flag is set, so it's not
> > > > >> causing incorrect behavior.  Maybe it can be a preening fix in
> > > > >> scrub/repair.
> > > > >>
> > > > > 
> > > > > I just stumbled on this problem with xfs/229 that Eric reported. I'm
> > > > > confused by the comment above regarding this not causing incorrect
> > > > > behavior.
> > > > 
> > > > I think Darrick meant that having a nonzero extent size hint on disk
> > > > won't cause incorrect behavior because "we only examine di_extsize if
> > > > either EXTSZ flag is set"
> > > 
> > > Yeah, he probably did. :)
> > > 
> > 
> > Got it, thanks.
> > 
> > > I think Brian's suggestion of
> > > 
> > > if (i_mode != 0 && !hint && extsize != 0)
> > > 	barf_error();
> > > 
> > > sounds reasonable (having not tested that at all).
> > > 
> > 
> > I'll run it through xfstests and get it posted if nothing else fails.
> > 
> > BTW, do we have a similar issue with the cowextsize hint (assuming
> > v5+ikeep)? It looks like it's cleared similarly in xfs_ialloc(), but I'm
> > not sure if it's cleared somewhere else on free...
> 

I should note for the list that we've since determined this was already
fixed in v4.18 [1]. The patch ended up in a common base branch between
what is used for upstream pull requests and XFS' for-next, being left
out of the latter just by accident.

[1] d4a34e1655 ("xfs: properly handle free inodes in extent hint
validators")
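
For readers following along, the relaxed rule discussed above (skip the
hint checks when i_mode == 0) can be modeled roughly as below. This is a
standalone sketch, not the code from [1]: the function name, the single
flag bit, the byte-based units and the 4096-byte block size used in the
examples are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for the on-disk EXTSZ hint flags. */
#define SKETCH_FLAG_EXTSIZE	(1u << 0)

/*
 * Simplified model of an extent size hint check that tolerates freed
 * inodes. Returns true if the (mode, flags, extsize) combination is
 * acceptable. blocksize is assumed to be nonzero.
 */
static bool sketch_extsize_ok(uint16_t mode, uint16_t flags,
			      uint32_t extsize_bytes, uint32_t blocksize)
{
	bool hint = (flags & SKETCH_FLAG_EXTSIZE) != 0;

	/* Freed inode: flags were zeroed on free, extsize may be stale. */
	if (mode == 0)
		return true;

	/* A live inode without the hint flag must not carry a hint. */
	if (!hint && extsize_bytes != 0)
		return false;

	/* A live inode with the flag needs a nonzero, block-aligned hint. */
	if (hint && (extsize_bytes == 0 || extsize_bytes % blocksize != 0))
		return false;

	return true;
}
```

With a 4096-byte block size, the fuzzed value from the original commit
message (10380288) is rejected on a live inode that has the flag set,
while the same stale value on a freed inode (mode == 0) passes.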

> We should clear them on free now, so that we can draw a line in the
> sand for when we can have verifiers check it. e.g. when the next
> feature bit gets introduced, filesystems with that feature bit set
> can also verify the extent size hints are zero on freed inodes
> because we know that kernels supporting that feature always zero
> them on free....
> 

That seems fine (and harmless) to me if the goal is ultimately to have
this content clear on-disk. It keeps things consistent for verifiers,
scrub, repair, etc. to not have some bits with required initialized
values and others where we need to accommodate stale data.
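
As a rough illustration of the "line in the sand" idea, something along
these lines is what I have in mind. It is only a sketch: the struct, the
helper names and the fs_has_feature parameter are hypothetical stand-ins,
and no such feature bit exists today.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily trimmed stand-in for the on-disk inode core. */
struct sketch_dinode {
	uint16_t mode;
	uint16_t flags;
	uint32_t extsize;
	uint32_t cowextsize;
};

/* On free, clear the hints along with the mode and flags. */
static void sketch_ifree(struct sketch_dinode *dip)
{
	dip->mode = 0;
	dip->flags = 0;
	dip->extsize = 0;
	dip->cowextsize = 0;
}

/*
 * Check a freed inode. Only filesystems with the (hypothetical) feature
 * bit set can insist on zeroed hints, because only kernels that know
 * about that bit are guaranteed to have cleared them on free.
 */
static bool sketch_freed_inode_ok(const struct sketch_dinode *dip,
				  bool fs_has_feature)
{
	if (dip->mode != 0)
		return true;	/* not freed; other checks apply */
	if (!fs_has_feature)
		return true;	/* older kernels may leave stale hints */
	return dip->extsize == 0 && dip->cowextsize == 0;
}
```

Without the feature bit the stale values still have to be tolerated,
which is the accommodation mentioned above.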

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

