From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Tulak <jtulak@redhat.com>,
	linux-xfs@vger.kernel.org, sandeen@sandeen.net
Subject: Re: [PATCH 2/2] mdrestore: warn about corruption if log is dirty
Date: Thu, 13 Apr 2017 09:10:19 -0400	[thread overview]
Message-ID: <20170413131018.GD24893@bfoster.bfoster> (raw)
In-Reply-To: <20170413025105.GD12369@dastard>

On Thu, Apr 13, 2017 at 12:51:05PM +1000, Dave Chinner wrote:
> On Wed, Apr 12, 2017 at 07:04:04AM -0400, Brian Foster wrote:
> > On Wed, Apr 12, 2017 at 08:34:05AM +1000, Dave Chinner wrote:
> > > On Tue, Apr 11, 2017 at 04:12:37PM +0200, Jan Tulak wrote:
> > > > A dirty log in an obfuscated dump means that a corruption can happen
> > > > when replaying the log (which contains unobfuscated data). Warn the user
> > > > about this possibility.
> > > > 
> > > > The xlog workaround is a copy & paste solution from repair/phase2.c and
> > > > other tools, because the function is not implemented in libxlog.
> > > > 
> > > > Signed-off-by: Jan Tulak <jtulak@redhat.com>
> > > 
> > > I think this is overkill. mdrestore is not the place
> > > to be interpreting the state of the dumped image - it is a basic
> > > "restore the image" program, not a "check the validity of the image"
> > > program.
> > > 
> > 
> > I think that's a reasonable argument for the mdrestore side. I'm less
> > interested in seeing a warning on the restore side in general,
> > personally. I initially thought it would require less code, and the
> > whole obfuscation detection thing is getting into hackish territory,
> > to be fair.
> > 
> > > Secondly, if people are having problems with running log recovery on
> > > a restored obfuscated image and getting corruption and not knowing
> > > why or what to do, then that is a /documentation and training/
> > > problem, not a code problem.
> > > 
> > > i.e. the problem is that people who aren't developers are trying to
> > > use tools that were written for developers to do forensic analysis
> > > of failures. Don't dumb down the tool for clueless users - point the
> > > users at the documentation required to use the tool correctly...
> > > 
> > 
> > Put me in the clueless users bucket, then. This started with a customer
> > who had a corrupted filesystem and provided a metadump that exhibited
> > filesystem corruption. A support person began diagnosing the problem and
> > it eventually got to me; I had to spend a nontrivial amount of time
> > trying to identify what the problem was and to see whether I could
> > reproduce it on my own to verify it was actually specific to the
> > metadump, etc.
> > 
> > This is not an obvious "your metadump is broken" log recovery failure.
> > It's a latent directory corruption that doesn't obviously have anything
> > to do with log recovery in the first place. I'm sure I'll be able to
> > spot it going forward for some time while it's fresh in my mind, but I
> > expect to lose track of that eventually given the rarity (of debugging
> > log recovery issues). It's not reasonable at all to expect regular users
> > or support people to understand this enough to filter out bad images or
> > know when to use or not use a certain combination of metadump options,
> > because it otherwise requires a detailed understanding of XFS logging
> > and directory internals.
> 
> Log recovery on an obfuscated directory is, to me, a known obvious
> vector for directory corruption because we replay unobfuscated
> dirents over obfuscated on-disk data. Buffer logging is done in
> aligned 128 byte chunks, so it /should/ be obvious that the recovery
> of directory data buffers will partially overwrite dirents on disk
> even when they were not directly modified by the user.  And because
> this causes an obfuscated/clear text mismatch in the dirent name,
> the hash will not calculate to the same value the directory stored
> for that dirent. Hence the corruption reports that repair will now
> spew...
> 
> This was always considered a known problem for obfuscated metadump
> restorations - the unobfuscated log will result in recovery issues
> and name/data corruptions for dirs and xattrs.  In hindsight, this
> should have been documented long ago so you didn't have to waste the
> time to "rediscover" it like you did. It wasn't documented because
> both developers and users were far more concerned about the data
> exposure issues than they were about whether the unobfuscated log
> replayed correctly or not.
> 

What might be obvious to a design architect from a code standpoint isn't
necessarily obvious to others nor obvious in practice. With such an
image in hand, it's not obvious (enough, IMO) that the corruption is
even a result of log recovery.

Rather, that is something that has to be worked out, distinguished from
any other potentially similar corruption of the source image and
isolated to the metadump image itself. This is a waste of time for
everybody involved.
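
To make the failure mode concrete for anyone who hasn't stared at this
code recently: the thing that finally trips is the dirent name hash. A
rough, self-contained sketch of that hash follows -- reconstructed from
memory rather than copied out of libxfs/xfs_da_btree.c, so treat the
exact shifts/rotates as approximate, and "projects"/"prXjeAts" below are
just made-up example names. IIRC metadump's obfuscation deliberately
picks replacement names that hash to the same value as the originals, so
the obfuscated directory is self-consistent on disk -- until recovery
splats 128-byte chunks of cleartext names over it and the stored hashes
no longer match the partially rewritten names:

#include <stdio.h>
#include <stdint.h>

#define rol32(x, y)	(((x) << (y)) | ((x) >> (32 - (y))))

/* Approximation of xfs_da_hashname(): the hash is derived purely from
 * the name bytes, so changing any byte changes the hash. */
static uint32_t
name_hash(const uint8_t *name, int namelen)
{
	uint32_t	hash;

	/* four characters at a time as long as we can */
	for (hash = 0; namelen >= 4; namelen -= 4, name += 4)
		hash = (name[0] << 21) ^ (name[1] << 14) ^ (name[2] << 7) ^
		       (name[3] << 0) ^ rol32(hash, 7 * 4);

	/* then whatever is left over */
	switch (namelen) {
	case 3:
		return (name[0] << 14) ^ (name[1] << 7) ^ (name[2] << 0) ^
		       rol32(hash, 7 * 3);
	case 2:
		return (name[0] << 7) ^ (name[1] << 0) ^ rol32(hash, 7 * 2);
	case 1:
		return (name[0] << 0) ^ rol32(hash, 7 * 1);
	default:
		return hash;
	}
}

int
main(void)
{
	/*
	 * A name whose bytes have been partially replaced by replayed
	 * cleartext hashes to a different value than the one recorded
	 * for the original entry -- which is exactly what repair then
	 * reports as directory corruption.
	 */
	printf("%#x\n", (unsigned int)name_hash((const uint8_t *)"projects", 8));
	printf("%#x\n", (unsigned int)name_hash((const uint8_t *)"prXjeAts", 8));
	return 0;
}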

> IMO - and as I said to Eric on IRC - we should not be trying to work
> around institutional problems (i.e. inability to train or impart the
> necessary knowledge on support engineers) with code changes.
> Training support engineers properly requires documentation and
> knowledge distribution processes; the code implementing the tools
> they are being taught about is not the right instrument to perform
> this knowledge transfer....
> 

Documentation is good. Jan's latest series updates the man page on the
issue.

That aside, this does not reduce solely to a documentation problem.
xfs_metadump is simply broken for particular source filesystems in that
it may corrupt the resulting fs image (by default, no less). The
fundamentally correct approach is to fix metadump obfuscation to not
corrupt the output image (then we wouldn't need to document the fact
that it does ;).

Fixing xfs_metadump to obfuscate the log correctly is not a trivial
matter, however, so that isn't a realistic solution atm. The next
available option is to disallow obfuscation of such filesystems, but
that limits the use of a valuable support tool to avoid a
rare/particular case. The next available option after that is the
approach implemented by these patches: to warn about the situation to
hopefully avoid the effects of the problem in the field.
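
FWIW the check itself is tiny; the bulk of what Jan copied from
repair/phase2.c is the boilerplate that populates the struct xlog before
libxlog can be used. A sketch of just the decision, written from memory
-- the libxlog setup and the exact type/field names vary a bit between
xfsprogs versions, so this is shape, not copy & paste material:

#include <stdbool.h>
#include <stdio.h>
#include "libxfs.h"	/* header names as used inside the xfsprogs tree */
#include "libxlog.h"

/*
 * "log" must already be set up (device, size, sector size, ...) the
 * same way repair/phase2.c does before calling into libxlog; that
 * setup is the part the patch duplicates and the part omitted here.
 */
static bool
log_is_dirty(struct xlog *log)
{
	xfs_daddr_t	head_blk, tail_blk;

	if (xlog_find_tail(log, &head_blk, &tail_blk)) {
		fprintf(stderr, "warning: cannot find log head/tail\n");
		return false;
	}

	/*
	 * A clean log has head == tail; anything else means there are
	 * unreplayed -- and in our case unobfuscated -- transactions.
	 */
	return head_blk != tail_blk;
}

metadump would run something like that once after opening the source
filesystem and print the warning up front, which is all these patches
are really trying to do.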

So while I'm actually fine with dropping the mdrestore side bits here
for practical reasons (it's more code than I anticipated for the purpose
of emitting a warning), and I agree that we should update documentation,
the highest priority here should be to provide a usable tool that
functions correctly.

IOW, this documentation problem exists because the tool is broken. The
tool will remain broken despite the fact that the problem is documented.
Therefore, we are not just working around a documentation issue by
attempting to improve the tool.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
