From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Eryu Guan <eguan@redhat.com>
Cc: Amir Goldstein <amir73il@gmail.com>,
	linux-xfs@vger.kernel.org, Eric Sandeen <sandeen@redhat.com>
Subject: Re: [PATCH v4 00/47] xfs: online scrub/repair support
Date: Tue, 10 Jan 2017 10:20:06 -0800
Message-ID: <20170110182006.GJ14038@birch.djwong.org>
In-Reply-To: <20170110075444.GL1859@eguan.usersys.redhat.com>

On Tue, Jan 10, 2017 at 03:54:44PM +0800, Eryu Guan wrote:
> On Mon, Jan 09, 2017 at 01:15:40PM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 09, 2017 at 02:40:56PM +0200, Amir Goldstein wrote:
> > > On Sat, Jan 7, 2017 at 2:35 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > > > Hi all,
> > > >
> > > ...
> > > > If you're going to start using this mess, you probably ought to just
> > > > pull from my github trees.  The kernel patches[1] should apply against
> > > > 4.10-rc2.  xfsprogs[2] and xfstests[3] can be found in their usual
> > > > places.
> > > >
> > > > The patches have survived all auto group xfstests both with scrub-only
> > > > mode and also a special debugging mode to xfs_scrub that forces it to
> > > > rebuild the metadata structures even if they're not damaged.  Since the
> > > > last patch release, I have now had time to run the new tests in [3] that
> > > > try to fuzz every field in every data structure on disk.
> > > >
> > > 
> > > Darrick,
> > > 
> > > I started running the dangerous_scrub group yesterday and it's killing my
> > > test machine. The test machine is x86_64 (i5-3470) with 16GB RAM,
> > > and the test partitions are 100GB volumes on a spinning disk.
> > > 
> > > xfs_db swaps my system to death, and in most of the tests it eventually
> > > gets shot down by the OOM killer.
> > > 
> > > Is that surprising to you?
> > 
> > Yes.
> 
> I hit OOM too, in xfs/1301. (I ran xfs/13??; xfs/1300 passed and xfs/1301
> OOM'ed the host. I haven't run the other tests yet.)
> 
> > 
> > > How much RAM do your test systems have?
> > 
> > 2GB in a VM so the host system won't go down.  Usually the test disks 
> > are 8GB disks to keep the fuzzer runtimes down, but I've also run them
> > against 100GB volumes without OOMing...
> > 
> > > Can you figure out a minimal RAM requirement to run these fuzzers
> > > and maybe check required RAM before running the test?
> > 
> > I wouldn't have thought xfs_check would OOM... it would help to know
> > exactly what the xfs_db invocation thought it was doing.
> 
> My test host has 64G memory, it's running on a 15G SCRATCH_DEV.

Aha, I /have/ been hitting OOM, but it got lost in the noise.

> > > Alternatively, can you figure out how to reduce the amount of RAM
> > > used by the fuzzer?
> > > 
> > > I was using mkfs options "-m rmapbt=1,reflink=1"
> > > and I tried running with and then without TEST_XFS_SCRUB=1.
> > > I don't see a reason to send the logs at this point; they are just a
> > > complete mass of destruction.
> > 
> > All the tests?  The full dmesg output would be useful to narrow it down to
> > a specific xfstest number, field name, and fuzz verb.  I'm running them
> 
> In my case, the xfs_db command is doing
> 
> /usr/sbin/xfs_db -x -c sb 0 -c fuzz /dev/mapper/systemvg-testlv2
> 
> I attached console log and xfs-1301.full I have so far.

Aha, thank you.

> Thanks,
> Eryu
>
> Fields we propose to fuzz under: sb 0
> xfs_db>
> blocksize
> dblocks
<snip>
> Field agblocks already set to , skipping test.
> + Fuzz agcount = zeroes
> ========================
> xfs_db> xfs_db> Allowing fuzz of corrupted data with good CRC
> agcount = 0
> xfs_db> 
> 
> Field agcount already set to , skipping test.
> + Fuzz agcount = ones
> ========================
> xfs_db> xfs_db> Allowing fuzz of corrupted data with good CRC
> agcount = null
> xfs_db> 
> 
> Field agcount already set to , skipping test.
> + Fuzz agcount = firstbit
> ========================
> xfs_db> xfs_db> Allowing fuzz of corrupted data with good CRC
> agcount = 2147483664
> xfs_db> 
> 
> Field agcount already set to , skipping test.
> + Fuzz agcount = middlebit
> ========================
> xfs_db> xfs_db> Allowing fuzz of corrupted data with good CRC
> agcount = 32784
> xfs_db> 
> 
> Field agcount already set to , skipping test.
> + Fuzz agcount = lastbit
> ========================
> xfs_db> xfs_db> Allowing fuzz of corrupted data with good CRC
> agcount = 17
> xfs_db> 
> 
> Field agcount already set to , skipping test.
> + Fuzz agcount = add
> ========================
> xfs_db> xfs_db> Allowing fuzz of corrupted data with good CRC
> agcount = 2033
> xfs_db> 
> 
> Field agcount already set to , skipping test.
> + Fuzz agcount = sub
> ========================
> xfs_db> xfs_db> Allowing fuzz of corrupted data with good CRC
> agcount = 4294965295
> xfs_db> 
> 
> Field agcount already set to , skipping test.
> + Fuzz agcount = random
> ========================
> xfs_db> xfs_db> Allowing fuzz of corrupted data with good CRC
> agcount = 1858079377
> xfs_db> 
> 
> Field agcount already set to , skipping test.

Now I see what the problem is.  We set an insane number of AGs.  The
next thing we try to do is read the fuzzed value back from the sb, which
fires up another xfs_db instance.  That instance thinks we have
1,858,079,377 AGs, tries to allocate per-AG data for all of them, and
OOMs the system, which kills xfs_db.
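
To put rough numbers on it (back-of-the-envelope only; the 200 bytes of
in-core state per AG is an assumption for illustration, not the real size
of the per-AG structure):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t agcount = 1858079377ULL;	/* the fuzzed sb_agcount */
	uint64_t per_ag = 200;			/* assumed bytes of in-core state per AG */

	/* ~371 GB of per-AG allocations -- no wonder even a 64G host OOMs. */
	printf("%llu bytes\n", (unsigned long long)(agcount * per_ag));
	return 0;
}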

Fortunately the kernel and xfs_repair notice the broken geometry and
handle it nicely, but that leaves xfs_db unable to deal with it.  We
could clamp agcount to a "reasonable" value, though it isn't clear what
that means if agblocks is also insane.
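
One possible shape for such a clamp (purely a sketch; clamp_agcount is a
hypothetical helper, though the sb_dblocks/sb_agblocks/sb_agcount names
follow the on-disk superblock fields):

#include <stdint.h>

/* Trust agcount only if it's consistent with the device size implied by
 * dblocks and agblocks; if agblocks is insane too, give up on per-AG data. */
static uint32_t clamp_agcount(uint64_t sb_dblocks, uint32_t sb_agblocks,
			      uint32_t sb_agcount)
{
	uint64_t implied;

	if (sb_agblocks == 0)
		return 0;
	implied = (sb_dblocks + sb_agblocks - 1) / sb_agblocks;
	if (implied > UINT32_MAX)
		implied = UINT32_MAX;
	return sb_agcount > implied ? (uint32_t)implied : sb_agcount;
}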

OTOH, AFAIK xfs_db doesn't use the in-core perag stuff anyway, so maybe
setting agcount to 0 is reasonable enough(?)  I'll experiment with this
and report back.

--D
