From: "Michael L. Semon" <mlsemon35@gmail.com>
To: Brian Foster <bfoster@redhat.com>, xfs@oss.sgi.com
Subject: Re: [PATCH v2 00/11] xfs: introduce the free inode btree
Date: Sun, 17 Nov 2013 17:43:17 -0500
Message-ID: <52894685.8080603@gmail.com>
In-Reply-To: <1384353427-36205-1-git-send-email-bfoster@redhat.com>

On 11/13/2013 09:36 AM, Brian Foster wrote:
> Hi all,
> 
> The free inode btree adds a new inode btree to XFS with the intent to
> track only inode chunks with at least one free inode. Patches 1-3 add
> the necessary support for the new XFS_BTNUM_FINOBT type and introduce a
> read-only v5 superblock flag. Patch 4 updates the transaction
> reservations for inode allocation operations to account for the finobt.
> Patches 5-9 add support to manage the finobt on inode chunk allocation,
> inode allocation, inode free (and chunk deletion) and growfs. Patch 10
> adds support to report finobt status in the fs geometry. Patch 11 adds
> the feature bit to the associated mask. Thoughts, reviews, flames
> appreciated.
> 
> Brian
> 
> v2:
> - Rebase to latest xfs tree (minor shifting around of some header bits).
> - Added "xfs: report finobt status in fs geometry" patch to series.
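
To make the cover letter's idea concrete, here is a rough Python sketch of an index that keeps every inode chunk in one map and only the chunks with at least one free inode in a second map.  It is not the kernel code: the class name, the method names, and the use of dicts in place of btrees are all assumptions made for illustration, and chunks are assumed to start on multiples of 64.

```python
INODES_PER_CHUNK = 64  # XFS inode chunks hold 64 inodes


class InodeIndex:
    """Hypothetical model: dicts stand in for the two on-disk btrees."""

    def __init__(self):
        self.inobt = {}   # chunk start inode -> free mask, for every chunk
        self.finobt = {}  # same records, but only chunks with a free inode

    def add_chunk(self, start):
        # A newly allocated chunk is entirely free, so it goes in both maps.
        mask = (1 << INODES_PER_CHUNK) - 1
        self.inobt[start] = mask
        self.finobt[start] = mask

    def alloc_inode(self):
        # Only the free-inode map is consulted; full chunks are never visited.
        if not self.finobt:
            return None
        start = min(self.finobt)  # a real btree would give this ordering for free
        mask = self.finobt[start]
        bit = (mask & -mask).bit_length() - 1  # lowest free inode in the chunk
        mask &= ~(1 << bit)
        self.inobt[start] = mask
        if mask:
            self.finobt[start] = mask
        else:
            del self.finobt[start]  # chunk is now full, drop its record
        return start + bit

    def free_inode(self, ino):
        # Sketch assumption: chunk starts are multiples of INODES_PER_CHUNK.
        start = ino - ino % INODES_PER_CHUNK
        mask = self.inobt[start] | (1 << (ino - start))
        self.inobt[start] = mask
        self.finobt[start] = mask  # (re)insert or update the record


idx = InodeIndex()
idx.add_chunk(64)
allocated = [idx.alloc_inode() for _ in range(INODES_PER_CHUNK)]
full_chunk_tracked = 64 in idx.finobt  # False once every inode is in use
idx.free_inode(70)
```

In this model, allocation never has to walk past fully allocated chunks, which is the benefit the cover letter is after.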

Very nice rebase!  There might have been a whitespace issue on patch #6 
for kernel and xfsprogs, but it was easy going after that.

I'm halfway through testing 4k finobt CRC filesystems on a 2.2-GB, 2-disk 
md RAID-0 on an x86 Pentium 4 with 512 MB of RAM.  The current nasty setup 
is kernel 3.12.0+, less the 5 most recent AIO commits/merges, plus my 
attempt to get in the few not-yet-merged Dave Chinner kernel/xfsprogs 
patches along with your patches.

I meant to be done with 4k by now, but generic/224 caused the kernel OOM 
killer to halt testing, much like it does in 256 MB RAM without finobt.  
No problem:  I'll thank Stan in advance for introducing me to the term 
O_PONIES.

The rest of this letter is random junk that hasn't been re-tested, to 
give a flavor of what might lie ahead.  I failed to capture a stack trace, 
something to the effect of "Error 117: offline filesystem operation in 
progress", that showed up while some test later than xfstests xfs/296 was 
running.  None of this letter needs a reply.

Good luck!

Michael

[NOISE FOLLOWS]

***** I don't know if this one is an xfstests issue or an xfsprogs 
issue.  Something like this also happened in a non-finobt 
`./check -g auto`...

xfs/033	 [failed, exit status 1] - output mismatch (see /var/lib/xfstests/results//xfs/033.out.bad)
    --- tests/xfs/033.out	2013-11-11 13:46:22.367412935 -0500
    +++ /var/lib/xfstests/results//xfs/033.out.bad	2013-11-17 12:57:28.010382465 -0500
    @@ -17,9 +17,10 @@
             - process known inodes and perform inode discovery...
     bad magic number 0x0 on inode INO
     bad version number 0x0 on inode INO
    +inode identifier 0 mismatch on inode INO
     bad magic number 0x0 on inode INO, resetting magic number
     bad version number 0x0 on inode INO, resetting version number
    -imap claims a free inode INO is in use, correcting imap and clearing inode
     ...
     (Run 'diff -u tests/xfs/033.out /var/lib/xfstests/results//xfs/033.out.bad' to see the entire diff)

***** The diff for xfs/033:

19a20
> inode identifier 0 mismatch on inode INO
22c23
< imap claims a free inode INO is in use, correcting imap and clearing inode
---
> inode identifier 0 mismatch on inode INO
33,194c34,37
<         - resetting contents of realtime bitmap and summary inodes
<         - traversing filesystem ...
<         - traversal finished ...
<         - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< resetting inode INO nlinks from 1 to 2
< done
< Corrupting rt bitmap inode - setting bits to 0
< Wrote X.XXKb (value 0x0)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
<         - zero log...
<         - scan filesystem freespace and inode maps...
<         - found root inode chunk
< Phase 3 - for each AG...
<         - scan and clear agi unlinked lists...
<         - process known inodes and perform inode discovery...
< bad magic number 0x0 on inode INO
< bad version number 0x0 on inode INO
< bad magic number 0x0 on inode INO, resetting magic number
< bad version number 0x0 on inode INO, resetting version number
< imap claims a free inode INO is in use, correcting imap and clearing inode
< cleared realtime bitmap inode INO
<         - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
<         - setting up duplicate extent list...
<         - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
<         - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing realtime bitmap inode
<         - resetting contents of realtime bitmap and summary inodes
<         - traversing filesystem ...
<         - traversal finished ...
<         - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< done
< Corrupting rt summary inode - setting bits to 0
< Wrote X.XXKb (value 0x0)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
<         - zero log...
<         - scan filesystem freespace and inode maps...
<         - found root inode chunk
< Phase 3 - for each AG...
<         - scan and clear agi unlinked lists...
<         - process known inodes and perform inode discovery...
< bad magic number 0x0 on inode INO
< bad version number 0x0 on inode INO
< bad magic number 0x0 on inode INO, resetting magic number
< bad version number 0x0 on inode INO, resetting version number
< imap claims a free inode INO is in use, correcting imap and clearing inode
< cleared realtime summary inode INO
<         - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
<         - setting up duplicate extent list...
<         - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
<         - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing realtime summary inode
<         - resetting contents of realtime bitmap and summary inodes
<         - traversing filesystem ...
<         - traversal finished ...
<         - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< done
< Corrupting root inode - setting bits to -1
< Wrote X.XXKb (value 0xffffffff)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
<         - zero log...
<         - scan filesystem freespace and inode maps...
<         - found root inode chunk
< Phase 3 - for each AG...
<         - scan and clear agi unlinked lists...
<         - process known inodes and perform inode discovery...
< bad magic number 0xffff on inode INO
< bad version number 0xffffffff on inode INO
< bad (negative) size -1 on inode INO
< bad magic number 0xffff on inode INO, resetting magic number
< bad version number 0xffffffff on inode INO, resetting version number
< bad (negative) size -1 on inode INO
< cleared root inode INO
<         - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
<         - setting up duplicate extent list...
< root inode lost
<         - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
<         - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing root directory
<         - resetting contents of realtime bitmap and summary inodes
<         - traversing filesystem ...
<         - traversal finished ...
<         - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< resetting inode INO nlinks from 1 to 2
< done
< Corrupting rt bitmap inode - setting bits to -1
< Wrote X.XXKb (value 0xffffffff)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
<         - zero log...
<         - scan filesystem freespace and inode maps...
<         - found root inode chunk
< Phase 3 - for each AG...
<         - scan and clear agi unlinked lists...
<         - process known inodes and perform inode discovery...
< bad magic number 0xffff on inode INO
< bad version number 0xffffffff on inode INO
< bad (negative) size -1 on inode INO
< bad magic number 0xffff on inode INO, resetting magic number
< bad version number 0xffffffff on inode INO, resetting version number
< bad (negative) size -1 on inode INO
< cleared realtime bitmap inode INO
<         - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
<         - setting up duplicate extent list...
<         - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
<         - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing realtime bitmap inode
<         - resetting contents of realtime bitmap and summary inodes
<         - traversing filesystem ...
<         - traversal finished ...
<         - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< done
< Corrupting rt summary inode - setting bits to -1
< Wrote X.XXKb (value 0xffffffff)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
<         - zero log...
<         - scan filesystem freespace and inode maps...
<         - found root inode chunk
< Phase 3 - for each AG...
<         - scan and clear agi unlinked lists...
<         - process known inodes and perform inode discovery...
< bad magic number 0xffff on inode INO
< bad version number 0xffffffff on inode INO
< bad (negative) size -1 on inode INO
< bad magic number 0xffff on inode INO, resetting magic number
< bad version number 0xffffffff on inode INO, resetting version number
< bad (negative) size -1 on inode INO
< cleared realtime summary inode INO
<         - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
<         - setting up duplicate extent list...
<         - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
<         - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing realtime summary inode
<         - resetting contents of realtime bitmap and summary inodes
<         - traversing filesystem ...
<         - traversal finished ...
<         - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< done
---
> xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.
> 
> fatal error -- could not iget root inode -- error - 117
> _check_xfs_filesystem: filesystem on /dev/md126 is inconsistent (r) (see /var/lib/xfstests/results//xfs/033.full)

***** This is the lone segfault so far:

xfs/291	[12832.846621] XFS (md126): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
[12832.846621] Use of these features in this kernel is at your own risk!
[12832.872608] XFS (md126): Mounting Filesystem
[12833.063779] XFS (md126): Ending clean mount
[13153.675046] XFS (md126): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
[13153.675046] Use of these features in this kernel is at your own risk!
[13153.694128] XFS (md126): Mounting Filesystem
[13154.105167] XFS (md126): Ending clean mount
[13201.470358] xfs_db[17902]: segfault at 9c157f8 ip 0809b6b0 sp bfe97950 error 4 in xfs_db[8048000+90000]
 [failed, exit status 1] - output mismatch (see /var/lib/xfstests/results//xfs/291.out.bad)
    --- tests/xfs/291.out	2013-11-11 13:46:26.652264785 -0500
    +++ /var/lib/xfstests/results//xfs/291.out.bad	2013-11-17 16:28:05.133832908 -0500
    @@ -1 +1,11 @@
     QA output created by 291
    +xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
    +xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
    +xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
    +xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
    +xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
    +__read_verify: XFS_CORRUPTION_ERROR
     ...
     (Run 'diff -u tests/xfs/291.out /var/lib/xfstests/results//xfs/291.out.bad' to see the entire diff)
[13202.293470] XFS (md127): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
[13202.293470] Use of these features in this kernel is at your own risk!
[13202.309944] XFS (md127): Mounting Filesystem
[13202.587663] XFS (md127): Ending clean mount

***** I may not have seen this lockdep splat before, but this 
is a new merge window.  This splat is repeatable and may be 
independent of finobt.

xfs/078	[87803.635893] 
======================================================
[ INFO: possible circular locking dependency detected ]
3.12.0+ #2 Not tainted
-------------------------------------------------------
xfs_repair/12944 is trying to acquire lock:
 (timekeeper_seq){------}, at: [<c104f843>] __hrtimer_start_range_ns+0xc7/0x35d

but task is already holding lock:
 (hrtimer_bases.lock){-.-.-.}, at: [<c104f7a4>] __hrtimer_start_range_ns+0x28/0x35d

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #5 (hrtimer_bases.lock){-.-.-.}:
       [<c106577c>] lock_acquire+0x7f/0x15e
       [<c162d072>] _raw_spin_lock_irqsave+0x4a/0x7a
       [<c104f7a4>] __hrtimer_start_range_ns+0x28/0x35d
       [<c1055b01>] start_bandwidth_timer+0x60/0x6f
       [<c105b1c2>] enqueue_task_rt+0xd3/0xfd
       [<c10546aa>] enqueue_task+0x45/0x60
       [<c1055813>] __sched_setscheduler+0x243/0x372
       [<c1056a21>] sched_setscheduler+0x17/0x19
       [<c108ae53>] watchdog_enable+0x69/0x7d
       [<c1053063>] smpboot_thread_fn+0x93/0x130
       [<c104c4ab>] kthread+0xb3/0xc7
       [<c162e4b7>] ret_from_kernel_thread+0x1b/0x28

-> #4 (&rt_b->rt_runtime_lock){-.-.-.}:
       [<c106577c>] lock_acquire+0x7f/0x15e
       [<c162cffb>] _raw_spin_lock+0x41/0x6e
       [<c105b1ac>] enqueue_task_rt+0xbd/0xfd
       [<c10546aa>] enqueue_task+0x45/0x60
       [<c1055813>] __sched_setscheduler+0x243/0x372
       [<c1056a21>] sched_setscheduler+0x17/0x19
       [<c108ae53>] watchdog_enable+0x69/0x7d
       [<c1053063>] smpboot_thread_fn+0x93/0x130
       [<c104c4ab>] kthread+0xb3/0xc7
       [<c162e4b7>] ret_from_kernel_thread+0x1b/0x28

-> #3 (&rq->lock){-.-.-.}:
       [<c106577c>] lock_acquire+0x7f/0x15e
       [<c162cffb>] _raw_spin_lock+0x41/0x6e
       [<c10561da>] wake_up_new_task+0x3b/0x147
       [<c102d132>] do_fork+0x116/0x305
       [<c102d34e>] kernel_thread+0x2d/0x33
       [<c161f0b2>] rest_init+0x22/0x128
       [<c19f39da>] start_kernel+0x2df/0x2e5
       [<c19f3378>] i386_start_kernel+0x12e/0x131

-> #2 (&p->pi_lock){-.-.-.}:
       [<c106577c>] lock_acquire+0x7f/0x15e
       [<c162d072>] _raw_spin_lock_irqsave+0x4a/0x7a
       [<c1055e1a>] try_to_wake_up+0x23/0x138
       [<c1055f60>] wake_up_process+0x1f/0x33
       [<c104411c>] start_worker+0x25/0x28
       [<c10451cc>] create_and_start_worker+0x37/0x5d
       [<c1a03b34>] init_workqueues+0xd4/0x2c4
       [<c19f3a99>] do_one_initcall+0xb9/0x153
       [<c19f3b7e>] kernel_init_freeable+0x4b/0x17d
       [<c161f1c8>] kernel_init+0x10/0xf2
       [<c162e4b7>] ret_from_kernel_thread+0x1b/0x28

-> #1 (&(&pool->lock)->rlock){-.-.-.}:
       [<c106577c>] lock_acquire+0x7f/0x15e
       [<c162cffb>] _raw_spin_lock+0x41/0x6e
       [<c104575a>] __queue_work+0x12b/0x393
       [<c1045c26>] queue_work_on+0x2f/0x6a
       [<c104f4b6>] clock_was_set_delayed+0x1d/0x1f
       [<c1075a67>] do_adjtimex+0xf4/0x145
       [<c1030ce0>] SYSC_adjtimex+0x30/0x62
       [<c1030f67>] SyS_adjtimex+0x10/0x12
       [<c162e53f>] sysenter_do_call+0x12/0x36

-> #0 (timekeeper_seq){------}:
       [<c10648b9>] __lock_acquire+0x13a4/0x17ac
       [<c106577c>] lock_acquire+0x7f/0x15e
       [<c1073688>] ktime_get+0x4f/0x169
       [<c104f843>] __hrtimer_start_range_ns+0xc7/0x35d
       [<c104faff>] hrtimer_start_range_ns+0x26/0x2c
       [<c104b17f>] common_timer_set+0xf5/0x164
       [<c104bd58>] SyS_timer_settime+0xbe/0x183
       [<c162dcc8>] syscall_call+0x7/0xb

other info that might help us debug this:

Chain exists of:
  timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(hrtimer_bases.lock);
                               lock(&rt_b->rt_runtime_lock);
                               lock(hrtimer_bases.lock);
  lock(timekeeper_seq);

 *** DEADLOCK ***

2 locks held by xfs_repair/12944:
 #0:  (&(&new_timer->it_lock)->rlock){......}, at: [<c104b292>] __lock_timer+0xa4/0x1af
 #1:  (hrtimer_bases.lock){-.-.-.}, at: [<c104f7a4>] __hrtimer_start_range_ns+0x28/0x35d

stack backtrace:
CPU: 0 PID: 12944 Comm: xfs_repair Not tainted 3.12.0+ #2
Hardware name: Dell Computer Corporation Dimension 2350/07W080, BIOS A01 12/17/2002
 c1cb2e70 c1cb2e70 deb01dd8 c162748d deb01df8 c162303c c17a3306 deb01e3c
 deaad0c0 deaad550 deaad550 00000002 deb01e6c c10648b9 deaad528 0000006f
 c106269b deb01e20 c1c8bd08 00000003 00000000 0000000e 00000002 00000001
Call Trace:
 [<c162748d>] dump_stack+0x16/0x18
 [<c162303c>] print_circular_bug+0x1b8/0x1c2
 [<c10648b9>] __lock_acquire+0x13a4/0x17ac
 [<c106269b>] ? trace_hardirqs_off+0xb/0xd
 [<c106577c>] lock_acquire+0x7f/0x15e
 [<c104f843>] ? __hrtimer_start_range_ns+0xc7/0x35d
 [<c1073688>] ktime_get+0x4f/0x169
 [<c104f843>] ? __hrtimer_start_range_ns+0xc7/0x35d
 [<c162d098>] ? _raw_spin_lock_irqsave+0x70/0x7a
 [<c104f7a4>] ? __hrtimer_start_range_ns+0x28/0x35d
 [<c104f843>] __hrtimer_start_range_ns+0xc7/0x35d
 [<c104faff>] hrtimer_start_range_ns+0x26/0x2c
 [<c104b17f>] common_timer_set+0xf5/0x164
 [<c104b08a>] ? __posix_timers_find+0xa7/0xa7
 [<c104bd58>] SyS_timer_settime+0xbe/0x183
 [<c162dcc8>] syscall_call+0x7/0xb
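
As I read the splat above, each "-> #N" entry is a lock that was taken while the lock from entry #N-1 was held, and the new acquisition (#0, timekeeper_seq) happens while hrtimer_bases.lock (#5) is held, which closes the loop.  The Python sketch below only illustrates that reading: it builds a "held, then taken" graph from the lock names in the report and finds the cycle with a plain depth-first search.  It is not lockdep's actual algorithm, and find_cycle and HELD_THEN_TAKEN are names invented for this example.

```python
from typing import Dict, List, Optional


def find_cycle(edges: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle (first node repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in edges.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is still on the DFS stack, so we walked back into it.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for start in list(edges):
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# Existing dependencies, entries #0 through #5 of the report:
# the key lock was held when each lock in its list was taken.
HELD_THEN_TAKEN = {
    "timekeeper_seq": ["&(&pool->lock)->rlock"],
    "&(&pool->lock)->rlock": ["&p->pi_lock"],
    "&p->pi_lock": ["&rq->lock"],
    "&rq->lock": ["&rt_b->rt_runtime_lock"],
    "&rt_b->rt_runtime_lock": ["hrtimer_bases.lock"],
}
before = find_cycle(HELD_THEN_TAKEN)  # no cycle yet

# The acquisition xfs_repair attempted: timekeeper_seq while
# holding hrtimer_bases.lock.
HELD_THEN_TAKEN["hrtimer_bases.lock"] = ["timekeeper_seq"]
cycle = find_cycle(HELD_THEN_TAKEN)
print(" --> ".join(cycle))
```

None of the locks in the cycle are XFS locks, which fits the guess that the splat is independent of finobt.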



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
