From: Brian Foster <bfoster@redhat.com>
To: Martin Svec <martin.svec@zoner.cz>
Cc: Dave Chinner <david@fromorbit.com>, linux-xfs@vger.kernel.org
Subject: Re: Quota-enabled XFS hangs during mount
Date: Wed, 25 Jan 2017 10:36:10 -0500 [thread overview]
Message-ID: <20170125153610.GD28388@bfoster.bfoster> (raw)
In-Reply-To: <5b41d19b-1a0d-2b74-a633-30a5f6d2f14a@zoner.cz>
On Tue, Jan 24, 2017 at 02:17:36PM +0100, Martin Svec wrote:
> Hello,
>
> > On 23 Jan 2017 at 14:44, Brian Foster wrote:
> > On Mon, Jan 23, 2017 at 10:44:20AM +0100, Martin Svec wrote:
> >> Hello Dave,
> >>
> >> Any updates on this? It's a bit annoying to work around the bug by increasing RAM just because of the
> >> initial quotacheck.
> >>
> > Note that Dave is away on a bit of an extended vacation[1]. It looks
> > like he was in the process of fishing through the code to spot any
> > potential problems related to quotacheck+reclaim. I see you've cc'd him
> > directly so we'll see if we get a response wrt whether he got anywhere
> > with that...
> >
> > Skimming back through this thread, it looks like we have an issue where
> > quota check is not quite reliable in the event of reclaim, and you
> > appear to be reproducing this due to a probably unique combination of
> > large inode count and low memory.
> >
> > Is my understanding correct that you've reproduced this on more recent
> > kernels than the original report?
>
> Yes, I repeated the tests using a 4.9.3 kernel on another VM where we hit this issue.
>
> Configuration:
> * vSphere 5.5 virtual machine, 2 vCPUs, virtual disks residing on iSCSI VMFS datastore
> * Debian Jessie 64 bit webserver, vanilla kernel 4.9.3
> * 180 GB XFS data disk mounted as /www
>
> Quotacheck behavior depends on assigned RAM:
> * 2 GiB or less: mount /www leads to a storm of OOM kills, including the shell, ttys, etc., so the
> system becomes unusable.
> * 3 GiB: the mount /www task hangs in the same way as I reported earlier in this thread.
> * 4 GiB or more: mount /www succeeds.
>
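As an aside, a minimal sketch of what triggers the quotacheck here (device, mount point, and options below are placeholders, not necessarily your exact setup): quotacheck runs at mount time whenever quota accounting is requested but the on-disk quota information is not yet marked consistent, and it has to walk every allocated inode on the filesystem.

```shell
# sketch only -- device, mountpoint and options are placeholders;
# the first quota-enabled mount after an inconsistent quota state
# forces a full quotacheck walk of all allocated inodes
opts=usrquota,grpquota
echo mount -o "$opts" /dev/sdd1 /www
```

With ~2.5M allocated inodes on this filesystem, that walk is exactly the workload that blows up under memory pressure.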
> The affected disk was checked using xfs_repair. I keep a VM snapshot to be able to reproduce the bug.
> Below is updated filesystem information and dmesg output:
>
> ---------
> xfs-test:~# df -i
> Filesystem Inodes IUsed IFree IUse% Mounted on
> /dev/sdd1 165312432 2475753 162836679 2% /www
>
> ---------
> xfs-test:~# xfs_info /www
> meta-data=/dev/sdd1 isize=256 agcount=73, agsize=655232 blks
> = sectsz=512 attr=2, projid32bit=0
> = crc=0 finobt=0
> data = bsize=4096 blocks=47185664, imaxpct=25
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0 ftype=0
> log =internal bsize=4096 blocks=2560, version=2
> = sectsz=512 sunit=0 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
>
Ok, thanks.
> ---------
> slabtop, 3 GiB RAM:
>
> Active / Total Objects (% used) : 3447273 / 3452076 (99.9%)
> Active / Total Slabs (% used) : 648365 / 648371 (100.0%)
> Active / Total Caches (% used) : 70 / 124 (56.5%)
> Active / Total Size (% used) : 2592192.04K / 2593485.27K (100.0%)
> Minimum / Average / Maximum Object : 0.02K / 0.75K / 4096.00K
>
> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 2477104 2477101 99% 1.00K 619276 4 2477104K xfs_inode
> 631904 631840 99% 0.03K 5096 124 20384K kmalloc-32
> 74496 74492 99% 0.06K 1164 64 4656K kmalloc-64
> 72373 72367 99% 0.56K 10339 7 41356K radix_tree_node
> 38410 38314 99% 0.38K 3841 10 15364K mnt_cache
> 31360 31334 99% 0.12K 980 32 3920K kmalloc-96
> 27574 27570 99% 0.12K 811 34 3244K kernfs_node_cache
> 19152 18291 95% 0.19K 912 21 3648K dentry
> 17312 17300 99% 0.12K 541 32 2164K kmalloc-node
> 14546 13829 95% 0.57K 2078 7 8312K inode_cache
> 11088 11088 100% 0.19K 528 21 2112K kmalloc-192
> 5432 5269 96% 0.07K 97 56 388K Acpi-Operand
> 3960 3917 98% 0.04K 40 99 160K Acpi-Namespace
> 3624 3571 98% 0.50K 453 8 1812K kmalloc-512
> 3320 3249 97% 0.05K 40 83 160K ftrace_event_field
> 3146 3048 96% 0.18K 143 22 572K vm_area_struct
> 2752 2628 95% 0.06K 43 64 172K anon_vma_chain
> 2640 1991 75% 0.25K 165 16 660K kmalloc-256
> 1748 1703 97% 0.09K 38 46 152K trace_event_file
> 1568 1400 89% 0.07K 28 56 112K anon_vma
> 1086 1035 95% 0.62K 181 6 724K proc_inode_cache
> 935 910 97% 0.67K 85 11 680K shmem_inode_cache
> 786 776 98% 2.00K 393 2 1572K kmalloc-2048
> 780 764 97% 1.00K 195 4 780K kmalloc-1024
> 525 341 64% 0.19K 25 21 100K cred_jar
> 408 396 97% 0.47K 51 8 204K xfs_da_state
> 336 312 92% 0.62K 56 6 224K sock_inode_cache
> 309 300 97% 2.05K 103 3 824K idr_layer_cache
> 256 176 68% 0.12K 8 32 32K pid
> 240 2 0% 0.02K 1 240 4K jbd2_revoke_table_s
> 231 231 100% 4.00K 231 1 924K kmalloc-4096
> 230 222 96% 3.31K 115 2 920K task_struct
> 224 205 91% 1.06K 32 7 256K signal_cache
> 213 26 12% 0.05K 3 71 12K Acpi-Parse
> 213 213 100% 2.06K 71 3 568K sighand_cache
> 189 97 51% 0.06K 3 63 12K fs_cache
> 187 86 45% 0.36K 17 11 68K blkdev_requests
> 163 63 38% 0.02K 1 163 4K numa_policy
>
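For what it's worth, a quick back-of-envelope on those slabtop numbers (my arithmetic, using the figures quoted above) shows why 3 GiB isn't enough:

```shell
# ~2.48M cached xfs_inode objects at 1.00K apiece pin roughly 2.4 GiB,
# i.e. nearly all of a 3 GiB machine, before quotacheck even completes
inodes=2477104                    # active xfs_inode objects from slabtop
kib=$inodes                       # 1.00K per object, so KiB == count
echo "xfs_inode cache: $(( kib / 1024 )) MiB"
```

That matches the 2477104K xfs_inode cache size slabtop reports, and leaves almost nothing for the rest of the system.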
> ---------
> dmesg, 3 GiB RAM:
>
> [ 967.642413] INFO: task mount:669 blocked for more than 120 seconds.
> [ 967.642456] Tainted: G E 4.9.3-znr1+ #24
> [ 967.642510] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 967.642570] mount D 0 669 652 0x00000000
> [ 967.642573] ffff8800b9b8ac00 0000000000000000 ffffffffa800e540 ffff880036b85200
> [ 967.642575] ffff8800bb618740 ffffc90000f87998 ffffffffa7a2802d ffff8800ba38e000
> [ 967.642577] ffffc90000f87998 00000000c021fd94 0002000000000000 ffff880036b85200
> [ 967.642579] Call Trace:
> [ 967.642586] [<ffffffffa7a2802d>] ? __schedule+0x23d/0x6e0
> [ 967.642588] [<ffffffffa7a28506>] schedule+0x36/0x80
> [ 967.642590] [<ffffffffa7a2bbac>] schedule_timeout+0x21c/0x3c0
> [ 967.642592] [<ffffffffa774c3ab>] ? __radix_tree_lookup+0x7b/0xe0
> [ 967.642594] [<ffffffffa7a28fbb>] wait_for_completion+0xfb/0x140
> [ 967.642596] [<ffffffffa74ae1f0>] ? wake_up_q+0x70/0x70
> [ 967.642654] [<ffffffffc0225b32>] xfs_qm_flush_one+0x82/0xc0 [xfs]
> [ 967.642684] [<ffffffffc0225ab0>] ? xfs_qm_dqattach_one+0x120/0x120 [xfs]
> [ 967.642712] [<ffffffffc0225f1c>] xfs_qm_dquot_walk.isra.10+0xec/0x170 [xfs]
> [ 967.642744] [<ffffffffc0227f75>] xfs_qm_quotacheck+0x255/0x310 [xfs]
> [ 967.642774] [<ffffffffc0228114>] xfs_qm_mount_quotas+0xe4/0x170 [xfs]
> [ 967.642800] [<ffffffffc02042bd>] xfs_mountfs+0x62d/0x940 [xfs]
> [ 967.642827] [<ffffffffc0208eca>] xfs_fs_fill_super+0x40a/0x590 [xfs]
> [ 967.642829] [<ffffffffa761aa4a>] mount_bdev+0x17a/0x1b0
> [ 967.642864] [<ffffffffc0208ac0>] ? xfs_test_remount_options.isra.14+0x60/0x60 [xfs]
> [ 967.642895] [<ffffffffc0207b35>] xfs_fs_mount+0x15/0x20 [xfs]
> [ 967.642897] [<ffffffffa761b428>] mount_fs+0x38/0x170
> [ 967.642900] [<ffffffffa76390a4>] vfs_kern_mount+0x64/0x110
> [ 967.642901] [<ffffffffa763b7f5>] do_mount+0x1e5/0xcd0
> [ 967.642903] [<ffffffffa763b3ec>] ? copy_mount_options+0x2c/0x230
> [ 967.642904] [<ffffffffa763c5d4>] SyS_mount+0x94/0xd0
> [ 967.642907] [<ffffffffa7a2d0fb>] entry_SYSCALL_64_fastpath+0x1e/0xad
>
> > If so and we don't hear back from Dave
> > in a reasonable time, it might be useful to provide a metadump of the fs
> > if possible. That would allow us to restore in a similar low RAM vm
> > configuration, trigger quota check and try to reproduce directly...
>
> Unfortunately, the output of xfs_metadump apparently contains readable fragments of files! We cannot
> provide you with such a dump from a production server. Shouldn't metadump obfuscate metadata and ignore
> all filesystem data? Maybe it's a sign of filesystem corruption unrecognized by xfs_repair?
>
It should; not sure what's going on there. Perhaps a metadump bug. We
can probably just create a filesystem with similar geometry and inode
population and see what happens with that...
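Roughly, something along these lines should give comparable geometry. This is only a sketch: /dev/sdX1 is a placeholder, the numbers are lifted from the xfs_info output above, and the exact mkfs.xfs flags may need adjusting for a given mkfs version.

```shell
# sketch: recreate a filesystem with geometry matching the xfs_info
# output above (non-CRC, 256-byte inodes, 73 AGs); /dev/sdX1 is a
# placeholder device, so only echo the command here
blocks=47185664
agsize=655232
agcount=$(( (blocks + agsize - 1) / agsize ))     # rounds up to 73 AGs
echo "agcount=$agcount"
echo mkfs.xfs -f -m crc=0,finobt=0 -i size=256 \
     -d size=${blocks}b,agsize=${agsize}b /dev/sdX1
```

Populating it with a few million small inodes afterwards should approximate the reported ~2.5M-inode workload for a low-RAM quotacheck test.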
Brian
>
> Thank you,
> Martin
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Thread overview: 21+ messages
2016-11-01 16:45 Quota-enabled XFS hangs during mount Martin Svec
2016-11-01 21:58 ` Dave Chinner
2016-11-02 16:31 ` Martin Svec
2016-11-03 1:31 ` Dave Chinner
2016-11-03 12:04 ` Martin Svec
2016-11-03 20:40 ` Dave Chinner
2017-01-23 9:44 ` Martin Svec
2017-01-23 13:44 ` Brian Foster
2017-01-23 22:06 ` Dave Chinner
2017-01-24 13:17 ` Martin Svec
2017-01-25 15:36 ` Brian Foster [this message]
2017-01-25 22:17 ` Brian Foster
2017-01-26 17:46 ` Martin Svec
2017-01-26 19:12 ` Brian Foster
2017-01-27 13:06 ` Martin Svec
2017-01-27 17:07 ` Brian Foster
2017-01-27 20:49 ` Martin Svec
2017-01-27 21:00 ` Martin Svec
2017-01-27 23:17 ` Darrick J. Wong
2017-01-28 22:42 ` Dave Chinner
2017-01-30 15:31 ` Brian Foster