Date: Thu, 20 Jul 2017 08:38:46 -0400
From: Brian Foster
Subject: Re: quotacheck deadlock?
Message-ID: <20170720123846.GC3944@bfoster.bfoster>
References: <20170720065804.GR4224@magnolia>
In-Reply-To: <20170720065804.GR4224@magnolia>
To: "Darrick J. Wong"
Cc: xfs

On Wed, Jul 19, 2017 at 11:58:04PM -0700, Darrick J. Wong wrote:
> Hi,
>
> I ran the following sequence of commands on 4.13-rc1:
>
> # mkfs.xfs -f /dev/sdf
> # xfs_db -x -c 'sb 0' -c 'addr rootino' -c 'write -d core.uid 4294967295' /dev/sdf
> # mount /dev/sdf -o usrquota
>
> The kernel reports that it's starting quotacheck, but never finishes.
> echo t > /proc/sysrq-trigger produces this for the hung mount command:
>
> mount R running task 0 988 895 0x00000000
> Call Trace:
>  ? sched_clock_cpu+0xa8/0xe0
>  ? xfs_qm_flush_one+0x3c/0x120 [xfs]
>  ? lock_acquire+0xac/0x200
>  ? lock_acquire+0xac/0x200
>  ? xfs_qm_flush_one+0x3c/0x120 [xfs]
>  ? xfs_qm_dquot_walk+0xa1/0x170 [xfs]
>  ? get_lock_stats+0x19/0x60
>  ? get_lock_stats+0x19/0x60
>  ? xfs_qm_dquot_walk+0xa1/0x170 [xfs]
>  ? xfs_qm_dquot_walk+0x125/0x170 [xfs]
>  ? radix_tree_gang_lookup+0xd1/0xf0
>  ? xfs_qm_shrink_count+0x20/0x20 [xfs]
>  ? xfs_qm_dquot_walk+0xbb/0x170 [xfs]
>  ? kfree+0x23f/0x2d0
>  ? kvfree+0x2a/0x40
>  ? xfs_bulkstat+0x315/0x680 [xfs]
>  ? xfs_qm_get_rtblks+0xa0/0xa0 [xfs]
>  ? xfs_qm_quotacheck+0x2bd/0x360 [xfs]
>  ? xfs_qm_mount_quotas+0x106/0x1f0 [xfs]
>  ? xfs_mountfs+0x6f2/0xb00 [xfs]
>  ? xfs_fs_fill_super+0x483/0x610 [xfs]
>  ? mount_bdev+0x180/0x1b0
>  ? xfs_finish_flags+0x150/0x150 [xfs]
>  ? xfs_fs_mount+0x15/0x20 [xfs]
>  ? mount_fs+0x14/0x80
>  ? vfs_kern_mount+0x67/0x170
>  ? do_mount+0x195/0xd00
>  ? kmem_cache_alloc_trace+0x231/0x2a0
>  ? SyS_mount+0x95/0xe0
>  ? entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> Any thoughts? I'm not sure what's going on for sure, other than the
> call stack looks funny and it's midnight so I'm going to sleep. :)
>

It looks like a problem with the loop in xfs_qm_dquot_walk(). The next
lookup index is calculated as:

	next_index = be32_to_cpu(dqp->q_core.d_id) + 1;

... each time through the loop. With the uid written above, the +1
overflows the 32-bit next_index back to zero and the lookup starts over.

I suppose a simple fix might be to do something like the following.
Thoughts?

--- 8< ---

diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 6ce948c..f013c893 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -111,6 +111,8 @@ xfs_qm_dquot_walk(
 			skipped = 0;
 			break;
 		}
+		if (!next_index)
+			break;
 	}
 
 	if (skipped) {
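
For what it's worth, the wraparound is easy to demonstrate outside the kernel.
The sketch below is plain userspace C with nothing taken from XFS (the id
array, the pass cap, and the printf output are invented for illustration); it
mimics a gang-lookup loop that resumes from the last id found plus one, and
shows that once an id of 4294967295 is present, the 32-bit index wraps back to
zero and the scan restarts indefinitely.

/*
 * Minimal standalone sketch of the wraparound (plain userspace C, not
 * the kernel code; the id array, the pass cap and the printf are made
 * up for illustration).  A gang-lookup-style scan that resumes from
 * "last id found + 1" can never get past an id of 0xffffffff, because
 * the 32-bit next_index wraps back to zero.
 */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* ids present in the "tree"; 4294967295 is the uid written by xfs_db */
	uint32_t ids[] = { 0, 4294967295u };
	uint32_t next_index = 0;
	int passes = 0;

	for (;;) {
		int nr_found = 0;

		/* stand-in for radix_tree_gang_lookup(tree, ..., next_index, ...) */
		for (unsigned i = 0; i < sizeof(ids) / sizeof(ids[0]); i++) {
			if (ids[i] >= next_index) {
				next_index = ids[i] + 1;  /* wraps to 0 at UINT32_MAX */
				nr_found++;
			}
		}
		if (!nr_found)
			break;	/* never reached without an "if (!next_index)" style check */

		if (++passes > 5) {	/* cap so the sketch itself terminates */
			printf("still scanning after %d passes, next_index=%u\n",
			       passes, next_index);
			return 0;
		}
	}
	printf("walk finished\n");
	return 0;
}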