All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot
@ 2013-11-26 13:38 Jeff Liu
  2013-11-28 10:43 ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Liu @ 2013-11-26 13:38 UTC (permalink / raw)
  To: xfs

From: Jie Liu <jeff.liu@oracle.com>

xfs_quota(8) will hang up if trying to turn group/project quota off
before the user quota is off, this could be 100% reproduced by:
  # mount -ouquota,gquota /dev/sda7 /xfs
  # mkdir /xfs/test
  # xfs_quota -xc 'off -g' /xfs <-- hangs up
  # echo w > /proc/sysrq-trigger
  # dmesg

  SysRq : Show Blocked State
  task                        PC stack   pid father
  xfs_quota       D 0000000000000000     0 27574   2551 0x00000000
  [snip]
  Call Trace:
  [<ffffffff81aaa21d>] schedule+0xad/0xc0
  [<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0
  [<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0
  [<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0
  [<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs]
  [<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30
  [<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs]
  [<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs]
  [<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs]
  [<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs]
  [<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs]
  [<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs]
  [<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs]
  [<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs]
  [<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs]
  [<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0
  [<ffffffff812fba4b>] ? iput+0x5b/0x90
  [<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0
  [<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b

It's fine if we turn user quota off at first, then turn off other
kind of quotas if they are enabled since the group/project dquot
refcount is decreased to zero once the user quota if off. Otherwise,
those dquots refcount is non-zero due to the user dquot might refer
to them as hint(s).  Hence, above operation cause an infinite loop
at xfs_qm_dquot_walk() while trying to purge dquot cache.

This problem has been around since Linux 3.4, it was introduced by:
  [ b84a3a9675 xfs: remove the per-filesystem list of dquots ]

Originally we will release the group dquot pointers because the user
dquots maybe carrying around as a hint via xfs_qm_detach_gdquots().
However, with above change, there is no such work to be done before
purging group/project dquot cache.

In order to solve this problem, this patch introduces a special routine
xfs_qm_dqpurge_hints(), and it would release the group/project dquot
pointers the user dquots maybe carrying around as a hint, and then it
will proceed to purge the user dquot cache if requested.

Cc: stable@vger.kernel.org
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
---
 fs/xfs/xfs_qm.c | 71 ++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 50 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 14a4996..424ef73 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -134,8 +134,6 @@ xfs_qm_dqpurge(
 {
 	struct xfs_mount	*mp = dqp->q_mount;
 	struct xfs_quotainfo	*qi = mp->m_quotainfo;
-	struct xfs_dquot	*gdqp = NULL;
-	struct xfs_dquot	*pdqp = NULL;
 
 	xfs_dqlock(dqp);
 	if ((dqp->dq_flags & XFS_DQ_FREEING) || dqp->q_nrefs != 0) {
@@ -143,21 +141,6 @@ xfs_qm_dqpurge(
 		return EAGAIN;
 	}
 
-	/*
-	 * If this quota has a hint attached, prepare for releasing it now.
-	 */
-	gdqp = dqp->q_gdquot;
-	if (gdqp) {
-		xfs_dqlock(gdqp);
-		dqp->q_gdquot = NULL;
-	}
-
-	pdqp = dqp->q_pdquot;
-	if (pdqp) {
-		xfs_dqlock(pdqp);
-		dqp->q_pdquot = NULL;
-	}
-
 	dqp->dq_flags |= XFS_DQ_FREEING;
 
 	xfs_dqflock(dqp);
@@ -206,11 +189,47 @@ xfs_qm_dqpurge(
 	XFS_STATS_DEC(xs_qm_dquot_unused);
 
 	xfs_qm_dqdestroy(dqp);
+	return 0;
+}
+
+/*
+ * Release the group or project dquot pointers the user dquots maybe carrying
+ * around as a hint, and proceed to purge the user dquot cache if requested.
+*/
+STATIC int
+xfs_qm_dqpurge_hints(
+	struct xfs_dquot	*dqp,
+	void			*data)
+{
+	struct xfs_dquot	*gdqp = NULL;
+	struct xfs_dquot	*pdqp = NULL;
+	uint			flags = *((uint *)data);
 
+	xfs_dqlock(dqp);
+	if (dqp->dq_flags & XFS_DQ_FREEING) {
+		xfs_dqunlock(dqp);
+		return EAGAIN;
+	}
+
+	/* If this quota has a hint attached, prepare for releasing it now */
+	gdqp = dqp->q_gdquot;
 	if (gdqp)
-		xfs_qm_dqput(gdqp);
+		dqp->q_gdquot = NULL;
+
+	pdqp = dqp->q_pdquot;
 	if (pdqp)
-		xfs_qm_dqput(pdqp);
+		dqp->q_pdquot = NULL;
+
+	xfs_dqunlock(dqp);
+
+	if (gdqp)
+		xfs_qm_dqrele(gdqp);
+	if (pdqp)
+		xfs_qm_dqrele(pdqp);
+
+	if (flags & XFS_QMOPT_UQUOTA)
+		return xfs_qm_dqpurge(dqp, NULL);
+
 	return 0;
 }
 
@@ -222,8 +241,18 @@ xfs_qm_dqpurge_all(
 	struct xfs_mount	*mp,
 	uint			flags)
 {
-	if (flags & XFS_QMOPT_UQUOTA)
-		xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL);
+	/*
+	 * We have to release group/project dquot hint(s) from the user dquot
+	 * at first if they are there, otherwise we would run into an infinite
+	 * loop while walking through radix tree to purge other type of dquots
+	 * since their refcount is not zero if the user dquot refers to them
+	 * as hint.
+	 *
+	 * Call the special xfs_qm_dqpurge_hints() will end up go through the
+	 * general xfs_qm_dqpurge() against user dquot cache if requested.
+	 */
+	xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, &flags);
+
 	if (flags & XFS_QMOPT_GQUOTA)
 		xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL);
 	if (flags & XFS_QMOPT_PQUOTA)
-- 
1.8.3.2

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot
  2013-11-26 13:38 [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot Jeff Liu
@ 2013-11-28 10:43 ` Christoph Hellwig
  2013-11-29  9:36   ` Jeff Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2013-11-28 10:43 UTC (permalink / raw)
  To: Jeff Liu; +Cc: xfs

On Tue, Nov 26, 2013 at 09:38:49PM +0800, Jeff Liu wrote:
> +	if (flags & XFS_QMOPT_UQUOTA)
> +		return xfs_qm_dqpurge(dqp, NULL);

To me it doesn't make any sense to overload this function for the user
quotas that don't have hints.

I'd suggest dropping this hunk and keeping a separate walk for
releasing the uquots. 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot
  2013-11-28 10:43 ` Christoph Hellwig
@ 2013-11-29  9:36   ` Jeff Liu
  2013-12-06 21:01     ` Ben Myers
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Liu @ 2013-11-29  9:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On 11/28 2013 18:43 PM, Christoph Hellwig wrote:
> On Tue, Nov 26, 2013 at 09:38:49PM +0800, Jeff Liu wrote:
>> +	if (flags & XFS_QMOPT_UQUOTA)
>> +		return xfs_qm_dqpurge(dqp, NULL);
> 
> To me it doesn't make any sense to overload this function for the user
> quotas that don't have hints.
To me it would like a silly compromise.
> 
> I'd suggest dropping this hunk and keeping a separate walk for
> releasing the uquots.
I thought this over and yup, that is an overload if neither group nor project
are enabled, or we don't want to turn user quota off.

But even so, we currently also have overloads by checking group/project hints
before releasing any type of quota in xfs_qm_purge().  In this point, this fix
can reduce a bit overloads by moving those checkups to xfs_qm_purge_hints() if
we want to turn group/project quotas off.

If we considering to drop above hunk to release user quota separately, we finally
would have to walk through user quota to remove those hints again, i.e,

/* Remove group/project hints from user dquot */
STATIC int
xfs_qm_dqpurge_hints(
	struct xfs_dquot        *dqp,
	void                    *data)
{
	uint                    flags = *((uint *)data);
	struct xfs_dquot        *gdqp;
	struct xfs_dquot        *pdqp;

	xfs_dqlock(dqp);
	if (dqp->dq_flags & XFS_DQ_FREEING) {
		xfs_dqunlock(dqp);
		return EAGAIN;
	}

	/* If this quota has a hint attached, prepare for releasing it now */
        gdqp = dqp->q_gdquot;
	if (gdqp)
		dqp->q_gdquot = NULL;

	pdqp = dqp->q_pdquot;
	if (pdqp)
		dqp->q_pdquot = NULL;

	xfs_dqunlock(dqp);

	if (gdqp)
		xfs_qm_dqrele(gdqp);
	if (pdqp)
		xfs_qm_dqrele(pdqp);

	return 0;
}

void
xfs_qm_dqpurge_all()
{
	xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL);

	if (flags & XFS_QMOPT_UQUOTA)
		xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL);
	if (flags & XFS_QMOPT_GQUOTA)
		xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL);
	if (flags & XFS_QMOPT_PQUOTA)
		xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL);
}

Above code is what I can figured out as per your suggestions for now, but it
would introduce overheads for walking through user dquots to release hints
separately if we want to turn user quota off.

Any thoughts?

Thanks,
-Jeff

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot
  2013-11-29  9:36   ` Jeff Liu
@ 2013-12-06 21:01     ` Ben Myers
  2013-12-07  5:51       ` Jeff Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Myers @ 2013-12-06 21:01 UTC (permalink / raw)
  To: Jeff Liu, Christoph Hellwig; +Cc: xfs

Hey Jeff,

On Fri, Nov 29, 2013 at 05:36:01PM +0800, Jeff Liu wrote:
> On 11/28 2013 18:43 PM, Christoph Hellwig wrote:
> > On Tue, Nov 26, 2013 at 09:38:49PM +0800, Jeff Liu wrote:
> >> +	if (flags & XFS_QMOPT_UQUOTA)
> >> +		return xfs_qm_dqpurge(dqp, NULL);
> > 
> > To me it doesn't make any sense to overload this function for the user
> > quotas that don't have hints.
> To me it would like a silly compromise.
> > 
> > I'd suggest dropping this hunk and keeping a separate walk for
> > releasing the uquots.
> I thought this over and yup, that is an overload if neither group nor project
> are enabled, or we don't want to turn user quota off.
> 
> But even so, we currently also have overloads by checking group/project hints
> before releasing any type of quota in xfs_qm_purge().  In this point, this fix
> can reduce a bit overloads by moving those checkups to xfs_qm_purge_hints() if
> we want to turn group/project quotas off.
> 
> If we considering to drop above hunk to release user quota separately, we finally
> would have to walk through user quota to remove those hints again, i.e,
> 
> /* Remove group/project hints from user dquot */
> STATIC int
> xfs_qm_dqpurge_hints(
> 	struct xfs_dquot        *dqp,
> 	void                    *data)
> {
> 	uint                    flags = *((uint *)data);
> 	struct xfs_dquot        *gdqp;
> 	struct xfs_dquot        *pdqp;
> 
> 	xfs_dqlock(dqp);
> 	if (dqp->dq_flags & XFS_DQ_FREEING) {
> 		xfs_dqunlock(dqp);
> 		return EAGAIN;
> 	}
> 
> 	/* If this quota has a hint attached, prepare for releasing it now */
>         gdqp = dqp->q_gdquot;
> 	if (gdqp)
> 		dqp->q_gdquot = NULL;
> 
> 	pdqp = dqp->q_pdquot;
> 	if (pdqp)
> 		dqp->q_pdquot = NULL;
> 
> 	xfs_dqunlock(dqp);
> 
> 	if (gdqp)
> 		xfs_qm_dqrele(gdqp);
> 	if (pdqp)
> 		xfs_qm_dqrele(pdqp);
> 
> 	return 0;
> }
> 
> void
> xfs_qm_dqpurge_all()
> {
> 	xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL);
> 
> 	if (flags & XFS_QMOPT_UQUOTA)
> 		xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL);
> 	if (flags & XFS_QMOPT_GQUOTA)
> 		xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL);
> 	if (flags & XFS_QMOPT_PQUOTA)
> 		xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL);
> }
> 
> Above code is what I can figured out as per your suggestions for now, but it
> would introduce overheads for walking through user dquots to release hints
> separately if we want to turn user quota off.
> 
> Any thoughts?

I was gonna pull in the single walk version, but now I realize that it is still
under discussion.  I'm happy with either implementation, with maybe a slight
preference for a single user quota walk.  Can you and Christoph come to an
agreement?

Thanks,
	Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot
  2013-12-06 21:01     ` Ben Myers
@ 2013-12-07  5:51       ` Jeff Liu
  2013-12-09  1:26         ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Liu @ 2013-12-07  5:51 UTC (permalink / raw)
  To: Ben Myers, Christoph Hellwig; +Cc: xfs

Hi Ben,

On 12/07 2013 05:01 AM, Ben Myers wrote:
> Hey Jeff,
> 
> On Fri, Nov 29, 2013 at 05:36:01PM +0800, Jeff Liu wrote:
>> On 11/28 2013 18:43 PM, Christoph Hellwig wrote:
>>> On Tue, Nov 26, 2013 at 09:38:49PM +0800, Jeff Liu wrote:
>>>> +	if (flags & XFS_QMOPT_UQUOTA)
>>>> +		return xfs_qm_dqpurge(dqp, NULL);
>>>
>>> To me it doesn't make any sense to overload this function for the user
>>> quotas that don't have hints.
>> To me it would like a silly compromise.
>>>
>>> I'd suggest dropping this hunk and keeping a separate walk for
>>> releasing the uquots.
>> I thought this over and yup, that is an overload if neither group nor project
>> are enabled, or we don't want to turn user quota off.
>>
>> But even so, we currently also have overloads by checking group/project hints
>> before releasing any type of quota in xfs_qm_purge().  In this point, this fix
>> can reduce a bit overloads by moving those checkups to xfs_qm_purge_hints() if
>> we want to turn group/project quotas off.
>>
>> If we considering to drop above hunk to release user quota separately, we finally
>> would have to walk through user quota to remove those hints again, i.e,
>>
>> /* Remove group/project hints from user dquot */
>> STATIC int
>> xfs_qm_dqpurge_hints(
>> 	struct xfs_dquot        *dqp,
>> 	void                    *data)
>> {
>> 	uint                    flags = *((uint *)data);
>> 	struct xfs_dquot        *gdqp;
>> 	struct xfs_dquot        *pdqp;
>>
>> 	xfs_dqlock(dqp);
>> 	if (dqp->dq_flags & XFS_DQ_FREEING) {
>> 		xfs_dqunlock(dqp);
>> 		return EAGAIN;
>> 	}
>>
>> 	/* If this quota has a hint attached, prepare for releasing it now */
>>         gdqp = dqp->q_gdquot;
>> 	if (gdqp)
>> 		dqp->q_gdquot = NULL;
>>
>> 	pdqp = dqp->q_pdquot;
>> 	if (pdqp)
>> 		dqp->q_pdquot = NULL;
>>
>> 	xfs_dqunlock(dqp);
>>
>> 	if (gdqp)
>> 		xfs_qm_dqrele(gdqp);
>> 	if (pdqp)
>> 		xfs_qm_dqrele(pdqp);
>>
>> 	return 0;
>> }
>>
>> void
>> xfs_qm_dqpurge_all()
>> {
>> 	xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL);
>>
>> 	if (flags & XFS_QMOPT_UQUOTA)
>> 		xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL);
>> 	if (flags & XFS_QMOPT_GQUOTA)
>> 		xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL);
>> 	if (flags & XFS_QMOPT_PQUOTA)
>> 		xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL);
>> }
>>
>> Above code is what I can figured out as per your suggestions for now, but it
>> would introduce overheads for walking through user dquots to release hints
>> separately if we want to turn user quota off.
>>
>> Any thoughts?
> 
> I was gonna pull in the single walk version, but now I realize that it is still
> under discussion.  I'm happy with either implementation, with maybe a slight
> preference for a single user quota walk.  Can you and Christoph come to an
> agreement?
For now, I can not figure out a more optimized solution.  Well, I just realized
I don't need to initialize both gdqp and pdqp to NULL at xfs_qm_dqpurge_hints()
since they will be evaluated by dqp pointers dereference anyway.  As a minor fix,
the revised version was shown as follows.

Christoph, as I mentioned previously, keeping a separate walk to release the user
dquots would also have overloads in some cases, would you happy to have this fix
although it is not most optimized?

Also, Ben, would you please pull in another fix below? It is independent to the
current fix and it has already been reviewed by Christoph. :)
[PATCH v2 1/3] xfs: fix assertion failure at xfs_setattr_nonsize

Thanks,
-Jeff

From: Jie Liu <jeff.liu@oracle.com>

xfs_quota(8) will hang up if trying to turn group/project quota off
before the user quota is off, this could be 100% reproduced by:
  # mount -ouquota,gquota /dev/sda7 /xfs
  # mkdir /xfs/test
  # xfs_quota -xc 'off -g' /xfs <-- hangs up
  # echo w > /proc/sysrq-trigger
  # dmesg

  SysRq : Show Blocked State
  task                        PC stack   pid father
  xfs_quota       D 0000000000000000     0 27574   2551 0x00000000
  [snip]
  Call Trace:
  [<ffffffff81aaa21d>] schedule+0xad/0xc0
  [<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0
  [<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0
  [<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0
  [<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs]
  [<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30
  [<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs]
  [<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs]
  [<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs]
  [<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs]
  [<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs]
  [<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs]
  [<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs]
  [<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs]
  [<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs]
  [<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0
  [<ffffffff812fba4b>] ? iput+0x5b/0x90
  [<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0
  [<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b

It's fine if we turn user quota off at first, then turn off other
kind of quotas if they are enabled since the group/project dquot
refcount is decreased to zero once the user quota if off. Otherwise,
those dquots refcount is non-zero due to the user dquot might refer
to them as hint(s).  Hence, above operation cause an infinite loop
at xfs_qm_dquot_walk() while trying to purge dquot cache.

This problem has been around since Linux 3.4, it was introduced by:
  [ b84a3a9675 xfs: remove the per-filesystem list of dquots ]

Originally we will release the group dquot pointers because the user
dquots maybe carrying around as a hint via xfs_qm_detach_gdquots().
However, with above change, there is no such work to be done before
purging group/project dquot cache.

In order to solve this problem, this patch introduces a special routine
xfs_qm_dqpurge_hints(), and it would release the group/project dquot
pointers the user dquots maybe carrying around as a hint, and then it
will proceed to purge the user dquot cache if requested.

Cc: stable@vger.kernel.org
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
---
 fs/xfs/xfs_qm.c | 71 ++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 50 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 14a4996..424ef73 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -134,8 +134,6 @@ xfs_qm_dqpurge(
 {
 	struct xfs_mount	*mp = dqp->q_mount;
 	struct xfs_quotainfo	*qi = mp->m_quotainfo;
-	struct xfs_dquot	*gdqp = NULL;
-	struct xfs_dquot	*pdqp = NULL;
 
 	xfs_dqlock(dqp);
 	if ((dqp->dq_flags & XFS_DQ_FREEING) || dqp->q_nrefs != 0) {
@@ -143,21 +141,6 @@ xfs_qm_dqpurge(
 		return EAGAIN;
 	}
 
-	/*
-	 * If this quota has a hint attached, prepare for releasing it now.
-	 */
-	gdqp = dqp->q_gdquot;
-	if (gdqp) {
-		xfs_dqlock(gdqp);
-		dqp->q_gdquot = NULL;
-	}
-
-	pdqp = dqp->q_pdquot;
-	if (pdqp) {
-		xfs_dqlock(pdqp);
-		dqp->q_pdquot = NULL;
-	}
-
 	dqp->dq_flags |= XFS_DQ_FREEING;
 
 	xfs_dqflock(dqp);
@@ -206,11 +189,47 @@ xfs_qm_dqpurge(
 	XFS_STATS_DEC(xs_qm_dquot_unused);
 
 	xfs_qm_dqdestroy(dqp);
+	return 0;
+}
+
+/*
+ * Release the group or project dquot pointers the user dquots maybe carrying
+ * around as a hint, and proceed to purge the user dquot cache if requested.
+*/
+STATIC int
+xfs_qm_dqpurge_hints(
+	struct xfs_dquot	*dqp,
+	void			*data)
+{
+	uint			flags = *((uint *)data);
+	struct xfs_dquot	*gdqp;
+	struct xfs_dquot	*pdqp;
 
+	xfs_dqlock(dqp);
+	if (dqp->dq_flags & XFS_DQ_FREEING) {
+		xfs_dqunlock(dqp);
+		return EAGAIN;
+	}
+
+	/* If this quota has a hint attached, prepare for releasing it now */
+	gdqp = dqp->q_gdquot;
 	if (gdqp)
-		xfs_qm_dqput(gdqp);
+		dqp->q_gdquot = NULL;
+
+	pdqp = dqp->q_pdquot;
 	if (pdqp)
-		xfs_qm_dqput(pdqp);
+		dqp->q_pdquot = NULL;
+
+	xfs_dqunlock(dqp);
+
+	if (gdqp)
+		xfs_qm_dqrele(gdqp);
+	if (pdqp)
+		xfs_qm_dqrele(pdqp);
+
+	if (flags & XFS_QMOPT_UQUOTA)
+		return xfs_qm_dqpurge(dqp, NULL);
+
 	return 0;
 }
 
@@ -222,8 +241,18 @@ xfs_qm_dqpurge_all(
 	struct xfs_mount	*mp,
 	uint			flags)
 {
-	if (flags & XFS_QMOPT_UQUOTA)
-		xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL);
+	/*
+	 * We have to release group/project dquot hint(s) from the user dquot
+	 * at first if they are there, otherwise we would run into an infinite
+	 * loop while walking through radix tree to purge other type of dquots
+	 * since their refcount is not zero if the user dquot refers to them
+	 * as hint.
+	 *
+	 * Call the special xfs_qm_dqpurge_hints() will end up go through the
+	 * general xfs_qm_dqpurge() against user dquot cache if requested.
+	 */
+	xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, &flags);
+
 	if (flags & XFS_QMOPT_GQUOTA)
 		xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL);
 	if (flags & XFS_QMOPT_PQUOTA)
-- 1.8.3.2

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot
  2013-12-07  5:51       ` Jeff Liu
@ 2013-12-09  1:26         ` Dave Chinner
  2013-12-09  2:36           ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2013-12-09  1:26 UTC (permalink / raw)
  To: Jeff Liu; +Cc: Christoph Hellwig, Ben Myers, xfs

On Sat, Dec 07, 2013 at 01:51:24PM +0800, Jeff Liu wrote:
> Hi Ben,
> 
....
> >> void
> >> xfs_qm_dqpurge_all()
> >> {
> >> 	xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL);
> >>
> >> 	if (flags & XFS_QMOPT_UQUOTA)
> >> 		xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL);
> >> 	if (flags & XFS_QMOPT_GQUOTA)
> >> 		xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL);
> >> 	if (flags & XFS_QMOPT_PQUOTA)
> >> 		xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL);
> >> }
> >>
> >> Above code is what I can figured out as per your suggestions for now, but it
> >> would introduce overheads for walking through user dquots to release hints
> >> separately if we want to turn user quota off.
> >>
> >> Any thoughts?
> > 
> > I was gonna pull in the single walk version, but now I realize that it is still
> > under discussion.  I'm happy with either implementation, with maybe a slight
> > preference for a single user quota walk.  Can you and Christoph come to an
> > agreement?
> For now, I can not figure out a more optimized solution.  Well, I just realized
> I don't need to initialize both gdqp and pdqp to NULL at xfs_qm_dqpurge_hints()
> since they will be evaluated by dqp pointers dereference anyway.  As a minor fix,
> the revised version was shown as follows.
> 
> Christoph, as I mentioned previously, keeping a separate walk to release the user
> dquots would also have overloads in some cases, would you happy to have this fix
> although it is not most optimized?

I'm happy either way it is done - I'd prefer we fix the problem than
bikeshed over an extra radix tree walk or not given for most people
the overhead won't be significant.

> From: Jie Liu <jeff.liu@oracle.com>
> 
> xfs_quota(8) will hang up if trying to turn group/project quota off
> before the user quota is off, this could be 100% reproduced by:
.....

So from the perspective, I'm happy to consider the updated
patch as:

Reviewed-by: Dave Chinner <dchinner@redhat.com>

However, I question the need for the hints at all now. The hints
were necessary back when the quota manager had global lists and
hashes, and the lookups were expensive. Hence there was a
significant win to caching the group dquot on the user dquot as it
avoided a significant amount of code, locks and dirty cachelines.

Now, it's just a radix tree lookup under only a single lock and the
process dirties far fewer cachelines (none in the radix tree at all)
and so should be substantially faster than the old code. And with
the dquots being attached and cached on inodes in the first place, I
don't see much advantage to keeping hints on the user dquot. THis is
especially true for project quotas where a user might be accessing
files in different projects all the time and so thrashing the
project quota hint on the user dquot....

Hence I wonder if removing the dquot hint caching altogether would
result in smaller, simpler, faster code.  And, in reality, if the
radix tree lock is a contention point on lookup after removing the
hints, then we can fix that quite easily by switching to RCU-based
lockless lookups like we do for the inode cache....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot
  2013-12-09  1:26         ` Dave Chinner
@ 2013-12-09  2:36           ` Dave Chinner
  2013-12-09  3:26             ` Jeff Liu
  2013-12-09  7:10             ` Christoph Hellwig
  0 siblings, 2 replies; 12+ messages in thread
From: Dave Chinner @ 2013-12-09  2:36 UTC (permalink / raw)
  To: Jeff Liu; +Cc: Christoph Hellwig, Ben Myers, xfs

On Mon, Dec 09, 2013 at 12:26:42PM +1100, Dave Chinner wrote:
> On Sat, Dec 07, 2013 at 01:51:24PM +0800, Jeff Liu wrote:
> > Hi Ben,
> > 
> ....
> > >> void
> > >> xfs_qm_dqpurge_all()
> > >> {
> > >> 	xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL);
> > >>
> > >> 	if (flags & XFS_QMOPT_UQUOTA)
> > >> 		xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL);
> > >> 	if (flags & XFS_QMOPT_GQUOTA)
> > >> 		xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL);
> > >> 	if (flags & XFS_QMOPT_PQUOTA)
> > >> 		xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL);
> > >> }
> > >>
> > >> Above code is what I can figured out as per your suggestions for now, but it
> > >> would introduce overheads for walking through user dquots to release hints
> > >> separately if we want to turn user quota off.
> > >>
> > >> Any thoughts?
> > > 
> > > I was gonna pull in the single walk version, but now I realize that it is still
> > > under discussion.  I'm happy with either implementation, with maybe a slight
> > > preference for a single user quota walk.  Can you and Christoph come to an
> > > agreement?
> > For now, I can not figure out a more optimized solution.  Well, I just realized
> > I don't need to initialize both gdqp and pdqp to NULL at xfs_qm_dqpurge_hints()
> > since they will be evaluated by dqp pointers dereference anyway.  As a minor fix,
> > the revised version was shown as follows.
> > 
> > Christoph, as I mentioned previously, keeping a separate walk to release the user
> > dquots would also have overloads in some cases, would you happy to have this fix
> > although it is not most optimized?
> 
> I'm happy either way it is done - I'd prefer we fix the problem than
> bikeshed over an extra radix tree walk or not given for most people
> the overhead won't be significant.
> 
> > From: Jie Liu <jeff.liu@oracle.com>
> > 
> > xfs_quota(8) will hang up if trying to turn group/project quota off
> > before the user quota is off, this could be 100% reproduced by:
> .....
> 
> So from the perspective, I'm happy to consider the updated
> patch as:
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> However, I question the need for the hints at all now. The hints
> were necessary back when the quota manager had global lists and
> hashes, and the lookups were expensive. Hence there was a
> significant win to caching the group dquot on the user dquot as it
> avoided a significant amount of code, locks and dirty cachelines.
> 
> Now, it's just a radix tree lookup under only a single lock and the
> process dirties far fewer cachelines (none in the radix tree at all)
> and so should be substantially faster than the old code. And with
> the dquots being attached and cached on inodes in the first place, I
> don't see much advantage to keeping hints on the user dquot. THis is
> especially true for project quotas where a user might be accessing
> files in different projects all the time and so thrashing the
> project quota hint on the user dquot....
> 
> Hence I wonder if removing the dquot hint caching altogether would
> result in smaller, simpler, faster code.  And, in reality, if the
> radix tree lock is a contention point on lookup after removing the
> hints, then we can fix that quite easily by switching to RCU-based
> lockless lookups like we do for the inode cache....

Actually, scalability couldn't get any worse by removing the hints.
If I run a concurrent workload with quota enabled, the global dquot
locks (be it user, quota or project) completely serialises the
workload. This result if from u/g/p all enabled, run by a single
user in a single group and a project ID of zero:

./fs_mark  -D  10000  -S0  -n  100000  -s  0  -L  32  -d  /mnt/scratch/0  -d  /mnt/scratch/1  -d  /mnt/scratch/2  -d  /mnt/scratch/3  -d  /mnt/scratch/4  -d  /mnt/scratch/5  -d  /mnt/scratch/6  -d  /mnt/scratch/7  -d  /mnt/scratch/8  -d  /mnt/scratch/9  -d  /mnt/scratch/10  -d  /mnt/scratch/11  -d  /mnt/scratch/12  -d  /mnt/scratch/13  -d  /mnt/scratch/14  -d  /mnt/scratch/15
#       Version 3.3, 16 thread(s) starting at Mon Dec  9 12:53:46 2013
#       Sync method: NO SYNC: Test does not issue sync() or fsync() calls.
#       Directories:  Time based hash between directories across 10000 subdirectories with 180 seconds per subdirectory.
#       File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name)
#       Files info: size 0 bytes, written with an IO size of 16384 bytes per write
#       App overhead is time in microseconds spent in the test not doing file writing related system calls.

FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0      17666.5         15377143
     0      3200000            0      17018.6         15922906
     0      4800000            0      17373.5         16149660
     0      6400000            0      16564.9         17234139
....

Without quota enabled, that workload runs at >250,000 files/sec.

Serialisation is completely on the dquot locks - so I don't see
anything right now that hints are going to buy us in terms of
improving concurrency or scalability, so I think we probably can
just get rid of them.

FWIW, getting rid of the hints and converting the dquot reference
counter to an atomic actually improves performance a bit:

FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0      17559.3         15606077
     0      3200000            0      18738.9         14026009
     0      4800000            0      18960.0         14381162
     0      6400000            0      19026.5         14422024
     0      8000000            0      18456.6         15369059

Sure, 10% improvement is 10%, but concurrency still sucks. At least
it narrows down the cause - the transactional modifications are the
serialisation issue.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot
  2013-12-09  2:36           ` Dave Chinner
@ 2013-12-09  3:26             ` Jeff Liu
  2013-12-09  6:09               ` Dave Chinner
  2013-12-09  7:10             ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff Liu @ 2013-12-09  3:26 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, Ben Myers, xfs

On 12/09 2013 10:36 AM, Dave Chinner wrote:
> On Mon, Dec 09, 2013 at 12:26:42PM +1100, Dave Chinner wrote:
>> On Sat, Dec 07, 2013 at 01:51:24PM +0800, Jeff Liu wrote:
>>> Hi Ben,
>>>
>> ....
>>>>> void
>>>>> xfs_qm_dqpurge_all()
>>>>> {
>>>>> 	xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL);
>>>>>
>>>>> 	if (flags & XFS_QMOPT_UQUOTA)
>>>>> 		xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL);
>>>>> 	if (flags & XFS_QMOPT_GQUOTA)
>>>>> 		xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL);
>>>>> 	if (flags & XFS_QMOPT_PQUOTA)
>>>>> 		xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL);
>>>>> }
>>>>>
>>>>> Above code is what I can figured out as per your suggestions for now, but it
>>>>> would introduce overheads for walking through user dquots to release hints
>>>>> separately if we want to turn user quota off.
>>>>>
>>>>> Any thoughts?
>>>>
>>>> I was gonna pull in the single walk version, but now I realize that it is still
>>>> under discussion.  I'm happy with either implementation, with maybe a slight
>>>> preference for a single user quota walk.  Can you and Christoph come to an
>>>> agreement?
>>> For now, I can not figure out a more optimized solution.  Well, I just realized
>>> I don't need to initialize both gdqp and pdqp to NULL at xfs_qm_dqpurge_hints()
>>> since they will be evaluated by dqp pointers dereference anyway.  As a minor fix,
>>> the revised version was shown as follows.
>>>
>>> Christoph, as I mentioned previously, keeping a separate walk to release the user
>>> dquots would also have overloads in some cases, would you happy to have this fix
>>> although it is not most optimized?
>>
>> I'm happy either way it is done - I'd prefer we fix the problem than
>> bikeshed over an extra radix tree walk or not given for most people
>> the overhead won't be significant.
>>
>>> From: Jie Liu <jeff.liu@oracle.com>
>>>
>>> xfs_quota(8) will hang up if trying to turn group/project quota off
>>> before the user quota is off, this could be 100% reproduced by:
>> .....
>>
>> So from the perspective, I'm happy to consider the updated
>> patch as:
>>
>> Reviewed-by: Dave Chinner <dchinner@redhat.com>
>>
>> However, I question the need for the hints at all now. The hints
>> were necessary back when the quota manager had global lists and
>> hashes, and the lookups were expensive. Hence there was a
>> significant win to caching the group dquot on the user dquot as it
>> avoided a significant amount of code, locks and dirty cachelines.
>>
>> Now, it's just a radix tree lookup under only a single lock and the
>> process dirties far fewer cachelines (none in the radix tree at all)
>> and so should be substantially faster than the old code. And with
>> the dquots being attached and cached on inodes in the first place, I
>> don't see much advantage to keeping hints on the user dquot. THis is
>> especially true for project quotas where a user might be accessing
>> files in different projects all the time and so thrashing the
>> project quota hint on the user dquot....
Ah, that accounts for it!  Yesterday, I even thought to add an udquot
member to struct xfs_dquot in order to avoid walk though user quota
while turning off others, i.e,

diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
index d22ed00..0037c7e 100644
--- a/fs/xfs/xfs_dquot.h
+++ b/fs/xfs/xfs_dquot.h
@@ -52,6 +52,13 @@ typedef struct xfs_dquot {
        int              q_bufoffset;   /* off of dq in buffer (# dquots) */
        xfs_fileoff_t    q_fileoffset;  /* offset in quotas file */
 
+       union {
+               struct xfs_dquot *q_udquot;
+               struct {
+                       struct xfs_dquot *q_pdquot;
+                       struct xfs_dquot *q_gdquot;
+               } gp_hints;
+       } hints;
        struct xfs_dquot*q_gdquot;      /* group dquot, hint only */
        struct xfs_dquot*q_pdquot;      /* project dquot, hint only */
        xfs_disk_dquot_t q_core;        /* actual usage & quotas */

In this way, I can attach the q_udquot to group/project dquots while
attaching them to the user's.  Thus I don't need to walk through user
dquots to fetch the hints but to fetch them via:
gdquot->hints.q_udquot.g_pdquot/g_gdquot and then decrease the reference
count, but that need more code changes and add complexities.

>>
>> Hence I wonder if removing the dquot hint caching altogether would
>> result in smaller, simpler, faster code.  And, in reality, if the
>> radix tree lock is a contention point on lookup after removing the
>> hints, then we can fix that quite easily by switching to RCU-based
>> lockless lookups like we do for the inode cache....
> 
> Actually, scalability couldn't get any worse by removing the hints.
> If I run a concurrent workload with quota enabled, the global dquot
> locks (be it user, quota or project) completely serialises the
> workload. This result if from u/g/p all enabled, run by a single
> user in a single group and a project ID of zero:
> 
> ./fs_mark  -D  10000  -S0  -n  100000  -s  0  -L  32  -d  /mnt/scratch/0  -d  /mnt/scratch/1  -d  /mnt/scratch/2  -d  /mnt/scratch/3  -d  /mnt/scratch/4  -d  /mnt/scratch/5  -d  /mnt/scratch/6  -d  /mnt/scratch/7  -d  /mnt/scratch/8  -d  /mnt/scratch/9  -d  /mnt/scratch/10  -d  /mnt/scratch/11  -d  /mnt/scratch/12  -d  /mnt/scratch/13  -d  /mnt/scratch/14  -d  /mnt/scratch/15
> #       Version 3.3, 16 thread(s) starting at Mon Dec  9 12:53:46 2013
> #       Sync method: NO SYNC: Test does not issue sync() or fsync() calls.
> #       Directories:  Time based hash between directories across 10000 subdirectories with 180 seconds per subdirectory.
> #       File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name)
> #       Files info: size 0 bytes, written with an IO size of 16384 bytes per write
> #       App overhead is time in microseconds spent in the test not doing file writing related system calls.
> 
> FSUse%        Count         Size    Files/sec     App Overhead
>      0      1600000            0      17666.5         15377143
>      0      3200000            0      17018.6         15922906
>      0      4800000            0      17373.5         16149660
>      0      6400000            0      16564.9         17234139
> ....
> 
> Without quota enabled, that workload runs at >250,000 files/sec.
> 
> Serialisation is completely on the dquot locks - so I don't see
> anything right now that hints are going to buy us in terms of
> improving concurrency or scalability, so I think we probably can
> just get rid of them.
> 
> FWIW, getting rid of the hints and converting the dquot reference
> counter to an atomic actually improves performance a bit:
> 
> FSUse%        Count         Size    Files/sec     App Overhead
>      0      1600000            0      17559.3         15606077
>      0      3200000            0      18738.9         14026009
>      0      4800000            0      18960.0         14381162
>      0      6400000            0      19026.5         14422024
>      0      8000000            0      18456.6         15369059
> 
> Sure, 10% improvement is 10%, but concurrency still sucks. At least
> it narrows down the cause - the transactional modifications are the
> serialisation issue.
Admire!! I'm still in considering of remove the hints but you have already
shown the measuring results. :)
Would you like to fix it in this way directly or make it as a increased
improvement once my current fix got merged? Both fine to me.

Thanks,
-Jeff


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot
  2013-12-09  3:26             ` Jeff Liu
@ 2013-12-09  6:09               ` Dave Chinner
  2013-12-09  6:36                 ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2013-12-09  6:09 UTC (permalink / raw)
  To: Jeff Liu; +Cc: Christoph Hellwig, Ben Myers, xfs

On Mon, Dec 09, 2013 at 11:26:14AM +0800, Jeff Liu wrote:
> On 12/09 2013 10:36 AM, Dave Chinner wrote:
> > On Mon, Dec 09, 2013 at 12:26:42PM +1100, Dave Chinner wrote:
> >> On Sat, Dec 07, 2013 at 01:51:24PM +0800, Jeff Liu wrote:
> >>> Hi Ben,
> >>>
> >> ....
> >>>>> void
> >>>>> xfs_qm_dqpurge_all()
> >>>>> {
> >>>>> 	xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL);
> >>>>>
> >>>>> 	if (flags & XFS_QMOPT_UQUOTA)
> >>>>> 		xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL);
> >>>>> 	if (flags & XFS_QMOPT_GQUOTA)
> >>>>> 		xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL);
> >>>>> 	if (flags & XFS_QMOPT_PQUOTA)
> >>>>> 		xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL);
> >>>>> }
> >>>>>
> >>>>> Above code is what I can figured out as per your suggestions for now, but it
> >>>>> would introduce overheads for walking through user dquots to release hints
> >>>>> separately if we want to turn user quota off.
> >>>>>
> >>>>> Any thoughts?
> >>>>
> >>>> I was gonna pull in the single walk version, but now I realize that it is still
> >>>> under discussion.  I'm happy with either implementation, with maybe a slight
> >>>> preference for a single user quota walk.  Can you and Christoph come to an
> >>>> agreement?
> >>> For now, I can not figure out a more optimized solution.  Well, I just realized
> >>> I don't need to initialize both gdqp and pdqp to NULL at xfs_qm_dqpurge_hints()
> >>> since they will be evaluated by dqp pointers dereference anyway.  As a minor fix,
> >>> the revised version was shown as follows.
> >>>
> >>> Christoph, as I mentioned previously, keeping a separate walk to release the user
> >>> dquots would also have overloads in some cases, would you happy to have this fix
> >>> although it is not most optimized?
> >>
> >> I'm happy either way it is done - I'd prefer we fix the problem than
> >> bikeshed over an extra radix tree walk or not given for most people
> >> the overhead won't be significant.
> >>
> >>> From: Jie Liu <jeff.liu@oracle.com>
> >>>
> >>> xfs_quota(8) will hang up if trying to turn group/project quota off
> >>> before the user quota is off, this could be 100% reproduced by:
> >> .....
> >>
> >> So from the perspective, I'm happy to consider the updated
> >> patch as:
> >>
> >> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> >>
> >> However, I question the need for the hints at all now. The hints
> >> were necessary back when the quota manager had global lists and
> >> hashes, and the lookups were expensive. Hence there was a
> >> significant win to caching the group dquot on the user dquot as it
> >> avoided a significant amount of code, locks and dirty cachelines.
> >>
> >> Now, it's just a radix tree lookup under only a single lock and the
> >> process dirties far fewer cachelines (none in the radix tree at all)
> >> and so should be substantially faster than the old code. And with
> >> the dquots being attached and cached on inodes in the first place, I
> >> don't see much advantage to keeping hints on the user dquot. THis is
> >> especially true for project quotas where a user might be accessing
> >> files in different projects all the time and so thrashing the
> >> project quota hint on the user dquot....
> Ah, that accounts for it!  Yesterday, I even thought to add an udquot
> member to struct xfs_dquot in order to avoid walk though user quota
> while turning off others, i.e,
> 
> diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> index d22ed00..0037c7e 100644
> --- a/fs/xfs/xfs_dquot.h
> +++ b/fs/xfs/xfs_dquot.h
> @@ -52,6 +52,13 @@ typedef struct xfs_dquot {
>         int              q_bufoffset;   /* off of dq in buffer (# dquots) */
>         xfs_fileoff_t    q_fileoffset;  /* offset in quotas file */
>  
> +       union {
> +               struct xfs_dquot *q_udquot;
> +               struct {
> +                       struct xfs_dquot *q_pdquot;
> +                       struct xfs_dquot *q_gdquot;
> +               } gp_hints;
> +       } hints;
>         struct xfs_dquot*q_gdquot;      /* group dquot, hint only */
>         struct xfs_dquot*q_pdquot;      /* project dquot, hint only */
>         xfs_disk_dquot_t q_core;        /* actual usage & quotas */
> 
> In this way, I can attach the q_udquot to group/project dquots while
> attaching them to the user's.  Thus I don't need to walk through user
> dquots to fetch the hints but to fetch them via:
> gdquot->hints.q_udquot.g_pdquot/g_gdquot and then decrease the reference
> count, but that need more code changes and add complexities.

Yeah, more complexity, but I can see why you might take that path ;)

> >> Hence I wonder if removing the dquot hint caching altogether would
> >> result in smaller, simpler, faster code.  And, in reality, if the
> >> radix tree lock is a contention point on lookup after removing the
> >> hints, then we can fix that quite easily by switching to RCU-based
> >> lockless lookups like we do for the inode cache....
> > 
> > Actually, scalability couldn't get any worse by removing the hints.
> > If I run a concurrent workload with quota enabled, the global dquot
> > locks (be it user, quota or project) completely serialises the
> > workload. This result if from u/g/p all enabled, run by a single
> > user in a single group and a project ID of zero:
.....
> > 
> > FSUse%        Count         Size    Files/sec     App Overhead
> >      0      1600000            0      17666.5         15377143
> >      0      3200000            0      17018.6         15922906
> >      0      4800000            0      17373.5         16149660
> >      0      6400000            0      16564.9         17234139
> > ....
> > 
> > Without quota enabled, that workload runs at >250,000 files/sec.
> > 
> > Serialisation is completely on the dquot locks - so I don't see
> > anything right now that hints are going to buy us in terms of
> > improving concurrency or scalability, so I think we probably can
> > just get rid of them.
> > 
> > FWIW, getting rid of the hints and converting the dquot reference
> > counter to an atomic actually improves performance a bit:
> > 
> > FSUse%        Count         Size    Files/sec     App Overhead
> >      0      1600000            0      17559.3         15606077
> >      0      3200000            0      18738.9         14026009
> >      0      4800000            0      18960.0         14381162
> >      0      6400000            0      19026.5         14422024
> >      0      8000000            0      18456.6         15369059
> > 
> > Sure, 10% improvement is 10%, but concurrency still sucks. At least
> > it narrows down the cause - the transactional modifications are the
> > serialisation issue.
> Admire!! I'm still in considering of remove the hints but you have already
> shown the measuring results. :)

Oh, removing 200 lines of code is easy ;)

It's making the dquot transaction code lockless that's hard...

> Would you like to fix it in this way directly or make it as a increased
> improvement once my current fix got merged? Both fine to me.

Well, lets see what other's think about remove the hints altogether.
If there is agreement on doing that, then we may as well just do
that straight away.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot
  2013-12-09  6:09               ` Dave Chinner
@ 2013-12-09  6:36                 ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2013-12-09  6:36 UTC (permalink / raw)
  To: Jeff Liu; +Cc: Christoph Hellwig, Ben Myers, xfs

On Mon, Dec 09, 2013 at 05:09:45PM +1100, Dave Chinner wrote:
> On Mon, Dec 09, 2013 at 11:26:14AM +0800, Jeff Liu wrote:
> > On 12/09 2013 10:36 AM, Dave Chinner wrote:
> > > On Mon, Dec 09, 2013 at 12:26:42PM +1100, Dave Chinner wrote:
> > >> On Sat, Dec 07, 2013 at 01:51:24PM +0800, Jeff Liu wrote:
> > >>> Hi Ben,
> > >>>
> > >> ....
> > >>>>> void
> > >>>>> xfs_qm_dqpurge_all()
> > >>>>> {
> > >>>>> 	xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL);
> > >>>>>
> > >>>>> 	if (flags & XFS_QMOPT_UQUOTA)
> > >>>>> 		xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL);
> > >>>>> 	if (flags & XFS_QMOPT_GQUOTA)
> > >>>>> 		xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL);
> > >>>>> 	if (flags & XFS_QMOPT_PQUOTA)
> > >>>>> 		xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL);
> > >>>>> }
> > >>>>>
> > >>>>> Above code is what I can figured out as per your suggestions for now, but it
> > >>>>> would introduce overheads for walking through user dquots to release hints
> > >>>>> separately if we want to turn user quota off.
> > >>>>>
> > >>>>> Any thoughts?
> > >>>>
> > >>>> I was gonna pull in the single walk version, but now I realize that it is still
> > >>>> under discussion.  I'm happy with either implementation, with maybe a slight
> > >>>> preference for a single user quota walk.  Can you and Christoph come to an
> > >>>> agreement?
> > >>> For now, I can not figure out a more optimized solution.  Well, I just realized
> > >>> I don't need to initialize both gdqp and pdqp to NULL at xfs_qm_dqpurge_hints()
> > >>> since they will be evaluated by dqp pointers dereference anyway.  As a minor fix,
> > >>> the revised version was shown as follows.
> > >>>
> > >>> Christoph, as I mentioned previously, keeping a separate walk to release the user
> > >>> dquots would also have overloads in some cases, would you happy to have this fix
> > >>> although it is not most optimized?
> > >>
> > >> I'm happy either way it is done - I'd prefer we fix the problem than
> > >> bikeshed over an extra radix tree walk or not given for most people
> > >> the overhead won't be significant.
> > >>
> > >>> From: Jie Liu <jeff.liu@oracle.com>
> > >>>
> > >>> xfs_quota(8) will hang up if trying to turn group/project quota off
> > >>> before the user quota is off, this could be 100% reproduced by:
> > >> .....
> > >>
> > >> So from the perspective, I'm happy to consider the updated
> > >> patch as:
> > >>
> > >> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > >>
> > >> However, I question the need for the hints at all now. The hints
> > >> were necessary back when the quota manager had global lists and
> > >> hashes, and the lookups were expensive. Hence there was a
> > >> significant win to caching the group dquot on the user dquot as it
> > >> avoided a significant amount of code, locks and dirty cachelines.
> > >>
> > >> Now, it's just a radix tree lookup under only a single lock and the
> > >> process dirties far fewer cachelines (none in the radix tree at all)
> > >> and so should be substantially faster than the old code. And with
> > >> the dquots being attached and cached on inodes in the first place, I
> > >> don't see much advantage to keeping hints on the user dquot. THis is
> > >> especially true for project quotas where a user might be accessing
> > >> files in different projects all the time and so thrashing the
> > >> project quota hint on the user dquot....
> > Ah, that accounts for it!  Yesterday, I even thought to add an udquot
> > member to struct xfs_dquot in order to avoid walk though user quota
> > while turning off others, i.e,
> > 
> > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> > index d22ed00..0037c7e 100644
> > --- a/fs/xfs/xfs_dquot.h
> > +++ b/fs/xfs/xfs_dquot.h
> > @@ -52,6 +52,13 @@ typedef struct xfs_dquot {
> >         int              q_bufoffset;   /* off of dq in buffer (# dquots) */
> >         xfs_fileoff_t    q_fileoffset;  /* offset in quotas file */
> >  
> > +       union {
> > +               struct xfs_dquot *q_udquot;
> > +               struct {
> > +                       struct xfs_dquot *q_pdquot;
> > +                       struct xfs_dquot *q_gdquot;
> > +               } gp_hints;
> > +       } hints;
> >         struct xfs_dquot*q_gdquot;      /* group dquot, hint only */
> >         struct xfs_dquot*q_pdquot;      /* project dquot, hint only */
> >         xfs_disk_dquot_t q_core;        /* actual usage & quotas */
> > 
> > In this way, I can attach the q_udquot to group/project dquots while
> > attaching them to the user's.  Thus I don't need to walk through user
> > dquots to fetch the hints but to fetch them via:
> > gdquot->hints.q_udquot.g_pdquot/g_gdquot and then decrease the reference
> > count, but that need more code changes and add complexities.
> 
> Yeah, more complexity, but I can see why you might take that path ;)
> 
> > >> Hence I wonder if removing the dquot hint caching altogether would
> > >> result in smaller, simpler, faster code.  And, in reality, if the
> > >> radix tree lock is a contention point on lookup after removing the
> > >> hints, then we can fix that quite easily by switching to RCU-based
> > >> lockless lookups like we do for the inode cache....
> > > 
> > > Actually, scalability couldn't get any worse by removing the hints.
> > > If I run a concurrent workload with quota enabled, the global dquot
> > > locks (be it user, quota or project) completely serialises the
> > > workload. This result if from u/g/p all enabled, run by a single
> > > user in a single group and a project ID of zero:
> .....
> > > 
> > > FSUse%        Count         Size    Files/sec     App Overhead
> > >      0      1600000            0      17666.5         15377143
> > >      0      3200000            0      17018.6         15922906
> > >      0      4800000            0      17373.5         16149660
> > >      0      6400000            0      16564.9         17234139
> > > ....
> > > 
> > > Without quota enabled, that workload runs at >250,000 files/sec.
> > > 
> > > Serialisation is completely on the dquot locks - so I don't see
> > > anything right now that hints are going to buy us in terms of
> > > improving concurrency or scalability, so I think we probably can
> > > just get rid of them.
> > > 
> > > FWIW, getting rid of the hints and converting the dquot reference
> > > counter to an atomic actually improves performance a bit:
> > > 
> > > FSUse%        Count         Size    Files/sec     App Overhead
> > >      0      1600000            0      17559.3         15606077
> > >      0      3200000            0      18738.9         14026009
> > >      0      4800000            0      18960.0         14381162
> > >      0      6400000            0      19026.5         14422024
> > >      0      8000000            0      18456.6         15369059
> > > 
> > > Sure, 10% improvement is 10%, but concurrency still sucks. At least
> > > it narrows down the cause - the transactional modifications are the
> > > serialisation issue.
> > Admire!! I'm still in considering of remove the hints but you have already
> > shown the measuring results. :)
> 
> Oh, removing 200 lines of code is easy ;)
> 
> It's making the dquot transaction code lockless that's hard...

Well, the easy bit is prototyped - xfs_trans_dqresv() uses cmpxchg
loops now:

FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0      24487.3         13020824
     0      3200000            0      24399.3         13333231
     0      4800000            0      24268.4         14574793
     0      6400000            0      24026.3         14918698

And all the lock contention is locking the dquots during
xfs_trans_commit():

-   3.44%  [kernel]  [k] __mutex_lock_slowpath
   - __mutex_lock_slowpath
      - 99.56% mutex_lock
         - 83.30% xfs_trans_dqlockedjoin
              xfs_trans_apply_dquot_deltas
              xfs_trans_commit
              xfs_create
              xfs_vn_mknod
              xfs_vn_create
              vfs_create
....

I have a couple of interesting thoughts on how to deal with this;
it involves building on top of Christoph's log formatting patch
series and pushing the dquot modifications all the way into the
->iop_format method where they can be applied locklessly to the
dquot....

If I can make that work, then I need to step back and have a deep
think about how we can apply the technique generically at an object
level, because if we can do this then lockless, concurrent
transactions are possible for various types of objects. I can see
much joy coming from not having to lock the inode to update the file
size transactionally during IO completion....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot
  2013-12-09  2:36           ` Dave Chinner
  2013-12-09  3:26             ` Jeff Liu
@ 2013-12-09  7:10             ` Christoph Hellwig
  2013-12-09 23:07               ` Ben Myers
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2013-12-09  7:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, Jeff Liu, Ben Myers, xfs

The hint removal gets my vote.  I actually had it written up the last
time I touch the quota code, but for some reasons I stopped before
sumitting it.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot
  2013-12-09  7:10             ` Christoph Hellwig
@ 2013-12-09 23:07               ` Ben Myers
  0 siblings, 0 replies; 12+ messages in thread
From: Ben Myers @ 2013-12-09 23:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jeff Liu, xfs

On Sun, Dec 08, 2013 at 11:10:03PM -0800, Christoph Hellwig wrote:
> The hint removal gets my vote.  I actually had it written up the last
> time I touch the quota code, but for some reasons I stopped before
> sumitting it.

Sounds good to me too.

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-12-09 23:07 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-26 13:38 [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot Jeff Liu
2013-11-28 10:43 ` Christoph Hellwig
2013-11-29  9:36   ` Jeff Liu
2013-12-06 21:01     ` Ben Myers
2013-12-07  5:51       ` Jeff Liu
2013-12-09  1:26         ` Dave Chinner
2013-12-09  2:36           ` Dave Chinner
2013-12-09  3:26             ` Jeff Liu
2013-12-09  6:09               ` Dave Chinner
2013-12-09  6:36                 ` Dave Chinner
2013-12-09  7:10             ` Christoph Hellwig
2013-12-09 23:07               ` Ben Myers

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.