* [PATCH 0/3] xfs: dquot modification scalability
@ 2013-12-12  9:40 Dave Chinner
  2013-12-12  9:40 ` [PATCH 1/3] xfs: remove dquot hints Dave Chinner
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Dave Chinner @ 2013-12-12  9:40 UTC (permalink / raw)
  To: xfs

Hi folks,

I recently made the mistake of enabling quotas on one of my regular
scalability tests - concurrent file creates - and discovered that
the quota modification serialised the entire workload. Not good.

Only two of these patches are really scalability patches - the first
patch in the series is a cleanup that gets rid of dquot hints.

The first scalability change is to not require the dquot lock when
taking references to the dquot. This is done simply by converting
the reference count to an atomic and replacing all operations
with equivalent atomic variable operations. This means that we can
remove the dquot lock from xfs_qm_dqhold(). Further optimisations
can be done on the release of references, but that is not done in
this patch or in this patch set.
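
Outside the patch itself, the hold-side change can be sketched with C11
atomics; the struct and function names below are hypothetical stand-ins
for the XFS ones, not the kernel code:

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy stand-in for struct xfs_dquot; only the refcount matters here. */
struct dquot {
	atomic_int	nrefs;	/* was a plain uint guarded by the dquot lock */
};

/* Analogue of xfs_qm_dqhold(): no lock needed to take a reference. */
static struct dquot *dqhold(struct dquot *dqp)
{
	atomic_fetch_add(&dqp->nrefs, 1);
	return dqp;
}

/* Release side: returns 1 when the last reference was dropped. */
static int dqput(struct dquot *dqp)
{
	/* fetch_sub returns the old value, so old == 1 means we were last */
	return atomic_fetch_sub(&dqp->nrefs, 1) == 1;
}
```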

Getting rid of the dquot lock from the hold code moves the
contention point to the transaction subsystem - xfs_trans_dqresv and
the transaction commit code. The second scalability change is to
make xfs_trans_dqresv() lockless by using cmpxchg rather than the
dquot lock for updating the reservations. We don't really need to
hold the dquot lock to check the quota limits as the limits almost
never change - it's really only the reservation that we care about
here, and if that changes between the check and the cmpxchg, then
we'll go around the loop and check the limits again with the newly
sampled reservation...
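
In outline (again just an illustrative sketch with made-up names, not the
patch code - the real version also handles soft limits, timers and
warnings), the lockless reservation loop looks like:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sample the current reservation, check the (rarely-changing) hard limit
 * against the new total, then try to install the total with cmpxchg.  If
 * another CPU changed the reservation in the meantime, the cmpxchg fails
 * and we loop, re-checking the limit against the freshly sampled value.
 */
static bool dqresv(_Atomic uint64_t *resvp, uint64_t nblks, uint64_t hardlimit)
{
	uint64_t old, total;

	do {
		old = atomic_load(resvp);
		total = old + nblks;
		if (hardlimit && total > hardlimit)
			return false;	/* would exceed quota: fail with EDQUOT */
	} while (!atomic_compare_exchange_weak(resvp, &old, total));

	return true;
}
```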

Overall, these patches improve workload performance from around
16,500 creates/s to about 24,000 creates/s. While a 45% improvement is
nothing to complain about, performance without quotas is about
250,000 creates/s. So there's still a lot of ground to make up here.

The patchset moves the contention almost entirely to the transaction
commit code, along with the xfs_qm_dqrele calls in xfs_create (about
15% of the overall lock contention). Fixing the transaction commit
code is a major piece of work and where the order of magnitude
improvement will come from, but I haven't quite figured it all
out yet. The dqrele code is simpler, so I'll probably have a patch
soon for that - it'll give another 10% improvement on what we have
now...

Cheers,

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] xfs: remove dquot hints
  2013-12-12  9:40 [PATCH 0/3] xfs: dquot modification scalability Dave Chinner
@ 2013-12-12  9:40 ` Dave Chinner
  2013-12-12 18:33   ` Christoph Hellwig
  2013-12-12  9:40 ` [PATCH 2/3] xfs: dquot refcounting by atomics Dave Chinner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2013-12-12  9:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Group and project quota hints are currently stored on the user
dquot. If we are attaching quotas to the inode, then the group and
project dquots are stored as hints on the user dquot to save having
to look them up again later.

The thing is, the hints are not used for that inode for the rest of
the life of the inode - the dquots are attached directly to the
inode itself - so the only time the hints are used is when an inode
first has dquots attached.

When the hints on the user dquot don't match the dquots being
attached to the inode, they are then removed and replaced with the
new hints. If a user is concurrently modifying files in different
group and/or project contexts, then this leads to thrashing of the
hints attached to user dquot.

If user quotas are not enabled, then hints are never even used.

So, if the hints exist to avoid the cost of the lookup, is that
cost significant enough to justify the hint infrastructure? Maybe it
was once, when there was a global, hash-table-based quota manager
shared between all XFS filesystems.

However, lookups are now much simpler, requiring only a single lock and
radix tree lookup local to the filesystem and no hash or LRU
manipulations to be made. Hence the cost of lookup is much lower
than when hints were implemented. Turns out that benchmarks show
that, too, with there being no difference in performance when doing
file creation workloads as a single user with user, group and
project quotas enabled - the hints do not make the code go any
faster. In fact, removing the hints shows a 2-3% reduction in the
time it takes to create 50 million inodes....

So, let's just get rid of the hints and the complexity around them.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_dquot.c |  53 ++-----------
 fs/xfs/xfs_dquot.h |   2 -
 fs/xfs/xfs_qm.c    | 214 ++++++-----------------------------------------------
 3 files changed, 29 insertions(+), 240 deletions(-)

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 6b1e695..4ce4984 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -831,47 +831,6 @@ restart:
 	return (0);
 }
 
-
-STATIC void
-xfs_qm_dqput_final(
-	struct xfs_dquot	*dqp)
-{
-	struct xfs_quotainfo	*qi = dqp->q_mount->m_quotainfo;
-	struct xfs_dquot	*gdqp;
-	struct xfs_dquot	*pdqp;
-
-	trace_xfs_dqput_free(dqp);
-
-	if (list_lru_add(&qi->qi_lru, &dqp->q_lru))
-		XFS_STATS_INC(xs_qm_dquot_unused);
-
-	/*
-	 * If we just added a udquot to the freelist, then we want to release
-	 * the gdquot/pdquot reference that it (probably) has. Otherwise it'll
-	 * keep the gdquot/pdquot from getting reclaimed.
-	 */
-	gdqp = dqp->q_gdquot;
-	if (gdqp) {
-		xfs_dqlock(gdqp);
-		dqp->q_gdquot = NULL;
-	}
-
-	pdqp = dqp->q_pdquot;
-	if (pdqp) {
-		xfs_dqlock(pdqp);
-		dqp->q_pdquot = NULL;
-	}
-	xfs_dqunlock(dqp);
-
-	/*
-	 * If we had a group/project quota hint, release it now.
-	 */
-	if (gdqp)
-		xfs_qm_dqput(gdqp);
-	if (pdqp)
-		xfs_qm_dqput(pdqp);
-}
-
 /*
  * Release a reference to the dquot (decrement ref-count) and unlock it.
  *
@@ -887,10 +846,14 @@ xfs_qm_dqput(
 
 	trace_xfs_dqput(dqp);
 
-	if (--dqp->q_nrefs > 0)
-		xfs_dqunlock(dqp);
-	else
-		xfs_qm_dqput_final(dqp);
+	if (--dqp->q_nrefs == 0) {
+		struct xfs_quotainfo	*qi = dqp->q_mount->m_quotainfo;
+		trace_xfs_dqput_free(dqp);
+
+		if (list_lru_add(&qi->qi_lru, &dqp->q_lru))
+			XFS_STATS_INC(xs_qm_dquot_unused);
+	}
+	xfs_dqunlock(dqp);
 }
 
 /*
diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
index d22ed00..68a68f7 100644
--- a/fs/xfs/xfs_dquot.h
+++ b/fs/xfs/xfs_dquot.h
@@ -52,8 +52,6 @@ typedef struct xfs_dquot {
 	int		 q_bufoffset;	/* off of dq in buffer (# dquots) */
 	xfs_fileoff_t	 q_fileoffset;	/* offset in quotas file */
 
-	struct xfs_dquot*q_gdquot;	/* group dquot, hint only */
-	struct xfs_dquot*q_pdquot;	/* project dquot, hint only */
 	xfs_disk_dquot_t q_core;	/* actual usage & quotas */
 	xfs_dq_logitem_t q_logitem;	/* dquot log item */
 	xfs_qcnt_t	 q_res_bcount;	/* total regular nblks used+reserved */
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index dd88f0e..d31b88e 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -193,47 +193,6 @@ xfs_qm_dqpurge(
 }
 
 /*
- * Release the group or project dquot pointers the user dquots maybe carrying
- * around as a hint, and proceed to purge the user dquot cache if requested.
-*/
-STATIC int
-xfs_qm_dqpurge_hints(
-	struct xfs_dquot	*dqp,
-	void			*data)
-{
-	struct xfs_dquot	*gdqp = NULL;
-	struct xfs_dquot	*pdqp = NULL;
-	uint			flags = *((uint *)data);
-
-	xfs_dqlock(dqp);
-	if (dqp->dq_flags & XFS_DQ_FREEING) {
-		xfs_dqunlock(dqp);
-		return EAGAIN;
-	}
-
-	/* If this quota has a hint attached, prepare for releasing it now */
-	gdqp = dqp->q_gdquot;
-	if (gdqp)
-		dqp->q_gdquot = NULL;
-
-	pdqp = dqp->q_pdquot;
-	if (pdqp)
-		dqp->q_pdquot = NULL;
-
-	xfs_dqunlock(dqp);
-
-	if (gdqp)
-		xfs_qm_dqrele(gdqp);
-	if (pdqp)
-		xfs_qm_dqrele(pdqp);
-
-	if (flags & XFS_QMOPT_UQUOTA)
-		return xfs_qm_dqpurge(dqp, NULL);
-
-	return 0;
-}
-
-/*
  * Purge the dquot cache.
  */
 void
@@ -241,18 +200,8 @@ xfs_qm_dqpurge_all(
 	struct xfs_mount	*mp,
 	uint			flags)
 {
-	/*
-	 * We have to release group/project dquot hint(s) from the user dquot
-	 * at first if they are there, otherwise we would run into an infinite
-	 * loop while walking through radix tree to purge other type of dquots
-	 * since their refcount is not zero if the user dquot refers to them
-	 * as hint.
-	 *
-	 * Call the special xfs_qm_dqpurge_hints() will end up go through the
-	 * general xfs_qm_dqpurge() against user dquot cache if requested.
-	 */
-	xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, &flags);
-
+	if (flags & XFS_QMOPT_UQUOTA)
+		xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL);
 	if (flags & XFS_QMOPT_GQUOTA)
 		xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL);
 	if (flags & XFS_QMOPT_PQUOTA)
@@ -409,7 +358,6 @@ xfs_qm_dqattach_one(
 	xfs_dqid_t	id,
 	uint		type,
 	uint		doalloc,
-	xfs_dquot_t	*udqhint, /* hint */
 	xfs_dquot_t	**IO_idqpp)
 {
 	xfs_dquot_t	*dqp;
@@ -419,9 +367,9 @@ xfs_qm_dqattach_one(
 	error = 0;
 
 	/*
-	 * See if we already have it in the inode itself. IO_idqpp is
-	 * &i_udquot or &i_gdquot. This made the code look weird, but
-	 * made the logic a lot simpler.
+	 * See if we already have it in the inode itself. IO_idqpp is &i_udquot
+	 * or &i_gdquot. This made the code look weird, but made the logic a lot
+	 * simpler.
 	 */
 	dqp = *IO_idqpp;
 	if (dqp) {
@@ -430,49 +378,10 @@ xfs_qm_dqattach_one(
 	}
 
 	/*
-	 * udqhint is the i_udquot field in inode, and is non-NULL only
-	 * when the type arg is group/project. Its purpose is to save a
-	 * lookup by dqid (xfs_qm_dqget) by caching a group dquot inside
-	 * the user dquot.
-	 */
-	if (udqhint) {
-		ASSERT(type == XFS_DQ_GROUP || type == XFS_DQ_PROJ);
-		xfs_dqlock(udqhint);
-
-		/*
-		 * No need to take dqlock to look at the id.
-		 *
-		 * The ID can't change until it gets reclaimed, and it won't
-		 * be reclaimed as long as we have a ref from inode and we
-		 * hold the ilock.
-		 */
-		if (type == XFS_DQ_GROUP)
-			dqp = udqhint->q_gdquot;
-		else
-			dqp = udqhint->q_pdquot;
-		if (dqp && be32_to_cpu(dqp->q_core.d_id) == id) {
-			ASSERT(*IO_idqpp == NULL);
-
-			*IO_idqpp = xfs_qm_dqhold(dqp);
-			xfs_dqunlock(udqhint);
-			return 0;
-		}
-
-		/*
-		 * We can't hold a dquot lock when we call the dqget code.
-		 * We'll deadlock in no time, because of (not conforming to)
-		 * lock ordering - the inodelock comes before any dquot lock,
-		 * and we may drop and reacquire the ilock in xfs_qm_dqget().
-		 */
-		xfs_dqunlock(udqhint);
-	}
-
-	/*
-	 * Find the dquot from somewhere. This bumps the
-	 * reference count of dquot and returns it locked.
-	 * This can return ENOENT if dquot didn't exist on
-	 * disk and we didn't ask it to allocate;
-	 * ESRCH if quotas got turned off suddenly.
+	 * Find the dquot from somewhere. This bumps the reference count of
+	 * dquot and returns it locked.  This can return ENOENT if dquot didn't
+	 * exist on disk and we didn't ask it to allocate; ESRCH if quotas got
+	 * turned off suddenly.
 	 */
 	error = xfs_qm_dqget(ip->i_mount, ip, id, type,
 			     doalloc | XFS_QMOPT_DOWARN, &dqp);
@@ -490,48 +399,6 @@ xfs_qm_dqattach_one(
 	return 0;
 }
 
-
-/*
- * Given a udquot and group/project type, attach the group/project
- * dquot pointer to the udquot as a hint for future lookups.
- */
-STATIC void
-xfs_qm_dqattach_hint(
-	struct xfs_inode	*ip,
-	int			type)
-{
-	struct xfs_dquot **dqhintp;
-	struct xfs_dquot *dqp;
-	struct xfs_dquot *udq = ip->i_udquot;
-
-	ASSERT(type == XFS_DQ_GROUP || type == XFS_DQ_PROJ);
-
-	xfs_dqlock(udq);
-
-	if (type == XFS_DQ_GROUP) {
-		dqp = ip->i_gdquot;
-		dqhintp = &udq->q_gdquot;
-	} else {
-		dqp = ip->i_pdquot;
-		dqhintp = &udq->q_pdquot;
-	}
-
-	if (*dqhintp) {
-		struct xfs_dquot *tmp;
-
-		if (*dqhintp == dqp)
-			goto done;
-
-		tmp = *dqhintp;
-		*dqhintp = NULL;
-		xfs_qm_dqrele(tmp);
-	}
-
-	*dqhintp = xfs_qm_dqhold(dqp);
-done:
-	xfs_dqunlock(udq);
-}
-
 static bool
 xfs_qm_need_dqattach(
 	struct xfs_inode	*ip)
@@ -562,7 +429,6 @@ xfs_qm_dqattach_locked(
 	uint		flags)
 {
 	xfs_mount_t	*mp = ip->i_mount;
-	uint		nquotas = 0;
 	int		error = 0;
 
 	if (!xfs_qm_need_dqattach(ip))
@@ -570,77 +436,39 @@ xfs_qm_dqattach_locked(
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 
-	if (XFS_IS_UQUOTA_ON(mp)) {
+	if (XFS_IS_UQUOTA_ON(mp) && !ip->i_udquot) {
 		error = xfs_qm_dqattach_one(ip, ip->i_d.di_uid, XFS_DQ_USER,
 						flags & XFS_QMOPT_DQALLOC,
-						NULL, &ip->i_udquot);
+						&ip->i_udquot);
 		if (error)
 			goto done;
-		nquotas++;
+		ASSERT(ip->i_udquot);
 	}
 
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-	if (XFS_IS_GQUOTA_ON(mp)) {
+	if (XFS_IS_GQUOTA_ON(mp) && !ip->i_gdquot) {
 		error = xfs_qm_dqattach_one(ip, ip->i_d.di_gid, XFS_DQ_GROUP,
 						flags & XFS_QMOPT_DQALLOC,
-						ip->i_udquot, &ip->i_gdquot);
-		/*
-		 * Don't worry about the udquot that we may have
-		 * attached above. It'll get detached, if not already.
-		 */
+						&ip->i_gdquot);
 		if (error)
 			goto done;
-		nquotas++;
+		ASSERT(ip->i_gdquot);
 	}
 
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-	if (XFS_IS_PQUOTA_ON(mp)) {
+	if (XFS_IS_PQUOTA_ON(mp) && !ip->i_pdquot) {
 		error = xfs_qm_dqattach_one(ip, xfs_get_projid(ip), XFS_DQ_PROJ,
 						flags & XFS_QMOPT_DQALLOC,
-						ip->i_udquot, &ip->i_pdquot);
-		/*
-		 * Don't worry about the udquot that we may have
-		 * attached above. It'll get detached, if not already.
-		 */
+						&ip->i_pdquot);
 		if (error)
 			goto done;
-		nquotas++;
+		ASSERT(ip->i_pdquot);
 	}
 
+done:
 	/*
-	 * Attach this group/project quota to the user quota as a hint.
-	 * This WON'T, in general, result in a thrash.
+	 * Don't worry about the dquots that we may have attached before any
+	 * error - they'll get detached later if it has not already been done.
 	 */
-	if (nquotas > 1 && ip->i_udquot) {
-		ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-		ASSERT(ip->i_gdquot || !XFS_IS_GQUOTA_ON(mp));
-		ASSERT(ip->i_pdquot || !XFS_IS_PQUOTA_ON(mp));
-
-		/*
-		 * We do not have i_udquot locked at this point, but this check
-		 * is OK since we don't depend on the i_gdquot to be accurate
-		 * 100% all the time. It is just a hint, and this will
-		 * succeed in general.
-		 */
-		if (ip->i_udquot->q_gdquot != ip->i_gdquot)
-			xfs_qm_dqattach_hint(ip, XFS_DQ_GROUP);
-
-		if (ip->i_udquot->q_pdquot != ip->i_pdquot)
-			xfs_qm_dqattach_hint(ip, XFS_DQ_PROJ);
-	}
-
- done:
-#ifdef DEBUG
-	if (!error) {
-		if (XFS_IS_UQUOTA_ON(mp))
-			ASSERT(ip->i_udquot);
-		if (XFS_IS_GQUOTA_ON(mp))
-			ASSERT(ip->i_gdquot);
-		if (XFS_IS_PQUOTA_ON(mp))
-			ASSERT(ip->i_pdquot);
-	}
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-#endif
 	return error;
 }
 
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] xfs: dquot refcounting by atomics
  2013-12-12  9:40 [PATCH 0/3] xfs: dquot modification scalability Dave Chinner
  2013-12-12  9:40 ` [PATCH 1/3] xfs: remove dquot hints Dave Chinner
@ 2013-12-12  9:40 ` Dave Chinner
  2013-12-13 13:23   ` Christoph Hellwig
  2013-12-12  9:40 ` [PATCH 3/3] xfs: xfs_trans_dqresv() can be made lockless Dave Chinner
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2013-12-12  9:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

On concurrent workloads, the dquot lock completely serialises the
workload. One of the contributors to that is that taking a reference
count on the dquot requires taking the dquot lock. If we make the
reference count atomic, we don't need to take the lock to bump the
count.

Profiles showed that the reference count locking really hurt:

-   5.02%  [kernel]  [k] __mutex_lock_slowpath
   - __mutex_lock_slowpath
      - 99.66% mutex_lock
         - 31.04% xfs_qm_vop_create_dqattach
         - 30.03% xfs_qm_vop_dqalloc
         - 20.56% xfs_qm_dqrele
         - 9.16% xfs_trans_dqresv
         - 7.31% xfs_trans_dqlockedjoin

Contention is primarily in xfs_qm_vop_create_dqattach() and
xfs_qm_vop_dqalloc(). Baseline performance looked like:

FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0      17666.5         15377143
     0      3200000            0      17018.6         15922906
     0      4800000            0      17373.5         16149660
     0      6400000            0      16564.9         17234139
     0      8000000            0      17022.4         15987230
     0      9600000            0      16684.2         14834507
     0     11200000            0      16770.3         27330353
     0     12800000            0      15921.4         18935868


So, convert the refcount to an atomic, slightly rearrange the dquot
structure to separate read-mostly and contended fields, and the
profile changes drastically:

-   5.54%  [kernel]  [k] __mutex_lock_slowpath
   - __mutex_lock_slowpath
      - 99.67% mutex_lock
         - 45.15% xfs_trans_dqlockedjoin
         - 44.71% xfs_trans_dqresv
         - 8.23% xfs_qm_dqrele

The reference count locking is gone completely and now all
contention is within the transaction subsystem. The result:

FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0      17559.3         15606077
     0      3200000            0      18738.9         14026009
     0      4800000            0      18960.0         14381162
     0      6400000            0      19026.5         14422024
     0      8000000            0      18456.6         15369059
     0      9600000            0      17828.4         21075613
     0     11200000            0      17903.9         16474615
     0     12800000            0      17546.0         13919798

is a roughly 10% improvement in performance.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_dquot.c | 16 ++++++++++------
 fs/xfs/xfs_dquot.h | 16 +++++++---------
 fs/xfs/xfs_qm.c    |  6 +++---
 fs/xfs/xfs_trace.h |  2 +-
 4 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 4ce4984..975a46c 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -748,7 +748,7 @@ restart:
 			goto restart;
 		}
 
-		dqp->q_nrefs++;
+		atomic_inc(&dqp->q_nrefs);
 		mutex_unlock(&qi->qi_tree_lock);
 
 		trace_xfs_dqget_hit(dqp);
@@ -799,6 +799,12 @@ restart:
 		}
 	}
 
+	/*
+	 * set the reference count before we insert the dquot into the tree
+	 * so it is safe from reclaim by default.
+	 */
+	atomic_set(&dqp->q_nrefs, 1);
+
 	mutex_lock(&qi->qi_tree_lock);
 	error = -radix_tree_insert(tree, id, dqp);
 	if (unlikely(error)) {
@@ -816,11 +822,9 @@ restart:
 	}
 
 	/*
-	 * We return a locked dquot to the caller, with a reference taken
+	 * We return a locked, referenced dquot to the caller.
 	 */
 	xfs_dqlock(dqp);
-	dqp->q_nrefs = 1;
-
 	qi->qi_dquots++;
 	mutex_unlock(&qi->qi_tree_lock);
 
@@ -841,12 +845,12 @@ void
 xfs_qm_dqput(
 	struct xfs_dquot	*dqp)
 {
-	ASSERT(dqp->q_nrefs > 0);
+	ASSERT(atomic_read(&dqp->q_nrefs) > 0);
 	ASSERT(XFS_DQ_IS_LOCKED(dqp));
 
 	trace_xfs_dqput(dqp);
 
-	if (--dqp->q_nrefs == 0) {
+	if (atomic_dec_and_test(&dqp->q_nrefs)) {
 		struct xfs_quotainfo	*qi = dqp->q_mount->m_quotainfo;
 		trace_xfs_dqput_free(dqp);
 
diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
index 68a68f7..949a47b 100644
--- a/fs/xfs/xfs_dquot.h
+++ b/fs/xfs/xfs_dquot.h
@@ -44,26 +44,26 @@ enum {
  */
 typedef struct xfs_dquot {
 	uint		 dq_flags;	/* various flags (XFS_DQ_*) */
-	struct list_head q_lru;		/* global free list of dquots */
 	struct xfs_mount*q_mount;	/* filesystem this relates to */
-	struct xfs_trans*q_transp;	/* trans this belongs to currently */
-	uint		 q_nrefs;	/* # active refs from inodes */
 	xfs_daddr_t	 q_blkno;	/* blkno of dquot buffer */
 	int		 q_bufoffset;	/* off of dq in buffer (# dquots) */
 	xfs_fileoff_t	 q_fileoffset;	/* offset in quotas file */
-
-	xfs_disk_dquot_t q_core;	/* actual usage & quotas */
-	xfs_dq_logitem_t q_logitem;	/* dquot log item */
 	xfs_qcnt_t	 q_res_bcount;	/* total regular nblks used+reserved */
 	xfs_qcnt_t	 q_res_icount;	/* total inos allocd+reserved */
 	xfs_qcnt_t	 q_res_rtbcount;/* total realtime blks used+reserved */
 	xfs_qcnt_t	 q_prealloc_lo_wmark;/* prealloc throttle wmark */
 	xfs_qcnt_t	 q_prealloc_hi_wmark;/* prealloc disabled wmark */
 	int64_t		 q_low_space[XFS_QLOWSP_MAX];
+
+	atomic_t	 q_nrefs;	/* # active refs from inodes */
+	xfs_disk_dquot_t q_core;	/* actual usage & quotas */
+	xfs_dq_logitem_t q_logitem;	/* dquot log item */
 	struct mutex	 q_qlock;	/* quota lock */
 	struct completion q_flush;	/* flush completion queue */
 	atomic_t          q_pincount;	/* dquot pin count */
 	wait_queue_head_t q_pinwait;	/* dquot pinning wait queue */
+	struct list_head q_lru;		/* global free list of dquots */
+	struct xfs_trans *q_transp;	/* trans this belongs to currently */
 } xfs_dquot_t;
 
 /*
@@ -164,9 +164,7 @@ extern void		xfs_dquot_set_prealloc_limits(struct xfs_dquot *);
 
 static inline struct xfs_dquot *xfs_qm_dqhold(struct xfs_dquot *dqp)
 {
-	xfs_dqlock(dqp);
-	dqp->q_nrefs++;
-	xfs_dqunlock(dqp);
+	atomic_inc(&dqp->q_nrefs);
 	return dqp;
 }
 
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index d31b88e..31c0f85 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -136,7 +136,7 @@ xfs_qm_dqpurge(
 	struct xfs_quotainfo	*qi = mp->m_quotainfo;
 
 	xfs_dqlock(dqp);
-	if ((dqp->dq_flags & XFS_DQ_FREEING) || dqp->q_nrefs != 0) {
+	if ((dqp->dq_flags & XFS_DQ_FREEING) || atomic_read(&dqp->q_nrefs)) {
 		xfs_dqunlock(dqp);
 		return EAGAIN;
 	}
@@ -540,7 +540,7 @@ xfs_qm_dquot_isolate(
 	 * This dquot has acquired a reference in the meantime remove it from
 	 * the freelist and try again.
 	 */
-	if (dqp->q_nrefs) {
+	if (atomic_read(&dqp->q_nrefs)) {
 		xfs_dqunlock(dqp);
 		XFS_STATS_INC(xs_qm_dqwants);
 
@@ -588,7 +588,7 @@ xfs_qm_dquot_isolate(
 	dqp->dq_flags |= XFS_DQ_FREEING;
 	xfs_dqunlock(dqp);
 
-	ASSERT(dqp->q_nrefs == 0);
+	ASSERT(atomic_read(&dqp->q_nrefs) == 0);
 	list_move_tail(&dqp->q_lru, &isol->dispose);
 	XFS_STATS_DEC(xs_qm_dquot_unused);
 	trace_xfs_dqreclaim_done(dqp);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 425dfa4..051813c 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -769,7 +769,7 @@ DECLARE_EVENT_CLASS(xfs_dquot_class,
 		__entry->dev = dqp->q_mount->m_super->s_dev;
 		__entry->id = be32_to_cpu(dqp->q_core.d_id);
 		__entry->flags = dqp->dq_flags;
-		__entry->nrefs = dqp->q_nrefs;
+		__entry->nrefs = atomic_read(&dqp->q_nrefs);
 		__entry->res_bcount = dqp->q_res_bcount;
 		__entry->bcount = be64_to_cpu(dqp->q_core.d_bcount);
 		__entry->icount = be64_to_cpu(dqp->q_core.d_icount);
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] xfs: xfs_trans_dqresv() can be made lockless
  2013-12-12  9:40 [PATCH 0/3] xfs: dquot modification scalability Dave Chinner
  2013-12-12  9:40 ` [PATCH 1/3] xfs: remove dquot hints Dave Chinner
  2013-12-12  9:40 ` [PATCH 2/3] xfs: dquot refcounting by atomics Dave Chinner
@ 2013-12-12  9:40 ` Dave Chinner
  2013-12-13 13:37   ` Christoph Hellwig
  2013-12-12 10:25 ` [PATCH 4/3] xfs: xfs_qm_dqrele mostly doesn't need locking Dave Chinner
  2013-12-13 16:30 ` [PATCH 5/3] xfs: return unlocked dquots from xfs_qm_dqget Christoph Hellwig
  4 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2013-12-12  9:40 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_trans_dqresv() serialises dquot modifications by taking the
dquot lock while it is doing reservations. The thing is, nothing it
does really requires exclusive access to the dquot except for the
reservation accounting. We can do that locklessly with cmpxchg.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_trans_dquot.c | 372 +++++++++++++++++++++++++++--------------------
 1 file changed, 213 insertions(+), 159 deletions(-)

diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
index 4117286..fa89d21 100644
--- a/fs/xfs/xfs_trans_dquot.c
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -33,6 +33,97 @@
 
 STATIC void	xfs_trans_alloc_dqinfo(xfs_trans_t *);
 
+STATIC void
+xfs_quota_warn(
+	struct xfs_mount	*mp,
+	struct xfs_dquot	*dqp,
+	int			type)
+{
+	/* no warnings for project quotas - we just return ENOSPC later */
+	if (dqp->dq_flags & XFS_DQ_PROJ)
+		return;
+	quota_send_warning(make_kqid(&init_user_ns,
+				     (dqp->dq_flags & XFS_DQ_USER) ?
+				     USRQUOTA : GRPQUOTA,
+				     be32_to_cpu(dqp->q_core.d_id)),
+			   mp->m_super->s_dev, type);
+}
+
+/*
+ * See if we'd go over the hardlimit or exceed the timelimit if we allocate
+ * nblks.
+ */
+static bool
+xfs_dqlimits_exceeded(
+	struct xfs_mount	*mp,
+	struct xfs_dquot	*dqp,
+	bool			blklimit,
+	xfs_qcnt_t		total_count,
+	xfs_qcnt_t		hardlimit,
+	xfs_qcnt_t		softlimit,
+	time_t			timer,
+	xfs_qwarncnt_t		warns,
+	xfs_qwarncnt_t		warnlimit)
+{
+	if (hardlimit && total_count > hardlimit) {
+		xfs_quota_warn(mp, dqp, blklimit ? QUOTA_NL_BHARDWARN
+						 : QUOTA_NL_IHARDWARN);
+		return true;
+	}
+
+	if (softlimit && total_count > softlimit) {
+		if ((timer && get_seconds() > timer) ||
+		    (warns && warns >= warnlimit)) {
+			xfs_quota_warn(mp, dqp, blklimit
+						 ? QUOTA_NL_BSOFTLONGWARN
+						 : QUOTA_NL_ISOFTLONGWARN);
+			return true;
+		}
+		xfs_quota_warn(mp, dqp, blklimit ? QUOTA_NL_BSOFTWARN
+						 : QUOTA_NL_ISOFTWARN);
+	}
+	return false;
+}
+
+/*
+ * Make the required reservation, first checking the limits provided (if
+ * required) to see if we'd exceed the quota limits.
+ */
+static xfs_qcnt_t
+xfs_dqresv_cmpxchg(
+	struct xfs_mount	*mp,
+	struct xfs_dquot	*dqp,
+	xfs_qcnt_t		*cntp,
+	xfs_qcnt_t		diff,
+	bool			blklimit,
+	bool			enforce,
+	xfs_qcnt_t		hardlimit,
+	xfs_qcnt_t		softlimit,
+	time_t			timer,
+	xfs_qwarncnt_t		warns,
+	xfs_qwarncnt_t		warnlimit)
+{
+	xfs_qcnt_t	count;
+	xfs_qcnt_t	old;
+
+	do {
+		xfs_qcnt_t	total_count;
+
+		count = ACCESS_ONCE(*cntp);
+		total_count = count + diff;
+		if (enforce &&
+		    xfs_dqlimits_exceeded(mp, dqp, blklimit, total_count,
+					  hardlimit, softlimit, timer, warns,
+					  warnlimit))
+			return -1ULL;
+
+		old = count;
+		count = cmpxchg64(cntp, old, total_count);
+	} while (count != old);
+
+	return old;
+}
+
 /*
  * Add the locked dquot to the transaction.
  * The dquot must be locked, and it cannot be associated with any
@@ -315,6 +406,18 @@ xfs_trans_dqlockedjoin(
 	}
 }
 
+static int64_t
+xfs_dqresv_return(
+	xfs_qcnt_t	resv,
+	xfs_qcnt_t	resv_used,
+	xfs_qcnt_t	delta)
+{
+	if (!resv)
+		return delta;
+	if (resv > resv_used)
+		return resv_used - resv;
+	return resv - resv_used;
+}
 
 /*
  * Called by xfs_trans_commit() and similar in spirit to
@@ -334,6 +437,7 @@ xfs_trans_apply_dquot_deltas(
 	struct xfs_disk_dquot	*d;
 	long			totalbdelta;
 	long			totalrtbdelta;
+	struct xfs_mount	*mp = tp->t_mountp;
 
 	if (!(tp->t_flags & XFS_TRANS_DQ_DIRTY))
 		return;
@@ -350,6 +454,7 @@ xfs_trans_apply_dquot_deltas(
 		xfs_trans_dqlockedjoin(tp, qa);
 
 		for (i = 0; i < XFS_QM_TRANS_MAXDQS; i++) {
+			int64_t		diff;
 			qtrx = &qa[i];
 			/*
 			 * The array of dquots is filled
@@ -419,73 +524,46 @@ xfs_trans_apply_dquot_deltas(
 			 * add this to the list of items to get logged
 			 */
 			xfs_trans_log_dquot(tp, dqp);
+
 			/*
 			 * Take off what's left of the original reservation.
 			 * In case of delayed allocations, there's no
 			 * reservation that a transaction structure knows of.
 			 */
-			if (qtrx->qt_blk_res != 0) {
-				if (qtrx->qt_blk_res != qtrx->qt_blk_res_used) {
-					if (qtrx->qt_blk_res >
-					    qtrx->qt_blk_res_used)
-						dqp->q_res_bcount -= (xfs_qcnt_t)
-							(qtrx->qt_blk_res -
-							 qtrx->qt_blk_res_used);
-					else
-						dqp->q_res_bcount -= (xfs_qcnt_t)
-							(qtrx->qt_blk_res_used -
-							 qtrx->qt_blk_res);
-				}
-			} else {
-				/*
-				 * These blks were never reserved, either inside
-				 * a transaction or outside one (in a delayed
-				 * allocation). Also, this isn't always a
-				 * negative number since we sometimes
-				 * deliberately skip quota reservations.
-				 */
-				if (qtrx->qt_bcount_delta) {
-					dqp->q_res_bcount +=
-					      (xfs_qcnt_t)qtrx->qt_bcount_delta;
-				}
-			}
+			diff = xfs_dqresv_return(qtrx->qt_blk_res,
+						 qtrx->qt_blk_res_used,
+						 qtrx->qt_bcount_delta);
+			if (diff)
+				xfs_dqresv_cmpxchg(mp, dqp, &dqp->q_res_bcount,
+						   diff, true, false, 0, 0, 0,
+						   0, 0);
 			/*
 			 * Adjust the RT reservation.
 			 */
-			if (qtrx->qt_rtblk_res != 0) {
-				if (qtrx->qt_rtblk_res != qtrx->qt_rtblk_res_used) {
-					if (qtrx->qt_rtblk_res >
-					    qtrx->qt_rtblk_res_used)
-					       dqp->q_res_rtbcount -= (xfs_qcnt_t)
-						       (qtrx->qt_rtblk_res -
-							qtrx->qt_rtblk_res_used);
-					else
-					       dqp->q_res_rtbcount -= (xfs_qcnt_t)
-						       (qtrx->qt_rtblk_res_used -
-							qtrx->qt_rtblk_res);
-				}
-			} else {
-				if (qtrx->qt_rtbcount_delta)
-					dqp->q_res_rtbcount +=
-					    (xfs_qcnt_t)qtrx->qt_rtbcount_delta;
-			}
+			diff = xfs_dqresv_return(qtrx->qt_rtblk_res,
+						 qtrx->qt_rtblk_res_used,
+						 qtrx->qt_rtbcount_delta);
+			if (diff)
+				xfs_dqresv_cmpxchg(mp, dqp, &dqp->q_res_rtbcount,
+						   diff, true, false, 0, 0, 0,
+						   0, 0);
 
 			/*
 			 * Adjust the inode reservation.
 			 */
-			if (qtrx->qt_ino_res != 0) {
+			if (qtrx->qt_ino_res == 0)
+				diff = qtrx->qt_icount_delta;
+			else {
 				ASSERT(qtrx->qt_ino_res >=
 				       qtrx->qt_ino_res_used);
-				if (qtrx->qt_ino_res > qtrx->qt_ino_res_used)
-					dqp->q_res_icount -= (xfs_qcnt_t)
-						(qtrx->qt_ino_res -
-						 qtrx->qt_ino_res_used);
-			} else {
-				if (qtrx->qt_icount_delta)
-					dqp->q_res_icount +=
-					    (xfs_qcnt_t)qtrx->qt_icount_delta;
+				diff = qtrx->qt_ino_res - qtrx->qt_ino_res_used;
+				if (diff < 0)
+					diff = 0;
 			}
-
+			if (diff)
+				xfs_dqresv_cmpxchg(mp, dqp, &dqp->q_res_icount,
+						   diff, true, false, 0, 0, 0,
+						   0, 0);
 			ASSERT(dqp->q_res_bcount >=
 				be64_to_cpu(dqp->q_core.d_bcount));
 			ASSERT(dqp->q_res_icount >=
@@ -562,22 +640,6 @@ xfs_trans_unreserve_and_mod_dquots(
 	}
 }
 
-STATIC void
-xfs_quota_warn(
-	struct xfs_mount	*mp,
-	struct xfs_dquot	*dqp,
-	int			type)
-{
-	/* no warnings for project quotas - we just return ENOSPC later */
-	if (dqp->dq_flags & XFS_DQ_PROJ)
-		return;
-	quota_send_warning(make_kqid(&init_user_ns,
-				     (dqp->dq_flags & XFS_DQ_USER) ?
-				     USRQUOTA : GRPQUOTA,
-				     be32_to_cpu(dqp->q_core.d_id)),
-			   mp->m_super->s_dev, type);
-}
-
 /*
  * This reserves disk blocks and inodes against a dquot.
  * Flags indicate if the dquot is to be locked here and also
@@ -591,20 +653,35 @@ xfs_trans_dqresv(
 	xfs_dquot_t	*dqp,
 	long		nblks,
 	long		ninos,
-	uint		flags)
+	uint		flags,
+	bool		enforce)
 {
 	xfs_qcnt_t	hardlimit;
 	xfs_qcnt_t	softlimit;
 	time_t		timer;
 	xfs_qwarncnt_t	warns;
 	xfs_qwarncnt_t	warnlimit;
-	xfs_qcnt_t	total_count;
+	xfs_qcnt_t	oldcnt;
 	xfs_qcnt_t	*resbcountp;
 	xfs_quotainfo_t	*q = mp->m_quotainfo;
 
+	/*
+	 * Lockless reservation algorithm:
+	 *
+	 * sample block count, inode count, timers and limits
+	 * cmpxchg loop to modify block reservation
+	 *	check limits:
+	 *		if over, check limits have not changed
+	 *			no change, fail
+	 *	cmpxchg block reservation
+	 *
+	 * if transaction, modify transaction context w/ change deltas.
+	 *	no locks required for this as context is private to transaction.
+	 */
+	if (nblks == 0)
+		goto do_ninos;
 
-	xfs_dqlock(dqp);
-
+	smp_mb();
 	if (flags & XFS_TRANS_DQ_RES_BLKS) {
 		hardlimit = be64_to_cpu(dqp->q_core.d_blk_hardlimit);
 		if (!hardlimit)
@@ -630,69 +707,35 @@ xfs_trans_dqresv(
 		resbcountp = &dqp->q_res_rtbcount;
 	}
 
-	if ((flags & XFS_QMOPT_FORCE_RES) == 0 &&
-	    dqp->q_core.d_id &&
-	    ((XFS_IS_UQUOTA_ENFORCED(dqp->q_mount) && XFS_QM_ISUDQ(dqp)) ||
-	     (XFS_IS_GQUOTA_ENFORCED(dqp->q_mount) && XFS_QM_ISGDQ(dqp)) ||
-	     (XFS_IS_PQUOTA_ENFORCED(dqp->q_mount) && XFS_QM_ISPDQ(dqp)))) {
-		if (nblks > 0) {
-			/*
-			 * dquot is locked already. See if we'd go over the
-			 * hardlimit or exceed the timelimit if we allocate
-			 * nblks.
-			 */
-			total_count = *resbcountp + nblks;
-			if (hardlimit && total_count > hardlimit) {
-				xfs_quota_warn(mp, dqp, QUOTA_NL_BHARDWARN);
-				goto error_return;
-			}
-			if (softlimit && total_count > softlimit) {
-				if ((timer != 0 && get_seconds() > timer) ||
-				    (warns != 0 && warns >= warnlimit)) {
-					xfs_quota_warn(mp, dqp,
-						       QUOTA_NL_BSOFTLONGWARN);
-					goto error_return;
-				}
-
-				xfs_quota_warn(mp, dqp, QUOTA_NL_BSOFTWARN);
-			}
-		}
-		if (ninos > 0) {
-			total_count = be64_to_cpu(dqp->q_core.d_icount) + ninos;
-			timer = be32_to_cpu(dqp->q_core.d_itimer);
-			warns = be16_to_cpu(dqp->q_core.d_iwarns);
-			warnlimit = dqp->q_mount->m_quotainfo->qi_iwarnlimit;
-			hardlimit = be64_to_cpu(dqp->q_core.d_ino_hardlimit);
-			if (!hardlimit)
-				hardlimit = q->qi_ihardlimit;
-			softlimit = be64_to_cpu(dqp->q_core.d_ino_softlimit);
-			if (!softlimit)
-				softlimit = q->qi_isoftlimit;
-
-			if (hardlimit && total_count > hardlimit) {
-				xfs_quota_warn(mp, dqp, QUOTA_NL_IHARDWARN);
-				goto error_return;
-			}
-			if (softlimit && total_count > softlimit) {
-				if  ((timer != 0 && get_seconds() > timer) ||
-				     (warns != 0 && warns >= warnlimit)) {
-					xfs_quota_warn(mp, dqp,
-						       QUOTA_NL_ISOFTLONGWARN);
-					goto error_return;
-				}
-				xfs_quota_warn(mp, dqp, QUOTA_NL_ISOFTWARN);
-			}
-		}
-	}
-
-	/*
-	 * Change the reservation, but not the actual usage.
-	 * Note that q_res_bcount = q_core.d_bcount + resv
-	 */
-	(*resbcountp) += (xfs_qcnt_t)nblks;
-	if (ninos != 0)
-		dqp->q_res_icount += (xfs_qcnt_t)ninos;
-
+	oldcnt = xfs_dqresv_cmpxchg(mp, dqp, resbcountp, nblks, true, enforce,
+				    hardlimit, softlimit, timer, warns,
+				    warnlimit);
+	if (oldcnt == (xfs_qcnt_t)-1ULL)
+		goto error_return;
+
+do_ninos:
+	if (ninos == 0)
+		goto do_trans;
+
+	smp_mb();
+	timer = be32_to_cpu(dqp->q_core.d_itimer);
+	warns = be16_to_cpu(dqp->q_core.d_iwarns);
+	warnlimit = dqp->q_mount->m_quotainfo->qi_iwarnlimit;
+	hardlimit = be64_to_cpu(dqp->q_core.d_ino_hardlimit);
+	if (!hardlimit)
+		hardlimit = q->qi_ihardlimit;
+	softlimit = be64_to_cpu(dqp->q_core.d_ino_softlimit);
+	if (!softlimit)
+		softlimit = q->qi_isoftlimit;
+	resbcountp = &dqp->q_res_icount;
+
+	oldcnt = xfs_dqresv_cmpxchg(mp, dqp, resbcountp, ninos, false, enforce,
+				    hardlimit, softlimit, timer, warns,
+				    warnlimit);
+	if (oldcnt == (xfs_qcnt_t)-1ULL)
+		goto error_undo_nblks;
+
+do_trans:
 	/*
 	 * note the reservation amt in the trans struct too,
 	 * so that the transaction knows how much was reserved by
@@ -700,27 +743,30 @@ xfs_trans_dqresv(
 	 * We don't do this when we are reserving for a delayed allocation,
 	 * because we don't have the luxury of a transaction envelope then.
 	 */
-	if (tp) {
-		ASSERT(tp->t_dqinfo);
-		ASSERT(flags & XFS_QMOPT_RESBLK_MASK);
-		if (nblks != 0)
-			xfs_trans_mod_dquot(tp, dqp,
-					    flags & XFS_QMOPT_RESBLK_MASK,
-					    nblks);
-		if (ninos != 0)
-			xfs_trans_mod_dquot(tp, dqp,
-					    XFS_TRANS_DQ_RES_INOS,
-					    ninos);
-	}
-	ASSERT(dqp->q_res_bcount >= be64_to_cpu(dqp->q_core.d_bcount));
-	ASSERT(dqp->q_res_rtbcount >= be64_to_cpu(dqp->q_core.d_rtbcount));
-	ASSERT(dqp->q_res_icount >= be64_to_cpu(dqp->q_core.d_icount));
+	if (!tp)
+		return 0;
 
-	xfs_dqunlock(dqp);
+	ASSERT(tp->t_dqinfo);
+	ASSERT(flags & XFS_QMOPT_RESBLK_MASK);
+	if (nblks)
+		xfs_trans_mod_dquot(tp, dqp, flags & XFS_QMOPT_RESBLK_MASK,
+				    nblks);
+	if (ninos != 0)
+		xfs_trans_mod_dquot(tp, dqp, XFS_TRANS_DQ_RES_INOS, ninos);
 	return 0;
 
+error_undo_nblks:
+	/* ninos reservation failed, so if we changed nblks, undo that. */
+	if (nblks) {
+		if (flags & XFS_TRANS_DQ_RES_BLKS)
+			resbcountp = &dqp->q_res_bcount;
+		else
+			resbcountp = &dqp->q_res_rtbcount;
+		xfs_dqresv_cmpxchg(mp, dqp, resbcountp, -nblks, true, false,
+			           0, 0, 0, 0, 0);
+	}
+
 error_return:
-	xfs_dqunlock(dqp);
 	if (flags & XFS_QMOPT_ENOSPC)
 		return ENOSPC;
 	return EDQUOT;
@@ -751,6 +797,7 @@ xfs_trans_reserve_quota_bydquots(
 	uint			flags)
 {
 	int		error;
+	bool		enforce;
 
 	if (!XFS_IS_QUOTA_RUNNING(mp) || !XFS_IS_QUOTA_ON(mp))
 		return 0;
@@ -761,20 +808,28 @@ xfs_trans_reserve_quota_bydquots(
 	ASSERT(flags & XFS_QMOPT_RESBLK_MASK);
 
 	if (udqp) {
+		enforce = !(flags & XFS_QMOPT_FORCE_RES) &&
+			  udqp->q_core.d_id && XFS_IS_UQUOTA_ENFORCED(mp);
 		error = xfs_trans_dqresv(tp, mp, udqp, nblks, ninos,
-					(flags & ~XFS_QMOPT_ENOSPC));
+					(flags & ~XFS_QMOPT_ENOSPC), enforce);
 		if (error)
 			return error;
 	}
 
 	if (gdqp) {
-		error = xfs_trans_dqresv(tp, mp, gdqp, nblks, ninos, flags);
+		enforce = !(flags & XFS_QMOPT_FORCE_RES) &&
+			  gdqp->q_core.d_id && XFS_IS_GQUOTA_ENFORCED(mp);
+		error = xfs_trans_dqresv(tp, mp, gdqp, nblks, ninos, flags,
+					 enforce);
 		if (error)
 			goto unwind_usr;
 	}
 
 	if (pdqp) {
-		error = xfs_trans_dqresv(tp, mp, pdqp, nblks, ninos, flags);
+		enforce = !(flags & XFS_QMOPT_FORCE_RES) &&
+			  pdqp->q_core.d_id && XFS_IS_PQUOTA_ENFORCED(mp);
+		error = xfs_trans_dqresv(tp, mp, pdqp, nblks, ninos, flags,
+					 enforce);
 		if (error)
 			goto unwind_grp;
 	}
@@ -784,14 +839,13 @@ xfs_trans_reserve_quota_bydquots(
 	 */
 	return 0;
 
+	/* unwinding does not require limit enforcement. */
 unwind_grp:
-	flags |= XFS_QMOPT_FORCE_RES;
 	if (gdqp)
-		xfs_trans_dqresv(tp, mp, gdqp, -nblks, -ninos, flags);
+		xfs_trans_dqresv(tp, mp, gdqp, -nblks, -ninos, flags, false);
 unwind_usr:
-	flags |= XFS_QMOPT_FORCE_RES;
 	if (udqp)
-		xfs_trans_dqresv(tp, mp, udqp, -nblks, -ninos, flags);
+		xfs_trans_dqresv(tp, mp, udqp, -nblks, -ninos, flags, false);
 	return error;
 }
 
-- 
1.8.4.rc3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/3] xfs: xfs_qm_dqrele mostly doesn't need locking
  2013-12-12  9:40 [PATCH 0/3] xfs: dquot modification scalability Dave Chinner
                   ` (2 preceding siblings ...)
  2013-12-12  9:40 ` [PATCH 3/3] xfs: xfs_trans_dqresv() can be made lockless Dave Chinner
@ 2013-12-12 10:25 ` Dave Chinner
  2013-12-13 13:28   ` Christoph Hellwig
  2013-12-13 16:30 ` [PATCH 5/3] xfs: return unlocked dquots from xfs_qm_dqget Christoph Hellwig
  4 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2013-12-12 10:25 UTC (permalink / raw)
  To: xfs


From: Dave Chinner <dchinner@redhat.com>

Now that we have an atomic variable for the reference count, we
don't need to take the dquot lock if we are not removing the last
reference count. The dquot lock is a mutex, so we can't use
atomic_dec_and_lock(), but we can open code it in xfs_qm_dqrele and
hence avoid the dquot lock for most of the cases where we drop a
reference count.

The result is that concurrent file creates jump from 24,000/s to
28,000/s, and the entire workload is now serialised on the dquot
being locked during transaction commit. Another significant win,
even though it's not the big one...

While there, rename xfs_qm_dqrele to xfs_dqrele - the "qm" part of
the name means nothing and just makes the code harder to read.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_dquot.c       | 17 +++++++++--------
 fs/xfs/xfs_inode.c       | 12 ++++++------
 fs/xfs/xfs_ioctl.c       | 10 +++++-----
 fs/xfs/xfs_iops.c        | 12 ++++++------
 fs/xfs/xfs_qm.c          | 16 ++++++++--------
 fs/xfs/xfs_qm_syscalls.c |  8 ++++----
 fs/xfs/xfs_quota.h       |  4 ++--
 fs/xfs/xfs_symlink.c     | 12 ++++++------
 8 files changed, 46 insertions(+), 45 deletions(-)

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 975a46c..f17350d 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -861,11 +861,10 @@ xfs_qm_dqput(
 }
 
 /*
- * Release a dquot. Flush it if dirty, then dqput() it.
- * dquot must not be locked.
+ * Release a dquot. The dquot must not be locked.
  */
 void
-xfs_qm_dqrele(
+xfs_dqrele(
 	xfs_dquot_t	*dqp)
 {
 	if (!dqp)
@@ -873,13 +872,15 @@ xfs_qm_dqrele(
 
 	trace_xfs_dqrele(dqp);
 
-	xfs_dqlock(dqp);
 	/*
-	 * We don't care to flush it if the dquot is dirty here.
-	 * That will create stutters that we want to avoid.
-	 * Instead we do a delayed write when we try to reclaim
-	 * a dirty dquot. Also xfs_sync will take part of the burden...
+	 * If this is not the final reference, we don't need to take the dquot
+	 * lock at all. This is effectively a mutex based atomic_dec_and_lock()
+	 * operation.
 	 */
+	if (atomic_add_unless(&dqp->q_nrefs, -1, 1))
+		return;
+
+	xfs_dqlock(dqp);
 	xfs_qm_dqput(dqp);
 }
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 001aa89..2442c57 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1305,9 +1305,9 @@ xfs_create(
 	if (error)
 		goto out_release_inode;
 
-	xfs_qm_dqrele(udqp);
-	xfs_qm_dqrele(gdqp);
-	xfs_qm_dqrele(pdqp);
+	xfs_dqrele(udqp);
+	xfs_dqrele(gdqp);
+	xfs_dqrele(pdqp);
 
 	*ipp = ip;
 	return 0;
@@ -1327,9 +1327,9 @@ xfs_create(
 	if (ip)
 		IRELE(ip);
 
-	xfs_qm_dqrele(udqp);
-	xfs_qm_dqrele(gdqp);
-	xfs_qm_dqrele(pdqp);
+	xfs_dqrele(udqp);
+	xfs_dqrele(gdqp);
+	xfs_dqrele(pdqp);
 
 	if (unlock_dp_on_error)
 		xfs_iunlock(dp, XFS_ILOCK_EXCL);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 33ad9a7..13d60b9 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1300,15 +1300,15 @@ xfs_ioctl_setattr(
 	/*
 	 * Release any dquot(s) the inode had kept before chown.
 	 */
-	xfs_qm_dqrele(olddquot);
-	xfs_qm_dqrele(udqp);
-	xfs_qm_dqrele(pdqp);
+	xfs_dqrele(olddquot);
+	xfs_dqrele(udqp);
+	xfs_dqrele(pdqp);
 
 	return code;
 
  error_return:
-	xfs_qm_dqrele(udqp);
-	xfs_qm_dqrele(pdqp);
+	xfs_dqrele(udqp);
+	xfs_dqrele(pdqp);
 	xfs_trans_cancel(tp, 0);
 	if (lock_flags)
 		xfs_iunlock(ip, lock_flags);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 0ce1d75..c5ad890 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -672,10 +672,10 @@ xfs_setattr_nonsize(
 	/*
 	 * Release any dquot(s) the inode had kept before chown.
 	 */
-	xfs_qm_dqrele(olddquot1);
-	xfs_qm_dqrele(olddquot2);
-	xfs_qm_dqrele(udqp);
-	xfs_qm_dqrele(gdqp);
+	xfs_dqrele(olddquot1);
+	xfs_dqrele(olddquot2);
+	xfs_dqrele(udqp);
+	xfs_dqrele(gdqp);
 
 	if (error)
 		return XFS_ERROR(error);
@@ -699,8 +699,8 @@ out_trans_cancel:
 	xfs_trans_cancel(tp, 0);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 out_dqrele:
-	xfs_qm_dqrele(udqp);
-	xfs_qm_dqrele(gdqp);
+	xfs_dqrele(udqp);
+	xfs_dqrele(gdqp);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 31c0f85..843ab07 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -505,15 +505,15 @@ xfs_qm_dqdetach(
 
 	ASSERT(!xfs_is_quota_inode(&ip->i_mount->m_sb, ip->i_ino));
 	if (ip->i_udquot) {
-		xfs_qm_dqrele(ip->i_udquot);
+		xfs_dqrele(ip->i_udquot);
 		ip->i_udquot = NULL;
 	}
 	if (ip->i_gdquot) {
-		xfs_qm_dqrele(ip->i_gdquot);
+		xfs_dqrele(ip->i_gdquot);
 		ip->i_gdquot = NULL;
 	}
 	if (ip->i_pdquot) {
-		xfs_qm_dqrele(ip->i_pdquot);
+		xfs_dqrele(ip->i_pdquot);
 		ip->i_pdquot = NULL;
 	}
 }
@@ -1741,22 +1741,22 @@ xfs_qm_vop_dqalloc(
 	if (O_udqpp)
 		*O_udqpp = uq;
 	else if (uq)
-		xfs_qm_dqrele(uq);
+		xfs_dqrele(uq);
 	if (O_gdqpp)
 		*O_gdqpp = gq;
 	else if (gq)
-		xfs_qm_dqrele(gq);
+		xfs_dqrele(gq);
 	if (O_pdqpp)
 		*O_pdqpp = pq;
 	else if (pq)
-		xfs_qm_dqrele(pq);
+		xfs_dqrele(pq);
 	return 0;
 
 error_rele:
 	if (gq)
-		xfs_qm_dqrele(gq);
+		xfs_dqrele(gq);
 	if (uq)
-		xfs_qm_dqrele(uq);
+		xfs_dqrele(uq);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index 3daf5ea..1e61cd4 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -736,7 +736,7 @@ xfs_qm_scall_setqlim(
 	error = xfs_trans_commit(tp, 0);
 
 out_rele:
-	xfs_qm_dqrele(dqp);
+	xfs_dqrele(dqp);
 out_unlock:
 	mutex_unlock(&q->qi_quotaofflock);
 	return error;
@@ -975,15 +975,15 @@ xfs_dqrele_inode(
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	if ((flags & XFS_UQUOTA_ACCT) && ip->i_udquot) {
-		xfs_qm_dqrele(ip->i_udquot);
+		xfs_dqrele(ip->i_udquot);
 		ip->i_udquot = NULL;
 	}
 	if ((flags & XFS_GQUOTA_ACCT) && ip->i_gdquot) {
-		xfs_qm_dqrele(ip->i_gdquot);
+		xfs_dqrele(ip->i_gdquot);
 		ip->i_gdquot = NULL;
 	}
 	if ((flags & XFS_PQUOTA_ACCT) && ip->i_pdquot) {
-		xfs_qm_dqrele(ip->i_pdquot);
+		xfs_dqrele(ip->i_pdquot);
 		ip->i_pdquot = NULL;
 	}
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index 5376dd4..ae4f41a 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -94,7 +94,7 @@ extern int xfs_qm_vop_chown_reserve(struct xfs_trans *, struct xfs_inode *,
 extern int xfs_qm_dqattach(struct xfs_inode *, uint);
 extern int xfs_qm_dqattach_locked(struct xfs_inode *, uint);
 extern void xfs_qm_dqdetach(struct xfs_inode *);
-extern void xfs_qm_dqrele(struct xfs_dquot *);
+extern void xfs_dqrele(struct xfs_dquot *);
 extern void xfs_qm_statvfs(struct xfs_inode *, struct kstatfs *);
 extern int xfs_qm_newmount(struct xfs_mount *, uint *, uint *);
 extern void xfs_qm_mount_quotas(struct xfs_mount *);
@@ -136,7 +136,7 @@ static inline int xfs_trans_reserve_quota_bydquots(struct xfs_trans *tp,
 #define xfs_qm_dqattach(ip, fl)						(0)
 #define xfs_qm_dqattach_locked(ip, fl)					(0)
 #define xfs_qm_dqdetach(ip)
-#define xfs_qm_dqrele(d)
+#define xfs_dqrele(d)
 #define xfs_qm_statvfs(ip, s)
 #define xfs_qm_newmount(mp, a, b)					(0)
 #define xfs_qm_mount_quotas(mp)
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 14e58f2..14e38fe 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -395,9 +395,9 @@ xfs_symlink(
 		goto error2;
 	}
 	error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
-	xfs_qm_dqrele(udqp);
-	xfs_qm_dqrele(gdqp);
-	xfs_qm_dqrele(pdqp);
+	xfs_dqrele(udqp);
+	xfs_dqrele(gdqp);
+	xfs_dqrele(pdqp);
 
 	*ipp = ip;
 	return 0;
@@ -409,9 +409,9 @@ xfs_symlink(
 	cancel_flags |= XFS_TRANS_ABORT;
  error_return:
 	xfs_trans_cancel(tp, cancel_flags);
-	xfs_qm_dqrele(udqp);
-	xfs_qm_dqrele(gdqp);
-	xfs_qm_dqrele(pdqp);
+	xfs_dqrele(udqp);
+	xfs_dqrele(gdqp);
+	xfs_dqrele(pdqp);
 
 	if (unlock_dp_on_error)
 		xfs_iunlock(dp, XFS_ILOCK_EXCL);


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] xfs: remote dquot hints
  2013-12-12  9:40 ` [PATCH 1/3] xfs: remote dquot hints Dave Chinner
@ 2013-12-12 18:33   ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2013-12-12 18:33 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] xfs: dquot refcounting by atomics
  2013-12-12  9:40 ` [PATCH 2/3] xfs: dquot refcounting by atomics Dave Chinner
@ 2013-12-13 13:23   ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2013-12-13 13:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/3] xfs: xfs_qm_dqrele mostly doesn't need locking
  2013-12-12 10:25 ` [PATCH 4/3] xfs: xfs_qm_dqrele mostly doesn't need locking Dave Chinner
@ 2013-12-13 13:28   ` Christoph Hellwig
  2013-12-13 21:30     ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2013-12-13 13:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Dec 12, 2013 at 09:25:07PM +1100, Dave Chinner wrote:
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> Now that we have an atomic variable for the reference count, we
> don't need to take the dquot lock if we are not removing the last
> reference count. The dquot lock is a mutex, so we can't use
> atomic_dec_and_lock(), but we can open code it in xfs_qm_dqrele and
> hence avoid the dquot lock for most of the cases where we drop a
> reference count.
> 
> The result is that concurrent file creates jump from 24,000/s to
> 28,000/s, and the entire workload is now serialised on the dquot
> being locked during transaction commit. Another significant win,
> even though it's not the big one...

Maybe I'm missing something, but shouldn't the following be enough to
be a valid dqput (plus asserts & tracing):


	if (atomic_dec_and_test(&dqp->q_nrefs)) {
		if (list_lru_add(&mp->m_quotainfo->qi_lru, &dqp->q_lru))
			XFS_STATS_INC(xs_qm_dquot_unused);
	}

given that the only locking we need is the internal lru lock?
> 
> While there, rename xfs_qm_dqrele to xfs_dqrele - the "qm" part of
> the name means nothing and just makes the code harder to read.

Please keep that out of the patch.  I don't mind dropping the
qm_ part, but there's a lot of functions that have it, and it should
be done for all of them at the same time.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] xfs: xfs_trans_dqresv() can be made lockless
  2013-12-12  9:40 ` [PATCH 3/3] xfs: xfs_trans_dqresv() can be made lockless Dave Chinner
@ 2013-12-13 13:37   ` Christoph Hellwig
  2013-12-16  0:11     ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2013-12-13 13:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Dec 12, 2013 at 08:40:58PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_trans_dqresv() serialises dquot modifications by taking the
> dquot lock while it is doing reservations. The thing is, nothing it
> does really requires exclusive access to the dquot except for the
> reservation accounting. We can do that locklessly with cmpxchg.

Can you split the various refactorings into separate patches to make
this more readable?

> +do_ninos:
> +	if (ninos == 0)
> +		goto do_trans;
> +
> +	smp_mb();
> +	timer = be32_to_cpu(dqp->q_core.d_itimer);
> +	warns = be16_to_cpu(dqp->q_core.d_iwarns);
> +	warnlimit = dqp->q_mount->m_quotainfo->qi_iwarnlimit;
> +	hardlimit = be64_to_cpu(dqp->q_core.d_ino_hardlimit);
> +	if (!hardlimit)
> +		hardlimit = q->qi_ihardlimit;
> +	softlimit = be64_to_cpu(dqp->q_core.d_ino_softlimit);
> +	if (!softlimit)
> +		softlimit = q->qi_isoftlimit;
> +	resbcountp = &dqp->q_res_icount;
> +
> +	oldcnt = xfs_dqresv_cmpxchg(mp, dqp, resbcountp, ninos, false, enforce,
> +				    hardlimit, softlimit, timer, warns,
> +				    warnlimit);
> +	if (oldcnt == (xfs_qcnt_t)-1ULL)
> +		goto error_undo_nblks;
> +
> +do_trans:

Instead of having all these goto labels maybe this should be factored
into helpers for each of the stages?

>  	if (udqp) {
> +		enforce = !(flags & XFS_QMOPT_FORCE_RES) &&
> +			  udqp->q_core.d_id && XFS_IS_UQUOTA_ENFORCED(mp);
>  		error = xfs_trans_dqresv(tp, mp, udqp, nblks, ninos,
> -					(flags & ~XFS_QMOPT_ENOSPC));
> +					(flags & ~XFS_QMOPT_ENOSPC), enforce);

I have to say I'd much prefer having the enforcement decision hidden
inside xfs_trans_dqresv.

>  		if (error)
>  			return error;
>  	}
>  
>  	if (gdqp) {
> -		error = xfs_trans_dqresv(tp, mp, gdqp, nblks, ninos, flags);
> +		enforce = !(flags & XFS_QMOPT_FORCE_RES) &&
> +			  gdqp->q_core.d_id && XFS_IS_GQUOTA_ENFORCED(mp);
> +		error = xfs_trans_dqresv(tp, mp, gdqp, nblks, ninos, flags,
> +					 enforce);
>  		if (error)

Unrelated to the patch: why do we clear XFS_QMOPT_ENOSPC for user
quotas, but not for project quotas here?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 5/3] xfs: return unlocked dquots from xfs_qm_dqget
  2013-12-12  9:40 [PATCH 0/3] xfs: dquot modification scalability Dave Chinner
                   ` (3 preceding siblings ...)
  2013-12-12 10:25 ` [PATCH 4/3] xfs: xfs_qm_dqrele mostly doesn't need locking Dave Chinner
@ 2013-12-13 16:30 ` Christoph Hellwig
  4 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2013-12-13 16:30 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

We generally don't need the dquot structure locked for the fast path
operations, and we don't need the lock during lookup either.

Signed-off-by: Christoph Hellwig <hch@lst.de>

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index f17350d..7da4097 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -689,8 +689,9 @@ error0:
 }
 
 /*
- * Given the file system, inode OR id, and type (UDQUOT/GDQUOT), return a
- * a locked dquot, doing an allocation (if requested) as needed.
+ * Given the file system, id, type and optionally inode, return an unlocked
+ * dquot, doing an allocation if requested and needed.
+ *
  * When both an inode and an id are given, the inode's id takes precedence.
  * That is, if the id changes while we don't hold the ilock inside this
  * function, the new dquot is returned, not necessarily the one requested
@@ -739,9 +740,7 @@ restart:
 	mutex_lock(&qi->qi_tree_lock);
 	dqp = radix_tree_lookup(tree, id);
 	if (dqp) {
-		xfs_dqlock(dqp);
 		if (dqp->dq_flags & XFS_DQ_FREEING) {
-			xfs_dqunlock(dqp);
 			mutex_unlock(&qi->qi_tree_lock);
 			trace_xfs_dqget_freeing(dqp);
 			delay(1);
@@ -824,7 +823,6 @@ restart:
 	/*
 	 * We return a locked, referenced dquot to the caller.
 	 */
-	xfs_dqlock(dqp);
 	qi->qi_dquots++;
 	mutex_unlock(&qi->qi_tree_lock);
 
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 843ab07..985b583 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -395,7 +395,6 @@ xfs_qm_dqattach_one(
 	 * that the dquot returned is the one that should go in the inode.
 	 */
 	*IO_idqpp = dqp;
-	xfs_dqunlock(dqp);
 	return 0;
 }
 
@@ -1142,6 +1141,8 @@ xfs_qm_quotacheck_dqadjust(
 
 	trace_xfs_dqadjust(dqp);
 
+	xfs_dqlock(dqp);
+
 	/*
 	 * Adjust the inode count and the block count to reflect this inode's
 	 * resource usage.
@@ -1679,10 +1680,6 @@ xfs_qm_vop_dqalloc(
 				ASSERT(error != ENOENT);
 				return error;
 			}
-			/*
-			 * Get the ilock in the right order.
-			 */
-			xfs_dqunlock(uq);
 			lockflags = XFS_ILOCK_SHARED;
 			xfs_ilock(ip, lockflags);
 		} else {
@@ -1706,7 +1703,6 @@ xfs_qm_vop_dqalloc(
 				ASSERT(error != ENOENT);
 				goto error_rele;
 			}
-			xfs_dqunlock(gq);
 			lockflags = XFS_ILOCK_SHARED;
 			xfs_ilock(ip, lockflags);
 		} else {
@@ -1726,7 +1722,6 @@ xfs_qm_vop_dqalloc(
 				ASSERT(error != ENOENT);
 				goto error_rele;
 			}
-			xfs_dqunlock(pq);
 			lockflags = XFS_ILOCK_SHARED;
 			xfs_ilock(ip, lockflags);
 		} else {
diff --git a/fs/xfs/xfs_qm_bhv.c b/fs/xfs/xfs_qm_bhv.c
index e9be63a..c8fda48 100644
--- a/fs/xfs/xfs_qm_bhv.c
+++ b/fs/xfs/xfs_qm_bhv.c
@@ -75,6 +75,7 @@ xfs_qm_statvfs(
 	xfs_dquot_t		*dqp;
 
 	if (!xfs_qm_dqget(mp, NULL, xfs_get_projid(ip), XFS_DQ_PROJ, 0, &dqp)) {
+		xfs_dqlock(dqp);
 		xfs_fill_statvfs_from_dquot(statp, dqp);
 		xfs_qm_dqput(dqp);
 	}
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index 1e61cd4..94dadbe 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -617,7 +617,6 @@ xfs_qm_scall_setqlim(
 		ASSERT(error != ENOENT);
 		goto out_unlock;
 	}
-	xfs_dqunlock(dqp);
 
 	tp = xfs_trans_alloc(mp, XFS_TRANS_QM_SETQLIM);
 	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_qm_setqlim, 0, 0);
@@ -844,6 +843,8 @@ xfs_qm_scall_getquota(
 	if (error)
 		return error;
 
+	xfs_dqlock(dqp);
+
 	/*
 	 * If everything's NULL, this dquot doesn't quite exist as far as
 	 * our utility programs are concerned.


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/3] xfs: xfs_qm_dqrele mostly doesn't need locking
  2013-12-13 13:28   ` Christoph Hellwig
@ 2013-12-13 21:30     ` Dave Chinner
  2013-12-16 18:21       ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2013-12-13 21:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Dec 13, 2013 at 05:28:07AM -0800, Christoph Hellwig wrote:
> On Thu, Dec 12, 2013 at 09:25:07PM +1100, Dave Chinner wrote:
> > 
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Now that we have an atomic variable for the reference count, we
> > don't need to take the dquot lock if we are not removing the last
> > reference count. The dquot lock is a mutex, so we can't use
> > atomic_dec_and_lock(), but we can open code it in xfs_qm_dqrele and
> > hence avoid the dquot lock for most of the cases where we drop a
> > reference count.
> > 
> > The result is that concurrent file creates jump from 24,000/s to
> > 28,000/s, and the entire workload is now serialised on the dquot
> > being locked during transaction commit. Another significant win,
> > even though it's not the big one...
> 
> Maybe I'm missing something, but shouldn't the following be enough to
> be a valid dqput (plus asserts & tracing):
> 
> 
> 	if (atomic_dec_and_test(&dqp->q_nrefs)) {
> 		if (list_lru_add(&mp->m_quotainfo->qi_lru, &dqp->q_lru))
> 			XFS_STATS_INC(xs_qm_dquot_unused);
> 	}
> 
> given that the only locking we need is the internal lru lock?

Yes, I think it is.

However, that involves changing all the callers of dqput to not hold
the dqlock when they call, which is a bigger change than was
necessary to avoid the lock contention problem. i.e. it doesn't seem
to be in a fast path that needed immediate fixing, so I didn't touch
it.

> > 
> > While there, rename xfs_qm_dqrele to xfs_dqrele - the "qm" part of
> > the name means nothing and just makes the code harder to read.
> 
> Please keep that out of the patch.  I don't mind dropping the
> qm_ part, but there's a lot of functions that have it, and it should
> be done for all of them at the same time.

OK.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] xfs: xfs_trans_dqresv() can be made lockless
  2013-12-13 13:37   ` Christoph Hellwig
@ 2013-12-16  0:11     ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2013-12-16  0:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Dec 13, 2013 at 05:37:45AM -0800, Christoph Hellwig wrote:
> On Thu, Dec 12, 2013 at 08:40:58PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > xfs_trans_dqresv() serialises dquot modifications by taking the
> > dquot lock while it is doing reservations. The thing is, nothing it
> > does really requires exclusive access to the dquot except for the
> > reservation accounting. We can do that locklessly with cmpxchg.
> 
> Can you split the various refactorings into separate patches to make
> this more readable?

Yeah, I probably can - this is just the patch that I ended up with
after making it all work....

> > +do_ninos:
> > +	if (ninos == 0)
> > +		goto do_trans;
> > +
> > +	smp_mb();
> > +	timer = be32_to_cpu(dqp->q_core.d_itimer);
> > +	warns = be16_to_cpu(dqp->q_core.d_iwarns);
> > +	warnlimit = dqp->q_mount->m_quotainfo->qi_iwarnlimit;
> > +	hardlimit = be64_to_cpu(dqp->q_core.d_ino_hardlimit);
> > +	if (!hardlimit)
> > +		hardlimit = q->qi_ihardlimit;
> > +	softlimit = be64_to_cpu(dqp->q_core.d_ino_softlimit);
> > +	if (!softlimit)
> > +		softlimit = q->qi_isoftlimit;
> > +	resbcountp = &dqp->q_res_icount;
> > +
> > +	oldcnt = xfs_dqresv_cmpxchg(mp, dqp, resbcountp, ninos, false, enforce,
> > +				    hardlimit, softlimit, timer, warns,
> > +				    warnlimit);
> > +	if (oldcnt == (xfs_qcnt_t)-1ULL)
> > +		goto error_undo_nblks;
> > +
> > +do_trans:
> 
> Instead of having all these goto labels maybe this should be factored
> into helpers for each of the stages?

I kind of wanted to get to the same place as the log grant heads,
where the limits are in a separate structure that abstracts this
out. But that turned out not to be immediately possible because the
info comes from different places.

I think, however, that I want to put the limits into a separate
in-memory structure, anyway, so that we don't have to access the
dquot core here. That way when we modify them, we can modify them in
memory and then apply them to the core during transaction commit,
like we do with the counters. IOWs, try and move away from needing
an in-memory copy of the core entirely, similar to what I think you
are trying to do with the inode code...

> >  	if (udqp) {
> > +		enforce = !(flags & XFS_QMOPT_FORCE_RES) &&
> > +			  udqp->q_core.d_id && XFS_IS_UQUOTA_ENFORCED(mp);
> >  		error = xfs_trans_dqresv(tp, mp, udqp, nblks, ninos,
> > -					(flags & ~XFS_QMOPT_ENOSPC));
> > +					(flags & ~XFS_QMOPT_ENOSPC), enforce);
> 
> I have to say I'd much prefer having the enforcement decision hidden
> inside xfs_trans_dqresv.
> 
> >  		if (error)
> >  			return error;
> >  	}
> >  
> >  	if (gdqp) {
> > -		error = xfs_trans_dqresv(tp, mp, gdqp, nblks, ninos, flags);
> > +		enforce = !(flags & XFS_QMOPT_FORCE_RES) &&
> > +			  gdqp->q_core.d_id && XFS_IS_GQUOTA_ENFORCED(mp);
> > +		error = xfs_trans_dqresv(tp, mp, gdqp, nblks, ninos, flags,
> > +					 enforce);
> >  		if (error)
> 
> Unrelated to the patch: why do we clear XFS_QMOPT_ENOSPC for user
> quotas, but not for project quotas here?

Project quotas return ENOSPC rather than EDQUOT when the quota hits
its limits. All that flag does is change the return value if we get
an EDQUOT. I think we can probably do that here rather than in
xfs_trans_dqresv() itself, which means that we don't need to mask it
out.

I'll see what I can turn this into, seeing as most of the issues
you've pointed out are with the structure of the code rather than
the algorithmic (cmpxchg) modification....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/3] xfs: xfs_qm_dqrele mostly doesn't need locking
  2013-12-13 21:30     ` Dave Chinner
@ 2013-12-16 18:21       ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2013-12-16 18:21 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Sat, Dec 14, 2013 at 08:30:06AM +1100, Dave Chinner wrote:
> > 	if (atomic_dec_and_test(&dqp->q_nrefs)) {
> > 		if (list_lru_add(&mp->m_quotainfo->qi_lru, &dqp->q_lru))
> > 			XFS_STATS_INC(xs_qm_dquot_unused);
> > 	}
> > 
> > given that the only locking we need is the internal lru lock?
> 
> Yes, I think it is.
> 
> However, that involves changing all the callers of dqput to not hold
> the dqlock when they call, which is a bigger change than was
> necessary to avoid the lock contention problem. i.e. it doesn't seem
> to be in a fast path that needed immediate fixing, so I didn't touch
> it.

Given that the lru list lock nests inside dqlock we can just turn
dqput into:

void
xfs_qm_dqput(
	struct xfs_dquot        *dqp)
{
	ASSERT(dqp->q_nrefs > 0);
	ASSERT(XFS_DQ_IS_LOCKED(dqp));

	trace_xfs_dqput(dqp);

	xfs_qm_dqrele(dqp);
	xfs_dqunlock(dqp);
}

But with my other patch we can probably replace most callers
with xfs_qm_dqrele, and the remaining ones with an opencoded
version that first drops the lock easily.


^ permalink raw reply	[flat|nested] 13+ messages in thread
