All of lore.kernel.org
 help / color / mirror / Atom feed
* filestream allocator rewrite
@ 2014-04-12  8:01 Christoph Hellwig
  2014-04-12  8:01 ` [PATCH 1/9] xfs: don't try to use the filestream allocator for metadata allocations Christoph Hellwig
                   ` (9 more replies)
  0 siblings, 10 replies; 15+ messages in thread
From: Christoph Hellwig @ 2014-04-12  8:01 UTC (permalink / raw)
  To: xfs

This series is a major overhaul of the filestream allocator.  The main point
is to use the dcache to get the parent and avoiding to maintain a fstrm_item
for each inode that the filestream is applied to in favor of just tracking
the parent, but there's various more or less related fallout from this as well.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/9] xfs: don't try to use the filestream allocator for metadata allocations
  2014-04-12  8:01 filestream allocator rewrite Christoph Hellwig
@ 2014-04-12  8:01 ` Christoph Hellwig
  2014-04-12  8:01 ` [PATCH 2/9] xfs: split xfs_bmap_btalloc_nullfb Christoph Hellwig
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2014-04-12  8:01 UTC (permalink / raw)
  To: xfs

xfs_bmap_btalloc_nullfb has two entirely different control flows when
using the filestream allocator vs the regular one, but it get the
conditionals wrong and ends up mixing the two for metadata allocations.
Fix this by adding a missing userdata check and slight refactoring.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap.c |   42 ++++++++++++++++++++----------------------
 1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index f0efc7e..3340f0e 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -3566,33 +3566,31 @@ xfs_bmap_btalloc_nullfb(
 		} else
 			notinit = 1;
 
-		if (xfs_inode_is_filestream(ap->ip)) {
+		if (xfs_inode_is_filestream(ap->ip) && ap->userdata) {
 			if (*blen >= args->maxlen)
 				break;
 
-			if (ap->userdata) {
-				/*
-				 * If startag is an invalid AG, we've
-				 * come here once before and
-				 * xfs_filestream_new_ag picked the
-				 * best currently available.
-				 *
-				 * Don't continue looping, since we
-				 * could loop forever.
-				 */
-				if (startag == NULLAGNUMBER)
-					break;
+			/*
+			 * If startag is an invalid AG, we've
+			 * come here once before and
+			 * xfs_filestream_new_ag picked the
+			 * best currently available.
+			 *
+			 * Don't continue looping, since we
+			 * could loop forever.
+			 */
+			if (startag == NULLAGNUMBER)
+				break;
 
-				error = xfs_filestream_new_ag(ap, &ag);
-				xfs_perag_put(pag);
-				if (error)
-					return error;
+			error = xfs_filestream_new_ag(ap, &ag);
+			xfs_perag_put(pag);
+			if (error)
+				return error;
 
-				/* loop again to set 'blen'*/
-				startag = NULLAGNUMBER;
-				pag = xfs_perag_get(mp, ag);
-				continue;
-			}
+			/* loop again to set 'blen'*/
+			startag = NULLAGNUMBER;
+			pag = xfs_perag_get(mp, ag);
+			continue;
 		}
 		if (++ag == mp->m_sb.sb_agcount)
 			ag = 0;
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/9] xfs: split xfs_bmap_btalloc_nullfb
  2014-04-12  8:01 filestream allocator rewrite Christoph Hellwig
  2014-04-12  8:01 ` [PATCH 1/9] xfs: don't try to use the filestream allocator for metadata allocations Christoph Hellwig
@ 2014-04-12  8:01 ` Christoph Hellwig
  2014-04-14  1:55   ` Dave Chinner
  2014-04-12  8:01 ` [PATCH 3/9] xfs: handle duplicate entries in xfs_mru_cache_insert Christoph Hellwig
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2014-04-12  8:01 UTC (permalink / raw)
  To: xfs

Split xfs_bmap_btalloc_nullfb into one function for filestream allocations
and one for everything else that share a few helpers.  This dramatically
simplifies the control flow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap.c |  200 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 117 insertions(+), 83 deletions(-)

diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 3340f0e..d5d52df 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -3517,6 +3517,67 @@ xfs_bmap_adjacent(
 #undef ISVALID
 }
 
+static int
+xfs_bmap_longest_free_extent(
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		ag,
+	xfs_extlen_t		*blen,
+	int			*notinit)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_perag	*pag;
+	xfs_extlen_t		longest;
+	int			error = 0;
+
+	pag = xfs_perag_get(mp, ag);
+	if (!pag->pagf_init) {
+		error = xfs_alloc_pagf_init(mp, tp, ag, XFS_ALLOC_FLAG_TRYLOCK);
+		if (error)
+			goto out;
+
+		if (!pag->pagf_init) {
+			*notinit = 1;
+			goto out;
+		}
+	}
+
+	longest = xfs_alloc_longest_free_extent(mp, pag);
+	if (*blen < longest)
+		*blen = longest;
+
+out:
+	xfs_perag_put(pag);
+	return error;
+}
+
+static void
+xfs_bmap_fix_args(
+	struct xfs_bmalloca	*ap,
+	struct xfs_alloc_arg	*args,
+	xfs_extlen_t		*blen,
+	int			notinit)
+{
+	if (notinit || *blen < ap->minlen) {
+		/*
+		 * Since we did a BUF_TRYLOCK above, it is possible that
+		 * there is space for this request.
+		 */
+		args->minlen = ap->minlen;
+	} else if (*blen < args->maxlen) {
+		/*
+		 * If the best seen length is less than the request length,
+		 * use the best as the minimum.
+		 */
+		args->minlen = *blen;
+	} else {
+		/*
+		 * Otherwise we've seen an extent as big as maxlen, use that
+		 * as the minimum.
+		 */
+		args->minlen = args->maxlen;
+	}
+}
+
 STATIC int
 xfs_bmap_btalloc_nullfb(
 	struct xfs_bmalloca	*ap,
@@ -3524,109 +3585,74 @@ xfs_bmap_btalloc_nullfb(
 	xfs_extlen_t		*blen)
 {
 	struct xfs_mount	*mp = ap->ip->i_mount;
-	struct xfs_perag	*pag;
 	xfs_agnumber_t		ag, startag;
 	int			notinit = 0;
 	int			error;
 
-	if (ap->userdata && xfs_inode_is_filestream(ap->ip))
-		args->type = XFS_ALLOCTYPE_NEAR_BNO;
-	else
-		args->type = XFS_ALLOCTYPE_START_BNO;
+	args->type = XFS_ALLOCTYPE_START_BNO;
 	args->total = ap->total;
 
-	/*
-	 * Search for an allocation group with a single extent large enough
-	 * for the request.  If one isn't found, then adjust the minimum
-	 * allocation size to the largest space found.
-	 */
 	startag = ag = XFS_FSB_TO_AGNO(mp, args->fsbno);
 	if (startag == NULLAGNUMBER)
 		startag = ag = 0;
 
-	pag = xfs_perag_get(mp, ag);
 	while (*blen < args->maxlen) {
-		if (!pag->pagf_init) {
-			error = xfs_alloc_pagf_init(mp, args->tp, ag,
-						    XFS_ALLOC_FLAG_TRYLOCK);
-			if (error) {
-				xfs_perag_put(pag);
-				return error;
-			}
-		}
-
-		/*
-		 * See xfs_alloc_fix_freelist...
-		 */
-		if (pag->pagf_init) {
-			xfs_extlen_t	longest;
-			longest = xfs_alloc_longest_free_extent(mp, pag);
-			if (*blen < longest)
-				*blen = longest;
-		} else
-			notinit = 1;
-
-		if (xfs_inode_is_filestream(ap->ip) && ap->userdata) {
-			if (*blen >= args->maxlen)
-				break;
-
-			/*
-			 * If startag is an invalid AG, we've
-			 * come here once before and
-			 * xfs_filestream_new_ag picked the
-			 * best currently available.
-			 *
-			 * Don't continue looping, since we
-			 * could loop forever.
-			 */
-			if (startag == NULLAGNUMBER)
-				break;
-
-			error = xfs_filestream_new_ag(ap, &ag);
-			xfs_perag_put(pag);
-			if (error)
-				return error;
+		error = xfs_bmap_longest_free_extent(args->tp, ag, blen,
+						     &notinit);
+		if (error)
+			return error;
 
-			/* loop again to set 'blen'*/
-			startag = NULLAGNUMBER;
-			pag = xfs_perag_get(mp, ag);
-			continue;
-		}
 		if (++ag == mp->m_sb.sb_agcount)
 			ag = 0;
 		if (ag == startag)
 			break;
-		xfs_perag_put(pag);
-		pag = xfs_perag_get(mp, ag);
 	}
-	xfs_perag_put(pag);
 
-	/*
-	 * Since the above loop did a BUF_TRYLOCK, it is
-	 * possible that there is space for this request.
-	 */
-	if (notinit || *blen < ap->minlen)
-		args->minlen = ap->minlen;
-	/*
-	 * If the best seen length is less than the request
-	 * length, use the best as the minimum.
-	 */
-	else if (*blen < args->maxlen)
-		args->minlen = *blen;
-	/*
-	 * Otherwise we've seen an extent as big as maxlen,
-	 * use that as the minimum.
-	 */
-	else
-		args->minlen = args->maxlen;
+	xfs_bmap_fix_args(ap, args, blen, notinit);
+	return 0;
+}
+
+STATIC int
+xfs_bmap_btalloc_filestreams(
+	struct xfs_bmalloca	*ap,
+	struct xfs_alloc_arg	*args,
+	xfs_extlen_t		*blen)
+{
+	struct xfs_mount	*mp = ap->ip->i_mount;
+	xfs_agnumber_t		ag;
+	int			notinit = 0;
+	int			error;
+
+	args->type = XFS_ALLOCTYPE_NEAR_BNO;
+	args->total = ap->total;
+
+	ag = XFS_FSB_TO_AGNO(mp, args->fsbno);
+	if (ag == NULLAGNUMBER)
+		ag = 0;
+
+	error = xfs_bmap_longest_free_extent(args->tp, ag, blen, &notinit);
+	if (error)
+		return error;
+
+	if (*blen < args->maxlen) {
+		error = xfs_filestream_new_ag(ap, &ag);
+		if (error)
+			return error;
+
+		error = xfs_bmap_longest_free_extent(args->tp, ag, blen,
+						     &notinit);
+		if (error)
+			return error;
+
+	}
+
+	xfs_bmap_fix_args(ap, args, blen, notinit);
 
 	/*
-	 * set the failure fallback case to look in the selected
-	 * AG as the stream may have moved.
+	 * Set the failure fallback case to look in the selected AG as stream
+	 * may have moved.
 	 */
-	if (xfs_inode_is_filestream(ap->ip))
-		ap->blkno = args->fsbno = XFS_AGB_TO_FSB(mp, ag, 0);
-
+	ap->blkno = args->fsbno = XFS_AGB_TO_FSB(mp, ag, 0);
 	return 0;
 }
 
@@ -3706,7 +3732,15 @@ xfs_bmap_btalloc(
 	args.firstblock = *ap->firstblock;
 	blen = 0;
 	if (nullfb) {
-		error = xfs_bmap_btalloc_nullfb(ap, &args, &blen);
+		/*
+		 * Search for an allocation group with a single extent large
+		 * enough for the request.  If one isn't found, then adjust
+		 * the minimum allocation size to the largest space found.
+		 */
+		if (ap->userdata && xfs_inode_is_filestream(ap->ip))
+			error = xfs_bmap_btalloc_filestreams(ap, &args, &blen);
+		else
+			error = xfs_bmap_btalloc_nullfb(ap, &args, &blen);
 		if (error)
 			return error;
 	} else if (ap->flist->xbf_low) {
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/9] xfs: handle duplicate entries in xfs_mru_cache_insert
  2014-04-12  8:01 filestream allocator rewrite Christoph Hellwig
  2014-04-12  8:01 ` [PATCH 1/9] xfs: don't try to use the filestream allocator for metadata allocations Christoph Hellwig
  2014-04-12  8:01 ` [PATCH 2/9] xfs: split xfs_bmap_btalloc_nullfb Christoph Hellwig
@ 2014-04-12  8:01 ` Christoph Hellwig
  2014-04-12  8:01 ` [PATCH 4/9] xfs: embedd mru_elem into parent structure Christoph Hellwig
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2014-04-12  8:01 UTC (permalink / raw)
  To: xfs

The radix tree code can detect and reject duplicate keys at insert
time.  Make xfs_mru_cache_insert handle this case so that future
changes to the filestream allocator can take advantage of this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_mru_cache.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
index 4aff563..4aa3166 100644
--- a/fs/xfs/xfs_mru_cache.c
+++ b/fs/xfs/xfs_mru_cache.c
@@ -443,6 +443,7 @@ xfs_mru_cache_insert(
 	void		*value)
 {
 	xfs_mru_cache_elem_t *elem;
+	int error;
 
 	ASSERT(mru && mru->lists);
 	if (!mru || !mru->lists)
@@ -453,8 +454,8 @@ xfs_mru_cache_insert(
 		return ENOMEM;
 
 	if (radix_tree_preload(GFP_KERNEL)) {
-		kmem_zone_free(xfs_mru_elem_zone, elem);
-		return ENOMEM;
+		error = ENOMEM;
+		goto out_free_item;
 	}
 
 	INIT_LIST_HEAD(&elem->list_node);
@@ -463,13 +464,20 @@ xfs_mru_cache_insert(
 
 	spin_lock(&mru->lock);
 
-	radix_tree_insert(&mru->store, key, elem);
+	error = -radix_tree_insert(&mru->store, key, elem);
 	radix_tree_preload_end();
+	if (error) {
+		spin_unlock(&mru->lock);
+		goto out_free_item;
+	}
 	_xfs_mru_cache_list_insert(mru, elem);
 
 	spin_unlock(&mru->lock);
 
 	return 0;
+out_free_item:
+	kmem_zone_free(xfs_mru_elem_zone, elem);
+	return error;
 }
 
 /*
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/9] xfs: embedd mru_elem into parent structure
  2014-04-12  8:01 filestream allocator rewrite Christoph Hellwig
                   ` (2 preceding siblings ...)
  2014-04-12  8:01 ` [PATCH 3/9] xfs: handle duplicate entries in xfs_mru_cache_insert Christoph Hellwig
@ 2014-04-12  8:01 ` Christoph Hellwig
  2014-04-12  8:01 ` [PATCH 5/9] xfs: remove XFS_IFILESTREAM Christoph Hellwig
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2014-04-12  8:01 UTC (permalink / raw)
  To: xfs

There is no need to do a separate allocation for each mru element, just
embedd the structure into the parent one in the user.  Besides saving
a memory allocation and the infrastructure required for it this also
simplifies the API.

While we do major surgery on xfs_mru_cache.c also de-typedef it and
make struct mru_cache private to the implementation file.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_filestream.c |   67 ++++++++++----------
 fs/xfs/xfs_mru_cache.c  |  155 +++++++++++++++++++----------------------------
 fs/xfs/xfs_mru_cache.h  |   31 ++++------
 3 files changed, 107 insertions(+), 146 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 12b6e77..dde529f 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -118,6 +118,7 @@ static kmem_zone_t *item_zone;
  */
 typedef struct fstrm_item
 {
+	struct xfs_mru_cache_elem mru;
 	xfs_agnumber_t	ag;	/* AG currently in use for the file/directory. */
 	xfs_inode_t	*ip;	/* inode self-pointer. */
 	xfs_inode_t	*pip;	/* Parent directory inode pointer. */
@@ -335,10 +336,10 @@ _xfs_filestream_update_ag(
 {
 	int		err = 0;
 	xfs_mount_t	*mp;
-	xfs_mru_cache_t	*cache;
 	fstrm_item_t	*item;
 	xfs_agnumber_t	old_ag;
 	xfs_inode_t	*old_pip;
+	struct xfs_mru_cache_elem *mru;
 
 	/*
 	 * Either ip is a regular file and pip is a directory, or ip is a
@@ -349,16 +350,17 @@ _xfs_filestream_update_ag(
 	              (S_ISDIR(ip->i_d.di_mode) && !pip)));
 
 	mp = ip->i_mount;
-	cache = mp->m_filestream;
 
-	item = xfs_mru_cache_lookup(cache, ip->i_ino);
-	if (item) {
+	mru = xfs_mru_cache_lookup(mp->m_filestream, ip->i_ino);
+	if (mru) {
+		item = container_of(mru, fstrm_item_t, mru);
+
 		ASSERT(item->ip == ip);
 		old_ag = item->ag;
 		item->ag = ag;
 		old_pip = item->pip;
 		item->pip = pip;
-		xfs_mru_cache_done(cache);
+		xfs_mru_cache_done(mp->m_filestream);
 
 		/*
 		 * If the AG has changed, drop the old ref and take a new one,
@@ -391,7 +393,7 @@ _xfs_filestream_update_ag(
 	item->ip = ip;
 	item->pip = pip;
 
-	err = xfs_mru_cache_insert(cache, ip->i_ino, item);
+	err = xfs_mru_cache_insert(mp->m_filestream, ip->i_ino, &item->mru);
 	if (err) {
 		kmem_zone_free(item_zone, item);
 		return err;
@@ -422,14 +424,12 @@ _xfs_filestream_update_ag(
 /* xfs_fstrm_free_func(): callback for freeing cached stream items. */
 STATIC void
 xfs_fstrm_free_func(
-	unsigned long	ino,
-	void		*data)
+	struct xfs_mru_cache_elem *mru)
 {
-	fstrm_item_t	*item  = (fstrm_item_t *)data;
+	fstrm_item_t	*item =
+		container_of(mru, fstrm_item_t, mru);
 	xfs_inode_t	*ip = item->ip;
 
-	ASSERT(ip->i_ino == ino);
-
 	xfs_iflags_clear(ip, XFS_IFILESTREAM);
 
 	/* Drop the reference taken on the AG when the item was added. */
@@ -532,7 +532,8 @@ xfs_agnumber_t
 xfs_filestream_lookup_ag(
 	xfs_inode_t	*ip)
 {
-	xfs_mru_cache_t	*cache;
+	struct xfs_mount *mp = ip->i_mount;
+	struct xfs_mru_cache_elem *mru;
 	fstrm_item_t	*item;
 	xfs_agnumber_t	ag;
 	int		ref;
@@ -542,17 +543,17 @@ xfs_filestream_lookup_ag(
 		return NULLAGNUMBER;
 	}
 
-	cache = ip->i_mount->m_filestream;
-	item = xfs_mru_cache_lookup(cache, ip->i_ino);
-	if (!item) {
+	mru = xfs_mru_cache_lookup(mp->m_filestream, ip->i_ino);
+	if (!mru) {
 		TRACE_LOOKUP(ip->i_mount, ip, NULL, NULLAGNUMBER, 0);
 		return NULLAGNUMBER;
 	}
 
+	item = container_of(mru, fstrm_item_t, mru);
 	ASSERT(ip == item->ip);
 	ag = item->ag;
 	ref = xfs_filestream_peek_ag(ip->i_mount, ag);
-	xfs_mru_cache_done(cache);
+	xfs_mru_cache_done(mp->m_filestream);
 
 	TRACE_LOOKUP(ip->i_mount, ip, item->pip, ag, ref);
 	return ag;
@@ -573,8 +574,8 @@ xfs_filestream_associate(
 	xfs_inode_t	*pip,
 	xfs_inode_t	*ip)
 {
+	struct xfs_mru_cache_elem *mru;
 	xfs_mount_t	*mp;
-	xfs_mru_cache_t	*cache;
 	fstrm_item_t	*item;
 	xfs_agnumber_t	ag, rotorstep, startag;
 	int		err = 0;
@@ -585,7 +586,6 @@ xfs_filestream_associate(
 		return -EINVAL;
 
 	mp = pip->i_mount;
-	cache = mp->m_filestream;
 
 	/*
 	 * We have a problem, Houston.
@@ -606,11 +606,13 @@ xfs_filestream_associate(
 		return 1;
 
 	/* If the parent directory is already in the cache, use its AG. */
-	item = xfs_mru_cache_lookup(cache, pip->i_ino);
-	if (item) {
+	mru = xfs_mru_cache_lookup(mp->m_filestream, pip->i_ino);
+	if (mru) {
+		item = container_of(mru, fstrm_item_t, mru);
+
 		ASSERT(item->ip == pip);
 		ag = item->ag;
-		xfs_mru_cache_done(cache);
+		xfs_mru_cache_done(mp->m_filestream);
 
 		TRACE_LOOKUP(mp, pip, pip, ag, xfs_filestream_peek_ag(mp, ag));
 		err = _xfs_filestream_update_ag(ip, pip, ag);
@@ -671,17 +673,16 @@ xfs_filestream_new_ag(
 	struct xfs_bmalloca	*ap,
 	xfs_agnumber_t		*agp)
 {
+	struct xfs_mru_cache_elem *mru, *mru2;
 	int		flags, err;
 	xfs_inode_t	*ip, *pip = NULL;
 	xfs_mount_t	*mp;
-	xfs_mru_cache_t	*cache;
 	xfs_extlen_t	minlen;
 	fstrm_item_t	*dir, *file;
 	xfs_agnumber_t	ag = NULLAGNUMBER;
 
 	ip = ap->ip;
 	mp = ip->i_mount;
-	cache = mp->m_filestream;
 	minlen = ap->length;
 	*agp = NULLAGNUMBER;
 
@@ -689,8 +690,9 @@ xfs_filestream_new_ag(
 	 * Look for the file in the cache, removing it if it's found.  Doing
 	 * this allows it to be held across the dir lookup that follows.
 	 */
-	file = xfs_mru_cache_remove(cache, ip->i_ino);
-	if (file) {
+	mru = xfs_mru_cache_remove(mp->m_filestream, ip->i_ino);
+	if (mru) {
+		file = container_of(mru, fstrm_item_t, mru);
 		ASSERT(ip == file->ip);
 
 		/* Save the file's parent inode and old AG number for later. */
@@ -698,8 +700,9 @@ xfs_filestream_new_ag(
 		ag = file->ag;
 
 		/* Look for the file's directory in the cache. */
-		dir = xfs_mru_cache_lookup(cache, pip->i_ino);
-		if (dir) {
+		mru2 = xfs_mru_cache_lookup(mp->m_filestream, pip->i_ino);
+		if (mru2) {
+			dir = container_of(mru2, fstrm_item_t, mru);
 			ASSERT(pip == dir->ip);
 
 			/*
@@ -714,7 +717,7 @@ xfs_filestream_new_ag(
 				*agp = file->ag = dir->ag;
 			}
 
-			xfs_mru_cache_done(cache);
+			xfs_mru_cache_done(mp->m_filestream);
 		}
 
 		/*
@@ -722,9 +725,9 @@ xfs_filestream_new_ag(
 		 * function needs to be called to tidy up in the same way as if
 		 * the item had simply expired from the cache.
 		 */
-		err = xfs_mru_cache_insert(cache, ip->i_ino, file);
+		err = xfs_mru_cache_insert(mp->m_filestream, ip->i_ino, mru);
 		if (err) {
-			xfs_fstrm_free_func(ip->i_ino, file);
+			xfs_fstrm_free_func(mru);
 			return err;
 		}
 
@@ -818,7 +821,5 @@ void
 xfs_filestream_deassociate(
 	xfs_inode_t	*ip)
 {
-	xfs_mru_cache_t	*cache = ip->i_mount->m_filestream;
-
-	xfs_mru_cache_delete(cache, ip->i_ino);
+	xfs_mru_cache_delete(ip->i_mount->m_filestream, ip->i_ino);
 }
diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
index 4aa3166..f99b493 100644
--- a/fs/xfs/xfs_mru_cache.c
+++ b/fs/xfs/xfs_mru_cache.c
@@ -100,14 +100,20 @@
  * likely result in a loop in one of the lists.  That's a sure-fire recipe for
  * an infinite loop in the code.
  */
-typedef struct xfs_mru_cache_elem
-{
-	struct list_head list_node;
-	unsigned long	key;
-	void		*value;
-} xfs_mru_cache_elem_t;
+struct xfs_mru_cache {
+	struct radix_tree_root	store;     /* Core storage data structure.  */
+	struct list_head	*lists;    /* Array of lists, one per grp.  */
+	struct list_head	reap_list; /* Elements overdue for reaping. */
+	spinlock_t		lock;      /* Lock to protect this struct.  */
+	unsigned int		grp_count; /* Number of discrete groups.    */
+	unsigned int		grp_time;  /* Time period spanned by grps.  */
+	unsigned int		lru_grp;   /* Group containing time zero.   */
+	unsigned long		time_zero; /* Time first element was added. */
+	xfs_mru_cache_free_func_t free_func; /* Function pointer for freeing. */
+	struct delayed_work	work;      /* Workqueue data for reaping.   */
+	unsigned int		queued;	   /* work has been queued */
+};
 
-static kmem_zone_t		*xfs_mru_elem_zone;
 static struct workqueue_struct	*xfs_mru_reap_wq;
 
 /*
@@ -129,12 +135,12 @@ static struct workqueue_struct	*xfs_mru_reap_wq;
  */
 STATIC unsigned long
 _xfs_mru_cache_migrate(
-	xfs_mru_cache_t	*mru,
-	unsigned long	now)
+	struct xfs_mru_cache	*mru,
+	unsigned long		now)
 {
-	unsigned int	grp;
-	unsigned int	migrated = 0;
-	struct list_head *lru_list;
+	unsigned int		grp;
+	unsigned int		migrated = 0;
+	struct list_head	*lru_list;
 
 	/* Nothing to do if the data store is empty. */
 	if (!mru->time_zero)
@@ -193,11 +199,11 @@ _xfs_mru_cache_migrate(
  */
 STATIC void
 _xfs_mru_cache_list_insert(
-	xfs_mru_cache_t		*mru,
-	xfs_mru_cache_elem_t	*elem)
+	struct xfs_mru_cache	*mru,
+	struct xfs_mru_cache_elem *elem)
 {
-	unsigned int	grp = 0;
-	unsigned long	now = jiffies;
+	unsigned int		grp = 0;
+	unsigned long		now = jiffies;
 
 	/*
 	 * If the data store is empty, initialise time zero, leave grp set to
@@ -231,10 +237,10 @@ _xfs_mru_cache_list_insert(
  */
 STATIC void
 _xfs_mru_cache_clear_reap_list(
-	xfs_mru_cache_t		*mru) __releases(mru->lock) __acquires(mru->lock)
-
+	struct xfs_mru_cache	*mru)
+		__releases(mru->lock) __acquires(mru->lock)
 {
-	xfs_mru_cache_elem_t	*elem, *next;
+	struct xfs_mru_cache_elem *elem, *next;
 	struct list_head	tmp;
 
 	INIT_LIST_HEAD(&tmp);
@@ -252,15 +258,8 @@ _xfs_mru_cache_clear_reap_list(
 	spin_unlock(&mru->lock);
 
 	list_for_each_entry_safe(elem, next, &tmp, list_node) {
-
-		/* Remove the element from the reap list. */
 		list_del_init(&elem->list_node);
-
-		/* Call the client's free function with the key and value pointer. */
-		mru->free_func(elem->key, elem->value);
-
-		/* Free the element structure. */
-		kmem_zone_free(xfs_mru_elem_zone, elem);
+		mru->free_func(elem);
 	}
 
 	spin_lock(&mru->lock);
@@ -277,7 +276,8 @@ STATIC void
 _xfs_mru_cache_reap(
 	struct work_struct	*work)
 {
-	xfs_mru_cache_t		*mru = container_of(work, xfs_mru_cache_t, work.work);
+	struct xfs_mru_cache	*mru =
+		container_of(work, struct xfs_mru_cache, work.work);
 	unsigned long		now, next;
 
 	ASSERT(mru && mru->lists);
@@ -304,28 +304,16 @@ _xfs_mru_cache_reap(
 int
 xfs_mru_cache_init(void)
 {
-	xfs_mru_elem_zone = kmem_zone_init(sizeof(xfs_mru_cache_elem_t),
-	                                 "xfs_mru_cache_elem");
-	if (!xfs_mru_elem_zone)
-		goto out;
-
 	xfs_mru_reap_wq = alloc_workqueue("xfs_mru_cache", WQ_MEM_RECLAIM, 1);
 	if (!xfs_mru_reap_wq)
-		goto out_destroy_mru_elem_zone;
-
+		return -ENOMEM;
 	return 0;
-
- out_destroy_mru_elem_zone:
-	kmem_zone_destroy(xfs_mru_elem_zone);
- out:
-	return -ENOMEM;
 }
 
 void
 xfs_mru_cache_uninit(void)
 {
 	destroy_workqueue(xfs_mru_reap_wq);
-	kmem_zone_destroy(xfs_mru_elem_zone);
 }
 
 /*
@@ -336,14 +324,14 @@ xfs_mru_cache_uninit(void)
  */
 int
 xfs_mru_cache_create(
-	xfs_mru_cache_t		**mrup,
+	struct xfs_mru_cache	**mrup,
 	unsigned int		lifetime_ms,
 	unsigned int		grp_count,
 	xfs_mru_cache_free_func_t free_func)
 {
-	xfs_mru_cache_t	*mru = NULL;
-	int		err = 0, grp;
-	unsigned int	grp_time;
+	struct xfs_mru_cache	*mru = NULL;
+	int			err = 0, grp;
+	unsigned int		grp_time;
 
 	if (mrup)
 		*mrup = NULL;
@@ -400,7 +388,7 @@ exit:
  */
 static void
 xfs_mru_cache_flush(
-	xfs_mru_cache_t		*mru)
+	struct xfs_mru_cache	*mru)
 {
 	if (!mru || !mru->lists)
 		return;
@@ -420,7 +408,7 @@ xfs_mru_cache_flush(
 
 void
 xfs_mru_cache_destroy(
-	xfs_mru_cache_t		*mru)
+	struct xfs_mru_cache	*mru)
 {
 	if (!mru || !mru->lists)
 		return;
@@ -438,45 +426,29 @@ xfs_mru_cache_destroy(
  */
 int
 xfs_mru_cache_insert(
-	xfs_mru_cache_t	*mru,
-	unsigned long	key,
-	void		*value)
+	struct xfs_mru_cache	*mru,
+	unsigned long		key,
+	struct xfs_mru_cache_elem *elem)
 {
-	xfs_mru_cache_elem_t *elem;
-	int error;
+	int			error;
 
 	ASSERT(mru && mru->lists);
 	if (!mru || !mru->lists)
 		return EINVAL;
 
-	elem = kmem_zone_zalloc(xfs_mru_elem_zone, KM_SLEEP);
-	if (!elem)
+	if (radix_tree_preload(GFP_KERNEL))
 		return ENOMEM;
 
-	if (radix_tree_preload(GFP_KERNEL)) {
-		error = ENOMEM;
-		goto out_free_item;
-	}
-
 	INIT_LIST_HEAD(&elem->list_node);
 	elem->key = key;
-	elem->value = value;
 
 	spin_lock(&mru->lock);
-
 	error = -radix_tree_insert(&mru->store, key, elem);
 	radix_tree_preload_end();
-	if (error) {
-		spin_unlock(&mru->lock);
-		goto out_free_item;
-	}
-	_xfs_mru_cache_list_insert(mru, elem);
-
+	if (!error)
+		_xfs_mru_cache_list_insert(mru, elem);
 	spin_unlock(&mru->lock);
 
-	return 0;
-out_free_item:
-	kmem_zone_free(xfs_mru_elem_zone, elem);
 	return error;
 }
 
@@ -486,13 +458,12 @@ out_free_item:
  * the client data pointer for the removed element is returned, otherwise this
  * function will return a NULL pointer.
  */
-void *
+struct xfs_mru_cache_elem *
 xfs_mru_cache_remove(
-	xfs_mru_cache_t	*mru,
-	unsigned long	key)
+	struct xfs_mru_cache	*mru,
+	unsigned long		key)
 {
-	xfs_mru_cache_elem_t *elem;
-	void		*value = NULL;
+	struct xfs_mru_cache_elem *elem;
 
 	ASSERT(mru && mru->lists);
 	if (!mru || !mru->lists)
@@ -500,17 +471,11 @@ xfs_mru_cache_remove(
 
 	spin_lock(&mru->lock);
 	elem = radix_tree_delete(&mru->store, key);
-	if (elem) {
-		value = elem->value;
+	if (elem)
 		list_del(&elem->list_node);
-	}
-
 	spin_unlock(&mru->lock);
 
-	if (elem)
-		kmem_zone_free(xfs_mru_elem_zone, elem);
-
-	return value;
+	return elem;
 }
 
 /*
@@ -519,13 +484,14 @@ xfs_mru_cache_remove(
  */
 void
 xfs_mru_cache_delete(
-	xfs_mru_cache_t	*mru,
-	unsigned long	key)
+	struct xfs_mru_cache	*mru,
+	unsigned long		key)
 {
-	void		*value = xfs_mru_cache_remove(mru, key);
+	struct xfs_mru_cache_elem *elem;
 
-	if (value)
-		mru->free_func(key, value);
+	elem = xfs_mru_cache_remove(mru, key);
+	if (elem)
+		mru->free_func(elem);
 }
 
 /*
@@ -548,12 +514,12 @@ xfs_mru_cache_delete(
  * status, we need to help it get it right by annotating the path that does
  * not release the lock.
  */
-void *
+struct xfs_mru_cache_elem *
 xfs_mru_cache_lookup(
-	xfs_mru_cache_t	*mru,
-	unsigned long	key)
+	struct xfs_mru_cache	*mru,
+	unsigned long		key)
 {
-	xfs_mru_cache_elem_t *elem;
+	struct xfs_mru_cache_elem *elem;
 
 	ASSERT(mru && mru->lists);
 	if (!mru || !mru->lists)
@@ -568,7 +534,7 @@ xfs_mru_cache_lookup(
 	} else
 		spin_unlock(&mru->lock);
 
-	return elem ? elem->value : NULL;
+	return elem;
 }
 
 /*
@@ -578,7 +544,8 @@ xfs_mru_cache_lookup(
  */
 void
 xfs_mru_cache_done(
-	xfs_mru_cache_t	*mru) __releases(mru->lock)
+	struct xfs_mru_cache	*mru)
+		__releases(mru->lock)
 {
 	spin_unlock(&mru->lock);
 }
diff --git a/fs/xfs/xfs_mru_cache.h b/fs/xfs/xfs_mru_cache.h
index 36dd3ec..fb5245b 100644
--- a/fs/xfs/xfs_mru_cache.h
+++ b/fs/xfs/xfs_mru_cache.h
@@ -18,24 +18,15 @@
 #ifndef __XFS_MRU_CACHE_H__
 #define __XFS_MRU_CACHE_H__
 
+struct xfs_mru_cache;
 
-/* Function pointer type for callback to free a client's data pointer. */
-typedef void (*xfs_mru_cache_free_func_t)(unsigned long, void*);
+struct xfs_mru_cache_elem {
+	struct list_head list_node;
+	unsigned long	key;
+};
 
-typedef struct xfs_mru_cache
-{
-	struct radix_tree_root	store;     /* Core storage data structure.  */
-	struct list_head	*lists;    /* Array of lists, one per grp.  */
-	struct list_head	reap_list; /* Elements overdue for reaping. */
-	spinlock_t		lock;      /* Lock to protect this struct.  */
-	unsigned int		grp_count; /* Number of discrete groups.    */
-	unsigned int		grp_time;  /* Time period spanned by grps.  */
-	unsigned int		lru_grp;   /* Group containing time zero.   */
-	unsigned long		time_zero; /* Time first element was added. */
-	xfs_mru_cache_free_func_t free_func; /* Function pointer for freeing. */
-	struct delayed_work	work;      /* Workqueue data for reaping.   */
-	unsigned int		queued;	   /* work has been queued */
-} xfs_mru_cache_t;
+/* Function pointer type for callback to free a client's data pointer. */
+typedef void (*xfs_mru_cache_free_func_t)(struct xfs_mru_cache_elem *elem);
 
 int xfs_mru_cache_init(void);
 void xfs_mru_cache_uninit(void);
@@ -44,10 +35,12 @@ int xfs_mru_cache_create(struct xfs_mru_cache **mrup, unsigned int lifetime_ms,
 			     xfs_mru_cache_free_func_t free_func);
 void xfs_mru_cache_destroy(struct xfs_mru_cache *mru);
 int xfs_mru_cache_insert(struct xfs_mru_cache *mru, unsigned long key,
-				void *value);
-void * xfs_mru_cache_remove(struct xfs_mru_cache *mru, unsigned long key);
+		struct xfs_mru_cache_elem *elem);
+struct xfs_mru_cache_elem *
+xfs_mru_cache_remove(struct xfs_mru_cache *mru, unsigned long key);
 void xfs_mru_cache_delete(struct xfs_mru_cache *mru, unsigned long key);
-void *xfs_mru_cache_lookup(struct xfs_mru_cache *mru, unsigned long key);
+struct xfs_mru_cache_elem *
+xfs_mru_cache_lookup(struct xfs_mru_cache *mru, unsigned long key);
 void xfs_mru_cache_done(struct xfs_mru_cache *mru);
 
 #endif /* __XFS_MRU_CACHE_H__ */
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/9] xfs: remove XFS_IFILESTREAM
  2014-04-12  8:01 filestream allocator rewrite Christoph Hellwig
                   ` (3 preceding siblings ...)
  2014-04-12  8:01 ` [PATCH 4/9] xfs: embedd mru_elem into parent structure Christoph Hellwig
@ 2014-04-12  8:01 ` Christoph Hellwig
  2014-04-12  8:02 ` [PATCH 6/9] xfs: rewrite the filestream allocator using the dentry cache Christoph Hellwig
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2014-04-12  8:01 UTC (permalink / raw)
  To: xfs

We never test the flag except in xfs_inode_is_filestream, but that
function already tests the on-disk flag or filesystem wide flags,
and is used to decide if we want to set XFS_IFILESTREAM in the
first place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_filestream.c |    2 --
 fs/xfs/xfs_filestream.h |    1 -
 fs/xfs/xfs_inode.c      |    2 --
 fs/xfs/xfs_inode.h      |    4 +---
 4 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index dde529f..c422110 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -430,8 +430,6 @@ xfs_fstrm_free_func(
 		container_of(mru, fstrm_item_t, mru);
 	xfs_inode_t	*ip = item->ip;
 
-	xfs_iflags_clear(ip, XFS_IFILESTREAM);
-
 	/* Drop the reference taken on the AG when the item was added. */
 	xfs_filestream_put_ag(ip->i_mount, item->ag);
 
diff --git a/fs/xfs/xfs_filestream.h b/fs/xfs/xfs_filestream.h
index 6d61dbe..c4fa9a0 100644
--- a/fs/xfs/xfs_filestream.h
+++ b/fs/xfs/xfs_filestream.h
@@ -63,7 +63,6 @@ xfs_inode_is_filestream(
 	struct xfs_inode	*ip)
 {
 	return (ip->i_mount->m_flags & XFS_MOUNT_FILESTREAMS) ||
-		xfs_iflags_test(ip, XFS_IFILESTREAM) ||
 		(ip->i_d.di_flags & XFS_DIFLAG_FILESTREAM);
 }
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 5e7a38f..3328320 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -849,8 +849,6 @@ xfs_ialloc(
 		error = xfs_filestream_associate(pip, ip);
 		if (error < 0)
 			return -error;
-		if (!error)
-			xfs_iflags_set(ip, XFS_IFILESTREAM);
 	}
 
 	*ipp = ip;
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 396cc1f..a7c2ebf 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -209,7 +209,6 @@ xfs_get_initial_prid(struct xfs_inode *dp)
 #define XFS_ISTALE		(1 << 1) /* inode has been staled */
 #define XFS_IRECLAIMABLE	(1 << 2) /* inode can be reclaimed */
 #define XFS_INEW		(1 << 3) /* inode has just been allocated */
-#define XFS_IFILESTREAM		(1 << 4) /* inode is in a filestream dir. */
 #define XFS_ITRUNCATED		(1 << 5) /* truncated down so flush-on-close */
 #define XFS_IDIRTY_RELEASE	(1 << 6) /* dirty release already seen */
 #define __XFS_IFLOCK_BIT	7	 /* inode is being flushed right now */
@@ -225,8 +224,7 @@ xfs_get_initial_prid(struct xfs_inode *dp)
  */
 #define XFS_IRECLAIM_RESET_FLAGS	\
 	(XFS_IRECLAIMABLE | XFS_IRECLAIM | \
-	 XFS_IDIRTY_RELEASE | XFS_ITRUNCATED | \
-	 XFS_IFILESTREAM);
+	 XFS_IDIRTY_RELEASE | XFS_ITRUNCATED)
 
 /*
  * Synchronize processes attempting to flush the in-core inode back to disk.
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 6/9] xfs: rewrite the filestream allocator using the dentry cache
  2014-04-12  8:01 filestream allocator rewrite Christoph Hellwig
                   ` (4 preceding siblings ...)
  2014-04-12  8:01 ` [PATCH 5/9] xfs: remove XFS_IFILESTREAM Christoph Hellwig
@ 2014-04-12  8:02 ` Christoph Hellwig
  2014-04-12  8:02 ` [PATCH 7/9] xfs: don't create a slab cache for filestream items Christoph Hellwig
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2014-04-12  8:02 UTC (permalink / raw)
  To: xfs

In Linux we will always be able to find a parent inode for file that are
undergoing I/O.  Use this to simply the file stream allocator by only
keeping track of parent inodes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_filestream.c |  660 ++++++++++++-----------------------------------
 fs/xfs/xfs_filestream.h |   31 +--
 fs/xfs/xfs_inode.c      |   24 +-
 3 files changed, 171 insertions(+), 544 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index c422110..ff6f902 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2006-2007 Silicon Graphics, Inc.
+ * Copyright (c) 2014 Christoph Hellwig.
  * All Rights Reserved.
  *
  * This program is free software; you can redistribute it and/or
@@ -32,101 +33,28 @@
 #include "xfs_filestream.h"
 #include "xfs_trace.h"
 
-#ifdef XFS_FILESTREAMS_TRACE
-
-ktrace_t *xfs_filestreams_trace_buf;
-
-STATIC void
-xfs_filestreams_trace(
-	xfs_mount_t	*mp,	/* mount point */
-	int		type,	/* type of trace */
-	const char	*func,	/* source function */
-	int		line,	/* source line number */
-	__psunsigned_t	arg0,
-	__psunsigned_t	arg1,
-	__psunsigned_t	arg2,
-	__psunsigned_t	arg3,
-	__psunsigned_t	arg4,
-	__psunsigned_t	arg5)
-{
-	ktrace_enter(xfs_filestreams_trace_buf,
-		(void *)(__psint_t)(type | (line << 16)),
-		(void *)func,
-		(void *)(__psunsigned_t)current_pid(),
-		(void *)mp,
-		(void *)(__psunsigned_t)arg0,
-		(void *)(__psunsigned_t)arg1,
-		(void *)(__psunsigned_t)arg2,
-		(void *)(__psunsigned_t)arg3,
-		(void *)(__psunsigned_t)arg4,
-		(void *)(__psunsigned_t)arg5,
-		NULL, NULL, NULL, NULL, NULL, NULL);
-}
-
-#define TRACE0(mp,t)			TRACE6(mp,t,0,0,0,0,0,0)
-#define TRACE1(mp,t,a0)			TRACE6(mp,t,a0,0,0,0,0,0)
-#define TRACE2(mp,t,a0,a1)		TRACE6(mp,t,a0,a1,0,0,0,0)
-#define TRACE3(mp,t,a0,a1,a2)		TRACE6(mp,t,a0,a1,a2,0,0,0)
-#define TRACE4(mp,t,a0,a1,a2,a3)	TRACE6(mp,t,a0,a1,a2,a3,0,0)
-#define TRACE5(mp,t,a0,a1,a2,a3,a4)	TRACE6(mp,t,a0,a1,a2,a3,a4,0)
-#define TRACE6(mp,t,a0,a1,a2,a3,a4,a5) \
-	xfs_filestreams_trace(mp, t, __func__, __LINE__, \
-				(__psunsigned_t)a0, (__psunsigned_t)a1, \
-				(__psunsigned_t)a2, (__psunsigned_t)a3, \
-				(__psunsigned_t)a4, (__psunsigned_t)a5)
-
-#define TRACE_AG_SCAN(mp, ag, ag2) \
-		TRACE2(mp, XFS_FSTRM_KTRACE_AGSCAN, ag, ag2);
-#define TRACE_AG_PICK1(mp, max_ag, maxfree) \
-		TRACE2(mp, XFS_FSTRM_KTRACE_AGPICK1, max_ag, maxfree);
-#define TRACE_AG_PICK2(mp, ag, ag2, cnt, free, scan, flag) \
-		TRACE6(mp, XFS_FSTRM_KTRACE_AGPICK2, ag, ag2, \
-			 cnt, free, scan, flag)
-#define TRACE_UPDATE(mp, ip, ag, cnt, ag2, cnt2) \
-		TRACE5(mp, XFS_FSTRM_KTRACE_UPDATE, ip, ag, cnt, ag2, cnt2)
-#define TRACE_FREE(mp, ip, pip, ag, cnt) \
-		TRACE4(mp, XFS_FSTRM_KTRACE_FREE, ip, pip, ag, cnt)
-#define TRACE_LOOKUP(mp, ip, pip, ag, cnt) \
-		TRACE4(mp, XFS_FSTRM_KTRACE_ITEM_LOOKUP, ip, pip, ag, cnt)
-#define TRACE_ASSOCIATE(mp, ip, pip, ag, cnt) \
-		TRACE4(mp, XFS_FSTRM_KTRACE_ASSOCIATE, ip, pip, ag, cnt)
-#define TRACE_MOVEAG(mp, ip, pip, oag, ocnt, nag, ncnt) \
-		TRACE6(mp, XFS_FSTRM_KTRACE_MOVEAG, ip, pip, oag, ocnt, nag, ncnt)
-#define TRACE_ORPHAN(mp, ip, ag) \
-		TRACE2(mp, XFS_FSTRM_KTRACE_ORPHAN, ip, ag);
-
-
-#else
 #define TRACE_AG_SCAN(mp, ag, ag2)
 #define TRACE_AG_PICK1(mp, max_ag, maxfree)
 #define TRACE_AG_PICK2(mp, ag, ag2, cnt, free, scan, flag)
-#define TRACE_UPDATE(mp, ip, ag, cnt, ag2, cnt2)
 #define TRACE_FREE(mp, ip, pip, ag, cnt)
 #define TRACE_LOOKUP(mp, ip, pip, ag, cnt)
-#define TRACE_ASSOCIATE(mp, ip, pip, ag, cnt)
-#define TRACE_MOVEAG(mp, ip, pip, oag, ocnt, nag, ncnt)
-#define TRACE_ORPHAN(mp, ip, ag)
-#endif
 
 static kmem_zone_t *item_zone;
 
-/*
- * Structure for associating a file or a directory with an allocation group.
- * The parent directory pointer is only needed for files, but since there will
- * generally be vastly more files than directories in the cache, using the same
- * data structure simplifies the code with very little memory overhead.
- */
-typedef struct fstrm_item
-{
-	struct xfs_mru_cache_elem mru;
-	xfs_agnumber_t	ag;	/* AG currently in use for the file/directory. */
-	xfs_inode_t	*ip;	/* inode self-pointer. */
-	xfs_inode_t	*pip;	/* Parent directory inode pointer. */
-} fstrm_item_t;
+struct xfs_fstrm_item {
+	struct xfs_mru_cache_elem	mru;
+	struct xfs_inode		*ip;
+	xfs_agnumber_t			ag; /* AG in use for this directory */
+};
+
+enum xfs_fstrm_alloc {
+	XFS_PICK_USERDATA = 1,
+	XFS_PICK_LOWSPACE = 2,
+};
 
 /*
  * Allocation group filestream associations are tracked with per-ag atomic
- * counters.  These counters allow _xfs_filestream_pick_ag() to tell whether a
+ * counters.  These counters allow xfs_filestream_pick_ag() to tell whether a
  * particular AG already has active filestreams associated with it. The mount
  * point's m_peraglock is used to protect these counters from per-ag array
  * re-allocation during a growfs operation.  When xfs_growfs_data_private() is
@@ -201,23 +129,42 @@ xfs_filestream_put_ag(
 	xfs_perag_put(pag);
 }
 
+static void
+xfs_fstrm_free_func(
+	struct xfs_mru_cache_elem *mru)
+{
+	struct xfs_fstrm_item	*item =
+		container_of(mru, struct xfs_fstrm_item, mru);
+
+	xfs_filestream_put_ag(item->ip->i_mount, item->ag);
+
+	TRACE_FREE(mp, ip, NULL, item->ag,
+		   xfs_filestream_peek_ag(mp, item->ag));
+
+	kmem_zone_free(item_zone, item);
+}
+
 /*
  * Scan the AGs starting at startag looking for an AG that isn't in use and has
  * at least minlen blocks free.
  */
 static int
-_xfs_filestream_pick_ag(
-	xfs_mount_t	*mp,
-	xfs_agnumber_t	startag,
-	xfs_agnumber_t	*agp,
-	int		flags,
-	xfs_extlen_t	minlen)
+xfs_filestream_pick_ag(
+	struct xfs_inode	*ip,
+	xfs_agnumber_t		startag,
+	xfs_agnumber_t		*agp,
+	int			flags,
+	xfs_extlen_t		minlen)
 {
-	int		streams, max_streams;
-	int		err, trylock, nscan;
-	xfs_extlen_t	longest, free, minfree, maxfree = 0;
-	xfs_agnumber_t	ag, max_ag = NULLAGNUMBER;
-	struct xfs_perag *pag;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_fstrm_item	*item;
+	struct xfs_perag	*pag;
+	xfs_extlen_t		longest, free, minfree, maxfree = 0;
+	xfs_agnumber_t		ag, max_ag = NULLAGNUMBER;
+	int			streams, max_streams;
+	int			err, trylock, nscan;
+
+	ASSERT(S_ISDIR(ip->i_d.di_mode));
 
 	/* 2% of an AG's blocks must be free for it to be chosen. */
 	minfree = mp->m_sb.sb_agblocks / 50;
@@ -321,205 +268,55 @@ next_ag:
 
 	TRACE_AG_PICK2(mp, startag, *agp, streams, free, nscan, flags);
 
-	return 0;
-}
-
-/*
- * Set the allocation group number for a file or a directory, updating inode
- * references and per-AG references as appropriate.
- */
-static int
-_xfs_filestream_update_ag(
-	xfs_inode_t	*ip,
-	xfs_inode_t	*pip,
-	xfs_agnumber_t	ag)
-{
-	int		err = 0;
-	xfs_mount_t	*mp;
-	fstrm_item_t	*item;
-	xfs_agnumber_t	old_ag;
-	xfs_inode_t	*old_pip;
-	struct xfs_mru_cache_elem *mru;
-
-	/*
-	 * Either ip is a regular file and pip is a directory, or ip is a
-	 * directory and pip is NULL.
-	 */
-	ASSERT(ip && ((S_ISREG(ip->i_d.di_mode) && pip &&
-	               S_ISDIR(pip->i_d.di_mode)) ||
-	              (S_ISDIR(ip->i_d.di_mode) && !pip)));
-
-	mp = ip->i_mount;
-
-	mru = xfs_mru_cache_lookup(mp->m_filestream, ip->i_ino);
-	if (mru) {
-		item = container_of(mru, fstrm_item_t, mru);
-
-		ASSERT(item->ip == ip);
-		old_ag = item->ag;
-		item->ag = ag;
-		old_pip = item->pip;
-		item->pip = pip;
-		xfs_mru_cache_done(mp->m_filestream);
-
-		/*
-		 * If the AG has changed, drop the old ref and take a new one,
-		 * effectively transferring the reference from old to new AG.
-		 */
-		if (ag != old_ag) {
-			xfs_filestream_put_ag(mp, old_ag);
-			xfs_filestream_get_ag(mp, ag);
-		}
-
-		/*
-		 * If ip is a file and its pip has changed, drop the old ref and
-		 * take a new one.
-		 */
-		if (pip && pip != old_pip) {
-			IRELE(old_pip);
-			IHOLD(pip);
-		}
-
-		TRACE_UPDATE(mp, ip, old_ag, xfs_filestream_peek_ag(mp, old_ag),
-				ag, xfs_filestream_peek_ag(mp, ag));
+	if (*agp == NULLAGNUMBER)
 		return 0;
-	}
 
+	err = ENOMEM;
 	item = kmem_zone_zalloc(item_zone, KM_MAYFAIL);
 	if (!item)
-		return ENOMEM;
+		goto out_put_ag;
 
-	item->ag = ag;
+	item->ag = *agp;
 	item->ip = ip;
-	item->pip = pip;
 
 	err = xfs_mru_cache_insert(mp->m_filestream, ip->i_ino, &item->mru);
 	if (err) {
-		kmem_zone_free(item_zone, item);
-		return err;
+		if (err == EEXIST)
+			err = 0;
+		goto out_free_item;
 	}
 
-	/* Take a reference on the AG. */
-	xfs_filestream_get_ag(mp, ag);
-
-	/*
-	 * Take a reference on the inode itself regardless of whether it's a
-	 * regular file or a directory.
-	 */
-	IHOLD(ip);
-
-	/*
-	 * In the case of a regular file, take a reference on the parent inode
-	 * as well to ensure it remains in-core.
-	 */
-	if (pip)
-		IHOLD(pip);
-
-	TRACE_UPDATE(mp, ip, ag, xfs_filestream_peek_ag(mp, ag),
-			ag, xfs_filestream_peek_ag(mp, ag));
-
 	return 0;
-}
-
-/* xfs_fstrm_free_func(): callback for freeing cached stream items. */
-STATIC void
-xfs_fstrm_free_func(
-	struct xfs_mru_cache_elem *mru)
-{
-	fstrm_item_t	*item =
-		container_of(mru, fstrm_item_t, mru);
-	xfs_inode_t	*ip = item->ip;
-
-	/* Drop the reference taken on the AG when the item was added. */
-	xfs_filestream_put_ag(ip->i_mount, item->ag);
-
-	TRACE_FREE(ip->i_mount, ip, item->pip, item->ag,
-		xfs_filestream_peek_ag(ip->i_mount, item->ag));
-
-	/*
-	 * _xfs_filestream_update_ag() always takes a reference on the inode
-	 * itself, whether it's a file or a directory.  Release it here.
-	 * This can result in the inode being freed and so we must
-	 * not hold any inode locks when freeing filesstreams objects
-	 * otherwise we can deadlock here.
-	 */
-	IRELE(ip);
 
-	/*
-	 * In the case of a regular file, _xfs_filestream_update_ag() also
-	 * takes a ref on the parent inode to keep it in-core.  Release that
-	 * too.
-	 */
-	if (item->pip)
-		IRELE(item->pip);
-
-	/* Finally, free the memory allocated for the item. */
+out_free_item:
 	kmem_zone_free(item_zone, item);
+out_put_ag:
+	xfs_filestream_put_ag(mp, *agp);
+	return err;
 }
 
-/*
- * xfs_filestream_init() is called at xfs initialisation time to set up the
- * memory zone that will be used for filestream data structure allocation.
- */
-int
-xfs_filestream_init(void)
-{
-	item_zone = kmem_zone_init(sizeof(fstrm_item_t), "fstrm_item");
-	if (!item_zone)
-		return -ENOMEM;
-
-	return 0;
-}
-
-/*
- * xfs_filestream_uninit() is called at xfs termination time to destroy the
- * memory zone that was used for filestream data structure allocation.
- */
-void
-xfs_filestream_uninit(void)
+static struct xfs_inode *
+xfs_filestream_get_parent(
+	struct xfs_inode	*ip)
 {
-	kmem_zone_destroy(item_zone);
-}
+	struct inode		*inode = VFS_I(ip), *dir = NULL;
+	struct dentry		*dentry, *parent;
 
-/*
- * xfs_filestream_mount() is called when a file system is mounted with the
- * filestream option.  It is responsible for allocating the data structures
- * needed to track the new file system's file streams.
- */
-int
-xfs_filestream_mount(
-	xfs_mount_t	*mp)
-{
-	int		err;
-	unsigned int	lifetime, grp_count;
+	dentry = d_find_alias(inode);
+	if (!dentry)
+		goto out;
 
-	/*
-	 * The filestream timer tunable is currently fixed within the range of
-	 * one second to four minutes, with five seconds being the default.  The
-	 * group count is somewhat arbitrary, but it'd be nice to adhere to the
-	 * timer tunable to within about 10 percent.  This requires at least 10
-	 * groups.
-	 */
-	lifetime  = xfs_fstrm_centisecs * 10;
-	grp_count = 10;
+	parent = dget_parent(dentry);
+	if (!parent)
+		goto out_dput;
 
-	err = xfs_mru_cache_create(&mp->m_filestream, lifetime, grp_count,
-	                     xfs_fstrm_free_func);
+	dir = igrab(parent->d_inode);
+	dput(parent);
 
-	return err;
-}
-
-/*
- * xfs_filestream_unmount() is called when a file system that was mounted with
- * the filestream option is unmounted.  It drains the data structures created
- * to track the file system's file streams and frees all the memory that was
- * allocated.
- */
-void
-xfs_filestream_unmount(
-	xfs_mount_t	*mp)
-{
-	xfs_mru_cache_destroy(mp->m_filestream);
+out_dput:
+	dput(dentry);
+out:
+	return dir ? XFS_I(dir) : NULL;
 }
 
 /*
@@ -528,94 +325,61 @@ xfs_filestream_unmount(
  */
 xfs_agnumber_t
 xfs_filestream_lookup_ag(
-	xfs_inode_t	*ip)
+	struct xfs_inode	*ip)
 {
-	struct xfs_mount *mp = ip->i_mount;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_fstrm_item	*item;
+	struct xfs_inode	*pip = NULL;
+	xfs_agnumber_t		ag = NULLAGNUMBER;
+	int			ref = 0;
 	struct xfs_mru_cache_elem *mru;
-	fstrm_item_t	*item;
-	xfs_agnumber_t	ag;
-	int		ref;
 
-	if (!S_ISREG(ip->i_d.di_mode) && !S_ISDIR(ip->i_d.di_mode)) {
-		ASSERT(0);
-		return NULLAGNUMBER;
-	}
+	ASSERT(S_ISREG(ip->i_d.di_mode));
 
-	mru = xfs_mru_cache_lookup(mp->m_filestream, ip->i_ino);
-	if (!mru) {
-		TRACE_LOOKUP(ip->i_mount, ip, NULL, NULLAGNUMBER, 0);
-		return NULLAGNUMBER;
-	}
+	pip = xfs_filestream_get_parent(ip);
+	if (!pip)
+		goto out;
+
+	mru = xfs_mru_cache_lookup(mp->m_filestream, pip->i_ino);
+	if (!mru)
+		goto out;
+
+	item = container_of(mru, struct xfs_fstrm_item, mru);
 
-	item = container_of(mru, fstrm_item_t, mru);
-	ASSERT(ip == item->ip);
 	ag = item->ag;
-	ref = xfs_filestream_peek_ag(ip->i_mount, ag);
 	xfs_mru_cache_done(mp->m_filestream);
 
-	TRACE_LOOKUP(ip->i_mount, ip, item->pip, ag, ref);
+	ref = xfs_filestream_peek_ag(ip->i_mount, ag);
+out:
+	TRACE_LOOKUP(mp, ip, pip, ag, ref);
+	IRELE(pip);
 	return ag;
 }
 
 /*
- * xfs_filestream_associate() should only be called to associate a regular file
- * with its parent directory.  Calling it with a child directory isn't
- * appropriate because filestreams don't apply to entire directory hierarchies.
- * Creating a file in a child directory of an existing filestream directory
- * starts a new filestream with its own allocation group association.
+ * Make sure a directory has a filestream associated with it.
  *
- * Returns < 0 on error, 0 if successful association occurred, > 0 if
- * we failed to get an association because of locking issues.
+ * This is called when creating regular files in an directory that has
+ * filestreams enabled, so that a stream is ready by the time we need it
+ * in the allocator for the files inside the directory.
  */
 int
 xfs_filestream_associate(
-	xfs_inode_t	*pip,
-	xfs_inode_t	*ip)
+	struct xfs_inode	*pip)
 {
+	struct xfs_mount	*mp = pip->i_mount;
 	struct xfs_mru_cache_elem *mru;
-	xfs_mount_t	*mp;
-	fstrm_item_t	*item;
-	xfs_agnumber_t	ag, rotorstep, startag;
-	int		err = 0;
+	xfs_agnumber_t		startag, ag;
 
 	ASSERT(S_ISDIR(pip->i_d.di_mode));
-	ASSERT(S_ISREG(ip->i_d.di_mode));
-	if (!S_ISDIR(pip->i_d.di_mode) || !S_ISREG(ip->i_d.di_mode))
-		return -EINVAL;
-
-	mp = pip->i_mount;
 
 	/*
-	 * We have a problem, Houston.
-	 *
-	 * Taking the iolock here violates inode locking order - we already
-	 * hold the ilock. Hence if we block getting this lock we may never
-	 * wake. Unfortunately, that means if we can't get the lock, we're
-	 * screwed in terms of getting a stream association - we can't spin
-	 * waiting for the lock because someone else is waiting on the lock we
-	 * hold and we cannot drop that as we are in a transaction here.
-	 *
-	 * Lucky for us, this inversion is not a problem because it's a
-	 * directory inode that we are trying to lock here.
-	 *
-	 * So, if we can't get the iolock without sleeping then just give up
+	 * If the directory already has a file stream associated we're done.
 	 */
-	if (!xfs_ilock_nowait(pip, XFS_IOLOCK_EXCL))
-		return 1;
-
-	/* If the parent directory is already in the cache, use its AG. */
 	mru = xfs_mru_cache_lookup(mp->m_filestream, pip->i_ino);
 	if (mru) {
-		item = container_of(mru, fstrm_item_t, mru);
-
-		ASSERT(item->ip == pip);
-		ag = item->ag;
 		xfs_mru_cache_done(mp->m_filestream);
-
-		TRACE_LOOKUP(mp, pip, pip, ag, xfs_filestream_peek_ag(mp, ag));
-		err = _xfs_filestream_update_ag(ip, pip, ag);
-
-		goto exit;
+		return 0;
 	}
 
 	/*
@@ -623,201 +387,107 @@ xfs_filestream_associate(
 	 * use the directory inode's AG.
 	 */
 	if (mp->m_flags & XFS_MOUNT_32BITINODES) {
-		rotorstep = xfs_rotorstep;
+		xfs_agnumber_t	 rotorstep = xfs_rotorstep;
 		startag = (mp->m_agfrotor / rotorstep) % mp->m_sb.sb_agcount;
 		mp->m_agfrotor = (mp->m_agfrotor + 1) %
 		                 (mp->m_sb.sb_agcount * rotorstep);
 	} else
 		startag = XFS_INO_TO_AGNO(mp, pip->i_ino);
 
-	/* Pick a new AG for the parent inode starting at startag. */
-	err = _xfs_filestream_pick_ag(mp, startag, &ag, 0, 0);
-	if (err || ag == NULLAGNUMBER)
-		goto exit_did_pick;
-
-	/* Associate the parent inode with the AG. */
-	err = _xfs_filestream_update_ag(pip, NULL, ag);
-	if (err)
-		goto exit_did_pick;
-
-	/* Associate the file inode with the AG. */
-	err = _xfs_filestream_update_ag(ip, pip, ag);
-	if (err)
-		goto exit_did_pick;
-
-	TRACE_ASSOCIATE(mp, ip, pip, ag, xfs_filestream_peek_ag(mp, ag));
-
-exit_did_pick:
-	/*
-	 * If _xfs_filestream_pick_ag() returned a valid AG, remove the
-	 * reference it took on it, since the file and directory will have taken
-	 * their own now if they were successfully cached.
-	 */
-	if (ag != NULLAGNUMBER)
-		xfs_filestream_put_ag(mp, ag);
-
-exit:
-	xfs_iunlock(pip, XFS_IOLOCK_EXCL);
-	return -err;
+	return xfs_filestream_pick_ag(pip, startag, &ag, 0, 0);
 }
 
 /*
- * Pick a new allocation group for the current file and its file stream.  This
- * function is called by xfs_bmap_filestreams() with the mount point's per-ag
- * lock held.
+ * Pick a new allocation group for the current file and its file stream.
+ *
+ * This is called when the allocator can't find a suitable extent in the
+ * current AG, and we have to move the stream into a new AG with more space.
  */
 int
 xfs_filestream_new_ag(
 	struct xfs_bmalloca	*ap,
 	xfs_agnumber_t		*agp)
 {
-	struct xfs_mru_cache_elem *mru, *mru2;
-	int		flags, err;
-	xfs_inode_t	*ip, *pip = NULL;
-	xfs_mount_t	*mp;
-	xfs_extlen_t	minlen;
-	fstrm_item_t	*dir, *file;
-	xfs_agnumber_t	ag = NULLAGNUMBER;
-
-	ip = ap->ip;
-	mp = ip->i_mount;
-	minlen = ap->length;
-	*agp = NULLAGNUMBER;
-
-	/*
-	 * Look for the file in the cache, removing it if it's found.  Doing
-	 * this allows it to be held across the dir lookup that follows.
-	 */
-	mru = xfs_mru_cache_remove(mp->m_filestream, ip->i_ino);
-	if (mru) {
-		file = container_of(mru, fstrm_item_t, mru);
-		ASSERT(ip == file->ip);
-
-		/* Save the file's parent inode and old AG number for later. */
-		pip = file->pip;
-		ag = file->ag;
-
-		/* Look for the file's directory in the cache. */
-		mru2 = xfs_mru_cache_lookup(mp->m_filestream, pip->i_ino);
-		if (mru2) {
-			dir = container_of(mru2, fstrm_item_t, mru);
-			ASSERT(pip == dir->ip);
-
-			/*
-			 * If the directory has already moved on to a new AG,
-			 * use that AG as the new AG for the file. Don't
-			 * forget to twiddle the AG refcounts to match the
-			 * movement.
-			 */
-			if (dir->ag != file->ag) {
-				xfs_filestream_put_ag(mp, file->ag);
-				xfs_filestream_get_ag(mp, dir->ag);
-				*agp = file->ag = dir->ag;
-			}
+	struct xfs_inode	*ip = ap->ip, *pip;
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_extlen_t		minlen = ap->length;
+	xfs_agnumber_t		startag = 0;
+	int			flags, err = 0;
+	struct xfs_mru_cache_elem *mru;
 
-			xfs_mru_cache_done(mp->m_filestream);
-		}
+	*agp = NULLAGNUMBER;
 
-		/*
-		 * Put the file back in the cache.  If this fails, the free
-		 * function needs to be called to tidy up in the same way as if
-		 * the item had simply expired from the cache.
-		 */
-		err = xfs_mru_cache_insert(mp->m_filestream, ip->i_ino, mru);
-		if (err) {
-			xfs_fstrm_free_func(mru);
-			return err;
-		}
+	pip = xfs_filestream_get_parent(ip);
+	if (!pip)
+		goto exit;
 
-		/*
-		 * If the file's AG was moved to the directory's new AG, there's
-		 * nothing more to be done.
-		 */
-		if (*agp != NULLAGNUMBER) {
-			TRACE_MOVEAG(mp, ip, pip,
-					ag, xfs_filestream_peek_ag(mp, ag),
-					*agp, xfs_filestream_peek_ag(mp, *agp));
-			return 0;
-		}
+	mru = xfs_mru_cache_remove(mp->m_filestream, pip->i_ino);
+	if (mru) {
+		struct xfs_fstrm_item *item =
+			container_of(mru, struct xfs_fstrm_item, mru);
+		startag = (item->ag + 1) % mp->m_sb.sb_agcount;
 	}
 
-	/*
-	 * If the file's parent directory is known, take its iolock in exclusive
-	 * mode to prevent two sibling files from racing each other to migrate
-	 * themselves and their parent to different AGs.
-	 *
-	 * Note that we lock the parent directory iolock inside the child
-	 * iolock here.  That's fine as we never hold both parent and child
-	 * iolock in any other place.  This is different from the ilock,
-	 * which requires locking of the child after the parent for namespace
-	 * operations.
-	 */
-	if (pip)
-		xfs_ilock(pip, XFS_IOLOCK_EXCL | XFS_IOLOCK_PARENT);
-
-	/*
-	 * A new AG needs to be found for the file.  If the file's parent
-	 * directory is also known, it will be moved to the new AG as well to
-	 * ensure that files created inside it in future use the new AG.
-	 */
-	ag = (ag == NULLAGNUMBER) ? 0 : (ag + 1) % mp->m_sb.sb_agcount;
 	flags = (ap->userdata ? XFS_PICK_USERDATA : 0) |
 	        (ap->flist->xbf_low ? XFS_PICK_LOWSPACE : 0);
 
-	err = _xfs_filestream_pick_ag(mp, ag, agp, flags, minlen);
-	if (err || *agp == NULLAGNUMBER)
-		goto exit;
+	err = xfs_filestream_pick_ag(pip, startag, agp, flags, minlen);
 
 	/*
-	 * If the file wasn't found in the file cache, then its parent directory
-	 * inode isn't known.  For this to have happened, the file must either
-	 * be pre-existing, or it was created long enough ago that its cache
-	 * entry has expired.  This isn't the sort of usage that the filestreams
-	 * allocator is trying to optimise, so there's no point trying to track
-	 * its new AG somehow in the filestream data structures.
+	 * Only free the item here so we skip over the old AG earlier.
 	 */
-	if (!pip) {
-		TRACE_ORPHAN(mp, ip, *agp);
-		goto exit;
-	}
+	if (mru)
+		xfs_fstrm_free_func(mru);
 
-	/* Associate the parent inode with the AG. */
-	err = _xfs_filestream_update_ag(pip, NULL, *agp);
-	if (err)
-		goto exit;
-
-	/* Associate the file inode with the AG. */
-	err = _xfs_filestream_update_ag(ip, pip, *agp);
-	if (err)
-		goto exit;
+	IRELE(pip);
+exit:
+	if (*agp == NULLAGNUMBER)
+		*agp = 0;
+	return err;
+}
 
-	TRACE_MOVEAG(mp, ip, pip, NULLAGNUMBER, 0,
-			*agp, xfs_filestream_peek_ag(mp, *agp));
+void
+xfs_filestream_deassociate(
+	struct xfs_inode	*ip)
+{
+	xfs_mru_cache_delete(ip->i_mount->m_filestream, ip->i_ino);
+}
 
-exit:
+int
+xfs_filestream_mount(
+	xfs_mount_t	*mp)
+{
 	/*
-	 * If _xfs_filestream_pick_ag() returned a valid AG, remove the
-	 * reference it took on it, since the file and directory will have taken
-	 * their own now if they were successfully cached.
+	 * The filestream timer tunable is currently fixed within the range of
+	 * one second to four minutes, with five seconds being the default.  The
+	 * group count is somewhat arbitrary, but it'd be nice to adhere to the
+	 * timer tunable to within about 10 percent.  This requires at least 10
+	 * groups.
 	 */
-	if (*agp != NULLAGNUMBER)
-		xfs_filestream_put_ag(mp, *agp);
-	else
-		*agp = 0;
+	return xfs_mru_cache_create(&mp->m_filestream, xfs_fstrm_centisecs * 10,
+				    10, xfs_fstrm_free_func);
+}
 
-	if (pip)
-		xfs_iunlock(pip, XFS_IOLOCK_EXCL);
+void
+xfs_filestream_unmount(
+	xfs_mount_t	*mp)
+{
+	xfs_mru_cache_destroy(mp->m_filestream);
+}
 
-	return err;
+
+/* needs to return a positive errno for the init path */
+int
+xfs_filestream_init(void)
+{
+	item_zone = kmem_zone_init(sizeof(struct xfs_fstrm_item), "fstrm_item");
+	if (!item_zone)
+		return -ENOMEM;
+	return 0;
 }
 
-/*
- * Remove an association between an inode and a filestream object.
- * Typically this is done on last close of an unlinked file.
- */
 void
-xfs_filestream_deassociate(
-	xfs_inode_t	*ip)
+xfs_filestream_uninit(void)
 {
-	xfs_mru_cache_delete(ip->i_mount->m_filestream, ip->i_ino);
+	kmem_zone_destroy(item_zone);
 }
diff --git a/fs/xfs/xfs_filestream.h b/fs/xfs/xfs_filestream.h
index c4fa9a0..e3a25f8 100644
--- a/fs/xfs/xfs_filestream.h
+++ b/fs/xfs/xfs_filestream.h
@@ -20,44 +20,17 @@
 
 struct xfs_mount;
 struct xfs_inode;
-struct xfs_perag;
 struct xfs_bmalloca;
 
-#ifdef XFS_FILESTREAMS_TRACE
-#define XFS_FSTRM_KTRACE_INFO		1
-#define XFS_FSTRM_KTRACE_AGSCAN		2
-#define XFS_FSTRM_KTRACE_AGPICK1	3
-#define XFS_FSTRM_KTRACE_AGPICK2	4
-#define XFS_FSTRM_KTRACE_UPDATE		5
-#define XFS_FSTRM_KTRACE_FREE		6
-#define	XFS_FSTRM_KTRACE_ITEM_LOOKUP	7
-#define	XFS_FSTRM_KTRACE_ASSOCIATE	8
-#define	XFS_FSTRM_KTRACE_MOVEAG		9
-#define	XFS_FSTRM_KTRACE_ORPHAN		10
-
-#define XFS_FSTRM_KTRACE_SIZE	16384
-extern ktrace_t *xfs_filestreams_trace_buf;
-
-#endif
-
-/* allocation selection flags */
-typedef enum xfs_fstrm_alloc {
-	XFS_PICK_USERDATA = 1,
-	XFS_PICK_LOWSPACE = 2,
-} xfs_fstrm_alloc_t;
-
-/* prototypes for filestream.c */
 int xfs_filestream_init(void);
 void xfs_filestream_uninit(void);
 int xfs_filestream_mount(struct xfs_mount *mp);
 void xfs_filestream_unmount(struct xfs_mount *mp);
-xfs_agnumber_t xfs_filestream_lookup_ag(struct xfs_inode *ip);
-int xfs_filestream_associate(struct xfs_inode *dip, struct xfs_inode *ip);
 void xfs_filestream_deassociate(struct xfs_inode *ip);
+xfs_agnumber_t xfs_filestream_lookup_ag(struct xfs_inode *ip);
+int xfs_filestream_associate(struct xfs_inode *dip);
 int xfs_filestream_new_ag(struct xfs_bmalloca *ap, xfs_agnumber_t *agp);
 
-
-/* filestreams for the inode? */
 static inline int
 xfs_inode_is_filestream(
 	struct xfs_inode	*ip)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 3328320..b9b531f 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -846,9 +846,9 @@ xfs_ialloc(
 
 	/* now we have set up the vfs inode we can associate the filestream */
 	if (filestreams) {
-		error = xfs_filestream_associate(pip, ip);
-		if (error < 0)
-			return -error;
+		error = xfs_filestream_associate(pip);
+		if (error)
+			return error;
 	}
 
 	*ipp = ip;
@@ -1696,16 +1696,6 @@ xfs_release(
 		int truncated;
 
 		/*
-		 * If we are using filestreams, and we have an unlinked
-		 * file that we are processing the last close on, then nothing
-		 * will be able to reopen and write to this file. Purge this
-		 * inode from the filestreams cache so that it doesn't delay
-		 * teardown of the inode.
-		 */
-		if ((ip->i_d.di_nlink == 0) && xfs_inode_is_filestream(ip))
-			xfs_filestream_deassociate(ip);
-
-		/*
 		 * If we previously truncated this file and removed old data
 		 * in the process, we want to initiate "early" writeout on
 		 * the last close.  This is an attempt to combat the notorious
@@ -2661,13 +2651,7 @@ xfs_remove(
 	if (error)
 		goto std_return;
 
-	/*
-	 * If we are using filestreams, kill the stream association.
-	 * If the file is still open it may get a new one but that
-	 * will get killed on last close in xfs_close() so we don't
-	 * have to worry about that.
-	 */
-	if (!is_dir && link_zero && xfs_inode_is_filestream(ip))
+	if (is_dir && xfs_inode_is_filestream(ip))
 		xfs_filestream_deassociate(ip);
 
 	return 0;
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 7/9] xfs: don't create a slab cache for filestream items
  2014-04-12  8:01 filestream allocator rewrite Christoph Hellwig
                   ` (5 preceding siblings ...)
  2014-04-12  8:02 ` [PATCH 6/9] xfs: rewrite the filestream allocator using the dentry cache Christoph Hellwig
@ 2014-04-12  8:02 ` Christoph Hellwig
  2014-04-12  8:02 ` [PATCH 8/9] xfs: remove xfs_filestream_associate Christoph Hellwig
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2014-04-12  8:02 UTC (permalink / raw)
  To: xfs

We only have very few of these around, and allocation isn't that
much of a hot path.  Remove the slab cache to simplify the code,
and to not waste any resources for the usual case of not having
any inodes that use the filestream allocator.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_filestream.c |   25 +++----------------------
 fs/xfs/xfs_filestream.h |    2 --
 fs/xfs/xfs_super.c      |    9 +--------
 3 files changed, 4 insertions(+), 32 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index ff6f902..7b94036 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -39,8 +39,6 @@
 #define TRACE_FREE(mp, ip, pip, ag, cnt)
 #define TRACE_LOOKUP(mp, ip, pip, ag, cnt)
 
-static kmem_zone_t *item_zone;
-
 struct xfs_fstrm_item {
 	struct xfs_mru_cache_elem	mru;
 	struct xfs_inode		*ip;
@@ -141,7 +139,7 @@ xfs_fstrm_free_func(
 	TRACE_FREE(mp, ip, NULL, item->ag,
 		   xfs_filestream_peek_ag(mp, item->ag));
 
-	kmem_zone_free(item_zone, item);
+	kmem_free(item);
 }
 
 /*
@@ -272,7 +270,7 @@ next_ag:
 		return 0;
 
 	err = ENOMEM;
-	item = kmem_zone_zalloc(item_zone, KM_MAYFAIL);
+	item = kmem_alloc(sizeof(*item), KM_MAYFAIL);
 	if (!item)
 		goto out_put_ag;
 
@@ -289,7 +287,7 @@ next_ag:
 	return 0;
 
 out_free_item:
-	kmem_zone_free(item_zone, item);
+	kmem_free(item);
 out_put_ag:
 	xfs_filestream_put_ag(mp, *agp);
 	return err;
@@ -474,20 +472,3 @@ xfs_filestream_unmount(
 {
 	xfs_mru_cache_destroy(mp->m_filestream);
 }
-
-
-/* needs to return a positive errno for the init path */
-int
-xfs_filestream_init(void)
-{
-	item_zone = kmem_zone_init(sizeof(struct xfs_fstrm_item), "fstrm_item");
-	if (!item_zone)
-		return -ENOMEM;
-	return 0;
-}
-
-void
-xfs_filestream_uninit(void)
-{
-	kmem_zone_destroy(item_zone);
-}
diff --git a/fs/xfs/xfs_filestream.h b/fs/xfs/xfs_filestream.h
index e3a25f8..578d49e 100644
--- a/fs/xfs/xfs_filestream.h
+++ b/fs/xfs/xfs_filestream.h
@@ -22,8 +22,6 @@ struct xfs_mount;
 struct xfs_inode;
 struct xfs_bmalloca;
 
-int xfs_filestream_init(void);
-void xfs_filestream_uninit(void);
 int xfs_filestream_mount(struct xfs_mount *mp);
 void xfs_filestream_unmount(struct xfs_mount *mp);
 void xfs_filestream_deassociate(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index f317488..aab37af 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1753,13 +1753,9 @@ init_xfs_fs(void)
 	if (error)
 		goto out_destroy_wq;
 
-	error = xfs_filestream_init();
-	if (error)
-		goto out_mru_cache_uninit;
-
 	error = xfs_buf_init();
 	if (error)
-		goto out_filestream_uninit;
+		goto out_mru_cache_uninit;
 
 	error = xfs_init_procfs();
 	if (error)
@@ -1786,8 +1782,6 @@ init_xfs_fs(void)
 	xfs_cleanup_procfs();
  out_buf_terminate:
 	xfs_buf_terminate();
- out_filestream_uninit:
-	xfs_filestream_uninit();
  out_mru_cache_uninit:
 	xfs_mru_cache_uninit();
  out_destroy_wq:
@@ -1806,7 +1800,6 @@ exit_xfs_fs(void)
 	xfs_sysctl_unregister();
 	xfs_cleanup_procfs();
 	xfs_buf_terminate();
-	xfs_filestream_uninit();
 	xfs_mru_cache_uninit();
 	xfs_destroy_workqueues();
 	xfs_destroy_zones();
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 8/9] xfs: remove xfs_filestream_associate
  2014-04-12  8:01 filestream allocator rewrite Christoph Hellwig
                   ` (6 preceding siblings ...)
  2014-04-12  8:02 ` [PATCH 7/9] xfs: don't create a slab cache for filestream items Christoph Hellwig
@ 2014-04-12  8:02 ` Christoph Hellwig
  2014-04-12  8:02 ` [PATCH 9/9] xfs: add filestream allocator tracepoints Christoph Hellwig
  2014-04-14  2:23 ` filestream allocator rewrite Dave Chinner
  9 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2014-04-12  8:02 UTC (permalink / raw)
  To: xfs

There is no good reason to create a filestream when a directory entry
is created.  Delay it until the first allocation happens to simply
the code and reduce the amount of mru cache lookups we do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_filestream.c |   57 +++++++++++++----------------------------------
 fs/xfs/xfs_filestream.h |    1 -
 fs/xfs/xfs_inode.c      |   15 -------------
 3 files changed, 15 insertions(+), 58 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 7b94036..c8a8840 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -318,17 +318,18 @@ out:
 }
 
 /*
- * Return the AG of the filestream the file or directory belongs to, or
- * NULLAGNUMBER otherwise.
+ * Find the right allocation group for a file, either by finding an
+ * existing file stream or creating a new one.
+ *
+ * Returns NULLAGNUMBER in case of an error.
  */
 xfs_agnumber_t
 xfs_filestream_lookup_ag(
 	struct xfs_inode	*ip)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_fstrm_item	*item;
 	struct xfs_inode	*pip = NULL;
-	xfs_agnumber_t		ag = NULLAGNUMBER;
+	xfs_agnumber_t		startag, ag = NULLAGNUMBER;
 	int			ref = 0;
 	struct xfs_mru_cache_elem *mru;
 
@@ -339,45 +340,13 @@ xfs_filestream_lookup_ag(
 		goto out;
 
 	mru = xfs_mru_cache_lookup(mp->m_filestream, pip->i_ino);
-	if (!mru)
-		goto out;
-
-	item = container_of(mru, struct xfs_fstrm_item, mru);
-
-	ag = item->ag;
-	xfs_mru_cache_done(mp->m_filestream);
-
-	ref = xfs_filestream_peek_ag(ip->i_mount, ag);
-out:
-	TRACE_LOOKUP(mp, ip, pip, ag, ref);
-	IRELE(pip);
-	return ag;
-}
-
-/*
- * Make sure a directory has a filestream associated with it.
- *
- * This is called when creating regular files in an directory that has
- * filestreams enabled, so that a stream is ready by the time we need it
- * in the allocator for the files inside the directory.
- */
-int
-xfs_filestream_associate(
-	struct xfs_inode	*pip)
-{
-	struct xfs_mount	*mp = pip->i_mount;
-	struct xfs_mru_cache_elem *mru;
-	xfs_agnumber_t		startag, ag;
-
-	ASSERT(S_ISDIR(pip->i_d.di_mode));
-
-	/*
-	 * If the directory already has a file stream associated we're done.
-	 */
-	mru = xfs_mru_cache_lookup(mp->m_filestream, pip->i_ino);
 	if (mru) {
+		ag = container_of(mru, struct xfs_fstrm_item, mru)->ag;
 		xfs_mru_cache_done(mp->m_filestream);
-		return 0;
+
+		ref = xfs_filestream_peek_ag(ip->i_mount, ag);
+		TRACE_LOOKUP(mp, ip, pip, ag, ref);
+		goto out;
 	}
 
 	/*
@@ -392,7 +361,11 @@ xfs_filestream_associate(
 	} else
 		startag = XFS_INO_TO_AGNO(mp, pip->i_ino);
 
-	return xfs_filestream_pick_ag(pip, startag, &ag, 0, 0);
+	if (xfs_filestream_pick_ag(pip, startag, &ag, 0, 0))
+		ag = NULLAGNUMBER;
+out:
+	IRELE(pip);
+	return ag;
 }
 
 /*
diff --git a/fs/xfs/xfs_filestream.h b/fs/xfs/xfs_filestream.h
index 578d49e..2de853e 100644
--- a/fs/xfs/xfs_filestream.h
+++ b/fs/xfs/xfs_filestream.h
@@ -26,7 +26,6 @@ int xfs_filestream_mount(struct xfs_mount *mp);
 void xfs_filestream_unmount(struct xfs_mount *mp);
 void xfs_filestream_deassociate(struct xfs_inode *ip);
 xfs_agnumber_t xfs_filestream_lookup_ag(struct xfs_inode *ip);
-int xfs_filestream_associate(struct xfs_inode *dip);
 int xfs_filestream_new_ag(struct xfs_bmalloca *ap, xfs_agnumber_t *agp);
 
 static inline int
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index b9b531f..cec18e9 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -655,7 +655,6 @@ xfs_ialloc(
 	uint		flags;
 	int		error;
 	timespec_t	tv;
-	int		filestreams = 0;
 
 	/*
 	 * Call the space management code to pick
@@ -772,13 +771,6 @@ xfs_ialloc(
 		flags |= XFS_ILOG_DEV;
 		break;
 	case S_IFREG:
-		/*
-		 * we can't set up filestreams until after the VFS inode
-		 * is set up properly.
-		 */
-		if (pip && xfs_inode_is_filestream(pip))
-			filestreams = 1;
-		/* fall through */
 	case S_IFDIR:
 		if (pip && (pip->i_d.di_flags & XFS_DIFLAG_ANY)) {
 			uint	di_flags = 0;
@@ -844,13 +836,6 @@ xfs_ialloc(
 	/* now that we have an i_mode we can setup inode ops and unlock */
 	xfs_setup_inode(ip);
 
-	/* now we have set up the vfs inode we can associate the filestream */
-	if (filestreams) {
-		error = xfs_filestream_associate(pip);
-		if (error)
-			return error;
-	}
-
 	*ipp = ip;
 	return 0;
 }
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 9/9] xfs: add filestream allocator tracepoints
  2014-04-12  8:01 filestream allocator rewrite Christoph Hellwig
                   ` (7 preceding siblings ...)
  2014-04-12  8:02 ` [PATCH 8/9] xfs: remove xfs_filestream_associate Christoph Hellwig
@ 2014-04-12  8:02 ` Christoph Hellwig
  2014-04-14  2:23 ` filestream allocator rewrite Dave Chinner
  9 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2014-04-12  8:02 UTC (permalink / raw)
  To: xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_filestream.c |   29 +++++++-----------------
 fs/xfs/xfs_filestream.h |    1 +
 fs/xfs/xfs_trace.c      |    1 +
 fs/xfs/xfs_trace.h      |   58 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 68 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index c8a8840..8ec81be 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -33,12 +33,6 @@
 #include "xfs_filestream.h"
 #include "xfs_trace.h"
 
-#define TRACE_AG_SCAN(mp, ag, ag2)
-#define TRACE_AG_PICK1(mp, max_ag, maxfree)
-#define TRACE_AG_PICK2(mp, ag, ag2, cnt, free, scan, flag)
-#define TRACE_FREE(mp, ip, pip, ag, cnt)
-#define TRACE_LOOKUP(mp, ip, pip, ag, cnt)
-
 struct xfs_fstrm_item {
 	struct xfs_mru_cache_elem	mru;
 	struct xfs_inode		*ip;
@@ -87,7 +81,7 @@ enum xfs_fstrm_alloc {
  * the cache that reference per-ag array elements that have since been
  * reallocated.
  */
-static int
+int
 xfs_filestream_peek_ag(
 	xfs_mount_t	*mp,
 	xfs_agnumber_t	agno)
@@ -136,8 +130,7 @@ xfs_fstrm_free_func(
 
 	xfs_filestream_put_ag(item->ip->i_mount, item->ag);
 
-	TRACE_FREE(mp, ip, NULL, item->ag,
-		   xfs_filestream_peek_ag(mp, item->ag));
+	trace_xfs_filestream_free(item->ip, item->ag);
 
 	kmem_free(item);
 }
@@ -157,9 +150,8 @@ xfs_filestream_pick_ag(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_fstrm_item	*item;
 	struct xfs_perag	*pag;
-	xfs_extlen_t		longest, free, minfree, maxfree = 0;
+	xfs_extlen_t		longest, free = 0, minfree, maxfree = 0;
 	xfs_agnumber_t		ag, max_ag = NULLAGNUMBER;
-	int			streams, max_streams;
 	int			err, trylock, nscan;
 
 	ASSERT(S_ISDIR(ip->i_d.di_mode));
@@ -174,8 +166,9 @@ xfs_filestream_pick_ag(
 	trylock = XFS_ALLOC_FLAG_TRYLOCK;
 
 	for (nscan = 0; 1; nscan++) {
+		trace_xfs_filestream_scan(ip, ag);
+
 		pag = xfs_perag_get(mp, ag);
-		TRACE_AG_SCAN(mp, ag, atomic_read(&pag->pagf_fstrms));
 
 		if (!pag->pagf_init) {
 			err = xfs_alloc_pagf_init(mp, NULL, ag, trylock);
@@ -192,7 +185,6 @@ xfs_filestream_pick_ag(
 		/* Keep track of the AG with the most free blocks. */
 		if (pag->pagf_freeblks > maxfree) {
 			maxfree = pag->pagf_freeblks;
-			max_streams = atomic_read(&pag->pagf_fstrms);
 			max_ag = ag;
 		}
 
@@ -215,7 +207,6 @@ xfs_filestream_pick_ag(
 
 			/* Break out, retaining the reference on the AG. */
 			free = pag->pagf_freeblks;
-			streams = atomic_read(&pag->pagf_fstrms);
 			xfs_perag_put(pag);
 			*agp = ag;
 			break;
@@ -251,20 +242,18 @@ next_ag:
 		 */
 		if (max_ag != NULLAGNUMBER) {
 			xfs_filestream_get_ag(mp, max_ag);
-			TRACE_AG_PICK1(mp, max_ag, maxfree);
-			streams = max_streams;
 			free = maxfree;
 			*agp = max_ag;
 			break;
 		}
 
 		/* take AG 0 if none matched */
-		TRACE_AG_PICK1(mp, max_ag, maxfree);
+		trace_xfs_filestream_pick(ip, *agp, free, nscan);
 		*agp = 0;
 		return 0;
 	}
 
-	TRACE_AG_PICK2(mp, startag, *agp, streams, free, nscan, flags);
+	trace_xfs_filestream_pick(ip, *agp, free, nscan);
 
 	if (*agp == NULLAGNUMBER)
 		return 0;
@@ -330,7 +319,6 @@ xfs_filestream_lookup_ag(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_inode	*pip = NULL;
 	xfs_agnumber_t		startag, ag = NULLAGNUMBER;
-	int			ref = 0;
 	struct xfs_mru_cache_elem *mru;
 
 	ASSERT(S_ISREG(ip->i_d.di_mode));
@@ -344,8 +332,7 @@ xfs_filestream_lookup_ag(
 		ag = container_of(mru, struct xfs_fstrm_item, mru)->ag;
 		xfs_mru_cache_done(mp->m_filestream);
 
-		ref = xfs_filestream_peek_ag(ip->i_mount, ag);
-		TRACE_LOOKUP(mp, ip, pip, ag, ref);
+		trace_xfs_filestream_lookup(ip, ag);
 		goto out;
 	}
 
diff --git a/fs/xfs/xfs_filestream.h b/fs/xfs/xfs_filestream.h
index 2de853e..2ef4340 100644
--- a/fs/xfs/xfs_filestream.h
+++ b/fs/xfs/xfs_filestream.h
@@ -27,6 +27,7 @@ void xfs_filestream_unmount(struct xfs_mount *mp);
 void xfs_filestream_deassociate(struct xfs_inode *ip);
 xfs_agnumber_t xfs_filestream_lookup_ag(struct xfs_inode *ip);
 int xfs_filestream_new_ag(struct xfs_bmalloca *ap, xfs_agnumber_t *agp);
+int xfs_filestream_peek_ag(struct xfs_mount *mp, xfs_agnumber_t agno);
 
 static inline int
 xfs_inode_is_filestream(
diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
index dee3279..1e85bcd 100644
--- a/fs/xfs/xfs_trace.c
+++ b/fs/xfs/xfs_trace.c
@@ -46,6 +46,7 @@
 #include "xfs_log_recover.h"
 #include "xfs_inode_item.h"
 #include "xfs_bmap_btree.h"
+#include "xfs_filestream.h"
 
 /*
  * We include this last to have the helpers above available for the trace
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 65d8c79..6910458 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -538,6 +538,64 @@ DEFINE_BUF_ITEM_EVENT(xfs_trans_bhold_release);
 DEFINE_BUF_ITEM_EVENT(xfs_trans_binval);
 DEFINE_BUF_ITEM_EVENT(xfs_trans_buf_ordered);
 
+DECLARE_EVENT_CLASS(xfs_filestream_class,
+	TP_PROTO(struct xfs_inode *ip, xfs_agnumber_t agno),
+	TP_ARGS(ip, agno),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_agnumber_t, agno)
+		__field(int, streams)
+	),
+	TP_fast_assign(
+		__entry->dev = VFS_I(ip)->i_sb->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->agno = agno;
+		__entry->streams = xfs_filestream_peek_ag(ip->i_mount, agno);
+	),
+	TP_printk("dev %d:%d ino 0x%llx agno %u streams %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->agno,
+		  __entry->streams)
+)
+#define DEFINE_FILESTREAM_EVENT(name) \
+DEFINE_EVENT(xfs_filestream_class, name, \
+	TP_PROTO(struct xfs_inode *ip, xfs_agnumber_t agno), \
+	TP_ARGS(ip, agno))
+DEFINE_FILESTREAM_EVENT(xfs_filestream_free);
+DEFINE_FILESTREAM_EVENT(xfs_filestream_lookup);
+DEFINE_FILESTREAM_EVENT(xfs_filestream_scan);
+
+TRACE_EVENT(xfs_filestream_pick,
+	TP_PROTO(struct xfs_inode *ip, xfs_agnumber_t agno,
+		 xfs_extlen_t free, int nscan),
+	TP_ARGS(ip, agno, free, nscan),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_agnumber_t, agno)
+		__field(int, streams)
+		__field(xfs_extlen_t, free)
+		__field(int, nscan)
+	),
+	TP_fast_assign(
+		__entry->dev = VFS_I(ip)->i_sb->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->agno = agno;
+		__entry->streams = xfs_filestream_peek_ag(ip->i_mount, agno);
+		__entry->free = free;
+		__entry->nscan = nscan;
+	),
+	TP_printk("dev %d:%d ino 0x%llx agno %u streams %d free %d nscan %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->agno,
+		  __entry->streams,
+		  __entry->free,
+		  __entry->nscan)
+);
+
 DECLARE_EVENT_CLASS(xfs_lock_class,
 	TP_PROTO(struct xfs_inode *ip, unsigned lock_flags,
 		 unsigned long caller_ip),
-- 
1.7.10.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/9] xfs: split xfs_bmap_btalloc_nullfb
  2014-04-12  8:01 ` [PATCH 2/9] xfs: split xfs_bmap_btalloc_nullfb Christoph Hellwig
@ 2014-04-14  1:55   ` Dave Chinner
  2014-04-14  7:03     ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2014-04-14  1:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sat, Apr 12, 2014 at 10:01:56AM +0200, Christoph Hellwig wrote:
> Split xfs_bmap_btalloc_nullfb into one function for filestream allocations
> and one for everything else that share a few helpers.  This dramatically
> simplifies the control flow.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Small comment below.

> ---
>  fs/xfs/xfs_bmap.c |  200 +++++++++++++++++++++++++++++++----------------------
>  1 file changed, 117 insertions(+), 83 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
> index 3340f0e..d5d52df 100644
> --- a/fs/xfs/xfs_bmap.c
> +++ b/fs/xfs/xfs_bmap.c
> @@ -3517,6 +3517,67 @@ xfs_bmap_adjacent(
>  #undef ISVALID
>  }
>  
> +static int
> +xfs_bmap_longest_free_extent(
> +	struct xfs_trans	*tp,
> +	xfs_agnumber_t		ag,
> +	xfs_extlen_t		*blen,
> +	int			*notinit)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_perag	*pag;
> +	xfs_extlen_t		longest;
> +	int			error = 0;
> +
> +	pag = xfs_perag_get(mp, ag);
> +	if (!pag->pagf_init) {
> +		error = xfs_alloc_pagf_init(mp, tp, ag, XFS_ALLOC_FLAG_TRYLOCK);
> +		if (error)
> +			goto out;
> +
> +		if (!pag->pagf_init) {
> +			*notinit = 1;
> +			goto out;
> +		}
> +	}
> +
> +	longest = xfs_alloc_longest_free_extent(mp, pag);
> +	if (*blen < longest)
> +		*blen = longest;
> +
> +out:
> +	xfs_perag_put(pag);
> +	return error;
> +}
> +
> +static void
> +xfs_bmap_fix_args(
> +	struct xfs_bmalloca	*ap,
> +	struct xfs_alloc_arg	*args,
> +	xfs_extlen_t		*blen,
> +	int			notinit)

Not a big fan of the "fix_args" name here. What it's actually doing
is setting the minimum allocation length. Also, it could return the
minimum length rather than set it inside the function, and pass
blen by value rather than reference as it isn't modified in the
function...

Other than that, it's a good cleanup.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: filestream allocator rewrite
  2014-04-12  8:01 filestream allocator rewrite Christoph Hellwig
                   ` (8 preceding siblings ...)
  2014-04-12  8:02 ` [PATCH 9/9] xfs: add filestream allocator tracepoints Christoph Hellwig
@ 2014-04-14  2:23 ` Dave Chinner
  2014-04-22 21:06   ` Dave Chinner
  9 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2014-04-14  2:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sat, Apr 12, 2014 at 10:01:54AM +0200, Christoph Hellwig wrote:
> This series is a major overhaul of the filestream allocator.  The main point
> is to use the dcache to get the parent and avoiding to maintain a fstrm_item
> for each inode that the filestream is applied to in favor of just tracking
> the parent, but there's various more or less related fallout from this as well.

OVerall it looks good. I'll need to do some testing on it and 
have a deeper look at the main use-the-dentry-cache patch, but it
removes a heap of complexity and code and gets rid of the use of
inode IO locks, so there's a loot to like here...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/9] xfs: split xfs_bmap_btalloc_nullfb
  2014-04-14  1:55   ` Dave Chinner
@ 2014-04-14  7:03     ` Christoph Hellwig
  2014-04-14  7:17       ` Dave Chinner
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2014-04-14  7:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Apr 14, 2014 at 11:55:44AM +1000, Dave Chinner wrote:
> > +static void
> > +xfs_bmap_fix_args(
> > +	struct xfs_bmalloca	*ap,
> > +	struct xfs_alloc_arg	*args,
> > +	xfs_extlen_t		*blen,
> > +	int			notinit)
> 
> Not a big fan of the "fix_args" name here. What it's actually doing
> is setting the minimum allocation length. Also, it could return the
> minimum length rather than set it inside the function, and pass
> blen by value rather than reference as it isn't modified in the
> function...

Sure.  A any good suggestion for a name?

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/9] xfs: split xfs_bmap_btalloc_nullfb
  2014-04-14  7:03     ` Christoph Hellwig
@ 2014-04-14  7:17       ` Dave Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2014-04-14  7:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Apr 14, 2014 at 09:03:25AM +0200, Christoph Hellwig wrote:
> On Mon, Apr 14, 2014 at 11:55:44AM +1000, Dave Chinner wrote:
> > > +static void
> > > +xfs_bmap_fix_args(
> > > +	struct xfs_bmalloca	*ap,
> > > +	struct xfs_alloc_arg	*args,
> > > +	xfs_extlen_t		*blen,
> > > +	int			notinit)
> > 
> > Not a big fan of the "fix_args" name here. What it's actually doing
> > is setting the minimum allocation length. Also, it could return the
> > minimum length rather than set it inside the function, and pass
> > blen by value rather than reference as it isn't modified in the
> > function...
> 
> Sure.  A any good suggestion for a name?

<shrug>

xfs_bmap_select_minlen()?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: filestream allocator rewrite
  2014-04-14  2:23 ` filestream allocator rewrite Dave Chinner
@ 2014-04-22 21:06   ` Dave Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2014-04-22 21:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Apr 14, 2014 at 12:23:47PM +1000, Dave Chinner wrote:
> On Sat, Apr 12, 2014 at 10:01:54AM +0200, Christoph Hellwig wrote:
> > This series is a major overhaul of the filestream allocator.  The main point
> > is to use the dcache to get the parent and avoiding to maintain a fstrm_item
> > for each inode that the filestream is applied to in favor of just tracking
> > the parent, but there's various more or less related fallout from this as well.
> 
> OVerall it looks good. I'll need to do some testing on it and 
> have a deeper look at the main use-the-dentry-cache patch, but it
> removes a heap of complexity and code and gets rid of the use of
> inode IO locks, so there's a loot to like here...

So I've looked over this in more detail and done some testing on it,
and apart from the comment I made about a function name, I can't
spot any problems with the code. In terms of testing, this passes
all of the filestreams group tests except of xfs/172, which still
randomly fails.  That makes it at least as reliable as the current
code.

However, the failure of xfs/172 is interesting. It is expecting
buffered IO to *fail* due to a short filestreams timeout that is
set. However, when it fails, it's actually separating the streams
correctly:

$ cat results//xfs/172.full
stream 1 AGs: 0 0 0 0 0 4 4 4
stream 2 AGs: 1 1 1 1 1 5 5 5
stream 3 AGs: 2 2 2 2 2 6 6 6
stream 4 AGs: 3 3 3 3 3 7 7 7
- streams are in separate AGs, expected _matching_
$

And that's because it's able to create an association with it's
parent at allocation time rather than create time. So, the test
failure is actually a case of "this works better than the old code"
and not a negative at all. And you can see where the AGs filled up
in this test, and so new associations for the streams had to be
made, and that worked just fine, too.

I've done some other, higher bandwidth testing similar to realtime
file-per-frame video ingest for 2k resolution video (12.8MB files,
writing as fast as possible using direct IO with a directory per
"video") and that works just fine at the limits of my local hardware
(about 850MB/s to disk).

So I'm going to modify the function name I disliked, and commit this
to a topic branch and push it into the for-next branch for wider
testing. If we need further modifications, we can fix them up
later....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-04-22 21:07 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-12  8:01 filestream allocator rewrite Christoph Hellwig
2014-04-12  8:01 ` [PATCH 1/9] xfs: don't try to use the filestream allocator for metadata allocations Christoph Hellwig
2014-04-12  8:01 ` [PATCH 2/9] xfs: split xfs_bmap_btalloc_nullfb Christoph Hellwig
2014-04-14  1:55   ` Dave Chinner
2014-04-14  7:03     ` Christoph Hellwig
2014-04-14  7:17       ` Dave Chinner
2014-04-12  8:01 ` [PATCH 3/9] xfs: handle duplicate entries in xfs_mru_cache_insert Christoph Hellwig
2014-04-12  8:01 ` [PATCH 4/9] xfs: embedd mru_elem into parent structure Christoph Hellwig
2014-04-12  8:01 ` [PATCH 5/9] xfs: remove XFS_IFILESTREAM Christoph Hellwig
2014-04-12  8:02 ` [PATCH 6/9] xfs: rewrite the filestream allocator using the dentry cache Christoph Hellwig
2014-04-12  8:02 ` [PATCH 7/9] xfs: don't create a slab cache for filestream items Christoph Hellwig
2014-04-12  8:02 ` [PATCH 8/9] xfs: remove xfs_filestream_associate Christoph Hellwig
2014-04-12  8:02 ` [PATCH 9/9] xfs: add filestream allocator tracepoints Christoph Hellwig
2014-04-14  2:23 ` filestream allocator rewrite Dave Chinner
2014-04-22 21:06   ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.