All of lore.kernel.org
* [PATCH 0/8] xfs: bug fixes for 3.4
@ 2012-03-02  4:11 Dave Chinner
  2012-03-02  4:11 ` [PATCH 1/8] xfs: clean up minor sparse warnings Dave Chinner
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Dave Chinner @ 2012-03-02  4:11 UTC (permalink / raw)
  To: xfs

This series contains various fixes for random bugs that are still sitting in my
local queue. The only new one is the last, which (once again) fixes inode
reclaim during quotacheck.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 1/8] xfs: clean up minor sparse warnings
  2012-03-02  4:11 [PATCH 0/8] xfs: bug fixes for 3.4 Dave Chinner
@ 2012-03-02  4:11 ` Dave Chinner
  2012-03-02  7:46   ` Christoph Hellwig
  2012-03-02  4:11 ` [PATCH 2/8] xfs: Fix open flag handling in open_by_handle code Dave Chinner
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2012-03-02  4:11 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

From: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ben Myers <bpm@sgi.com>
---
 fs/xfs/xfs_dir2_block.c |    1 +
 fs/xfs/xfs_ioctl32.c    |    2 +-
 fs/xfs/xfs_iops.c       |   13 ++++++++-----
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_dir2_block.c b/fs/xfs/xfs_dir2_block.c
index 9245e02..d3b63ae 100644
--- a/fs/xfs/xfs_dir2_block.c
+++ b/fs/xfs/xfs_dir2_block.c
@@ -29,6 +29,7 @@
 #include "xfs_dinode.h"
 #include "xfs_inode.h"
 #include "xfs_inode_item.h"
+#include "xfs_dir2.h"
 #include "xfs_dir2_format.h"
 #include "xfs_dir2_priv.h"
 #include "xfs_error.h"
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index f9ccb7b..a849a54 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -293,7 +293,7 @@ xfs_compat_ioc_bulkstat(
 		int res;
 
 		error = xfs_bulkstat_one_compat(mp, inlast, bulkreq.ubuffer,
-				sizeof(compat_xfs_bstat_t), 0, &res);
+				sizeof(compat_xfs_bstat_t), NULL, &res);
 	} else if (cmd == XFS_IOC_FSBULKSTAT_32) {
 		error = xfs_bulkstat(mp, &inlast, &count,
 			xfs_bulkstat_one_compat, sizeof(compat_xfs_bstat_t),
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ab30253..2601d6b 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -103,12 +103,15 @@ xfs_mark_inode_dirty(
 }
 
 
-int xfs_initxattrs(struct inode *inode, const struct xattr *xattr_array,
-		   void *fs_info)
+static int
+xfs_initxattrs(
+	struct inode		*inode,
+	const struct xattr	*xattr_array,
+	void			*fs_info)
 {
-	const struct xattr *xattr;
-	struct xfs_inode *ip = XFS_I(inode);
-	int error = 0;
+	const struct xattr	*xattr;
+	struct xfs_inode	*ip = XFS_I(inode);
+	int			error = 0;
 
 	for (xattr = xattr_array; xattr->name != NULL; xattr++) {
 		error = xfs_attr_set(ip, xattr->name, xattr->value,
-- 
1.7.9


* [PATCH 2/8] xfs: Fix open flag handling in open_by_handle code
  2012-03-02  4:11 [PATCH 0/8] xfs: bug fixes for 3.4 Dave Chinner
  2012-03-02  4:11 ` [PATCH 1/8] xfs: clean up minor sparse warnings Dave Chinner
@ 2012-03-02  4:11 ` Dave Chinner
  2012-03-02  7:47   ` Christoph Hellwig
  2012-03-02  4:11 ` [PATCH 3/8] xfs: handle kmalloc failure when reading attrs Dave Chinner
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2012-03-02  4:11 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

From: Dave Chinner <dchinner@redhat.com>

Sparse identified some unsafe handling of open flags in the xfs open
by handle ioctl code. Update the code to use the correct access
macros to ensure that we handle the open flags correctly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_ioctl.c |   11 +++--------
 1 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 76f3ca5..86c86d8 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -230,24 +230,19 @@ xfs_open_by_handle(
 
 	/* Put open permission in namei format. */
 	permflag = hreq->oflags;
-	if ((permflag+1) & O_ACCMODE)
-		permflag++;
-	if (permflag & O_TRUNC)
-		permflag |= 2;
-
 	if ((!(permflag & O_APPEND) || (permflag & O_TRUNC)) &&
-	    (permflag & FMODE_WRITE) && IS_APPEND(inode)) {
+	    (OPEN_FMODE(permflag) & FMODE_WRITE) && IS_APPEND(inode)) {
 		error = -XFS_ERROR(EPERM);
 		goto out_dput;
 	}
 
-	if ((permflag & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
+	if ((OPEN_FMODE(permflag) & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
 		error = -XFS_ERROR(EACCES);
 		goto out_dput;
 	}
 
 	/* Can't write directories. */
-	if (S_ISDIR(inode->i_mode) && (permflag & FMODE_WRITE)) {
+	if (S_ISDIR(inode->i_mode) && (OPEN_FMODE(permflag) & FMODE_WRITE)) {
 		error = -XFS_ERROR(EISDIR);
 		goto out_dput;
 	}
-- 
1.7.9


* [PATCH 3/8] xfs: handle kmalloc failure when reading attrs
  2012-03-02  4:11 [PATCH 0/8] xfs: bug fixes for 3.4 Dave Chinner
  2012-03-02  4:11 ` [PATCH 1/8] xfs: clean up minor sparse warnings Dave Chinner
  2012-03-02  4:11 ` [PATCH 2/8] xfs: Fix open flag handling in open_by_handle code Dave Chinner
@ 2012-03-02  4:11 ` Dave Chinner
  2012-03-02  7:49   ` Christoph Hellwig
  2012-03-02  4:11 ` [PATCH 4/8] xfs: avoid memory allocation failures in xfs_getbmap Dave Chinner
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2012-03-02  4:11 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

From: Dave Chinner <dchinner@redhat.com>

xfsdump uses a large buffer for extended attributes, which has a
kmalloc'd shadow buffer in the kernel. This can fail after the
system has been running for some time as it is a high order
allocation. Add a fallback to vmalloc so that it doesn't require
contiguous memory and so won't randomly fail while xfsdump is
running.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_ioctl.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 86c86d8..7721962 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -445,9 +445,12 @@ xfs_attrmulti_attr_get(
 
 	if (*len > XATTR_SIZE_MAX)
 		return EINVAL;
-	kbuf = kmalloc(*len, GFP_KERNEL);
-	if (!kbuf)
-		return ENOMEM;
+	kbuf = kmem_zalloc(*len, KM_SLEEP | KM_MAYFAIL);
+	if (!kbuf) {
+		kbuf = kmem_zalloc_large(*len);
+		if (!kbuf)
+			return ENOMEM;
+	}
 
 	error = xfs_attr_get(XFS_I(inode), name, kbuf, (int *)len, flags);
 	if (error)
@@ -457,7 +460,7 @@ xfs_attrmulti_attr_get(
 		error = EFAULT;
 
  out_kfree:
-	kfree(kbuf);
+	kmem_free(kbuf);
 	return error;
 }
 
-- 
1.7.9


* [PATCH 4/8] xfs: avoid memory allocation failures in xfs_getbmap
  2012-03-02  4:11 [PATCH 0/8] xfs: bug fixes for 3.4 Dave Chinner
                   ` (2 preceding siblings ...)
  2012-03-02  4:11 ` [PATCH 3/8] xfs: handle kmalloc failure when reading attrs Dave Chinner
@ 2012-03-02  4:11 ` Dave Chinner
  2012-03-02  7:49   ` Christoph Hellwig
  2012-03-02  4:11 ` [PATCH 5/8] xfs: introduce an allocation workqueue Dave Chinner
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2012-03-02  4:11 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

From: Dave Chinner <dchinner@redhat.com>

xfs_getbmap uses a large buffer for extents, which is kmalloc'd.
This can fail after the system has been running for some time as it
is a high order allocation. Add a fallback to vmalloc so that it
doesn't require contiguous memory and so won't randomly fail on
files with large extent lists.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bmap.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 188ef2f..d1ab08d 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -5536,8 +5536,12 @@ xfs_getbmap(
 	if (bmv->bmv_count > ULONG_MAX / sizeof(struct getbmapx))
 		return XFS_ERROR(ENOMEM);
 	out = kmem_zalloc(bmv->bmv_count * sizeof(struct getbmapx), KM_MAYFAIL);
-	if (!out)
-		return XFS_ERROR(ENOMEM);
+	if (!out) {
+		out = kmem_zalloc_large(bmv->bmv_count *
+					sizeof(struct getbmapx));
+		if (!out)
+			return XFS_ERROR(ENOMEM);
+	}
 
 	xfs_ilock(ip, XFS_IOLOCK_SHARED);
 	if (whichfork == XFS_DATA_FORK && !(iflags & BMV_IF_DELALLOC)) {
-- 
1.7.9


* [PATCH 5/8] xfs: introduce an allocation workqueue
  2012-03-02  4:11 [PATCH 0/8] xfs: bug fixes for 3.4 Dave Chinner
                   ` (3 preceding siblings ...)
  2012-03-02  4:11 ` [PATCH 4/8] xfs: avoid memory allocation failures in xfs_getbmap Dave Chinner
@ 2012-03-02  4:11 ` Dave Chinner
  2012-03-02  4:11 ` [PATCH 6/8] xfs: remove remaining scraps of struct xfs_iomap Dave Chinner
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2012-03-02  4:11 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

From: Dave Chinner <dchinner@redhat.com>

We currently have significant issues with the amount of stack that
allocation in XFS uses, especially in the writeback path. We can
easily consume 4k of stack between mapping the page, manipulating
the bmap btree and allocating blocks from the free list. Not to
mention btree block readahead and other functionality that issues IO
in the allocation path.

As a result, we can no longer fit allocation in the writeback path
in the stack space provided on x86_64. To alleviate this problem,
introduce an allocation workqueue and move all allocations to a
separate context. This can be easily added as an interposing layer
into xfs_alloc_vextent(), which takes a single argument structure
and does not return until the allocation is complete or has failed.

To do this, add a work structure and a completion to the allocation
args structure. This allows xfs_alloc_vextent to queue the args onto
the workqueue and wait for it to be completed by the worker. This
can be done completely transparently to the caller.

The worker function needs to ensure that it sets and clears the
PF_TRANS flag appropriately as it is being run in an active
transaction context. Work can also be queued in a memory reclaim
context, so a rescuer is needed for the workqueue.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_alloc.c |   34 +++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_alloc.h |    5 +++++
 fs/xfs/xfs_super.c |   16 ++++++++++++++++
 3 files changed, 54 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c
index ce84ffd..31e9033 100644
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -35,6 +35,7 @@
 #include "xfs_error.h"
 #include "xfs_trace.h"
 
+struct workqueue_struct *xfs_alloc_wq;
 
 #define XFS_ABSDIFF(a,b)	(((a) <= (b)) ? ((b) - (a)) : ((a) - (b)))
 
@@ -2207,7 +2208,7 @@ xfs_alloc_read_agf(
  * group or loop over the allocation groups to find the result.
  */
 int				/* error */
-xfs_alloc_vextent(
+__xfs_alloc_vextent(
 	xfs_alloc_arg_t	*args)	/* allocation argument structure */
 {
 	xfs_agblock_t	agsize;	/* allocation group size */
@@ -2417,6 +2418,37 @@ error0:
 	return error;
 }
 
+static void
+xfs_alloc_vextent_worker(
+	struct work_struct	*work)
+{
+	struct xfs_alloc_arg	*args = container_of(work,
+						struct xfs_alloc_arg, work);
+	unsigned long		pflags;
+
+	/* we are in a transaction context here */
+	current_set_flags_nested(&pflags, PF_FSTRANS);
+
+	args->result = __xfs_alloc_vextent(args);
+	complete(args->done);
+
+	current_restore_flags_nested(&pflags, PF_FSTRANS);
+}
+
+
+int				/* error */
+xfs_alloc_vextent(
+	xfs_alloc_arg_t	*args)	/* allocation argument structure */
+{
+	DECLARE_COMPLETION_ONSTACK(done);
+
+	args->done = &done;
+	INIT_WORK(&args->work, xfs_alloc_vextent_worker);
+	queue_work(xfs_alloc_wq, &args->work);
+	wait_for_completion(&done);
+	return args->result;
+}
+
 /*
  * Free an extent.
  * Just break up the extent address and hand off to xfs_free_ag_extent
diff --git a/fs/xfs/xfs_alloc.h b/fs/xfs/xfs_alloc.h
index 2f52b92..ab5d0fd 100644
--- a/fs/xfs/xfs_alloc.h
+++ b/fs/xfs/xfs_alloc.h
@@ -25,6 +25,8 @@ struct xfs_perag;
 struct xfs_trans;
 struct xfs_busy_extent;
 
+extern struct workqueue_struct *xfs_alloc_wq;
+
 /*
  * Freespace allocation types.  Argument to xfs_alloc_[v]extent.
  */
@@ -119,6 +121,9 @@ typedef struct xfs_alloc_arg {
 	char		isfl;		/* set if is freelist blocks - !acctg */
 	char		userdata;	/* set if this is user data */
 	xfs_fsblock_t	firstblock;	/* io first block allocated */
+	struct completion *done;
+	struct work_struct work;
+	int		result;
 } xfs_alloc_arg_t;
 
 /*
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 5e0d43f..13fa0cf 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1604,12 +1604,28 @@ xfs_init_workqueues(void)
 	xfs_syncd_wq = alloc_workqueue("xfssyncd", WQ_NON_REENTRANT, 0);
 	if (!xfs_syncd_wq)
 		return -ENOMEM;
+
+	/*
+	 * The allocation workqueue can be used in memory reclaim situations
+	 * (writepage path), and parallelism is only limited by the number of
+	 * AGs in all the filesystems mounted. Hence use the default large
+	 * max_active value for this workqueue.
+	 */
+	xfs_alloc_wq = alloc_workqueue("xfsalloc", WQ_MEM_RECLAIM, 0);
+	if (!xfs_alloc_wq)
+		goto out_destroy_syncd;
+
 	return 0;
+
+out_destroy_syncd:
+	destroy_workqueue(xfs_syncd_wq);
+	return -ENOMEM;
 }
 
 STATIC void
 xfs_destroy_workqueues(void)
 {
+	destroy_workqueue(xfs_alloc_wq);
 	destroy_workqueue(xfs_syncd_wq);
 }
 
-- 
1.7.9


* [PATCH 6/8] xfs: remove remaining scraps of struct xfs_iomap
  2012-03-02  4:11 [PATCH 0/8] xfs: bug fixes for 3.4 Dave Chinner
                   ` (4 preceding siblings ...)
  2012-03-02  4:11 ` [PATCH 5/8] xfs: introduce an allocation workqueue Dave Chinner
@ 2012-03-02  4:11 ` Dave Chinner
  2012-03-02  4:11 ` [PATCH 7/8] xfs: fix inode lookup race Dave Chinner
  2012-03-02  4:11 ` [PATCH 8/8] xfs: add a shrinker for quotacheck Dave Chinner
  7 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2012-03-02  4:11 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

From: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_vnode.h    |    1 -
 fs/xfs/xfs_vnodeops.h |    3 ---
 2 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_vnode.h b/fs/xfs/xfs_vnode.h
index 7c220b4..db14d0c 100644
--- a/fs/xfs/xfs_vnode.h
+++ b/fs/xfs/xfs_vnode.h
@@ -22,7 +22,6 @@
 
 struct file;
 struct xfs_inode;
-struct xfs_iomap;
 struct attrlist_cursor_kern;
 
 /*
diff --git a/fs/xfs/xfs_vnodeops.h b/fs/xfs/xfs_vnodeops.h
index 0c877cb..447e146 100644
--- a/fs/xfs/xfs_vnodeops.h
+++ b/fs/xfs/xfs_vnodeops.h
@@ -10,7 +10,6 @@ struct kiocb;
 struct pipe_inode_info;
 struct uio;
 struct xfs_inode;
-struct xfs_iomap;
 
 
 int xfs_setattr_nonsize(struct xfs_inode *ip, struct iattr *vap, int flags);
@@ -49,8 +48,6 @@ int xfs_attr_set(struct xfs_inode *dp, const unsigned char *name,
 int xfs_attr_remove(struct xfs_inode *dp, const unsigned char *name, int flags);
 int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
 		int flags, struct attrlist_cursor_kern *cursor);
-int xfs_bmap(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
-		int flags, struct xfs_iomap *iomapp, int *niomaps);
 void xfs_tosspages(struct xfs_inode *inode, xfs_off_t first,
 		xfs_off_t last, int fiopt);
 int xfs_flushinval_pages(struct xfs_inode *ip, xfs_off_t first,
-- 
1.7.9


* [PATCH 7/8] xfs: fix inode lookup race
  2012-03-02  4:11 [PATCH 0/8] xfs: bug fixes for 3.4 Dave Chinner
                   ` (5 preceding siblings ...)
  2012-03-02  4:11 ` [PATCH 6/8] xfs: remove remaining scraps of struct xfs_iomap Dave Chinner
@ 2012-03-02  4:11 ` Dave Chinner
  2012-03-02  4:11 ` [PATCH 8/8] xfs: add a shrinker for quotacheck Dave Chinner
  7 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2012-03-02  4:11 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

From: Dave Chinner <dchinner@redhat.com>

When we get concurrent lookups of the same inode that is not in the
per-AG inode cache, there is a race condition that triggers warnings
in unlock_new_inode() indicating that we are initialising an inode
that isn't in the correct state for a new inode.

When we do an inode lookup via a file handle or a bulkstat, we don't
serialise lookups at a higher level through the dentry cache (i.e.
pathless lookup), and so we can get concurrent lookups of the same
inode.

The race condition is between the insertion of the inode into the
cache on a cache miss and a concurrent lookup:

Thread 1			Thread 2
xfs_iget()
  xfs_iget_cache_miss()
    xfs_iread()
    lock radix tree
    radix_tree_insert()
				rcu_read_lock
				radix_tree_lookup
				lock inode flags
				XFS_INEW not set
				igrab()
				unlock inode flags
				rcu_read_unlock
				use uninitialised inode
				.....
    lock inode flags
    set XFS_INEW
    unlock inode flags
    unlock radix tree
  xfs_setup_inode()
    inode flags = I_NEW
    unlock_new_inode()
      WARNING as inode flags != I_NEW

This can lead to inode corruption, inode list corruption, etc., and
is generally a bad thing.

Fix this by setting XFS_INEW before inserting the inode into the
radix tree. This ensures that any concurrent lookup finds the new
inode with XFS_INEW set, which forces the lookup to wait until the
XFS_INEW flag is cleared before it can succeed.

cc: <stable@vger.kernel.org> # for 3.0.x, 3.2.x
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ben Myers <bpm@sgi.com>
---
 fs/xfs/xfs_iget.c |   18 ++++++++++++------
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index 37f22da..93fc1dc 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -350,9 +350,20 @@ xfs_iget_cache_miss(
 			BUG();
 	}
 
-	spin_lock(&pag->pag_ici_lock);
+	/*
+	 * These values must be set before inserting the inode into the radix
+	 * tree as the moment it is inserted a concurrent lookup (allowed by the
+	 * RCU locking mechanism) can find it and that lookup must see that this
+	 * is an inode currently under construction (i.e. that XFS_INEW is set).
+	 * The ip->i_flags_lock that protects the XFS_INEW flag forms the
+	 * memory barrier that ensures this detection works correctly at lookup
+	 * time.
+	 */
+	ip->i_udquot = ip->i_gdquot = NULL;
+	xfs_iflags_set(ip, XFS_INEW);
 
 	/* insert the new inode */
+	spin_lock(&pag->pag_ici_lock);
 	error = radix_tree_insert(&pag->pag_ici_root, agino, ip);
 	if (unlikely(error)) {
 		WARN_ON(error != -EEXIST);
@@ -360,11 +371,6 @@ xfs_iget_cache_miss(
 		error = EAGAIN;
 		goto out_preload_end;
 	}
-
-	/* These values _must_ be set before releasing the radix tree lock! */
-	ip->i_udquot = ip->i_gdquot = NULL;
-	xfs_iflags_set(ip, XFS_INEW);
-
 	spin_unlock(&pag->pag_ici_lock);
 	radix_tree_preload_end();
 
-- 
1.7.9


* [PATCH 8/8] xfs: add a shrinker for quotacheck
  2012-03-02  4:11 [PATCH 0/8] xfs: bug fixes for 3.4 Dave Chinner
                   ` (6 preceding siblings ...)
  2012-03-02  4:11 ` [PATCH 7/8] xfs: fix inode lookup race Dave Chinner
@ 2012-03-02  4:11 ` Dave Chinner
  2012-03-02  7:51   ` Christoph Hellwig
  7 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2012-03-02  4:11 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

From: Dave Chinner <dchinner@redhat.com>

When the filesystem is mounting, the superblock based shrinker cannot run due to
the mount process holding the sb->s_umount lock exclusively. Hence when
quotacheck runs, it cannot shrink the inode cache if it grows too large. We've
had repeated problems with the inode cache shrinker startup and quotacheck
resulting in OOM conditions.

Avoid this problem altogether by installing a quotacheck specific inode cache
shrinker that is registered before the quotacheck starts, and is then
unregistered after the quotacheck finishes. This shrinker uses exactly the same
infrastructure as the superblock based inode cache shrinker, so there is very
little extra code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_qm.c |   28 ++++++++++++++++++++++++++++
 fs/xfs/xfs_qm.h |    2 ++
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index c872fea..c1a42f1 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -823,6 +823,7 @@ xfs_qm_init_quotainfo(
 		return error;
 	}
 
+	qinf->qi_mount = mp;
 	INIT_LIST_HEAD(&qinf->qi_dqlist);
 	mutex_init(&qinf->qi_dqlist_lock);
 	lockdep_set_class(&qinf->qi_dqlist_lock, &xfs_quota_mplist_class);
@@ -1398,6 +1399,26 @@ error0:
 }
 
 /*
+ * quotacheck specific shrinker. This is only active while quotacheck is in
+ * progress as the superblock shrinker won't run until the mount completes and
+ * drops the sb->s_umount lock.
+ */
+static int
+xfs_qm_quotacheck_shrink(
+	struct shrinker		*shr,
+	struct shrink_control	*sc)
+{
+	struct xfs_quotainfo	*qi = container_of(shr, struct xfs_quotainfo,
+							qi_shrinker);
+	if (!(sc->gfp_mask & __GFP_FS))
+		return -1;
+	if (sc->nr_to_scan)
+		xfs_reclaim_inodes_nr(qi->qi_mount, sc->nr_to_scan);
+
+	return xfs_reclaim_inodes_count(qi->qi_mount);
+}
+
+/*
  * Walk thru all the filesystem inodes and construct a consistent view
  * of the disk quota world. If the quotacheck fails, disable quotas.
  */
@@ -1410,6 +1431,7 @@ xfs_qm_quotacheck(
 	size_t		structsz;
 	xfs_inode_t	*uip, *gip;
 	uint		flags;
+	struct xfs_quotainfo *qi = mp->m_quotainfo;
 
 	count = INT_MAX;
 	structsz = 1;
@@ -1427,6 +1449,11 @@ xfs_qm_quotacheck(
 
 	xfs_notice(mp, "Quotacheck needed: Please wait.");
 
+	qi->qi_shrinker.seeks = DEFAULT_SEEKS;
+	qi->qi_shrinker.shrink = xfs_qm_quotacheck_shrink;
+	qi->qi_shrinker.batch = 1024;
+	register_shrinker(&qi->qi_shrinker);
+
 	/*
 	 * First we go thru all the dquots on disk, USR and GRP/PRJ, and reset
 	 * their counters to zero. We need a clean slate.
@@ -1500,6 +1527,7 @@ xfs_qm_quotacheck(
 	mp->m_qflags |= flags;
 
  error_return:
+	unregister_shrinker(&qi->qi_shrinker);
 	if (error) {
 		xfs_warn(mp,
 	"Quotacheck: Unsuccessful (Error %d): Disabling quotas.",
diff --git a/fs/xfs/xfs_qm.h b/fs/xfs/xfs_qm.h
index 9a9b997..9ee7990 100644
--- a/fs/xfs/xfs_qm.h
+++ b/fs/xfs/xfs_qm.h
@@ -73,6 +73,8 @@ typedef struct xfs_qm {
 typedef struct xfs_quotainfo {
 	xfs_inode_t	*qi_uquotaip;	 /* user quota inode */
 	xfs_inode_t	*qi_gquotaip;	 /* group quota inode */
+	struct xfs_mount *qi_mount;
+	struct shrinker	 qi_shrinker;
 	struct list_head qi_dqlist;	 /* all dquots in filesys */
 	struct mutex	 qi_dqlist_lock;
 	int		 qi_dquots;
-- 
1.7.9


* Re: [PATCH 1/8] xfs: clean up minor sparse warnings
  2012-03-02  4:11 ` [PATCH 1/8] xfs: clean up minor sparse warnings Dave Chinner
@ 2012-03-02  7:46   ` Christoph Hellwig
  2012-03-02  7:59     ` Dave Chinner
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2012-03-02  7:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, Mar 02, 2012 at 03:11:40PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> From: Dave Chinner <dchinner@redhat.com>

Why do you use this double from line?  It confuses all the patch apply
scripts.


* Re: [PATCH 2/8] xfs: Fix open flag handling in open_by_handle code
  2012-03-02  4:11 ` [PATCH 2/8] xfs: Fix open flag handling in open_by_handle code Dave Chinner
@ 2012-03-02  7:47   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2012-03-02  7:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

>  	/* Put open permission in namei format. */

this comment is obsolete now.

>  	permflag = hreq->oflags;
> -	if ((permflag+1) & O_ACCMODE)
> -		permflag++;
> -	if (permflag & O_TRUNC)
> -		permflag |= 2;
> -
>  	if ((!(permflag & O_APPEND) || (permflag & O_TRUNC)) &&
> -	    (permflag & FMODE_WRITE) && IS_APPEND(inode)) {
> +	    (OPEN_FMODE(permflag) & FMODE_WRITE) && IS_APPEND(inode)) {

It would seem cleaner to still keep a local fmode_t fmode variable
instead of using OPEN_FMODE multiple times.


* Re: [PATCH 3/8] xfs: handle kmalloc failure when reading attrs
  2012-03-02  4:11 ` [PATCH 3/8] xfs: handle kmalloc failure when reading attrs Dave Chinner
@ 2012-03-02  7:49   ` Christoph Hellwig
  2012-03-02  9:49     ` Dave Chinner
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2012-03-02  7:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

I think the subject should be more like:

xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get

> +	kbuf = kmem_zalloc(*len, KM_SLEEP | KM_MAYFAIL);
> +	if (!kbuf) {
> +		kbuf = kmem_zalloc_large(*len);
> +		if (!kbuf)
> +			return ENOMEM;
> +	}
>  
>  	error = xfs_attr_get(XFS_I(inode), name, kbuf, (int *)len, flags);
>  	if (error)
> @@ -457,7 +460,7 @@ xfs_attrmulti_attr_get(
>  		error = EFAULT;
>  
>   out_kfree:
> -	kfree(kbuf);
> +	kmem_free(kbuf);

kmem_free doesn't handle vmalloced buffers from kmem_zalloc_large, you
need to use kmem_free_large for them.


* Re: [PATCH 4/8] xfs: avoid memory allocation failures in xfs_getbmap
  2012-03-02  4:11 ` [PATCH 4/8] xfs: avoid memory allocation failures in xfs_getbmap Dave Chinner
@ 2012-03-02  7:49   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2012-03-02  7:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Both comments for the last patch apply here, too.


* Re: [PATCH 8/8] xfs: add a shrinker for quotacheck
  2012-03-02  4:11 ` [PATCH 8/8] xfs: add a shrinker for quotacheck Dave Chinner
@ 2012-03-02  7:51   ` Christoph Hellwig
  2012-03-02 10:04     ` Dave Chinner
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2012-03-02  7:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hmm, I don't like this complication all that much.

Why would we even bother caching inodes during quotacheck?  The bulkstat
is a 100% sequential read only workload going through all inodes in the
filesystem.  I think we should simply not cache any inodes while in
quotacheck.


* Re: [PATCH 1/8] xfs: clean up minor sparse warnings
  2012-03-02  7:46   ` Christoph Hellwig
@ 2012-03-02  7:59     ` Dave Chinner
  0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2012-03-02  7:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Mar 02, 2012 at 02:46:42AM -0500, Christoph Hellwig wrote:
> On Fri, Mar 02, 2012 at 03:11:40PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > From: Dave Chinner <dchinner@redhat.com>
> 
> Why do you use this double from line?  It confuses all the patch apply
> scripts.

I didn't. It looks like the current version of guilt is busted
or doesn't work correctly with the current version of git. Basically
guilt hasn't stripped the From line out of the patch description
when using it for the commit message, so it's getting set by
git-send-email as well as being included in the commit message. It
must have broken when I updated everything last week...

I'll have a look at it and repost when I find out why it's now
busted.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 3/8] xfs: handle kmalloc failure when reading attrs
  2012-03-02  7:49   ` Christoph Hellwig
@ 2012-03-02  9:49     ` Dave Chinner
  2012-03-02 10:31       ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2012-03-02  9:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Mar 02, 2012 at 02:49:20AM -0500, Christoph Hellwig wrote:
> I think the subject should be more like:
> 
> xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get

OK.

> > +	kbuf = kmem_zalloc(*len, KM_SLEEP | KM_MAYFAIL);
> > +	if (!kbuf) {
> > +		kbuf = kmem_zalloc_large(*len);
> > +		if (!kbuf)
> > +			return ENOMEM;
> > +	}
> >  
> >  	error = xfs_attr_get(XFS_I(inode), name, kbuf, (int *)len, flags);
> >  	if (error)
> > @@ -457,7 +460,7 @@ xfs_attrmulti_attr_get(
> >  		error = EFAULT;
> >  
> >   out_kfree:
> > -	kfree(kbuf);
> > +	kmem_free(kbuf);
> 
> kmem_free doesn't handle vmalloced buffers from kmem_zalloc_large, you
> need to use kmem_free_large for them.

static inline void kmem_free_large(void *ptr)
{
        vfree(ptr);
}

That only handles vmalloced memory, but kmem_free()
handles both kmalloc() and vmalloc() memory:

void
kmem_free(const void *ptr)
{
        if (!is_vmalloc_addr(ptr)) {
                kfree(ptr);
        } else {
                vfree(ptr);
        }
}

Avoiding having to open code this vmalloc check is exactly why I
chose kmem_free() here ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 8/8] xfs: add a shrinker for quotacheck
  2012-03-02  7:51   ` Christoph Hellwig
@ 2012-03-02 10:04     ` Dave Chinner
  2012-03-02 10:38       ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2012-03-02 10:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Mar 02, 2012 at 02:51:04AM -0500, Christoph Hellwig wrote:
> Hmm, I don't like this complication all that much.

Though it is a simple, self contained fix for the problem...

> Why would we even bother caching inodes during quotacheck?  The bulkstat
> is a 100% sequential read only workload going through all inodes in the
> filesystem.  I think we should simply not cache any inodes while in
> quotacheck.

I have tried that approach previously with inodes read through
bulkstat, but I couldn't find a clean workable solution. It kept
getting rather complex because all our caching and recycling is tied
into VFS level triggers. That was a while back, so maybe there is a
simpler solution that I missed in attempting to do this.

I suspect for a quotacheck only solution we can hack a check into
.drop_inode, but a generic coherent non-cached bulkstat lookup is
somewhat more troublesome.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 3/8] xfs: handle kmalloc failure when reading attrs
  2012-03-02  9:49     ` Dave Chinner
@ 2012-03-02 10:31       ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2012-03-02 10:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Fri, Mar 02, 2012 at 08:49:38PM +1100, Dave Chinner wrote:
> That only handles vmalloced memory, but kmem_free()
> handles both kmalloc() and vmalloc() memory:
> 
> void
> kmem_free(const void *ptr)
> {
>         if (!is_vmalloc_addr(ptr)) {
>                 kfree(ptr);
>         } else {
>                 vfree(ptr);
>         }
> }
> 
> Avoiding having to open code this vmalloc check is exactly why I
> chose kmem_free() here ;)

Oh - this should have been removed when I changed kmem_alloc to not use
vmalloc anymore.  I'd really prefer to have this kind of check in the
callers as it should be the exception, not the rule.

* Re: [PATCH 8/8] xfs: add a shrinker for quotacheck
  2012-03-02 10:04     ` Dave Chinner
@ 2012-03-02 10:38       ` Christoph Hellwig
  2012-03-02 11:14         ` Christoph Hellwig
  2012-03-05  1:49         ` Dave Chinner
  0 siblings, 2 replies; 21+ messages in thread
From: Christoph Hellwig @ 2012-03-02 10:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Fri, Mar 02, 2012 at 09:04:26PM +1100, Dave Chinner wrote:
> On Fri, Mar 02, 2012 at 02:51:04AM -0500, Christoph Hellwig wrote:
> > Hmm, I don't like this complication all that much.
> 
> Though it is a simple, self contained fix for the problem...

It just smells hacky.  If the non-caching version doesn't go anywhere
I won't veto it, but it's definitely not my favourite.

> > Why would we even bother caching inodes during quotacheck?  The bulkstat
> > is a 100% sequential read only workload going through all inodes in the
> > filesystem.  I think we should simply not cache any inodes while in
> > quotacheck.
> 
> I have tried that approach previously with inodes read through
> bulkstat, but I couldn't find a clean workable solution. It kept
> getting rather complex because all our caching and recycling is tied
> into VFS level triggers. That was a while back, so maybe there is a
> simpler solution that I missed in attempting to do this.
> 
> I suspect for a quotacheck only solution we can hack a check into
> .drop_inode, but a generic coherent non-cached bulkstat lookup is
> somewhat more troublesome.

Right, the whole issue also applies to any bulkstat.  But even for that
it doesn't seem that bad.

We add a new XFS_IGET_BULKSTAT flag for iget, which then sets an
XFS_INOTCACHE or similar flag on the inode.  If we then see that flag,
set by bulkstat, on a clean inode in ->drop_inode, return true there,
which takes care of the VFS side.
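
Roughly like this; sketch only, and note that XFS_IGET_BULKSTAT and
XFS_INOTCACHE don't exist yet, so the names are made up:

/*
 * Hypothetical ->drop_inode: XFS_INOTCACHE would be set on inodes
 * instantiated through an XFS_IGET_BULKSTAT lookup.  Returning 1
 * tells the VFS to evict the inode immediately rather than keep
 * it cached on the LRU.
 */
STATIC int
xfs_fs_drop_inode(
	struct inode		*inode)
{
	struct xfs_inode	*ip = XFS_I(inode);

	if (xfs_iflags_test(ip, XFS_INOTCACHE) &&
	    !(inode->i_state & I_DIRTY))
		return 1;

	return generic_drop_inode(inode);
}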

For the XFS side we'd have to move the call to xfs_syncd_init earlier 
during the mount process, which effectively reverts
2bcf6e970f5a88fa05dced5eeb0326e13d93c4a1.  That should be fine now that
we never call into the quota code from the sync work items.  If we want
to be entirely on the safe side we could only move starting the reclaim
work item earlier.

* Re: [PATCH 8/8] xfs: add a shrinker for quotacheck
  2012-03-02 10:38       ` Christoph Hellwig
@ 2012-03-02 11:14         ` Christoph Hellwig
  2012-03-05  1:49         ` Dave Chinner
  1 sibling, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2012-03-02 11:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Fri, Mar 02, 2012 at 05:38:31AM -0500, Christoph Hellwig wrote:
> Right, the whole issue also applies to any bulkstat.  But even for that
> it doesn't seem that bad.
> 
> We add a new XFS_IGET_BULKSTAT flag for iget, which then sets an
> XFS_INOTCACHE or similar flag on the inode.  If we then see that flag,
> set by bulkstat, on a clean inode in ->drop_inode, return true there,
> which takes care of the VFS side.
> 
> For the XFS side we'd have to move the call to xfs_syncd_init earlier 
> during the mount process, which effectively reverts
> 2bcf6e970f5a88fa05dced5eeb0326e13d93c4a1.  That should be fine now that
> we never call into the quota code from the sync work items.  If we want
> to be entirely on the safe side we could only move starting the reclaim
> work item earlier.

It seems like we actually only need the second one to fix the quotacheck
issue (or to be equivalent to your patch), even if the first one would
be nice to have eventually.

* Re: [PATCH 8/8] xfs: add a shrinker for quotacheck
  2012-03-02 10:38       ` Christoph Hellwig
  2012-03-02 11:14         ` Christoph Hellwig
@ 2012-03-05  1:49         ` Dave Chinner
  1 sibling, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2012-03-05  1:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Mar 02, 2012 at 05:38:31AM -0500, Christoph Hellwig wrote:
> On Fri, Mar 02, 2012 at 09:04:26PM +1100, Dave Chinner wrote:
> > On Fri, Mar 02, 2012 at 02:51:04AM -0500, Christoph Hellwig wrote:
> > > Hmm, I don't like this complication all that much.
> > 
> > Though it is a simple, self contained fix for the problem...
> 
> It just smells hacky.  If the non-caching version doesn't go anywhere
> I won't veto it, but it's definitely not my favourite.
> 
> > > Why would we even bother caching inodes during quotacheck?  The bulkstat
> > > is a 100% sequential read only workload going through all inodes in the
> > > filesystem.  I think we should simply not cache any inodes while in
> > > quotacheck.
> > 
> > I have tried that approach previously with inodes read through
> > bulkstat, but I couldn't find a clean workable solution. It kept
> > getting rather complex because all our caching and recycling is tied
> > into VFS level triggers. That was a while back, so maybe there is a
> > simpler solution that I missed in attempting to do this.
> > 
> > I suspect for a quotacheck only solution we can hack a check into
> > .drop_inode, but a generic coherent non-cached bulkstat lookup is
> > somewhat more troublesome.
> 
> Right, the whole issue also applies to any bulkstat.  But even for that
> it doesn't seem that bad.
> 
> We add a new XFS_IGET_BULKSTAT flag for iget, which then sets an
> XFS_INOTCACHE or similar flag on the inode.  If we then see that flag,
> set by bulkstat, on a clean inode in ->drop_inode, return true there,
> which takes care of the VFS side.

Right, that's effectively what I did. All the problems came from
getting cache hits on an inode marked XFS_INOTCACHE and having to
convert it to a cached inode at that point. I suspect the problems
I hit stemmed from the fact that this bug:

778e24b xfs: reset inode per-lifetime state when recycling it

had not yet been discovered at the time.

> For the XFS side we'd have to move the call to xfs_syncd_init earlier 
> during the mount process, which effectively reverts
> 2bcf6e970f5a88fa05dced5eeb0326e13d93c4a1.  That should be fine now that
> we never call into the quota code from the sync work items.  If we want
> to be entirely on the safe side we could only move starting the reclaim
> work item earlier.

I initially suspected that all we needed to do here is check if
(mp->m_super->s_flags & MS_ACTIVE) is set in the syncd work, and if
it isn't, just requeue the work again. That would prevent it from
running during mount and shutdown.
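
Something like the following is what I had in mind (sketch only, not
even compile tested, and it assumes the existing
xfs_syncd_queue_reclaim() helper):

/*
 * Sketch: if the superblock isn't fully set up yet (or is being
 * torn down), don't touch the inode cache at all - just requeue
 * the work and try again next interval.
 */
STATIC void
xfs_reclaim_worker(
	struct work_struct	*work)
{
	struct xfs_mount	*mp = container_of(to_delayed_work(work),
					struct xfs_mount, m_reclaim_work);

	if (!(mp->m_super->s_flags & MS_ACTIVE)) {
		xfs_syncd_queue_reclaim(mp);
		return;
	}

	xfs_reclaim_inodes(mp, SYNC_TRYLOCK);
	xfs_syncd_queue_reclaim(mp);
}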

However, the reclaim work already checks this to prevent shutdown
races, so we can't actually queue inode reclaim work during the mount
process right now, either. Indeed, this is the only reason we are
not crashing on quotacheck right now - the syncd workqueue is not
initialised until after the quotacheck completes, but we are most
certainly trying to queue reclaim work during quotacheck. It's only
this check against MS_ACTIVE that is preventing quotacheck from
trying to queue work on an uninitialised workqueue.

This is turning into quite a mess - the additional shrinker might be
the simplest solution for 3.4....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

end of thread, other threads:[~2012-03-05  1:49 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-02  4:11 [PATCH 0/8] xfs: bug fixes for 3.4 Dave Chinner
2012-03-02  4:11 ` [PATCH 1/8] xfs: clean up minor sparse warnings Dave Chinner
2012-03-02  7:46   ` Christoph Hellwig
2012-03-02  7:59     ` Dave Chinner
2012-03-02  4:11 ` [PATCH 2/8] xfs: Fix open flag handling in open_by_handle code Dave Chinner
2012-03-02  7:47   ` Christoph Hellwig
2012-03-02  4:11 ` [PATCH 3/8] xfs: handle kmalloc failure when reading attrs Dave Chinner
2012-03-02  7:49   ` Christoph Hellwig
2012-03-02  9:49     ` Dave Chinner
2012-03-02 10:31       ` Christoph Hellwig
2012-03-02  4:11 ` [PATCH 4/8] xfs: avoid memory allocation failures in xfs_getbmap Dave Chinner
2012-03-02  7:49   ` Christoph Hellwig
2012-03-02  4:11 ` [PATCH 5/8] xfs: introduce an allocation workqueue Dave Chinner
2012-03-02  4:11 ` [PATCH 6/8] xfs: remove remaining scraps of struct xfs_iomap Dave Chinner
2012-03-02  4:11 ` [PATCH 7/8] xfs: fix inode lookup race Dave Chinner
2012-03-02  4:11 ` [PATCH 8/8] xfs: add a shrinker for quotacheck Dave Chinner
2012-03-02  7:51   ` Christoph Hellwig
2012-03-02 10:04     ` Dave Chinner
2012-03-02 10:38       ` Christoph Hellwig
2012-03-02 11:14         ` Christoph Hellwig
2012-03-05  1:49         ` Dave Chinner
