All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] for-3.2 queue
@ 2011-11-15 20:14 Christoph Hellwig
  2011-11-15 20:14 ` [PATCH 1/5] [PATCH] xfs: fix attr2 vs large data fork assert Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Christoph Hellwig @ 2011-11-15 20:14 UTC (permalink / raw)
  To: xfs

With more reports showing up that the VFS writeback code is not able
to write back our size updates dirtied from the I/O completion handler
in reasonable time I think it's time to move to logging all file size
updates ASAP, that is for 3.2 and maybe after a reasonable testing
period even -stable.

This series has been sent out a few times, and I've been doing QA on
it for weeks.  Note that I haven't implemented the log space reservations
from ->writepage - see the actual patch for the rationale.

Also throw in the fork offset fix as it's an user-triggerable oops.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/5] [PATCH] xfs: fix attr2 vs large data fork assert
  2011-11-15 20:14 [PATCH 0/5] for-3.2 queue Christoph Hellwig
@ 2011-11-15 20:14 ` Christoph Hellwig
  2011-11-16 23:15   ` Dave Chinner
  2011-11-19 17:44   ` [PATCH v2] " Christoph Hellwig
  2011-11-15 20:14 ` [PATCH 2/5] xfs: use per-filesystem I/O completion workqueues Christoph Hellwig
  2011-11-15 20:14 ` [PATCH 3/5] xfs: do not require an ioend for new EOF calculation Christoph Hellwig
  2 siblings, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2011-11-15 20:14 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-fix-attr-oops --]
[-- Type: text/plain, Size: 3818 bytes --]

With Dmitry fsstress updates I've seen very reproducible crashes in
xfs_attr_shortform_remove because xfs_attr_shortform_bytesfit claims that
the attributes would not fit inline into the inode after removing an
attribute.  It turns out that we were operating on an inode with lots
of delalloc extents, and thus an if_bytes values for the data fork that
is larger than biggest possible on-disk storage for it which utterly
confuses the code near the end of xfs_attr_shortform_bytesfit.

Fix this by always allowing the current attribute fork, like we already
do for the attr1 format, given that delalloc conversion will take care
for moving either the data or attribute area out of line if it doesn't
fit at that point - or making the point moot by merging extents at this
point.

Also document the function better, and clean up some lose bits.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/xfs/xfs_attr_leaf.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_attr_leaf.c	2011-11-04 13:46:59.481655134 +0100
+++ linux-2.6/fs/xfs/xfs_attr_leaf.c	2011-11-04 15:58:06.480155275 +0100
@@ -110,6 +110,7 @@ xfs_attr_namesp_match(int arg_flags, int
 /*
  * Query whether the requested number of additional bytes of extended
  * attribute space will be able to fit inline.
+ *
  * Returns zero if not, else the di_forkoff fork offset to be used in the
  * literal area for attribute data once the new bytes have been added.
  *
@@ -136,11 +137,26 @@ xfs_attr_shortform_bytesfit(xfs_inode_t
 		return (offset >= minforkoff) ? minforkoff : 0;
 	}
 
-	if (!(mp->m_flags & XFS_MOUNT_ATTR2)) {
-		if (bytes <= XFS_IFORK_ASIZE(dp))
-			return dp->i_d.di_forkoff;
+	/*
+	 * If the requested numbers of bytes is smaller or equal to the
+	 * current attribute fork size we can always proceed.
+	 *
+	 * Note that if_bytes in the data fork might actually be larger than
+	 * the current data fork size is due to delalloc extents. In that
+	 * case either the extent count will go down when they are converted
+	 * to ral extents, or the delalloc conversion will take care of the
+	 * literal area rebalancing.
+	 */
+	if (bytes <= XFS_IFORK_ASIZE(dp))
+		return dp->i_d.di_forkoff;
+
+	/*
+	 * For attr2 we can try to move the forkoff if there is space in the
+	 * literal area, but for the old format we are done if there is no
+	 * space in the fixes attribute fork.
+	 */
+	if (!(mp->m_flags & XFS_MOUNT_ATTR2))
 		return 0;
-	}
 
 	dsize = dp->i_df.if_bytes;
 	
@@ -157,10 +173,9 @@ xfs_attr_shortform_bytesfit(xfs_inode_t
 		    xfs_default_attroffset(dp))
 			dsize = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
 		break;
-		
 	case XFS_DINODE_FMT_BTREE:
 		/*
-		 * If have data btree then keep forkoff if we have one,
+		 * If have a data btree then keep forkoff if we have one,
 		 * otherwise we are adding a new attr, so then we set 
 		 * minforkoff to where the btree root can finish so we have 
 		 * plenty of room for attrs
@@ -168,10 +183,9 @@ xfs_attr_shortform_bytesfit(xfs_inode_t
 		if (dp->i_d.di_forkoff) {
 			if (offset < dp->i_d.di_forkoff) 
 				return 0;
-			else 
-				return dp->i_d.di_forkoff;
-		} else
-			dsize = XFS_BMAP_BROOT_SPACE(dp->i_df.if_broot);
+			return dp->i_d.di_forkoff;
+		}
+		dsize = XFS_BMAP_BROOT_SPACE(dp->i_df.if_broot);
 		break;
 	}
 	
@@ -186,10 +200,10 @@ xfs_attr_shortform_bytesfit(xfs_inode_t
 	maxforkoff = XFS_LITINO(mp) - XFS_BMDR_SPACE_CALC(MINABTPTRS);
 	maxforkoff = maxforkoff >> 3;	/* rounded down */
 
-	if (offset >= minforkoff && offset < maxforkoff)
-		return offset;
 	if (offset >= maxforkoff)
 		return maxforkoff;
+	if (offset >= minforkoff)
+		return offset;
 	return 0;
 }
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/5] xfs: use per-filesystem I/O completion workqueues
  2011-11-15 20:14 [PATCH 0/5] for-3.2 queue Christoph Hellwig
  2011-11-15 20:14 ` [PATCH 1/5] [PATCH] xfs: fix attr2 vs large data fork assert Christoph Hellwig
@ 2011-11-15 20:14 ` Christoph Hellwig
  2011-11-16 19:01   ` Ben Myers
  2011-11-16 23:24   ` Dave Chinner
  2011-11-15 20:14 ` [PATCH 3/5] xfs: do not require an ioend for new EOF calculation Christoph Hellwig
  2 siblings, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2011-11-15 20:14 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-split-workqueues --]
[-- Type: text/plain, Size: 8068 bytes --]

commit 77d7a0c "xfs: Non-blocking inode locking in IO completion" introduced
a trylocked and defer scheme in xfs_setfilesize to avoid deadlocks when on
XFS filesystem is used ontop of another using the loop device, and we
fsync in the loop filesystem.

Now that we have the cheap enough concurrency managed workqueues, we can
create per-filesystem instead of global workqueues and remove this scheme
again, given that it has the potential of delaying size updates and is not
helpful once we start to log the inode size.

Signed-off-by: Christoph Hellwig <hch@lst.de>

---
 fs/xfs/xfs_aops.c  |   39 ++++++++++-----------------------------
 fs/xfs/xfs_aops.h  |    2 --
 fs/xfs/xfs_buf.c   |   17 -----------------
 fs/xfs/xfs_mount.h |    3 +++
 fs/xfs/xfs_super.c |   43 ++++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 55 insertions(+), 49 deletions(-)

Index: linux-2.6/fs/xfs/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_aops.c	2011-11-12 15:37:28.000000000 +0100
+++ linux-2.6/fs/xfs/xfs_aops.c	2011-11-15 09:14:16.856650210 +0100
@@ -131,21 +131,15 @@ static inline bool xfs_ioend_is_append(s
  * will be the intended file size until i_size is updated.  If this write does
  * not extend all the way to the valid file size then restrict this update to
  * the end of the write.
- *
- * This function does not block as blocking on the inode lock in IO completion
- * can lead to IO completion order dependency deadlocks.. If it can't get the
- * inode ilock it will return EAGAIN. Callers must handle this.
  */
-STATIC int
+STATIC void
 xfs_setfilesize(
-	xfs_ioend_t		*ioend)
+	struct xfs_ioend	*ioend)
 {
-	xfs_inode_t		*ip = XFS_I(ioend->io_inode);
+	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
 	xfs_fsize_t		isize;
 
-	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
-		return EAGAIN;
-
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	isize = xfs_ioend_new_eof(ioend);
 	if (isize) {
 		trace_xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
@@ -154,7 +148,6 @@ xfs_setfilesize(
 	}
 
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	return 0;
 }
 
 /*
@@ -168,10 +161,12 @@ xfs_finish_ioend(
 	struct xfs_ioend	*ioend)
 {
 	if (atomic_dec_and_test(&ioend->io_remaining)) {
+		struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
+
 		if (ioend->io_type == IO_UNWRITTEN)
-			queue_work(xfsconvertd_workqueue, &ioend->io_work);
+			queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
 		else if (xfs_ioend_is_append(ioend))
-			queue_work(xfsdatad_workqueue, &ioend->io_work);
+			queue_work(mp->m_data_workqueue, &ioend->io_work);
 		else
 			xfs_destroy_ioend(ioend);
 	}
@@ -212,23 +207,9 @@ xfs_end_io(
 	 * We might have to update the on-disk file size after extending
 	 * writes.
 	 */
-	error = xfs_setfilesize(ioend);
-	ASSERT(!error || error == EAGAIN);
-
+	xfs_setfilesize(ioend);
 done:
-	/*
-	 * If we didn't complete processing of the ioend, requeue it to the
-	 * tail of the workqueue for another attempt later. Otherwise destroy
-	 * it.
-	 */
-	if (error == EAGAIN) {
-		atomic_inc(&ioend->io_remaining);
-		xfs_finish_ioend(ioend);
-		/* ensure we don't spin on blocked ioends */
-		delay(1);
-	} else {
-		xfs_destroy_ioend(ioend);
-	}
+	xfs_destroy_ioend(ioend);
 }
 
 /*
Index: linux-2.6/fs/xfs/xfs_aops.h
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_aops.h	2011-11-10 16:50:40.751797337 +0100
+++ linux-2.6/fs/xfs/xfs_aops.h	2011-11-15 09:14:16.856650210 +0100
@@ -18,8 +18,6 @@
 #ifndef __XFS_AOPS_H__
 #define __XFS_AOPS_H__
 
-extern struct workqueue_struct *xfsdatad_workqueue;
-extern struct workqueue_struct *xfsconvertd_workqueue;
 extern mempool_t *xfs_ioend_pool;
 
 /*
Index: linux-2.6/fs/xfs/xfs_buf.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_buf.c	2011-11-10 16:50:40.763795564 +0100
+++ linux-2.6/fs/xfs/xfs_buf.c	2011-11-15 09:14:16.859983547 +0100
@@ -45,8 +45,6 @@ static kmem_zone_t *xfs_buf_zone;
 STATIC int xfsbufd(void *);
 
 static struct workqueue_struct *xfslogd_workqueue;
-struct workqueue_struct *xfsdatad_workqueue;
-struct workqueue_struct *xfsconvertd_workqueue;
 
 #ifdef XFS_BUF_LOCK_TRACKING
 # define XB_SET_OWNER(bp)	((bp)->b_last_holder = current->pid)
@@ -1797,21 +1795,8 @@ xfs_buf_init(void)
 	if (!xfslogd_workqueue)
 		goto out_free_buf_zone;
 
-	xfsdatad_workqueue = alloc_workqueue("xfsdatad", WQ_MEM_RECLAIM, 1);
-	if (!xfsdatad_workqueue)
-		goto out_destroy_xfslogd_workqueue;
-
-	xfsconvertd_workqueue = alloc_workqueue("xfsconvertd",
-						WQ_MEM_RECLAIM, 1);
-	if (!xfsconvertd_workqueue)
-		goto out_destroy_xfsdatad_workqueue;
-
 	return 0;
 
- out_destroy_xfsdatad_workqueue:
-	destroy_workqueue(xfsdatad_workqueue);
- out_destroy_xfslogd_workqueue:
-	destroy_workqueue(xfslogd_workqueue);
  out_free_buf_zone:
 	kmem_zone_destroy(xfs_buf_zone);
  out:
@@ -1821,8 +1806,6 @@ xfs_buf_init(void)
 void
 xfs_buf_terminate(void)
 {
-	destroy_workqueue(xfsconvertd_workqueue);
-	destroy_workqueue(xfsdatad_workqueue);
 	destroy_workqueue(xfslogd_workqueue);
 	kmem_zone_destroy(xfs_buf_zone);
 }
Index: linux-2.6/fs/xfs/xfs_super.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_super.c	2011-11-10 16:50:40.771795378 +0100
+++ linux-2.6/fs/xfs/xfs_super.c	2011-11-15 09:17:13.763315819 +0100
@@ -769,6 +769,42 @@ xfs_setup_devices(
 	return 0;
 }
 
+STATIC int
+xfs_init_mount_workqueues(
+	struct xfs_mount	*mp)
+{
+	snprintf(mp->m_data_workqueue_name, XFS_WQ_NAME_LEN,
+		 "xfs-data/%s", mp->m_fsname);
+	mp->m_data_workqueue =
+		alloc_workqueue(mp->m_data_workqueue_name, WQ_MEM_RECLAIM, 1);
+	if (!mp->m_data_workqueue)
+		goto out;
+
+	snprintf(mp->m_unwritten_workqueue_name, XFS_WQ_NAME_LEN,
+		 "xfs-conv/%s", mp->m_fsname);
+	mp->m_unwritten_workqueue =
+		alloc_workqueue(mp->m_unwritten_workqueue_name,
+				WQ_MEM_RECLAIM, 1);
+	if (!mp->m_unwritten_workqueue)
+		goto out_destroy_data_iodone_queue;
+
+	return 0;
+
+out_destroy_data_iodone_queue:
+	destroy_workqueue(mp->m_data_workqueue);
+out:
+	return -ENOMEM;
+#undef XFS_WQ_NAME_LEN
+}
+
+STATIC void
+xfs_destroy_mount_workqueues(
+	struct xfs_mount	*mp)
+{
+	destroy_workqueue(mp->m_data_workqueue);
+	destroy_workqueue(mp->m_unwritten_workqueue);
+}
+
 /* Catch misguided souls that try to use this interface on XFS */
 STATIC struct inode *
 xfs_fs_alloc_inode(
@@ -1020,6 +1056,7 @@ xfs_fs_put_super(
 	xfs_unmountfs(mp);
 	xfs_freesb(mp);
 	xfs_icsb_destroy_counters(mp);
+	xfs_destroy_mount_workqueues(mp);
 	xfs_close_devices(mp);
 	xfs_free_fsname(mp);
 	kfree(mp);
@@ -1353,10 +1390,14 @@ xfs_fs_fill_super(
 	if (error)
 		goto out_free_fsname;
 
-	error = xfs_icsb_init_counters(mp);
+	error = xfs_init_mount_workqueues(mp);
 	if (error)
 		goto out_close_devices;
 
+	error = xfs_icsb_init_counters(mp);
+	if (error)
+		goto out_destroy_workqueues;
+
 	error = xfs_readsb(mp, flags);
 	if (error)
 		goto out_destroy_counters;
@@ -1419,6 +1460,8 @@ xfs_fs_fill_super(
 	xfs_freesb(mp);
  out_destroy_counters:
 	xfs_icsb_destroy_counters(mp);
+out_destroy_workqueues:
+	xfs_destroy_mount_workqueues(mp);
  out_close_devices:
 	xfs_close_devices(mp);
  out_free_fsname:
Index: linux-2.6/fs/xfs/xfs_mount.h
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_mount.h	2011-11-10 16:50:40.787796774 +0100
+++ linux-2.6/fs/xfs/xfs_mount.h	2011-11-15 09:15:25.053316473 +0100
@@ -211,6 +211,12 @@ typedef struct xfs_mount {
 	struct shrinker		m_inode_shrink;	/* inode reclaim shrinker */
 	int64_t			m_low_space[XFS_LOWSP_MAX];
 						/* low free space thresholds */
+
+	struct workqueue_struct	*m_data_workqueue;
+	struct workqueue_struct	*m_unwritten_workqueue;
+#define XFS_WQ_NAME_LEN		512
+	char			m_data_workqueue_name[XFS_WQ_NAME_LEN];
+	char			m_unwritten_workqueue_name[XFS_WQ_NAME_LEN];
 } xfs_mount_t;
 
 /*

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/5] xfs: do not require an ioend for new EOF calculation
  2011-11-15 20:14 [PATCH 0/5] for-3.2 queue Christoph Hellwig
  2011-11-15 20:14 ` [PATCH 1/5] [PATCH] xfs: fix attr2 vs large data fork assert Christoph Hellwig
  2011-11-15 20:14 ` [PATCH 2/5] xfs: use per-filesystem I/O completion workqueues Christoph Hellwig
@ 2011-11-15 20:14 ` Christoph Hellwig
  2011-11-16 18:09   ` Ben Myers
  2011-11-16 23:28   ` Dave Chinner
  2 siblings, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2011-11-15 20:14 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-simplify-eof-calculation --]
[-- Type: text/plain, Size: 3171 bytes --]

Replace xfs_ioend_new_eof with a new inline xfs_new_eof helper that
doesn't require and ioend, and is available also outside of xfs_aops.c.

Also make the code a bit more clear by using a normal if statement
instead of a slightly misleading MIN().

Signed-off-by: Christoph Hellwig <hch@lst.de>

---
 fs/xfs/xfs_aops.c  |   26 +++++---------------------
 fs/xfs/xfs_inode.h |   14 ++++++++++++++
 2 files changed, 19 insertions(+), 21 deletions(-)

Index: linux-2.6/fs/xfs/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_aops.c	2011-11-08 08:11:44.891887054 +0100
+++ linux-2.6/fs/xfs/xfs_aops.c	2011-11-08 08:12:31.586400976 +0100
@@ -99,24 +99,6 @@ xfs_destroy_ioend(
 }
 
 /*
- * If the end of the current ioend is beyond the current EOF,
- * return the new EOF value, otherwise zero.
- */
-STATIC xfs_fsize_t
-xfs_ioend_new_eof(
-	xfs_ioend_t		*ioend)
-{
-	xfs_inode_t		*ip = XFS_I(ioend->io_inode);
-	xfs_fsize_t		isize;
-	xfs_fsize_t		bsize;
-
-	bsize = ioend->io_offset + ioend->io_size;
-	isize = MAX(ip->i_size, ip->i_new_size);
-	isize = MIN(isize, bsize);
-	return isize > ip->i_d.di_size ? isize : 0;
-}
-
-/*
  * Fast and loose check if this write could update the on-disk inode size.
  */
 static inline bool xfs_ioend_is_append(struct xfs_ioend *ioend)
@@ -140,7 +122,7 @@ xfs_setfilesize(
 	xfs_fsize_t		isize;
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	isize = xfs_ioend_new_eof(ioend);
+	isize = xfs_new_eof(ip, ioend->io_offset + ioend->io_size);
 	if (isize) {
 		trace_xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
 		ip->i_d.di_size = isize;
@@ -362,6 +344,8 @@ xfs_submit_ioend_bio(
 	xfs_ioend_t		*ioend,
 	struct bio		*bio)
 {
+	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
+
 	atomic_inc(&ioend->io_remaining);
 	bio->bi_private = ioend;
 	bio->bi_end_io = xfs_end_bio;
@@ -370,8 +354,8 @@ xfs_submit_ioend_bio(
 	 * If the I/O is beyond EOF we mark the inode dirty immediately
 	 * but don't update the inode size until I/O completion.
 	 */
-	if (xfs_ioend_new_eof(ioend))
-		xfs_mark_inode_dirty(XFS_I(ioend->io_inode));
+	if (xfs_new_eof(ip, ioend->io_offset + ioend->io_size))
+		xfs_mark_inode_dirty(ip);
 
 	submit_bio(wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE, bio);
 }
Index: linux-2.6/fs/xfs/xfs_inode.h
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_inode.h	2011-11-08 08:02:50.000000000 +0100
+++ linux-2.6/fs/xfs/xfs_inode.h	2011-11-08 08:13:01.290386996 +0100
@@ -278,6 +278,20 @@ static inline struct inode *VFS_I(struct
 }
 
 /*
+ * If this I/O goes past the on-disk inode size update it unless it would
+ * be past the current in-core or write in-progress inode size.
+ */
+static inline xfs_fsize_t
+xfs_new_eof(struct xfs_inode *ip, xfs_fsize_t new_size)
+{
+	xfs_fsize_t i_size = max(ip->i_size, ip->i_new_size);
+
+	if (new_size > i_size)
+		new_size = i_size;
+	return new_size > ip->i_d.di_size ? new_size : 0;
+}
+
+/*
  * i_flags helper functions
  */
 static inline void

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] xfs: do not require an ioend for new EOF calculation
  2011-11-15 20:14 ` [PATCH 3/5] xfs: do not require an ioend for new EOF calculation Christoph Hellwig
@ 2011-11-16 18:09   ` Ben Myers
  2011-11-16 23:28   ` Dave Chinner
  1 sibling, 0 replies; 15+ messages in thread
From: Ben Myers @ 2011-11-16 18:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Tue, Nov 15, 2011 at 03:14:10PM -0500, Christoph Hellwig wrote:
> Replace xfs_ioend_new_eof with a new inline xfs_new_eof helper that
> doesn't require and ioend, and is available also outside of xfs_aops.c.
> 
> Also make the code a bit more clear by using a normal if statement
> instead of a slightly misleading MIN().
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

This looks good to me.  Getting rid of that MIN is much clearer.

Reviewed-by: Ben Myers <bpm@sgi.com>

> ---
>  fs/xfs/xfs_aops.c  |   26 +++++---------------------
>  fs/xfs/xfs_inode.h |   14 ++++++++++++++
>  2 files changed, 19 insertions(+), 21 deletions(-)
> 
> Index: linux-2.6/fs/xfs/xfs_aops.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_aops.c	2011-11-08 08:11:44.891887054 +0100
> +++ linux-2.6/fs/xfs/xfs_aops.c	2011-11-08 08:12:31.586400976 +0100
> @@ -99,24 +99,6 @@ xfs_destroy_ioend(
>  }
>  
>  /*
> - * If the end of the current ioend is beyond the current EOF,
> - * return the new EOF value, otherwise zero.
> - */
> -STATIC xfs_fsize_t
> -xfs_ioend_new_eof(
> -	xfs_ioend_t		*ioend)
> -{
> -	xfs_inode_t		*ip = XFS_I(ioend->io_inode);
> -	xfs_fsize_t		isize;
> -	xfs_fsize_t		bsize;
> -
> -	bsize = ioend->io_offset + ioend->io_size;
> -	isize = MAX(ip->i_size, ip->i_new_size);
> -	isize = MIN(isize, bsize);
> -	return isize > ip->i_d.di_size ? isize : 0;
> -}
> -
> -/*
>   * Fast and loose check if this write could update the on-disk inode size.
>   */
>  static inline bool xfs_ioend_is_append(struct xfs_ioend *ioend)
> @@ -140,7 +122,7 @@ xfs_setfilesize(
>  	xfs_fsize_t		isize;
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -	isize = xfs_ioend_new_eof(ioend);
> +	isize = xfs_new_eof(ip, ioend->io_offset + ioend->io_size);
>  	if (isize) {
>  		trace_xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
>  		ip->i_d.di_size = isize;
> @@ -362,6 +344,8 @@ xfs_submit_ioend_bio(
>  	xfs_ioend_t		*ioend,
>  	struct bio		*bio)
>  {
> +	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
> +
>  	atomic_inc(&ioend->io_remaining);
>  	bio->bi_private = ioend;
>  	bio->bi_end_io = xfs_end_bio;
> @@ -370,8 +354,8 @@ xfs_submit_ioend_bio(
>  	 * If the I/O is beyond EOF we mark the inode dirty immediately
>  	 * but don't update the inode size until I/O completion.
>  	 */
> -	if (xfs_ioend_new_eof(ioend))
> -		xfs_mark_inode_dirty(XFS_I(ioend->io_inode));
> +	if (xfs_new_eof(ip, ioend->io_offset + ioend->io_size))
> +		xfs_mark_inode_dirty(ip);
>  
>  	submit_bio(wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE, bio);
>  }
> Index: linux-2.6/fs/xfs/xfs_inode.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_inode.h	2011-11-08 08:02:50.000000000 +0100
> +++ linux-2.6/fs/xfs/xfs_inode.h	2011-11-08 08:13:01.290386996 +0100
> @@ -278,6 +278,20 @@ static inline struct inode *VFS_I(struct
>  }
>  
>  /*
> + * If this I/O goes past the on-disk inode size update it unless it would
> + * be past the current in-core or write in-progress inode size.
> + */
> +static inline xfs_fsize_t
> +xfs_new_eof(struct xfs_inode *ip, xfs_fsize_t new_size)
> +{
> +	xfs_fsize_t i_size = max(ip->i_size, ip->i_new_size);
> +
> +	if (new_size > i_size)
> +		new_size = i_size;
> +	return new_size > ip->i_d.di_size ? new_size : 0;
> +}
> +
> +/*
>   * i_flags helper functions
>   */
>  static inline void
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] xfs: use per-filesystem I/O completion workqueues
  2011-11-15 20:14 ` [PATCH 2/5] xfs: use per-filesystem I/O completion workqueues Christoph Hellwig
@ 2011-11-16 19:01   ` Ben Myers
  2011-11-17  7:40     ` Christoph Hellwig
  2011-11-16 23:24   ` Dave Chinner
  1 sibling, 1 reply; 15+ messages in thread
From: Ben Myers @ 2011-11-16 19:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Tue, Nov 15, 2011 at 03:14:09PM -0500, Christoph Hellwig wrote:
> commit 77d7a0c "xfs: Non-blocking inode locking in IO completion" introduced
> a trylocked and defer scheme in xfs_setfilesize to avoid deadlocks when on
> XFS filesystem is used ontop of another using the loop device, and we
> fsync in the loop filesystem.
> 
> Now that we have the cheap enough concurrency managed workqueues, we can
> create per-filesystem instead of global workqueues and remove this scheme
> again, given that it has the potential of delaying size updates and is not
> helpful once we start to log the inode size.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

...

>  /*
> @@ -168,10 +161,12 @@ xfs_finish_ioend(
>  	struct xfs_ioend	*ioend)
>  {
>  	if (atomic_dec_and_test(&ioend->io_remaining)) {
> +		struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
> +
>  		if (ioend->io_type == IO_UNWRITTEN)
> -			queue_work(xfsconvertd_workqueue, &ioend->io_work);
> +			queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
>  		else if (xfs_ioend_is_append(ioend))

I wonder if we could skip size updates due to the 'fast and loose'
nature of xfs_ioend_is_append, and end up destroying the ioend below,
without updating the file size.  It's not strictly related to your patch
though.

> -			queue_work(xfsdatad_workqueue, &ioend->io_work);
> +			queue_work(mp->m_data_workqueue, &ioend->io_work);
>  		else
>  			xfs_destroy_ioend(ioend);
>  	}

...

> Index: linux-2.6/fs/xfs/xfs_super.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_super.c	2011-11-10 16:50:40.771795378 +0100
> +++ linux-2.6/fs/xfs/xfs_super.c	2011-11-15 09:17:13.763315819 +0100
> @@ -769,6 +769,42 @@ xfs_setup_devices(
>  	return 0;
>  }
>  
> +STATIC int
> +xfs_init_mount_workqueues(
> +	struct xfs_mount	*mp)
> +{
> +	snprintf(mp->m_data_workqueue_name, XFS_WQ_NAME_LEN,
> +		 "xfs-data/%s", mp->m_fsname);
> +	mp->m_data_workqueue =
> +		alloc_workqueue(mp->m_data_workqueue_name, WQ_MEM_RECLAIM, 1);
> +	if (!mp->m_data_workqueue)
> +		goto out;
> +
> +	snprintf(mp->m_unwritten_workqueue_name, XFS_WQ_NAME_LEN,
> +		 "xfs-conv/%s", mp->m_fsname);
> +	mp->m_unwritten_workqueue =
> +		alloc_workqueue(mp->m_unwritten_workqueue_name,
> +				WQ_MEM_RECLAIM, 1);

Hrm... mp->m_fsname can be MAXNAMELEN (256 in xfs), and XFS_WQ_NAME_LEN
you chose is 512.  As it stands there really isn't a problem here.

And, it sounds like you are wanting to replace this once Tejun improves
the interface...  maybe that was worth pointing out.

> +	if (!mp->m_unwritten_workqueue)
> +		goto out_destroy_data_iodone_queue;
> +
> +	return 0;
> +
> +out_destroy_data_iodone_queue:
> +	destroy_workqueue(mp->m_data_workqueue);
> +out:
> +	return -ENOMEM;
> +#undef XFS_WQ_NAME_LEN

Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] [PATCH] xfs: fix attr2 vs large data fork assert
  2011-11-15 20:14 ` [PATCH 1/5] [PATCH] xfs: fix attr2 vs large data fork assert Christoph Hellwig
@ 2011-11-16 23:15   ` Dave Chinner
  2011-11-17  7:30     ` Christoph Hellwig
  2011-11-19 17:44   ` [PATCH v2] " Christoph Hellwig
  1 sibling, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2011-11-16 23:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Tue, Nov 15, 2011 at 03:14:08PM -0500, Christoph Hellwig wrote:
> With Dmitry fsstress updates I've seen very reproducible crashes in
> xfs_attr_shortform_remove because xfs_attr_shortform_bytesfit claims that
> the attributes would not fit inline into the inode after removing an
> attribute.  It turns out that we were operating on an inode with lots
> of delalloc extents, and thus an if_bytes values for the data fork that
> is larger than biggest possible on-disk storage for it which utterly
> confuses the code near the end of xfs_attr_shortform_bytesfit.

We have a test that stresses allocated extents vs attributes in the
xfs_fsr swapext test (227), but that does not take into account
delalloc extents. It sounds like it would be relatively easy to
write a regression test for this particular case - create a file
with a bunch of attributes, then create a number of delalloc data
extents, then remove the attributes to trigger the condition in
xfs_attr_shortform_remove()....


> Fix this by always allowing the current attribute fork, like we already
> do for the attr1 format, given that delalloc conversion will take care
> for moving either the data or attribute area out of line if it doesn't
> fit at that point - or making the point moot by merging extents at this
> point.
> 
> Also document the function better, and clean up some lose bits.

While you are touching that function, can you fix all the whitespace
damage as well? (lots of trailing whitespace). There's a couple of
typos I noticed in your changes (below), but otherwise looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Index: linux-2.6/fs/xfs/xfs_attr_leaf.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_attr_leaf.c	2011-11-04 13:46:59.481655134 +0100
> +++ linux-2.6/fs/xfs/xfs_attr_leaf.c	2011-11-04 15:58:06.480155275 +0100
> @@ -110,6 +110,7 @@ xfs_attr_namesp_match(int arg_flags, int
>  /*
>   * Query whether the requested number of additional bytes of extended
>   * attribute space will be able to fit inline.
> + *
>   * Returns zero if not, else the di_forkoff fork offset to be used in the
>   * literal area for attribute data once the new bytes have been added.
>   *
> @@ -136,11 +137,26 @@ xfs_attr_shortform_bytesfit(xfs_inode_t
>  		return (offset >= minforkoff) ? minforkoff : 0;
>  	}
>  
> -	if (!(mp->m_flags & XFS_MOUNT_ATTR2)) {
> -		if (bytes <= XFS_IFORK_ASIZE(dp))
> -			return dp->i_d.di_forkoff;
> +	/*
> +	 * If the requested numbers of bytes is smaller or equal to the
> +	 * current attribute fork size we can always proceed.
> +	 *
> +	 * Note that if_bytes in the data fork might actually be larger than
> +	 * the current data fork size is due to delalloc extents. In that
> +	 * case either the extent count will go down when they are converted
> +	 * to ral extents, or the delalloc conversion will take care of the

              real

> +	 * literal area rebalancing.
> +	 */
> +	if (bytes <= XFS_IFORK_ASIZE(dp))
> +		return dp->i_d.di_forkoff;
> +
> +	/*
> +	 * For attr2 we can try to move the forkoff if there is space in the
> +	 * literal area, but for the old format we are done if there is no
> +	 * space in the fixes attribute fork.
> +	 */
> +	if (!(mp->m_flags & XFS_MOUNT_ATTR2))
>  		return 0;
> -	}
>  
>  	dsize = dp->i_df.if_bytes;
>  	
> @@ -157,10 +173,9 @@ xfs_attr_shortform_bytesfit(xfs_inode_t
>  		    xfs_default_attroffset(dp))
>  			dsize = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
>  		break;
> -		
>  	case XFS_DINODE_FMT_BTREE:
>  		/*
> -		 * If have data btree then keep forkoff if we have one,
> +		 * If have a data btree then keep forkoff if we have one,

	           If we have

>  		 * otherwise we are adding a new attr, so then we set 
>  		 * minforkoff to where the btree root can finish so we have 
>  		 * plenty of room for attrs
> @@ -168,10 +183,9 @@ xfs_attr_shortform_bytesfit(xfs_inode_t
>  		if (dp->i_d.di_forkoff) {
>  			if (offset < dp->i_d.di_forkoff) 
>  				return 0;
> -			else 
> -				return dp->i_d.di_forkoff;
> -		} else
> -			dsize = XFS_BMAP_BROOT_SPACE(dp->i_df.if_broot);
> +			return dp->i_d.di_forkoff;
> +		}
> +		dsize = XFS_BMAP_BROOT_SPACE(dp->i_df.if_broot);
>  		break;
>  	}
>  	
> @@ -186,10 +200,10 @@ xfs_attr_shortform_bytesfit(xfs_inode_t
>  	maxforkoff = XFS_LITINO(mp) - XFS_BMDR_SPACE_CALC(MINABTPTRS);
>  	maxforkoff = maxforkoff >> 3;	/* rounded down */
>  
> -	if (offset >= minforkoff && offset < maxforkoff)
> -		return offset;
>  	if (offset >= maxforkoff)
>  		return maxforkoff;
> +	if (offset >= minforkoff)
> +		return offset;
>  	return 0;
>  }
>  
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] xfs: use per-filesystem I/O completion workqueues
  2011-11-15 20:14 ` [PATCH 2/5] xfs: use per-filesystem I/O completion workqueues Christoph Hellwig
  2011-11-16 19:01   ` Ben Myers
@ 2011-11-16 23:24   ` Dave Chinner
  2011-11-17  7:25     ` Christoph Hellwig
  1 sibling, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2011-11-16 23:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Tue, Nov 15, 2011 at 03:14:09PM -0500, Christoph Hellwig wrote:
> commit 77d7a0c "xfs: Non-blocking inode locking in IO completion" introduced
> a trylocked and defer scheme in xfs_setfilesize to avoid deadlocks when on
> XFS filesystem is used ontop of another using the loop device, and we
> fsync in the loop filesystem.
> 
> Now that we have the cheap enough concurrency managed workqueues, we can
> create per-filesystem instead of global workqueues and remove this scheme
> again, given that it has the potential of delaying size updates and is not
> helpful once we start to log the inode size.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

....

>  
> +STATIC int
> +xfs_init_mount_workqueues(
> +	struct xfs_mount	*mp)
> +{
> +	snprintf(mp->m_data_workqueue_name, XFS_WQ_NAME_LEN,
> +		 "xfs-data/%s", mp->m_fsname);
> +	mp->m_data_workqueue =
> +		alloc_workqueue(mp->m_data_workqueue_name, WQ_MEM_RECLAIM, 1);
> +	if (!mp->m_data_workqueue)
> +		goto out;
> +
> +	snprintf(mp->m_unwritten_workqueue_name, XFS_WQ_NAME_LEN,
> +		 "xfs-conv/%s", mp->m_fsname);
> +	mp->m_unwritten_workqueue =
> +		alloc_workqueue(mp->m_unwritten_workqueue_name,
> +				WQ_MEM_RECLAIM, 1);
> +	if (!mp->m_unwritten_workqueue)
> +		goto out_destroy_data_iodone_queue;
> +
> +	return 0;
> +
> +out_destroy_data_iodone_queue:
> +	destroy_workqueue(mp->m_data_workqueue);
> +out:
> +	return -ENOMEM;
> +#undef XFS_WQ_NAME_LEN

Don't need the undef here anymore as the #define is in the header
file and not local to the function.

> Index: linux-2.6/fs/xfs/xfs_mount.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_mount.h	2011-11-10 16:50:40.787796774 +0100
> +++ linux-2.6/fs/xfs/xfs_mount.h	2011-11-15 09:15:25.053316473 +0100
> @@ -211,6 +211,12 @@ typedef struct xfs_mount {
>  	struct shrinker		m_inode_shrink;	/* inode reclaim shrinker */
>  	int64_t			m_low_space[XFS_LOWSP_MAX];
>  						/* low free space thresholds */
> +
> +	struct workqueue_struct	*m_data_workqueue;
> +	struct workqueue_struct	*m_unwritten_workqueue;
> +#define XFS_WQ_NAME_LEN		512
> +	char			m_data_workqueue_name[XFS_WQ_NAME_LEN];
> +	char			m_unwritten_workqueue_name[XFS_WQ_NAME_LEN];
>  } xfs_mount_t;

Can't say I'm a great fan of this - making the workqueue code use
kstrdup() would be a much better better solution, IMO, just like was
done a while for the SLAB cache names to solve exactly the same
problem....

Regardless,

Reviewed-by: Dave Chinner <dchinner@redhat.com>

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] xfs: do not require an ioend for new EOF calculation
  2011-11-15 20:14 ` [PATCH 3/5] xfs: do not require an ioend for new EOF calculation Christoph Hellwig
  2011-11-16 18:09   ` Ben Myers
@ 2011-11-16 23:28   ` Dave Chinner
  1 sibling, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2011-11-16 23:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Tue, Nov 15, 2011 at 03:14:10PM -0500, Christoph Hellwig wrote:
> Replace xfs_ioend_new_eof with a new inline xfs_new_eof helper that
> doesn't require and ioend, and is available also outside of xfs_aops.c.
> 
> Also make the code a bit more clear by using a normal if statement
> instead of a slightly misleading MIN().
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] xfs: use per-filesystem I/O completion workqueues
  2011-11-16 23:24   ` Dave Chinner
@ 2011-11-17  7:25     ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2011-11-17  7:25 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Thu, Nov 17, 2011 at 10:24:30AM +1100, Dave Chinner wrote:
> Don't need the undef here anymore as the #define is in the header
> file and not local to the function.

I'll fix it up.

> Can't say I'm a great fan of this - making the workqueue code use
> kstrdup() would be a much better better solution, IMO, just like was
> done a while for the SLAB cache names to solve exactly the same
> problem....

Tejun has a patch to make the name argument to alloc_workqueue both
dynamically allocated and varags format.  That'll fix it, but we can't
rely on it yet.  I will switch over to it later in the 3.3 cycle.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] [PATCH] xfs: fix attr2 vs large data fork assert
  2011-11-16 23:15   ` Dave Chinner
@ 2011-11-17  7:30     ` Christoph Hellwig
  2011-11-29 18:48       ` Ben Myers
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2011-11-17  7:30 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Thu, Nov 17, 2011 at 10:15:17AM +1100, Dave Chinner wrote:
> On Tue, Nov 15, 2011 at 03:14:08PM -0500, Christoph Hellwig wrote:
> > With Dmitry fsstress updates I've seen very reproducible crashes in
> > xfs_attr_shortform_remove because xfs_attr_shortform_bytesfit claims that
> > the attributes would not fit inline into the inode after removing an
> > attribute.  It turns out that we were operating on an inode with lots
> > of delalloc extents, and thus an if_bytes values for the data fork that
> > is larger than biggest possible on-disk storage for it which utterly
> > confuses the code near the end of xfs_attr_shortform_bytesfit.
> 
> We have a test that stresses allocated extents vs attributes in the
> xfs_fsr swapext test (227), but that does not take into account
> delalloc extents. It sounds like it would be relatively easy to
> write a regression test for this particular case - create a file
> with a bunch of attributes, then create a number of delalloc data
> extents, then remove the attributes to trigger the condition in
> xfs_attr_shortform_remove()....

Test 117 with Dmitries new fsstress changes hit it 100% reliably
before

	xfstests: freeze fsstress options for 117'th

I was planning on adding a copy of the test using an explicit
combination of fsstress seeds that reproduce the issue.

> While you are touching that function, can you fix all the whitespace
> damage as well? (lots of trailing whitespace). There's a couple of
> typos I noticed in your changes (below), but otherwise looks good.

I'll fix the tpos and whitespace issues.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] xfs: use per-filesystem I/O completion workqueues
  2011-11-16 19:01   ` Ben Myers
@ 2011-11-17  7:40     ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2011-11-17  7:40 UTC (permalink / raw)
  To: Ben Myers; +Cc: Christoph Hellwig, xfs

On Wed, Nov 16, 2011 at 01:01:20PM -0600, Ben Myers wrote:
> On Tue, Nov 15, 2011 at 03:14:09PM -0500, Christoph Hellwig wrote:
> > commit 77d7a0c "xfs: Non-blocking inode locking in IO completion" introduced
> > a trylocked and defer scheme in xfs_setfilesize to avoid deadlocks when on
> > XFS filesystem is used ontop of another using the loop device, and we
> > fsync in the loop filesystem.
> > 
> > Now that we have the cheap enough concurrency managed workqueues, we can
> > create per-filesystem instead of global workqueues and remove this scheme
> > again, given that it has the potential of delaying size updates and is not
> > helpful once we start to log the inode size.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> ...
> 
> >  /*
> > @@ -168,10 +161,12 @@ xfs_finish_ioend(
> >  	struct xfs_ioend	*ioend)
> >  {
> >  	if (atomic_dec_and_test(&ioend->io_remaining)) {
> > +		struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
> > +
> >  		if (ioend->io_type == IO_UNWRITTEN)
> > -			queue_work(xfsconvertd_workqueue, &ioend->io_work);
> > +			queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
> >  		else if (xfs_ioend_is_append(ioend))
> 
> I wonder if we could skip size updates due to the 'fast and loose'
> nature of xfs_ioend_is_append, and end up destroying the ioend below,
> without updating the file size.  It's not strictly related to your patch
> though.

No - xfs_ioend_is_append check that the offset is beyond the on-disk
inode size.  The loose part is that we don't bother with the in-core
i_size and i_new_size which could change due to I/O errors.  di_size
on the other hand will only go downwards during truncate, and we make
sure all outstanding buffered I/Os have finished first.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] xfs: fix attr2 vs large data fork assert
  2011-11-15 20:14 ` [PATCH 1/5] [PATCH] xfs: fix attr2 vs large data fork assert Christoph Hellwig
  2011-11-16 23:15   ` Dave Chinner
@ 2011-11-19 17:44   ` Christoph Hellwig
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2011-11-19 17:44 UTC (permalink / raw)
  To: xfs

With Dmitry fsstress updates I've seen very reproducible crashes in
xfs_attr_shortform_remove because xfs_attr_shortform_bytesfit claims that
the attributes would not fit inline into the inode after removing an
attribute.  It turns out that we were operating on an inode with lots
of delalloc extents, and thus an if_bytes values for the data fork that
is larger than biggest possible on-disk storage for it which utterly
confuses the code near the end of xfs_attr_shortform_bytesfit.

Fix this by always allowing the current attribute fork, like we already
do for the attr1 format, given that delalloc conversion will take care
for moving either the data or attribute area out of line if it doesn't
fit at that point - or making the point moot by merging extents at this
point.

Also document the function better, and clean up some lose bits.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

---
 fs/xfs/xfs_attr_leaf.c |   64 +++++++++++++++++++++++++++++--------------------
 1 file changed, 39 insertions(+), 25 deletions(-)

Index: linux-2.6/fs/xfs/xfs_attr_leaf.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_attr_leaf.c	2011-11-17 08:29:53.874168153 +0100
+++ linux-2.6/fs/xfs/xfs_attr_leaf.c	2011-11-17 08:33:24.790833563 +0100
@@ -110,6 +110,7 @@ xfs_attr_namesp_match(int arg_flags, int
 /*
  * Query whether the requested number of additional bytes of extended
  * attribute space will be able to fit inline.
+ *
  * Returns zero if not, else the di_forkoff fork offset to be used in the
  * literal area for attribute data once the new bytes have been added.
  *
@@ -122,7 +123,7 @@ xfs_attr_shortform_bytesfit(xfs_inode_t
 	int offset;
 	int minforkoff;	/* lower limit on valid forkoff locations */
 	int maxforkoff;	/* upper limit on valid forkoff locations */
-	int dsize;	
+	int dsize;
 	xfs_mount_t *mp = dp->i_mount;
 
 	offset = (XFS_LITINO(mp) - bytes) >> 3; /* rounded down */
@@ -136,47 +137,60 @@ xfs_attr_shortform_bytesfit(xfs_inode_t
 		return (offset >= minforkoff) ? minforkoff : 0;
 	}
 
-	if (!(mp->m_flags & XFS_MOUNT_ATTR2)) {
-		if (bytes <= XFS_IFORK_ASIZE(dp))
-			return dp->i_d.di_forkoff;
+	/*
+	 * If the requested numbers of bytes is smaller or equal to the
+	 * current attribute fork size we can always proceed.
+	 *
+	 * Note that if_bytes in the data fork might actually be larger than
+	 * the current data fork size is due to delalloc extents. In that
+	 * case either the extent count will go down when they are converted
+	 * to real extents, or the delalloc conversion will take care of the
+	 * literal area rebalancing.
+	 */
+	if (bytes <= XFS_IFORK_ASIZE(dp))
+		return dp->i_d.di_forkoff;
+
+	/*
+	 * For attr2 we can try to move the forkoff if there is space in the
+	 * literal area, but for the old format we are done if there is no
+	 * space in the fixes attribute fork.
+	 */
+	if (!(mp->m_flags & XFS_MOUNT_ATTR2))
 		return 0;
-	}
 
 	dsize = dp->i_df.if_bytes;
-	
+
 	switch (dp->i_d.di_format) {
 	case XFS_DINODE_FMT_EXTENTS:
-		/* 
+		/*
 		 * If there is no attr fork and the data fork is extents, 
-		 * determine if creating the default attr fork will result 
-		 * in the extents form migrating to btree. If so, the 
-		 * minimum offset only needs to be the space required for 
+		 * determine if creating the default attr fork will result
+		 * in the extents form migrating to btree. If so, the
+		 * minimum offset only needs to be the space required for
 		 * the btree root.
-		 */ 
+		 */
 		if (!dp->i_d.di_forkoff && dp->i_df.if_bytes >
 		    xfs_default_attroffset(dp))
 			dsize = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
 		break;
-		
 	case XFS_DINODE_FMT_BTREE:
 		/*
-		 * If have data btree then keep forkoff if we have one,
-		 * otherwise we are adding a new attr, so then we set 
-		 * minforkoff to where the btree root can finish so we have 
+		 * If we have a data btree then keep forkoff if we have one,
+		 * otherwise we are adding a new attr, so then we set
+		 * minforkoff to where the btree root can finish so we have
 		 * plenty of room for attrs
 		 */
 		if (dp->i_d.di_forkoff) {
-			if (offset < dp->i_d.di_forkoff) 
+			if (offset < dp->i_d.di_forkoff)
 				return 0;
-			else 
-				return dp->i_d.di_forkoff;
-		} else
-			dsize = XFS_BMAP_BROOT_SPACE(dp->i_df.if_broot);
+			return dp->i_d.di_forkoff;
+		}
+		dsize = XFS_BMAP_BROOT_SPACE(dp->i_df.if_broot);
 		break;
 	}
-	
-	/* 
-	 * A data fork btree root must have space for at least 
+
+	/*
+	 * A data fork btree root must have space for at least
 	 * MINDBTPTRS key/ptr pairs if the data fork is small or empty.
 	 */
 	minforkoff = MAX(dsize, XFS_BMDR_SPACE_CALC(MINDBTPTRS));
@@ -186,10 +200,10 @@ xfs_attr_shortform_bytesfit(xfs_inode_t
 	maxforkoff = XFS_LITINO(mp) - XFS_BMDR_SPACE_CALC(MINABTPTRS);
 	maxforkoff = maxforkoff >> 3;	/* rounded down */
 
-	if (offset >= minforkoff && offset < maxforkoff)
-		return offset;
 	if (offset >= maxforkoff)
 		return maxforkoff;
+	if (offset >= minforkoff)
+		return offset;
 	return 0;
 }
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] [PATCH] xfs: fix attr2 vs large data fork assert
  2011-11-17  7:30     ` Christoph Hellwig
@ 2011-11-29 18:48       ` Ben Myers
  2011-11-29 18:55         ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Ben Myers @ 2011-11-29 18:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Hey Christoph,

On Thu, Nov 17, 2011 at 02:30:04AM -0500, Christoph Hellwig wrote:
> On Thu, Nov 17, 2011 at 10:15:17AM +1100, Dave Chinner wrote:
> > On Tue, Nov 15, 2011 at 03:14:08PM -0500, Christoph Hellwig wrote:
> > > With Dmitry fsstress updates I've seen very reproducible crashes in
> > > xfs_attr_shortform_remove because xfs_attr_shortform_bytesfit claims that
> > > the attributes would not fit inline into the inode after removing an
> > > attribute.  It turns out that we were operating on an inode with lots
> > > of delalloc extents, and thus an if_bytes values for the data fork that
> > > is larger than biggest possible on-disk storage for it which utterly
> > > confuses the code near the end of xfs_attr_shortform_bytesfit.
> > 
> > We have a test that stresses allocated extents vs attributes in the
> > xfs_fsr swapext test (227), but that does not take into account
> > delalloc extents. It sounds like it would be relatively easy to
> > write a regression test for this particular case - create a file
> > with a bunch of attributes, then create a number of delalloc data
> > extents, then remove the attributes to trigger the condition in
> > xfs_attr_shortform_remove()....
> 
> Test 117 with Dmitries new fsstress changes hit it 100% reliably
> before
> 
> 	xfstests: freeze fsstress options for 117'th
>
> I was planning on adding a copy of the test using an explicit
> combination of fsstress seeds that reproduce the issue.

FYI, Test 117 also hit it for me after I backed off 'freeze fsstress
options'.  Are you still planning on adding a copy of the test with the
seeds in question?

Thanks,
	Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] [PATCH] xfs: fix attr2 vs large data fork assert
  2011-11-29 18:48       ` Ben Myers
@ 2011-11-29 18:55         ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2011-11-29 18:55 UTC (permalink / raw)
  To: Ben Myers; +Cc: Christoph Hellwig, xfs

On Tue, Nov 29, 2011 at 12:48:16PM -0600, Ben Myers wrote:
> FYI, Test 117 also hit it for me after I backed off 'freeze fsstress
> options'.  Are you still planning on adding a copy of the test with the
> seeds in question?

Yes.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2011-11-29 18:55 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-15 20:14 [PATCH 0/5] for-3.2 queue Christoph Hellwig
2011-11-15 20:14 ` [PATCH 1/5] [PATCH] xfs: fix attr2 vs large data fork assert Christoph Hellwig
2011-11-16 23:15   ` Dave Chinner
2011-11-17  7:30     ` Christoph Hellwig
2011-11-29 18:48       ` Ben Myers
2011-11-29 18:55         ` Christoph Hellwig
2011-11-19 17:44   ` [PATCH v2] " Christoph Hellwig
2011-11-15 20:14 ` [PATCH 2/5] xfs: use per-filesystem I/O completion workqueues Christoph Hellwig
2011-11-16 19:01   ` Ben Myers
2011-11-17  7:40     ` Christoph Hellwig
2011-11-16 23:24   ` Dave Chinner
2011-11-17  7:25     ` Christoph Hellwig
2011-11-15 20:14 ` [PATCH 3/5] xfs: do not require an ioend for new EOF calculation Christoph Hellwig
2011-11-16 18:09   ` Ben Myers
2011-11-16 23:28   ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.