* [PATCH 00/27] patch queue for Linux 3.1, V2
@ 2011-07-01  9:43 Christoph Hellwig
  2011-07-01  9:43 ` [PATCH 01/27] xfs: PF_FSTRANS should never be set in ->writepage Christoph Hellwig
                   ` (26 more replies)
  0 siblings, 27 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

This is my current patch queue for Linux 3.1.  Compared to the last
posting, all review comments have been incorporated and two additional
trivial patches have been added.  The ->writepages implementation was
dropped for now, given the bad situation with kswapd-originating
writeback, but I'll repost the fixed version separately to get feedback
on it.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* [PATCH 01/27] xfs: PF_FSTRANS should never be set in ->writepage
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-01  9:43 ` [PATCH 02/27] xfs: re-enable non-blocking behaviour in xfs_map_blocks Christoph Hellwig
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-writepage-simplify-fstrans-check --]
[-- Type: text/plain, Size: 2220 bytes --]

Now that we reject direct reclaim in addition to always using GFP_NOFS
allocations, there is no chance we'll ever end up in ->writepage with
PF_FSTRANS set.  Add a WARN_ON if we hit this case, and stop checking
whether we'd actually need to start a transaction.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Index: xfs/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_aops.c	2011-06-30 14:50:35.206501640 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_aops.c	2011-06-30 15:34:59.156468543 +0200
@@ -894,11 +894,6 @@ out_invalidate:
  * For unwritten space on the page we need to start the conversion to
  * regular allocated space.
  * For any other dirty buffer heads on the page we should flush them.
- *
- * If we detect that a transaction would be required to flush the page, we
- * have to check the process flags first, if we are already in a transaction
- * or disk I/O during allocations is off, we need to fail the writepage and
- * redirty the page.
  */
 STATIC int
 xfs_vm_writepage(
@@ -906,7 +901,6 @@ xfs_vm_writepage(
 	struct writeback_control *wbc)
 {
 	struct inode		*inode = page->mapping->host;
-	int			delalloc, unwritten;
 	struct buffer_head	*bh, *head;
 	struct xfs_bmbt_irec	imap;
 	xfs_ioend_t		*ioend = NULL, *iohead = NULL;
@@ -938,15 +932,10 @@ xfs_vm_writepage(
 		goto redirty;
 
 	/*
-	 * We need a transaction if there are delalloc or unwritten buffers
-	 * on the page.
-	 *
-	 * If we need a transaction and the process flags say we are already
-	 * in a transaction, or no IO is allowed then mark the page dirty
-	 * again and leave the page as is.
+	 * Given that we do not allow direct reclaim to call us, we should
+	 * never be called while in a filesystem transaction.
 	 */
-	xfs_count_page_state(page, &delalloc, &unwritten);
-	if ((current->flags & PF_FSTRANS) && (delalloc || unwritten))
+	if (WARN_ON(current->flags & PF_FSTRANS))
 		goto redirty;
 
 	/* Is this page beyond the end of the file? */
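
For readers without a kernel tree at hand, the effect of the new check can
be sketched in plain userspace C.  This is only an illustration and not the
XFS code: DEMO_PF_FSTRANS, demo_warn_on() and demo_writepage() are
hypothetical stand-ins for PF_FSTRANS, WARN_ON() and xfs_vm_writepage(), and
the task flags are passed in as an argument instead of being read from
current->flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's PF_FSTRANS task flag. */
#define DEMO_PF_FSTRANS 0x1u

enum demo_result { DEMO_WRITE_PAGE, DEMO_REDIRTY_PAGE };

/* Like WARN_ON(): report the condition when it is true and return it. */
bool demo_warn_on(bool cond, const char *what)
{
	assert(what != NULL);
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/*
 * Model of the simplified check: no counting of delalloc or unwritten
 * buffers any more, any call made inside a transaction warns and
 * redirties the page.
 */
enum demo_result demo_writepage(unsigned int task_flags)
{
	if (demo_warn_on((task_flags & DEMO_PF_FSTRANS) != 0,
			 "writepage called inside a transaction"))
		return DEMO_REDIRTY_PAGE;
	return DEMO_WRITE_PAGE;
}
```

Calling demo_writepage(0) takes the normal path, while passing
DEMO_PF_FSTRANS prints the warning and asks for the page to be redirtied.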

* [PATCH 02/27] xfs: re-enable non-blocking behaviour in xfs_map_blocks
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
  2011-07-01  9:43 ` [PATCH 01/27] xfs: PF_FSTRANS should never be set in ->writepage Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-05 22:35   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 03/27] xfs: work around bogus gcc warning in xfs_allocbt_init_cursor Christoph Hellwig
                   ` (24 subsequent siblings)
  26 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-writepage-repair-nolock-support --]
[-- Type: text/plain, Size: 1061 bytes --]

The non-blocking behaviour in xfs_map_blocks is currently conditional on
having both the WB_SYNC_NONE sync_mode and the nonblocking flag set.
The latter used to be set by pdflush, kswapd and a few other places
in older kernels, but has been fading out since the introduction
of the per-bdi flusher threads.

Enable the non-blocking behaviour for all WB_SYNC_NONE calls to get back
the behaviour we want.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_aops.c	2011-06-30 20:10:06.959596789 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_aops.c	2011-06-30 20:10:19.749596630 +0200
@@ -959,7 +959,7 @@ xfs_vm_writepage(
 	offset = page_offset(page);
 	type = IO_OVERWRITE;
 
-	if (wbc->sync_mode == WB_SYNC_NONE && wbc->nonblocking)
+	if (wbc->sync_mode == WB_SYNC_NONE)
 		nonblocking = 1;
 
 	do {
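
The difference between the old and the new condition can be shown with a
small standalone C sketch.  It is not kernel code: struct demo_wbc and the
DEMO_* names are hypothetical, simplified stand-ins for struct
writeback_control and its sync modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_sync_mode { DEMO_WB_SYNC_NONE, DEMO_WB_SYNC_ALL };

/* Hypothetical, cut-down model of struct writeback_control. */
struct demo_wbc {
	enum demo_sync_mode sync_mode;
	bool nonblocking;	/* flag that callers mostly stopped setting */
};

/* Old rule: both the sync mode and the fading flag had to agree. */
bool demo_nonblocking_before(const struct demo_wbc *wbc)
{
	assert(wbc != NULL);
	return wbc->sync_mode == DEMO_WB_SYNC_NONE && wbc->nonblocking;
}

/* New rule: every WB_SYNC_NONE call is treated as non-blocking. */
bool demo_nonblocking_after(const struct demo_wbc *wbc)
{
	assert(wbc != NULL);
	return wbc->sync_mode == DEMO_WB_SYNC_NONE;
}
```

A caller that uses DEMO_WB_SYNC_NONE but never sets the flag gets blocking
behaviour under the old rule and non-blocking behaviour under the new one;
DEMO_WB_SYNC_ALL callers block in both cases.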

* [PATCH 03/27] xfs: work around bogus gcc warning in xfs_allocbt_init_cursor
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
  2011-07-01  9:43 ` [PATCH 01/27] xfs: PF_FSTRANS should never be set in ->writepage Christoph Hellwig
  2011-07-01  9:43 ` [PATCH 02/27] xfs: re-enable non-blocking behaviour in xfs_map_blocks Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-01  9:43 ` [PATCH 04/27] xfs: split xfs_setattr Christoph Hellwig
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-fix-xfs_allocbt_init_cursor-warning --]
[-- Type: text/plain, Size: 1442 bytes --]

GCC 4.6 complains that an array subscript is above array bounds when
using the btree index to index into the agf_levels array.  The only
two indices passed in are 0 and 1, and we have an assert ensuring that.

Replace the trick of using the array index directly with using constants
in the already existing branch for assigning the XFS_BTREE_LASTREC_UPDATE
flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Index: xfs/fs/xfs/xfs_alloc_btree.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc_btree.c	2011-06-17 14:16:27.929065669 +0200
+++ xfs/fs/xfs/xfs_alloc_btree.c	2011-06-17 14:17:22.145729599 +0200
@@ -427,13 +427,16 @@ xfs_allocbt_init_cursor(
 
 	cur->bc_tp = tp;
 	cur->bc_mp = mp;
-	cur->bc_nlevels = be32_to_cpu(agf->agf_levels[btnum]);
 	cur->bc_btnum = btnum;
 	cur->bc_blocklog = mp->m_sb.sb_blocklog;
-
 	cur->bc_ops = &xfs_allocbt_ops;
-	if (btnum == XFS_BTNUM_CNT)
+
+	if (btnum == XFS_BTNUM_CNT) {
+		cur->bc_nlevels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]);
 		cur->bc_flags = XFS_BTREE_LASTREC_UPDATE;
+	} else {
+		cur->bc_nlevels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNO]);
+	}
 
 	cur->bc_private.a.agbp = agbp;
 	cur->bc_private.a.agno = agno;
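
The shape of the workaround, reduced to a self-contained C sketch: index
the two-element array with constants inside the branch that already exists
instead of with the variable.  All demo_* and DEMO_* names are hypothetical,
and the be32_to_cpu() endian conversion of the real code is left out to keep
the example short.

```c
#include <assert.h>
#include <stdint.h>

enum demo_btnum { DEMO_BTNUM_BNO = 0, DEMO_BTNUM_CNT = 1 };

#define DEMO_LASTREC_UPDATE 0x1u

/* Hypothetical stand-ins for the AGF header and the btree cursor. */
struct demo_agf { uint32_t levels[2]; };
struct demo_cursor { uint32_t nlevels; unsigned int flags; };

void demo_init_cursor(struct demo_cursor *cur, const struct demo_agf *agf,
		      enum demo_btnum btnum)
{
	assert(btnum == DEMO_BTNUM_BNO || btnum == DEMO_BTNUM_CNT);

	cur->flags = 0;
	if (btnum == DEMO_BTNUM_CNT) {
		/* Constant index: provably inside levels[2]. */
		cur->nlevels = agf->levels[DEMO_BTNUM_CNT];
		cur->flags = DEMO_LASTREC_UPDATE;
	} else {
		cur->nlevels = agf->levels[DEMO_BTNUM_BNO];
	}
}
```

Because both subscripts are compile-time constants, the compiler no longer
has to reason about the range of btnum to see that the accesses are in
bounds.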

* [PATCH 04/27] xfs: split xfs_setattr
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 03/27] xfs: work around bogus gcc warning in xfs_allocbt_init_cursor Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-01  9:43 ` [PATCH 06/27] xfs: kill xfs_itruncate_start Christoph Hellwig
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-split-setattr --]
[-- Type: text/plain, Size: 28066 bytes --]

Split up xfs_setattr into two functions, one for the complex truncate
handling, and one for the trivial attribute updates.  Also move both
new routines to xfs_iops.c as they are fairly Linux-specific.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
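
As a rough picture of the split, here is a standalone C sketch of the
dispatch that xfs_vn_setattr performs after this patch.  It is a simplified
model only: the demo_* types, the DEMO_ATTR_* bit values and the function
bodies are hypothetical, and locking, transactions and quota handling are
left out entirely.

```c
#include <assert.h>

/* Hypothetical attribute mask bits, not the kernel's ATTR_* values. */
#define DEMO_ATTR_MODE (1u << 0)
#define DEMO_ATTR_SIZE (1u << 3)

struct demo_iattr { unsigned int valid; long long size; unsigned int mode; };
struct demo_inode { long long size; unsigned int mode; };

/* Trivial attribute updates; must never see a size change. */
int demo_setattr_nonsize(struct demo_inode *ip, const struct demo_iattr *ia)
{
	assert((ia->valid & DEMO_ATTR_SIZE) == 0);
	if (ia->valid & DEMO_ATTR_MODE)
		ip->mode = ia->mode;
	return 0;
}

/* Truncate path; only handles the size change. */
int demo_setattr_size(struct demo_inode *ip, const struct demo_iattr *ia)
{
	assert(ia->valid & DEMO_ATTR_SIZE);
	ip->size = ia->size;
	return 0;
}

/* Caller-side dispatch on the size bit. */
int demo_setattr(struct demo_inode *ip, const struct demo_iattr *ia)
{
	if (ia->valid & DEMO_ATTR_SIZE)
		return demo_setattr_size(ip, ia);
	return demo_setattr_nonsize(ip, ia);
}
```

Each helper asserts the part of the mask it is responsible for, which
mirrors the ASSERTs the two new functions carry in the patch below.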

Index: xfs/fs/xfs/linux-2.6/xfs_iops.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_iops.c	2011-06-29 11:29:02.684972774 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_iops.c	2011-06-29 11:29:07.154948558 +0200
@@ -39,6 +39,7 @@
 #include "xfs_buf_item.h"
 #include "xfs_utils.h"
 #include "xfs_vnodeops.h"
+#include "xfs_inode_item.h"
 #include "xfs_trace.h"
 
 #include <linux/capability.h>
@@ -497,12 +498,449 @@ xfs_vn_getattr(
 	return 0;
 }
 
+int
+xfs_setattr_nonsize(
+	struct xfs_inode	*ip,
+	struct iattr		*iattr,
+	int			flags)
+{
+	xfs_mount_t		*mp = ip->i_mount;
+	struct inode		*inode = VFS_I(ip);
+	int			mask = iattr->ia_valid;
+	xfs_trans_t		*tp;
+	int			error;
+	uid_t			uid = 0, iuid = 0;
+	gid_t			gid = 0, igid = 0;
+	struct xfs_dquot	*udqp = NULL, *gdqp = NULL;
+	struct xfs_dquot	*olddquot1 = NULL, *olddquot2 = NULL;
+
+	trace_xfs_setattr(ip);
+
+	if (mp->m_flags & XFS_MOUNT_RDONLY)
+		return XFS_ERROR(EROFS);
+
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return XFS_ERROR(EIO);
+
+	error = -inode_change_ok(inode, iattr);
+	if (error)
+		return XFS_ERROR(error);
+
+	ASSERT((mask & ATTR_SIZE) == 0);
+
+	/*
+	 * If disk quotas is on, we make sure that the dquots do exist on disk,
+	 * before we start any other transactions. Trying to do this later
+	 * is messy. We don't care to take a readlock to look at the ids
+	 * in inode here, because we can't hold it across the trans_reserve.
+	 * If the IDs do change before we take the ilock, we're covered
+	 * because the i_*dquot fields will get updated anyway.
+	 */
+	if (XFS_IS_QUOTA_ON(mp) && (mask & (ATTR_UID|ATTR_GID))) {
+		uint	qflags = 0;
+
+		if ((mask & ATTR_UID) && XFS_IS_UQUOTA_ON(mp)) {
+			uid = iattr->ia_uid;
+			qflags |= XFS_QMOPT_UQUOTA;
+		} else {
+			uid = ip->i_d.di_uid;
+		}
+		if ((mask & ATTR_GID) && XFS_IS_GQUOTA_ON(mp)) {
+			gid = iattr->ia_gid;
+			qflags |= XFS_QMOPT_GQUOTA;
+		}  else {
+			gid = ip->i_d.di_gid;
+		}
+
+		/*
+		 * We take a reference when we initialize udqp and gdqp,
+		 * so it is important that we never blindly double trip on
+		 * the same variable. See xfs_create() for an example.
+		 */
+		ASSERT(udqp == NULL);
+		ASSERT(gdqp == NULL);
+		error = xfs_qm_vop_dqalloc(ip, uid, gid, xfs_get_projid(ip),
+					 qflags, &udqp, &gdqp);
+		if (error)
+			return error;
+	}
+
+	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE);
+	error = xfs_trans_reserve(tp, 0, XFS_ICHANGE_LOG_RES(mp), 0, 0, 0);
+	if (error)
+		goto out_dqrele;
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+	/*
+	 * Change file ownership.  Must be the owner or privileged.
+	 */
+	if (mask & (ATTR_UID|ATTR_GID)) {
+		/*
+		 * These IDs could have changed since we last looked at them.
+		 * But, we're assured that if the ownership did change
+		 * while we didn't have the inode locked, inode's dquot(s)
+		 * would have changed also.
+		 */
+		iuid = ip->i_d.di_uid;
+		igid = ip->i_d.di_gid;
+		gid = (mask & ATTR_GID) ? iattr->ia_gid : igid;
+		uid = (mask & ATTR_UID) ? iattr->ia_uid : iuid;
+
+		/*
+		 * Do a quota reservation only if uid/gid is actually
+		 * going to change.
+		 */
+		if (XFS_IS_QUOTA_RUNNING(mp) &&
+		    ((XFS_IS_UQUOTA_ON(mp) && iuid != uid) ||
+		     (XFS_IS_GQUOTA_ON(mp) && igid != gid))) {
+			ASSERT(tp);
+			error = xfs_qm_vop_chown_reserve(tp, ip, udqp, gdqp,
+						capable(CAP_FOWNER) ?
+						XFS_QMOPT_FORCE_RES : 0);
+			if (error)	/* out of quota */
+				goto out_trans_cancel;
+		}
+	}
+
+	xfs_trans_ijoin(tp, ip);
+
+	/*
+	 * Change file ownership.  Must be the owner or privileged.
+	 */
+	if (mask & (ATTR_UID|ATTR_GID)) {
+		/*
+		 * CAP_FSETID overrides the following restrictions:
+		 *
+		 * The set-user-ID and set-group-ID bits of a file will be
+		 * cleared upon successful return from chown()
+		 */
+		if ((ip->i_d.di_mode & (S_ISUID|S_ISGID)) &&
+		    !capable(CAP_FSETID))
+			ip->i_d.di_mode &= ~(S_ISUID|S_ISGID);
+
+		/*
+		 * Change the ownerships and register quota modifications
+		 * in the transaction.
+		 */
+		if (iuid != uid) {
+			if (XFS_IS_QUOTA_RUNNING(mp) && XFS_IS_UQUOTA_ON(mp)) {
+				ASSERT(mask & ATTR_UID);
+				ASSERT(udqp);
+				olddquot1 = xfs_qm_vop_chown(tp, ip,
+							&ip->i_udquot, udqp);
+			}
+			ip->i_d.di_uid = uid;
+			inode->i_uid = uid;
+		}
+		if (igid != gid) {
+			if (XFS_IS_QUOTA_RUNNING(mp) && XFS_IS_GQUOTA_ON(mp)) {
+				ASSERT(!XFS_IS_PQUOTA_ON(mp));
+				ASSERT(mask & ATTR_GID);
+				ASSERT(gdqp);
+				olddquot2 = xfs_qm_vop_chown(tp, ip,
+							&ip->i_gdquot, gdqp);
+			}
+			ip->i_d.di_gid = gid;
+			inode->i_gid = gid;
+		}
+	}
+
+	/*
+	 * Change file access modes.
+	 */
+	if (mask & ATTR_MODE) {
+		umode_t mode = iattr->ia_mode;
+
+		if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
+			mode &= ~S_ISGID;
+
+		ip->i_d.di_mode &= S_IFMT;
+		ip->i_d.di_mode |= mode & ~S_IFMT;
+
+		inode->i_mode &= S_IFMT;
+		inode->i_mode |= mode & ~S_IFMT;
+	}
+
+	/*
+	 * Change file access or modified times.
+	 */
+	if (mask & ATTR_ATIME) {
+		inode->i_atime = iattr->ia_atime;
+		ip->i_d.di_atime.t_sec = iattr->ia_atime.tv_sec;
+		ip->i_d.di_atime.t_nsec = iattr->ia_atime.tv_nsec;
+		ip->i_update_core = 1;
+	}
+	if (mask & ATTR_CTIME) {
+		inode->i_ctime = iattr->ia_ctime;
+		ip->i_d.di_ctime.t_sec = iattr->ia_ctime.tv_sec;
+		ip->i_d.di_ctime.t_nsec = iattr->ia_ctime.tv_nsec;
+		ip->i_update_core = 1;
+	}
+	if (mask & ATTR_MTIME) {
+		inode->i_mtime = iattr->ia_mtime;
+		ip->i_d.di_mtime.t_sec = iattr->ia_mtime.tv_sec;
+		ip->i_d.di_mtime.t_nsec = iattr->ia_mtime.tv_nsec;
+		ip->i_update_core = 1;
+	}
+
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+	XFS_STATS_INC(xs_ig_attrchg);
+
+	if (mp->m_flags & XFS_MOUNT_WSYNC)
+		xfs_trans_set_sync(tp);
+	error = xfs_trans_commit(tp, 0);
+
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	/*
+	 * Release any dquot(s) the inode had kept before chown.
+	 */
+	xfs_qm_dqrele(olddquot1);
+	xfs_qm_dqrele(olddquot2);
+	xfs_qm_dqrele(udqp);
+	xfs_qm_dqrele(gdqp);
+
+	if (error)
+		return XFS_ERROR(error);
+
+	/*
+	 * XXX(hch): Updating the ACL entries is not atomic vs the i_mode
+	 * 	     update.  We could avoid this with linked transactions
+	 * 	     and passing down the transaction pointer all the way
+	 *	     to attr_set.  No previous user of the generic
+	 * 	     Posix ACL code seems to care about this issue either.
+	 */
+	if ((mask & ATTR_MODE) && !(flags & XFS_ATTR_NOACL)) {
+		error = -xfs_acl_chmod(inode);
+		if (error)
+			return XFS_ERROR(error);
+	}
+
+	return 0;
+
+out_trans_cancel:
+	xfs_trans_cancel(tp, 0);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+out_dqrele:
+	xfs_qm_dqrele(udqp);
+	xfs_qm_dqrele(gdqp);
+	return error;
+}
+
+/*
+ * Truncate file.  Must have write permission and not be a directory.
+ */
+int
+xfs_setattr_size(
+	struct xfs_inode	*ip,
+	struct iattr		*iattr,
+	int			flags)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct inode		*inode = VFS_I(ip);
+	int			mask = iattr->ia_valid;
+	struct xfs_trans	*tp;
+	int			error;
+	uint			lock_flags;
+	uint			commit_flags = 0;
+
+	trace_xfs_setattr(ip);
+
+	if (mp->m_flags & XFS_MOUNT_RDONLY)
+		return XFS_ERROR(EROFS);
+
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return XFS_ERROR(EIO);
+
+	error = -inode_change_ok(inode, iattr);
+	if (error)
+		return XFS_ERROR(error);
+
+	ASSERT(S_ISREG(ip->i_d.di_mode));
+	ASSERT((mask & (ATTR_MODE|ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET|
+			ATTR_MTIME_SET|ATTR_KILL_SUID|ATTR_KILL_SGID|
+			ATTR_KILL_PRIV|ATTR_TIMES_SET)) == 0);
+
+	lock_flags = XFS_ILOCK_EXCL;
+	if (!(flags & XFS_ATTR_NOLOCK))
+		lock_flags |= XFS_IOLOCK_EXCL;
+	xfs_ilock(ip, lock_flags);
+
+	/*
+	 * Short circuit the truncate case for zero length files.
+	 */
+	if (iattr->ia_size == 0 &&
+	    ip->i_size == 0 && ip->i_d.di_nextents == 0) {
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+		lock_flags &= ~XFS_ILOCK_EXCL;
+		if (mask & ATTR_CTIME) {
+			inode->i_mtime = inode->i_ctime =
+					current_fs_time(inode->i_sb);
+			xfs_mark_inode_dirty_sync(ip);
+		}
+		goto out_unlock;
+	}
+
+	/*
+	 * Make sure that the dquots are attached to the inode.
+	 */
+	error = xfs_qm_dqattach_locked(ip, 0);
+	if (error)
+		goto out_unlock;
+
+	/*
+	 * Now we can make the changes.  Before we join the inode to the
+	 * transaction, take care of the part of the truncation that must be
+	 * done without the inode lock.  This needs to be done before joining
+	 * the inode to the transaction, because the inode cannot be unlocked
+	 * once it is a part of the transaction.
+	 */
+	if (iattr->ia_size > ip->i_size) {
+		/*
+		 * Do the first part of growing a file: zero any data in the
+		 * last block that is beyond the old EOF.  We need to do this
+		 * before the inode is joined to the transaction to modify
+		 * i_size.
+		 */
+		error = xfs_zero_eof(ip, iattr->ia_size, ip->i_size);
+		if (error)
+			goto out_unlock;
+	}
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	lock_flags &= ~XFS_ILOCK_EXCL;
+
+	/*
+	 * We are going to log the inode size change in this transaction so
+	 * any previous writes that are beyond the on disk EOF and the new
+	 * EOF that have not been written out need to be written here.  If we
+	 * do not write the data out, we expose ourselves to the null files
+	 * problem.
+	 *
+	 * Only flush from the on disk size to the smaller of the in memory
+	 * file size or the new size as that's the range we really care about
+	 * here and prevents waiting for other data not within the range we
+	 * care about here.
+	 */
+	if (ip->i_size != ip->i_d.di_size && iattr->ia_size > ip->i_d.di_size) {
+		error = xfs_flush_pages(ip, ip->i_d.di_size, iattr->ia_size,
+					XBF_ASYNC, FI_NONE);
+		if (error)
+			goto out_unlock;
+	}
+
+	/*
+	 * Wait for all I/O to complete.
+	 */
+	xfs_ioend_wait(ip);
+
+	error = -block_truncate_page(inode->i_mapping, iattr->ia_size,
+				     xfs_get_blocks);
+	if (error)
+		goto out_unlock;
+
+	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_SIZE);
+	error = xfs_trans_reserve(tp, 0, XFS_ITRUNCATE_LOG_RES(mp), 0,
+				 XFS_TRANS_PERM_LOG_RES,
+				 XFS_ITRUNCATE_LOG_COUNT);
+	if (error)
+		goto out_trans_cancel;
+
+	truncate_setsize(inode, iattr->ia_size);
+
+	commit_flags = XFS_TRANS_RELEASE_LOG_RES;
+	lock_flags |= XFS_ILOCK_EXCL;
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+	xfs_trans_ijoin(tp, ip);
+
+	/*
+	 * Only change the c/mtime if we are changing the size or we are
+	 * explicitly asked to change it.  This handles the semantic difference
+	 * between truncate() and ftruncate() as implemented in the VFS.
+	 *
+	 * The regular truncate() case without ATTR_CTIME and ATTR_MTIME is a
+	 * special case where we need to update the times despite not having
+	 * these flags set.  For all other operations the VFS set these flags
+	 * explicitly if it wants a timestamp update.
+	 */
+	if (iattr->ia_size != ip->i_size &&
+	    (!(mask & (ATTR_CTIME | ATTR_MTIME)))) {
+		iattr->ia_ctime = iattr->ia_mtime =
+			current_fs_time(inode->i_sb);
+		mask |= ATTR_CTIME | ATTR_MTIME;
+	}
+
+	if (iattr->ia_size > ip->i_size) {
+		ip->i_d.di_size = iattr->ia_size;
+		ip->i_size = iattr->ia_size;
+	} else if (iattr->ia_size <= ip->i_size ||
+		   (iattr->ia_size == 0 && ip->i_d.di_nextents)) {
+		/*
+		 * Signal a sync transaction unless we are truncating an
+		 * already unlinked file on a wsync filesystem.
+		 */
+		error = xfs_itruncate_finish(&tp, ip, iattr->ia_size,
+				    XFS_DATA_FORK,
+				    ((ip->i_d.di_nlink != 0 ||
+				      !(mp->m_flags & XFS_MOUNT_WSYNC))
+				     ? 1 : 0));
+		if (error)
+			goto out_trans_abort;
+
+		/*
+		 * Truncated "down", so we're removing references to old data
+		 * here - if we delay flushing for a long time, we expose
+		 * ourselves unduly to the notorious NULL files problem.  So,
+		 * we mark this inode and flush it when the file is closed,
+		 * and do not wait the usual (long) time for writeout.
+		 */
+		xfs_iflags_set(ip, XFS_ITRUNCATED);
+	}
+
+	if (mask & ATTR_CTIME) {
+		inode->i_ctime = iattr->ia_ctime;
+		ip->i_d.di_ctime.t_sec = iattr->ia_ctime.tv_sec;
+		ip->i_d.di_ctime.t_nsec = iattr->ia_ctime.tv_nsec;
+		ip->i_update_core = 1;
+	}
+	if (mask & ATTR_MTIME) {
+		inode->i_mtime = iattr->ia_mtime;
+		ip->i_d.di_mtime.t_sec = iattr->ia_mtime.tv_sec;
+		ip->i_d.di_mtime.t_nsec = iattr->ia_mtime.tv_nsec;
+		ip->i_update_core = 1;
+	}
+
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+	XFS_STATS_INC(xs_ig_attrchg);
+
+	if (mp->m_flags & XFS_MOUNT_WSYNC)
+		xfs_trans_set_sync(tp);
+
+	error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+out_unlock:
+	if (lock_flags)
+		xfs_iunlock(ip, lock_flags);
+	return error;
+
+out_trans_abort:
+	commit_flags |= XFS_TRANS_ABORT;
+out_trans_cancel:
+	xfs_trans_cancel(tp, commit_flags);
+	goto out_unlock;
+}
+
 STATIC int
 xfs_vn_setattr(
 	struct dentry	*dentry,
 	struct iattr	*iattr)
 {
-	return -xfs_setattr(XFS_I(dentry->d_inode), iattr, 0);
+	if (iattr->ia_valid & ATTR_SIZE)
+		return -xfs_setattr_size(XFS_I(dentry->d_inode), iattr, 0);
+	return -xfs_setattr_nonsize(XFS_I(dentry->d_inode), iattr, 0);
 }
 
 #define XFS_FIEMAP_FLAGS	(FIEMAP_FLAG_SYNC|FIEMAP_FLAG_XATTR)
Index: xfs/fs/xfs/linux-2.6/xfs_acl.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_acl.c	2011-06-29 11:29:02.698306035 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_acl.c	2011-06-29 11:29:07.154948558 +0200
@@ -264,7 +264,7 @@ xfs_set_mode(struct inode *inode, mode_t
 		iattr.ia_mode = mode;
 		iattr.ia_ctime = current_fs_time(inode->i_sb);
 
-		error = -xfs_setattr(XFS_I(inode), &iattr, XFS_ATTR_NOACL);
+		error = -xfs_setattr_nonsize(XFS_I(inode), &iattr, XFS_ATTR_NOACL);
 	}
 
 	return error;
Index: xfs/fs/xfs/linux-2.6/xfs_file.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_file.c	2011-06-29 11:29:02.711639297 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_file.c	2011-06-29 11:29:07.158281874 +0200
@@ -944,7 +944,7 @@ xfs_file_fallocate(
 
 		iattr.ia_valid = ATTR_SIZE;
 		iattr.ia_size = new_size;
-		error = -xfs_setattr(ip, &iattr, XFS_ATTR_NOLOCK);
+		error = -xfs_setattr_size(ip, &iattr, XFS_ATTR_NOLOCK);
 	}
 
 out_unlock:
Index: xfs/fs/xfs/xfs_vnodeops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_vnodeops.c	2011-06-29 11:29:02.721639242 +0200
+++ xfs/fs/xfs/xfs_vnodeops.c	2011-06-29 11:29:07.158281874 +0200
@@ -50,430 +50,6 @@
 #include "xfs_vnodeops.h"
 #include "xfs_trace.h"
 
-int
-xfs_setattr(
-	struct xfs_inode	*ip,
-	struct iattr		*iattr,
-	int			flags)
-{
-	xfs_mount_t		*mp = ip->i_mount;
-	struct inode		*inode = VFS_I(ip);
-	int			mask = iattr->ia_valid;
-	xfs_trans_t		*tp;
-	int			code;
-	uint			lock_flags;
-	uint			commit_flags=0;
-	uid_t			uid=0, iuid=0;
-	gid_t			gid=0, igid=0;
-	struct xfs_dquot	*udqp, *gdqp, *olddquot1, *olddquot2;
-	int			need_iolock = 1;
-
-	trace_xfs_setattr(ip);
-
-	if (mp->m_flags & XFS_MOUNT_RDONLY)
-		return XFS_ERROR(EROFS);
-
-	if (XFS_FORCED_SHUTDOWN(mp))
-		return XFS_ERROR(EIO);
-
-	code = -inode_change_ok(inode, iattr);
-	if (code)
-		return code;
-
-	olddquot1 = olddquot2 = NULL;
-	udqp = gdqp = NULL;
-
-	/*
-	 * If disk quotas is on, we make sure that the dquots do exist on disk,
-	 * before we start any other transactions. Trying to do this later
-	 * is messy. We don't care to take a readlock to look at the ids
-	 * in inode here, because we can't hold it across the trans_reserve.
-	 * If the IDs do change before we take the ilock, we're covered
-	 * because the i_*dquot fields will get updated anyway.
-	 */
-	if (XFS_IS_QUOTA_ON(mp) && (mask & (ATTR_UID|ATTR_GID))) {
-		uint	qflags = 0;
-
-		if ((mask & ATTR_UID) && XFS_IS_UQUOTA_ON(mp)) {
-			uid = iattr->ia_uid;
-			qflags |= XFS_QMOPT_UQUOTA;
-		} else {
-			uid = ip->i_d.di_uid;
-		}
-		if ((mask & ATTR_GID) && XFS_IS_GQUOTA_ON(mp)) {
-			gid = iattr->ia_gid;
-			qflags |= XFS_QMOPT_GQUOTA;
-		}  else {
-			gid = ip->i_d.di_gid;
-		}
-
-		/*
-		 * We take a reference when we initialize udqp and gdqp,
-		 * so it is important that we never blindly double trip on
-		 * the same variable. See xfs_create() for an example.
-		 */
-		ASSERT(udqp == NULL);
-		ASSERT(gdqp == NULL);
-		code = xfs_qm_vop_dqalloc(ip, uid, gid, xfs_get_projid(ip),
-					 qflags, &udqp, &gdqp);
-		if (code)
-			return code;
-	}
-
-	/*
-	 * For the other attributes, we acquire the inode lock and
-	 * first do an error checking pass.
-	 */
-	tp = NULL;
-	lock_flags = XFS_ILOCK_EXCL;
-	if (flags & XFS_ATTR_NOLOCK)
-		need_iolock = 0;
-	if (!(mask & ATTR_SIZE)) {
-		tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE);
-		commit_flags = 0;
-		code = xfs_trans_reserve(tp, 0, XFS_ICHANGE_LOG_RES(mp),
-					 0, 0, 0);
-		if (code) {
-			lock_flags = 0;
-			goto error_return;
-		}
-	} else {
-		if (need_iolock)
-			lock_flags |= XFS_IOLOCK_EXCL;
-	}
-
-	xfs_ilock(ip, lock_flags);
-
-	/*
-	 * Change file ownership.  Must be the owner or privileged.
-	 */
-	if (mask & (ATTR_UID|ATTR_GID)) {
-		/*
-		 * These IDs could have changed since we last looked at them.
-		 * But, we're assured that if the ownership did change
-		 * while we didn't have the inode locked, inode's dquot(s)
-		 * would have changed also.
-		 */
-		iuid = ip->i_d.di_uid;
-		igid = ip->i_d.di_gid;
-		gid = (mask & ATTR_GID) ? iattr->ia_gid : igid;
-		uid = (mask & ATTR_UID) ? iattr->ia_uid : iuid;
-
-		/*
-		 * Do a quota reservation only if uid/gid is actually
-		 * going to change.
-		 */
-		if (XFS_IS_QUOTA_RUNNING(mp) &&
-		    ((XFS_IS_UQUOTA_ON(mp) && iuid != uid) ||
-		     (XFS_IS_GQUOTA_ON(mp) && igid != gid))) {
-			ASSERT(tp);
-			code = xfs_qm_vop_chown_reserve(tp, ip, udqp, gdqp,
-						capable(CAP_FOWNER) ?
-						XFS_QMOPT_FORCE_RES : 0);
-			if (code)	/* out of quota */
-				goto error_return;
-		}
-	}
-
-	/*
-	 * Truncate file.  Must have write permission and not be a directory.
-	 */
-	if (mask & ATTR_SIZE) {
-		/* Short circuit the truncate case for zero length files */
-		if (iattr->ia_size == 0 &&
-		    ip->i_size == 0 && ip->i_d.di_nextents == 0) {
-			xfs_iunlock(ip, XFS_ILOCK_EXCL);
-			lock_flags &= ~XFS_ILOCK_EXCL;
-			if (mask & ATTR_CTIME) {
-				inode->i_mtime = inode->i_ctime =
-						current_fs_time(inode->i_sb);
-				xfs_mark_inode_dirty_sync(ip);
-			}
-			code = 0;
-			goto error_return;
-		}
-
-		if (S_ISDIR(ip->i_d.di_mode)) {
-			code = XFS_ERROR(EISDIR);
-			goto error_return;
-		} else if (!S_ISREG(ip->i_d.di_mode)) {
-			code = XFS_ERROR(EINVAL);
-			goto error_return;
-		}
-
-		/*
-		 * Make sure that the dquots are attached to the inode.
-		 */
-		code = xfs_qm_dqattach_locked(ip, 0);
-		if (code)
-			goto error_return;
-
-		/*
-		 * Now we can make the changes.  Before we join the inode
-		 * to the transaction, if ATTR_SIZE is set then take care of
-		 * the part of the truncation that must be done without the
-		 * inode lock.  This needs to be done before joining the inode
-		 * to the transaction, because the inode cannot be unlocked
-		 * once it is a part of the transaction.
-		 */
-		if (iattr->ia_size > ip->i_size) {
-			/*
-			 * Do the first part of growing a file: zero any data
-			 * in the last block that is beyond the old EOF.  We
-			 * need to do this before the inode is joined to the
-			 * transaction to modify the i_size.
-			 */
-			code = xfs_zero_eof(ip, iattr->ia_size, ip->i_size);
-			if (code)
-				goto error_return;
-		}
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		lock_flags &= ~XFS_ILOCK_EXCL;
-
-		/*
-		 * We are going to log the inode size change in this
-		 * transaction so any previous writes that are beyond the on
-		 * disk EOF and the new EOF that have not been written out need
-		 * to be written here. If we do not write the data out, we
-		 * expose ourselves to the null files problem.
-		 *
-		 * Only flush from the on disk size to the smaller of the in
-		 * memory file size or the new size as that's the range we
-		 * really care about here and prevents waiting for other data
-		 * not within the range we care about here.
-		 */
-		if (ip->i_size != ip->i_d.di_size &&
-		    iattr->ia_size > ip->i_d.di_size) {
-			code = xfs_flush_pages(ip,
-					ip->i_d.di_size, iattr->ia_size,
-					XBF_ASYNC, FI_NONE);
-			if (code)
-				goto error_return;
-		}
-
-		/* wait for all I/O to complete */
-		xfs_ioend_wait(ip);
-
-		code = -block_truncate_page(inode->i_mapping, iattr->ia_size,
-					    xfs_get_blocks);
-		if (code)
-			goto error_return;
-
-		tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_SIZE);
-		code = xfs_trans_reserve(tp, 0, XFS_ITRUNCATE_LOG_RES(mp), 0,
-					 XFS_TRANS_PERM_LOG_RES,
-					 XFS_ITRUNCATE_LOG_COUNT);
-		if (code)
-			goto error_return;
-
-		truncate_setsize(inode, iattr->ia_size);
-
-		commit_flags = XFS_TRANS_RELEASE_LOG_RES;
-		lock_flags |= XFS_ILOCK_EXCL;
-
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
-
-		xfs_trans_ijoin(tp, ip);
-
-		/*
-		 * Only change the c/mtime if we are changing the size
-		 * or we are explicitly asked to change it. This handles
-		 * the semantic difference between truncate() and ftruncate()
-		 * as implemented in the VFS.
-		 *
-		 * The regular truncate() case without ATTR_CTIME and ATTR_MTIME
-		 * is a special case where we need to update the times despite
-		 * not having these flags set.  For all other operations the
-		 * VFS set these flags explicitly if it wants a timestamp
-		 * update.
-		 */
-		if (iattr->ia_size != ip->i_size &&
-		    (!(mask & (ATTR_CTIME | ATTR_MTIME)))) {
-			iattr->ia_ctime = iattr->ia_mtime =
-				current_fs_time(inode->i_sb);
-			mask |= ATTR_CTIME | ATTR_MTIME;
-		}
-
-		if (iattr->ia_size > ip->i_size) {
-			ip->i_d.di_size = iattr->ia_size;
-			ip->i_size = iattr->ia_size;
-			xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-		} else if (iattr->ia_size <= ip->i_size ||
-			   (iattr->ia_size == 0 && ip->i_d.di_nextents)) {
-			/*
-			 * signal a sync transaction unless
-			 * we're truncating an already unlinked
-			 * file on a wsync filesystem
-			 */
-			code = xfs_itruncate_finish(&tp, ip, iattr->ia_size,
-					    XFS_DATA_FORK,
-					    ((ip->i_d.di_nlink != 0 ||
-					      !(mp->m_flags & XFS_MOUNT_WSYNC))
-					     ? 1 : 0));
-			if (code)
-				goto abort_return;
-			/*
-			 * Truncated "down", so we're removing references
-			 * to old data here - if we now delay flushing for
-			 * a long time, we expose ourselves unduly to the
-			 * notorious NULL files problem.  So, we mark this
-			 * vnode and flush it when the file is closed, and
-			 * do not wait the usual (long) time for writeout.
-			 */
-			xfs_iflags_set(ip, XFS_ITRUNCATED);
-		}
-	} else if (tp) {
-		xfs_trans_ijoin(tp, ip);
-	}
-
-	/*
-	 * Change file ownership.  Must be the owner or privileged.
-	 */
-	if (mask & (ATTR_UID|ATTR_GID)) {
-		/*
-		 * CAP_FSETID overrides the following restrictions:
-		 *
-		 * The set-user-ID and set-group-ID bits of a file will be
-		 * cleared upon successful return from chown()
-		 */
-		if ((ip->i_d.di_mode & (S_ISUID|S_ISGID)) &&
-		    !capable(CAP_FSETID)) {
-			ip->i_d.di_mode &= ~(S_ISUID|S_ISGID);
-		}
-
-		/*
-		 * Change the ownerships and register quota modifications
-		 * in the transaction.
-		 */
-		if (iuid != uid) {
-			if (XFS_IS_QUOTA_RUNNING(mp) && XFS_IS_UQUOTA_ON(mp)) {
-				ASSERT(mask & ATTR_UID);
-				ASSERT(udqp);
-				olddquot1 = xfs_qm_vop_chown(tp, ip,
-							&ip->i_udquot, udqp);
-			}
-			ip->i_d.di_uid = uid;
-			inode->i_uid = uid;
-		}
-		if (igid != gid) {
-			if (XFS_IS_QUOTA_RUNNING(mp) && XFS_IS_GQUOTA_ON(mp)) {
-				ASSERT(!XFS_IS_PQUOTA_ON(mp));
-				ASSERT(mask & ATTR_GID);
-				ASSERT(gdqp);
-				olddquot2 = xfs_qm_vop_chown(tp, ip,
-							&ip->i_gdquot, gdqp);
-			}
-			ip->i_d.di_gid = gid;
-			inode->i_gid = gid;
-		}
-	}
-
-	/*
-	 * Change file access modes.
-	 */
-	if (mask & ATTR_MODE) {
-		umode_t mode = iattr->ia_mode;
-
-		if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
-			mode &= ~S_ISGID;
-
-		ip->i_d.di_mode &= S_IFMT;
-		ip->i_d.di_mode |= mode & ~S_IFMT;
-
-		inode->i_mode &= S_IFMT;
-		inode->i_mode |= mode & ~S_IFMT;
-	}
-
-	/*
-	 * Change file access or modified times.
-	 */
-	if (mask & ATTR_ATIME) {
-		inode->i_atime = iattr->ia_atime;
-		ip->i_d.di_atime.t_sec = iattr->ia_atime.tv_sec;
-		ip->i_d.di_atime.t_nsec = iattr->ia_atime.tv_nsec;
-		ip->i_update_core = 1;
-	}
-	if (mask & ATTR_CTIME) {
-		inode->i_ctime = iattr->ia_ctime;
-		ip->i_d.di_ctime.t_sec = iattr->ia_ctime.tv_sec;
-		ip->i_d.di_ctime.t_nsec = iattr->ia_ctime.tv_nsec;
-		ip->i_update_core = 1;
-	}
-	if (mask & ATTR_MTIME) {
-		inode->i_mtime = iattr->ia_mtime;
-		ip->i_d.di_mtime.t_sec = iattr->ia_mtime.tv_sec;
-		ip->i_d.di_mtime.t_nsec = iattr->ia_mtime.tv_nsec;
-		ip->i_update_core = 1;
-	}
-
-	/*
-	 * And finally, log the inode core if any attribute in it
-	 * has been changed.
-	 */
-	if (mask & (ATTR_UID|ATTR_GID|ATTR_MODE|
-		    ATTR_ATIME|ATTR_CTIME|ATTR_MTIME))
-		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-
-	XFS_STATS_INC(xs_ig_attrchg);
-
-	/*
-	 * If this is a synchronous mount, make sure that the
-	 * transaction goes to disk before returning to the user.
-	 * This is slightly sub-optimal in that truncates require
-	 * two sync transactions instead of one for wsync filesystems.
-	 * One for the truncate and one for the timestamps since we
-	 * don't want to change the timestamps unless we're sure the
-	 * truncate worked.  Truncates are less than 1% of the laddis
-	 * mix so this probably isn't worth the trouble to optimize.
-	 */
-	code = 0;
-	if (mp->m_flags & XFS_MOUNT_WSYNC)
-		xfs_trans_set_sync(tp);
-
-	code = xfs_trans_commit(tp, commit_flags);
-
-	xfs_iunlock(ip, lock_flags);
-
-	/*
-	 * Release any dquot(s) the inode had kept before chown.
-	 */
-	xfs_qm_dqrele(olddquot1);
-	xfs_qm_dqrele(olddquot2);
-	xfs_qm_dqrele(udqp);
-	xfs_qm_dqrele(gdqp);
-
-	if (code)
-		return code;
-
-	/*
-	 * XXX(hch): Updating the ACL entries is not atomic vs the i_mode
-	 * 	     update.  We could avoid this with linked transactions
-	 * 	     and passing down the transaction pointer all the way
-	 *	     to attr_set.  No previous user of the generic
-	 * 	     Posix ACL code seems to care about this issue either.
-	 */
-	if ((mask & ATTR_MODE) && !(flags & XFS_ATTR_NOACL)) {
-		code = -xfs_acl_chmod(inode);
-		if (code)
-			return XFS_ERROR(code);
-	}
-
-	return 0;
-
- abort_return:
-	commit_flags |= XFS_TRANS_ABORT;
- error_return:
-	xfs_qm_dqrele(udqp);
-	xfs_qm_dqrele(gdqp);
-	if (tp) {
-		xfs_trans_cancel(tp, commit_flags);
-	}
-	if (lock_flags != 0) {
-		xfs_iunlock(ip, lock_flags);
-	}
-	return code;
-}
-
 /*
  * The maximum pathlen is 1024 bytes. Since the minimum file system
  * blocksize is 512 bytes, we can get a max of 2 extents back from
@@ -2784,7 +2360,7 @@ xfs_change_file_space(
 		iattr.ia_valid = ATTR_SIZE;
 		iattr.ia_size = startoffset;
 
-		error = xfs_setattr(ip, &iattr, attr_flags);
+		error = xfs_setattr_size(ip, &iattr, attr_flags);
 
 		if (error)
 			return error;
Index: xfs/fs/xfs/xfs_vnodeops.h
===================================================================
--- xfs.orig/fs/xfs/xfs_vnodeops.h	2011-06-29 11:29:02.734972504 +0200
+++ xfs/fs/xfs/xfs_vnodeops.h	2011-06-29 11:29:07.161615190 +0200
@@ -13,7 +13,8 @@ struct xfs_inode;
 struct xfs_iomap;
 
 
-int xfs_setattr(struct xfs_inode *ip, struct iattr *vap, int flags);
+int xfs_setattr_nonsize(struct xfs_inode *ip, struct iattr *vap, int flags);
+int xfs_setattr_size(struct xfs_inode *ip, struct iattr *vap, int flags);
 #define	XFS_ATTR_DMI		0x01	/* invocation from a DMI function */
 #define	XFS_ATTR_NONBLOCK	0x02	/* return EAGAIN if operation would block */
 #define XFS_ATTR_NOLOCK		0x04	/* Don't grab any conflicting locks */


* [PATCH 06/27] xfs: kill xfs_itruncate_start
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 04/27] xfs: split xfs_setattr Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-01  9:43 ` [PATCH 07/27] xfs: split xfs_itruncate_finish Christoph Hellwig
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-kill-xfs_itruncate_start --]
[-- Type: text/plain, Size: 11685 bytes --]

xfs_itruncate_start is a rather lengthy wrapper that evaluates to a call
to xfs_ioend_wait and xfs_tosspages, and only has two callers.

Instead of using the complicated checks left over from IRIX to decide
whether we can truncate the pagecache, just call xfs_tosspages
(aka truncate_inode_pages) directly, as we want to get rid of all data
after i_size, and truncate_inode_pages handles incorrect alignments
and too large offsets just fine.
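
As a rough illustration of the alignment handling that goes away here,
the toss_start computation in the removed xfs_itruncate_start() rounds
new_size up to a filesystem block boundary before tossing pages.  The
sketch below models only that arithmetic in userspace C.  The helper
names and the blocklog parameter are hypothetical stand-ins for
XFS_B_TO_FSB, XFS_FSB_TO_B and the mount's block size shift; this is
not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for XFS_B_TO_FSB: bytes to blocks, rounding up. */
static uint64_t bytes_to_blocks_roundup(uint64_t bytes, unsigned blocklog)
{
	uint64_t mask;

	assert(blocklog < 64);
	mask = ((uint64_t)1 << blocklog) - 1;
	return (bytes + mask) >> blocklog;
}

/* Hypothetical stand-in for XFS_FSB_TO_B: blocks to bytes. */
static uint64_t blocks_to_bytes(uint64_t blocks, unsigned blocklog)
{
	assert(blocklog < 64);
	return blocks << blocklog;
}

/*
 * Models the toss_start computation of the removed function: the first
 * byte offset at or after new_size that sits on a block boundary.
 */
static uint64_t toss_start_for(uint64_t new_size, unsigned blocklog)
{
	return blocks_to_bytes(bytes_to_blocks_roundup(new_size, blocklog),
			       blocklog);
}
```

With a 4096 byte block (blocklog 12), a new_size that is already block
aligned is left alone and anything else moves up to the next boundary.
After this patch the callers no longer do this rounding themselves.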

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c	2011-06-29 11:29:02.494973804 +0200
+++ xfs/fs/xfs/xfs_inode.c	2011-06-29 11:29:11.888256249 +0200
@@ -1217,165 +1217,8 @@ xfs_isize_check(
 #endif	/* DEBUG */
 
 /*
- * Calculate the last possible buffered byte in a file.  This must
- * include data that was buffered beyond the EOF by the write code.
- * This also needs to deal with overflowing the xfs_fsize_t type
- * which can happen for sizes near the limit.
- *
- * We also need to take into account any blocks beyond the EOF.  It
- * may be the case that they were buffered by a write which failed.
- * In that case the pages will still be in memory, but the inode size
- * will never have been updated.
- */
-STATIC xfs_fsize_t
-xfs_file_last_byte(
-	xfs_inode_t	*ip)
-{
-	xfs_mount_t	*mp;
-	xfs_fsize_t	last_byte;
-	xfs_fileoff_t	last_block;
-	xfs_fileoff_t	size_last_block;
-	int		error;
-
-	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL|XFS_IOLOCK_SHARED));
-
-	mp = ip->i_mount;
-	/*
-	 * Only check for blocks beyond the EOF if the extents have
-	 * been read in.  This eliminates the need for the inode lock,
-	 * and it also saves us from looking when it really isn't
-	 * necessary.
-	 */
-	if (ip->i_df.if_flags & XFS_IFEXTENTS) {
-		xfs_ilock(ip, XFS_ILOCK_SHARED);
-		error = xfs_bmap_last_offset(NULL, ip, &last_block,
-			XFS_DATA_FORK);
-		xfs_iunlock(ip, XFS_ILOCK_SHARED);
-		if (error) {
-			last_block = 0;
-		}
-	} else {
-		last_block = 0;
-	}
-	size_last_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)ip->i_size);
-	last_block = XFS_FILEOFF_MAX(last_block, size_last_block);
-
-	last_byte = XFS_FSB_TO_B(mp, last_block);
-	if (last_byte < 0) {
-		return XFS_MAXIOFFSET(mp);
-	}
-	last_byte += (1 << mp->m_writeio_log);
-	if (last_byte < 0) {
-		return XFS_MAXIOFFSET(mp);
-	}
-	return last_byte;
-}
-
-/*
- * Start the truncation of the file to new_size.  The new size
- * must be smaller than the current size.  This routine will
- * clear the buffer and page caches of file data in the removed
- * range, and xfs_itruncate_finish() will remove the underlying
- * disk blocks.
- *
- * The inode must have its I/O lock locked EXCLUSIVELY, and it
- * must NOT have the inode lock held at all.  This is because we're
- * calling into the buffer/page cache code and we can't hold the
- * inode lock when we do so.
- *
- * We need to wait for any direct I/Os in flight to complete before we
- * proceed with the truncate. This is needed to prevent the extents
- * being read or written by the direct I/Os from being removed while the
- * I/O is in flight as there is no other method of synchronising
- * direct I/O with the truncate operation.  Also, because we hold
- * the IOLOCK in exclusive mode, we prevent new direct I/Os from being
- * started until the truncate completes and drops the lock. Essentially,
- * the xfs_ioend_wait() call forms an I/O barrier that provides strict
- * ordering between direct I/Os and the truncate operation.
- *
- * The flags parameter can have either the value XFS_ITRUNC_DEFINITE
- * or XFS_ITRUNC_MAYBE.  The XFS_ITRUNC_MAYBE value should be used
- * in the case that the caller is locking things out of order and
- * may not be able to call xfs_itruncate_finish() with the inode lock
- * held without dropping the I/O lock.  If the caller must drop the
- * I/O lock before calling xfs_itruncate_finish(), then xfs_itruncate_start()
- * must be called again with all the same restrictions as the initial
- * call.
- */
-int
-xfs_itruncate_start(
-	xfs_inode_t	*ip,
-	uint		flags,
-	xfs_fsize_t	new_size)
-{
-	xfs_fsize_t	last_byte;
-	xfs_off_t	toss_start;
-	xfs_mount_t	*mp;
-	int		error = 0;
-
-	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
-	ASSERT((new_size == 0) || (new_size <= ip->i_size));
-	ASSERT((flags == XFS_ITRUNC_DEFINITE) ||
-	       (flags == XFS_ITRUNC_MAYBE));
-
-	mp = ip->i_mount;
-
-	/* wait for the completion of any pending DIOs */
-	if (new_size == 0 || new_size < ip->i_size)
-		xfs_ioend_wait(ip);
-
-	/*
-	 * Call toss_pages or flushinval_pages to get rid of pages
-	 * overlapping the region being removed.  We have to use
-	 * the less efficient flushinval_pages in the case that the
-	 * caller may not be able to finish the truncate without
-	 * dropping the inode's I/O lock.  Make sure
-	 * to catch any pages brought in by buffers overlapping
-	 * the EOF by searching out beyond the isize by our
-	 * block size. We round new_size up to a block boundary
-	 * so that we don't toss things on the same block as
-	 * new_size but before it.
-	 *
-	 * Before calling toss_page or flushinval_pages, make sure to
-	 * call remapf() over the same region if the file is mapped.
-	 * This frees up mapped file references to the pages in the
-	 * given range and for the flushinval_pages case it ensures
-	 * that we get the latest mapped changes flushed out.
-	 */
-	toss_start = XFS_B_TO_FSB(mp, (xfs_ufsize_t)new_size);
-	toss_start = XFS_FSB_TO_B(mp, toss_start);
-	if (toss_start < 0) {
-		/*
-		 * The place to start tossing is beyond our maximum
-		 * file size, so there is no way that the data extended
-		 * out there.
-		 */
-		return 0;
-	}
-	last_byte = xfs_file_last_byte(ip);
-	trace_xfs_itruncate_start(ip, new_size, flags, toss_start, last_byte);
-	if (last_byte > toss_start) {
-		if (flags & XFS_ITRUNC_DEFINITE) {
-			xfs_tosspages(ip, toss_start,
-					-1, FI_REMAPF_LOCKED);
-		} else {
-			error = xfs_flushinval_pages(ip, toss_start,
-					-1, FI_REMAPF_LOCKED);
-		}
-	}
-
-#ifdef DEBUG
-	if (new_size == 0) {
-		ASSERT(VN_CACHED(VFS_I(ip)) == 0);
-	}
-#endif
-	return error;
-}
-
-/*
- * Shrink the file to the given new_size.  The new size must be smaller than
- * the current size.  This will free up the underlying blocks in the removed
- * range after a call to xfs_itruncate_start() or xfs_atruncate_start().
+ * Free up the underlying blocks past new_size.  The new size must be
+ * smaller than the current size.
  *
  * The transaction passed to this routine must have made a permanent log
  * reservation of at least XFS_ITRUNCATE_LOG_RES.  This routine may commit the
@@ -1387,7 +1230,7 @@ xfs_itruncate_start(
  * will be "held" within the returned transaction.  This routine does NOT
  * require any disk space to be reserved for it within the transaction.
  *
- * The fork parameter must be either xfs_attr_fork or xfs_data_fork, and it
+ * The fork parameter must be either XFS_ATTR_FORK or XFS_DATA_FORK, and it
  * indicates the fork which is to be truncated.  For the attribute fork we only
  * support truncation to size 0.
  *
Index: xfs/fs/xfs/xfs_vnodeops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_vnodeops.c	2011-06-29 11:29:07.158281874 +0200
+++ xfs/fs/xfs/xfs_vnodeops.c	2011-06-29 11:29:11.888256249 +0200
@@ -197,13 +197,6 @@ xfs_free_eofblocks(
 		 */
 		tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
 
-		/*
-		 * Do the xfs_itruncate_start() call before
-		 * reserving any log space because
-		 * itruncate_start will call into the buffer
-		 * cache and we can't
-		 * do that within a transaction.
-		 */
 		if (flags & XFS_FREE_EOF_TRYLOCK) {
 			if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
 				xfs_trans_cancel(tp, 0);
@@ -212,13 +205,6 @@ xfs_free_eofblocks(
 		} else {
 			xfs_ilock(ip, XFS_IOLOCK_EXCL);
 		}
-		error = xfs_itruncate_start(ip, XFS_ITRUNC_DEFINITE,
-				    ip->i_size);
-		if (error) {
-			xfs_trans_cancel(tp, 0);
-			xfs_iunlock(ip, XFS_IOLOCK_EXCL);
-			return error;
-		}
 
 		error = xfs_trans_reserve(tp, 0,
 					  XFS_ITRUNCATE_LOG_RES(mp),
@@ -660,20 +646,9 @@ xfs_inactive(
 
 	tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
 	if (truncate) {
-		/*
-		 * Do the xfs_itruncate_start() call before
-		 * reserving any log space because itruncate_start
-		 * will call into the buffer cache and we can't
-		 * do that within a transaction.
-		 */
 		xfs_ilock(ip, XFS_IOLOCK_EXCL);
 
-		error = xfs_itruncate_start(ip, XFS_ITRUNC_DEFINITE, 0);
-		if (error) {
-			xfs_trans_cancel(tp, 0);
-			xfs_iunlock(ip, XFS_IOLOCK_EXCL);
-			return VN_INACTIVE_CACHE;
-		}
+		xfs_ioend_wait(ip);
 
 		error = xfs_trans_reserve(tp, 0,
 					  XFS_ITRUNCATE_LOG_RES(mp),
Index: xfs/fs/xfs/linux-2.6/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_trace.h	2011-06-29 11:29:02.518307010 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_trace.h	2011-06-29 11:29:11.891589564 +0200
@@ -1029,40 +1029,6 @@ DEFINE_SIMPLE_IO_EVENT(xfs_delalloc_enos
 DEFINE_SIMPLE_IO_EVENT(xfs_unwritten_convert);
 DEFINE_SIMPLE_IO_EVENT(xfs_get_blocks_notfound);
 
-
-TRACE_EVENT(xfs_itruncate_start,
-	TP_PROTO(struct xfs_inode *ip, xfs_fsize_t new_size, int flag,
-		 xfs_off_t toss_start, xfs_off_t toss_finish),
-	TP_ARGS(ip, new_size, flag, toss_start, toss_finish),
-	TP_STRUCT__entry(
-		__field(dev_t, dev)
-		__field(xfs_ino_t, ino)
-		__field(xfs_fsize_t, size)
-		__field(xfs_fsize_t, new_size)
-		__field(xfs_off_t, toss_start)
-		__field(xfs_off_t, toss_finish)
-		__field(int, flag)
-	),
-	TP_fast_assign(
-		__entry->dev = VFS_I(ip)->i_sb->s_dev;
-		__entry->ino = ip->i_ino;
-		__entry->size = ip->i_d.di_size;
-		__entry->new_size = new_size;
-		__entry->toss_start = toss_start;
-		__entry->toss_finish = toss_finish;
-		__entry->flag = flag;
-	),
-	TP_printk("dev %d:%d ino 0x%llx %s size 0x%llx new_size 0x%llx "
-		  "toss start 0x%llx toss finish 0x%llx",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->ino,
-		  __print_flags(__entry->flag, "|", XFS_ITRUNC_FLAGS),
-		  __entry->size,
-		  __entry->new_size,
-		  __entry->toss_start,
-		  __entry->toss_finish)
-);
-
 DECLARE_EVENT_CLASS(xfs_itrunc_class,
 	TP_PROTO(struct xfs_inode *ip, xfs_fsize_t new_size),
 	TP_ARGS(ip, new_size),
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h	2011-06-29 11:29:02.531640272 +0200
+++ xfs/fs/xfs/xfs_inode.h	2011-06-29 11:29:11.891589564 +0200
@@ -458,16 +458,6 @@ static inline void xfs_ifunlock(xfs_inod
 extern struct lock_class_key xfs_iolock_reclaimable;
 
 /*
- * Flags for xfs_itruncate_start().
- */
-#define	XFS_ITRUNC_DEFINITE	0x1
-#define	XFS_ITRUNC_MAYBE	0x2
-
-#define XFS_ITRUNC_FLAGS \
-	{ XFS_ITRUNC_DEFINITE,	"DEFINITE" }, \
-	{ XFS_ITRUNC_MAYBE,	"MAYBE" }
-
-/*
  * For multiple groups support: if S_ISGID bit is set in the parent
  * directory, group of new file is set to that of the parent, and
  * new subdirectory gets S_ISGID bit from parent.
@@ -501,7 +491,6 @@ uint		xfs_ip2xflags(struct xfs_inode *);
 uint		xfs_dic2xflags(struct xfs_dinode *);
 int		xfs_ifree(struct xfs_trans *, xfs_inode_t *,
 			   struct xfs_bmap_free *);
-int		xfs_itruncate_start(xfs_inode_t *, uint, xfs_fsize_t);
 int		xfs_itruncate_finish(struct xfs_trans **, xfs_inode_t *,
 				     xfs_fsize_t, int, int);
 int		xfs_iunlink(struct xfs_trans *, xfs_inode_t *);


* [PATCH 07/27] xfs: split xfs_itruncate_finish
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 06/27] xfs: kill xfs_itruncate_start Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  4:35   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 08/27] xfs: improve sync behaviour in the fact of aggressive dirtying Christoph Hellwig
                   ` (20 subsequent siblings)
  26 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-split-xfs_itruncate_finish --]
[-- Type: text/plain, Size: 22229 bytes --]

Split the guts of xfs_itruncate_finish that loops over the existing extents
and calls xfs_bunmapi on them into a new helper, xfs_itruncate_extents.
Make xfs_attr_inactive call it directly instead of xfs_itruncate_finish,
which allows simplifying the latter a lot, by only letting it deal with
the data fork.  As a result xfs_itruncate_finish is renamed to
xfs_itruncate_data to make its use case more obvious.

Also remove the sync parameter from xfs_itruncate_data, which has been
unnecessary since the introduction of the busy extent list in 2002, and
completely dead code since 2003 when the XFS_BMAPI_ASYNC parameter was
made a no-op.

I can't actually see why xfs_attr_inactive needs to set the transaction
sync, but let's keep this patch simple and without changes in behaviour.
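
For readers who want the shape of the loop that moves into the new
helper, here is a toy userspace model, not the kernel implementation.
It assumes each pass unmaps at most two extents (mirroring
XFS_ITRUNC_MAX_EXTENTS) and then commits, with a plain counter standing
in for the xfs_trans_dup/xfs_trans_commit pair.  toy_unmap and
toy_truncate_extents are hypothetical names.

```c
#include <assert.h>

#define MAX_EXTENTS_PER_TRANS	2	/* mirrors XFS_ITRUNC_MAX_EXTENTS */

/*
 * Toy stand-in for xfs_bunmapi: remove up to max extents and set *done
 * once nothing is left.  Always succeeds in this model.
 */
static int toy_unmap(int *remaining, int max, int *done)
{
	int n = *remaining < max ? *remaining : max;

	*remaining -= n;
	*done = (*remaining == 0);
	return 0;
}

/*
 * Toy stand-in for the extent-freeing loop: returns the number of
 * transactions committed while freeing nextents extents.
 */
static int toy_truncate_extents(int nextents)
{
	int remaining = nextents;
	int done = 0;
	int commits = 0;

	assert(nextents >= 0);
	while (!done) {
		if (toy_unmap(&remaining, MAX_EXTENTS_PER_TRANS, &done))
			break;
		commits++;	/* stands in for dup + commit of the transaction */
	}
	return commits;
}
```

In this model five extents take three commits, and a fork with no
extents still takes one pass through the loop, matching the
while (!done) structure in the patch.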

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Index: xfs/fs/xfs/linux-2.6/xfs_iops.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_iops.c	2011-06-30 09:02:46.606760906 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_iops.c	2011-06-30 09:05:30.870092231 +0200
@@ -879,15 +879,7 @@ xfs_setattr_size(
 		ip->i_size = iattr->ia_size;
 	} else if (iattr->ia_size <= ip->i_size ||
 		   (iattr->ia_size == 0 && ip->i_d.di_nextents)) {
-		/*
-		 * Signal a sync transaction unless we are truncating an
-		 * already unlinked file on a wsync filesystem.
-		 */
-		error = xfs_itruncate_finish(&tp, ip, iattr->ia_size,
-				    XFS_DATA_FORK,
-				    ((ip->i_d.di_nlink != 0 ||
-				      !(mp->m_flags & XFS_MOUNT_WSYNC))
-				     ? 1 : 0));
+		error = xfs_itruncate_data(&tp, ip, iattr->ia_size);
 		if (error)
 			goto out_trans_abort;
 
Index: xfs/fs/xfs/quota/xfs_qm_syscalls.c
===================================================================
--- xfs.orig/fs/xfs/quota/xfs_qm_syscalls.c	2011-06-29 19:45:25.346959576 +0200
+++ xfs/fs/xfs/quota/xfs_qm_syscalls.c	2011-06-30 09:05:30.870092231 +0200
@@ -263,7 +263,7 @@ xfs_qm_scall_trunc_qfile(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip);
 
-	error = xfs_itruncate_finish(&tp, ip, 0, XFS_DATA_FORK, 1);
+	error = xfs_itruncate_data(&tp, ip, 0);
 	if (error) {
 		xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES |
 				     XFS_TRANS_ABORT);
Index: xfs/fs/xfs/xfs_attr.c
===================================================================
--- xfs.orig/fs/xfs/xfs_attr.c	2011-06-29 19:45:25.360292838 +0200
+++ xfs/fs/xfs/xfs_attr.c	2011-06-30 09:05:30.873425550 +0200
@@ -822,17 +822,21 @@ xfs_attr_inactive(xfs_inode_t *dp)
 	error = xfs_attr_root_inactive(&trans, dp);
 	if (error)
 		goto out;
+
 	/*
-	 * signal synchronous inactive transactions unless this
-	 * is a synchronous mount filesystem in which case we
-	 * know that we're here because we've been called out of
-	 * xfs_inactive which means that the last reference is gone
-	 * and the unlink transaction has already hit the disk so
-	 * async inactive transactions are safe.
+	 * Signal synchronous inactive transactions unless this is a
+	 * synchronous mount filesystem in which case we know that we're here
+	 * because we've been called out of xfs_inactive which means that the
+	 * last reference is gone and the unlink transaction has already hit
+	 * the disk so async inactive transactions are safe.
 	 */
-	if ((error = xfs_itruncate_finish(&trans, dp, 0LL, XFS_ATTR_FORK,
-				(!(mp->m_flags & XFS_MOUNT_WSYNC)
-				 ? 1 : 0))))
+	if (!(mp->m_flags & XFS_MOUNT_WSYNC)) {
+		if (dp->i_d.di_anextents > 0)
+			xfs_trans_set_sync(trans);
+	}
+
+	error = xfs_itruncate_extents(&trans, dp, XFS_ATTR_FORK, 0);
+	if (error)
 		goto out;
 
 	/*
Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c	2011-06-30 09:02:59.840094075 +0200
+++ xfs/fs/xfs/xfs_inode.c	2011-06-30 09:15:11.956751640 +0200
@@ -52,7 +52,7 @@ kmem_zone_t *xfs_ifork_zone;
 kmem_zone_t *xfs_inode_zone;
 
 /*
- * Used in xfs_itruncate().  This is the maximum number of extents
+ * Used in xfs_itruncate_extents().  This is the maximum number of extents
  * freed from a file in a single transaction.
  */
 #define	XFS_ITRUNC_MAX_EXTENTS	2
@@ -1218,7 +1218,9 @@ xfs_isize_check(
 
 /*
  * Free up the underlying blocks past new_size.  The new size must be
- * smaller than the current size.
+ * smaller than the current size.  This routine can be used both for
+ * the attribute and data fork, and does not modify the inode size,
+ * which is left to the caller.
  *
  * The transaction passed to this routine must have made a permanent log
  * reservation of at least XFS_ITRUNCATE_LOG_RES.  This routine may commit the
@@ -1230,31 +1232,6 @@ xfs_isize_check(
  * will be "held" within the returned transaction.  This routine does NOT
  * require any disk space to be reserved for it within the transaction.
  *
- * The fork parameter must be either XFS_ATTR_FORK or XFS_DATA_FORK, and it
- * indicates the fork which is to be truncated.  For the attribute fork we only
- * support truncation to size 0.
- *
- * We use the sync parameter to indicate whether or not the first transaction
- * we perform might have to be synchronous.  For the attr fork, it needs to be
- * so if the unlink of the inode is not yet known to be permanent in the log.
- * This keeps us from freeing and reusing the blocks of the attribute fork
- * before the unlink of the inode becomes permanent.
- *
- * For the data fork, we normally have to run synchronously if we're being
- * called out of the inactive path or we're being called out of the create path
- * where we're truncating an existing file.  Either way, the truncate needs to
- * be sync so blocks don't reappear in the file with altered data in case of a
- * crash.  wsync filesystems can run the first case async because anything that
- * shrinks the inode has to run sync so by the time we're called here from
- * inactive, the inode size is permanently set to 0.
- *
- * Calls from the truncate path always need to be sync unless we're in a wsync
- * filesystem and the file has already been unlinked.
- *
- * The caller is responsible for correctly setting the sync parameter.  It gets
- * too hard for us to guess here which path we're being called out of just
- * based on inode state.
- *
  * If we get an error, we must return with the inode locked and linked into the
  * current transaction. This keeps things simple for the higher level code,
  * because it always knows that the inode is locked and held in the transaction
@@ -1262,124 +1239,31 @@ xfs_isize_check(
  * dirty on error so that transactions can be easily aborted if possible.
  */
 int
-xfs_itruncate_finish(
-	xfs_trans_t	**tp,
-	xfs_inode_t	*ip,
-	xfs_fsize_t	new_size,
-	int		fork,
-	int		sync)
+xfs_itruncate_extents(
+	struct xfs_trans	**tpp,
+	struct xfs_inode	*ip,
+	int			whichfork,
+	xfs_fsize_t		new_size)
 {
-	xfs_fsblock_t	first_block;
-	xfs_fileoff_t	first_unmap_block;
-	xfs_fileoff_t	last_block;
-	xfs_filblks_t	unmap_len=0;
-	xfs_mount_t	*mp;
-	xfs_trans_t	*ntp;
-	int		done;
-	int		committed;
-	xfs_bmap_free_t	free_list;
-	int		error;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp = *tpp;
+	struct xfs_trans	*ntp;
+	xfs_bmap_free_t		free_list;
+	xfs_fsblock_t		first_block;
+	xfs_fileoff_t		first_unmap_block;
+	xfs_fileoff_t		last_block;
+	xfs_filblks_t		unmap_len;
+	int			committed;
+	int			error = 0;
+	int			done = 0;
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_IOLOCK_EXCL));
-	ASSERT((new_size == 0) || (new_size <= ip->i_size));
-	ASSERT(*tp != NULL);
-	ASSERT((*tp)->t_flags & XFS_TRANS_PERM_LOG_RES);
-	ASSERT(ip->i_transp == *tp);
+	ASSERT(new_size <= ip->i_size);
+	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
+	ASSERT(ip->i_transp == tp);
 	ASSERT(ip->i_itemp != NULL);
 	ASSERT(ip->i_itemp->ili_lock_flags == 0);
-
-
-	ntp = *tp;
-	mp = (ntp)->t_mountp;
-	ASSERT(! XFS_NOT_DQATTACHED(mp, ip));
-
-	/*
-	 * We only support truncating the entire attribute fork.
-	 */
-	if (fork == XFS_ATTR_FORK) {
-		new_size = 0LL;
-	}
-	first_unmap_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)new_size);
-	trace_xfs_itruncate_finish_start(ip, new_size);
-
-	/*
-	 * The first thing we do is set the size to new_size permanently
-	 * on disk.  This way we don't have to worry about anyone ever
-	 * being able to look at the data being freed even in the face
-	 * of a crash.  What we're getting around here is the case where
-	 * we free a block, it is allocated to another file, it is written
-	 * to, and then we crash.  If the new data gets written to the
-	 * file but the log buffers containing the free and reallocation
-	 * don't, then we'd end up with garbage in the blocks being freed.
-	 * As long as we make the new_size permanent before actually
-	 * freeing any blocks it doesn't matter if they get written to.
-	 *
-	 * The callers must signal into us whether or not the size
-	 * setting here must be synchronous.  There are a few cases
-	 * where it doesn't have to be synchronous.  Those cases
-	 * occur if the file is unlinked and we know the unlink is
-	 * permanent or if the blocks being truncated are guaranteed
-	 * to be beyond the inode eof (regardless of the link count)
-	 * and the eof value is permanent.  Both of these cases occur
-	 * only on wsync-mounted filesystems.  In those cases, we're
-	 * guaranteed that no user will ever see the data in the blocks
-	 * that are being truncated so the truncate can run async.
-	 * In the free beyond eof case, the file may wind up with
-	 * more blocks allocated to it than it needs if we crash
-	 * and that won't get fixed until the next time the file
-	 * is re-opened and closed but that's ok as that shouldn't
-	 * be too many blocks.
-	 *
-	 * However, we can't just make all wsync xactions run async
-	 * because there's one call out of the create path that needs
-	 * to run sync where it's truncating an existing file to size
-	 * 0 whose size is > 0.
-	 *
-	 * It's probably possible to come up with a test in this
-	 * routine that would correctly distinguish all the above
-	 * cases from the values of the function parameters and the
-	 * inode state but for sanity's sake, I've decided to let the
-	 * layers above just tell us.  It's simpler to correctly figure
-	 * out in the layer above exactly under what conditions we
-	 * can run async and I think it's easier for others read and
-	 * follow the logic in case something has to be changed.
-	 * cscope is your friend -- rcc.
-	 *
-	 * The attribute fork is much simpler.
-	 *
-	 * For the attribute fork we allow the caller to tell us whether
-	 * the unlink of the inode that led to this call is yet permanent
-	 * in the on disk log.  If it is not and we will be freeing extents
-	 * in this inode then we make the first transaction synchronous
-	 * to make sure that the unlink is permanent by the time we free
-	 * the blocks.
-	 */
-	if (fork == XFS_DATA_FORK) {
-		if (ip->i_d.di_nextents > 0) {
-			/*
-			 * If we are not changing the file size then do
-			 * not update the on-disk file size - we may be
-			 * called from xfs_inactive_free_eofblocks().  If we
-			 * update the on-disk file size and then the system
-			 * crashes before the contents of the file are
-			 * flushed to disk then the files may be full of
-			 * holes (ie NULL files bug).
-			 */
-			if (ip->i_size != new_size) {
-				ip->i_d.di_size = new_size;
-				ip->i_size = new_size;
-				xfs_trans_log_inode(ntp, ip, XFS_ILOG_CORE);
-			}
-		}
-	} else if (sync) {
-		ASSERT(!(mp->m_flags & XFS_MOUNT_WSYNC));
-		if (ip->i_d.di_anextents > 0)
-			xfs_trans_set_sync(ntp);
-	}
-	ASSERT(fork == XFS_DATA_FORK ||
-		(fork == XFS_ATTR_FORK &&
-			((sync && !(mp->m_flags & XFS_MOUNT_WSYNC)) ||
-			 (sync == 0 && (mp->m_flags & XFS_MOUNT_WSYNC)))));
+	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
 
 	/*
 	 * Since it is possible for space to become allocated beyond
@@ -1390,128 +1274,143 @@ xfs_itruncate_finish(
 	 * beyond the maximum file size (ie it is the same as last_block),
 	 * then there is nothing to do.
 	 */
+	first_unmap_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)new_size);
 	last_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_MAXIOFFSET(mp));
-	ASSERT(first_unmap_block <= last_block);
-	done = 0;
-	if (last_block == first_unmap_block) {
-		done = 1;
-	} else {
-		unmap_len = last_block - first_unmap_block + 1;
-	}
+	if (first_unmap_block == last_block)
+		return 0;
+
+	ASSERT(first_unmap_block < last_block);
+	unmap_len = last_block - first_unmap_block + 1;
 	while (!done) {
-		/*
-		 * Free up up to XFS_ITRUNC_MAX_EXTENTS.  xfs_bunmapi()
-		 * will tell us whether it freed the entire range or
-		 * not.  If this is a synchronous mount (wsync),
-		 * then we can tell bunmapi to keep all the
-		 * transactions asynchronous since the unlink
-		 * transaction that made this inode inactive has
-		 * already hit the disk.  There's no danger of
-		 * the freed blocks being reused, there being a
-		 * crash, and the reused blocks suddenly reappearing
-		 * in this file with garbage in them once recovery
-		 * runs.
-		 */
 		xfs_bmap_init(&free_list, &first_block);
-		error = xfs_bunmapi(ntp, ip,
+		error = xfs_bunmapi(tp, ip,
 				    first_unmap_block, unmap_len,
-				    xfs_bmapi_aflag(fork),
+				    xfs_bmapi_aflag(whichfork),
 				    XFS_ITRUNC_MAX_EXTENTS,
 				    &first_block, &free_list,
 				    &done);
-		if (error) {
-			/*
-			 * If the bunmapi call encounters an error,
-			 * return to the caller where the transaction
-			 * can be properly aborted.  We just need to
-			 * make sure we're not holding any resources
-			 * that we were not when we came in.
-			 */
-			xfs_bmap_cancel(&free_list);
-			return error;
-		}
+		if (error)
+			goto out_bmap_cancel;
 
 		/*
 		 * Duplicate the transaction that has the permanent
 		 * reservation and commit the old transaction.
 		 */
-		error = xfs_bmap_finish(tp, &free_list, &committed);
-		ntp = *tp;
+		error = xfs_bmap_finish(&tp, &free_list, &committed);
 		if (committed)
-			xfs_trans_ijoin(ntp, ip);
-
-		if (error) {
-			/*
-			 * If the bmap finish call encounters an error, return
-			 * to the caller where the transaction can be properly
-			 * aborted.  We just need to make sure we're not
-			 * holding any resources that we were not when we came
-			 * in.
-			 *
-			 * Aborting from this point might lose some blocks in
-			 * the file system, but oh well.
-			 */
-			xfs_bmap_cancel(&free_list);
-			return error;
-		}
+			xfs_trans_ijoin(tp, ip);
+		if (error)
+			goto out_bmap_cancel;
 
 		if (committed) {
 			/*
 			 * Mark the inode dirty so it will be logged and
 			 * moved forward in the log as part of every commit.
 			 */
-			xfs_trans_log_inode(ntp, ip, XFS_ILOG_CORE);
+			xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 		}
 
-		ntp = xfs_trans_dup(ntp);
-		error = xfs_trans_commit(*tp, 0);
-		*tp = ntp;
+		ntp = xfs_trans_dup(tp);
+		error = xfs_trans_commit(tp, 0);
+		tp = ntp;
 
-		xfs_trans_ijoin(ntp, ip);
+		xfs_trans_ijoin(tp, ip);
 
 		if (error)
-			return error;
+			goto out;
+
 		/*
-		 * transaction commit worked ok so we can drop the extra ticket
+		 * Transaction commit worked ok so we can drop the extra ticket
 		 * reference that we gained in xfs_trans_dup()
 		 */
-		xfs_log_ticket_put(ntp->t_ticket);
-		error = xfs_trans_reserve(ntp, 0,
+		xfs_log_ticket_put(tp->t_ticket);
+		error = xfs_trans_reserve(tp, 0,
 					XFS_ITRUNCATE_LOG_RES(mp), 0,
 					XFS_TRANS_PERM_LOG_RES,
 					XFS_ITRUNCATE_LOG_COUNT);
 		if (error)
-			return error;
+			goto out;
 	}
+
+out:
+	*tpp = tp;
+	return error;
+out_bmap_cancel:
+	/*
+	 * If the bunmapi call encounters an error, return to the caller where
+	 * the transaction can be properly aborted.  We just need to make sure
+	 * we're not holding any resources that we were not when we came in.
+	 */
+	xfs_bmap_cancel(&free_list);
+	goto out;
+}
+
+int
+xfs_itruncate_data(
+	struct xfs_trans	**tpp,
+	struct xfs_inode	*ip,
+	xfs_fsize_t		new_size)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	int			error;
+
+	trace_xfs_itruncate_data_start(ip, new_size);
+
 	/*
-	 * Only update the size in the case of the data fork, but
-	 * always re-log the inode so that our permanent transaction
-	 * can keep on rolling it forward in the log.
+	 * The first thing we do is set the size to new_size permanently on
+	 * disk.  This way we don't have to worry about anyone ever being able
+	 * to look at the data being freed even in the face of a crash.
+	 * What we're getting around here is the case where we free a block, it
+	 * is allocated to another file, it is written to, and then we crash.
+	 * If the new data gets written to the file but the log buffers
+	 * containing the free and reallocation don't, then we'd end up with
+	 * garbage in the blocks being freed.  As long as we make the new_size
+	 * permanent before actually freeing any blocks it doesn't matter if
+	 * they get written to.
 	 */
-	if (fork == XFS_DATA_FORK) {
-		xfs_isize_check(mp, ip, new_size);
+	if (ip->i_d.di_nextents > 0) {
 		/*
-		 * If we are not changing the file size then do
-		 * not update the on-disk file size - we may be
-		 * called from xfs_inactive_free_eofblocks().  If we
-		 * update the on-disk file size and then the system
-		 * crashes before the contents of the file are
-		 * flushed to disk then the files may be full of
-		 * holes (ie NULL files bug).
+		 * If we are not changing the file size then do not update
+		 * the on-disk file size - we may be called from
+		 * xfs_inactive_free_eofblocks().  If we update the on-disk
+		 * file size and then the system crashes before the contents
+		 * of the file are flushed to disk then the files may be
+		 * full of holes (ie NULL files bug).
 		 */
 		if (ip->i_size != new_size) {
 			ip->i_d.di_size = new_size;
 			ip->i_size = new_size;
+			xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
 		}
 	}
-	xfs_trans_log_inode(ntp, ip, XFS_ILOG_CORE);
-	ASSERT((new_size != 0) ||
-	       (fork == XFS_ATTR_FORK) ||
-	       (ip->i_delayed_blks == 0));
-	ASSERT((new_size != 0) ||
-	       (fork == XFS_ATTR_FORK) ||
-	       (ip->i_d.di_nextents == 0));
-	trace_xfs_itruncate_finish_end(ip, new_size);
+
+	error = xfs_itruncate_extents(tpp, ip, XFS_DATA_FORK, new_size);
+	if (error)
+		return error;
+
+	/*
+	 * If we are not changing the file size then do not update the on-disk
+	 * file size - we may be called from xfs_inactive_free_eofblocks().
+	 * If we update the on-disk file size and then the system crashes
+	 * before the contents of the file are flushed to disk then the files
+	 * may be full of holes (ie NULL files bug).
+	 */
+	xfs_isize_check(mp, ip, new_size);
+	if (ip->i_size != new_size) {
+		ip->i_d.di_size = new_size;
+		ip->i_size = new_size;
+	}
+
+	ASSERT(new_size != 0 || ip->i_delayed_blks == 0);
+	ASSERT(new_size != 0 || ip->i_d.di_nextents == 0);
+
+	/*
+	 * Always re-log the inode so that our permanent transaction can keep
+	 * on rolling it forward in the log.
+	 */
+	xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
+
+	trace_xfs_itruncate_data_end(ip, new_size);
 	return 0;
 }
 
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h	2011-06-30 09:02:59.846760741 +0200
+++ xfs/fs/xfs/xfs_inode.h	2011-06-30 09:05:30.876758871 +0200
@@ -491,8 +491,10 @@ uint		xfs_ip2xflags(struct xfs_inode *);
 uint		xfs_dic2xflags(struct xfs_dinode *);
 int		xfs_ifree(struct xfs_trans *, xfs_inode_t *,
 			   struct xfs_bmap_free *);
-int		xfs_itruncate_finish(struct xfs_trans **, xfs_inode_t *,
-				     xfs_fsize_t, int, int);
+int		xfs_itruncate_extents(struct xfs_trans **, struct xfs_inode *,
+				      int, xfs_fsize_t);
+int		xfs_itruncate_data(struct xfs_trans **, struct xfs_inode *,
+				   xfs_fsize_t);
 int		xfs_iunlink(struct xfs_trans *, xfs_inode_t *);
 
 void		xfs_iext_realloc(xfs_inode_t *, int, int);
Index: xfs/fs/xfs/xfs_vnodeops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_vnodeops.c	2011-06-30 09:02:59.843427408 +0200
+++ xfs/fs/xfs/xfs_vnodeops.c	2011-06-30 09:05:30.876758871 +0200
@@ -220,15 +220,12 @@ xfs_free_eofblocks(
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		xfs_trans_ijoin(tp, ip);
 
-		error = xfs_itruncate_finish(&tp, ip,
-					     ip->i_size,
-					     XFS_DATA_FORK,
-					     0);
-		/*
-		 * If we get an error at this point we
-		 * simply don't bother truncating the file.
-		 */
+		error = xfs_itruncate_data(&tp, ip, ip->i_size);
 		if (error) {
+			/*
+			 * If we get an error at this point we simply don't
+			 * bother truncating the file.
+			 */
 			xfs_trans_cancel(tp,
 					 (XFS_TRANS_RELEASE_LOG_RES |
 					  XFS_TRANS_ABORT));
@@ -665,16 +662,7 @@ xfs_inactive(
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		xfs_trans_ijoin(tp, ip);
 
-		/*
-		 * normally, we have to run xfs_itruncate_finish sync.
-		 * But if filesystem is wsync and we're in the inactive
-		 * path, then we know that nlink == 0, and that the
-		 * xaction that made nlink == 0 is permanently committed
-		 * since xfs_remove runs as a synchronous transaction.
-		 */
-		error = xfs_itruncate_finish(&tp, ip, 0, XFS_DATA_FORK,
-				(!(mp->m_flags & XFS_MOUNT_WSYNC) ? 1 : 0));
-
+		error = xfs_itruncate_data(&tp, ip, 0);
 		if (error) {
 			xfs_trans_cancel(tp,
 				XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT);
Index: xfs/fs/xfs/linux-2.6/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_trace.h	2011-06-30 09:02:59.846760741 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_trace.h	2011-06-30 09:05:30.880092189 +0200
@@ -1055,8 +1055,8 @@ DECLARE_EVENT_CLASS(xfs_itrunc_class,
 DEFINE_EVENT(xfs_itrunc_class, name, \
 	TP_PROTO(struct xfs_inode *ip, xfs_fsize_t new_size), \
 	TP_ARGS(ip, new_size))
-DEFINE_ITRUNC_EVENT(xfs_itruncate_finish_start);
-DEFINE_ITRUNC_EVENT(xfs_itruncate_finish_end);
+DEFINE_ITRUNC_EVENT(xfs_itruncate_data_start);
+DEFINE_ITRUNC_EVENT(xfs_itruncate_data_end);
 
 TRACE_EVENT(xfs_pagecache_inval,
 	TP_PROTO(struct xfs_inode *ip, xfs_off_t start, xfs_off_t finish),


* [PATCH 08/27] xfs: improve sync behaviour in the fact of aggressive dirtying
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 07/27] xfs: split xfs_itruncate_finish Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-05 22:36   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 09/27] xfs: fix filesystsem freeze race in xfs_trans_alloc Christoph Hellwig
                   ` (19 subsequent siblings)
  26 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-simplify-sync --]
[-- Type: text/plain, Size: 2423 bytes --]

The following script from Wu Fengguang shows very bad behaviour in XFS
when aggressively dirtying data during a sync, with sync times up to
almost 10 times as long as on ext4.

A large part of the issue is that XFS writes data out itself two times
in the ->sync_fs method, overriding the livelock protection in the core
writeback code.  Another issue is the lock-less xfs_ioend_wait call,
which doesn't prevent new ioends from being queued up while waiting for
the count to reach zero.

This patch removes the XFS-internal sync calls and relies on the VFS
to do its work just like all other filesystems do.  Note that the
i_iocount wait, which is rather suboptimal, is simply removed here.
We already do it in ->write_inode, which keeps the current suboptimal
behaviour.  We'll eventually need to remove that as well, but that's
material for a separate commit.

------------------------------ snip ------------------------------
#!/bin/sh

umount /dev/sda7
mkfs.xfs -f /dev/sda7
# mkfs.ext4 /dev/sda7
# mkfs.btrfs /dev/sda7
mount /dev/sda7 /fs

echo $((50<<20)) > /proc/sys/vm/dirty_bytes

pid=
for i in `seq 10`
do
	dd if=/dev/zero of=/fs/zero-$i bs=1M count=1000 &
	pid="$pid $!"
done

sleep 1

tic=$(date +'%s')
sync
tac=$(date +'%s')

echo
echo sync time: $((tac-tic))
egrep '(Dirty|Writeback|NFS_Unstable)' /proc/meminfo

pidof dd > /dev/null && { kill -9 $pid; echo sync NOT livelocked; }
------------------------------ snip ------------------------------
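
The xfs_ioend_wait problem described above can be illustrated with a small
user-space model.  This is only a sketch under stated assumptions, not kernel
code: struct io_model, wait_for_zero and wait_for_snapshot are hypothetical
names, I/O is assumed to complete in FIFO order at a fixed rate per tick, and
the snapshot-style wait merely stands in for whatever livelock protection the
core writeback code provides.

```c
#include <assert.h>

/* Hypothetical model: not an XFS or VFS structure. */
struct io_model {
	long pending;	/* I/O units queued but not yet completed */
	long inflow;	/* new units queued per tick while we wait */
};

/*
 * Wait until the pending count drops to zero while new I/O keeps
 * arriving.  Returns the number of ticks waited, or -1 if max_ticks
 * is reached first (the waiter is livelocked in this model).
 */
static long wait_for_zero(struct io_model m, long drain, long max_ticks)
{
	if (m.pending == 0)
		return 0;
	for (long t = 1; t <= max_ticks; t++) {
		m.pending -= (m.pending < drain) ? m.pending : drain;
		if (m.pending == 0)
			return t;
		m.pending += m.inflow;	/* the dirtier queues more work */
	}
	return -1;
}

/*
 * Wait only for the I/O that was queued when the wait started.
 * Later arrivals are ignored, so the wait is bounded.
 */
static long wait_for_snapshot(struct io_model m, long drain)
{
	long target = m.pending;	/* snapshot taken at sync start */
	long ticks = 0;

	while (target > 0) {
		target -= (target < drain) ? target : drain;
		ticks++;
	}
	return ticks;
}
```

With 100 pending units, a drain rate of 10 per tick and an inflow of 10 per
tick, wait_for_zero never finishes in this model, while wait_for_snapshot
returns after 10 ticks regardless of the inflow.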

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c	2011-06-29 11:26:14.109219361 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_sync.c	2011-06-29 11:37:20.642275110 +0200
@@ -359,14 +359,12 @@ xfs_quiesce_data(
 {
 	int			error, error2 = 0;
 
-	/* push non-blocking */
-	xfs_sync_data(mp, 0);
 	xfs_qm_sync(mp, SYNC_TRYLOCK);
-
-	/* push and block till complete */
-	xfs_sync_data(mp, SYNC_WAIT);
 	xfs_qm_sync(mp, SYNC_WAIT);
 
+	/* force out the newly dirtied log buffers */
+	xfs_log_force(mp, XFS_LOG_SYNC);
+
 	/* write superblock and hoover up shutdown errors */
 	error = xfs_sync_fsdata(mp);
 


* [PATCH 09/27] xfs: fix filesystsem freeze race in xfs_trans_alloc
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 08/27] xfs: improve sync behaviour in the fact of aggressive dirtying Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-05 22:36   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 10/27] xfs: remove i_transp Christoph Hellwig
                   ` (18 subsequent siblings)
  26 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-fix-freeze-race --]
[-- Type: text/plain, Size: 5090 bytes --]

As pointed out by Jan, xfs_trans_alloc can race with a concurrent filesystem
freeze when it sleeps during the memory allocation.  Fix this by moving the
wait_for_freeze call after the memory allocation.  This means moving the
freeze into the low-level _xfs_trans_alloc helper, which thus grows a new
argument.  Also fix up some comments in that area while at it.
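
The resulting interface shape (one low-level helper with an extra boolean,
plus an inline wrapper for the common case) can be sketched in user space as
follows.  This is a rough model under assumptions, not the kernel code:
struct mount_model, struct trans_model and the model_* functions are
hypothetical stand-ins, and the "wait" only counts how often a caller would
have slept instead of actually blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parts of the mount used here. */
struct mount_model {
	bool	frozen;		/* models the "frozen for transactions" state */
	int	active_trans;	/* models the active transaction counter */
	int	freeze_waits;	/* how often an allocation had to wait */
};

/* Hypothetical stand-in for a transaction. */
struct trans_model {
	unsigned int	type;
};

/* Models the freeze wait; a real implementation would sleep here. */
static void model_wait_for_freeze(struct mount_model *mp)
{
	if (mp->frozen) {
		mp->freeze_waits++;
		mp->frozen = false;	/* pretend the filesystem was thawed */
	}
}

/* Low-level helper: the caller decides whether to honour a freeze. */
static struct trans_model *
model_trans_alloc_flags(struct mount_model *mp, unsigned int type,
			bool wait_for_freeze)
{
	struct trans_model *tp;

	mp->active_trans++;
	if (wait_for_freeze)
		model_wait_for_freeze(mp);

	tp = calloc(1, sizeof(*tp));
	if (!tp) {
		mp->active_trans--;
		return NULL;
	}
	tp->type = type;
	return tp;
}

/* Common case: always wait for a freeze to finish. */
static inline struct trans_model *
model_trans_alloc(struct mount_model *mp, unsigned int type)
{
	return model_trans_alloc_flags(mp, type, true);
}
```

A caller on the freeze path would pass false and proceed even while the
model is frozen; every other caller uses the wrapper and waits.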

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>

Index: xfs/fs/xfs/xfs_fsops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_fsops.c	2011-06-18 17:50:43.477373715 +0200
+++ xfs/fs/xfs/xfs_fsops.c	2011-06-20 09:17:00.933518761 +0200
@@ -626,7 +626,7 @@ xfs_fs_log_dummy(
 	xfs_trans_t	*tp;
 	int		error;
 
-	tp = _xfs_trans_alloc(mp, XFS_TRANS_DUMMY1, KM_SLEEP);
+	tp = _xfs_trans_alloc(mp, XFS_TRANS_DUMMY1, KM_SLEEP, false);
 	error = xfs_trans_reserve(tp, 0, mp->m_sb.sb_sectsize + 128, 0, 0,
 					XFS_DEFAULT_LOG_COUNT);
 	if (error) {
Index: xfs/fs/xfs/xfs_iomap.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iomap.c	2011-06-18 17:50:43.487373714 +0200
+++ xfs/fs/xfs/xfs_iomap.c	2011-06-20 09:17:00.933518761 +0200
@@ -688,8 +688,7 @@ xfs_iomap_write_unwritten(
 		 * the same inode that we complete here and might deadlock
 		 * on the iolock.
 		 */
-		xfs_wait_for_freeze(mp, SB_FREEZE_TRANS);
-		tp = _xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE, KM_NOFS);
+		tp = _xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE, KM_NOFS, true);
 		tp->t_flags |= XFS_TRANS_RESERVE;
 		error = xfs_trans_reserve(tp, resblks,
 				XFS_WRITE_LOG_RES(mp), 0,
Index: xfs/fs/xfs/xfs_trans.h
===================================================================
--- xfs.orig/fs/xfs/xfs_trans.h	2011-06-18 17:50:43.497373713 +0200
+++ xfs/fs/xfs/xfs_trans.h	2011-06-21 10:57:04.908840421 +0200
@@ -447,8 +447,14 @@ typedef struct xfs_trans {
 /*
  * XFS transaction mechanism exported interfaces.
  */
-xfs_trans_t	*xfs_trans_alloc(struct xfs_mount *, uint);
-xfs_trans_t	*_xfs_trans_alloc(struct xfs_mount *, uint, uint);
+xfs_trans_t	*_xfs_trans_alloc(struct xfs_mount *, uint, uint, bool);
+
+static inline struct xfs_trans *
+xfs_trans_alloc(struct xfs_mount *mp, uint type)
+{
+	return _xfs_trans_alloc(mp, type, KM_SLEEP, true);
+}
+
 xfs_trans_t	*xfs_trans_dup(xfs_trans_t *);
 int		xfs_trans_reserve(xfs_trans_t *, uint, uint, uint,
 				  uint, uint);
Index: xfs/fs/xfs/xfs_mount.c
===================================================================
--- xfs.orig/fs/xfs/xfs_mount.c	2011-06-18 17:50:43.510707047 +0200
+++ xfs/fs/xfs/xfs_mount.c	2011-06-20 09:17:00.936852094 +0200
@@ -1566,15 +1566,9 @@ xfs_fs_writable(xfs_mount_t *mp)
 }
 
 /*
- * xfs_log_sbcount
- *
  * Called either periodically to keep the on disk superblock values
  * roughly up to date or from unmount to make sure the values are
  * correct on a clean unmount.
- *
- * Note this code can be called during the process of freezing, so
- * we may need to use the transaction allocator which does not not
- * block when the transaction subsystem is in its frozen state.
  */
 int
 xfs_log_sbcount(
@@ -1596,7 +1590,13 @@ xfs_log_sbcount(
 	if (!xfs_sb_version_haslazysbcount(&mp->m_sb))
 		return 0;
 
-	tp = _xfs_trans_alloc(mp, XFS_TRANS_SB_COUNT, KM_SLEEP);
+	/*
+	 * We can be called during the process of freezing, so make sure
+	 * we go ahead even if the filesystem is frozen for new transactions.
+	 * We will always use a sync transaction in the freeze path to make
+	 * sure the transaction has completed by the time we return.
+	 */
+	tp = _xfs_trans_alloc(mp, XFS_TRANS_SB_COUNT, KM_SLEEP, false);
 	error = xfs_trans_reserve(tp, 0, mp->m_sb.sb_sectsize + 128, 0, 0,
 					XFS_DEFAULT_LOG_COUNT);
 	if (error) {
Index: xfs/fs/xfs/xfs_trans.c
===================================================================
--- xfs.orig/fs/xfs/xfs_trans.c	2011-06-18 17:50:43.524040379 +0200
+++ xfs/fs/xfs/xfs_trans.c	2011-06-21 10:56:25.305509042 +0200
@@ -566,31 +566,24 @@ xfs_trans_init(
 
 /*
  * This routine is called to allocate a transaction structure.
+ *
  * The type parameter indicates the type of the transaction.  These
  * are enumerated in xfs_trans.h.
- *
- * Dynamically allocate the transaction structure from the transaction
- * zone, initialize it, and return it to the caller.
  */
-xfs_trans_t *
-xfs_trans_alloc(
-	xfs_mount_t	*mp,
-	uint		type)
-{
-	xfs_wait_for_freeze(mp, SB_FREEZE_TRANS);
-	return _xfs_trans_alloc(mp, type, KM_SLEEP);
-}
-
-xfs_trans_t *
+struct xfs_trans *
 _xfs_trans_alloc(
-	xfs_mount_t	*mp,
-	uint		type,
-	uint		memflags)
+	struct xfs_mount	*mp,
+	uint			type,
+	uint			memflags,
+	bool			wait_for_freeze)
 {
-	xfs_trans_t	*tp;
+	struct xfs_trans	*tp;
 
 	atomic_inc(&mp->m_active_trans);
 
+	if (wait_for_freeze)
+		xfs_wait_for_freeze(mp, SB_FREEZE_TRANS);
+
 	tp = kmem_zone_zalloc(xfs_trans_zone, memflags);
 	tp->t_magic = XFS_TRANS_MAGIC;
 	tp->t_type = type;


* [PATCH 10/27] xfs: remove i_transp
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 09/27] xfs: fix filesystsem freeze race in xfs_trans_alloc Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-05 22:36   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 11/27] xfs: kill the unused struct xfs_sync_work Christoph Hellwig
                   ` (17 subsequent siblings)
  26 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-kill-i_transp --]
[-- Type: text/plain, Size: 9093 bytes --]

Remove the transaction pointer in the inode.  It's only used to avoid
passing down an argument in the bmap code, and for a few asserts in
the transaction code right now.
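
The general pattern, passing the transaction down as an explicit argument
instead of looking it up through a back-pointer in the inode, can be sketched
as below.  The types and functions (struct tx_ctx, struct ino_obj,
model_add_extent, model_extents_to_btree) are hypothetical user-space
stand-ins, not the XFS definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transaction context. */
struct tx_ctx {
	int	logged;		/* number of changes recorded */
};

/* Hypothetical inode: note that it holds no pointer to a transaction. */
struct ino_obj {
	int	dirty;
};

/* Lowest-level helper: receives the transaction explicitly. */
static int model_extents_to_btree(struct tx_ctx *tp, struct ino_obj *ip)
{
	if (tp == NULL)
		return -1;	/* nothing to record the change in */
	tp->logged++;
	ip->dirty = 1;
	return 0;
}

/* Intermediate helper: simply forwards the transaction it was given. */
static int model_add_extent(struct tx_ctx *tp, struct ino_obj *ip)
{
	return model_extents_to_btree(tp, ip);
}
```

The cost is one more parameter on each intermediate function; in exchange the
inode structure shrinks and there is no pointer to set and clear on every
join and unlock.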

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Index: xfs/fs/xfs/quota/xfs_trans_dquot.c
===================================================================
--- xfs.orig/fs/xfs/quota/xfs_trans_dquot.c	2011-06-29 19:45:24.000000000 +0200
+++ xfs/fs/xfs/quota/xfs_trans_dquot.c	2011-06-30 09:16:44.710083825 +0200
@@ -59,7 +59,7 @@ xfs_trans_dqjoin(
 	xfs_trans_add_item(tp, &dqp->q_logitem.qli_item);
 
 	/*
-	 * Initialize i_transp so we can later determine if this dquot is
+	 * Initialize q_transp so we can later determine if this dquot is
 	 * associated with this transaction.
 	 */
 	dqp->q_transp = tp;
Index: xfs/fs/xfs/xfs_bmap.c
===================================================================
--- xfs.orig/fs/xfs/xfs_bmap.c	2011-06-29 19:45:24.000000000 +0200
+++ xfs/fs/xfs/xfs_bmap.c	2011-06-30 09:16:44.713417161 +0200
@@ -94,6 +94,7 @@ xfs_bmap_add_attrfork_local(
  */
 STATIC int				/* error */
 xfs_bmap_add_extent_delay_real(
+	struct xfs_trans	*tp,	/* transaction pointer */
 	xfs_inode_t		*ip,	/* incore inode pointer */
 	xfs_extnum_t		*idx,	/* extent number to update/insert */
 	xfs_btree_cur_t		**curp,	/* if *curp is null, not a btree */
@@ -439,6 +440,7 @@ xfs_bmap_add_attrfork_local(
  */
 STATIC int				/* error */
 xfs_bmap_add_extent(
+	struct xfs_trans	*tp,	/* transaction pointer */
 	xfs_inode_t		*ip,	/* incore inode pointer */
 	xfs_extnum_t		*idx,	/* extent number to update/insert */
 	xfs_btree_cur_t		**curp,	/* if *curp is null, not a btree */
@@ -524,7 +526,7 @@ xfs_bmap_add_extent(
 				if (cur)
 					ASSERT(cur->bc_private.b.flags &
 						XFS_BTCUR_BPRV_WASDEL);
-				error = xfs_bmap_add_extent_delay_real(ip,
+				error = xfs_bmap_add_extent_delay_real(tp, ip,
 						idx, &cur, new, &da_new,
 						first, flist, &logflags);
 			} else {
@@ -561,7 +563,7 @@ xfs_bmap_add_extent(
 		int	tmp_logflags;	/* partial log flag return val */
 
 		ASSERT(cur == NULL);
-		error = xfs_bmap_extents_to_btree(ip->i_transp, ip, first,
+		error = xfs_bmap_extents_to_btree(tp, ip, first,
 			flist, &cur, da_old > 0, &tmp_logflags, whichfork);
 		logflags |= tmp_logflags;
 		if (error)
@@ -604,6 +606,7 @@ done:
  */
 STATIC int				/* error */
 xfs_bmap_add_extent_delay_real(
+	struct xfs_trans	*tp,	/* transaction pointer */
 	xfs_inode_t		*ip,	/* incore inode pointer */
 	xfs_extnum_t		*idx,	/* extent number to update/insert */
 	xfs_btree_cur_t		**curp,	/* if *curp is null, not a btree */
@@ -901,7 +904,7 @@ xfs_bmap_add_extent_delay_real(
 		}
 		if (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
 		    ip->i_d.di_nextents > ip->i_df.if_ext_max) {
-			error = xfs_bmap_extents_to_btree(ip->i_transp, ip,
+			error = xfs_bmap_extents_to_btree(tp, ip,
 					first, flist, &cur, 1, &tmp_rval,
 					XFS_DATA_FORK);
 			rval |= tmp_rval;
@@ -984,7 +987,7 @@ xfs_bmap_add_extent_delay_real(
 		}
 		if (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
 		    ip->i_d.di_nextents > ip->i_df.if_ext_max) {
-			error = xfs_bmap_extents_to_btree(ip->i_transp, ip,
+			error = xfs_bmap_extents_to_btree(tp, ip,
 				first, flist, &cur, 1, &tmp_rval,
 				XFS_DATA_FORK);
 			rval |= tmp_rval;
@@ -1052,7 +1055,7 @@ xfs_bmap_add_extent_delay_real(
 		}
 		if (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
 		    ip->i_d.di_nextents > ip->i_df.if_ext_max) {
-			error = xfs_bmap_extents_to_btree(ip->i_transp, ip,
+			error = xfs_bmap_extents_to_btree(tp, ip,
 					first, flist, &cur, 1, &tmp_rval,
 					XFS_DATA_FORK);
 			rval |= tmp_rval;
@@ -2871,8 +2874,8 @@ xfs_bmap_del_extent(
 			len = del->br_blockcount;
 			do_div(bno, mp->m_sb.sb_rextsize);
 			do_div(len, mp->m_sb.sb_rextsize);
-			if ((error = xfs_rtfree_extent(ip->i_transp, bno,
-					(xfs_extlen_t)len)))
+			error = xfs_rtfree_extent(tp, bno, (xfs_extlen_t)len);
+			if (error)
 				goto done;
 			do_fx = 0;
 			nblks = len * mp->m_sb.sb_rextsize;
@@ -4662,7 +4665,7 @@ xfs_bmapi(
 				if (!wasdelay && (flags & XFS_BMAPI_PREALLOC))
 					got.br_state = XFS_EXT_UNWRITTEN;
 			}
-			error = xfs_bmap_add_extent(ip, &lastx, &cur, &got,
+			error = xfs_bmap_add_extent(tp, ip, &lastx, &cur, &got,
 				firstblock, flist, &tmp_logflags,
 				whichfork);
 			logflags |= tmp_logflags;
@@ -4763,7 +4766,7 @@ xfs_bmapi(
 			mval->br_state = (mval->br_state == XFS_EXT_UNWRITTEN)
 						? XFS_EXT_NORM
 						: XFS_EXT_UNWRITTEN;
-			error = xfs_bmap_add_extent(ip, &lastx, &cur, mval,
+			error = xfs_bmap_add_extent(tp, ip, &lastx, &cur, mval,
 				firstblock, flist, &tmp_logflags,
 				whichfork);
 			logflags |= tmp_logflags;
@@ -5117,7 +5120,7 @@ xfs_bunmapi(
 				del.br_blockcount = mod;
 			}
 			del.br_state = XFS_EXT_UNWRITTEN;
-			error = xfs_bmap_add_extent(ip, &lastx, &cur, &del,
+			error = xfs_bmap_add_extent(tp, ip, &lastx, &cur, &del,
 				firstblock, flist, &logflags,
 				XFS_DATA_FORK);
 			if (error)
@@ -5175,18 +5178,18 @@ xfs_bunmapi(
 				}
 				prev.br_state = XFS_EXT_UNWRITTEN;
 				lastx--;
-				error = xfs_bmap_add_extent(ip, &lastx, &cur,
-					&prev, firstblock, flist, &logflags,
-					XFS_DATA_FORK);
+				error = xfs_bmap_add_extent(tp, ip, &lastx,
+						&cur, &prev, firstblock, flist,
+						&logflags, XFS_DATA_FORK);
 				if (error)
 					goto error0;
 				goto nodelete;
 			} else {
 				ASSERT(del.br_state == XFS_EXT_NORM);
 				del.br_state = XFS_EXT_UNWRITTEN;
-				error = xfs_bmap_add_extent(ip, &lastx, &cur,
-					&del, firstblock, flist, &logflags,
-					XFS_DATA_FORK);
+				error = xfs_bmap_add_extent(tp, ip, &lastx,
+						&cur, &del, firstblock, flist,
+						&logflags, XFS_DATA_FORK);
 				if (error)
 					goto error0;
 				goto nodelete;
Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c	2011-06-30 09:15:11.000000000 +0200
+++ xfs/fs/xfs/xfs_inode.c	2011-06-30 09:16:57.120083690 +0200
@@ -1260,7 +1260,6 @@ xfs_itruncate_extents(
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_IOLOCK_EXCL));
 	ASSERT(new_size <= ip->i_size);
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
-	ASSERT(ip->i_transp == tp);
 	ASSERT(ip->i_itemp != NULL);
 	ASSERT(ip->i_itemp->ili_lock_flags == 0);
 	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
@@ -1436,7 +1435,6 @@ xfs_iunlink(
 
 	ASSERT(ip->i_d.di_nlink == 0);
 	ASSERT(ip->i_d.di_mode != 0);
-	ASSERT(ip->i_transp == tp);
 
 	mp = tp->t_mountp;
 
@@ -1828,7 +1826,6 @@ xfs_ifree(
 	xfs_buf_t       	*ibp;
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-	ASSERT(ip->i_transp == tp);
 	ASSERT(ip->i_d.di_nlink == 0);
 	ASSERT(ip->i_d.di_nextents == 0);
 	ASSERT(ip->i_d.di_anextents == 0);
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h	2011-06-30 09:05:30.000000000 +0200
+++ xfs/fs/xfs/xfs_inode.h	2011-06-30 09:16:44.720083829 +0200
@@ -241,7 +241,6 @@ typedef struct xfs_inode {
 	xfs_ifork_t		i_df;		/* data fork */
 
 	/* Transaction and locking information. */
-	struct xfs_trans	*i_transp;	/* ptr to owning transaction*/
 	struct xfs_inode_log_item *i_itemp;	/* logging information */
 	mrlock_t		i_lock;		/* inode lock */
 	mrlock_t		i_iolock;	/* inode IO lock */
Index: xfs/fs/xfs/xfs_inode_item.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode_item.c	2011-06-29 19:45:24.960295005 +0200
+++ xfs/fs/xfs/xfs_inode_item.c	2011-06-30 09:16:44.723417161 +0200
@@ -636,11 +636,6 @@ xfs_inode_item_unlock(
 	ASSERT(xfs_isilocked(iip->ili_inode, XFS_ILOCK_EXCL));
 
 	/*
-	 * Clear the transaction pointer in the inode.
-	 */
-	ip->i_transp = NULL;
-
-	/*
 	 * If the inode needed a separate buffer with which to log
 	 * its extents, then free it now.
 	 */
Index: xfs/fs/xfs/xfs_trans_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_trans_inode.c	2011-06-29 19:45:24.973628266 +0200
+++ xfs/fs/xfs/xfs_trans_inode.c	2011-06-30 09:16:44.723417161 +0200
@@ -55,7 +55,6 @@ xfs_trans_ijoin(
 {
 	xfs_inode_log_item_t	*iip;
 
-	ASSERT(ip->i_transp == NULL);
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 	if (ip->i_itemp == NULL)
 		xfs_inode_item_init(ip, ip->i_mount);
@@ -68,12 +67,6 @@ xfs_trans_ijoin(
 	xfs_trans_add_item(tp, &iip->ili_item);
 
 	xfs_trans_inode_broot_debug(ip);
-
-	/*
-	 * Initialize i_transp so we can find it with xfs_inode_incore()
-	 * in xfs_trans_iget() above.
-	 */
-	ip->i_transp = tp;
 }
 
 /*
@@ -111,7 +104,6 @@ xfs_trans_ichgtime(
 
 	ASSERT(tp);
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-	ASSERT(ip->i_transp == tp);
 
 	tv = current_fs_time(inode->i_sb);
 
@@ -140,7 +132,6 @@ xfs_trans_log_inode(
 	xfs_inode_t	*ip,
 	uint		flags)
 {
-	ASSERT(ip->i_transp == tp);
 	ASSERT(ip->i_itemp != NULL);
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 


* [PATCH 11/27] xfs: kill the unused struct xfs_sync_work
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 10/27] xfs: remove i_transp Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-05 22:36   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 12/27] xfs: factor out xfs_dir2_leaf_find_entry Christoph Hellwig
                   ` (16 subsequent siblings)
  26 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-kill-xfs_sync_work --]
[-- Type: text/plain, Size: 848 bytes --]

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/linux-2.6/xfs_sync.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_sync.h	2011-06-30 15:47:30.203125879 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_sync.h	2011-06-30 15:47:39.093125768 +0200
@@ -21,14 +21,6 @@
 struct xfs_mount;
 struct xfs_perag;
 
-typedef struct xfs_sync_work {
-	struct list_head	w_list;
-	struct xfs_mount	*w_mount;
-	void			*w_data;	/* syncer routine argument */
-	void			(*w_syncer)(struct xfs_mount *, void *);
-	struct completion	*w_completion;
-} xfs_sync_work_t;
-
 #define SYNC_WAIT		0x0001	/* wait for i/o to complete */
 #define SYNC_TRYLOCK		0x0002  /* only try to lock inodes */
 


* [PATCH 12/27] xfs: factor out xfs_dir2_leaf_find_entry
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 11/27] xfs: kill the unused struct xfs_sync_work Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-05 22:36   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 13/27] xfs: cleanup shortform directory inode number handling Christoph Hellwig
                   ` (15 subsequent siblings)
  26 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-factor-dir2-leaf-code --]
[-- Type: text/plain, Size: 11010 bytes --]

Add a new xfs_dir2_leaf_find_entry helper to factor out some duplicate code
from xfs_dir2_leaf_addname and xfs_dir2_leafn_add.  Found by Eric Sandeen
using an automated code duplication checker.
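
The slot-selection logic that the helper factors out can be modelled on a
plain integer array.  This is a simplified sketch, not the kernel function:
struct leaf_model, STALE, LEAF_MAX and leaf_find_slot are hypothetical names,
the compaction case and the log-range bookkeeping are left out, and entries
are bare ints rather than on-disk big-endian leaf entries.

```c
#include <assert.h>
#include <string.h>

#define STALE		(-1)	/* marks a reusable (stale) slot */
#define LEAF_MAX	16	/* capacity of the model leaf */

struct leaf_model {
	int	count;		/* slots in use, including stale ones */
	int	stale;		/* number of stale slots */
	int	ents[LEAF_MAX];
};

/*
 * Make room for a new entry that belongs at position index and return
 * the slot the caller should fill in.  Assumes the stale counter matches
 * the STALE markers in ents.
 */
static int leaf_find_slot(struct leaf_model *leaf, int index)
{
	int lowstale, highstale;

	if (!leaf->stale) {
		/* No stale entries: open a hole at index. */
		assert(leaf->count < LEAF_MAX);
		if (index < leaf->count)
			memmove(&leaf->ents[index + 1], &leaf->ents[index],
				(leaf->count - index) * sizeof(leaf->ents[0]));
		leaf->count++;
		return index;
	}

	/* Nearest stale slot before the insertion point, if any. */
	for (lowstale = index - 1;
	     lowstale >= 0 && leaf->ents[lowstale] != STALE;
	     lowstale--)
		continue;

	/* Nearest stale slot at or after it; stop once low is closer. */
	for (highstale = index;
	     highstale < leaf->count &&
	     leaf->ents[highstale] != STALE &&
	     (lowstale < 0 || index - lowstale - 1 >= highstale - index);
	     highstale++)
		continue;

	if (lowstale >= 0 &&
	    (highstale == leaf->count ||
	     index - lowstale - 1 < highstale - index)) {
		/* Shift entries towards the low stale slot. */
		if (index - lowstale - 1 > 0)
			memmove(&leaf->ents[lowstale],
				&leaf->ents[lowstale + 1],
				(index - lowstale - 1) * sizeof(leaf->ents[0]));
		leaf->stale--;
		return index - 1;
	}

	/* Shift entries towards the high stale slot. */
	if (highstale - index > 0)
		memmove(&leaf->ents[index + 1], &leaf->ents[index],
			(highstale - index) * sizeof(leaf->ents[0]));
	leaf->stale--;
	return index;
}
```

The caller writes the new entry into the returned slot.  Reusing the nearest
stale slot keeps count unchanged and moves as few entries as possible.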

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Index: xfs/fs/xfs/xfs_dir2_leaf.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_leaf.c	2011-06-29 19:45:24.846962285 +0200
+++ xfs/fs/xfs/xfs_dir2_leaf.c	2011-06-30 09:29:24.446740960 +0200
@@ -152,6 +152,123 @@ xfs_dir2_block_to_leaf(
 	return 0;
 }
 
+struct xfs_dir2_leaf_entry *
+xfs_dir2_leaf_find_entry(
+	xfs_dir2_leaf_t		*leaf,		/* leaf structure */
+	int			index,		/* leaf table position */
+	int			compact,	/* need to compact leaves */
+	int			lowstale,	/* index of prev stale leaf */
+	int			highstale,	/* index of next stale leaf */
+	int			*lfloglow,	/* low leaf logging index */
+	int			*lfloghigh)	/* high leaf logging index */
+{
+	if (!leaf->hdr.stale) {
+		xfs_dir2_leaf_entry_t	*lep;	/* leaf entry table pointer */
+
+		/*
+		 * Now we need to make room to insert the leaf entry.
+		 *
+		 * If there are no stale entries, just insert a hole at index.
+		 */
+		lep = &leaf->ents[index];
+		if (index < be16_to_cpu(leaf->hdr.count))
+			memmove(lep + 1, lep,
+				(be16_to_cpu(leaf->hdr.count) - index) *
+				 sizeof(*lep));
+
+		/*
+		 * Record low and high logging indices for the leaf.
+		 */
+		*lfloglow = index;
+		*lfloghigh = be16_to_cpu(leaf->hdr.count);
+		be16_add_cpu(&leaf->hdr.count, 1);
+		return lep;
+	}
+
+	/*
+	 * There are stale entries.
+	 *
+	 * We will use one of them for the new entry.  It's probably not at
+	 * the right location, so we'll have to shift some up or down first.
+	 *
+	 * If we didn't compact before, we need to find the nearest stale
+	 * entries before and after our insertion point.
+	 */
+	if (compact == 0) {
+		/*
+		 * Find the first stale entry before the insertion point,
+		 * if any.
+		 */
+		for (lowstale = index - 1;
+		     lowstale >= 0 &&
+			be32_to_cpu(leaf->ents[lowstale].address) !=
+			XFS_DIR2_NULL_DATAPTR;
+		     lowstale--)
+			continue;
+
+		/*
+		 * Find the next stale entry at or after the insertion point,
+		 * if any.   Stop if we go so far that the lowstale entry
+		 * would be better.
+		 */
+		for (highstale = index;
+		     highstale < be16_to_cpu(leaf->hdr.count) &&
+			be32_to_cpu(leaf->ents[highstale].address) !=
+			XFS_DIR2_NULL_DATAPTR &&
+			(lowstale < 0 ||
+			 index - lowstale - 1 >= highstale - index);
+		     highstale++)
+			continue;
+	}
+
+	/*
+	 * If the low one is better, use it.
+	 */
+	if (lowstale >= 0 &&
+	    (highstale == be16_to_cpu(leaf->hdr.count) ||
+	     index - lowstale - 1 < highstale - index)) {
+		ASSERT(index - lowstale - 1 >= 0);
+		ASSERT(be32_to_cpu(leaf->ents[lowstale].address) ==
+		       XFS_DIR2_NULL_DATAPTR);
+
+		/*
+		 * Copy entries up to cover the stale entry and make room
+		 * for the new entry.
+		 */
+		if (index - lowstale - 1 > 0) {
+			memmove(&leaf->ents[lowstale],
+				&leaf->ents[lowstale + 1],
+				(index - lowstale - 1) *
+				sizeof(xfs_dir2_leaf_entry_t));
+		}
+		*lfloglow = MIN(lowstale, *lfloglow);
+		*lfloghigh = MAX(index - 1, *lfloghigh);
+		be16_add_cpu(&leaf->hdr.stale, -1);
+		return &leaf->ents[index - 1];
+	}
+
+	/*
+	 * The high one is better, so use that one.
+	 */
+	ASSERT(highstale - index >= 0);
+	ASSERT(be32_to_cpu(leaf->ents[highstale].address) ==
+	       XFS_DIR2_NULL_DATAPTR);
+
+	/*
+	 * Copy entries down to cover the stale entry and make room for the
+	 * new entry.
+	 */
+	if (highstale - index > 0) {
+		memmove(&leaf->ents[index + 1],
+			&leaf->ents[index],
+			(highstale - index) * sizeof(xfs_dir2_leaf_entry_t));
+	}
+	*lfloglow = MIN(index, *lfloglow);
+	*lfloghigh = MAX(highstale, *lfloghigh);
+	be16_add_cpu(&leaf->hdr.stale, -1);
+	return &leaf->ents[index];
+}
+
 /*
  * Add an entry to a leaf form directory.
  */
@@ -430,102 +547,10 @@ xfs_dir2_leaf_addname(
 		if (!grown)
 			xfs_dir2_leaf_log_bests(tp, lbp, use_block, use_block);
 	}
-	/*
-	 * Now we need to make room to insert the leaf entry.
-	 * If there are no stale entries, we just insert a hole at index.
-	 */
-	if (!leaf->hdr.stale) {
-		/*
-		 * lep is still good as the index leaf entry.
-		 */
-		if (index < be16_to_cpu(leaf->hdr.count))
-			memmove(lep + 1, lep,
-				(be16_to_cpu(leaf->hdr.count) - index) * sizeof(*lep));
-		/*
-		 * Record low and high logging indices for the leaf.
-		 */
-		lfloglow = index;
-		lfloghigh = be16_to_cpu(leaf->hdr.count);
-		be16_add_cpu(&leaf->hdr.count, 1);
-	}
-	/*
-	 * There are stale entries.
-	 * We will use one of them for the new entry.
-	 * It's probably not at the right location, so we'll have to
-	 * shift some up or down first.
-	 */
-	else {
-		/*
-		 * If we didn't compact before, we need to find the nearest
-		 * stale entries before and after our insertion point.
-		 */
-		if (compact == 0) {
-			/*
-			 * Find the first stale entry before the insertion
-			 * point, if any.
-			 */
-			for (lowstale = index - 1;
-			     lowstale >= 0 &&
-				be32_to_cpu(leaf->ents[lowstale].address) !=
-				XFS_DIR2_NULL_DATAPTR;
-			     lowstale--)
-				continue;
-			/*
-			 * Find the next stale entry at or after the insertion
-			 * point, if any.   Stop if we go so far that the
-			 * lowstale entry would be better.
-			 */
-			for (highstale = index;
-			     highstale < be16_to_cpu(leaf->hdr.count) &&
-				be32_to_cpu(leaf->ents[highstale].address) !=
-				XFS_DIR2_NULL_DATAPTR &&
-				(lowstale < 0 ||
-				 index - lowstale - 1 >= highstale - index);
-			     highstale++)
-				continue;
-		}
-		/*
-		 * If the low one is better, use it.
-		 */
-		if (lowstale >= 0 &&
-		    (highstale == be16_to_cpu(leaf->hdr.count) ||
-		     index - lowstale - 1 < highstale - index)) {
-			ASSERT(index - lowstale - 1 >= 0);
-			ASSERT(be32_to_cpu(leaf->ents[lowstale].address) ==
-			       XFS_DIR2_NULL_DATAPTR);
-			/*
-			 * Copy entries up to cover the stale entry
-			 * and make room for the new entry.
-			 */
-			if (index - lowstale - 1 > 0)
-				memmove(&leaf->ents[lowstale],
-					&leaf->ents[lowstale + 1],
-					(index - lowstale - 1) * sizeof(*lep));
-			lep = &leaf->ents[index - 1];
-			lfloglow = MIN(lowstale, lfloglow);
-			lfloghigh = MAX(index - 1, lfloghigh);
-		}
-		/*
-		 * The high one is better, so use that one.
-		 */
-		else {
-			ASSERT(highstale - index >= 0);
-			ASSERT(be32_to_cpu(leaf->ents[highstale].address) ==
-			       XFS_DIR2_NULL_DATAPTR);
-			/*
-			 * Copy entries down to cover the stale entry
-			 * and make room for the new entry.
-			 */
-			if (highstale - index > 0)
-				memmove(&leaf->ents[index + 1],
-					&leaf->ents[index],
-					(highstale - index) * sizeof(*lep));
-			lep = &leaf->ents[index];
-			lfloglow = MIN(index, lfloglow);
-			lfloghigh = MAX(highstale, lfloghigh);
-		}
-		be16_add_cpu(&leaf->hdr.stale, -1);
-	}
+
+	lep = xfs_dir2_leaf_find_entry(leaf, index, compact, lowstale,
+				       highstale, &lfloglow, &lfloghigh);
+
 	/*
 	 * Fill in the new leaf entry.
 	 */
Index: xfs/fs/xfs/xfs_dir2_leaf.h
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_leaf.h	2011-06-29 19:45:24.856962230 +0200
+++ xfs/fs/xfs/xfs_dir2_leaf.h	2011-06-30 09:18:07.263416117 +0200
@@ -248,6 +248,9 @@ extern int xfs_dir2_leaf_search_hash(str
 				     struct xfs_dabuf *lbp);
 extern int xfs_dir2_leaf_trim_data(struct xfs_da_args *args,
 				   struct xfs_dabuf *lbp, xfs_dir2_db_t db);
+extern xfs_dir2_leaf_entry_t *xfs_dir2_leaf_find_entry(xfs_dir2_leaf_t *, int,
+						       int, int, int,
+						       int *, int *);
 extern int xfs_dir2_node_to_leaf(struct xfs_da_state *state);
 
 #endif	/* __XFS_DIR2_LEAF_H__ */
Index: xfs/fs/xfs/xfs_dir2_node.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_node.c	2011-06-29 19:45:24.870295493 +0200
+++ xfs/fs/xfs/xfs_dir2_node.c	2011-06-30 09:27:19.103409194 +0200
@@ -244,89 +244,13 @@ xfs_dir2_leafn_add(
 		lfloglow = be16_to_cpu(leaf->hdr.count);
 		lfloghigh = -1;
 	}
-	/*
-	 * No stale entries, just insert a space for the new entry.
-	 */
-	if (!leaf->hdr.stale) {
-		lep = &leaf->ents[index];
-		if (index < be16_to_cpu(leaf->hdr.count))
-			memmove(lep + 1, lep,
-				(be16_to_cpu(leaf->hdr.count) - index) * sizeof(*lep));
-		lfloglow = index;
-		lfloghigh = be16_to_cpu(leaf->hdr.count);
-		be16_add_cpu(&leaf->hdr.count, 1);
-	}
-	/*
-	 * There are stale entries.  We'll use one for the new entry.
-	 */
-	else {
-		/*
-		 * If we didn't do a compact then we need to figure out
-		 * which stale entry will be used.
-		 */
-		if (compact == 0) {
-			/*
-			 * Find first stale entry before our insertion point.
-			 */
-			for (lowstale = index - 1;
-			     lowstale >= 0 &&
-				be32_to_cpu(leaf->ents[lowstale].address) !=
-				XFS_DIR2_NULL_DATAPTR;
-			     lowstale--)
-				continue;
-			/*
-			 * Find next stale entry after insertion point.
-			 * Stop looking if the answer would be worse than
-			 * lowstale already found.
-			 */
-			for (highstale = index;
-			     highstale < be16_to_cpu(leaf->hdr.count) &&
-				be32_to_cpu(leaf->ents[highstale].address) !=
-				XFS_DIR2_NULL_DATAPTR &&
-				(lowstale < 0 ||
-				 index - lowstale - 1 >= highstale - index);
-			     highstale++)
-				continue;
-		}
-		/*
-		 * Using the low stale entry.
-		 * Shift entries up toward the stale slot.
-		 */
-		if (lowstale >= 0 &&
-		    (highstale == be16_to_cpu(leaf->hdr.count) ||
-		     index - lowstale - 1 < highstale - index)) {
-			ASSERT(be32_to_cpu(leaf->ents[lowstale].address) ==
-			       XFS_DIR2_NULL_DATAPTR);
-			ASSERT(index - lowstale - 1 >= 0);
-			if (index - lowstale - 1 > 0)
-				memmove(&leaf->ents[lowstale],
-					&leaf->ents[lowstale + 1],
-					(index - lowstale - 1) * sizeof(*lep));
-			lep = &leaf->ents[index - 1];
-			lfloglow = MIN(lowstale, lfloglow);
-			lfloghigh = MAX(index - 1, lfloghigh);
-		}
-		/*
-		 * Using the high stale entry.
-		 * Shift entries down toward the stale slot.
-		 */
-		else {
-			ASSERT(be32_to_cpu(leaf->ents[highstale].address) ==
-			       XFS_DIR2_NULL_DATAPTR);
-			ASSERT(highstale - index >= 0);
-			if (highstale - index > 0)
-				memmove(&leaf->ents[index + 1],
-					&leaf->ents[index],
-					(highstale - index) * sizeof(*lep));
-			lep = &leaf->ents[index];
-			lfloglow = MIN(index, lfloglow);
-			lfloghigh = MAX(highstale, lfloghigh);
-		}
-		be16_add_cpu(&leaf->hdr.stale, -1);
-	}
+
 	/*
 	 * Insert the new entry, log everything.
 	 */
+	lep = xfs_dir2_leaf_find_entry(leaf, index, compact, lowstale,
+				       highstale, &lfloglow, &lfloghigh);
+
 	lep->hashval = cpu_to_be32(args->hashval);
 	lep->address = cpu_to_be32(xfs_dir2_db_off_to_dataptr(mp,
 				args->blkno, args->index));
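
For readers following the factored-out helper, the stale-slot choice it makes can be sketched outside the kernel roughly as below.  This is a simplified model, not the XFS code: struct entry, NULL_DATAPTR and find_slot() are made-up stand-ins, values are kept in host byte order, the logging indices are left out, and the caller is assumed to guarantee that at least one stale entry exists.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NULL_DATAPTR	0xffffffffu	/* stand-in for XFS_DIR2_NULL_DATAPTR */

struct entry {				/* simplified leaf entry, host endian */
	uint32_t	hashval;
	uint32_t	address;
};

/*
 * Reuse the stale slot nearest to the insertion point "index" and return
 * the array position where the new entry should be written.  Assumes
 * ents[0..count) contains at least one stale entry.
 */
static int
find_slot(struct entry *ents, int count, int index)
{
	int	lowstale;
	int	highstale;

	/* Nearest stale entry before the insertion point, if any. */
	for (lowstale = index - 1;
	     lowstale >= 0 && ents[lowstale].address != NULL_DATAPTR;
	     lowstale--)
		continue;

	/* Nearest stale entry at or after it; stop once low is closer. */
	for (highstale = index;
	     highstale < count &&
		ents[highstale].address != NULL_DATAPTR &&
		(lowstale < 0 ||
		 index - lowstale - 1 >= highstale - index);
	     highstale++)
		continue;

	/* Low one is better: slide the entries in between towards it. */
	if (lowstale >= 0 &&
	    (highstale == count ||
	     index - lowstale - 1 < highstale - index)) {
		memmove(&ents[lowstale], &ents[lowstale + 1],
			(size_t)(index - lowstale - 1) * sizeof(*ents));
		return index - 1;
	}

	/* Otherwise slide entries over the high stale slot. */
	memmove(&ents[index + 1], &ents[index],
		(size_t)(highstale - index) * sizeof(*ents));
	return index;
}
```

With four entries of which only slot 1 is stale, inserting at index 3 moves one entry onto slot 1 and returns slot 2 for the new entry; nothing past the insertion point is touched.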


* [PATCH 13/27] xfs: cleanup shortform directory inode number handling
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 12/27] xfs: factor out xfs_dir2_leaf_find_entry Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-05 22:36   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 14/27] xfs: kill struct xfs_dir2_sf Christoph Hellwig
                   ` (14 subsequent siblings)
  26 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-dir2_sf-cleanup-inum-handling --]
[-- Type: text/plain, Size: 13549 bytes --]

Refactor the shortform directory helpers that deal with the 32-bit vs
64-bit wide inode numbers into more sensible helpers, and kill the
xfs_intino_t typedef that is now superfluous.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
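
As a rough illustration of the helper layering described above, here is a stand-alone sketch.  It is not the kernel code: struct sf_hdr, inou_t, get_be(), put_be() and the sf_* functions are hypothetical stand-ins for the XFS types and the XFS_GET_DIR_INO4/8 and XFS_PUT_DIR_INO4/8 macros, and the big-endian byte order used for the stored value is an assumption of the sketch.

```c
#include <stdint.h>

typedef union {				/* 4 or 8 byte inode number */
	uint8_t		i8[8];
	uint8_t		i4[4];
} inou_t;

struct sf_hdr {				/* simplified shortform header */
	uint8_t		count;
	uint8_t		i8count;	/* non-zero: 8 byte inode numbers */
	inou_t		parent;
};

/* Read an n byte big-endian value (byte order is an assumption here). */
static uint64_t
get_be(const uint8_t *p, int n)
{
	uint64_t	v = 0;
	int		i;

	for (i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Store the low n bytes of v in big-endian order. */
static void
put_be(uint8_t *p, int n, uint64_t v)
{
	int		i;

	for (i = n - 1; i >= 0; i--) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* The only place that looks at i8count when reading ... */
static uint64_t
sf_get_ino(const struct sf_hdr *hdr, const inou_t *from)
{
	return hdr->i8count ? get_be(from->i8, 8) : get_be(from->i4, 4);
}

/* ... and when writing an inode number. */
static void
sf_put_ino(const struct sf_hdr *hdr, inou_t *to, uint64_t ino)
{
	if (hdr->i8count)
		put_be(to->i8, 8, ino);
	else
		put_be(to->i4, 4, ino);
}

/* Callers use thin wrappers and pass the value, not a pointer to it. */
static uint64_t
sf_get_parent_ino(const struct sf_hdr *hdr)
{
	return sf_get_ino(hdr, &hdr->parent);
}

static void
sf_put_parent_ino(struct sf_hdr *hdr, uint64_t ino)
{
	sf_put_ino(hdr, &hdr->parent, ino);
}
```

Taking the inode number by value is what lets callers drop temporaries such as the "temp" and "ino" locals removed in the diff below.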

Index: xfs/fs/xfs/xfs_dir2_sf.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_sf.c	2011-06-30 09:31:15.330073010 +0200
+++ xfs/fs/xfs/xfs_dir2_sf.c	2011-06-30 09:34:46.640070544 +0200
@@ -59,6 +59,79 @@ static void xfs_dir2_sf_toino4(xfs_da_ar
 static void xfs_dir2_sf_toino8(xfs_da_args_t *args);
 #endif /* XFS_BIG_INUMS */
 
+
+/*
+ * Inode numbers in short-form directories can come in two versions,
+ * either 4 bytes or 8 bytes wide.  These helpers deal with the
+ * two forms transparently by looking at the header's i8count field.
+ */
+static xfs_ino_t
+xfs_dir2_sf_get_ino(
+	struct xfs_dir2_sf	*sfp,
+	xfs_dir2_inou_t		*from)
+{
+	if (sfp->hdr.i8count)
+		return XFS_GET_DIR_INO8(from->i8);
+	else
+		return XFS_GET_DIR_INO4(from->i4);
+}
+
+static void
+xfs_dir2_sf_put_ino(
+	struct xfs_dir2_sf	*sfp,
+	xfs_dir2_inou_t		*to,
+	xfs_ino_t		ino)
+{
+	if (sfp->hdr.i8count)
+		XFS_PUT_DIR_INO8(ino, to->i8);
+	else
+		XFS_PUT_DIR_INO4(ino, to->i4);
+}
+
+xfs_ino_t
+xfs_dir2_sf_get_parent_ino(
+	struct xfs_dir2_sf	*sfp)
+{
+	return xfs_dir2_sf_get_ino(sfp, &sfp->hdr.parent);
+}
+
+static void
+xfs_dir2_sf_put_parent_ino(
+	struct xfs_dir2_sf	*sfp,
+	xfs_ino_t		ino)
+{
+	xfs_dir2_sf_put_ino(sfp, &sfp->hdr.parent, ino);
+}
+
+/*
+ * In short-form directory entries the inode numbers are stored at variable
+ * offset behind the entry name.  The inode numbers may only be accessed
+ * through the helpers below.
+ */
+static xfs_dir2_inou_t *
+xfs_dir2_sfe_inop(
+	struct xfs_dir2_sf_entry *sfep)
+{
+	return (xfs_dir2_inou_t *)&sfep->name[sfep->namelen];
+}
+
+xfs_ino_t
+xfs_dir2_sfe_get_ino(
+	struct xfs_dir2_sf	*sfp,
+	struct xfs_dir2_sf_entry *sfep)
+{
+	return xfs_dir2_sf_get_ino(sfp, xfs_dir2_sfe_inop(sfep));
+}
+
+static void
+xfs_dir2_sfe_put_ino(
+	struct xfs_dir2_sf	*sfp,
+	struct xfs_dir2_sf_entry *sfep,
+	xfs_ino_t		ino)
+{
+	xfs_dir2_sf_put_ino(sfp, xfs_dir2_sfe_inop(sfep), ino);
+}
+
 /*
  * Given a block directory (dp/block), calculate its size as a shortform (sf)
  * directory and a header for the sf directory, if it will fit it the
@@ -138,7 +211,7 @@ xfs_dir2_block_sfsize(
 	 */
 	sfhp->count = count;
 	sfhp->i8count = i8count;
-	xfs_dir2_sf_put_inumber((xfs_dir2_sf_t *)sfhp, &parent, &sfhp->parent);
+	xfs_dir2_sf_put_parent_ino((xfs_dir2_sf_t *)sfhp, parent);
 	return size;
 }
 
@@ -165,7 +238,6 @@ xfs_dir2_block_to_sf(
 	char			*ptr;		/* current data pointer */
 	xfs_dir2_sf_entry_t	*sfep;		/* shortform entry */
 	xfs_dir2_sf_t		*sfp;		/* shortform structure */
-	xfs_ino_t               temp;
 
 	trace_xfs_dir2_block_to_sf(args);
 
@@ -233,7 +305,7 @@ xfs_dir2_block_to_sf(
 		else if (dep->namelen == 2 &&
 			 dep->name[0] == '.' && dep->name[1] == '.')
 			ASSERT(be64_to_cpu(dep->inumber) ==
-			       xfs_dir2_sf_get_inumber(sfp, &sfp->hdr.parent));
+			       xfs_dir2_sf_get_parent_ino(sfp));
 		/*
 		 * Normal entry, copy it into shortform.
 		 */
@@ -243,9 +315,9 @@ xfs_dir2_block_to_sf(
 				(xfs_dir2_data_aoff_t)
 				((char *)dep - (char *)block));
 			memcpy(sfep->name, dep->name, dep->namelen);
-			temp = be64_to_cpu(dep->inumber);
-			xfs_dir2_sf_put_inumber(sfp, &temp,
-				xfs_dir2_sf_inumberp(sfep));
+			xfs_dir2_sfe_put_ino(sfp, sfep,
+					     be64_to_cpu(dep->inumber));
+
 			sfep = xfs_dir2_sf_nextentry(sfp, sfep);
 		}
 		ptr += xfs_dir2_data_entsize(dep->namelen);
@@ -406,8 +478,7 @@ xfs_dir2_sf_addname_easy(
 	sfep->namelen = args->namelen;
 	xfs_dir2_sf_put_offset(sfep, offset);
 	memcpy(sfep->name, args->name, sfep->namelen);
-	xfs_dir2_sf_put_inumber(sfp, &args->inumber,
-		xfs_dir2_sf_inumberp(sfep));
+	xfs_dir2_sfe_put_ino(sfp, sfep, args->inumber);
 	/*
 	 * Update the header and inode.
 	 */
@@ -498,8 +569,7 @@ xfs_dir2_sf_addname_hard(
 	sfep->namelen = args->namelen;
 	xfs_dir2_sf_put_offset(sfep, offset);
 	memcpy(sfep->name, args->name, sfep->namelen);
-	xfs_dir2_sf_put_inumber(sfp, &args->inumber,
-		xfs_dir2_sf_inumberp(sfep));
+	xfs_dir2_sfe_put_ino(sfp, sfep, args->inumber);
 	sfp->hdr.count++;
 #if XFS_BIG_INUMS
 	if (args->inumber > XFS_DIR2_MAX_SHORT_INUM && !objchange)
@@ -618,14 +688,14 @@ xfs_dir2_sf_check(
 
 	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
 	offset = XFS_DIR2_DATA_FIRST_OFFSET;
-	ino = xfs_dir2_sf_get_inumber(sfp, &sfp->hdr.parent);
+	ino = xfs_dir2_sf_get_parent_ino(sfp);
 	i8count = ino > XFS_DIR2_MAX_SHORT_INUM;
 
 	for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp);
 	     i < sfp->hdr.count;
 	     i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep)) {
 		ASSERT(xfs_dir2_sf_get_offset(sfep) >= offset);
-		ino = xfs_dir2_sf_get_inumber(sfp, xfs_dir2_sf_inumberp(sfep));
+		ino = xfs_dir2_sfe_get_ino(sfp, sfep);
 		i8count += ino > XFS_DIR2_MAX_SHORT_INUM;
 		offset =
 			xfs_dir2_sf_get_offset(sfep) +
@@ -686,7 +756,7 @@ xfs_dir2_sf_create(
 	/*
 	 * Now can put in the inode number, since i8count is set.
 	 */
-	xfs_dir2_sf_put_inumber(sfp, &pino, &sfp->hdr.parent);
+	xfs_dir2_sf_put_parent_ino(sfp, pino);
 	sfp->hdr.count = 0;
 	dp->i_d.di_size = size;
 	xfs_dir2_sf_check(args);
@@ -759,7 +829,7 @@ xfs_dir2_sf_getdents(
 	 * Put .. entry unless we're starting past it.
 	 */
 	if (*offset <= dotdot_offset) {
-		ino = xfs_dir2_sf_get_inumber(sfp, &sfp->hdr.parent);
+		ino = xfs_dir2_sf_get_parent_ino(sfp);
 		if (filldir(dirent, "..", 2, dotdot_offset & 0x7fffffff, ino, DT_DIR)) {
 			*offset = dotdot_offset & 0x7fffffff;
 			return 0;
@@ -779,7 +849,7 @@ xfs_dir2_sf_getdents(
 			continue;
 		}
 
-		ino = xfs_dir2_sf_get_inumber(sfp, xfs_dir2_sf_inumberp(sfep));
+		ino = xfs_dir2_sfe_get_ino(sfp, sfep);
 		if (filldir(dirent, (char *)sfep->name, sfep->namelen,
 			    off & 0x7fffffff, ino, DT_UNKNOWN)) {
 			*offset = off & 0x7fffffff;
@@ -839,7 +909,7 @@ xfs_dir2_sf_lookup(
 	 */
 	if (args->namelen == 2 &&
 	    args->name[0] == '.' && args->name[1] == '.') {
-		args->inumber = xfs_dir2_sf_get_inumber(sfp, &sfp->hdr.parent);
+		args->inumber = xfs_dir2_sf_get_parent_ino(sfp);
 		args->cmpresult = XFS_CMP_EXACT;
 		return XFS_ERROR(EEXIST);
 	}
@@ -858,8 +928,7 @@ xfs_dir2_sf_lookup(
 								sfep->namelen);
 		if (cmp != XFS_CMP_DIFFERENT && cmp != args->cmpresult) {
 			args->cmpresult = cmp;
-			args->inumber = xfs_dir2_sf_get_inumber(sfp,
-						xfs_dir2_sf_inumberp(sfep));
+			args->inumber = xfs_dir2_sfe_get_ino(sfp, sfep);
 			if (cmp == XFS_CMP_EXACT)
 				return XFS_ERROR(EEXIST);
 			ci_sfep = sfep;
@@ -918,9 +987,8 @@ xfs_dir2_sf_removename(
 				i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep)) {
 		if (xfs_da_compname(args, sfep->name, sfep->namelen) ==
 								XFS_CMP_EXACT) {
-			ASSERT(xfs_dir2_sf_get_inumber(sfp,
-						xfs_dir2_sf_inumberp(sfep)) ==
-								args->inumber);
+			ASSERT(xfs_dir2_sfe_get_ino(sfp, sfep) ==
+			       args->inumber);
 			break;
 		}
 	}
@@ -1040,10 +1108,10 @@ xfs_dir2_sf_replace(
 	if (args->namelen == 2 &&
 	    args->name[0] == '.' && args->name[1] == '.') {
 #if XFS_BIG_INUMS || defined(DEBUG)
-		ino = xfs_dir2_sf_get_inumber(sfp, &sfp->hdr.parent);
+		ino = xfs_dir2_sf_get_parent_ino(sfp);
 		ASSERT(args->inumber != ino);
 #endif
-		xfs_dir2_sf_put_inumber(sfp, &args->inumber, &sfp->hdr.parent);
+		xfs_dir2_sf_put_parent_ino(sfp, args->inumber);
 	}
 	/*
 	 * Normal entry, look for the name.
@@ -1055,12 +1123,10 @@ xfs_dir2_sf_replace(
 			if (xfs_da_compname(args, sfep->name, sfep->namelen) ==
 								XFS_CMP_EXACT) {
 #if XFS_BIG_INUMS || defined(DEBUG)
-				ino = xfs_dir2_sf_get_inumber(sfp,
-					xfs_dir2_sf_inumberp(sfep));
+				ino = xfs_dir2_sfe_get_ino(sfp, sfep);
 				ASSERT(args->inumber != ino);
 #endif
-				xfs_dir2_sf_put_inumber(sfp, &args->inumber,
-					xfs_dir2_sf_inumberp(sfep));
+				xfs_dir2_sfe_put_ino(sfp, sfep, args->inumber);
 				break;
 			}
 		}
@@ -1121,7 +1187,6 @@ xfs_dir2_sf_toino4(
 	char			*buf;		/* old dir's buffer */
 	xfs_inode_t		*dp;		/* incore directory inode */
 	int			i;		/* entry index */
-	xfs_ino_t		ino;		/* entry inode number */
 	int			newsize;	/* new inode size */
 	xfs_dir2_sf_entry_t	*oldsfep;	/* old sf entry */
 	xfs_dir2_sf_t		*oldsfp;	/* old sf directory */
@@ -1162,8 +1227,7 @@ xfs_dir2_sf_toino4(
 	 */
 	sfp->hdr.count = oldsfp->hdr.count;
 	sfp->hdr.i8count = 0;
-	ino = xfs_dir2_sf_get_inumber(oldsfp, &oldsfp->hdr.parent);
-	xfs_dir2_sf_put_inumber(sfp, &ino, &sfp->hdr.parent);
+	xfs_dir2_sf_put_parent_ino(sfp, xfs_dir2_sf_get_parent_ino(oldsfp));
 	/*
 	 * Copy the entries field by field.
 	 */
@@ -1175,9 +1239,8 @@ xfs_dir2_sf_toino4(
 		sfep->namelen = oldsfep->namelen;
 		sfep->offset = oldsfep->offset;
 		memcpy(sfep->name, oldsfep->name, sfep->namelen);
-		ino = xfs_dir2_sf_get_inumber(oldsfp,
-			xfs_dir2_sf_inumberp(oldsfep));
-		xfs_dir2_sf_put_inumber(sfp, &ino, xfs_dir2_sf_inumberp(sfep));
+		xfs_dir2_sfe_put_ino(sfp, sfep,
+			xfs_dir2_sfe_get_ino(oldsfp, oldsfep));
 	}
 	/*
 	 * Clean up the inode.
@@ -1199,7 +1262,6 @@ xfs_dir2_sf_toino8(
 	char			*buf;		/* old dir's buffer */
 	xfs_inode_t		*dp;		/* incore directory inode */
 	int			i;		/* entry index */
-	xfs_ino_t		ino;		/* entry inode number */
 	int			newsize;	/* new inode size */
 	xfs_dir2_sf_entry_t	*oldsfep;	/* old sf entry */
 	xfs_dir2_sf_t		*oldsfp;	/* old sf directory */
@@ -1240,8 +1302,7 @@ xfs_dir2_sf_toino8(
 	 */
 	sfp->hdr.count = oldsfp->hdr.count;
 	sfp->hdr.i8count = 1;
-	ino = xfs_dir2_sf_get_inumber(oldsfp, &oldsfp->hdr.parent);
-	xfs_dir2_sf_put_inumber(sfp, &ino, &sfp->hdr.parent);
+	xfs_dir2_sf_put_parent_ino(sfp, xfs_dir2_sf_get_parent_ino(oldsfp));
 	/*
 	 * Copy the entries field by field.
 	 */
@@ -1253,9 +1314,8 @@ xfs_dir2_sf_toino8(
 		sfep->namelen = oldsfep->namelen;
 		sfep->offset = oldsfep->offset;
 		memcpy(sfep->name, oldsfep->name, sfep->namelen);
-		ino = xfs_dir2_sf_get_inumber(oldsfp,
-			xfs_dir2_sf_inumberp(oldsfep));
-		xfs_dir2_sf_put_inumber(sfp, &ino, xfs_dir2_sf_inumberp(sfep));
+		xfs_dir2_sfe_put_ino(sfp, sfep,
+			xfs_dir2_sfe_get_ino(oldsfp, oldsfep));
 	}
 	/*
 	 * Clean up the inode.
Index: xfs/fs/xfs/xfs_dir2_sf.h
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_sf.h	2011-06-30 09:31:15.343406344 +0200
+++ xfs/fs/xfs/xfs_dir2_sf.h	2011-06-30 09:32:00.390072451 +0200
@@ -90,28 +90,6 @@ static inline int xfs_dir2_sf_hdr_size(i
 		((uint)sizeof(xfs_dir2_ino8_t) - (uint)sizeof(xfs_dir2_ino4_t)));
 }
 
-static inline xfs_dir2_inou_t *xfs_dir2_sf_inumberp(xfs_dir2_sf_entry_t *sfep)
-{
-	return (xfs_dir2_inou_t *)&(sfep)->name[(sfep)->namelen];
-}
-
-static inline xfs_intino_t
-xfs_dir2_sf_get_inumber(xfs_dir2_sf_t *sfp, xfs_dir2_inou_t *from)
-{
-	return ((sfp)->hdr.i8count == 0 ? \
-		(xfs_intino_t)XFS_GET_DIR_INO4((from)->i4) : \
-		(xfs_intino_t)XFS_GET_DIR_INO8((from)->i8));
-}
-
-static inline void xfs_dir2_sf_put_inumber(xfs_dir2_sf_t *sfp, xfs_ino_t *from,
-						xfs_dir2_inou_t *to)
-{
-	if ((sfp)->hdr.i8count == 0)
-		XFS_PUT_DIR_INO4(*(from), (to)->i4);
-	else
-		XFS_PUT_DIR_INO8(*(from), (to)->i8);
-}
-
 static inline xfs_dir2_data_aoff_t
 xfs_dir2_sf_get_offset(xfs_dir2_sf_entry_t *sfep)
 {
@@ -155,6 +133,9 @@ xfs_dir2_sf_nextentry(xfs_dir2_sf_t *sfp
 /*
  * Functions.
  */
+extern xfs_ino_t xfs_dir2_sf_get_parent_ino(struct xfs_dir2_sf *sfp);
+extern xfs_ino_t xfs_dir2_sfe_get_ino(struct xfs_dir2_sf *sfp,
+				      struct xfs_dir2_sf_entry *sfep);
 extern int xfs_dir2_block_sfsize(struct xfs_inode *dp,
 				 struct xfs_dir2_block *block,
 				 xfs_dir2_sf_hdr_t *sfhp);
Index: xfs/fs/xfs/xfs_inum.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inum.h	2011-06-30 09:31:15.353406344 +0200
+++ xfs/fs/xfs/xfs_inum.h	2011-06-30 09:32:00.390072451 +0200
@@ -28,17 +28,6 @@
 
 typedef	__uint32_t	xfs_agino_t;	/* within allocation grp inode number */
 
-/*
- * Useful inode bits for this kernel.
- * Used in some places where having 64-bits in the 32-bit kernels
- * costs too much.
- */
-#if XFS_BIG_INUMS
-typedef	xfs_ino_t	xfs_intino_t;
-#else
-typedef	__uint32_t	xfs_intino_t;
-#endif
-
 #define	NULLFSINO	((xfs_ino_t)-1)
 #define	NULLAGINO	((xfs_agino_t)-1)
 
Index: xfs/fs/xfs/xfs_dir2_block.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_block.c	2011-06-30 09:31:15.000000000 +0200
+++ xfs/fs/xfs/xfs_dir2_block.c	2011-06-30 09:32:00.393405784 +0200
@@ -1146,7 +1146,7 @@ xfs_dir2_sf_to_block(
 	 */
 	dep = (xfs_dir2_data_entry_t *)
 		((char *)block + XFS_DIR2_DATA_DOTDOT_OFFSET);
-	dep->inumber = cpu_to_be64(xfs_dir2_sf_get_inumber(sfp, &sfp->hdr.parent));
+	dep->inumber = cpu_to_be64(xfs_dir2_sf_get_parent_ino(sfp));
 	dep->namelen = 2;
 	dep->name[0] = dep->name[1] = '.';
 	tagp = xfs_dir2_data_entry_tag_p(dep);
@@ -1195,8 +1195,7 @@ xfs_dir2_sf_to_block(
 		 * Copy a real entry.
 		 */
 		dep = (xfs_dir2_data_entry_t *)((char *)block + newoffset);
-		dep->inumber = cpu_to_be64(xfs_dir2_sf_get_inumber(sfp,
-				xfs_dir2_sf_inumberp(sfep)));
+		dep->inumber = cpu_to_be64(xfs_dir2_sfe_get_ino(sfp, sfep));
 		dep->namelen = sfep->namelen;
 		memcpy(dep->name, sfep->name, dep->namelen);
 		tagp = xfs_dir2_data_entry_tag_p(dep);


* [PATCH 14/27] xfs: kill struct xfs_dir2_sf
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 13/27] xfs: cleanup shortform directory inode number handling Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  1:57   ` Dave Chinner
  2011-07-06  3:24   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 15/27] xfs: cleanup the defintion of struct xfs_dir2_sf_entry Christoph Hellwig
                   ` (13 subsequent siblings)
  26 siblings, 2 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-kill-xfs_dir2_sf_t --]
[-- Type: text/plain, Size: 29729 bytes --]

The list field of struct xfs_dir2_sf is never actually used, so all uses
can simply be replaced with the xfs_dir2_sf_hdr_t type that it has as its
first member.

Signed-off-by: Christoph Hellwig <hch@lst.de>
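
To illustrate why the replacement is safe, here is a minimal sketch with simplified, made-up layouts (struct sf, struct sf_hdr and struct sf_entry are stand-ins, not the real XFS definitions).  It relies only on the C rule that a pointer to a structure also points to its first member.

```c
#include <stddef.h>
#include <stdint.h>

struct sf_hdr {				/* simplified shortform header */
	uint8_t		count;
	uint8_t		i8count;
};

struct sf_entry {			/* simplified shortform entry */
	uint8_t		namelen;
	uint8_t		name[1];
};

struct sf {				/* old wrapper: header plus list */
	struct sf_hdr	hdr;
	struct sf_entry	list[1];	/* never referenced by name */
};

/* Old style: cast the inode data to the wrapper, go through ->hdr. */
static int
old_isempty(const void *data)
{
	const struct sf	*sfp = data;

	return !sfp->hdr.count;
}

/* New style: the same bytes viewed directly as the header. */
static int
new_isempty(const void *data)
{
	const struct sf_hdr *sfp = data;

	return !sfp->count;
}
```

Both functions read the same byte, so once nothing touches the list member the wrapper adds nothing beyond an extra ->hdr in every access.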

Index: xfs/fs/xfs/xfs_dir2.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2.c	2011-06-29 19:45:24.000000000 +0200
+++ xfs/fs/xfs/xfs_dir2.c	2011-06-30 09:35:55.806736193 +0200
@@ -122,15 +122,15 @@ int
 xfs_dir_isempty(
 	xfs_inode_t	*dp)
 {
-	xfs_dir2_sf_t	*sfp;
+	xfs_dir2_sf_hdr_t	*sfp;
 
 	ASSERT((dp->i_d.di_mode & S_IFMT) == S_IFDIR);
 	if (dp->i_d.di_size == 0)	/* might happen during shutdown. */
 		return 1;
 	if (dp->i_d.di_size > XFS_IFORK_DSIZE(dp))
 		return 0;
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
-	return !sfp->hdr.count;
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+	return !sfp->count;
 }
 
 /*
Index: xfs/fs/xfs/xfs_dir2_block.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_block.c	2011-06-30 09:32:00.000000000 +0200
+++ xfs/fs/xfs/xfs_dir2_block.c	2011-06-30 09:35:55.810069526 +0200
@@ -1028,8 +1028,6 @@ xfs_dir2_sf_to_block(
 	xfs_dir2_leaf_entry_t	*blp;		/* block leaf entries */
 	xfs_dabuf_t		*bp;		/* block buffer */
 	xfs_dir2_block_tail_t	*btp;		/* block tail pointer */
-	char			*buf;		/* sf buffer */
-	int			buf_len;
 	xfs_dir2_data_entry_t	*dep;		/* data entry pointer */
 	xfs_inode_t		*dp;		/* incore directory inode */
 	int			dummy;		/* trash */
@@ -1043,7 +1041,8 @@ xfs_dir2_sf_to_block(
 	int			newoffset;	/* offset from current entry */
 	int			offset;		/* target block offset */
 	xfs_dir2_sf_entry_t	*sfep;		/* sf entry pointer */
-	xfs_dir2_sf_t		*sfp;		/* shortform structure */
+	xfs_dir2_sf_hdr_t	*oldsfp;	/* old shortform header  */
+	xfs_dir2_sf_hdr_t	*sfp;		/* shortform header  */
 	__be16			*tagp;		/* end of data entry */
 	xfs_trans_t		*tp;		/* transaction pointer */
 	struct xfs_name		name;
@@ -1061,32 +1060,30 @@ xfs_dir2_sf_to_block(
 		ASSERT(XFS_FORCED_SHUTDOWN(mp));
 		return XFS_ERROR(EIO);
 	}
+
+	oldsfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+
 	ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
 	ASSERT(dp->i_df.if_u1.if_data != NULL);
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
-	ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->hdr.i8count));
+	ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(oldsfp->i8count));
+
 	/*
-	 * Copy the directory into the stack buffer.
+	 * Copy the directory into a temporary buffer.
 	 * Then pitch the incore inode data so we can make extents.
 	 */
+	sfp = kmem_alloc(dp->i_df.if_bytes, KM_SLEEP);
+	memcpy(sfp, oldsfp, dp->i_df.if_bytes);
 
-	buf_len = dp->i_df.if_bytes;
-	buf = kmem_alloc(buf_len, KM_SLEEP);
-
-	memcpy(buf, sfp, buf_len);
-	xfs_idata_realloc(dp, -buf_len, XFS_DATA_FORK);
+	xfs_idata_realloc(dp, -dp->i_df.if_bytes, XFS_DATA_FORK);
 	dp->i_d.di_size = 0;
 	xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
-	/*
-	 * Reset pointer - old sfp is gone.
-	 */
-	sfp = (xfs_dir2_sf_t *)buf;
+
 	/*
 	 * Add block 0 to the inode.
 	 */
 	error = xfs_dir2_grow_inode(args, XFS_DIR2_DATA_SPACE, &blkno);
 	if (error) {
-		kmem_free(buf);
+		kmem_free(sfp);
 		return error;
 	}
 	/*
@@ -1094,7 +1091,7 @@ xfs_dir2_sf_to_block(
 	 */
 	error = xfs_dir2_data_init(args, blkno, &bp);
 	if (error) {
-		kmem_free(buf);
+		kmem_free(sfp);
 		return error;
 	}
 	block = bp->data;
@@ -1103,7 +1100,7 @@ xfs_dir2_sf_to_block(
 	 * Compute size of block "tail" area.
 	 */
 	i = (uint)sizeof(*btp) +
-	    (sfp->hdr.count + 2) * (uint)sizeof(xfs_dir2_leaf_entry_t);
+	    (sfp->count + 2) * (uint)sizeof(xfs_dir2_leaf_entry_t);
 	/*
 	 * The whole thing is initialized to free by the init routine.
 	 * Say we're using the leaf and tail area.
@@ -1117,7 +1114,7 @@ xfs_dir2_sf_to_block(
 	 * Fill in the tail.
 	 */
 	btp = xfs_dir2_block_tail_p(mp, block);
-	btp->count = cpu_to_be32(sfp->hdr.count + 2);	/* ., .. */
+	btp->count = cpu_to_be32(sfp->count + 2);	/* ., .. */
 	btp->stale = 0;
 	blp = xfs_dir2_block_leaf_p(btp);
 	endoffset = (uint)((char *)blp - (char *)block);
@@ -1159,7 +1156,8 @@ xfs_dir2_sf_to_block(
 	/*
 	 * Loop over existing entries, stuff them in.
 	 */
-	if ((i = 0) == sfp->hdr.count)
+	i = 0;
+	if (!sfp->count)
 		sfep = NULL;
 	else
 		sfep = xfs_dir2_sf_firstentry(sfp);
@@ -1208,13 +1206,13 @@ xfs_dir2_sf_to_block(
 		blp[2 + i].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(mp,
 						 (char *)dep - (char *)block));
 		offset = (int)((char *)(tagp + 1) - (char *)block);
-		if (++i == sfp->hdr.count)
+		if (++i == sfp->count)
 			sfep = NULL;
 		else
 			sfep = xfs_dir2_sf_nextentry(sfp, sfep);
 	}
 	/* Done with the temporary buffer */
-	kmem_free(buf);
+	kmem_free(sfp);
 	/*
 	 * Sort the leaf entries by hash value.
 	 */
Index: xfs/fs/xfs/xfs_dir2_sf.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_sf.c	2011-06-30 09:34:46.000000000 +0200
+++ xfs/fs/xfs/xfs_dir2_sf.c	2011-06-30 09:37:41.120068219 +0200
@@ -67,10 +67,10 @@ static void xfs_dir2_sf_toino8(xfs_da_ar
  */
 static xfs_ino_t
 xfs_dir2_sf_get_ino(
-	struct xfs_dir2_sf	*sfp,
+	struct xfs_dir2_sf_hdr	*hdr,
 	xfs_dir2_inou_t		*from)
 {
-	if (sfp->hdr.i8count)
+	if (hdr->i8count)
 		return XFS_GET_DIR_INO8(from->i8);
 	else
 		return XFS_GET_DIR_INO4(from->i4);
@@ -78,11 +78,11 @@ xfs_dir2_sf_get_ino(
 
 static void
 xfs_dir2_sf_put_ino(
-	struct xfs_dir2_sf	*sfp,
+	struct xfs_dir2_sf_hdr	*hdr,
 	xfs_dir2_inou_t		*to,
 	xfs_ino_t		ino)
 {
-	if (sfp->hdr.i8count)
+	if (hdr->i8count)
 		XFS_PUT_DIR_INO8(ino, to->i8);
 	else
 		XFS_PUT_DIR_INO4(ino, to->i4);
@@ -90,17 +90,17 @@ xfs_dir2_sf_put_ino(
 
 xfs_ino_t
 xfs_dir2_sf_get_parent_ino(
-	struct xfs_dir2_sf	*sfp)
+	struct xfs_dir2_sf_hdr	*hdr)
 {
-	return xfs_dir2_sf_get_ino(sfp, &sfp->hdr.parent);
+	return xfs_dir2_sf_get_ino(hdr, &hdr->parent);
 }
 
 static void
 xfs_dir2_sf_put_parent_ino(
-	struct xfs_dir2_sf	*sfp,
+	struct xfs_dir2_sf_hdr	*hdr,
 	xfs_ino_t		ino)
 {
-	xfs_dir2_sf_put_ino(sfp, &sfp->hdr.parent, ino);
+	xfs_dir2_sf_put_ino(hdr, &hdr->parent, ino);
 }
 
 /*
@@ -117,19 +117,19 @@ xfs_dir2_sfe_inop(
 
 xfs_ino_t
 xfs_dir2_sfe_get_ino(
-	struct xfs_dir2_sf	*sfp,
+	struct xfs_dir2_sf_hdr	*hdr,
 	struct xfs_dir2_sf_entry *sfep)
 {
-	return xfs_dir2_sf_get_ino(sfp, xfs_dir2_sfe_inop(sfep));
+	return xfs_dir2_sf_get_ino(hdr, xfs_dir2_sfe_inop(sfep));
 }
 
 static void
 xfs_dir2_sfe_put_ino(
-	struct xfs_dir2_sf	*sfp,
+	struct xfs_dir2_sf_hdr	*hdr,
 	struct xfs_dir2_sf_entry *sfep,
 	xfs_ino_t		ino)
 {
-	xfs_dir2_sf_put_ino(sfp, xfs_dir2_sfe_inop(sfep), ino);
+	xfs_dir2_sf_put_ino(hdr, xfs_dir2_sfe_inop(sfep), ino);
 }
 
 /*
@@ -211,7 +211,7 @@ xfs_dir2_block_sfsize(
 	 */
 	sfhp->count = count;
 	sfhp->i8count = i8count;
-	xfs_dir2_sf_put_parent_ino((xfs_dir2_sf_t *)sfhp, parent);
+	xfs_dir2_sf_put_parent_ino(sfhp, parent);
 	return size;
 }
 
@@ -237,7 +237,7 @@ xfs_dir2_block_to_sf(
 	xfs_mount_t		*mp;		/* filesystem mount point */
 	char			*ptr;		/* current data pointer */
 	xfs_dir2_sf_entry_t	*sfep;		/* shortform entry */
-	xfs_dir2_sf_t		*sfp;		/* shortform structure */
+	xfs_dir2_sf_hdr_t	*sfp;		/* shortform structure */
 
 	trace_xfs_dir2_block_to_sf(args);
 
@@ -270,7 +270,7 @@ xfs_dir2_block_to_sf(
 	/*
 	 * Copy the header into the newly allocate local space.
 	 */
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
 	memcpy(sfp, sfhp, xfs_dir2_sf_hdr_size(sfhp->i8count));
 	dp->i_d.di_size = size;
 	/*
@@ -349,7 +349,7 @@ xfs_dir2_sf_addname(
 	xfs_dir2_data_aoff_t	offset = 0;	/* offset for new entry */
 	int			old_isize;	/* di_size before adding name */
 	int			pick;		/* which algorithm to use */
-	xfs_dir2_sf_t		*sfp;		/* shortform structure */
+	xfs_dir2_sf_hdr_t	*sfp;		/* shortform structure */
 	xfs_dir2_sf_entry_t	*sfep = NULL;	/* shortform entry */
 
 	trace_xfs_dir2_sf_addname(args);
@@ -366,8 +366,8 @@ xfs_dir2_sf_addname(
 	}
 	ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
 	ASSERT(dp->i_df.if_u1.if_data != NULL);
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
-	ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->hdr.i8count));
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+	ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->i8count));
 	/*
 	 * Compute entry (and change in) size.
 	 */
@@ -378,7 +378,7 @@ xfs_dir2_sf_addname(
 	/*
 	 * Do we have to change to 8 byte inodes?
 	 */
-	if (args->inumber > XFS_DIR2_MAX_SHORT_INUM && sfp->hdr.i8count == 0) {
+	if (args->inumber > XFS_DIR2_MAX_SHORT_INUM && sfp->i8count == 0) {
 		/*
 		 * Yes, adjust the entry size and the total size.
 		 */
@@ -386,7 +386,7 @@ xfs_dir2_sf_addname(
 			(uint)sizeof(xfs_dir2_ino8_t) -
 			(uint)sizeof(xfs_dir2_ino4_t);
 		incr_isize +=
-			(sfp->hdr.count + 2) *
+			(sfp->count + 2) *
 			((uint)sizeof(xfs_dir2_ino8_t) -
 			 (uint)sizeof(xfs_dir2_ino4_t));
 		objchange = 1;
@@ -456,11 +456,11 @@ xfs_dir2_sf_addname_easy(
 {
 	int			byteoff;	/* byte offset in sf dir */
 	xfs_inode_t		*dp;		/* incore directory inode */
-	xfs_dir2_sf_t		*sfp;		/* shortform structure */
+	xfs_dir2_sf_hdr_t	*sfp;		/* shortform structure */
 
 	dp = args->dp;
 
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
 	byteoff = (int)((char *)sfep - (char *)sfp);
 	/*
 	 * Grow the in-inode space.
@@ -470,7 +470,7 @@ xfs_dir2_sf_addname_easy(
 	/*
 	 * Need to set up again due to realloc of the inode data.
 	 */
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
 	sfep = (xfs_dir2_sf_entry_t *)((char *)sfp + byteoff);
 	/*
 	 * Fill in the new entry.
@@ -482,10 +482,10 @@ xfs_dir2_sf_addname_easy(
 	/*
 	 * Update the header and inode.
 	 */
-	sfp->hdr.count++;
+	sfp->count++;
 #if XFS_BIG_INUMS
 	if (args->inumber > XFS_DIR2_MAX_SHORT_INUM)
-		sfp->hdr.i8count++;
+		sfp->i8count++;
 #endif
 	dp->i_d.di_size = new_isize;
 	xfs_dir2_sf_check(args);
@@ -515,19 +515,19 @@ xfs_dir2_sf_addname_hard(
 	xfs_dir2_data_aoff_t	offset;		/* current offset value */
 	int			old_isize;	/* previous di_size */
 	xfs_dir2_sf_entry_t	*oldsfep;	/* entry in original dir */
-	xfs_dir2_sf_t		*oldsfp;	/* original shortform dir */
+	xfs_dir2_sf_hdr_t	*oldsfp;	/* original shortform dir */
 	xfs_dir2_sf_entry_t	*sfep;		/* entry in new dir */
-	xfs_dir2_sf_t		*sfp;		/* new shortform dir */
+	xfs_dir2_sf_hdr_t	*sfp;		/* new shortform dir */
 
 	/*
 	 * Copy the old directory to the stack buffer.
 	 */
 	dp = args->dp;
 
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
 	old_isize = (int)dp->i_d.di_size;
 	buf = kmem_alloc(old_isize, KM_SLEEP);
-	oldsfp = (xfs_dir2_sf_t *)buf;
+	oldsfp = (xfs_dir2_sf_hdr_t *)buf;
 	memcpy(oldsfp, sfp, old_isize);
 	/*
 	 * Loop over the old directory finding the place we're going
@@ -556,7 +556,7 @@ xfs_dir2_sf_addname_hard(
 	/*
 	 * Reset the pointer since the buffer was reallocated.
 	 */
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
 	/*
 	 * Copy the first part of the directory, including the header.
 	 */
@@ -570,10 +570,10 @@ xfs_dir2_sf_addname_hard(
 	xfs_dir2_sf_put_offset(sfep, offset);
 	memcpy(sfep->name, args->name, sfep->namelen);
 	xfs_dir2_sfe_put_ino(sfp, sfep, args->inumber);
-	sfp->hdr.count++;
+	sfp->count++;
 #if XFS_BIG_INUMS
 	if (args->inumber > XFS_DIR2_MAX_SHORT_INUM && !objchange)
-		sfp->hdr.i8count++;
+		sfp->i8count++;
 #endif
 	/*
 	 * If there's more left to copy, do that.
@@ -607,14 +607,14 @@ xfs_dir2_sf_addname_pick(
 	xfs_mount_t		*mp;		/* filesystem mount point */
 	xfs_dir2_data_aoff_t	offset;		/* data block offset */
 	xfs_dir2_sf_entry_t	*sfep;		/* shortform entry */
-	xfs_dir2_sf_t		*sfp;		/* shortform structure */
+	xfs_dir2_sf_hdr_t	*sfp;		/* shortform structure */
 	int			size;		/* entry's data size */
 	int			used;		/* data bytes used */
 
 	dp = args->dp;
 	mp = dp->i_mount;
 
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
 	size = xfs_dir2_data_entsize(args->namelen);
 	offset = XFS_DIR2_DATA_FIRST_OFFSET;
 	sfep = xfs_dir2_sf_firstentry(sfp);
@@ -624,7 +624,7 @@ xfs_dir2_sf_addname_pick(
 	 * Keep track of data offset and whether we've seen a place
 	 * to insert the new entry.
 	 */
-	for (i = 0; i < sfp->hdr.count; i++) {
+	for (i = 0; i < sfp->count; i++) {
 		if (!holefit)
 			holefit = offset + size <= xfs_dir2_sf_get_offset(sfep);
 		offset = xfs_dir2_sf_get_offset(sfep) +
@@ -636,7 +636,7 @@ xfs_dir2_sf_addname_pick(
 	 * was a data block (block form directory).
 	 */
 	used = offset +
-	       (sfp->hdr.count + 3) * (uint)sizeof(xfs_dir2_leaf_entry_t) +
+	       (sfp->count + 3) * (uint)sizeof(xfs_dir2_leaf_entry_t) +
 	       (uint)sizeof(xfs_dir2_block_tail_t);
 	/*
 	 * If it won't fit in a block form then we can't insert it,
@@ -682,17 +682,17 @@ xfs_dir2_sf_check(
 	xfs_ino_t		ino;		/* entry inode number */
 	int			offset;		/* data offset */
 	xfs_dir2_sf_entry_t	*sfep;		/* shortform dir entry */
-	xfs_dir2_sf_t		*sfp;		/* shortform structure */
+	xfs_dir2_sf_hdr_t	*sfp;		/* shortform structure */
 
 	dp = args->dp;
 
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
 	offset = XFS_DIR2_DATA_FIRST_OFFSET;
 	ino = xfs_dir2_sf_get_parent_ino(sfp);
 	i8count = ino > XFS_DIR2_MAX_SHORT_INUM;
 
 	for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp);
-	     i < sfp->hdr.count;
+	     i < sfp->count;
 	     i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep)) {
 		ASSERT(xfs_dir2_sf_get_offset(sfep) >= offset);
 		ino = xfs_dir2_sfe_get_ino(sfp, sfep);
@@ -701,11 +701,11 @@ xfs_dir2_sf_check(
 			xfs_dir2_sf_get_offset(sfep) +
 			xfs_dir2_data_entsize(sfep->namelen);
 	}
-	ASSERT(i8count == sfp->hdr.i8count);
+	ASSERT(i8count == sfp->i8count);
 	ASSERT(XFS_BIG_INUMS || i8count == 0);
 	ASSERT((char *)sfep - (char *)sfp == dp->i_d.di_size);
 	ASSERT(offset +
-	       (sfp->hdr.count + 2) * (uint)sizeof(xfs_dir2_leaf_entry_t) +
+	       (sfp->count + 2) * (uint)sizeof(xfs_dir2_leaf_entry_t) +
 	       (uint)sizeof(xfs_dir2_block_tail_t) <=
 	       dp->i_mount->m_dirblksize);
 }
@@ -721,7 +721,7 @@ xfs_dir2_sf_create(
 {
 	xfs_inode_t	*dp;		/* incore directory inode */
 	int		i8count;	/* parent inode is an 8-byte number */
-	xfs_dir2_sf_t	*sfp;		/* shortform structure */
+	xfs_dir2_sf_hdr_t *sfp;		/* shortform structure */
 	int		size;		/* directory size */
 
 	trace_xfs_dir2_sf_create(args);
@@ -751,13 +751,13 @@ xfs_dir2_sf_create(
 	/*
 	 * Fill in the header,
 	 */
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
-	sfp->hdr.i8count = i8count;
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+	sfp->i8count = i8count;
 	/*
 	 * Now can put in the inode number, since i8count is set.
 	 */
 	xfs_dir2_sf_put_parent_ino(sfp, pino);
-	sfp->hdr.count = 0;
+	sfp->count = 0;
 	dp->i_d.di_size = size;
 	xfs_dir2_sf_check(args);
 	xfs_trans_log_inode(args->trans, dp, XFS_ILOG_CORE | XFS_ILOG_DDATA);
@@ -775,7 +775,7 @@ xfs_dir2_sf_getdents(
 	xfs_mount_t		*mp;		/* filesystem mount point */
 	xfs_dir2_dataptr_t	off;		/* current entry's offset */
 	xfs_dir2_sf_entry_t	*sfep;		/* shortform directory entry */
-	xfs_dir2_sf_t		*sfp;		/* shortform structure */
+	xfs_dir2_sf_hdr_t	*sfp;		/* shortform structure */
 	xfs_dir2_dataptr_t	dot_offset;
 	xfs_dir2_dataptr_t	dotdot_offset;
 	xfs_ino_t		ino;
@@ -794,9 +794,9 @@ xfs_dir2_sf_getdents(
 	ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
 	ASSERT(dp->i_df.if_u1.if_data != NULL);
 
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
 
-	ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->hdr.i8count));
+	ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->i8count));
 
 	/*
 	 * If the block number in the offset is out of range, we're done.
@@ -840,7 +840,7 @@ xfs_dir2_sf_getdents(
 	 * Loop while there are more entries and put'ing works.
 	 */
 	sfep = xfs_dir2_sf_firstentry(sfp);
-	for (i = 0; i < sfp->hdr.count; i++) {
+	for (i = 0; i < sfp->count; i++) {
 		off = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
 				xfs_dir2_sf_get_offset(sfep));
 
@@ -875,7 +875,7 @@ xfs_dir2_sf_lookup(
 	int			i;		/* entry index */
 	int			error;
 	xfs_dir2_sf_entry_t	*sfep;		/* shortform directory entry */
-	xfs_dir2_sf_t		*sfp;		/* shortform structure */
+	xfs_dir2_sf_hdr_t	*sfp;		/* shortform structure */
 	enum xfs_dacmp		cmp;		/* comparison result */
 	xfs_dir2_sf_entry_t	*ci_sfep;	/* case-insens. entry */
 
@@ -894,8 +894,8 @@ xfs_dir2_sf_lookup(
 	}
 	ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
 	ASSERT(dp->i_df.if_u1.if_data != NULL);
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
-	ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->hdr.i8count));
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+	ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->i8count));
 	/*
 	 * Special case for .
 	 */
@@ -917,7 +917,7 @@ xfs_dir2_sf_lookup(
 	 * Loop over all the entries trying to match ours.
 	 */
 	ci_sfep = NULL;
-	for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp); i < sfp->hdr.count;
+	for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp); i < sfp->count;
 				i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep)) {
 		/*
 		 * Compare name and if it's an exact match, return the inode
@@ -960,7 +960,7 @@ xfs_dir2_sf_removename(
 	int			newsize;	/* new inode size */
 	int			oldsize;	/* old inode size */
 	xfs_dir2_sf_entry_t	*sfep;		/* shortform directory entry */
-	xfs_dir2_sf_t		*sfp;		/* shortform structure */
+	xfs_dir2_sf_hdr_t	*sfp;		/* shortform structure */
 
 	trace_xfs_dir2_sf_removename(args);
 
@@ -977,13 +977,13 @@ xfs_dir2_sf_removename(
 	}
 	ASSERT(dp->i_df.if_bytes == oldsize);
 	ASSERT(dp->i_df.if_u1.if_data != NULL);
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
-	ASSERT(oldsize >= xfs_dir2_sf_hdr_size(sfp->hdr.i8count));
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+	ASSERT(oldsize >= xfs_dir2_sf_hdr_size(sfp->i8count));
 	/*
 	 * Loop over the old directory entries.
 	 * Find the one we're deleting.
 	 */
-	for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp); i < sfp->hdr.count;
+	for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp); i < sfp->count;
 				i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep)) {
 		if (xfs_da_compname(args, sfep->name, sfep->namelen) ==
 								XFS_CMP_EXACT) {
@@ -995,7 +995,7 @@ xfs_dir2_sf_removename(
 	/*
 	 * Didn't find it.
 	 */
-	if (i == sfp->hdr.count)
+	if (i == sfp->count)
 		return XFS_ERROR(ENOENT);
 	/*
 	 * Calculate sizes.
@@ -1012,22 +1012,22 @@ xfs_dir2_sf_removename(
 	/*
 	 * Fix up the header and file size.
 	 */
-	sfp->hdr.count--;
+	sfp->count--;
 	dp->i_d.di_size = newsize;
 	/*
 	 * Reallocate, making it smaller.
 	 */
 	xfs_idata_realloc(dp, newsize - oldsize, XFS_DATA_FORK);
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
 #if XFS_BIG_INUMS
 	/*
 	 * Are we changing inode number size?
 	 */
 	if (args->inumber > XFS_DIR2_MAX_SHORT_INUM) {
-		if (sfp->hdr.i8count == 1)
+		if (sfp->i8count == 1)
 			xfs_dir2_sf_toino4(args);
 		else
-			sfp->hdr.i8count--;
+			sfp->i8count--;
 	}
 #endif
 	xfs_dir2_sf_check(args);
@@ -1051,7 +1051,7 @@ xfs_dir2_sf_replace(
 	int			i8elevated;	/* sf_toino8 set i8count=1 */
 #endif
 	xfs_dir2_sf_entry_t	*sfep;		/* shortform directory entry */
-	xfs_dir2_sf_t		*sfp;		/* shortform structure */
+	xfs_dir2_sf_hdr_t	*sfp;		/* shortform structure */
 
 	trace_xfs_dir2_sf_replace(args);
 
@@ -1067,19 +1067,19 @@ xfs_dir2_sf_replace(
 	}
 	ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
 	ASSERT(dp->i_df.if_u1.if_data != NULL);
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
-	ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->hdr.i8count));
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+	ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->i8count));
 #if XFS_BIG_INUMS
 	/*
 	 * New inode number is large, and need to convert to 8-byte inodes.
 	 */
-	if (args->inumber > XFS_DIR2_MAX_SHORT_INUM && sfp->hdr.i8count == 0) {
+	if (args->inumber > XFS_DIR2_MAX_SHORT_INUM && sfp->i8count == 0) {
 		int	error;			/* error return value */
 		int	newsize;		/* new inode size */
 
 		newsize =
 			dp->i_df.if_bytes +
-			(sfp->hdr.count + 1) *
+			(sfp->count + 1) *
 			((uint)sizeof(xfs_dir2_ino8_t) -
 			 (uint)sizeof(xfs_dir2_ino4_t));
 		/*
@@ -1097,7 +1097,7 @@ xfs_dir2_sf_replace(
 		 */
 		xfs_dir2_sf_toino8(args);
 		i8elevated = 1;
-		sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
+		sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
 	} else
 		i8elevated = 0;
 #endif
@@ -1118,7 +1118,7 @@ xfs_dir2_sf_replace(
 	 */
 	else {
 		for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp);
-				i < sfp->hdr.count;
+				i < sfp->count;
 				i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep)) {
 			if (xfs_da_compname(args, sfep->name, sfep->namelen) ==
 								XFS_CMP_EXACT) {
@@ -1133,7 +1133,7 @@ xfs_dir2_sf_replace(
 		/*
 		 * Didn't find it.
 		 */
-		if (i == sfp->hdr.count) {
+		if (i == sfp->count) {
 			ASSERT(args->op_flags & XFS_DA_OP_OKNOENT);
 #if XFS_BIG_INUMS
 			if (i8elevated)
@@ -1151,10 +1151,10 @@ xfs_dir2_sf_replace(
 		/*
 		 * And the old count was one, so need to convert to small.
 		 */
-		if (sfp->hdr.i8count == 1)
+		if (sfp->i8count == 1)
 			xfs_dir2_sf_toino4(args);
 		else
-			sfp->hdr.i8count--;
+			sfp->i8count--;
 	}
 	/*
 	 * See if the old number was small, the new number is large.
@@ -1165,9 +1165,9 @@ xfs_dir2_sf_replace(
 		 * add to the i8count unless we just converted to 8-byte
 		 * inodes (which does an implied i8count = 1)
 		 */
-		ASSERT(sfp->hdr.i8count != 0);
+		ASSERT(sfp->i8count != 0);
 		if (!i8elevated)
-			sfp->hdr.i8count++;
+			sfp->i8count++;
 	}
 #endif
 	xfs_dir2_sf_check(args);
@@ -1189,10 +1189,10 @@ xfs_dir2_sf_toino4(
 	int			i;		/* entry index */
 	int			newsize;	/* new inode size */
 	xfs_dir2_sf_entry_t	*oldsfep;	/* old sf entry */
-	xfs_dir2_sf_t		*oldsfp;	/* old sf directory */
+	xfs_dir2_sf_hdr_t	*oldsfp;	/* old sf directory */
 	int			oldsize;	/* old inode size */
 	xfs_dir2_sf_entry_t	*sfep;		/* new sf entry */
-	xfs_dir2_sf_t		*sfp;		/* new sf directory */
+	xfs_dir2_sf_hdr_t	*sfp;		/* new sf directory */
 
 	trace_xfs_dir2_sf_toino4(args);
 
@@ -1205,35 +1205,35 @@ xfs_dir2_sf_toino4(
 	 */
 	oldsize = dp->i_df.if_bytes;
 	buf = kmem_alloc(oldsize, KM_SLEEP);
-	oldsfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
-	ASSERT(oldsfp->hdr.i8count == 1);
+	oldsfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+	ASSERT(oldsfp->i8count == 1);
 	memcpy(buf, oldsfp, oldsize);
 	/*
 	 * Compute the new inode size.
 	 */
 	newsize =
 		oldsize -
-		(oldsfp->hdr.count + 1) *
+		(oldsfp->count + 1) *
 		((uint)sizeof(xfs_dir2_ino8_t) - (uint)sizeof(xfs_dir2_ino4_t));
 	xfs_idata_realloc(dp, -oldsize, XFS_DATA_FORK);
 	xfs_idata_realloc(dp, newsize, XFS_DATA_FORK);
 	/*
 	 * Reset our pointers, the data has moved.
 	 */
-	oldsfp = (xfs_dir2_sf_t *)buf;
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
+	oldsfp = (xfs_dir2_sf_hdr_t *)buf;
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
 	/*
 	 * Fill in the new header.
 	 */
-	sfp->hdr.count = oldsfp->hdr.count;
-	sfp->hdr.i8count = 0;
+	sfp->count = oldsfp->count;
+	sfp->i8count = 0;
 	xfs_dir2_sf_put_parent_ino(sfp, xfs_dir2_sf_get_parent_ino(oldsfp));
 	/*
 	 * Copy the entries field by field.
 	 */
 	for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp),
 		    oldsfep = xfs_dir2_sf_firstentry(oldsfp);
-	     i < sfp->hdr.count;
+	     i < sfp->count;
 	     i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep),
 		  oldsfep = xfs_dir2_sf_nextentry(oldsfp, oldsfep)) {
 		sfep->namelen = oldsfep->namelen;
@@ -1264,10 +1264,10 @@ xfs_dir2_sf_toino8(
 	int			i;		/* entry index */
 	int			newsize;	/* new inode size */
 	xfs_dir2_sf_entry_t	*oldsfep;	/* old sf entry */
-	xfs_dir2_sf_t		*oldsfp;	/* old sf directory */
+	xfs_dir2_sf_hdr_t	*oldsfp;	/* old sf directory */
 	int			oldsize;	/* old inode size */
 	xfs_dir2_sf_entry_t	*sfep;		/* new sf entry */
-	xfs_dir2_sf_t		*sfp;		/* new sf directory */
+	xfs_dir2_sf_hdr_t	*sfp;		/* new sf directory */
 
 	trace_xfs_dir2_sf_toino8(args);
 
@@ -1280,35 +1280,35 @@ xfs_dir2_sf_toino8(
 	 */
 	oldsize = dp->i_df.if_bytes;
 	buf = kmem_alloc(oldsize, KM_SLEEP);
-	oldsfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
-	ASSERT(oldsfp->hdr.i8count == 0);
+	oldsfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
+	ASSERT(oldsfp->i8count == 0);
 	memcpy(buf, oldsfp, oldsize);
 	/*
 	 * Compute the new inode size.
 	 */
 	newsize =
 		oldsize +
-		(oldsfp->hdr.count + 1) *
+		(oldsfp->count + 1) *
 		((uint)sizeof(xfs_dir2_ino8_t) - (uint)sizeof(xfs_dir2_ino4_t));
 	xfs_idata_realloc(dp, -oldsize, XFS_DATA_FORK);
 	xfs_idata_realloc(dp, newsize, XFS_DATA_FORK);
 	/*
 	 * Reset our pointers, the data has moved.
 	 */
-	oldsfp = (xfs_dir2_sf_t *)buf;
-	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
+	oldsfp = (xfs_dir2_sf_hdr_t *)buf;
+	sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
 	/*
 	 * Fill in the new header.
 	 */
-	sfp->hdr.count = oldsfp->hdr.count;
-	sfp->hdr.i8count = 1;
+	sfp->count = oldsfp->count;
+	sfp->i8count = 1;
 	xfs_dir2_sf_put_parent_ino(sfp, xfs_dir2_sf_get_parent_ino(oldsfp));
 	/*
 	 * Copy the entries field by field.
 	 */
 	for (i = 0, sfep = xfs_dir2_sf_firstentry(sfp),
 		    oldsfep = xfs_dir2_sf_firstentry(oldsfp);
-	     i < sfp->hdr.count;
+	     i < sfp->count;
 	     i++, sfep = xfs_dir2_sf_nextentry(sfp, sfep),
 		  oldsfep = xfs_dir2_sf_nextentry(oldsfp, oldsfep)) {
 		sfep->namelen = oldsfep->namelen;
Index: xfs/fs/xfs/xfs_dir2_sf.h
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_sf.h	2011-06-30 09:32:00.390072451 +0200
+++ xfs/fs/xfs/xfs_dir2_sf.h	2011-06-30 09:35:55.813402859 +0200
@@ -21,8 +21,12 @@
 /*
  * Directory layout when stored internal to an inode.
  *
- * Small directories are packed as tightly as possible so as to
- * fit into the literal area of the inode.
+ * Small directories are packed as tightly as possible so as to fit into the
+ * literal area of the inode.  They consist of a single xfs_dir2_sf_hdr header
+ * followed by zero or more xfs_dir2_sf_entry structures.  Due to the different
+ * inode number storage sizes and the variable-length name field in
+ * the xfs_dir2_sf_entry, all these structures are variable length, and the
+ * accessors in this file need to be used to iterate over them.
  */
 
 struct uio;
@@ -61,9 +65,9 @@ typedef struct { __uint8_t i[2]; } __arc
  * The parent directory has a dedicated field, and the self-pointer must
  * be calculated on the fly.
  *
- * Entries are packed toward the top as tightly as possible.  The header
- * and the elements must be memcpy'd out into a work area to get correct
- * alignment for the inode number fields.
+ * Entries are packed toward the top as tightly as possible, and thus may
+ * be misaligned.  Care needs to be taken to access them through special
+ * helpers or copy them into aligned variables first.
  */
 typedef struct xfs_dir2_sf_hdr {
 	__uint8_t		count;		/* count of entries */
@@ -78,11 +82,6 @@ typedef struct xfs_dir2_sf_entry {
 	xfs_dir2_inou_t		inumber;	/* inode number, var. offset */
 } __arch_pack xfs_dir2_sf_entry_t; 
 
-typedef struct xfs_dir2_sf {
-	xfs_dir2_sf_hdr_t	hdr;		/* shortform header */
-	xfs_dir2_sf_entry_t	list[1];	/* shortform entries */
-} xfs_dir2_sf_t;
-
 static inline int xfs_dir2_sf_hdr_size(int i8count)
 {
 	return ((uint)sizeof(xfs_dir2_sf_hdr_t) - \
@@ -102,29 +101,29 @@ xfs_dir2_sf_put_offset(xfs_dir2_sf_entry
 	INT_SET_UNALIGNED_16_BE(&(sfep)->offset.i, off);
 }
 
-static inline int xfs_dir2_sf_entsize_byname(xfs_dir2_sf_t *sfp, int len)
+static inline int xfs_dir2_sf_entsize_byname(xfs_dir2_sf_hdr_t *sfp, int len)
 {
 	return ((uint)sizeof(xfs_dir2_sf_entry_t) - 1 + (len) - \
-		((sfp)->hdr.i8count == 0) * \
+		((sfp)->i8count == 0) * \
 		((uint)sizeof(xfs_dir2_ino8_t) - (uint)sizeof(xfs_dir2_ino4_t)));
 }
 
 static inline int
-xfs_dir2_sf_entsize_byentry(xfs_dir2_sf_t *sfp, xfs_dir2_sf_entry_t *sfep)
+xfs_dir2_sf_entsize_byentry(xfs_dir2_sf_hdr_t *sfp, xfs_dir2_sf_entry_t *sfep)
 {
 	return ((uint)sizeof(xfs_dir2_sf_entry_t) - 1 + (sfep)->namelen - \
-		((sfp)->hdr.i8count == 0) * \
+		((sfp)->i8count == 0) * \
 		((uint)sizeof(xfs_dir2_ino8_t) - (uint)sizeof(xfs_dir2_ino4_t)));
 }
 
-static inline xfs_dir2_sf_entry_t *xfs_dir2_sf_firstentry(xfs_dir2_sf_t *sfp)
+static inline xfs_dir2_sf_entry_t *xfs_dir2_sf_firstentry(xfs_dir2_sf_hdr_t *sfp)
 {
 	return ((xfs_dir2_sf_entry_t *) \
-		((char *)(sfp) + xfs_dir2_sf_hdr_size(sfp->hdr.i8count)));
+		((char *)(sfp) + xfs_dir2_sf_hdr_size(sfp->i8count)));
 }
 
 static inline xfs_dir2_sf_entry_t *
-xfs_dir2_sf_nextentry(xfs_dir2_sf_t *sfp, xfs_dir2_sf_entry_t *sfep)
+xfs_dir2_sf_nextentry(xfs_dir2_sf_hdr_t *sfp, xfs_dir2_sf_entry_t *sfep)
 {
 	return ((xfs_dir2_sf_entry_t *) \
 		((char *)(sfep) + xfs_dir2_sf_entsize_byentry(sfp,sfep)));
@@ -133,8 +132,8 @@ xfs_dir2_sf_nextentry(xfs_dir2_sf_t *sfp
 /*
  * Functions.
  */
-extern xfs_ino_t xfs_dir2_sf_get_parent_ino(struct xfs_dir2_sf *sfp);
-extern xfs_ino_t xfs_dir2_sfe_get_ino(struct xfs_dir2_sf *sfp,
+extern xfs_ino_t xfs_dir2_sf_get_parent_ino(struct xfs_dir2_sf_hdr *sfp);
+extern xfs_ino_t xfs_dir2_sfe_get_ino(struct xfs_dir2_sf_hdr *sfp,
 				      struct xfs_dir2_sf_entry *sfep);
 extern int xfs_dir2_block_sfsize(struct xfs_inode *dp,
 				 struct xfs_dir2_block *block,
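
The updated header comment says shortform entries may be misaligned and
have to be accessed through helpers.  A minimal userspace sketch of what
such a helper can look like, assuming big-endian on-disk inode numbers of
4 or 8 bytes; the sketch_* names are hypothetical and are not the actual
xfs_dir2_sf_get_parent_ino()/xfs_dir2_sfe_get_ino() implementations:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: read a big-endian inode number of 4 bytes
 * (i8count == 0) or 8 bytes (i8count != 0) from a location that may
 * not be naturally aligned.  Going byte by byte avoids any unaligned
 * load of a wider type.
 */
static uint64_t sketch_get_ino(const uint8_t *p, int i8count)
{
	int nbytes = i8count ? 8 : 4;
	uint64_t ino = 0;

	for (int i = 0; i < nbytes; i++)
		ino = (ino << 8) | p[i];
	return ino;
}

/* Hypothetical counterpart: store the number big-endian, byte by byte. */
static void sketch_put_ino(uint8_t *p, int i8count, uint64_t ino)
{
	int nbytes = i8count ? 8 : 4;

	for (int i = nbytes - 1; i >= 0; i--) {
		p[i] = (uint8_t)(ino & 0xff);
		ino >>= 8;
	}
}
```

Both helpers take the width from i8count, which is why the callers in
the patch hand the header pointer to the inode accessors.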


* [PATCH 15/27] xfs: cleanup the defintion of struct xfs_dir2_sf_entry
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 14/27] xfs: kill struct xfs_dir2_sf Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  2:00   ` Dave Chinner
  2011-07-06  3:33   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 16/27] xfs: avoid usage of struct xfs_dir2_block Christoph Hellwig
                   ` (12 subsequent siblings)
  26 siblings, 2 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-cleanup-xfs_dir2_sf_entry --]
[-- Type: text/plain, Size: 3899 bytes --]

Remove the inumber member, which sits at a variable offset after the actual
name, and make name a real variable-sized C99 array instead of the incorrect
one-element array, which confuses gcc (and not only gcc).  Based on this, clean
up the helpers that calculate the entry size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
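
For readers following along, a minimal userspace sketch of the layout
described above.  The sketch_* types are simplified stand-ins (plain byte
arrays, no __arch_pack) and are assumptions for illustration, not the
kernel definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the shortform entry after this patch. */
struct sketch_sf_entry {
	uint8_t namelen;	/* actual name length */
	uint8_t offset[2];	/* saved offset, big-endian */
	uint8_t name[];		/* C99 flexible array member */
	/* a 4- or 8-byte inode number follows the name */
};

/* Fixed part + name + inode number; no "- 1" correction is needed. */
static size_t sketch_entsize(int i8count, size_t namelen)
{
	return sizeof(struct sketch_sf_entry) +	/* namelen + offset */
	       namelen +			/* name */
	       (i8count ? 8 : 4);		/* inode number */
}

/* Step to the next entry using the current entry's own name length. */
static struct sketch_sf_entry *
sketch_nextentry(int i8count, struct sketch_sf_entry *sfep)
{
	return (struct sketch_sf_entry *)
		((char *)sfep + sketch_entsize(i8count, sfep->namelen));
}
```

Because the flexible array member does not contribute to sizeof, the
struct size covers only namelen and offset.  That is what lets a single
helper replace the _byname and _byentry variants.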

Index: xfs/fs/xfs/xfs_dir2_sf.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_sf.c	2011-06-30 09:37:41.120068219 +0200
+++ xfs/fs/xfs/xfs_dir2_sf.c	2011-06-30 09:38:34.303400889 +0200
@@ -371,7 +371,7 @@ xfs_dir2_sf_addname(
 	/*
 	 * Compute entry (and change in) size.
 	 */
-	add_entsize = xfs_dir2_sf_entsize_byname(sfp, args->namelen);
+	add_entsize = xfs_dir2_sf_entsize(sfp, args->namelen);
 	incr_isize = add_entsize;
 	objchange = 0;
 #if XFS_BIG_INUMS
@@ -465,7 +465,7 @@ xfs_dir2_sf_addname_easy(
 	/*
 	 * Grow the in-inode space.
 	 */
-	xfs_idata_realloc(dp, xfs_dir2_sf_entsize_byname(sfp, args->namelen),
+	xfs_idata_realloc(dp, xfs_dir2_sf_entsize(sfp, args->namelen),
 		XFS_DATA_FORK);
 	/*
 	 * Need to set up again due to realloc of the inode data.
@@ -1001,7 +1001,7 @@ xfs_dir2_sf_removename(
 	 * Calculate sizes.
 	 */
 	byteoff = (int)((char *)sfep - (char *)sfp);
-	entsize = xfs_dir2_sf_entsize_byname(sfp, args->namelen);
+	entsize = xfs_dir2_sf_entsize(sfp, args->namelen);
 	newsize = oldsize - entsize;
 	/*
 	 * Copy the part if any after the removed entry, sliding it down.
Index: xfs/fs/xfs/xfs_dir2_sf.h
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_sf.h	2011-06-30 09:35:55.813402859 +0200
+++ xfs/fs/xfs/xfs_dir2_sf.h	2011-06-30 09:38:34.303400889 +0200
@@ -76,10 +76,13 @@ typedef struct xfs_dir2_sf_hdr {
 } __arch_pack xfs_dir2_sf_hdr_t;
 
 typedef struct xfs_dir2_sf_entry {
-	__uint8_t		namelen;	/* actual name length */
+	__u8			namelen;	/* actual name length */
 	xfs_dir2_sf_off_t	offset;		/* saved offset */
-	__uint8_t		name[1];	/* name, variable size */
-	xfs_dir2_inou_t		inumber;	/* inode number, var. offset */
+	__u8			name[];		/* name, variable size */
+	/*
+	 * A xfs_dir2_ino8_t or xfs_dir2_ino4_t follows here, at a
+	 * variable offset after the name.
+	 */
 } __arch_pack xfs_dir2_sf_entry_t; 
 
 static inline int xfs_dir2_sf_hdr_size(int i8count)
@@ -101,32 +104,27 @@ xfs_dir2_sf_put_offset(xfs_dir2_sf_entry
 	INT_SET_UNALIGNED_16_BE(&(sfep)->offset.i, off);
 }
 
-static inline int xfs_dir2_sf_entsize_byname(xfs_dir2_sf_hdr_t *sfp, int len)
-{
-	return ((uint)sizeof(xfs_dir2_sf_entry_t) - 1 + (len) - \
-		((sfp)->i8count == 0) * \
-		((uint)sizeof(xfs_dir2_ino8_t) - (uint)sizeof(xfs_dir2_ino4_t)));
-}
-
 static inline int
-xfs_dir2_sf_entsize_byentry(xfs_dir2_sf_hdr_t *sfp, xfs_dir2_sf_entry_t *sfep)
+xfs_dir2_sf_entsize(xfs_dir2_sf_hdr_t *sfp, int len)
 {
-	return ((uint)sizeof(xfs_dir2_sf_entry_t) - 1 + (sfep)->namelen - \
-		((sfp)->i8count == 0) * \
-		((uint)sizeof(xfs_dir2_ino8_t) - (uint)sizeof(xfs_dir2_ino4_t)));
+	return sizeof(xfs_dir2_sf_entry_t) +	/* namelen + offset */
+		len +				/* name */
+		(sfp->i8count ?			/* ino */
+		 sizeof(xfs_dir2_ino8_t) :
+		 sizeof(xfs_dir2_ino4_t));
 }
 
 static inline xfs_dir2_sf_entry_t *xfs_dir2_sf_firstentry(xfs_dir2_sf_hdr_t *sfp)
 {
-	return ((xfs_dir2_sf_entry_t *) \
-		((char *)(sfp) + xfs_dir2_sf_hdr_size(sfp->i8count)));
+	return (xfs_dir2_sf_entry_t *)
+		((char *)sfp + xfs_dir2_sf_hdr_size(sfp->i8count));
 }
 
 static inline xfs_dir2_sf_entry_t *
 xfs_dir2_sf_nextentry(xfs_dir2_sf_hdr_t *sfp, xfs_dir2_sf_entry_t *sfep)
 {
-	return ((xfs_dir2_sf_entry_t *) \
-		((char *)(sfep) + xfs_dir2_sf_entsize_byentry(sfp,sfep)));
+	return (xfs_dir2_sf_entry_t *)
+		((char *)sfep + xfs_dir2_sf_entsize(sfp, sfep->namelen));
 }
 
 /*


* [PATCH 16/27] xfs: avoid usage of struct xfs_dir2_block
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 15/27] xfs: cleanup the defintion of struct xfs_dir2_sf_entry Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  2:19   ` Dave Chinner
  2011-07-06  3:36   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 17/27] xfs: kill " Christoph Hellwig
                   ` (11 subsequent siblings)
  26 siblings, 2 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-avoid-xfs_dir2_block_t --]
[-- Type: text/plain, Size: 27184 bytes --]

In most places we can simply pass around and use the struct xfs_dir2_data_hdr,
which is the first and most important member of struct xfs_dir2_block, instead
of the full structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
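
A minimal userspace sketch of the two idioms this patch relies on: a
pointer to the block is also a pointer to its first member, and the magic
check byte-swaps the constant rather than the on-disk field.  The
sketch_* names and the payload size are assumptions for illustration, and
sketch_cpu_to_be32() is only a portable stand-in for the kernel's
cpu_to_be32():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK_MAGIC 0x58443242u	/* "XD2B", illustrative */

struct sketch_data_hdr {
	uint32_t magic;			/* stored big-endian on disk */
};

struct sketch_block {
	struct sketch_data_hdr hdr;	/* first member, at offset 0 */
	uint8_t payload[60];		/* stand-in for entries and tail */
};

/* Return v laid out in big-endian byte order, whatever the host is. */
static uint32_t sketch_cpu_to_be32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8), (uint8_t)v
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Compare the raw on-disk field against the swapped constant. */
static int sketch_magic_ok(const struct sketch_data_hdr *hdr)
{
	return hdr->magic == sketch_cpu_to_be32(SKETCH_BLOCK_MAGIC);
}
```

In the kernel, cpu_to_be32() applied to a constant can be folded at
compile time, so the rewritten check avoids a runtime byte swap on
little-endian hosts.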

Index: xfs/fs/xfs/xfs_dir2_block.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_block.c	2011-06-30 09:35:55.810069526 +0200
+++ xfs/fs/xfs/xfs_dir2_block.c	2011-06-30 09:38:36.586734196 +0200
@@ -67,7 +67,7 @@ xfs_dir2_block_addname(
 	xfs_da_args_t		*args)		/* directory op arguments */
 {
 	xfs_dir2_data_free_t	*bf;		/* bestfree table in block */
-	xfs_dir2_block_t	*block;		/* directory block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* block header */
 	xfs_dir2_leaf_entry_t	*blp;		/* block leaf entries */
 	xfs_dabuf_t		*bp;		/* buffer for block */
 	xfs_dir2_block_tail_t	*btp;		/* block tail */
@@ -105,13 +105,13 @@ xfs_dir2_block_addname(
 		return error;
 	}
 	ASSERT(bp != NULL);
-	block = bp->data;
+	hdr = bp->data;
 	/*
 	 * Check the magic number, corrupted if wrong.
 	 */
-	if (unlikely(be32_to_cpu(block->hdr.magic) != XFS_DIR2_BLOCK_MAGIC)) {
+	if (unlikely(hdr->magic != cpu_to_be32(XFS_DIR2_BLOCK_MAGIC))) {
 		XFS_CORRUPTION_ERROR("xfs_dir2_block_addname",
-				     XFS_ERRLEVEL_LOW, mp, block);
+				     XFS_ERRLEVEL_LOW, mp, hdr);
 		xfs_da_brelse(tp, bp);
 		return XFS_ERROR(EFSCORRUPTED);
 	}
@@ -119,8 +119,8 @@ xfs_dir2_block_addname(
 	/*
 	 * Set up pointers to parts of the block.
 	 */
-	bf = block->hdr.bestfree;
-	btp = xfs_dir2_block_tail_p(mp, block);
+	bf = hdr->bestfree;
+	btp = xfs_dir2_block_tail_p(mp, hdr);
 	blp = xfs_dir2_block_leaf_p(btp);
 	/*
 	 * No stale entries?  Need space for entry and new leaf.
@@ -133,7 +133,7 @@ xfs_dir2_block_addname(
 		/*
 		 * Data object just before the first leaf entry.
 		 */
-		enddup = (xfs_dir2_data_unused_t *)((char *)block + be16_to_cpu(*tagp));
+		enddup = (xfs_dir2_data_unused_t *)((char *)hdr + be16_to_cpu(*tagp));
 		/*
 		 * If it's not free then can't do this add without cleaning up:
 		 * the space before the first leaf entry needs to be free so it
@@ -146,7 +146,7 @@ xfs_dir2_block_addname(
 		 */
 		else {
 			dup = (xfs_dir2_data_unused_t *)
-			      ((char *)block + be16_to_cpu(bf[0].offset));
+			      ((char *)hdr + be16_to_cpu(bf[0].offset));
 			if (dup == enddup) {
 				/*
 				 * It is the biggest freespace, is it too small
@@ -159,7 +159,7 @@ xfs_dir2_block_addname(
 					 */
 					if (be16_to_cpu(bf[1].length) >= len)
 						dup = (xfs_dir2_data_unused_t *)
-						      ((char *)block +
+						      ((char *)hdr +
 						       be16_to_cpu(bf[1].offset));
 					else
 						dup = NULL;
@@ -182,7 +182,7 @@ xfs_dir2_block_addname(
 	 */
 	else if (be16_to_cpu(bf[0].length) >= len) {
 		dup = (xfs_dir2_data_unused_t *)
-		      ((char *)block + be16_to_cpu(bf[0].offset));
+		      ((char *)hdr + be16_to_cpu(bf[0].offset));
 		compact = 0;
 	}
 	/*
@@ -196,7 +196,7 @@ xfs_dir2_block_addname(
 		/*
 		 * Data object just before the first leaf entry.
 		 */
-		dup = (xfs_dir2_data_unused_t *)((char *)block + be16_to_cpu(*tagp));
+		dup = (xfs_dir2_data_unused_t *)((char *)hdr + be16_to_cpu(*tagp));
 		/*
 		 * If it's not free then the data will go where the
 		 * leaf data starts now, if it works at all.
@@ -272,7 +272,7 @@ xfs_dir2_block_addname(
 		lfloghigh -= be32_to_cpu(btp->stale) - 1;
 		be32_add_cpu(&btp->count, -(be32_to_cpu(btp->stale) - 1));
 		xfs_dir2_data_make_free(tp, bp,
-			(xfs_dir2_data_aoff_t)((char *)blp - (char *)block),
+			(xfs_dir2_data_aoff_t)((char *)blp - (char *)hdr),
 			(xfs_dir2_data_aoff_t)((be32_to_cpu(btp->stale) - 1) * sizeof(*blp)),
 			&needlog, &needscan);
 		blp += be32_to_cpu(btp->stale) - 1;
@@ -282,7 +282,7 @@ xfs_dir2_block_addname(
 		 * This needs to happen before the next call to use_free.
 		 */
 		if (needscan) {
-			xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)block, &needlog);
+			xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)hdr, &needlog);
 			needscan = 0;
 		}
 	}
@@ -318,7 +318,7 @@ xfs_dir2_block_addname(
 		 */
 		xfs_dir2_data_use_free(tp, bp, enddup,
 			(xfs_dir2_data_aoff_t)
-			((char *)enddup - (char *)block + be16_to_cpu(enddup->length) -
+			((char *)enddup - (char *)hdr + be16_to_cpu(enddup->length) -
 			 sizeof(*blp)),
 			(xfs_dir2_data_aoff_t)sizeof(*blp),
 			&needlog, &needscan);
@@ -331,7 +331,7 @@ xfs_dir2_block_addname(
 		 * This needs to happen before the next call to use_free.
 		 */
 		if (needscan) {
-			xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)block,
+			xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)hdr,
 				&needlog);
 			needscan = 0;
 		}
@@ -397,13 +397,13 @@ xfs_dir2_block_addname(
 	 */
 	blp[mid].hashval = cpu_to_be32(args->hashval);
 	blp[mid].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(mp,
-				(char *)dep - (char *)block));
+				(char *)dep - (char *)hdr));
 	xfs_dir2_block_log_leaf(tp, bp, lfloglow, lfloghigh);
 	/*
 	 * Mark space for the data entry used.
 	 */
 	xfs_dir2_data_use_free(tp, bp, dup,
-		(xfs_dir2_data_aoff_t)((char *)dup - (char *)block),
+		(xfs_dir2_data_aoff_t)((char *)dup - (char *)hdr),
 		(xfs_dir2_data_aoff_t)len, &needlog, &needscan);
 	/*
 	 * Create the new data entry.
@@ -412,12 +412,12 @@ xfs_dir2_block_addname(
 	dep->namelen = args->namelen;
 	memcpy(dep->name, args->name, args->namelen);
 	tagp = xfs_dir2_data_entry_tag_p(dep);
-	*tagp = cpu_to_be16((char *)dep - (char *)block);
+	*tagp = cpu_to_be16((char *)dep - (char *)hdr);
 	/*
 	 * Clean up the bestfree array and log the header, tail, and entry.
 	 */
 	if (needscan)
-		xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)block, &needlog);
+		xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)hdr, &needlog);
 	if (needlog)
 		xfs_dir2_data_log_header(tp, bp);
 	xfs_dir2_block_log_tail(tp, bp);
@@ -438,6 +438,7 @@ xfs_dir2_block_getdents(
 	filldir_t		filldir)
 {
 	xfs_dir2_block_t	*block;		/* directory block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* block header */
 	xfs_dabuf_t		*bp;		/* buffer for block */
 	xfs_dir2_block_tail_t	*btp;		/* block tail */
 	xfs_dir2_data_entry_t	*dep;		/* block data entry */
@@ -471,11 +472,12 @@ xfs_dir2_block_getdents(
 	 */
 	wantoff = xfs_dir2_dataptr_to_off(mp, *offset);
 	block = bp->data;
+	hdr = &block->hdr;
 	xfs_dir2_data_check(dp, bp);
 	/*
 	 * Set up values for the loop.
 	 */
-	btp = xfs_dir2_block_tail_p(mp, block);
+	btp = xfs_dir2_block_tail_p(mp, hdr);
 	ptr = (char *)block->u;
 	endptr = (char *)xfs_dir2_block_leaf_p(btp);
 
@@ -502,11 +504,11 @@ xfs_dir2_block_getdents(
 		/*
 		 * The entry is before the desired starting point, skip it.
 		 */
-		if ((char *)dep - (char *)block < wantoff)
+		if ((char *)dep - (char *)hdr < wantoff)
 			continue;
 
 		cook = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
-					    (char *)dep - (char *)block);
+					    (char *)dep - (char *)hdr);
 
 		/*
 		 * If it didn't fit, set the final offset to here & return.
@@ -540,17 +542,14 @@ xfs_dir2_block_log_leaf(
 	int			first,		/* index of first logged leaf */
 	int			last)		/* index of last logged leaf */
 {
-	xfs_dir2_block_t	*block;		/* directory block structure */
-	xfs_dir2_leaf_entry_t	*blp;		/* block leaf entries */
-	xfs_dir2_block_tail_t	*btp;		/* block tail */
-	xfs_mount_t		*mp;		/* filesystem mount point */
+	xfs_dir2_data_hdr_t	*hdr = bp->data;
+	xfs_dir2_leaf_entry_t	*blp;
+	xfs_dir2_block_tail_t	*btp;
 
-	mp = tp->t_mountp;
-	block = bp->data;
-	btp = xfs_dir2_block_tail_p(mp, block);
+	btp = xfs_dir2_block_tail_p(tp->t_mountp, hdr);
 	blp = xfs_dir2_block_leaf_p(btp);
-	xfs_da_log_buf(tp, bp, (uint)((char *)&blp[first] - (char *)block),
-		(uint)((char *)&blp[last + 1] - (char *)block - 1));
+	xfs_da_log_buf(tp, bp, (uint)((char *)&blp[first] - (char *)hdr),
+		(uint)((char *)&blp[last + 1] - (char *)hdr - 1));
 }
 
 /*
@@ -561,15 +560,12 @@ xfs_dir2_block_log_tail(
 	xfs_trans_t		*tp,		/* transaction structure */
 	xfs_dabuf_t		*bp)		/* block buffer */
 {
-	xfs_dir2_block_t	*block;		/* directory block structure */
-	xfs_dir2_block_tail_t	*btp;		/* block tail */
-	xfs_mount_t		*mp;		/* filesystem mount point */
+	xfs_dir2_data_hdr_t	*hdr = bp->data;
+	xfs_dir2_block_tail_t	*btp;
 
-	mp = tp->t_mountp;
-	block = bp->data;
-	btp = xfs_dir2_block_tail_p(mp, block);
-	xfs_da_log_buf(tp, bp, (uint)((char *)btp - (char *)block),
-		(uint)((char *)(btp + 1) - (char *)block - 1));
+	btp = xfs_dir2_block_tail_p(tp->t_mountp, hdr);
+	xfs_da_log_buf(tp, bp, (uint)((char *)btp - (char *)hdr),
+		(uint)((char *)(btp + 1) - (char *)hdr - 1));
 }
 
 /*
@@ -580,7 +576,7 @@ int						/* error */
 xfs_dir2_block_lookup(
 	xfs_da_args_t		*args)		/* dir lookup arguments */
 {
-	xfs_dir2_block_t	*block;		/* block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* block header */
 	xfs_dir2_leaf_entry_t	*blp;		/* block leaf entries */
 	xfs_dabuf_t		*bp;		/* block buffer */
 	xfs_dir2_block_tail_t	*btp;		/* block tail */
@@ -600,14 +596,14 @@ xfs_dir2_block_lookup(
 		return error;
 	dp = args->dp;
 	mp = dp->i_mount;
-	block = bp->data;
+	hdr = bp->data;
 	xfs_dir2_data_check(dp, bp);
-	btp = xfs_dir2_block_tail_p(mp, block);
+	btp = xfs_dir2_block_tail_p(mp, hdr);
 	blp = xfs_dir2_block_leaf_p(btp);
 	/*
 	 * Get the offset from the leaf entry, to point to the data.
 	 */
-	dep = (xfs_dir2_data_entry_t *)((char *)block +
+	dep = (xfs_dir2_data_entry_t *)((char *)hdr +
 		xfs_dir2_dataptr_to_off(mp, be32_to_cpu(blp[ent].address)));
 	/*
 	 * Fill in inode number, CI name if appropriate, release the block.
@@ -628,7 +624,7 @@ xfs_dir2_block_lookup_int(
 	int			*entno)		/* returned entry number */
 {
 	xfs_dir2_dataptr_t	addr;		/* data entry address */
-	xfs_dir2_block_t	*block;		/* block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* block header */
 	xfs_dir2_leaf_entry_t	*blp;		/* block leaf entries */
 	xfs_dabuf_t		*bp;		/* block buffer */
 	xfs_dir2_block_tail_t	*btp;		/* block tail */
@@ -654,9 +650,9 @@ xfs_dir2_block_lookup_int(
 		return error;
 	}
 	ASSERT(bp != NULL);
-	block = bp->data;
+	hdr = bp->data;
 	xfs_dir2_data_check(dp, bp);
-	btp = xfs_dir2_block_tail_p(mp, block);
+	btp = xfs_dir2_block_tail_p(mp, hdr);
 	blp = xfs_dir2_block_leaf_p(btp);
 	/*
 	 * Loop doing a binary search for our hash value.
@@ -694,7 +690,7 @@ xfs_dir2_block_lookup_int(
 		 * Get pointer to the entry from the leaf.
 		 */
 		dep = (xfs_dir2_data_entry_t *)
-			((char *)block + xfs_dir2_dataptr_to_off(mp, addr));
+			((char *)hdr + xfs_dir2_dataptr_to_off(mp, addr));
 		/*
 		 * Compare name and if it's an exact match, return the index
 		 * and buffer. If it's the first case-insensitive match, store
@@ -733,7 +729,7 @@ int						/* error */
 xfs_dir2_block_removename(
 	xfs_da_args_t		*args)		/* directory operation args */
 {
-	xfs_dir2_block_t	*block;		/* block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* block header */
 	xfs_dir2_leaf_entry_t	*blp;		/* block leaf pointer */
 	xfs_dabuf_t		*bp;		/* block buffer */
 	xfs_dir2_block_tail_t	*btp;		/* block tail */
@@ -760,20 +756,20 @@ xfs_dir2_block_removename(
 	dp = args->dp;
 	tp = args->trans;
 	mp = dp->i_mount;
-	block = bp->data;
-	btp = xfs_dir2_block_tail_p(mp, block);
+	hdr = bp->data;
+	btp = xfs_dir2_block_tail_p(mp, hdr);
 	blp = xfs_dir2_block_leaf_p(btp);
 	/*
 	 * Point to the data entry using the leaf entry.
 	 */
 	dep = (xfs_dir2_data_entry_t *)
-	      ((char *)block + xfs_dir2_dataptr_to_off(mp, be32_to_cpu(blp[ent].address)));
+	      ((char *)hdr + xfs_dir2_dataptr_to_off(mp, be32_to_cpu(blp[ent].address)));
 	/*
 	 * Mark the data entry's space free.
 	 */
 	needlog = needscan = 0;
 	xfs_dir2_data_make_free(tp, bp,
-		(xfs_dir2_data_aoff_t)((char *)dep - (char *)block),
+		(xfs_dir2_data_aoff_t)((char *)dep - (char *)hdr),
 		xfs_dir2_data_entsize(dep->namelen), &needlog, &needscan);
 	/*
 	 * Fix up the block tail.
@@ -789,15 +785,15 @@ xfs_dir2_block_removename(
 	 * Fix up bestfree, log the header if necessary.
 	 */
 	if (needscan)
-		xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)block, &needlog);
+		xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)hdr, &needlog);
 	if (needlog)
 		xfs_dir2_data_log_header(tp, bp);
 	xfs_dir2_data_check(dp, bp);
 	/*
 	 * See if the size as a shortform is good enough.
 	 */
-	if ((size = xfs_dir2_block_sfsize(dp, block, &sfh)) >
-	    XFS_IFORK_DSIZE(dp)) {
+	size = xfs_dir2_block_sfsize(dp, hdr, &sfh);
+	if (size > XFS_IFORK_DSIZE(dp)) {
 		xfs_da_buf_done(bp);
 		return 0;
 	}
@@ -815,7 +811,7 @@ int						/* error */
 xfs_dir2_block_replace(
 	xfs_da_args_t		*args)		/* directory operation args */
 {
-	xfs_dir2_block_t	*block;		/* block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* block header */
 	xfs_dir2_leaf_entry_t	*blp;		/* block leaf entries */
 	xfs_dabuf_t		*bp;		/* block buffer */
 	xfs_dir2_block_tail_t	*btp;		/* block tail */
@@ -836,14 +832,14 @@ xfs_dir2_block_replace(
 	}
 	dp = args->dp;
 	mp = dp->i_mount;
-	block = bp->data;
-	btp = xfs_dir2_block_tail_p(mp, block);
+	hdr = bp->data;
+	btp = xfs_dir2_block_tail_p(mp, hdr);
 	blp = xfs_dir2_block_leaf_p(btp);
 	/*
 	 * Point to the data entry we need to change.
 	 */
 	dep = (xfs_dir2_data_entry_t *)
-	      ((char *)block + xfs_dir2_dataptr_to_off(mp, be32_to_cpu(blp[ent].address)));
+	      ((char *)hdr + xfs_dir2_dataptr_to_off(mp, be32_to_cpu(blp[ent].address)));
 	ASSERT(be64_to_cpu(dep->inumber) != args->inumber);
 	/*
 	 * Change the inode number to the new value.
@@ -882,7 +878,7 @@ xfs_dir2_leaf_to_block(
 	xfs_dabuf_t		*dbp)		/* data buffer */
 {
 	__be16			*bestsp;	/* leaf bests table */
-	xfs_dir2_block_t	*block;		/* block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* block header */
 	xfs_dir2_block_tail_t	*btp;		/* block tail */
 	xfs_inode_t		*dp;		/* incore directory inode */
 	xfs_dir2_data_unused_t	*dup;		/* unused data entry */
@@ -917,7 +913,7 @@ xfs_dir2_leaf_to_block(
 	while (dp->i_d.di_size > mp->m_dirblksize) {
 		bestsp = xfs_dir2_leaf_bests_p(ltp);
 		if (be16_to_cpu(bestsp[be32_to_cpu(ltp->bestcount) - 1]) ==
-		    mp->m_dirblksize - (uint)sizeof(block->hdr)) {
+		    mp->m_dirblksize - (uint)sizeof(*hdr)) {
 			if ((error =
 			    xfs_dir2_leaf_trim_data(args, lbp,
 				    (xfs_dir2_db_t)(be32_to_cpu(ltp->bestcount) - 1))))
@@ -935,18 +931,18 @@ xfs_dir2_leaf_to_block(
 		    XFS_DATA_FORK))) {
 		goto out;
 	}
-	block = dbp->data;
-	ASSERT(be32_to_cpu(block->hdr.magic) == XFS_DIR2_DATA_MAGIC);
+	hdr = dbp->data;
+	ASSERT(be32_to_cpu(hdr->magic) == XFS_DIR2_DATA_MAGIC);
 	/*
 	 * Size of the "leaf" area in the block.
 	 */
-	size = (uint)sizeof(block->tail) +
+	size = (uint)sizeof(xfs_dir2_block_tail_t) +
 	       (uint)sizeof(*lep) * (be16_to_cpu(leaf->hdr.count) - be16_to_cpu(leaf->hdr.stale));
 	/*
 	 * Look at the last data entry.
 	 */
-	tagp = (__be16 *)((char *)block + mp->m_dirblksize) - 1;
-	dup = (xfs_dir2_data_unused_t *)((char *)block + be16_to_cpu(*tagp));
+	tagp = (__be16 *)((char *)hdr + mp->m_dirblksize) - 1;
+	dup = (xfs_dir2_data_unused_t *)((char *)hdr + be16_to_cpu(*tagp));
 	/*
 	 * If it's not free or is too short we can't do it.
 	 */
@@ -958,7 +954,7 @@ xfs_dir2_leaf_to_block(
 	/*
 	 * Start converting it to block form.
 	 */
-	block->hdr.magic = cpu_to_be32(XFS_DIR2_BLOCK_MAGIC);
+	hdr->magic = cpu_to_be32(XFS_DIR2_BLOCK_MAGIC);
 	needlog = 1;
 	needscan = 0;
 	/*
@@ -969,7 +965,7 @@ xfs_dir2_leaf_to_block(
 	/*
 	 * Initialize the block tail.
 	 */
-	btp = xfs_dir2_block_tail_p(mp, block);
+	btp = xfs_dir2_block_tail_p(mp, hdr);
 	btp->count = cpu_to_be32(be16_to_cpu(leaf->hdr.count) - be16_to_cpu(leaf->hdr.stale));
 	btp->stale = 0;
 	xfs_dir2_block_log_tail(tp, dbp);
@@ -988,7 +984,7 @@ xfs_dir2_leaf_to_block(
 	 * Scan the bestfree if we need it and log the data block header.
 	 */
 	if (needscan)
-		xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)block, &needlog);
+		xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)hdr, &needlog);
 	if (needlog)
 		xfs_dir2_data_log_header(tp, dbp);
 	/*
@@ -1002,8 +998,8 @@ xfs_dir2_leaf_to_block(
 	/*
 	 * Now see if the resulting block can be shrunken to shortform.
 	 */
-	if ((size = xfs_dir2_block_sfsize(dp, block, &sfh)) >
-	    XFS_IFORK_DSIZE(dp)) {
+	size = xfs_dir2_block_sfsize(dp, hdr, &sfh);
+	if (size > XFS_IFORK_DSIZE(dp)) {
 		error = 0;
 		goto out;
 	}
@@ -1025,6 +1021,7 @@ xfs_dir2_sf_to_block(
 {
 	xfs_dir2_db_t		blkno;		/* dir-relative block # (0) */
 	xfs_dir2_block_t	*block;		/* block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* block header */
 	xfs_dir2_leaf_entry_t	*blp;		/* block leaf entries */
 	xfs_dabuf_t		*bp;		/* block buffer */
 	xfs_dir2_block_tail_t	*btp;		/* block tail pointer */
@@ -1095,7 +1092,8 @@ xfs_dir2_sf_to_block(
 		return error;
 	}
 	block = bp->data;
-	block->hdr.magic = cpu_to_be32(XFS_DIR2_BLOCK_MAGIC);
+	hdr = &block->hdr;
+	hdr->magic = cpu_to_be32(XFS_DIR2_BLOCK_MAGIC);
 	/*
 	 * Compute size of block "tail" area.
 	 */
@@ -1113,45 +1111,45 @@ xfs_dir2_sf_to_block(
 	/*
 	 * Fill in the tail.
 	 */
-	btp = xfs_dir2_block_tail_p(mp, block);
+	btp = xfs_dir2_block_tail_p(mp, hdr);
 	btp->count = cpu_to_be32(sfp->count + 2);	/* ., .. */
 	btp->stale = 0;
 	blp = xfs_dir2_block_leaf_p(btp);
-	endoffset = (uint)((char *)blp - (char *)block);
+	endoffset = (uint)((char *)blp - (char *)hdr);
 	/*
 	 * Remove the freespace, we'll manage it.
 	 */
 	xfs_dir2_data_use_free(tp, bp, dup,
-		(xfs_dir2_data_aoff_t)((char *)dup - (char *)block),
+		(xfs_dir2_data_aoff_t)((char *)dup - (char *)hdr),
 		be16_to_cpu(dup->length), &needlog, &needscan);
 	/*
 	 * Create entry for .
 	 */
 	dep = (xfs_dir2_data_entry_t *)
-	      ((char *)block + XFS_DIR2_DATA_DOT_OFFSET);
+	      ((char *)hdr + XFS_DIR2_DATA_DOT_OFFSET);
 	dep->inumber = cpu_to_be64(dp->i_ino);
 	dep->namelen = 1;
 	dep->name[0] = '.';
 	tagp = xfs_dir2_data_entry_tag_p(dep);
-	*tagp = cpu_to_be16((char *)dep - (char *)block);
+	*tagp = cpu_to_be16((char *)dep - (char *)hdr);
 	xfs_dir2_data_log_entry(tp, bp, dep);
 	blp[0].hashval = cpu_to_be32(xfs_dir_hash_dot);
 	blp[0].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(mp,
-				(char *)dep - (char *)block));
+				(char *)dep - (char *)hdr));
 	/*
 	 * Create entry for ..
 	 */
 	dep = (xfs_dir2_data_entry_t *)
-		((char *)block + XFS_DIR2_DATA_DOTDOT_OFFSET);
+		((char *)hdr + XFS_DIR2_DATA_DOTDOT_OFFSET);
 	dep->inumber = cpu_to_be64(xfs_dir2_sf_get_parent_ino(sfp));
 	dep->namelen = 2;
 	dep->name[0] = dep->name[1] = '.';
 	tagp = xfs_dir2_data_entry_tag_p(dep);
-	*tagp = cpu_to_be16((char *)dep - (char *)block);
+	*tagp = cpu_to_be16((char *)dep - (char *)hdr);
 	xfs_dir2_data_log_entry(tp, bp, dep);
 	blp[1].hashval = cpu_to_be32(xfs_dir_hash_dotdot);
 	blp[1].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(mp,
-				(char *)dep - (char *)block));
+				(char *)dep - (char *)hdr));
 	offset = XFS_DIR2_DATA_FIRST_OFFSET;
 	/*
 	 * Loop over existing entries, stuff them in.
@@ -1177,14 +1175,13 @@ xfs_dir2_sf_to_block(
 		 * There should be a hole here, make one.
 		 */
 		if (offset < newoffset) {
-			dup = (xfs_dir2_data_unused_t *)
-			      ((char *)block + offset);
+			dup = (xfs_dir2_data_unused_t *)((char *)hdr + offset);
 			dup->freetag = cpu_to_be16(XFS_DIR2_DATA_FREE_TAG);
 			dup->length = cpu_to_be16(newoffset - offset);
 			*xfs_dir2_data_unused_tag_p(dup) = cpu_to_be16(
-				((char *)dup - (char *)block));
+				((char *)dup - (char *)hdr));
 			xfs_dir2_data_log_unused(tp, bp, dup);
-			(void)xfs_dir2_data_freeinsert((xfs_dir2_data_t *)block,
+			(void)xfs_dir2_data_freeinsert((xfs_dir2_data_t *)hdr,
 				dup, &dummy);
 			offset += be16_to_cpu(dup->length);
 			continue;
@@ -1192,20 +1189,20 @@ xfs_dir2_sf_to_block(
 		/*
 		 * Copy a real entry.
 		 */
-		dep = (xfs_dir2_data_entry_t *)((char *)block + newoffset);
+		dep = (xfs_dir2_data_entry_t *)((char *)hdr + newoffset);
 		dep->inumber = cpu_to_be64(xfs_dir2_sfe_get_ino(sfp, sfep));
 		dep->namelen = sfep->namelen;
 		memcpy(dep->name, sfep->name, dep->namelen);
 		tagp = xfs_dir2_data_entry_tag_p(dep);
-		*tagp = cpu_to_be16((char *)dep - (char *)block);
+		*tagp = cpu_to_be16((char *)dep - (char *)hdr);
 		xfs_dir2_data_log_entry(tp, bp, dep);
 		name.name = sfep->name;
 		name.len = sfep->namelen;
 		blp[2 + i].hashval = cpu_to_be32(mp->m_dirnameops->
 							hashname(&name));
 		blp[2 + i].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(mp,
-						 (char *)dep - (char *)block));
-		offset = (int)((char *)(tagp + 1) - (char *)block);
+						 (char *)dep - (char *)hdr));
+		offset = (int)((char *)(tagp + 1) - (char *)hdr);
 		if (++i == sfp->count)
 			sfep = NULL;
 		else
Index: xfs/fs/xfs/xfs_dir2_data.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_data.c	2011-06-29 19:45:24.326965102 +0200
+++ xfs/fs/xfs/xfs_dir2_data.c	2011-06-30 09:38:36.586734196 +0200
@@ -72,7 +72,7 @@ xfs_dir2_data_check(
 	bf = d->hdr.bestfree;
 	p = (char *)d->u;
 	if (be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC) {
-		btp = xfs_dir2_block_tail_p(mp, (xfs_dir2_block_t *)d);
+		btp = xfs_dir2_block_tail_p(mp, &d->hdr);
 		lep = xfs_dir2_block_leaf_p(btp);
 		endp = (char *)lep;
 	} else
@@ -348,7 +348,7 @@ xfs_dir2_data_freescan(
 	 */
 	p = (char *)d->u;
 	if (be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC) {
-		btp = xfs_dir2_block_tail_p(mp, (xfs_dir2_block_t *)d);
+		btp = xfs_dir2_block_tail_p(mp, &d->hdr);
 		endp = (char *)xfs_dir2_block_leaf_p(btp);
 	} else
 		endp = (char *)d + mp->m_dirblksize;
@@ -537,7 +537,7 @@ xfs_dir2_data_make_free(
 		xfs_dir2_block_tail_t	*btp;	/* block tail */
 
 		ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
-		btp = xfs_dir2_block_tail_p(mp, (xfs_dir2_block_t *)d);
+		btp = xfs_dir2_block_tail_p(mp, &d->hdr);
 		endptr = (char *)xfs_dir2_block_leaf_p(btp);
 	}
 	/*
Index: xfs/fs/xfs/xfs_dir2_leaf.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_leaf.c	2011-06-30 09:29:24.446740960 +0200
+++ xfs/fs/xfs/xfs_dir2_leaf.c	2011-06-30 09:38:36.590067529 +0200
@@ -64,7 +64,7 @@ xfs_dir2_block_to_leaf(
 {
 	__be16			*bestsp;	/* leaf's bestsp entries */
 	xfs_dablk_t		blkno;		/* leaf block's bno */
-	xfs_dir2_block_t	*block;		/* block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* block header */
 	xfs_dir2_leaf_entry_t	*blp;		/* block's leaf entries */
 	xfs_dir2_block_tail_t	*btp;		/* block's tail */
 	xfs_inode_t		*dp;		/* incore directory inode */
@@ -101,9 +101,9 @@ xfs_dir2_block_to_leaf(
 	}
 	ASSERT(lbp != NULL);
 	leaf = lbp->data;
-	block = dbp->data;
+	hdr = dbp->data;
 	xfs_dir2_data_check(dp, dbp);
-	btp = xfs_dir2_block_tail_p(mp, block);
+	btp = xfs_dir2_block_tail_p(mp, hdr);
 	blp = xfs_dir2_block_leaf_p(btp);
 	/*
 	 * Set the counts in the leaf header.
@@ -123,23 +123,23 @@ xfs_dir2_block_to_leaf(
 	 * tail be free.
 	 */
 	xfs_dir2_data_make_free(tp, dbp,
-		(xfs_dir2_data_aoff_t)((char *)blp - (char *)block),
-		(xfs_dir2_data_aoff_t)((char *)block + mp->m_dirblksize -
+		(xfs_dir2_data_aoff_t)((char *)blp - (char *)hdr),
+		(xfs_dir2_data_aoff_t)((char *)hdr + mp->m_dirblksize -
 				       (char *)blp),
 		&needlog, &needscan);
 	/*
 	 * Fix up the block header, make it a data block.
 	 */
-	block->hdr.magic = cpu_to_be32(XFS_DIR2_DATA_MAGIC);
+	hdr->magic = cpu_to_be32(XFS_DIR2_DATA_MAGIC);
 	if (needscan)
-		xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)block, &needlog);
+		xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)hdr, &needlog);
 	/*
 	 * Set up leaf tail and bests table.
 	 */
 	ltp = xfs_dir2_leaf_tail_p(mp, leaf);
 	ltp->bestcount = cpu_to_be32(1);
 	bestsp = xfs_dir2_leaf_bests_p(ltp);
-	bestsp[0] =  block->hdr.bestfree[0].length;
+	bestsp[0] =  hdr->bestfree[0].length;
 	/*
 	 * Log the data header and leaf bests table.
 	 */
Index: xfs/fs/xfs/xfs_dir2_sf.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_sf.c	2011-06-30 09:38:34.303400889 +0200
+++ xfs/fs/xfs/xfs_dir2_sf.c	2011-06-30 09:38:36.590067529 +0200
@@ -141,7 +141,7 @@ xfs_dir2_sfe_put_ino(
 int						/* size for sf form */
 xfs_dir2_block_sfsize(
 	xfs_inode_t		*dp,		/* incore inode pointer */
-	xfs_dir2_block_t	*block,		/* block directory data */
+	xfs_dir2_data_hdr_t	*hdr,		/* block directory data */
 	xfs_dir2_sf_hdr_t	*sfhp)		/* output: header for sf form */
 {
 	xfs_dir2_dataptr_t	addr;		/* data entry address */
@@ -161,7 +161,7 @@ xfs_dir2_block_sfsize(
 	mp = dp->i_mount;
 
 	count = i8count = namelen = 0;
-	btp = xfs_dir2_block_tail_p(mp, block);
+	btp = xfs_dir2_block_tail_p(mp, hdr);
 	blp = xfs_dir2_block_leaf_p(btp);
 
 	/*
@@ -174,7 +174,7 @@ xfs_dir2_block_sfsize(
 		 * Calculate the pointer to the entry at hand.
 		 */
 		dep = (xfs_dir2_data_entry_t *)
-		      ((char *)block + xfs_dir2_dataptr_to_off(mp, addr));
+		      ((char *)hdr + xfs_dir2_dataptr_to_off(mp, addr));
 		/*
 		 * Detect . and .., so we can special-case them.
 		 * . is not included in sf directories.
@@ -255,6 +255,7 @@ xfs_dir2_block_to_sf(
 		ASSERT(error != ENOSPC);
 		goto out;
 	}
+
 	/*
 	 * The buffer is now unconditionally gone, whether
 	 * xfs_dir2_shrink_inode worked or not.
@@ -276,7 +277,7 @@ xfs_dir2_block_to_sf(
 	/*
 	 * Set up to loop over the block's entries.
 	 */
-	btp = xfs_dir2_block_tail_p(mp, block);
+	btp = xfs_dir2_block_tail_p(mp, &block->hdr);
 	ptr = (char *)block->u;
 	endptr = (char *)xfs_dir2_block_leaf_p(btp);
 	sfep = xfs_dir2_sf_firstentry(sfp);
Index: xfs/fs/xfs/xfs_dir2_sf.h
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_sf.h	2011-06-30 09:38:34.303400889 +0200
+++ xfs/fs/xfs/xfs_dir2_sf.h	2011-06-30 09:38:36.593400862 +0200
@@ -32,7 +32,7 @@
 struct uio;
 struct xfs_dabuf;
 struct xfs_da_args;
-struct xfs_dir2_block;
+struct xfs_dir2_data_hdr;
 struct xfs_inode;
 struct xfs_mount;
 struct xfs_trans;
@@ -134,7 +134,7 @@ extern xfs_ino_t xfs_dir2_sf_get_parent_
 extern xfs_ino_t xfs_dir2_sfe_get_ino(struct xfs_dir2_sf_hdr *sfp,
 				      struct xfs_dir2_sf_entry *sfep);
 extern int xfs_dir2_block_sfsize(struct xfs_inode *dp,
-				 struct xfs_dir2_block *block,
+				 struct xfs_dir2_data_hdr *block,
 				 xfs_dir2_sf_hdr_t *sfhp);
 extern int xfs_dir2_block_to_sf(struct xfs_da_args *args, struct xfs_dabuf *bp,
 				int size, xfs_dir2_sf_hdr_t *sfhp);
Index: xfs/fs/xfs/xfs_dir2_block.h
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_block.h	2011-06-29 19:45:24.376964832 +0200
+++ xfs/fs/xfs/xfs_dir2_block.h	2011-06-30 09:38:36.596734195 +0200
@@ -61,10 +61,9 @@ typedef struct xfs_dir2_block {
  * Pointer to the leaf header embedded in a data block (1-block format)
  */
 static inline xfs_dir2_block_tail_t *
-xfs_dir2_block_tail_p(struct xfs_mount *mp, xfs_dir2_block_t *block)
+xfs_dir2_block_tail_p(struct xfs_mount *mp, xfs_dir2_data_hdr_t *hdr)
 {
-	return (((xfs_dir2_block_tail_t *)
-		((char *)(block) + (mp)->m_dirblksize)) - 1);
+	return ((xfs_dir2_block_tail_t *)((char *)hdr + mp->m_dirblksize)) - 1;
 }
 
 /*


* [PATCH 17/27] xfs: kill struct xfs_dir2_block
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 16/27] xfs: avoid usage of struct xfs_dir2_block Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  2:31   ` Dave Chinner
  2011-07-06  3:36   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 18/27] xfs: avoid usage of struct xfs_dir2_data Christoph Hellwig
                   ` (10 subsequent siblings)
  26 siblings, 2 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-kill-xfs_dir2_block_t --]
[-- Type: text/plain, Size: 7148 bytes --]

Remove the confusing xfs_dir2_block structure.  It is supposed to describe
an XFS dir2 block format btree block, but because almost all elements in
it are variable sized, it can't actually do anything close to that job.
In addition to accessing the fixed offset header structure it was only
used to get a pointer to the first dir or unused entry after it, which can
be trivially replaced by pointer arithmetic on the header pointer.  For
most users that is actually more natural anyway, as they don't use a typed
pointer but rather a character pointer for further arithmetic.
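
As a rough illustration of the pointer arithmetic described above, here is
a minimal sketch.  The demo_* types and helpers are hypothetical, simplified
stand-ins and not the real XFS on-disk definitions; the sketch only shows
that the first entry and the tail can both be located from the header
pointer alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real XFS definitions. */
struct demo_hdr {
	uint32_t	magic;
	uint16_t	bestfree[6];
};

struct demo_tail {
	uint32_t	count;
	uint32_t	stale;
};

/* Old style: a wrapper struct where only hdr and u are really usable. */
struct demo_block {
	struct demo_hdr	hdr;
	char		u[1];
};

/* First entry after the header, computed from the header pointer alone. */
static char *demo_first_entry(struct demo_hdr *hdr)
{
	return (char *)(hdr + 1);
}

/* The tail occupies the last sizeof(struct demo_tail) bytes of the block. */
static struct demo_tail *demo_tail_p(struct demo_hdr *hdr, size_t blksize)
{
	return (struct demo_tail *)((char *)hdr + blksize) - 1;
}

/* Returns 1 if the wrapper struct and the header arithmetic agree. */
static int demo_same_first_entry(void *buf)
{
	struct demo_block *block = buf;

	return demo_first_entry(&block->hdr) == (char *)block->u;
}
```

Under these assumptions, (char *)(hdr + 1) and (char *)block->u name the same
address, so the wrapper structure adds nothing for this purpose.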

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/xfs_dir2_block.h
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_block.h	2011-06-30 09:38:36.596734195 +0200
+++ xfs/fs/xfs/xfs_dir2_block.h	2011-06-30 09:38:38.116734176 +0200
@@ -19,10 +19,30 @@
 #define	__XFS_DIR2_BLOCK_H__
 
 /*
- * xfs_dir2_block.h
- * Directory version 2, single block format structures
+ * Directory version 2, single block format structures.
+ *
+ * The single block format looks like the following drawing on disk:
+ *
+ *    +-------------------------------------------------+
+ *    | xfs_dir2_data_hdr_t                             |
+ *    +-------------------------------------------------+
+ *    | xfs_dir2_data_entry_t OR xfs_dir2_data_unused_t |
+ *    | xfs_dir2_data_entry_t OR xfs_dir2_data_unused_t |
+ *    | xfs_dir2_data_entry_t OR xfs_dir2_data_unused_t |
+ *    | ...                                             |
+ *    +-------------------------------------------------+
+ *    | unused space                                    |
+ *    +-------------------------------------------------+
+ *    | ...                                             |
+ *    | xfs_dir2_leaf_entry_t                           |
+ *    | xfs_dir2_leaf_entry_t                           |
+ *    +-------------------------------------------------+
+ *    | xfs_dir2_block_tail_t                           |
+ *    +-------------------------------------------------+
+ *
+ * As all the entries are variable sized structures the accessors in this
+ * file and xfs_dir2_data.h need to be used to iterate over them.
  */
-
 struct uio;
 struct xfs_dabuf;
 struct xfs_da_args;
@@ -32,14 +52,6 @@ struct xfs_inode;
 struct xfs_mount;
 struct xfs_trans;
 
-/*
- * The single block format is as follows:
- * xfs_dir2_data_hdr_t structure
- * xfs_dir2_data_entry_t and xfs_dir2_data_unused_t structures
- * xfs_dir2_leaf_entry_t structures
- * xfs_dir2_block_tail_t structure
- */
-
 #define	XFS_DIR2_BLOCK_MAGIC	0x58443242	/* XD2B: for one block dirs */
 
 typedef struct xfs_dir2_block_tail {
@@ -48,16 +60,6 @@ typedef struct xfs_dir2_block_tail {
 } xfs_dir2_block_tail_t;
 
 /*
- * Generic single-block structure, for xfs_db.
- */
-typedef struct xfs_dir2_block {
-	xfs_dir2_data_hdr_t	hdr;		/* magic XFS_DIR2_BLOCK_MAGIC */
-	xfs_dir2_data_union_t	u[1];
-	xfs_dir2_leaf_entry_t	leaf[1];
-	xfs_dir2_block_tail_t	tail;
-} xfs_dir2_block_t;
-
-/*
  * Pointer to the leaf header embedded in a data block (1-block format)
  */
 static inline xfs_dir2_block_tail_t *
Index: xfs/fs/xfs/xfs_dir2_block.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_block.c	2011-06-30 09:38:36.586734196 +0200
+++ xfs/fs/xfs/xfs_dir2_block.c	2011-06-30 09:38:38.120067509 +0200
@@ -437,7 +437,6 @@ xfs_dir2_block_getdents(
 	xfs_off_t		*offset,
 	filldir_t		filldir)
 {
-	xfs_dir2_block_t	*block;		/* directory block structure */
 	xfs_dir2_data_hdr_t	*hdr;		/* block header */
 	xfs_dabuf_t		*bp;		/* buffer for block */
 	xfs_dir2_block_tail_t	*btp;		/* block tail */
@@ -471,14 +470,13 @@ xfs_dir2_block_getdents(
 	 * We'll skip entries before this.
 	 */
 	wantoff = xfs_dir2_dataptr_to_off(mp, *offset);
-	block = bp->data;
-	hdr = &block->hdr;
+	hdr = bp->data;
 	xfs_dir2_data_check(dp, bp);
 	/*
 	 * Set up values for the loop.
 	 */
 	btp = xfs_dir2_block_tail_p(mp, hdr);
-	ptr = (char *)block->u;
+	ptr = (char *)(hdr + 1);
 	endptr = (char *)xfs_dir2_block_leaf_p(btp);
 
 	/*
@@ -1020,7 +1018,6 @@ xfs_dir2_sf_to_block(
 	xfs_da_args_t		*args)		/* operation arguments */
 {
 	xfs_dir2_db_t		blkno;		/* dir-relative block # (0) */
-	xfs_dir2_block_t	*block;		/* block structure */
 	xfs_dir2_data_hdr_t	*hdr;		/* block header */
 	xfs_dir2_leaf_entry_t	*blp;		/* block leaf entries */
 	xfs_dabuf_t		*bp;		/* block buffer */
@@ -1091,8 +1088,7 @@ xfs_dir2_sf_to_block(
 		kmem_free(sfp);
 		return error;
 	}
-	block = bp->data;
-	hdr = &block->hdr;
+	hdr = bp->data;
 	hdr->magic = cpu_to_be32(XFS_DIR2_BLOCK_MAGIC);
 	/*
 	 * Compute size of block "tail" area.
@@ -1103,7 +1099,7 @@ xfs_dir2_sf_to_block(
 	 * The whole thing is initialized to free by the init routine.
 	 * Say we're using the leaf and tail area.
 	 */
-	dup = (xfs_dir2_data_unused_t *)block->u;
+	dup = (xfs_dir2_data_unused_t *)(hdr + 1);
 	needlog = needscan = 0;
 	xfs_dir2_data_use_free(tp, bp, dup, mp->m_dirblksize - i, i, &needlog,
 		&needscan);
Index: xfs/fs/xfs/xfs_dir2_sf.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_sf.c	2011-06-30 09:38:36.590067529 +0200
+++ xfs/fs/xfs/xfs_dir2_sf.c	2011-06-30 09:38:38.123400842 +0200
@@ -226,7 +226,7 @@ xfs_dir2_block_to_sf(
 	int			size,		/* shortform directory size */
 	xfs_dir2_sf_hdr_t	*sfhp)		/* shortform directory hdr */
 {
-	xfs_dir2_block_t	*block;		/* block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* block header */
 	xfs_dir2_block_tail_t	*btp;		/* block tail pointer */
 	xfs_dir2_data_entry_t	*dep;		/* data entry pointer */
 	xfs_inode_t		*dp;		/* incore directory inode */
@@ -248,8 +248,8 @@ xfs_dir2_block_to_sf(
 	 * Make a copy of the block data, so we can shrink the inode
 	 * and add local data.
 	 */
-	block = kmem_alloc(mp->m_dirblksize, KM_SLEEP);
-	memcpy(block, bp->data, mp->m_dirblksize);
+	hdr = kmem_alloc(mp->m_dirblksize, KM_SLEEP);
+	memcpy(hdr, bp->data, mp->m_dirblksize);
 	logflags = XFS_ILOG_CORE;
 	if ((error = xfs_dir2_shrink_inode(args, mp->m_dirdatablk, bp))) {
 		ASSERT(error != ENOSPC);
@@ -277,8 +277,8 @@ xfs_dir2_block_to_sf(
 	/*
 	 * Set up to loop over the block's entries.
 	 */
-	btp = xfs_dir2_block_tail_p(mp, &block->hdr);
-	ptr = (char *)block->u;
+	btp = xfs_dir2_block_tail_p(mp, hdr);
+	ptr = (char *)(hdr + 1);
 	endptr = (char *)xfs_dir2_block_leaf_p(btp);
 	sfep = xfs_dir2_sf_firstentry(sfp);
 	/*
@@ -314,7 +314,7 @@ xfs_dir2_block_to_sf(
 			sfep->namelen = dep->namelen;
 			xfs_dir2_sf_put_offset(sfep,
 				(xfs_dir2_data_aoff_t)
-				((char *)dep - (char *)block));
+				((char *)dep - (char *)hdr));
 			memcpy(sfep->name, dep->name, dep->namelen);
 			xfs_dir2_sfe_put_ino(sfp, sfep,
 					     be64_to_cpu(dep->inumber));
@@ -327,7 +327,7 @@ xfs_dir2_block_to_sf(
 	xfs_dir2_sf_check(args);
 out:
 	xfs_trans_log_inode(args->trans, dp, logflags);
-	kmem_free(block);
+	kmem_free(hdr);
 	return error;
 }
 


* [PATCH 18/27] xfs: avoid usage of struct xfs_dir2_data
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 17/27] xfs: kill " Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  3:02   ` Dave Chinner
  2011-07-06  3:38   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 19/27] xfs: kill " Christoph Hellwig
                   ` (9 subsequent siblings)
  26 siblings, 2 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-avoid-xfs_dir2_data_t --]
[-- Type: text/plain, Size: 45985 bytes --]

In most places we can simply pass around and use the struct xfs_dir2_data_hdr,
which is the first and most important member of struct xfs_dir2_data, instead
of the full structure.
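
A minimal sketch of why this works, using hypothetical demo_* names and an
arbitrary magic value rather than the real XFS definitions: the header is
the first member, so a pointer to the buffer is also a valid pointer to the
header, and helpers that take the header need no cast at the call site.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary illustrative value, not an XFS magic number. */
#define DEMO_MAGIC	0x44454d4fu

/* Hypothetical stand-ins for the header and the full structure. */
struct demo_hdr {
	uint32_t	magic;
};

struct demo_data {
	struct demo_hdr	hdr;	/* first member: same address as the struct */
	char		u[1];
};

/* Helper that only needs the header, so it takes only the header. */
static int demo_is_data_block(const struct demo_hdr *hdr)
{
	return hdr->magic == DEMO_MAGIC;
}

/* Caller assigns the raw buffer pointer to a header pointer directly. */
static int demo_check_buffer(void *data)
{
	const struct demo_hdr *hdr = data;

	return demo_is_data_block(hdr);
}
```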

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/xfs_da_btree.c
===================================================================
--- xfs.orig/fs/xfs/xfs_da_btree.c	2011-06-29 19:45:24.010300152 +0200
+++ xfs/fs/xfs/xfs_da_btree.c	2011-06-30 09:38:40.126734150 +0200
@@ -2079,16 +2079,13 @@ xfs_da_do_buf(
 	 * For read_buf, check the magic number.
 	 */
 	if (caller == 1) {
-		xfs_dir2_data_t		*data;
-		xfs_dir2_free_t		*free;
-		xfs_da_blkinfo_t	*info;
+		xfs_dir2_data_hdr_t	*hdr = rbp->data;
+		xfs_dir2_free_t		*free = rbp->data;
+		xfs_da_blkinfo_t	*info = rbp->data;
 		uint			magic, magic1;
 
-		info = rbp->data;
-		data = rbp->data;
-		free = rbp->data;
 		magic = be16_to_cpu(info->magic);
-		magic1 = be32_to_cpu(data->hdr.magic);
+		magic1 = be32_to_cpu(hdr->magic);
 		if (unlikely(
 		    XFS_TEST_ERROR((magic != XFS_DA_NODE_MAGIC) &&
 				   (magic != XFS_ATTR_LEAF_MAGIC) &&
Index: xfs/fs/xfs/xfs_dir2_block.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_block.c	2011-06-30 09:38:38.120067509 +0200
+++ xfs/fs/xfs/xfs_dir2_block.c	2011-06-30 09:38:40.130067486 +0200
@@ -282,7 +282,7 @@ xfs_dir2_block_addname(
 		 * This needs to happen before the next call to use_free.
 		 */
 		if (needscan) {
-			xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)hdr, &needlog);
+			xfs_dir2_data_freescan(mp, hdr, &needlog);
 			needscan = 0;
 		}
 	}
@@ -331,8 +331,7 @@ xfs_dir2_block_addname(
 		 * This needs to happen before the next call to use_free.
 		 */
 		if (needscan) {
-			xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)hdr,
-				&needlog);
+			xfs_dir2_data_freescan(mp, hdr, &needlog);
 			needscan = 0;
 		}
 		/*
@@ -417,7 +416,7 @@ xfs_dir2_block_addname(
 	 * Clean up the bestfree array and log the header, tail, and entry.
 	 */
 	if (needscan)
-		xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)hdr, &needlog);
+		xfs_dir2_data_freescan(mp, hdr, &needlog);
 	if (needlog)
 		xfs_dir2_data_log_header(tp, bp);
 	xfs_dir2_block_log_tail(tp, bp);
@@ -783,7 +782,7 @@ xfs_dir2_block_removename(
 	 * Fix up bestfree, log the header if necessary.
 	 */
 	if (needscan)
-		xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)hdr, &needlog);
+		xfs_dir2_data_freescan(mp, hdr, &needlog);
 	if (needlog)
 		xfs_dir2_data_log_header(tp, bp);
 	xfs_dir2_data_check(dp, bp);
@@ -982,7 +981,7 @@ xfs_dir2_leaf_to_block(
 	 * Scan the bestfree if we need it and log the data block header.
 	 */
 	if (needscan)
-		xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)hdr, &needlog);
+		xfs_dir2_data_freescan(mp, hdr, &needlog);
 	if (needlog)
 		xfs_dir2_data_log_header(tp, dbp);
 	/*
@@ -1177,8 +1176,7 @@ xfs_dir2_sf_to_block(
 			*xfs_dir2_data_unused_tag_p(dup) = cpu_to_be16(
 				((char *)dup - (char *)hdr));
 			xfs_dir2_data_log_unused(tp, bp, dup);
-			(void)xfs_dir2_data_freeinsert((xfs_dir2_data_t *)hdr,
-				dup, &dummy);
+			(void)xfs_dir2_data_freeinsert(hdr, dup, &dummy);
 			offset += be16_to_cpu(dup->length);
 			continue;
 		}
Index: xfs/fs/xfs/xfs_dir2_data.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_data.c	2011-06-30 09:38:36.586734196 +0200
+++ xfs/fs/xfs/xfs_dir2_data.c	2011-06-30 09:38:40.133400821 +0200
@@ -35,6 +35,9 @@
 #include "xfs_dir2_block.h"
 #include "xfs_error.h"
 
+STATIC xfs_dir2_data_free_t *
+xfs_dir2_data_freefind(xfs_dir2_data_hdr_t *hdr, xfs_dir2_data_unused_t *dup);
+
 #ifdef DEBUG
 /*
  * Check the consistency of the data block.
@@ -51,6 +54,7 @@ xfs_dir2_data_check(
 	xfs_dir2_block_tail_t	*btp=NULL;	/* block tail */
 	int			count;		/* count of entries found */
 	xfs_dir2_data_t		*d;		/* data block pointer */
+	xfs_dir2_data_hdr_t	*hdr;		/* data block header */
 	xfs_dir2_data_entry_t	*dep;		/* data entry */
 	xfs_dir2_data_free_t	*dfp;		/* bestfree entry */
 	xfs_dir2_data_unused_t	*dup;		/* unused entry */
@@ -67,16 +71,19 @@ xfs_dir2_data_check(
 
 	mp = dp->i_mount;
 	d = bp->data;
-	ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_DATA_MAGIC ||
-	       be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
-	bf = d->hdr.bestfree;
+	hdr = &d->hdr;
+	bf = hdr->bestfree;
 	p = (char *)d->u;
-	if (be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC) {
-		btp = xfs_dir2_block_tail_p(mp, &d->hdr);
+
+	if (hdr->magic == cpu_to_be32(XFS_DIR2_BLOCK_MAGIC)) {
+		btp = xfs_dir2_block_tail_p(mp, hdr);
 		lep = xfs_dir2_block_leaf_p(btp);
 		endp = (char *)lep;
-	} else
-		endp = (char *)d + mp->m_dirblksize;
+	} else {
+		ASSERT(hdr->magic == cpu_to_be32(XFS_DIR2_DATA_MAGIC));
+		endp = (char *)hdr + mp->m_dirblksize;
+	}
+
 	count = lastfree = freeseen = 0;
 	/*
 	 * Account for zero bestfree entries.
@@ -108,8 +115,8 @@ xfs_dir2_data_check(
 		if (be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG) {
 			ASSERT(lastfree == 0);
 			ASSERT(be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup)) ==
-			       (char *)dup - (char *)d);
-			dfp = xfs_dir2_data_freefind(d, dup);
+			       (char *)dup - (char *)hdr);
+			dfp = xfs_dir2_data_freefind(hdr, dup);
 			if (dfp) {
 				i = (int)(dfp - bf);
 				ASSERT((freeseen & (1 << i)) == 0);
@@ -132,13 +139,13 @@ xfs_dir2_data_check(
 		ASSERT(dep->namelen != 0);
 		ASSERT(xfs_dir_ino_validate(mp, be64_to_cpu(dep->inumber)) == 0);
 		ASSERT(be16_to_cpu(*xfs_dir2_data_entry_tag_p(dep)) ==
-		       (char *)dep - (char *)d);
+		       (char *)dep - (char *)hdr);
 		count++;
 		lastfree = 0;
-		if (be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC) {
+		if (hdr->magic == cpu_to_be32(XFS_DIR2_BLOCK_MAGIC)) {
 			addr = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk,
 				(xfs_dir2_data_aoff_t)
-				((char *)dep - (char *)d));
+				((char *)dep - (char *)hdr));
 			name.name = dep->name;
 			name.len = dep->namelen;
 			hash = mp->m_dirnameops->hashname(&name);
@@ -155,7 +162,7 @@ xfs_dir2_data_check(
 	 * Need to have seen all the entries and all the bestfree slots.
 	 */
 	ASSERT(freeseen == 7);
-	if (be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC) {
+	if (hdr->magic == cpu_to_be32(XFS_DIR2_BLOCK_MAGIC)) {
 		for (i = stale = 0; i < be32_to_cpu(btp->count); i++) {
 			if (be32_to_cpu(lep[i].address) == XFS_DIR2_NULL_DATAPTR)
 				stale++;
@@ -172,9 +179,9 @@ xfs_dir2_data_check(
  * Given a data block and an unused entry from that block,
  * return the bestfree entry if any that corresponds to it.
  */
-xfs_dir2_data_free_t *
+STATIC xfs_dir2_data_free_t *
 xfs_dir2_data_freefind(
-	xfs_dir2_data_t		*d,		/* data block */
+	xfs_dir2_data_hdr_t	*hdr,		/* data block */
 	xfs_dir2_data_unused_t	*dup)		/* data unused entry */
 {
 	xfs_dir2_data_free_t	*dfp;		/* bestfree entry */
@@ -184,17 +191,17 @@ xfs_dir2_data_freefind(
 	int			seenzero;	/* saw a 0 bestfree entry */
 #endif
 
-	off = (xfs_dir2_data_aoff_t)((char *)dup - (char *)d);
+	off = (xfs_dir2_data_aoff_t)((char *)dup - (char *)hdr);
 #if defined(DEBUG) && defined(__KERNEL__)
 	/*
 	 * Validate some consistency in the bestfree table.
 	 * Check order, non-overlapping entries, and if we find the
 	 * one we're looking for it has to be exact.
 	 */
-	ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_DATA_MAGIC ||
-	       be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
-	for (dfp = &d->hdr.bestfree[0], seenzero = matched = 0;
-	     dfp < &d->hdr.bestfree[XFS_DIR2_DATA_FD_COUNT];
+	ASSERT(be32_to_cpu(hdr->magic) == XFS_DIR2_DATA_MAGIC ||
+	       be32_to_cpu(hdr->magic) == XFS_DIR2_BLOCK_MAGIC);
+	for (dfp = &hdr->bestfree[0], seenzero = matched = 0;
+	     dfp < &hdr->bestfree[XFS_DIR2_DATA_FD_COUNT];
 	     dfp++) {
 		if (!dfp->offset) {
 			ASSERT(!dfp->length);
@@ -210,7 +217,7 @@ xfs_dir2_data_freefind(
 		else
 			ASSERT(be16_to_cpu(dfp->offset) + be16_to_cpu(dfp->length) <= off);
 		ASSERT(matched || be16_to_cpu(dfp->length) >= be16_to_cpu(dup->length));
-		if (dfp > &d->hdr.bestfree[0])
+		if (dfp > &hdr->bestfree[0])
 			ASSERT(be16_to_cpu(dfp[-1].length) >= be16_to_cpu(dfp[0].length));
 	}
 #endif
@@ -219,13 +226,13 @@ xfs_dir2_data_freefind(
 	 * it can't be there since they're sorted.
 	 */
 	if (be16_to_cpu(dup->length) <
-	    be16_to_cpu(d->hdr.bestfree[XFS_DIR2_DATA_FD_COUNT - 1].length))
+	    be16_to_cpu(hdr->bestfree[XFS_DIR2_DATA_FD_COUNT - 1].length))
 		return NULL;
 	/*
 	 * Look at the three bestfree entries for our guy.
 	 */
-	for (dfp = &d->hdr.bestfree[0];
-	     dfp < &d->hdr.bestfree[XFS_DIR2_DATA_FD_COUNT];
+	for (dfp = &hdr->bestfree[0];
+	     dfp < &hdr->bestfree[XFS_DIR2_DATA_FD_COUNT];
 	     dfp++) {
 		if (!dfp->offset)
 			return NULL;
@@ -243,7 +250,7 @@ xfs_dir2_data_freefind(
  */
 xfs_dir2_data_free_t *				/* entry inserted */
 xfs_dir2_data_freeinsert(
-	xfs_dir2_data_t		*d,		/* data block pointer */
+	xfs_dir2_data_hdr_t	*hdr,		/* data block pointer */
 	xfs_dir2_data_unused_t	*dup,		/* unused space */
 	int			*loghead)	/* log the data header (out) */
 {
@@ -251,12 +258,13 @@ xfs_dir2_data_freeinsert(
 	xfs_dir2_data_free_t	new;		/* new bestfree entry */
 
 #ifdef __KERNEL__
-	ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_DATA_MAGIC ||
-	       be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
+	ASSERT(be32_to_cpu(hdr->magic) == XFS_DIR2_DATA_MAGIC ||
+	       be32_to_cpu(hdr->magic) == XFS_DIR2_BLOCK_MAGIC);
 #endif
-	dfp = d->hdr.bestfree;
+	dfp = hdr->bestfree;
 	new.length = dup->length;
-	new.offset = cpu_to_be16((char *)dup - (char *)d);
+	new.offset = cpu_to_be16((char *)dup - (char *)hdr);
+
 	/*
 	 * Insert at position 0, 1, or 2; or not at all.
 	 */
@@ -286,36 +294,36 @@ xfs_dir2_data_freeinsert(
  */
 STATIC void
 xfs_dir2_data_freeremove(
-	xfs_dir2_data_t		*d,		/* data block pointer */
+	xfs_dir2_data_hdr_t	*hdr,		/* data block header */
 	xfs_dir2_data_free_t	*dfp,		/* bestfree entry pointer */
 	int			*loghead)	/* out: log data header */
 {
 #ifdef __KERNEL__
-	ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_DATA_MAGIC ||
-	       be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
+	ASSERT(be32_to_cpu(hdr->magic) == XFS_DIR2_DATA_MAGIC ||
+	       be32_to_cpu(hdr->magic) == XFS_DIR2_BLOCK_MAGIC);
 #endif
 	/*
 	 * It's the first entry, slide the next 2 up.
 	 */
-	if (dfp == &d->hdr.bestfree[0]) {
-		d->hdr.bestfree[0] = d->hdr.bestfree[1];
-		d->hdr.bestfree[1] = d->hdr.bestfree[2];
+	if (dfp == &hdr->bestfree[0]) {
+		hdr->bestfree[0] = hdr->bestfree[1];
+		hdr->bestfree[1] = hdr->bestfree[2];
 	}
 	/*
 	 * It's the second entry, slide the 3rd entry up.
 	 */
-	else if (dfp == &d->hdr.bestfree[1])
-		d->hdr.bestfree[1] = d->hdr.bestfree[2];
+	else if (dfp == &hdr->bestfree[1])
+		hdr->bestfree[1] = hdr->bestfree[2];
 	/*
 	 * Must be the last entry.
 	 */
 	else
-		ASSERT(dfp == &d->hdr.bestfree[2]);
+		ASSERT(dfp == &hdr->bestfree[2]);
 	/*
 	 * Clear the 3rd entry, must be zero now.
 	 */
-	d->hdr.bestfree[2].length = 0;
-	d->hdr.bestfree[2].offset = 0;
+	hdr->bestfree[2].length = 0;
+	hdr->bestfree[2].offset = 0;
 	*loghead = 1;
 }
 
@@ -325,9 +333,10 @@ xfs_dir2_data_freeremove(
 void
 xfs_dir2_data_freescan(
 	xfs_mount_t		*mp,		/* filesystem mount point */
-	xfs_dir2_data_t		*d,		/* data block pointer */
+	xfs_dir2_data_hdr_t	*hdr,		/* data block header */
 	int			*loghead)	/* out: log data header */
 {
+	xfs_dir2_data_t		*d = (xfs_dir2_data_t *)hdr;
 	xfs_dir2_block_tail_t	*btp;		/* block tail */
 	xfs_dir2_data_entry_t	*dep;		/* active data entry */
 	xfs_dir2_data_unused_t	*dup;		/* unused data entry */
@@ -335,23 +344,23 @@ xfs_dir2_data_freescan(
 	char			*p;		/* current entry pointer */
 
 #ifdef __KERNEL__
-	ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_DATA_MAGIC ||
-	       be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
+	ASSERT(hdr->magic == cpu_to_be32(XFS_DIR2_DATA_MAGIC) ||
+	       hdr->magic == cpu_to_be32(XFS_DIR2_BLOCK_MAGIC));
 #endif
 	/*
 	 * Start by clearing the table.
 	 */
-	memset(d->hdr.bestfree, 0, sizeof(d->hdr.bestfree));
+	memset(hdr->bestfree, 0, sizeof(hdr->bestfree));
 	*loghead = 1;
 	/*
 	 * Set up pointers.
 	 */
 	p = (char *)d->u;
-	if (be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC) {
-		btp = xfs_dir2_block_tail_p(mp, &d->hdr);
+	if (be32_to_cpu(hdr->magic) == XFS_DIR2_BLOCK_MAGIC) {
+		btp = xfs_dir2_block_tail_p(mp, hdr);
 		endp = (char *)xfs_dir2_block_leaf_p(btp);
 	} else
-		endp = (char *)d + mp->m_dirblksize;
+		endp = (char *)hdr + mp->m_dirblksize;
 	/*
 	 * Loop over the block's entries.
 	 */
@@ -361,9 +370,9 @@ xfs_dir2_data_freescan(
 		 * If it's a free entry, insert it.
 		 */
 		if (be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG) {
-			ASSERT((char *)dup - (char *)d ==
+			ASSERT((char *)dup - (char *)hdr ==
 			       be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup)));
-			xfs_dir2_data_freeinsert(d, dup, loghead);
+			xfs_dir2_data_freeinsert(hdr, dup, loghead);
 			p += be16_to_cpu(dup->length);
 		}
 		/*
@@ -371,7 +380,7 @@ xfs_dir2_data_freescan(
 		 */
 		else {
 			dep = (xfs_dir2_data_entry_t *)p;
-			ASSERT((char *)dep - (char *)d ==
+			ASSERT((char *)dep - (char *)hdr ==
 			       be16_to_cpu(*xfs_dir2_data_entry_tag_p(dep)));
 			p += xfs_dir2_data_entsize(dep->namelen);
 		}
@@ -390,6 +399,7 @@ xfs_dir2_data_init(
 {
 	xfs_dabuf_t		*bp;		/* block buffer */
 	xfs_dir2_data_t		*d;		/* pointer to block */
+	xfs_dir2_data_hdr_t	*hdr;		/* data block header */
 	xfs_inode_t		*dp;		/* incore directory inode */
 	xfs_dir2_data_unused_t	*dup;		/* unused entry pointer */
 	int			error;		/* error return value */
@@ -410,26 +420,29 @@ xfs_dir2_data_init(
 		return error;
 	}
 	ASSERT(bp != NULL);
+
 	/*
 	 * Initialize the header.
 	 */
 	d = bp->data;
-	d->hdr.magic = cpu_to_be32(XFS_DIR2_DATA_MAGIC);
-	d->hdr.bestfree[0].offset = cpu_to_be16(sizeof(d->hdr));
+	hdr = &d->hdr;
+	hdr->magic = cpu_to_be32(XFS_DIR2_DATA_MAGIC);
+	hdr->bestfree[0].offset = cpu_to_be16(sizeof(*hdr));
 	for (i = 1; i < XFS_DIR2_DATA_FD_COUNT; i++) {
-		d->hdr.bestfree[i].length = 0;
-		d->hdr.bestfree[i].offset = 0;
+		hdr->bestfree[i].length = 0;
+		hdr->bestfree[i].offset = 0;
 	}
+
 	/*
 	 * Set up an unused entry for the block's body.
 	 */
 	dup = &d->u[0].unused;
 	dup->freetag = cpu_to_be16(XFS_DIR2_DATA_FREE_TAG);
 
-	t=mp->m_dirblksize - (uint)sizeof(d->hdr);
-	d->hdr.bestfree[0].length = cpu_to_be16(t);
+	t = mp->m_dirblksize - (uint)sizeof(*hdr);
+	hdr->bestfree[0].length = cpu_to_be16(t);
 	dup->length = cpu_to_be16(t);
-	*xfs_dir2_data_unused_tag_p(dup) = cpu_to_be16((char *)dup - (char *)d);
+	*xfs_dir2_data_unused_tag_p(dup) = cpu_to_be16((char *)dup - (char *)hdr);
 	/*
 	 * Log it and return it.
 	 */
@@ -448,14 +461,14 @@ xfs_dir2_data_log_entry(
 	xfs_dabuf_t		*bp,		/* block buffer */
 	xfs_dir2_data_entry_t	*dep)		/* data entry pointer */
 {
-	xfs_dir2_data_t		*d;		/* data block pointer */
+	xfs_dir2_data_hdr_t	*hdr = bp->data;
 
-	d = bp->data;
-	ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_DATA_MAGIC ||
-	       be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
-	xfs_da_log_buf(tp, bp, (uint)((char *)dep - (char *)d),
+	ASSERT(hdr->magic == cpu_to_be32(XFS_DIR2_DATA_MAGIC) ||
+	       hdr->magic == cpu_to_be32(XFS_DIR2_BLOCK_MAGIC));
+
+	xfs_da_log_buf(tp, bp, (uint)((char *)dep - (char *)hdr),
 		(uint)((char *)(xfs_dir2_data_entry_tag_p(dep) + 1) -
-		       (char *)d - 1));
+		       (char *)hdr - 1));
 }
 
 /*
@@ -466,13 +479,12 @@ xfs_dir2_data_log_header(
 	xfs_trans_t		*tp,		/* transaction pointer */
 	xfs_dabuf_t		*bp)		/* block buffer */
 {
-	xfs_dir2_data_t		*d;		/* data block pointer */
+	xfs_dir2_data_hdr_t	*hdr = bp->data;
 
-	d = bp->data;
-	ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_DATA_MAGIC ||
-	       be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
-	xfs_da_log_buf(tp, bp, (uint)((char *)&d->hdr - (char *)d),
-		(uint)(sizeof(d->hdr) - 1));
+	ASSERT(hdr->magic == cpu_to_be32(XFS_DIR2_DATA_MAGIC) ||
+	       hdr->magic == cpu_to_be32(XFS_DIR2_BLOCK_MAGIC));
+
+	xfs_da_log_buf(tp, bp, 0, sizeof(*hdr) - 1);
 }
 
 /*
@@ -484,23 +496,23 @@ xfs_dir2_data_log_unused(
 	xfs_dabuf_t		*bp,		/* block buffer */
 	xfs_dir2_data_unused_t	*dup)		/* data unused pointer */
 {
-	xfs_dir2_data_t		*d;		/* data block pointer */
+	xfs_dir2_data_hdr_t	*hdr = bp->data;
+
+	ASSERT(hdr->magic == cpu_to_be32(XFS_DIR2_DATA_MAGIC) ||
+	       hdr->magic == cpu_to_be32(XFS_DIR2_BLOCK_MAGIC));
 
-	d = bp->data;
-	ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_DATA_MAGIC ||
-	       be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
 	/*
 	 * Log the first part of the unused entry.
 	 */
-	xfs_da_log_buf(tp, bp, (uint)((char *)dup - (char *)d),
+	xfs_da_log_buf(tp, bp, (uint)((char *)dup - (char *)hdr),
 		(uint)((char *)&dup->length + sizeof(dup->length) -
-		       1 - (char *)d));
+		       1 - (char *)hdr));
 	/*
 	 * Log the end (tag) of the unused entry.
 	 */
 	xfs_da_log_buf(tp, bp,
-		(uint)((char *)xfs_dir2_data_unused_tag_p(dup) - (char *)d),
-		(uint)((char *)xfs_dir2_data_unused_tag_p(dup) - (char *)d +
+		(uint)((char *)xfs_dir2_data_unused_tag_p(dup) - (char *)hdr),
+		(uint)((char *)xfs_dir2_data_unused_tag_p(dup) - (char *)hdr +
 		       sizeof(xfs_dir2_data_off_t) - 1));
 }
 
@@ -517,7 +529,7 @@ xfs_dir2_data_make_free(
 	int			*needlogp,	/* out: log header */
 	int			*needscanp)	/* out: regen bestfree */
 {
-	xfs_dir2_data_t		*d;		/* data block pointer */
+	xfs_dir2_data_hdr_t	*hdr;		/* data block pointer */
 	xfs_dir2_data_free_t	*dfp;		/* bestfree pointer */
 	char			*endptr;	/* end of data area */
 	xfs_mount_t		*mp;		/* filesystem mount point */
@@ -527,28 +539,29 @@ xfs_dir2_data_make_free(
 	xfs_dir2_data_unused_t	*prevdup;	/* unused entry before us */
 
 	mp = tp->t_mountp;
-	d = bp->data;
+	hdr = bp->data;
+
 	/*
 	 * Figure out where the end of the data area is.
 	 */
-	if (be32_to_cpu(d->hdr.magic) == XFS_DIR2_DATA_MAGIC)
-		endptr = (char *)d + mp->m_dirblksize;
+	if (hdr->magic == cpu_to_be32(XFS_DIR2_DATA_MAGIC))
+		endptr = (char *)hdr + mp->m_dirblksize;
 	else {
 		xfs_dir2_block_tail_t	*btp;	/* block tail */
 
-		ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
-		btp = xfs_dir2_block_tail_p(mp, &d->hdr);
+		ASSERT(hdr->magic == cpu_to_be32(XFS_DIR2_BLOCK_MAGIC));
+		btp = xfs_dir2_block_tail_p(mp, hdr);
 		endptr = (char *)xfs_dir2_block_leaf_p(btp);
 	}
 	/*
 	 * If this isn't the start of the block, then back up to
 	 * the previous entry and see if it's free.
 	 */
-	if (offset > sizeof(d->hdr)) {
+	if (offset > sizeof(*hdr)) {
 		__be16			*tagp;	/* tag just before us */
 
-		tagp = (__be16 *)((char *)d + offset) - 1;
-		prevdup = (xfs_dir2_data_unused_t *)((char *)d + be16_to_cpu(*tagp));
+		tagp = (__be16 *)((char *)hdr + offset) - 1;
+		prevdup = (xfs_dir2_data_unused_t *)((char *)hdr + be16_to_cpu(*tagp));
 		if (be16_to_cpu(prevdup->freetag) != XFS_DIR2_DATA_FREE_TAG)
 			prevdup = NULL;
 	} else
@@ -557,9 +570,9 @@ xfs_dir2_data_make_free(
 	 * If this isn't the end of the block, see if the entry after
 	 * us is free.
 	 */
-	if ((char *)d + offset + len < endptr) {
+	if ((char *)hdr + offset + len < endptr) {
 		postdup =
-			(xfs_dir2_data_unused_t *)((char *)d + offset + len);
+			(xfs_dir2_data_unused_t *)((char *)hdr + offset + len);
 		if (be16_to_cpu(postdup->freetag) != XFS_DIR2_DATA_FREE_TAG)
 			postdup = NULL;
 	} else
@@ -576,21 +589,21 @@ xfs_dir2_data_make_free(
 		/*
 		 * See if prevdup and/or postdup are in bestfree table.
 		 */
-		dfp = xfs_dir2_data_freefind(d, prevdup);
-		dfp2 = xfs_dir2_data_freefind(d, postdup);
+		dfp = xfs_dir2_data_freefind(hdr, prevdup);
+		dfp2 = xfs_dir2_data_freefind(hdr, postdup);
 		/*
 		 * We need a rescan unless there are exactly 2 free entries
 		 * namely our two.  Then we know what's happening, otherwise
 		 * since the third bestfree is there, there might be more
 		 * entries.
 		 */
-		needscan = (d->hdr.bestfree[2].length != 0);
+		needscan = (hdr->bestfree[2].length != 0);
 		/*
 		 * Fix up the new big freespace.
 		 */
 		be16_add_cpu(&prevdup->length, len + be16_to_cpu(postdup->length));
 		*xfs_dir2_data_unused_tag_p(prevdup) =
-			cpu_to_be16((char *)prevdup - (char *)d);
+			cpu_to_be16((char *)prevdup - (char *)hdr);
 		xfs_dir2_data_log_unused(tp, bp, prevdup);
 		if (!needscan) {
 			/*
@@ -600,18 +613,18 @@ xfs_dir2_data_make_free(
 			 * Remove entry 1 first then entry 0.
 			 */
 			ASSERT(dfp && dfp2);
-			if (dfp == &d->hdr.bestfree[1]) {
-				dfp = &d->hdr.bestfree[0];
+			if (dfp == &hdr->bestfree[1]) {
+				dfp = &hdr->bestfree[0];
 				ASSERT(dfp2 == dfp);
-				dfp2 = &d->hdr.bestfree[1];
+				dfp2 = &hdr->bestfree[1];
 			}
-			xfs_dir2_data_freeremove(d, dfp2, needlogp);
-			xfs_dir2_data_freeremove(d, dfp, needlogp);
+			xfs_dir2_data_freeremove(hdr, dfp2, needlogp);
+			xfs_dir2_data_freeremove(hdr, dfp, needlogp);
 			/*
 			 * Now insert the new entry.
 			 */
-			dfp = xfs_dir2_data_freeinsert(d, prevdup, needlogp);
-			ASSERT(dfp == &d->hdr.bestfree[0]);
+			dfp = xfs_dir2_data_freeinsert(hdr, prevdup, needlogp);
+			ASSERT(dfp == &hdr->bestfree[0]);
 			ASSERT(dfp->length == prevdup->length);
 			ASSERT(!dfp[1].length);
 			ASSERT(!dfp[2].length);
@@ -621,10 +634,10 @@ xfs_dir2_data_make_free(
 	 * The entry before us is free, merge with it.
 	 */
 	else if (prevdup) {
-		dfp = xfs_dir2_data_freefind(d, prevdup);
+		dfp = xfs_dir2_data_freefind(hdr, prevdup);
 		be16_add_cpu(&prevdup->length, len);
 		*xfs_dir2_data_unused_tag_p(prevdup) =
-			cpu_to_be16((char *)prevdup - (char *)d);
+			cpu_to_be16((char *)prevdup - (char *)hdr);
 		xfs_dir2_data_log_unused(tp, bp, prevdup);
 		/*
 		 * If the previous entry was in the table, the new entry
@@ -632,27 +645,27 @@ xfs_dir2_data_make_free(
 		 * the old one and add the new one.
 		 */
 		if (dfp) {
-			xfs_dir2_data_freeremove(d, dfp, needlogp);
-			(void)xfs_dir2_data_freeinsert(d, prevdup, needlogp);
+			xfs_dir2_data_freeremove(hdr, dfp, needlogp);
+			(void)xfs_dir2_data_freeinsert(hdr, prevdup, needlogp);
 		}
 		/*
 		 * Otherwise we need a scan if the new entry is big enough.
 		 */
 		else {
 			needscan = be16_to_cpu(prevdup->length) >
-				   be16_to_cpu(d->hdr.bestfree[2].length);
+				   be16_to_cpu(hdr->bestfree[2].length);
 		}
 	}
 	/*
 	 * The following entry is free, merge with it.
 	 */
 	else if (postdup) {
-		dfp = xfs_dir2_data_freefind(d, postdup);
-		newdup = (xfs_dir2_data_unused_t *)((char *)d + offset);
+		dfp = xfs_dir2_data_freefind(hdr, postdup);
+		newdup = (xfs_dir2_data_unused_t *)((char *)hdr + offset);
 		newdup->freetag = cpu_to_be16(XFS_DIR2_DATA_FREE_TAG);
 		newdup->length = cpu_to_be16(len + be16_to_cpu(postdup->length));
 		*xfs_dir2_data_unused_tag_p(newdup) =
-			cpu_to_be16((char *)newdup - (char *)d);
+			cpu_to_be16((char *)newdup - (char *)hdr);
 		xfs_dir2_data_log_unused(tp, bp, newdup);
 		/*
 		 * If the following entry was in the table, the new entry
@@ -660,28 +673,28 @@ xfs_dir2_data_make_free(
 		 * the old one and add the new one.
 		 */
 		if (dfp) {
-			xfs_dir2_data_freeremove(d, dfp, needlogp);
-			(void)xfs_dir2_data_freeinsert(d, newdup, needlogp);
+			xfs_dir2_data_freeremove(hdr, dfp, needlogp);
+			(void)xfs_dir2_data_freeinsert(hdr, newdup, needlogp);
 		}
 		/*
 		 * Otherwise we need a scan if the new entry is big enough.
 		 */
 		else {
 			needscan = be16_to_cpu(newdup->length) >
-				   be16_to_cpu(d->hdr.bestfree[2].length);
+				   be16_to_cpu(hdr->bestfree[2].length);
 		}
 	}
 	/*
 	 * Neither neighbor is free.  Make a new entry.
 	 */
 	else {
-		newdup = (xfs_dir2_data_unused_t *)((char *)d + offset);
+		newdup = (xfs_dir2_data_unused_t *)((char *)hdr + offset);
 		newdup->freetag = cpu_to_be16(XFS_DIR2_DATA_FREE_TAG);
 		newdup->length = cpu_to_be16(len);
 		*xfs_dir2_data_unused_tag_p(newdup) =
-			cpu_to_be16((char *)newdup - (char *)d);
+			cpu_to_be16((char *)newdup - (char *)hdr);
 		xfs_dir2_data_log_unused(tp, bp, newdup);
-		(void)xfs_dir2_data_freeinsert(d, newdup, needlogp);
+		(void)xfs_dir2_data_freeinsert(hdr, newdup, needlogp);
 	}
 	*needscanp = needscan;
 }
@@ -699,7 +712,7 @@ xfs_dir2_data_use_free(
 	int			*needlogp,	/* out: need to log header */
 	int			*needscanp)	/* out: need regen bestfree */
 {
-	xfs_dir2_data_t		*d;		/* data block */
+	xfs_dir2_data_hdr_t	*hdr;		/* data block header */
 	xfs_dir2_data_free_t	*dfp;		/* bestfree pointer */
 	int			matchback;	/* matches end of freespace */
 	int			matchfront;	/* matches start of freespace */
@@ -708,24 +721,24 @@ xfs_dir2_data_use_free(
 	xfs_dir2_data_unused_t	*newdup2;	/* another new unused entry */
 	int			oldlen;		/* old unused entry's length */
 
-	d = bp->data;
-	ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_DATA_MAGIC ||
-	       be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
+	hdr = bp->data;
+	ASSERT(be32_to_cpu(hdr->magic) == XFS_DIR2_DATA_MAGIC ||
+	       be32_to_cpu(hdr->magic) == XFS_DIR2_BLOCK_MAGIC);
 	ASSERT(be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG);
-	ASSERT(offset >= (char *)dup - (char *)d);
-	ASSERT(offset + len <= (char *)dup + be16_to_cpu(dup->length) - (char *)d);
-	ASSERT((char *)dup - (char *)d == be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup)));
+	ASSERT(offset >= (char *)dup - (char *)hdr);
+	ASSERT(offset + len <= (char *)dup + be16_to_cpu(dup->length) - (char *)hdr);
+	ASSERT((char *)dup - (char *)hdr == be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup)));
 	/*
 	 * Look up the entry in the bestfree table.
 	 */
-	dfp = xfs_dir2_data_freefind(d, dup);
+	dfp = xfs_dir2_data_freefind(hdr, dup);
 	oldlen = be16_to_cpu(dup->length);
-	ASSERT(dfp || oldlen <= be16_to_cpu(d->hdr.bestfree[2].length));
+	ASSERT(dfp || oldlen <= be16_to_cpu(hdr->bestfree[2].length));
 	/*
 	 * Check for alignment with front and back of the entry.
 	 */
-	matchfront = (char *)dup - (char *)d == offset;
-	matchback = (char *)dup + oldlen - (char *)d == offset + len;
+	matchfront = (char *)dup - (char *)hdr == offset;
+	matchback = (char *)dup + oldlen - (char *)hdr == offset + len;
 	ASSERT(*needscanp == 0);
 	needscan = 0;
 	/*
@@ -734,9 +747,9 @@ xfs_dir2_data_use_free(
 	 */
 	if (matchfront && matchback) {
 		if (dfp) {
-			needscan = (d->hdr.bestfree[2].offset != 0);
+			needscan = (hdr->bestfree[2].offset != 0);
 			if (!needscan)
-				xfs_dir2_data_freeremove(d, dfp, needlogp);
+				xfs_dir2_data_freeremove(hdr, dfp, needlogp);
 		}
 	}
 	/*
@@ -744,27 +757,27 @@ xfs_dir2_data_use_free(
 	 * Make a new entry with the remaining freespace.
 	 */
 	else if (matchfront) {
-		newdup = (xfs_dir2_data_unused_t *)((char *)d + offset + len);
+		newdup = (xfs_dir2_data_unused_t *)((char *)hdr + offset + len);
 		newdup->freetag = cpu_to_be16(XFS_DIR2_DATA_FREE_TAG);
 		newdup->length = cpu_to_be16(oldlen - len);
 		*xfs_dir2_data_unused_tag_p(newdup) =
-			cpu_to_be16((char *)newdup - (char *)d);
+			cpu_to_be16((char *)newdup - (char *)hdr);
 		xfs_dir2_data_log_unused(tp, bp, newdup);
 		/*
 		 * If it was in the table, remove it and add the new one.
 		 */
 		if (dfp) {
-			xfs_dir2_data_freeremove(d, dfp, needlogp);
-			dfp = xfs_dir2_data_freeinsert(d, newdup, needlogp);
+			xfs_dir2_data_freeremove(hdr, dfp, needlogp);
+			dfp = xfs_dir2_data_freeinsert(hdr, newdup, needlogp);
 			ASSERT(dfp != NULL);
 			ASSERT(dfp->length == newdup->length);
-			ASSERT(be16_to_cpu(dfp->offset) == (char *)newdup - (char *)d);
+			ASSERT(be16_to_cpu(dfp->offset) == (char *)newdup - (char *)hdr);
 			/*
 			 * If we got inserted at the last slot,
 			 * that means we don't know if there was a better
 			 * choice for the last slot, or not.  Rescan.
 			 */
-			needscan = dfp == &d->hdr.bestfree[2];
+			needscan = dfp == &hdr->bestfree[2];
 		}
 	}
 	/*
@@ -773,25 +786,25 @@ xfs_dir2_data_use_free(
 	 */
 	else if (matchback) {
 		newdup = dup;
-		newdup->length = cpu_to_be16(((char *)d + offset) - (char *)newdup);
+		newdup->length = cpu_to_be16(((char *)hdr + offset) - (char *)newdup);
 		*xfs_dir2_data_unused_tag_p(newdup) =
-			cpu_to_be16((char *)newdup - (char *)d);
+			cpu_to_be16((char *)newdup - (char *)hdr);
 		xfs_dir2_data_log_unused(tp, bp, newdup);
 		/*
 		 * If it was in the table, remove it and add the new one.
 		 */
 		if (dfp) {
-			xfs_dir2_data_freeremove(d, dfp, needlogp);
-			dfp = xfs_dir2_data_freeinsert(d, newdup, needlogp);
+			xfs_dir2_data_freeremove(hdr, dfp, needlogp);
+			dfp = xfs_dir2_data_freeinsert(hdr, newdup, needlogp);
 			ASSERT(dfp != NULL);
 			ASSERT(dfp->length == newdup->length);
-			ASSERT(be16_to_cpu(dfp->offset) == (char *)newdup - (char *)d);
+			ASSERT(be16_to_cpu(dfp->offset) == (char *)newdup - (char *)hdr);
 			/*
 			 * If we got inserted at the last slot,
 			 * that means we don't know if there was a better
 			 * choice for the last slot, or not.  Rescan.
 			 */
-			needscan = dfp == &d->hdr.bestfree[2];
+			needscan = dfp == &hdr->bestfree[2];
 		}
 	}
 	/*
@@ -800,15 +813,15 @@ xfs_dir2_data_use_free(
 	 */
 	else {
 		newdup = dup;
-		newdup->length = cpu_to_be16(((char *)d + offset) - (char *)newdup);
+		newdup->length = cpu_to_be16(((char *)hdr + offset) - (char *)newdup);
 		*xfs_dir2_data_unused_tag_p(newdup) =
-			cpu_to_be16((char *)newdup - (char *)d);
+			cpu_to_be16((char *)newdup - (char *)hdr);
 		xfs_dir2_data_log_unused(tp, bp, newdup);
-		newdup2 = (xfs_dir2_data_unused_t *)((char *)d + offset + len);
+		newdup2 = (xfs_dir2_data_unused_t *)((char *)hdr + offset + len);
 		newdup2->freetag = cpu_to_be16(XFS_DIR2_DATA_FREE_TAG);
 		newdup2->length = cpu_to_be16(oldlen - len - be16_to_cpu(newdup->length));
 		*xfs_dir2_data_unused_tag_p(newdup2) =
-			cpu_to_be16((char *)newdup2 - (char *)d);
+			cpu_to_be16((char *)newdup2 - (char *)hdr);
 		xfs_dir2_data_log_unused(tp, bp, newdup2);
 		/*
 		 * If the old entry was in the table, we need to scan
@@ -819,12 +832,12 @@ xfs_dir2_data_use_free(
 		 * the 2 new will work.
 		 */
 		if (dfp) {
-			needscan = (d->hdr.bestfree[2].length != 0);
+			needscan = (hdr->bestfree[2].length != 0);
 			if (!needscan) {
-				xfs_dir2_data_freeremove(d, dfp, needlogp);
-				(void)xfs_dir2_data_freeinsert(d, newdup,
+				xfs_dir2_data_freeremove(hdr, dfp, needlogp);
+				(void)xfs_dir2_data_freeinsert(hdr, newdup,
 					needlogp);
-				(void)xfs_dir2_data_freeinsert(d, newdup2,
+				(void)xfs_dir2_data_freeinsert(hdr, newdup2,
 					needlogp);
 			}
 		}
Index: xfs/fs/xfs/xfs_dir2_data.h
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_data.h	2011-06-29 19:45:24.043633305 +0200
+++ xfs/fs/xfs/xfs_dir2_data.h	2011-06-30 09:38:40.136734154 +0200
@@ -157,12 +157,10 @@ extern void xfs_dir2_data_check(struct x
 #else
 #define	xfs_dir2_data_check(dp,bp)
 #endif
-extern xfs_dir2_data_free_t *xfs_dir2_data_freefind(xfs_dir2_data_t *d,
-				xfs_dir2_data_unused_t *dup);
-extern xfs_dir2_data_free_t *xfs_dir2_data_freeinsert(xfs_dir2_data_t *d,
+extern xfs_dir2_data_free_t *xfs_dir2_data_freeinsert(xfs_dir2_data_hdr_t *hdr,
 				xfs_dir2_data_unused_t *dup, int *loghead);
-extern void xfs_dir2_data_freescan(struct xfs_mount *mp, xfs_dir2_data_t *d,
-				int *loghead);
+extern void xfs_dir2_data_freescan(struct xfs_mount *mp,
+				xfs_dir2_data_hdr_t *hdr, int *loghead);
 extern int xfs_dir2_data_init(struct xfs_da_args *args, xfs_dir2_db_t blkno,
 				struct xfs_dabuf **bpp);
 extern void xfs_dir2_data_log_entry(struct xfs_trans *tp, struct xfs_dabuf *bp,
Index: xfs/fs/xfs/xfs_dir2_leaf.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_leaf.c	2011-06-30 09:38:36.590067529 +0200
+++ xfs/fs/xfs/xfs_dir2_leaf.c	2011-06-30 09:38:40.140067486 +0200
@@ -132,7 +132,7 @@ xfs_dir2_block_to_leaf(
 	 */
 	hdr->magic = cpu_to_be32(XFS_DIR2_DATA_MAGIC);
 	if (needscan)
-		xfs_dir2_data_freescan(mp, (xfs_dir2_data_t *)hdr, &needlog);
+		xfs_dir2_data_freescan(mp, hdr, &needlog);
 	/*
 	 * Set up leaf tail and bests table.
 	 */
@@ -278,7 +278,7 @@ xfs_dir2_leaf_addname(
 {
 	__be16			*bestsp;	/* freespace table in leaf */
 	int			compact;	/* need to compact leaves */
-	xfs_dir2_data_t		*data;		/* data block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* data block header */
 	xfs_dabuf_t		*dbp;		/* data block buffer */
 	xfs_dir2_data_entry_t	*dep;		/* data block entry */
 	xfs_inode_t		*dp;		/* incore directory inode */
@@ -486,8 +486,8 @@ xfs_dir2_leaf_addname(
 		 */
 		else
 			xfs_dir2_leaf_log_bests(tp, lbp, use_block, use_block);
-		data = dbp->data;
-		bestsp[use_block] = data->hdr.bestfree[0].length;
+		hdr = dbp->data;
+		bestsp[use_block] = hdr->bestfree[0].length;
 		grown = 1;
 	}
 	/*
@@ -501,7 +501,7 @@ xfs_dir2_leaf_addname(
 			xfs_da_brelse(tp, lbp);
 			return error;
 		}
-		data = dbp->data;
+		hdr = dbp->data;
 		grown = 0;
 	}
 	xfs_dir2_data_check(dp, dbp);
@@ -509,14 +509,14 @@ xfs_dir2_leaf_addname(
 	 * Point to the biggest freespace in our data block.
 	 */
 	dup = (xfs_dir2_data_unused_t *)
-	      ((char *)data + be16_to_cpu(data->hdr.bestfree[0].offset));
+	      ((char *)hdr + be16_to_cpu(hdr->bestfree[0].offset));
 	ASSERT(be16_to_cpu(dup->length) >= length);
 	needscan = needlog = 0;
 	/*
 	 * Mark the initial part of our freespace in use for the new entry.
 	 */
 	xfs_dir2_data_use_free(tp, dbp, dup,
-		(xfs_dir2_data_aoff_t)((char *)dup - (char *)data), length,
+		(xfs_dir2_data_aoff_t)((char *)dup - (char *)hdr), length,
 		&needlog, &needscan);
 	/*
 	 * Initialize our new entry (at last).
@@ -526,12 +526,12 @@ xfs_dir2_leaf_addname(
 	dep->namelen = args->namelen;
 	memcpy(dep->name, args->name, dep->namelen);
 	tagp = xfs_dir2_data_entry_tag_p(dep);
-	*tagp = cpu_to_be16((char *)dep - (char *)data);
+	*tagp = cpu_to_be16((char *)dep - (char *)hdr);
 	/*
 	 * Need to scan fix up the bestfree table.
 	 */
 	if (needscan)
-		xfs_dir2_data_freescan(mp, data, &needlog);
+		xfs_dir2_data_freescan(mp, hdr, &needlog);
 	/*
 	 * Need to log the data block's header.
 	 */
@@ -542,8 +542,8 @@ xfs_dir2_leaf_addname(
 	 * If the bests table needs to be changed, do it.
 	 * Log the change unless we've already done that.
 	 */
-	if (be16_to_cpu(bestsp[use_block]) != be16_to_cpu(data->hdr.bestfree[0].length)) {
-		bestsp[use_block] = data->hdr.bestfree[0].length;
+	if (be16_to_cpu(bestsp[use_block]) != be16_to_cpu(hdr->bestfree[0].length)) {
+		bestsp[use_block] = hdr->bestfree[0].length;
 		if (!grown)
 			xfs_dir2_leaf_log_bests(tp, lbp, use_block, use_block);
 	}
@@ -786,6 +786,7 @@ xfs_dir2_leaf_getdents(
 	xfs_dir2_db_t		curdb;		/* db for current block */
 	xfs_dir2_off_t		curoff;		/* current overall offset */
 	xfs_dir2_data_t		*data;		/* data block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* data block header */
 	xfs_dir2_data_entry_t	*dep;		/* data entry */
 	xfs_dir2_data_unused_t	*dup;		/* unused entry */
 	int			error = 0;	/* error return value */
@@ -1044,6 +1045,7 @@ xfs_dir2_leaf_getdents(
 				ASSERT(xfs_dir2_byte_to_db(mp, curoff) ==
 				       curdb);
 			data = bp->data;
+			hdr = &data->hdr;
 			xfs_dir2_data_check(dp, bp);
 			/*
 			 * Find our position in the block.
@@ -1054,12 +1056,12 @@ xfs_dir2_leaf_getdents(
 			 * Skip past the header.
 			 */
 			if (byteoff == 0)
-				curoff += (uint)sizeof(data->hdr);
+				curoff += (uint)sizeof(*hdr);
 			/*
 			 * Skip past entries until we reach our offset.
 			 */
 			else {
-				while ((char *)ptr - (char *)data < byteoff) {
+				while ((char *)ptr - (char *)hdr < byteoff) {
 					dup = (xfs_dir2_data_unused_t *)ptr;
 
 					if (be16_to_cpu(dup->freetag)
@@ -1080,8 +1082,8 @@ xfs_dir2_leaf_getdents(
 				curoff =
 					xfs_dir2_db_off_to_byte(mp,
 					    xfs_dir2_byte_to_db(mp, curoff),
-					    (char *)ptr - (char *)data);
-				if (ptr >= (char *)data + mp->m_dirblksize) {
+					    (char *)ptr - (char *)hdr);
+				if (ptr >= (char *)hdr + mp->m_dirblksize) {
 					continue;
 				}
 			}
@@ -1462,7 +1464,7 @@ xfs_dir2_leaf_removename(
 	xfs_da_args_t		*args)		/* operation arguments */
 {
 	__be16			*bestsp;	/* leaf block best freespace */
-	xfs_dir2_data_t		*data;		/* data block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* data block header */
 	xfs_dir2_db_t		db;		/* data block number */
 	xfs_dabuf_t		*dbp;		/* data block buffer */
 	xfs_dir2_data_entry_t	*dep;		/* data entry structure */
@@ -1492,7 +1494,7 @@ xfs_dir2_leaf_removename(
 	tp = args->trans;
 	mp = dp->i_mount;
 	leaf = lbp->data;
-	data = dbp->data;
+	hdr = dbp->data;
 	xfs_dir2_data_check(dp, dbp);
 	/*
 	 * Point to the leaf entry, use that to point to the data entry.
@@ -1500,9 +1502,9 @@ xfs_dir2_leaf_removename(
 	lep = &leaf->ents[index];
 	db = xfs_dir2_dataptr_to_db(mp, be32_to_cpu(lep->address));
 	dep = (xfs_dir2_data_entry_t *)
-	      ((char *)data + xfs_dir2_dataptr_to_off(mp, be32_to_cpu(lep->address)));
+	      ((char *)hdr + xfs_dir2_dataptr_to_off(mp, be32_to_cpu(lep->address)));
 	needscan = needlog = 0;
-	oldbest = be16_to_cpu(data->hdr.bestfree[0].length);
+	oldbest = be16_to_cpu(hdr->bestfree[0].length);
 	ltp = xfs_dir2_leaf_tail_p(mp, leaf);
 	bestsp = xfs_dir2_leaf_bests_p(ltp);
 	ASSERT(be16_to_cpu(bestsp[db]) == oldbest);
@@ -1510,7 +1512,7 @@ xfs_dir2_leaf_removename(
 	 * Mark the former data entry unused.
 	 */
 	xfs_dir2_data_make_free(tp, dbp,
-		(xfs_dir2_data_aoff_t)((char *)dep - (char *)data),
+		(xfs_dir2_data_aoff_t)((char *)dep - (char *)hdr),
 		xfs_dir2_data_entsize(dep->namelen), &needlog, &needscan);
 	/*
 	 * We just mark the leaf entry stale by putting a null in it.
@@ -1524,23 +1526,23 @@ xfs_dir2_leaf_removename(
 	 * log the data block header if necessary.
 	 */
 	if (needscan)
-		xfs_dir2_data_freescan(mp, data, &needlog);
+		xfs_dir2_data_freescan(mp, hdr, &needlog);
 	if (needlog)
 		xfs_dir2_data_log_header(tp, dbp);
 	/*
 	 * If the longest freespace in the data block has changed,
 	 * put the new value in the bests table and log that.
 	 */
-	if (be16_to_cpu(data->hdr.bestfree[0].length) != oldbest) {
-		bestsp[db] = data->hdr.bestfree[0].length;
+	if (be16_to_cpu(hdr->bestfree[0].length) != oldbest) {
+		bestsp[db] = hdr->bestfree[0].length;
 		xfs_dir2_leaf_log_bests(tp, lbp, db, db);
 	}
 	xfs_dir2_data_check(dp, dbp);
 	/*
 	 * If the data block is now empty then get rid of the data block.
 	 */
-	if (be16_to_cpu(data->hdr.bestfree[0].length) ==
-	    mp->m_dirblksize - (uint)sizeof(data->hdr)) {
+	if (be16_to_cpu(hdr->bestfree[0].length) ==
+	    mp->m_dirblksize - (uint)sizeof(*hdr)) {
 		ASSERT(db != mp->m_dirdatablk);
 		if ((error = xfs_dir2_shrink_inode(args, db, dbp))) {
 			/*
@@ -1711,9 +1713,6 @@ xfs_dir2_leaf_trim_data(
 	xfs_dir2_db_t		db)		/* data block number */
 {
 	__be16			*bestsp;	/* leaf bests table */
-#ifdef DEBUG
-	xfs_dir2_data_t		*data;		/* data block structure */
-#endif
 	xfs_dabuf_t		*dbp;		/* data block buffer */
 	xfs_inode_t		*dp;		/* incore directory inode */
 	int			error;		/* error return value */
@@ -1732,20 +1731,21 @@ xfs_dir2_leaf_trim_data(
 			XFS_DATA_FORK))) {
 		return error;
 	}
-#ifdef DEBUG
-	data = dbp->data;
-	ASSERT(be32_to_cpu(data->hdr.magic) == XFS_DIR2_DATA_MAGIC);
-#endif
-	/* this seems to be an error
-	 * data is only valid if DEBUG is defined?
-	 * RMC 09/08/1999
-	 */
 
 	leaf = lbp->data;
 	ltp = xfs_dir2_leaf_tail_p(mp, leaf);
-	ASSERT(be16_to_cpu(data->hdr.bestfree[0].length) ==
-	       mp->m_dirblksize - (uint)sizeof(data->hdr));
+
+#ifdef DEBUG
+{
+	struct xfs_dir2_data_hdr *hdr = dbp->data;
+
+	ASSERT(be32_to_cpu(hdr->magic) == XFS_DIR2_DATA_MAGIC);
+	ASSERT(be16_to_cpu(hdr->bestfree[0].length) ==
+	       mp->m_dirblksize - (uint)sizeof(*hdr));
 	ASSERT(db == be32_to_cpu(ltp->bestcount) - 1);
+}
+#endif
+
 	/*
 	 * Get rid of the data block.
 	 */
Index: xfs/fs/xfs/xfs_dir2_node.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_node.c	2011-06-30 09:27:19.103409194 +0200
+++ xfs/fs/xfs/xfs_dir2_node.c	2011-06-30 09:38:40.143400818 +0200
@@ -842,7 +842,7 @@ xfs_dir2_leafn_remove(
 	xfs_da_state_blk_t	*dblk,		/* data block */
 	int			*rval)		/* resulting block needs join */
 {
-	xfs_dir2_data_t		*data;		/* data block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* data block header */
 	xfs_dir2_db_t		db;		/* data block number */
 	xfs_dabuf_t		*dbp;		/* data block buffer */
 	xfs_dir2_data_entry_t	*dep;		/* data block entry */
@@ -887,9 +887,9 @@ xfs_dir2_leafn_remove(
 	 * in the data block in case it changes.
 	 */
 	dbp = dblk->bp;
-	data = dbp->data;
-	dep = (xfs_dir2_data_entry_t *)((char *)data + off);
-	longest = be16_to_cpu(data->hdr.bestfree[0].length);
+	hdr = dbp->data;
+	dep = (xfs_dir2_data_entry_t *)((char *)hdr + off);
+	longest = be16_to_cpu(hdr->bestfree[0].length);
 	needlog = needscan = 0;
 	xfs_dir2_data_make_free(tp, dbp, off,
 		xfs_dir2_data_entsize(dep->namelen), &needlog, &needscan);
@@ -898,7 +898,7 @@ xfs_dir2_leafn_remove(
 	 * Log the data block header if needed.
 	 */
 	if (needscan)
-		xfs_dir2_data_freescan(mp, data, &needlog);
+		xfs_dir2_data_freescan(mp, hdr, &needlog);
 	if (needlog)
 		xfs_dir2_data_log_header(tp, dbp);
 	xfs_dir2_data_check(dp, dbp);
@@ -906,7 +906,7 @@ xfs_dir2_leafn_remove(
 	 * If the longest data block freespace changes, need to update
 	 * the corresponding freeblock entry.
 	 */
-	if (longest < be16_to_cpu(data->hdr.bestfree[0].length)) {
+	if (longest < be16_to_cpu(hdr->bestfree[0].length)) {
 		int		error;		/* error return value */
 		xfs_dabuf_t	*fbp;		/* freeblock buffer */
 		xfs_dir2_db_t	fdb;		/* freeblock block number */
@@ -932,19 +932,19 @@ xfs_dir2_leafn_remove(
 		 * Calculate which entry we need to fix.
 		 */
 		findex = xfs_dir2_db_to_fdindex(mp, db);
-		longest = be16_to_cpu(data->hdr.bestfree[0].length);
+		longest = be16_to_cpu(hdr->bestfree[0].length);
 		/*
 		 * If the data block is now empty we can get rid of it
 		 * (usually).
 		 */
-		if (longest == mp->m_dirblksize - (uint)sizeof(data->hdr)) {
+		if (longest == mp->m_dirblksize - (uint)sizeof(*hdr)) {
 			/*
 			 * Try to punch out the data block.
 			 */
 			error = xfs_dir2_shrink_inode(args, db, dbp);
 			if (error == 0) {
 				dblk->bp = NULL;
-				data = NULL;
+				hdr = NULL;
 			}
 			/*
 			 * We can get ENOSPC if there's no space reservation.
@@ -960,7 +960,7 @@ xfs_dir2_leafn_remove(
 		 * If we got rid of the data block, we can eliminate that entry
 		 * in the free block.
 		 */
-		if (data == NULL) {
+		if (hdr == NULL) {
 			/*
 			 * One less used entry in the free table.
 			 */
@@ -1356,7 +1356,7 @@ xfs_dir2_node_addname_int(
 	xfs_da_args_t		*args,		/* operation arguments */
 	xfs_da_state_blk_t	*fblk)		/* optional freespace block */
 {
-	xfs_dir2_data_t		*data;		/* data block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* data block header */
 	xfs_dir2_db_t		dbno;		/* data block number */
 	xfs_dabuf_t		*dbp;		/* data block buffer */
 	xfs_dir2_data_entry_t	*dep;		/* data entry pointer */
@@ -1641,8 +1641,8 @@ xfs_dir2_node_addname_int(
 		 * We haven't allocated the data entry yet so this will
 		 * change again.
 		 */
-		data = dbp->data;
-		free->bests[findex] = data->hdr.bestfree[0].length;
+		hdr = dbp->data;
+		free->bests[findex] = hdr->bestfree[0].length;
 		logfree = 1;
 	}
 	/*
@@ -1667,21 +1667,21 @@ xfs_dir2_node_addname_int(
 				xfs_da_buf_done(fbp);
 			return error;
 		}
-		data = dbp->data;
+		hdr = dbp->data;
 		logfree = 0;
 	}
-	ASSERT(be16_to_cpu(data->hdr.bestfree[0].length) >= length);
+	ASSERT(be16_to_cpu(hdr->bestfree[0].length) >= length);
 	/*
 	 * Point to the existing unused space.
 	 */
 	dup = (xfs_dir2_data_unused_t *)
-	      ((char *)data + be16_to_cpu(data->hdr.bestfree[0].offset));
+	      ((char *)hdr + be16_to_cpu(hdr->bestfree[0].offset));
 	needscan = needlog = 0;
 	/*
 	 * Mark the first part of the unused space, inuse for us.
 	 */
 	xfs_dir2_data_use_free(tp, dbp, dup,
-		(xfs_dir2_data_aoff_t)((char *)dup - (char *)data), length,
+		(xfs_dir2_data_aoff_t)((char *)dup - (char *)hdr), length,
 		&needlog, &needscan);
 	/*
 	 * Fill in the new entry and log it.
@@ -1691,13 +1691,13 @@ xfs_dir2_node_addname_int(
 	dep->namelen = args->namelen;
 	memcpy(dep->name, args->name, dep->namelen);
 	tagp = xfs_dir2_data_entry_tag_p(dep);
-	*tagp = cpu_to_be16((char *)dep - (char *)data);
+	*tagp = cpu_to_be16((char *)dep - (char *)hdr);
 	xfs_dir2_data_log_entry(tp, dbp, dep);
 	/*
 	 * Rescan the block for bestfree if needed.
 	 */
 	if (needscan)
-		xfs_dir2_data_freescan(mp, data, &needlog);
+		xfs_dir2_data_freescan(mp, hdr, &needlog);
 	/*
 	 * Log the data block header if needed.
 	 */
@@ -1706,8 +1706,8 @@ xfs_dir2_node_addname_int(
 	/*
 	 * If the freespace entry is now wrong, update it.
 	 */
-	if (be16_to_cpu(free->bests[findex]) != be16_to_cpu(data->hdr.bestfree[0].length)) {
-		free->bests[findex] = data->hdr.bestfree[0].length;
+	if (be16_to_cpu(free->bests[findex]) != be16_to_cpu(hdr->bestfree[0].length)) {
+		free->bests[findex] = hdr->bestfree[0].length;
 		logfree = 1;
 	}
 	/*
@@ -1857,7 +1857,7 @@ xfs_dir2_node_replace(
 	xfs_da_args_t		*args)		/* operation arguments */
 {
 	xfs_da_state_blk_t	*blk;		/* leaf block */
-	xfs_dir2_data_t		*data;		/* data block structure */
+	xfs_dir2_data_hdr_t	*hdr;		/* data block header */
 	xfs_dir2_data_entry_t	*dep;		/* data entry changed */
 	int			error;		/* error return value */
 	int			i;		/* btree level */
@@ -1901,10 +1901,10 @@ xfs_dir2_node_replace(
 		/*
 		 * Point to the data entry.
 		 */
-		data = state->extrablk.bp->data;
-		ASSERT(be32_to_cpu(data->hdr.magic) == XFS_DIR2_DATA_MAGIC);
+		hdr = state->extrablk.bp->data;
+		ASSERT(be32_to_cpu(hdr->magic) == XFS_DIR2_DATA_MAGIC);
 		dep = (xfs_dir2_data_entry_t *)
-		      ((char *)data +
+		      ((char *)hdr +
 		       xfs_dir2_dataptr_to_off(state->mp, be32_to_cpu(lep->address)));
 		ASSERT(inum != be64_to_cpu(dep->inumber));
 		/*

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 19/27] xfs: kill struct xfs_dir2_data
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (16 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 18/27] xfs: avoid usage of struct xfs_dir2_data Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  3:05   ` Dave Chinner
  2011-07-06  3:38   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 20/27] xfs: cleanup the defintion of struct xfs_dir2_data_entry Christoph Hellwig
                   ` (8 subsequent siblings)
  26 siblings, 2 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-kill-xfs_dir2_data_t --]
[-- Type: text/plain, Size: 5902 bytes --]

Remove the confusing xfs_dir2_data structure.  It is supposed to describe
an XFS dir2 data btree block, but because almost all elements in it are
variable sized it can't actually do anything close to that job.  In
addition to accessing the fixed offset header structure it was only used
to get a pointer to the first dir or unused entry after it, which can be
trivially replaced by pointer arithmetic on the header pointer.  For most
users that is actually more natural anyway, as they don't use a typed
pointer but rather a character pointer for further arithmetic.
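
As a rough userspace illustration of the pointer arithmetic described
above (not part of the patch): the structure and function names below
are hypothetical stand-ins, and the header layout is only assumed to be
fixed size.  The first entry is taken to start directly behind the
header, so "hdr + 1" yields its address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for xfs_dir2_data_hdr_t; layout is illustrative. */
struct demo_free {
	uint16_t		offset;
	uint16_t		length;
};

struct demo_data_hdr {
	uint32_t		magic;
	struct demo_free	bestfree[3];
};

/* First byte after the fixed-size header, where the first entry starts. */
static inline char *
demo_first_entry(struct demo_data_hdr *hdr)
{
	assert(hdr != NULL);
	return (char *)(hdr + 1);
}

/* Byte offset of an entry relative to the start of the block. */
static inline size_t
demo_entry_offset(struct demo_data_hdr *hdr, const char *entry)
{
	return (size_t)(entry - (const char *)hdr);
}
```

Callers that walk the block with a character pointer can start from
demo_first_entry() and never need a typed view of the whole block.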

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/xfs_dir2_data.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_data.c	2011-06-30 09:38:40.133400821 +0200
+++ xfs/fs/xfs/xfs_dir2_data.c	2011-06-30 09:38:41.643400800 +0200
@@ -53,7 +53,6 @@ xfs_dir2_data_check(
 	xfs_dir2_data_free_t	*bf;		/* bestfree table */
 	xfs_dir2_block_tail_t	*btp=NULL;	/* block tail */
 	int			count;		/* count of entries found */
-	xfs_dir2_data_t		*d;		/* data block pointer */
 	xfs_dir2_data_hdr_t	*hdr;		/* data block header */
 	xfs_dir2_data_entry_t	*dep;		/* data entry */
 	xfs_dir2_data_free_t	*dfp;		/* bestfree entry */
@@ -70,10 +69,9 @@ xfs_dir2_data_check(
 	struct xfs_name		name;
 
 	mp = dp->i_mount;
-	d = bp->data;
-	hdr = &d->hdr;
+	hdr = bp->data;
 	bf = hdr->bestfree;
-	p = (char *)d->u;
+	p = (char *)(hdr + 1);
 
 	if (hdr->magic == cpu_to_be32(XFS_DIR2_BLOCK_MAGIC)) {
 		btp = xfs_dir2_block_tail_p(mp, hdr);
@@ -336,7 +334,6 @@ xfs_dir2_data_freescan(
 	xfs_dir2_data_hdr_t	*hdr,		/* data block header */
 	int			*loghead)	/* out: log data header */
 {
-	xfs_dir2_data_t		*d = (xfs_dir2_data_t *)hdr;
 	xfs_dir2_block_tail_t	*btp;		/* block tail */
 	xfs_dir2_data_entry_t	*dep;		/* active data entry */
 	xfs_dir2_data_unused_t	*dup;		/* unused data entry */
@@ -355,7 +352,7 @@ xfs_dir2_data_freescan(
 	/*
 	 * Set up pointers.
 	 */
-	p = (char *)d->u;
+	p = (char *)(hdr + 1);
 	if (be32_to_cpu(hdr->magic) == XFS_DIR2_BLOCK_MAGIC) {
 		btp = xfs_dir2_block_tail_p(mp, hdr);
 		endp = (char *)xfs_dir2_block_leaf_p(btp);
@@ -398,7 +395,6 @@ xfs_dir2_data_init(
 	xfs_dabuf_t		**bpp)		/* output block buffer */
 {
 	xfs_dabuf_t		*bp;		/* block buffer */
-	xfs_dir2_data_t		*d;		/* pointer to block */
 	xfs_dir2_data_hdr_t	*hdr;		/* data block header */
 	xfs_inode_t		*dp;		/* incore directory inode */
 	xfs_dir2_data_unused_t	*dup;		/* unused entry pointer */
@@ -424,8 +420,7 @@ xfs_dir2_data_init(
 	/*
 	 * Initialize the header.
 	 */
-	d = bp->data;
-	hdr = &d->hdr;
+	hdr = bp->data;
 	hdr->magic = cpu_to_be32(XFS_DIR2_DATA_MAGIC);
 	hdr->bestfree[0].offset = cpu_to_be16(sizeof(*hdr));
 	for (i = 1; i < XFS_DIR2_DATA_FD_COUNT; i++) {
@@ -436,7 +431,7 @@ xfs_dir2_data_init(
 	/*
 	 * Set up an unused entry for the block's body.
 	 */
-	dup = &d->u[0].unused;
+	dup = (xfs_dir2_data_unused_t *)(hdr + 1);
 	dup->freetag = cpu_to_be16(XFS_DIR2_DATA_FREE_TAG);
 
 	t = mp->m_dirblksize - (uint)sizeof(*hdr);
Index: xfs/fs/xfs/xfs_dir2_data.h
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_data.h	2011-06-30 09:38:40.136734154 +0200
+++ xfs/fs/xfs/xfs_dir2_data.h	2011-06-30 09:38:41.646734133 +0200
@@ -20,6 +20,22 @@
 
 /*
  * Directory format 2, data block structures.
+ *
+ * A pure data block looks like the following drawing on disk:
+ *
+ *    +-------------------------------------------------+
+ *    | xfs_dir2_data_hdr_t                             |
+ *    +-------------------------------------------------+
+ *    | xfs_dir2_data_entry_t OR xfs_dir2_data_unused_t |
+ *    | xfs_dir2_data_entry_t OR xfs_dir2_data_unused_t |
+ *    | xfs_dir2_data_entry_t OR xfs_dir2_data_unused_t |
+ *    | ...                                             |
+ *    +-------------------------------------------------+
+ *    | unused space                                    |
+ *    +-------------------------------------------------+
+ *
+ * As all the entries are variable sized structures the accessors in this
+ * file need to be used to iterate over them.
  */
 
 struct xfs_dabuf;
@@ -103,23 +119,6 @@ typedef struct xfs_dir2_data_unused {
 	__be16			tag;		/* starting offset of us */
 } xfs_dir2_data_unused_t;
 
-typedef union {
-	xfs_dir2_data_entry_t	entry;
-	xfs_dir2_data_unused_t	unused;
-} xfs_dir2_data_union_t;
-
-/*
- * Generic data block structure, for xfs_db.
- */
-typedef struct xfs_dir2_data {
-	xfs_dir2_data_hdr_t	hdr;		/* magic XFS_DIR2_DATA_MAGIC */
-	xfs_dir2_data_union_t	u[1];
-} xfs_dir2_data_t;
-
-/*
- * Macros.
- */
-
 /*
  * Size of a data entry.
  */
Index: xfs/fs/xfs/xfs_dir2_leaf.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_leaf.c	2011-06-30 09:38:40.140067486 +0200
+++ xfs/fs/xfs/xfs_dir2_leaf.c	2011-06-30 09:38:41.646734133 +0200
@@ -785,7 +785,6 @@ xfs_dir2_leaf_getdents(
 	int			byteoff;	/* offset in current block */
 	xfs_dir2_db_t		curdb;		/* db for current block */
 	xfs_dir2_off_t		curoff;		/* current overall offset */
-	xfs_dir2_data_t		*data;		/* data block structure */
 	xfs_dir2_data_hdr_t	*hdr;		/* data block header */
 	xfs_dir2_data_entry_t	*dep;		/* data entry */
 	xfs_dir2_data_unused_t	*dup;		/* unused entry */
@@ -1044,13 +1043,12 @@ xfs_dir2_leaf_getdents(
 			else if (curoff > newoff)
 				ASSERT(xfs_dir2_byte_to_db(mp, curoff) ==
 				       curdb);
-			data = bp->data;
-			hdr = &data->hdr;
+			hdr = bp->data;
 			xfs_dir2_data_check(dp, bp);
 			/*
 			 * Find our position in the block.
 			 */
-			ptr = (char *)&data->u;
+			ptr = (char *)(hdr + 1);
 			byteoff = xfs_dir2_byte_to_off(mp, curoff);
 			/*
 			 * Skip past the header.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 20/27] xfs: cleanup the defintion of struct xfs_dir2_data_entry
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (17 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 19/27] xfs: kill " Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  3:06   ` Dave Chinner
  2011-07-06  3:44   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 21/27] xfs: cleanup struct xfs_dir2_leaf Christoph Hellwig
                   ` (7 subsequent siblings)
  26 siblings, 2 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-cleanup-xfs_dir2_data_entry --]
[-- Type: text/plain, Size: 1202 bytes --]

Remove the tag member, which is at a variable offset after the actual
name, and make name a real C99 flexible array member instead of the
incorrect one-element array, which confuses gcc (and not only gcc).
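
A minimal userspace sketch of how a trailing tag can be located once it
is no longer a struct member (not part of the patch).  All names here
are hypothetical; the sketch assumes entries are padded to 8 bytes and
that the 2 byte tag occupies the last 2 bytes of the padded entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ALIGN	8	/* assumed entry alignment in bytes */

/* Hypothetical stand-in for the cleaned-up data entry structure. */
struct demo_entry {
	uint64_t	inumber;	/* inode number */
	uint8_t		namelen;	/* name length */
	uint8_t		name[];		/* name bytes, no null */
};

/* Entry size: fixed part, name, 2 byte tag, rounded up to DEMO_ALIGN. */
static inline size_t
demo_entsize(size_t namelen)
{
	size_t	raw = offsetof(struct demo_entry, name) + namelen +
		      sizeof(uint16_t);

	return (raw + DEMO_ALIGN - 1) & ~(size_t)(DEMO_ALIGN - 1);
}

/* Pointer to the tag, i.e. the last 2 bytes of the padded entry. */
static inline unsigned char *
demo_entry_tag_p(struct demo_entry *dep)
{
	assert(dep != NULL);
	return (unsigned char *)dep + demo_entsize(dep->namelen) -
		sizeof(uint16_t);
}
```

With a flexible array member, sizeof(struct demo_entry) no longer
pretends to cover a name byte, so size calculations have to go through
an accessor like demo_entsize().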

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/xfs_dir2_data.h
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_data.h	2011-06-29 13:42:35.521563513 +0200
+++ xfs/fs/xfs/xfs_dir2_data.h	2011-06-29 13:43:03.284746440 +0200
@@ -98,14 +98,14 @@ typedef struct xfs_dir2_data_hdr {
 
 /*
  * Active entry in a data block.  Aligned to 8 bytes.
- * Tag appears as the last 2 bytes.
+ *
+ * After the variable length name field there is a 2 byte tag field, which
+ * can be accessed using xfs_dir2_data_entry_tag_p.
  */
 typedef struct xfs_dir2_data_entry {
 	__be64			inumber;	/* inode number */
 	__u8			namelen;	/* name length */
-	__u8			name[1];	/* name bytes, no null */
-						/* variable offset */
-	__be16			tag;		/* starting offset of us */
+	__u8			name[];		/* name bytes, no null */
 } xfs_dir2_data_entry_t;
 
 /*

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 21/27] xfs: cleanup struct xfs_dir2_leaf
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (18 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 20/27] xfs: cleanup the defintion of struct xfs_dir2_data_entry Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  3:13   ` Dave Chinner
  2011-07-06  3:44   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 22/27] xfs: use generic get_unaligned_beXX helpers Christoph Hellwig
                   ` (6 subsequent siblings)
  26 siblings, 2 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-cleanup-xfs_dir2_leaf_t --]
[-- Type: text/plain, Size: 3927 bytes --]

Simplify the confusing xfs_dir2_leaf structure.  It is supposed to describe
an XFS dir2 leaf format btree block, but because almost all elements in it
are variable sized it can't actually do anything close to that job.
Remove the members that come after the first variable sized array, given
that they could only be used for sizeof expressions, which can just as well
use the underlying types directly, and make the ents array a real C99
flexible array member.

Also factor out the xfs_dir2_leaf_size helper, so that the already
convoluted sizing of a leaf block stays somewhat readable after switching
to the longer type names in the sizeof expressions.
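
The sizing described above can be sketched in userspace as follows
(not part of the patch).  The structures are hypothetical stand-ins
with assumed field layouts, and the counts are kept in host byte order
for simplicity, whereas the on-disk fields are big endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; field layouts are assumptions for the sketch. */
struct demo_leaf_hdr {
	uint32_t	forw, back;
	uint16_t	magic, pad;
	uint16_t	count;		/* entries, including stale ones */
	uint16_t	stale;		/* stale entries */
};

struct demo_leaf_entry {
	uint32_t	hashval;
	uint32_t	address;
};

struct demo_leaf_tail {
	uint32_t	bestcount;
};

typedef uint16_t demo_data_off_t;

/* Bytes needed for a leaf with the live entries and 'counts' bests. */
static inline size_t
demo_leaf_size(const struct demo_leaf_hdr *hdr, unsigned int counts)
{
	unsigned int	entries;

	assert(hdr->count >= hdr->stale);
	entries = hdr->count - hdr->stale;
	return sizeof(struct demo_leaf_hdr) +
	    entries * sizeof(struct demo_leaf_entry) +
	    counts * sizeof(demo_data_off_t) +
	    sizeof(struct demo_leaf_tail);
}

/* Non-zero if such a leaf fits into a directory block of blksize bytes. */
static inline int
demo_leaf_fits(const struct demo_leaf_hdr *hdr, unsigned int counts,
	       size_t blksize)
{
	return demo_leaf_size(hdr, counts) <= blksize;
}
```

Stale entries are subtracted because they are assumed to be compacted
away before the conversion, so only live entries need space.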

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/xfs_dir2_leaf.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_leaf.c	2011-06-30 09:38:41.646734133 +0200
+++ xfs/fs/xfs/xfs_dir2_leaf.c	2011-06-30 09:38:44.723400763 +0200
@@ -367,9 +367,12 @@ xfs_dir2_leaf_addname(
 	/*
 	 * How many bytes do we need in the leaf block?
 	 */
-	needbytes =
-		(leaf->hdr.stale ? 0 : (uint)sizeof(leaf->ents[0])) +
-		(use_block != -1 ? 0 : (uint)sizeof(leaf->bests[0]));
+	needbytes = 0;
+	if (!leaf->hdr.stale)
+		needbytes += sizeof(xfs_dir2_leaf_entry_t);
+	if (use_block == -1)
+		needbytes += sizeof(xfs_dir2_data_off_t);
+
 	/*
 	 * Now kill use_block if it refers to a missing block, so we
 	 * can use it as an indication of allocation needed.
@@ -1763,6 +1766,20 @@ xfs_dir2_leaf_trim_data(
 	return 0;
 }
 
+static inline size_t
+xfs_dir2_leaf_size(
+	struct xfs_dir2_leaf_hdr	*hdr,
+	int				counts)
+{
+	int			entries;
+
+	entries = be16_to_cpu(hdr->count) - be16_to_cpu(hdr->stale);
+	return sizeof(xfs_dir2_leaf_hdr_t) +
+	    entries * sizeof(xfs_dir2_leaf_entry_t) +
+	    counts * sizeof(xfs_dir2_data_off_t) +
+	    sizeof(xfs_dir2_leaf_tail_t);
+}
+
 /*
  * Convert node form directory to leaf form directory.
  * The root of the node form dir needs to already be a LEAFN block.
@@ -1844,18 +1861,17 @@ xfs_dir2_node_to_leaf(
 	free = fbp->data;
 	ASSERT(be32_to_cpu(free->hdr.magic) == XFS_DIR2_FREE_MAGIC);
 	ASSERT(!free->hdr.firstdb);
+
 	/*
 	 * Now see if the leafn and free data will fit in a leaf1.
 	 * If not, release the buffer and give up.
 	 */
-	if ((uint)sizeof(leaf->hdr) +
-	    (be16_to_cpu(leaf->hdr.count) - be16_to_cpu(leaf->hdr.stale)) * (uint)sizeof(leaf->ents[0]) +
-	    be32_to_cpu(free->hdr.nvalid) * (uint)sizeof(leaf->bests[0]) +
-	    (uint)sizeof(leaf->tail) >
-	    mp->m_dirblksize) {
+	if (xfs_dir2_leaf_size(&leaf->hdr, be32_to_cpu(free->hdr.nvalid)) >
+			mp->m_dirblksize) {
 		xfs_da_brelse(tp, fbp);
 		return 0;
 	}
+
 	/*
 	 * If the leaf has any stale entries in it, compress them out.
 	 * The compact routine will log the header.
@@ -1874,7 +1890,7 @@ xfs_dir2_node_to_leaf(
 	 * Set up the leaf bests table.
 	 */
 	memcpy(xfs_dir2_leaf_bests_p(ltp), free->bests,
-		be32_to_cpu(ltp->bestcount) * sizeof(leaf->bests[0]));
+		be32_to_cpu(ltp->bestcount) * sizeof(xfs_dir2_data_off_t));
 	xfs_dir2_leaf_log_bests(tp, lbp, 0, be32_to_cpu(ltp->bestcount) - 1);
 	xfs_dir2_leaf_log_tail(tp, lbp);
 	xfs_dir2_leaf_check(dp, lbp);
Index: xfs/fs/xfs/xfs_dir2_leaf.h
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_leaf.h	2011-06-30 09:18:07.263416117 +0200
+++ xfs/fs/xfs/xfs_dir2_leaf.h	2011-06-30 09:38:44.723400763 +0200
@@ -72,10 +72,7 @@ typedef struct xfs_dir2_leaf_tail {
  */
 typedef struct xfs_dir2_leaf {
 	xfs_dir2_leaf_hdr_t	hdr;		/* leaf header */
-	xfs_dir2_leaf_entry_t	ents[1];	/* entries */
-						/* ... */
-	xfs_dir2_data_off_t	bests[1];	/* best free counts */
-	xfs_dir2_leaf_tail_t	tail;		/* leaf tail */
+	xfs_dir2_leaf_entry_t	ents[];	/* entries */
 } xfs_dir2_leaf_t;
 
 /*

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 22/27] xfs: use generic get_unaligned_beXX helpers
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (19 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 21/27] xfs: cleanup struct xfs_dir2_leaf Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  3:44   ` Dave Chinner
  2011-07-06  3:47   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 23/27] xfs: remove the unused xfs_bufhash structure Christoph Hellwig
                   ` (5 subsequent siblings)
  26 siblings, 2 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-dir2-use-generic-unaligned-byteswap-macros --]
[-- Type: text/plain, Size: 7561 bytes --]

Switch the shortform directory code over to use the generic
get_unaligned_beXX helpers instead of reinventing them.  As a result
kill off xfs_arch.h and move the setting of XFS_NATIVE_HOST into
xfs_linux.h.
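
For readers without the kernel headers at hand, the following userspace
sketch shows roughly what big-endian unaligned accessors do and why the
8 byte inode read masks the top byte (not part of the patch).  The
demo_ functions are hypothetical equivalents written for illustration,
not the kernel's get_unaligned_beXX/put_unaligned_beXX implementations.

```c
#include <assert.h>
#include <stdint.h>

/* Read big-endian integers byte by byte, so any alignment is fine. */
static inline uint16_t
demo_get_unaligned_be16(const void *p)
{
	const uint8_t	*b = p;

	return (uint16_t)(((uint16_t)b[0] << 8) | b[1]);
}

static inline uint32_t
demo_get_unaligned_be32(const void *p)
{
	const uint8_t	*b = p;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

static inline uint64_t
demo_get_unaligned_be64(const void *p)
{
	const uint8_t	*b = p;

	return ((uint64_t)demo_get_unaligned_be32(b) << 32) |
	       demo_get_unaligned_be32(b + 4);
}

static inline void
demo_put_unaligned_be64(uint64_t val, void *p)
{
	uint8_t		*b = p;
	int		i;

	for (i = 7; i >= 0; i--) {
		b[i] = (uint8_t)(val & 0xff);
		val >>= 8;
	}
}

/* 8 byte shortform inode number: the most significant byte is unused. */
static inline uint64_t
demo_sf_get_ino8(const void *p)
{
	return demo_get_unaligned_be64(p) & 0x00ffffffffffffffULL;
}

static inline void
demo_sf_put_ino8(uint64_t ino, void *p)
{
	assert((ino & 0xff00000000000000ULL) == 0);
	demo_put_unaligned_be64(ino, p);
}
```

Masking on read keeps the result identical to the removed
XFS_GET_DIR_INO8 macro, which ignored the first byte of the array.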

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/linux-2.6/xfs_linux.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_linux.h	2011-06-30 20:22:44.849587371 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_linux.h	2011-06-30 20:40:52.116240531 +0200
@@ -33,7 +33,6 @@
 #endif
 
 #include <xfs_types.h>
-#include <xfs_arch.h>
 
 #include <kmem.h>
 #include <mrlock.h>
@@ -88,6 +87,12 @@
 #include <xfs_buf.h>
 #include <xfs_message.h>
 
+#ifdef __BIG_ENDIAN
+#define XFS_NATIVE_HOST 1
+#else
+#undef XFS_NATIVE_HOST
+#endif
+
 /*
  * Feature macros (disable/enable)
  */
Index: xfs/fs/xfs/xfs_arch.h
===================================================================
--- xfs.orig/fs/xfs/xfs_arch.h	2011-06-30 20:21:42.116254819 +0200
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,136 +0,0 @@
-/*
- * Copyright (c) 2000-2002,2005 Silicon Graphics, Inc.
- * All Rights Reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it would be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write the Free Software Foundation,
- * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
- */
-#ifndef __XFS_ARCH_H__
-#define __XFS_ARCH_H__
-
-#ifndef XFS_BIG_INUMS
-# error XFS_BIG_INUMS must be defined true or false
-#endif
-
-#ifdef __KERNEL__
-
-#include <asm/byteorder.h>
-
-#ifdef __BIG_ENDIAN
-#define	XFS_NATIVE_HOST	1
-#else
-#undef XFS_NATIVE_HOST
-#endif
-
-#else /* __KERNEL__ */
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-#define	XFS_NATIVE_HOST	1
-#else
-#undef XFS_NATIVE_HOST
-#endif
-
-#ifdef XFS_NATIVE_HOST
-#define cpu_to_be16(val)	((__force __be16)(__u16)(val))
-#define cpu_to_be32(val)	((__force __be32)(__u32)(val))
-#define cpu_to_be64(val)	((__force __be64)(__u64)(val))
-#define be16_to_cpu(val)	((__force __u16)(__be16)(val))
-#define be32_to_cpu(val)	((__force __u32)(__be32)(val))
-#define be64_to_cpu(val)	((__force __u64)(__be64)(val))
-#else
-#define cpu_to_be16(val)	((__force __be16)__swab16((__u16)(val)))
-#define cpu_to_be32(val)	((__force __be32)__swab32((__u32)(val)))
-#define cpu_to_be64(val)	((__force __be64)__swab64((__u64)(val)))
-#define be16_to_cpu(val)	(__swab16((__force __u16)(__be16)(val)))
-#define be32_to_cpu(val)	(__swab32((__force __u32)(__be32)(val)))
-#define be64_to_cpu(val)	(__swab64((__force __u64)(__be64)(val)))
-#endif
-
-static inline void be16_add_cpu(__be16 *a, __s16 b)
-{
-	*a = cpu_to_be16(be16_to_cpu(*a) + b);
-}
-
-static inline void be32_add_cpu(__be32 *a, __s32 b)
-{
-	*a = cpu_to_be32(be32_to_cpu(*a) + b);
-}
-
-static inline void be64_add_cpu(__be64 *a, __s64 b)
-{
-	*a = cpu_to_be64(be64_to_cpu(*a) + b);
-}
-
-#endif	/* __KERNEL__ */
-
-/*
- * get and set integers from potentially unaligned locations
- */
-
-#define INT_GET_UNALIGNED_16_BE(pointer) \
-   ((__u16)((((__u8*)(pointer))[0] << 8) | (((__u8*)(pointer))[1])))
-#define INT_SET_UNALIGNED_16_BE(pointer,value) \
-    { \
-	((__u8*)(pointer))[0] = (((value) >> 8) & 0xff); \
-	((__u8*)(pointer))[1] = (((value)     ) & 0xff); \
-    }
-
-/*
- * In directories inode numbers are stored as unaligned arrays of unsigned
- * 8bit integers on disk.
- *
- * For v1 directories or v2 directories that contain inode numbers that
- * do not fit into 32bit the array has eight members, but the first member
- * is always zero:
- *
- *  |unused|48-55|40-47|32-39|24-31|16-23| 8-15| 0- 7|
- *
- * For v2 directories that only contain entries with inode numbers that fit
- * into 32bits a four-member array is used:
- *
- *  |24-31|16-23| 8-15| 0- 7|
- */ 
-
-#define XFS_GET_DIR_INO4(di) \
-	(((__u32)(di).i[0] << 24) | ((di).i[1] << 16) | ((di).i[2] << 8) | ((di).i[3]))
-
-#define XFS_PUT_DIR_INO4(from, di) \
-do { \
-	(di).i[0] = (((from) & 0xff000000ULL) >> 24); \
-	(di).i[1] = (((from) & 0x00ff0000ULL) >> 16); \
-	(di).i[2] = (((from) & 0x0000ff00ULL) >> 8); \
-	(di).i[3] = ((from) & 0x000000ffULL); \
-} while (0)
-
-#define XFS_DI_HI(di) \
-	(((__u32)(di).i[1] << 16) | ((di).i[2] << 8) | ((di).i[3]))
-#define XFS_DI_LO(di) \
-	(((__u32)(di).i[4] << 24) | ((di).i[5] << 16) | ((di).i[6] << 8) | ((di).i[7]))
-
-#define XFS_GET_DIR_INO8(di)        \
-	(((xfs_ino_t)XFS_DI_LO(di) & 0xffffffffULL) | \
-	 ((xfs_ino_t)XFS_DI_HI(di) << 32))
-
-#define XFS_PUT_DIR_INO8(from, di) \
-do { \
-	(di).i[0] = 0; \
-	(di).i[1] = (((from) & 0x00ff000000000000ULL) >> 48); \
-	(di).i[2] = (((from) & 0x0000ff0000000000ULL) >> 40); \
-	(di).i[3] = (((from) & 0x000000ff00000000ULL) >> 32); \
-	(di).i[4] = (((from) & 0x00000000ff000000ULL) >> 24); \
-	(di).i[5] = (((from) & 0x0000000000ff0000ULL) >> 16); \
-	(di).i[6] = (((from) & 0x000000000000ff00ULL) >> 8); \
-	(di).i[7] = ((from) & 0x00000000000000ffULL); \
-} while (0)
-	
-#endif	/* __XFS_ARCH_H__ */
Index: xfs/fs/xfs/xfs_dir2_sf.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_sf.c	2011-06-30 20:24:26.516252776 +0200
+++ xfs/fs/xfs/xfs_dir2_sf.c	2011-06-30 20:46:45.366236141 +0200
@@ -59,11 +59,12 @@ static void xfs_dir2_sf_toino4(xfs_da_ar
 static void xfs_dir2_sf_toino8(xfs_da_args_t *args);
 #endif /* XFS_BIG_INUMS */
 
-
 /*
  * Inode numbers in short-form directories can come in two versions,
  * either 4 bytes or 8 bytes wide.  These helpers deal with the
  * two forms transparently by looking at the headers i8count field.
+ *
+ * For 64-bit inode number the most significant byte must be zero.
  */
 static xfs_ino_t
 xfs_dir2_sf_get_ino(
@@ -71,9 +72,9 @@ xfs_dir2_sf_get_ino(
 	xfs_dir2_inou_t		*from)
 {
 	if (hdr->i8count)
-		return XFS_GET_DIR_INO8(from->i8);
+		return get_unaligned_be64(&from->i8.i) & 0x00ffffffffffffffULL;
 	else
-		return XFS_GET_DIR_INO4(from->i4);
+		return get_unaligned_be32(&from->i4.i);
 }
 
 static void
@@ -82,10 +83,12 @@ xfs_dir2_sf_put_ino(
 	xfs_dir2_inou_t		*to,
 	xfs_ino_t		ino)
 {
+	ASSERT((ino & 0xff00000000000000ULL) == 0);
+
 	if (hdr->i8count)
-		XFS_PUT_DIR_INO8(ino, to->i8);
+		put_unaligned_be64(ino, &to->i8.i);
 	else
-		XFS_PUT_DIR_INO4(ino, to->i4);
+		put_unaligned_be32(ino, &to->i4.i);
 }
 
 xfs_ino_t
Index: xfs/fs/xfs/xfs_dir2_sf.h
===================================================================
--- xfs.orig/fs/xfs/xfs_dir2_sf.h	2011-06-30 20:24:08.732919663 +0200
+++ xfs/fs/xfs/xfs_dir2_sf.h	2011-06-30 20:38:37.019575543 +0200
@@ -95,13 +95,13 @@ static inline int xfs_dir2_sf_hdr_size(i
 static inline xfs_dir2_data_aoff_t
 xfs_dir2_sf_get_offset(xfs_dir2_sf_entry_t *sfep)
 {
-	return INT_GET_UNALIGNED_16_BE(&(sfep)->offset.i);
+	return get_unaligned_be16(&sfep->offset.i);
 }
 
 static inline void
 xfs_dir2_sf_put_offset(xfs_dir2_sf_entry_t *sfep, xfs_dir2_data_aoff_t off)
 {
-	INT_SET_UNALIGNED_16_BE(&(sfep)->offset.i, off);
+	put_unaligned_be16(off, &sfep->offset.i);
 }
 
 static inline int
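
For readers without a kernel tree at hand, the conversion above can be sketched in plain userspace C.  This is only an illustration under stated assumptions: get_be32/put_be32/get_be64/put_be64 are stand-ins written here for the kernel's get_unaligned_beXX/put_unaligned_beXX helpers, and sf_get_ino/sf_put_ino with their "i8" argument are hypothetical simplifications of xfs_dir2_sf_get_ino/xfs_dir2_sf_put_ino.  It shows why the 8-byte read masks off the top byte and why the write side asserts that it is zero.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the unaligned big-endian accessors. */
static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void put_be32(uint32_t v, uint8_t *p)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static uint64_t get_be64(const uint8_t *p)
{
	return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

static void put_be64(uint64_t v, uint8_t *p)
{
	put_be32((uint32_t)(v >> 32), p);
	put_be32((uint32_t)v, p + 4);
}

/* Hypothetical helper: i8 != 0 selects the 8-byte on-disk form. */
static uint64_t sf_get_ino(const uint8_t *from, int i8)
{
	if (i8)	/* the first byte is unused on disk, so ignore it */
		return get_be64(from) & 0x00ffffffffffffffULL;
	return get_be32(from);
}

/* Hypothetical helper: the most significant byte of ino must be zero. */
static void sf_put_ino(uint8_t *to, int i8, uint64_t ino)
{
	assert((ino & 0xff00000000000000ULL) == 0);

	if (i8)
		put_be64(ino, to);
	else
		put_be32((uint32_t)ino, to);
}
```

Because the accessors work byte by byte, the buffer may sit at any alignment, which is the property the removed XFS_GET_DIR_INO*/XFS_PUT_DIR_INO* macros provided by hand.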


* [PATCH 23/27] xfs: remove the unused xfs_bufhash structure
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (20 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 22/27] xfs: use generic get_unaligned_beXX helpers Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  3:44   ` Dave Chinner
  2011-07-06  3:49   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 24/27] xfs: clean up buffer locking helpers Christoph Hellwig
                   ` (4 subsequent siblings)
  26 siblings, 2 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-remove-bufhash --]
[-- Type: text/plain, Size: 686 bytes --]

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/linux-2.6/xfs_buf.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_buf.h	2011-06-29 11:26:14.542550346 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_buf.h	2011-06-29 13:50:40.648935352 +0200
@@ -91,11 +91,6 @@ typedef enum {
 	XBT_FORCE_FLUSH = 1,
 } xfs_buftarg_flags_t;
 
-typedef struct xfs_bufhash {
-	struct list_head	bh_list;
-	spinlock_t		bh_lock;
-} xfs_bufhash_t;
-
 typedef struct xfs_buftarg {
 	dev_t			bt_dev;
 	struct block_device	*bt_bdev;


* [PATCH 24/27] xfs: clean up buffer locking helpers
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (21 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 23/27] xfs: remove the unused xfs_bufhash structure Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  3:47   ` Dave Chinner
  2011-07-06  3:55   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 25/27] xfs: return the buffer locked from xfs_buf_get_uncached Christoph Hellwig
                   ` (3 subsequent siblings)
  26 siblings, 2 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-cleanup-buffer-locking --]
[-- Type: text/plain, Size: 11189 bytes --]

Rename xfs_buf_cond_lock to xfs_buf_trylock and reverse its return value to
fit most other trylock operations in the kernel and XFS (with the exception
of down_trylock, after which xfs_buf_cond_lock was modelled), and replace
xfs_buf_lock_value with an xfs_buf_islocked for use in asserts, or an
open-coded variant in tracing.  Remove the XFS_BUF_* wrappers for all the
locking helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_buf.c	2011-06-29 11:26:14.000000000 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_buf.c	2011-06-29 13:57:15.596795734 +0200
@@ -499,16 +499,14 @@ found:
 	spin_unlock(&pag->pag_buf_lock);
 	xfs_perag_put(pag);
 
-	if (xfs_buf_cond_lock(bp)) {
-		/* failed, so wait for the lock if requested. */
-		if (!(flags & XBF_TRYLOCK)) {
-			xfs_buf_lock(bp);
-			XFS_STATS_INC(xb_get_locked_waited);
-		} else {
+	if (!xfs_buf_trylock(bp)) {
+		if (flags & XBF_TRYLOCK) {
 			xfs_buf_rele(bp);
 			XFS_STATS_INC(xb_busy_locked);
 			return NULL;
 		}
+		xfs_buf_lock(bp);
+		XFS_STATS_INC(xb_get_locked_waited);
 	}
 
 	/*
@@ -896,8 +894,8 @@ xfs_buf_rele(
  *	to push on stale inode buffers.
  */
 int
-xfs_buf_cond_lock(
-	xfs_buf_t		*bp)
+xfs_buf_trylock(
+	struct xfs_buf		*bp)
 {
 	int			locked;
 
@@ -907,15 +905,8 @@ xfs_buf_cond_lock(
 	else if (atomic_read(&bp->b_pin_count) && (bp->b_flags & XBF_STALE))
 		xfs_log_force(bp->b_target->bt_mount, 0);
 
-	trace_xfs_buf_cond_lock(bp, _RET_IP_);
-	return locked ? 0 : -EBUSY;
-}
-
-int
-xfs_buf_lock_value(
-	xfs_buf_t		*bp)
-{
-	return bp->b_sema.count;
+	trace_xfs_buf_trylock(bp, _RET_IP_);
+	return locked;
 }
 
 /*
@@ -929,7 +920,7 @@ xfs_buf_lock_value(
  */
 void
 xfs_buf_lock(
-	xfs_buf_t		*bp)
+	struct xfs_buf		*bp)
 {
 	trace_xfs_buf_lock(bp, _RET_IP_);
 
@@ -950,7 +941,7 @@ xfs_buf_lock(
  */
 void
 xfs_buf_unlock(
-	xfs_buf_t		*bp)
+	struct xfs_buf		*bp)
 {
 	if ((bp->b_flags & (XBF_DELWRI|_XBF_DELWRI_Q)) == XBF_DELWRI) {
 		atomic_inc(&bp->b_hold);
@@ -1694,7 +1685,7 @@ xfs_buf_delwri_split(
 	list_for_each_entry_safe(bp, n, dwq, b_list) {
 		ASSERT(bp->b_flags & XBF_DELWRI);
 
-		if (!XFS_BUF_ISPINNED(bp) && !xfs_buf_cond_lock(bp)) {
+		if (!XFS_BUF_ISPINNED(bp) && xfs_buf_trylock(bp)) {
 			if (!force &&
 			    time_before(jiffies, bp->b_queuetime + age)) {
 				xfs_buf_unlock(bp);
Index: xfs/fs/xfs/linux-2.6/xfs_buf.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_buf.h	2011-06-29 13:50:40.000000000 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_buf.h	2011-06-29 13:54:35.250997736 +0200
@@ -187,10 +187,11 @@ extern void xfs_buf_free(xfs_buf_t *);
 extern void xfs_buf_rele(xfs_buf_t *);
 
 /* Locking and Unlocking Buffers */
-extern int xfs_buf_cond_lock(xfs_buf_t *);
-extern int xfs_buf_lock_value(xfs_buf_t *);
+extern int xfs_buf_trylock(xfs_buf_t *);
 extern void xfs_buf_lock(xfs_buf_t *);
 extern void xfs_buf_unlock(xfs_buf_t *);
+#define xfs_buf_islocked(bp) \
+	((bp)->b_sema.count <= 0)
 
 /* Buffer Read and Write Routines */
 extern int xfs_bwrite(struct xfs_mount *mp, struct xfs_buf *bp);
@@ -308,10 +309,6 @@ xfs_buf_set_ref(
 
 #define XFS_BUF_ISPINNED(bp)	atomic_read(&((bp)->b_pin_count))
 
-#define XFS_BUF_VALUSEMA(bp)	xfs_buf_lock_value(bp)
-#define XFS_BUF_CPSEMA(bp)	(xfs_buf_cond_lock(bp) == 0)
-#define XFS_BUF_VSEMA(bp)	xfs_buf_unlock(bp)
-#define XFS_BUF_PSEMA(bp,x)	xfs_buf_lock(bp)
 #define XFS_BUF_FINISH_IOWAIT(bp)	complete(&bp->b_iowait);
 
 #define XFS_BUF_SET_TARGET(bp, target)	((bp)->b_target = (target))
Index: xfs/fs/xfs/linux-2.6/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_trace.h	2011-06-29 11:35:45.000000000 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_trace.h	2011-06-29 13:54:32.974343403 +0200
@@ -293,7 +293,7 @@ DECLARE_EVENT_CLASS(xfs_buf_class,
 		__entry->buffer_length = bp->b_buffer_length;
 		__entry->hold = atomic_read(&bp->b_hold);
 		__entry->pincount = atomic_read(&bp->b_pin_count);
-		__entry->lockval = xfs_buf_lock_value(bp);
+		__entry->lockval = bp->b_sema.count;
 		__entry->flags = bp->b_flags;
 		__entry->caller_ip = caller_ip;
 	),
@@ -323,7 +323,7 @@ DEFINE_BUF_EVENT(xfs_buf_bawrite);
 DEFINE_BUF_EVENT(xfs_buf_bdwrite);
 DEFINE_BUF_EVENT(xfs_buf_lock);
 DEFINE_BUF_EVENT(xfs_buf_lock_done);
-DEFINE_BUF_EVENT(xfs_buf_cond_lock);
+DEFINE_BUF_EVENT(xfs_buf_trylock);
 DEFINE_BUF_EVENT(xfs_buf_unlock);
 DEFINE_BUF_EVENT(xfs_buf_iowait);
 DEFINE_BUF_EVENT(xfs_buf_iowait_done);
@@ -366,7 +366,7 @@ DECLARE_EVENT_CLASS(xfs_buf_flags_class,
 		__entry->flags = flags;
 		__entry->hold = atomic_read(&bp->b_hold);
 		__entry->pincount = atomic_read(&bp->b_pin_count);
-		__entry->lockval = xfs_buf_lock_value(bp);
+		__entry->lockval = bp->b_sema.count;
 		__entry->caller_ip = caller_ip;
 	),
 	TP_printk("dev %d:%d bno 0x%llx len 0x%zx hold %d pincount %d "
@@ -409,7 +409,7 @@ TRACE_EVENT(xfs_buf_ioerror,
 		__entry->buffer_length = bp->b_buffer_length;
 		__entry->hold = atomic_read(&bp->b_hold);
 		__entry->pincount = atomic_read(&bp->b_pin_count);
-		__entry->lockval = xfs_buf_lock_value(bp);
+		__entry->lockval = bp->b_sema.count;
 		__entry->error = error;
 		__entry->flags = bp->b_flags;
 		__entry->caller_ip = caller_ip;
@@ -454,7 +454,7 @@ DECLARE_EVENT_CLASS(xfs_buf_item_class,
 		__entry->buf_flags = bip->bli_buf->b_flags;
 		__entry->buf_hold = atomic_read(&bip->bli_buf->b_hold);
 		__entry->buf_pincount = atomic_read(&bip->bli_buf->b_pin_count);
-		__entry->buf_lockval = xfs_buf_lock_value(bip->bli_buf);
+		__entry->buf_lockval = bip->bli_buf->b_sema.count;
 		__entry->li_desc = bip->bli_item.li_desc;
 		__entry->li_flags = bip->bli_item.li_flags;
 	),
Index: xfs/fs/xfs/quota/xfs_dquot.c
===================================================================
--- xfs.orig/fs/xfs/quota/xfs_dquot.c	2011-05-11 08:41:56.000000000 +0200
+++ xfs/fs/xfs/quota/xfs_dquot.c	2011-06-29 13:53:07.801471491 +0200
@@ -318,7 +318,7 @@ xfs_qm_init_dquot_blk(
 
 	ASSERT(tp);
 	ASSERT(XFS_BUF_ISBUSY(bp));
-	ASSERT(XFS_BUF_VALUSEMA(bp) <= 0);
+	ASSERT(xfs_buf_islocked(bp));
 
 	d = (xfs_dqblk_t *)XFS_BUF_PTR(bp);
 
@@ -534,7 +534,7 @@ xfs_qm_dqtobp(
 	}
 
 	ASSERT(XFS_BUF_ISBUSY(bp));
-	ASSERT(XFS_BUF_VALUSEMA(bp) <= 0);
+	ASSERT(xfs_buf_islocked(bp));
 
 	/*
 	 * calculate the location of the dquot inside the buffer.
@@ -622,7 +622,7 @@ xfs_qm_dqread(
 	 * brelse it because we have the changes incore.
 	 */
 	ASSERT(XFS_BUF_ISBUSY(bp));
-	ASSERT(XFS_BUF_VALUSEMA(bp) <= 0);
+	ASSERT(xfs_buf_islocked(bp));
 	xfs_trans_brelse(tp, bp);
 
 	return (error);
Index: xfs/fs/xfs/xfs_buf_item.c
===================================================================
--- xfs.orig/fs/xfs/xfs_buf_item.c	2011-04-22 06:21:45.000000000 +0200
+++ xfs/fs/xfs/xfs_buf_item.c	2011-06-29 13:53:20.938066990 +0200
@@ -420,7 +420,7 @@ xfs_buf_item_unpin(
 
 	if (freed && stale) {
 		ASSERT(bip->bli_flags & XFS_BLI_STALE);
-		ASSERT(XFS_BUF_VALUSEMA(bp) <= 0);
+		ASSERT(xfs_buf_islocked(bp));
 		ASSERT(!(XFS_BUF_ISDELAYWRITE(bp)));
 		ASSERT(XFS_BUF_ISSTALE(bp));
 		ASSERT(bip->bli_format.blf_flags & XFS_BLF_CANCEL);
@@ -483,7 +483,7 @@ xfs_buf_item_trylock(
 
 	if (XFS_BUF_ISPINNED(bp))
 		return XFS_ITEM_PINNED;
-	if (!XFS_BUF_CPSEMA(bp))
+	if (!xfs_buf_trylock(bp))
 		return XFS_ITEM_LOCKED;
 
 	/* take a reference to the buffer.  */
@@ -905,7 +905,7 @@ xfs_buf_attach_iodone(
 	xfs_log_item_t	*head_lip;
 
 	ASSERT(XFS_BUF_ISBUSY(bp));
-	ASSERT(XFS_BUF_VALUSEMA(bp) <= 0);
+	ASSERT(xfs_buf_islocked(bp));
 
 	lip->li_cb = cb;
 	if (XFS_BUF_FSPRIVATE(bp, void *) != NULL) {
Index: xfs/fs/xfs/xfs_log.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log.c	2011-06-17 14:07:57.000000000 +0200
+++ xfs/fs/xfs/xfs_log.c	2011-06-29 13:53:33.954663139 +0200
@@ -1059,7 +1059,7 @@ xlog_alloc_log(xfs_mount_t	*mp,
 	XFS_BUF_SET_IODONE_FUNC(bp, xlog_iodone);
 	XFS_BUF_SET_FSPRIVATE2(bp, (unsigned long)1);
 	ASSERT(XFS_BUF_ISBUSY(bp));
-	ASSERT(XFS_BUF_VALUSEMA(bp) <= 0);
+	ASSERT(xfs_buf_islocked(bp));
 	log->l_xbuf = bp;
 
 	spin_lock_init(&log->l_icloglock);
@@ -1090,7 +1090,7 @@ xlog_alloc_log(xfs_mount_t	*mp,
 						log->l_iclog_size, 0);
 		if (!bp)
 			goto out_free_iclog;
-		if (!XFS_BUF_CPSEMA(bp))
+		if (!xfs_buf_trylock(bp))
 			ASSERT(0);
 		XFS_BUF_SET_IODONE_FUNC(bp, xlog_iodone);
 		XFS_BUF_SET_FSPRIVATE2(bp, (unsigned long)1);
@@ -1118,7 +1118,7 @@ xlog_alloc_log(xfs_mount_t	*mp,
 		iclog->ic_datap = (char *)iclog->ic_data + log->l_iclog_hsize;
 
 		ASSERT(XFS_BUF_ISBUSY(iclog->ic_bp));
-		ASSERT(XFS_BUF_VALUSEMA(iclog->ic_bp) <= 0);
+		ASSERT(xfs_buf_islocked(iclog->ic_bp));
 		init_waitqueue_head(&iclog->ic_force_wait);
 		init_waitqueue_head(&iclog->ic_write_wait);
 
Index: xfs/fs/xfs/xfs_log_recover.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log_recover.c	2011-05-20 15:25:52.000000000 +0200
+++ xfs/fs/xfs/xfs_log_recover.c	2011-06-29 13:51:20.425386530 +0200
@@ -264,7 +264,7 @@ xlog_bwrite(
 	XFS_BUF_ZEROFLAGS(bp);
 	XFS_BUF_BUSY(bp);
 	XFS_BUF_HOLD(bp);
-	XFS_BUF_PSEMA(bp, PRIBIO);
+	xfs_buf_lock(bp);
 	XFS_BUF_SET_COUNT(bp, BBTOB(nbblks));
 	XFS_BUF_SET_TARGET(bp, log->l_mp->m_logdev_targp);
 
Index: xfs/fs/xfs/xfs_mount.c
===================================================================
--- xfs.orig/fs/xfs/xfs_mount.c	2011-06-29 11:38:53.000000000 +0200
+++ xfs/fs/xfs/xfs_mount.c	2011-06-29 13:51:20.425386530 +0200
@@ -1941,22 +1941,19 @@ unwind:
  * the superblock buffer if it can be locked without sleeping.
  * If it can't then we'll return NULL.
  */
-xfs_buf_t *
+struct xfs_buf *
 xfs_getsb(
-	xfs_mount_t	*mp,
-	int		flags)
+	struct xfs_mount	*mp,
+	int			flags)
 {
-	xfs_buf_t	*bp;
+	struct xfs_buf		*bp = mp->m_sb_bp;
 
-	ASSERT(mp->m_sb_bp != NULL);
-	bp = mp->m_sb_bp;
-	if (flags & XBF_TRYLOCK) {
-		if (!XFS_BUF_CPSEMA(bp)) {
+	if (!xfs_buf_trylock(bp)) {
+		if (flags & XBF_TRYLOCK)
 			return NULL;
-		}
-	} else {
-		XFS_BUF_PSEMA(bp, PRIBIO);
+		xfs_buf_lock(bp);
 	}
+
 	XFS_BUF_HOLD(bp);
 	ASSERT(XFS_BUF_ISDONE(bp));
 	return bp;
Index: xfs/fs/xfs/xfs_trans_buf.c
===================================================================
--- xfs.orig/fs/xfs/xfs_trans_buf.c	2011-03-27 23:52:57.000000000 +0200
+++ xfs/fs/xfs/xfs_trans_buf.c	2011-06-29 13:53:47.084592005 +0200
@@ -160,7 +160,7 @@ xfs_trans_get_buf(xfs_trans_t	*tp,
 	 */
 	bp = xfs_trans_buf_item_match(tp, target_dev, blkno, len);
 	if (bp != NULL) {
-		ASSERT(XFS_BUF_VALUSEMA(bp) <= 0);
+		ASSERT(xfs_buf_islocked(bp));
 		if (XFS_FORCED_SHUTDOWN(tp->t_mountp))
 			XFS_BUF_SUPER_STALE(bp);
 
@@ -327,7 +327,7 @@ xfs_trans_read_buf(
 	 */
 	bp = xfs_trans_buf_item_match(tp, target, blkno, len);
 	if (bp != NULL) {
-		ASSERT(XFS_BUF_VALUSEMA(bp) <= 0);
+		ASSERT(xfs_buf_islocked(bp));
 		ASSERT(XFS_BUF_FSPRIVATE2(bp, xfs_trans_t *) == tp);
 		ASSERT(XFS_BUF_FSPRIVATE(bp, void *) != NULL);
 		ASSERT((XFS_BUF_ISERROR(bp)) == 0);
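
The reversed return convention is easy to get backwards when reading the hunks above, so here is a minimal userspace sketch of it.  Everything in it is an assumption made for illustration: struct buf, buf_trylock, buf_unlock, get_buf and BUF_TRYLOCK are hypothetical names, and a C11 atomic_flag stands in for the buffer semaphore.  The point is only that the trylock returns nonzero when the lock was taken, and that the caller shape mirrors the new xfs_getsb: try first, bail out if the caller asked not to wait, otherwise block.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define BUF_TRYLOCK	1	/* hypothetical "do not wait" flag */

struct buf {
	atomic_flag	locked;	/* stand-in for the buffer semaphore */
	int		hold;	/* stand-in for the hold count */
};

/* Returns nonzero if the lock was acquired, zero if it was busy. */
static int buf_trylock(struct buf *bp)
{
	return !atomic_flag_test_and_set(&bp->locked);
}

static void buf_unlock(struct buf *bp)
{
	atomic_flag_clear(&bp->locked);
}

/* Same control flow as the reworked xfs_getsb in the patch. */
static struct buf *get_buf(struct buf *bp, int flags)
{
	assert(bp != NULL);

	if (!buf_trylock(bp)) {
		if (flags & BUF_TRYLOCK)
			return NULL;
		/* blocking path: spin here where the kernel would sleep */
		while (!buf_trylock(bp))
			;
	}

	bp->hold++;
	return bp;
}
```

With the old xfs_buf_cond_lock convention (0 on success, -EBUSY on failure) the first test would read without the negation, which is why every caller in the diff flips its condition.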


* [PATCH 25/27] xfs: return the buffer locked from xfs_buf_get_uncached
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (22 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 24/27] xfs: clean up buffer locking helpers Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  3:48   ` Dave Chinner
  2011-07-06  3:57   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 26/27] xfs: cleanup I/O-related buffer flags Christoph Hellwig
                   ` (2 subsequent siblings)
  26 siblings, 2 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-buf_get_uncached-locked-buffer --]
[-- Type: text/plain, Size: 2875 bytes --]

All other xfs_buf_get/read-like helpers return the buffer locked; make sure
xfs_buf_get_uncached isn't different for no reason.  Half of the callers
already lock it directly after, and the others probably should also keep
it locked if only for consistency and being able to use xfs_buf_rele,
but I'll leave that for later.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_buf.c	2011-06-29 13:57:15.596795734 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_buf.c	2011-06-29 13:57:32.243372220 +0200
@@ -679,7 +679,6 @@ xfs_buf_read_uncached(
 		return NULL;
 
 	/* set up the buffer for a read IO */
-	xfs_buf_lock(bp);
 	XFS_BUF_SET_ADDR(bp, daddr);
 	XFS_BUF_READ(bp);
 	XFS_BUF_BUSY(bp);
@@ -814,8 +813,6 @@ xfs_buf_get_uncached(
 		goto fail_free_mem;
 	}
 
-	xfs_buf_unlock(bp);
-
 	trace_xfs_buf_get_uncached(bp, _RET_IP_);
 	return bp;
 
Index: xfs/fs/xfs/xfs_log.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log.c	2011-06-29 13:53:33.954663139 +0200
+++ xfs/fs/xfs/xfs_log.c	2011-06-29 13:57:32.243372220 +0200
@@ -1090,8 +1090,7 @@ xlog_alloc_log(xfs_mount_t	*mp,
 						log->l_iclog_size, 0);
 		if (!bp)
 			goto out_free_iclog;
-		if (!xfs_buf_trylock(bp))
-			ASSERT(0);
+
 		XFS_BUF_SET_IODONE_FUNC(bp, xlog_iodone);
 		XFS_BUF_SET_FSPRIVATE2(bp, (unsigned long)1);
 		iclog->ic_bp = bp;
Index: xfs/fs/xfs/xfs_log_recover.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log_recover.c	2011-06-29 13:51:20.425386530 +0200
+++ xfs/fs/xfs/xfs_log_recover.c	2011-06-29 13:57:32.246705535 +0200
@@ -91,6 +91,8 @@ xlog_get_bp(
 	xlog_t		*log,
 	int		nbblks)
 {
+	struct xfs_buf	*bp;
+
 	if (!xlog_buf_bbcount_valid(log, nbblks)) {
 		xfs_warn(log->l_mp, "Invalid block length (0x%x) for buffer",
 			nbblks);
@@ -118,8 +120,10 @@ xlog_get_bp(
 		nbblks += log->l_sectBBsize;
 	nbblks = round_up(nbblks, log->l_sectBBsize);
 
-	return xfs_buf_get_uncached(log->l_mp->m_logdev_targp,
-					BBTOB(nbblks), 0);
+	bp = xfs_buf_get_uncached(log->l_mp->m_logdev_targp, BBTOB(nbblks), 0);
+	if (bp)
+		xfs_buf_unlock(bp);
+	return bp;
 }
 
 STATIC void
Index: xfs/fs/xfs/xfs_vnodeops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_vnodeops.c	2011-06-29 11:35:45.789455635 +0200
+++ xfs/fs/xfs/xfs_vnodeops.c	2011-06-29 13:57:32.250038850 +0200
@@ -1969,6 +1969,8 @@ xfs_zero_remaining_bytes(
 	if (!bp)
 		return XFS_ERROR(ENOMEM);
 
+	xfs_buf_unlock(bp);
+
 	for (offset = startoff; offset <= endoff; offset = lastoffset + 1) {
 		offset_fsb = XFS_B_TO_FSBT(mp, offset);
 		nimap = 1;


* [PATCH 26/27] xfs: cleanup I/O-related buffer flags
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (23 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 25/27] xfs: return the buffer locked from xfs_buf_get_uncached Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  3:54   ` Dave Chinner
  2011-07-06  4:09   ` Alex Elder
  2011-07-01  9:43 ` [PATCH 27/27] xfs: avoid a few disk cache flushes Christoph Hellwig
  2011-07-06  4:40 ` [PATCH 00/27] patch queue for Linux 3.1, V2 Alex Elder
  26 siblings, 2 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-buf-cleanup-flags --]
[-- Type: text/plain, Size: 7874 bytes --]

Remove the unused and misnamed _XBF_RUN_QUEUES flag, rename XBF_LOG_BUFFER
to the more fitting XBF_SYNCIO, and split XBF_ORDERED into XBF_FUA and
XBF_FLUSH to allow more fine-grained control over the bio flags.  Also
clean up processing of the flags in _xfs_buf_ioapply to make more sense,
and renumber the sparse flag number space to group flags by purpose.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_buf.c	2011-06-29 14:04:28.084452749 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_buf.c	2011-06-29 14:27:00.987123445 +0200
@@ -592,10 +592,8 @@ _xfs_buf_read(
 	ASSERT(!(flags & (XBF_DELWRI|XBF_WRITE)));
 	ASSERT(bp->b_bn != XFS_BUF_DADDR_NULL);
 
-	bp->b_flags &= ~(XBF_WRITE | XBF_ASYNC | XBF_DELWRI | \
-			XBF_READ_AHEAD | _XBF_RUN_QUEUES);
-	bp->b_flags |= flags & (XBF_READ | XBF_ASYNC | \
-			XBF_READ_AHEAD | _XBF_RUN_QUEUES);
+	bp->b_flags &= ~(XBF_WRITE | XBF_ASYNC | XBF_DELWRI | XBF_READ_AHEAD);
+	bp->b_flags |= flags & (XBF_READ | XBF_ASYNC | XBF_READ_AHEAD);
 
 	status = xfs_buf_iorequest(bp);
 	if (status || XFS_BUF_ISERROR(bp) || (flags & XBF_ASYNC))
@@ -1211,23 +1209,21 @@ _xfs_buf_ioapply(
 	total_nr_pages = bp->b_page_count;
 	map_i = 0;
 
-	if (bp->b_flags & XBF_ORDERED) {
-		ASSERT(!(bp->b_flags & XBF_READ));
-		rw = WRITE_FLUSH_FUA;
-	} else if (bp->b_flags & XBF_LOG_BUFFER) {
-		ASSERT(!(bp->b_flags & XBF_READ_AHEAD));
-		bp->b_flags &= ~_XBF_RUN_QUEUES;
-		rw = (bp->b_flags & XBF_WRITE) ? WRITE_SYNC : READ_SYNC;
-	} else if (bp->b_flags & _XBF_RUN_QUEUES) {
-		ASSERT(!(bp->b_flags & XBF_READ_AHEAD));
-		bp->b_flags &= ~_XBF_RUN_QUEUES;
-		rw = (bp->b_flags & XBF_WRITE) ? WRITE_META : READ_META;
+	if (bp->b_flags & XBF_WRITE) {
+		if (bp->b_flags & XBF_SYNCIO)
+			rw = WRITE_SYNC;
+		else
+			rw = WRITE;
+		if (bp->b_flags & XBF_FUA)
+			rw |= REQ_FUA;
+		if (bp->b_flags & XBF_FLUSH)
+			rw |= REQ_FLUSH;
+	} else if (bp->b_flags & XBF_READ_AHEAD) {
+		rw = READA;
 	} else {
-		rw = (bp->b_flags & XBF_WRITE) ? WRITE :
-		     (bp->b_flags & XBF_READ_AHEAD) ? READA : READ;
+		rw = READ;
 	}
 
-
 next_chunk:
 	atomic_inc(&bp->b_io_remaining);
 	nr_pages = BIO_MAX_SECTORS >> (PAGE_SHIFT - BBSHIFT);
@@ -1689,8 +1685,7 @@ xfs_buf_delwri_split(
 				break;
 			}
 
-			bp->b_flags &= ~(XBF_DELWRI|_XBF_DELWRI_Q|
-					 _XBF_RUN_QUEUES);
+			bp->b_flags &= ~(XBF_DELWRI | _XBF_DELWRI_Q);
 			bp->b_flags |= XBF_WRITE;
 			list_move_tail(&bp->b_list, list);
 			trace_xfs_buf_delwri_split(bp, _RET_IP_);
Index: xfs/fs/xfs/linux-2.6/xfs_buf.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_buf.h	2011-06-29 14:03:57.994615760 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_buf.h	2011-06-29 14:18:16.806629842 +0200
@@ -46,43 +46,46 @@ typedef enum {
 
 #define XBF_READ	(1 << 0) /* buffer intended for reading from device */
 #define XBF_WRITE	(1 << 1) /* buffer intended for writing to device */
-#define XBF_MAPPED	(1 << 2) /* buffer mapped (b_addr valid) */
+#define XBF_READ_AHEAD	(1 << 2) /* asynchronous read-ahead */
+#define XBF_MAPPED	(1 << 3) /* buffer mapped (b_addr valid) */
 #define XBF_ASYNC	(1 << 4) /* initiator will not wait for completion */
 #define XBF_DONE	(1 << 5) /* all pages in the buffer uptodate */
 #define XBF_DELWRI	(1 << 6) /* buffer has dirty pages */
 #define XBF_STALE	(1 << 7) /* buffer has been staled, do not find it */
-#define XBF_ORDERED	(1 << 11)/* use ordered writes */
-#define XBF_READ_AHEAD	(1 << 12)/* asynchronous read-ahead */
-#define XBF_LOG_BUFFER	(1 << 13)/* this is a buffer used for the log */
+
+/* I/O hints for the BIO layer */
+#define XBF_SYNCIO	(1 << 10)/* treat this buffer as synchronous I/O */
+#define XBF_FUA		(1 << 11)/* force cache write through mode */
+#define XBF_FLUSH	(1 << 12)/* flush the disk cache before a write */
 
 /* flags used only as arguments to access routines */
-#define XBF_LOCK	(1 << 14)/* lock requested */
-#define XBF_TRYLOCK	(1 << 15)/* lock requested, but do not wait */
-#define XBF_DONT_BLOCK	(1 << 16)/* do not block in current thread */
+#define XBF_LOCK	(1 << 15)/* lock requested */
+#define XBF_TRYLOCK	(1 << 16)/* lock requested, but do not wait */
+#define XBF_DONT_BLOCK	(1 << 17)/* do not block in current thread */
 
 /* flags used only internally */
-#define _XBF_PAGES	(1 << 18)/* backed by refcounted pages */
-#define	_XBF_RUN_QUEUES	(1 << 19)/* run block device task queue	*/
-#define	_XBF_KMEM	(1 << 20)/* backed by heap memory */
-#define _XBF_DELWRI_Q	(1 << 21)/* buffer on delwri queue */
+#define _XBF_PAGES	(1 << 20)/* backed by refcounted pages */
+#define	_XBF_KMEM	(1 << 21)/* backed by heap memory */
+#define _XBF_DELWRI_Q	(1 << 22)/* buffer on delwri queue */
 
 typedef unsigned int xfs_buf_flags_t;
 
 #define XFS_BUF_FLAGS \
 	{ XBF_READ,		"READ" }, \
 	{ XBF_WRITE,		"WRITE" }, \
+	{ XBF_READ_AHEAD,	"READ_AHEAD" }, \
 	{ XBF_MAPPED,		"MAPPED" }, \
 	{ XBF_ASYNC,		"ASYNC" }, \
 	{ XBF_DONE,		"DONE" }, \
 	{ XBF_DELWRI,		"DELWRI" }, \
 	{ XBF_STALE,		"STALE" }, \
-	{ XBF_ORDERED,		"ORDERED" }, \
-	{ XBF_READ_AHEAD,	"READ_AHEAD" }, \
+	{ XBF_SYNCIO,		"SYNCIO" }, \
+	{ XBF_FUA,		"FUA" }, \
+	{ XBF_FLUSH,		"FLUSH" }, \
 	{ XBF_LOCK,		"LOCK" },  	/* should never be set */\
 	{ XBF_TRYLOCK,		"TRYLOCK" }, 	/* ditto */\
 	{ XBF_DONT_BLOCK,	"DONT_BLOCK" },	/* ditto */\
 	{ _XBF_PAGES,		"PAGES" }, \
-	{ _XBF_RUN_QUEUES,	"RUN_QUEUES" }, \
 	{ _XBF_KMEM,		"KMEM" }, \
 	{ _XBF_DELWRI_Q,	"DELWRI_Q" }
 
@@ -230,8 +233,9 @@ extern void xfs_buf_terminate(void);
 
 
 #define XFS_BUF_BFLAGS(bp)	((bp)->b_flags)
-#define XFS_BUF_ZEROFLAGS(bp)	((bp)->b_flags &= \
-		~(XBF_READ|XBF_WRITE|XBF_ASYNC|XBF_DELWRI|XBF_ORDERED))
+#define XFS_BUF_ZEROFLAGS(bp) \
+	((bp)->b_flags &= ~(XBF_READ|XBF_WRITE|XBF_ASYNC|XBF_DELWRI| \
+			    XBF_SYNCIO|XBF_FUA|XBF_FLUSH))
 
 void xfs_buf_stale(struct xfs_buf *bp);
 #define XFS_BUF_STALE(bp)	xfs_buf_stale(bp);
@@ -263,10 +267,6 @@ void xfs_buf_stale(struct xfs_buf *bp);
 #define XFS_BUF_UNASYNC(bp)	((bp)->b_flags &= ~XBF_ASYNC)
 #define XFS_BUF_ISASYNC(bp)	((bp)->b_flags & XBF_ASYNC)
 
-#define XFS_BUF_ORDERED(bp)	((bp)->b_flags |= XBF_ORDERED)
-#define XFS_BUF_UNORDERED(bp)	((bp)->b_flags &= ~XBF_ORDERED)
-#define XFS_BUF_ISORDERED(bp)	((bp)->b_flags & XBF_ORDERED)
-
 #define XFS_BUF_HOLD(bp)	xfs_buf_hold(bp)
 #define XFS_BUF_READ(bp)	((bp)->b_flags |= XBF_READ)
 #define XFS_BUF_UNREAD(bp)	((bp)->b_flags &= ~XBF_READ)
Index: xfs/fs/xfs/xfs_log.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log.c	2011-06-29 14:04:18.587837528 +0200
+++ xfs/fs/xfs/xfs_log.c	2011-06-29 19:45:20.176987585 +0200
@@ -1268,7 +1268,6 @@ xlog_bdstrat(
 		return 0;
 	}
 
-	bp->b_flags |= _XBF_RUN_QUEUES;
 	xfs_buf_iorequest(bp);
 	return 0;
 }
@@ -1369,7 +1368,7 @@ xlog_sync(xlog_t		*log,
 	XFS_BUF_ZEROFLAGS(bp);
 	XFS_BUF_BUSY(bp);
 	XFS_BUF_ASYNC(bp);
-	bp->b_flags |= XBF_LOG_BUFFER;
+	bp->b_flags |= XBF_SYNCIO;
 
 	if (log->l_mp->m_flags & XFS_MOUNT_BARRIER) {
 		/*
@@ -1380,7 +1379,7 @@ xlog_sync(xlog_t		*log,
 		 */
 		if (log->l_mp->m_logdev_targp != log->l_mp->m_ddev_targp)
 			xfs_blkdev_issue_flush(log->l_mp->m_ddev_targp);
-		XFS_BUF_ORDERED(bp);
+		bp->b_flags |= XBF_FUA | XBF_FLUSH;
 	}
 
 	ASSERT(XFS_BUF_ADDR(bp) <= log->l_logBBsize-1);
@@ -1413,9 +1412,9 @@ xlog_sync(xlog_t		*log,
 		XFS_BUF_ZEROFLAGS(bp);
 		XFS_BUF_BUSY(bp);
 		XFS_BUF_ASYNC(bp);
-		bp->b_flags |= XBF_LOG_BUFFER;
+		bp->b_flags |= XBF_SYNCIO;
 		if (log->l_mp->m_flags & XFS_MOUNT_BARRIER)
-			XFS_BUF_ORDERED(bp);
+			bp->b_flags |= XBF_FUA | XBF_FLUSH;
 		dptr = XFS_BUF_PTR(bp);
 		/*
 		 * Bump the cycle numbers at the start of each block
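
The new flag handling in _xfs_buf_ioapply boils down to a small decision table, sketched here as a standalone function.  The XBF_* values are the ones from the xfs_buf.h hunk above; the RW_* constants and buf_flags_to_rw are made up for this illustration and do not match the block layer's real WRITE/READA/REQ_* encodings.  Treat it as a reading aid, not as the kernel code.

```c
#include <assert.h>

/* Buffer flags, numbered as in the patch. */
#define XBF_READ	(1 << 0)
#define XBF_WRITE	(1 << 1)
#define XBF_READ_AHEAD	(1 << 2)
#define XBF_SYNCIO	(1 << 10)
#define XBF_FUA		(1 << 11)
#define XBF_FLUSH	(1 << 12)

/* Hypothetical stand-ins for the block layer request flags. */
enum {
	RW_READ		= 0,
	RW_WRITE	= 1 << 0,
	RW_READA	= 1 << 1,
	RW_SYNC		= 1 << 2,
	RW_FUA		= 1 << 3,
	RW_FLUSH	= 1 << 4,
};

static int buf_flags_to_rw(unsigned int b_flags)
{
	int rw;

	if (b_flags & XBF_WRITE) {
		/* the sync, FUA and flush hints only apply to writes */
		rw = RW_WRITE;
		if (b_flags & XBF_SYNCIO)
			rw |= RW_SYNC;
		if (b_flags & XBF_FUA)
			rw |= RW_FUA;
		if (b_flags & XBF_FLUSH)
			rw |= RW_FLUSH;
	} else if (b_flags & XBF_READ_AHEAD) {
		rw = RW_READA;
	} else {
		rw = RW_READ;
	}

	return rw;
}
```

Splitting XBF_ORDERED this way lets a caller ask for FUA without a preflush, which the old single flag could not express.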


* [PATCH 27/27] xfs: avoid a few disk cache flushes
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (24 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 26/27] xfs: cleanup I/O-related buffer flags Christoph Hellwig
@ 2011-07-01  9:43 ` Christoph Hellwig
  2011-07-06  3:55   ` Dave Chinner
  2011-07-06  4:11   ` Alex Elder
  2011-07-06  4:40 ` [PATCH 00/27] patch queue for Linux 3.1, V2 Alex Elder
  26 siblings, 2 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-01  9:43 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-avoid-cache-flushes --]
[-- Type: text/plain, Size: 1999 bytes --]

There is no need for a pre-flush when writing the second part of a
split log buffer, and if we are using an external log there is no need
to do a full cache flush of the log device at all given that all writes
to it use the FUA flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/xfs_log.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log.c	2011-07-01 11:35:50.874088428 +0200
+++ xfs/fs/xfs/xfs_log.c	2011-07-01 11:35:51.287421756 +0200
@@ -1371,15 +1371,21 @@ xlog_sync(xlog_t		*log,
 	bp->b_flags |= XBF_SYNCIO;
 
 	if (log->l_mp->m_flags & XFS_MOUNT_BARRIER) {
+		bp->b_flags |= XBF_FUA;
+
 		/*
-		 * If we have an external log device, flush the data device
-		 * before flushing the log to make sure all meta data
-		 * written back from the AIL actually made it to disk
-		 * before writing out the new log tail LSN in the log buffer.
+		 * Flush the data device before flushing the log to make
+		 * sure all meta data written back from the AIL actually made
+		 * it to disk before stamping the new log tail LSN into the
+		 * log buffer.  For an external log we need to issue the
+		 * flush explicitly, and unfortunately synchronously here;
+		 * for an internal log we can simply use the block layer
+		 * state machine for preflushes.
 		 */
 		if (log->l_mp->m_logdev_targp != log->l_mp->m_ddev_targp)
 			xfs_blkdev_issue_flush(log->l_mp->m_ddev_targp);
-		bp->b_flags |= XBF_FUA | XBF_FLUSH;
+		else
+			bp->b_flags |= XBF_FLUSH;
 	}
 
 	ASSERT(XFS_BUF_ADDR(bp) <= log->l_logBBsize-1);
@@ -1414,7 +1420,7 @@ xlog_sync(xlog_t		*log,
 		XFS_BUF_ASYNC(bp);
 		bp->b_flags |= XBF_SYNCIO;
 		if (log->l_mp->m_flags & XFS_MOUNT_BARRIER)
-			bp->b_flags |= XBF_FUA | XBF_FLUSH;
+			bp->b_flags |= XBF_FUA;
 		dptr = XFS_BUF_PTR(bp);
 		/*
 		 * Bump the cycle numbers at the start of each block
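
The resulting flush policy in xlog_sync can be summarised as a small pure function.  This is a sketch under assumptions: log_write_flags and its parameters are hypothetical, the XBF_* values are those introduced by patch 26, and the flush_data_dev output stands in for the explicit xfs_blkdev_issue_flush call on the data device.

```c
#include <assert.h>
#include <stddef.h>

#define XBF_SYNCIO	(1 << 10)
#define XBF_FUA		(1 << 11)
#define XBF_FLUSH	(1 << 12)

/*
 * barrier:        nonzero if the mount uses barriers
 * external_log:   nonzero if the log lives on its own device
 * second_part:    nonzero for the second half of a split log buffer
 * flush_data_dev: set to 1 if the data device needs an explicit flush
 */
static unsigned int
log_write_flags(int barrier, int external_log, int second_part,
		int *flush_data_dev)
{
	unsigned int flags = XBF_SYNCIO;

	assert(flush_data_dev != NULL);
	*flush_data_dev = 0;

	if (!barrier)
		return flags;

	/* every log write goes out with FUA */
	flags |= XBF_FUA;

	/* the second part of a split buffer needs no preflush */
	if (second_part)
		return flags;

	if (external_log)
		*flush_data_dev = 1;	/* explicit, synchronous flush */
	else
		flags |= XBF_FLUSH;	/* let the block layer preflush */

	return flags;
}
```

Only the internal-log, first-part case still carries XBF_FLUSH; the other cases drop the cache flush of the log device entirely.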


* Re: [PATCH 02/27] xfs: re-enable non-blocking behaviour in xfs_map_blocks
  2011-07-01  9:43 ` [PATCH 02/27] xfs: re-enable non-blocking behaviour in xfs_map_blocks Christoph Hellwig
@ 2011-07-05 22:35   ` Alex Elder
  2011-07-06  6:37     ` Christoph Hellwig
  0 siblings, 1 reply; 88+ messages in thread
From: Alex Elder @ 2011-07-05 22:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> The non-blockig behaviour in xfs_map_blocks currently is conditional on
> having both the WB_SYNC_NONE sync_mode and the nonblocking flag set.
> The latter used to be used by both pdflush, kswapd and a few other places
> in older kernels, but has been fading out starting with the introduction
> of the per-bdi flusher threads.
> 
> Enable the non-blocking behaviour for all WB_SYNC_NONE calls to get back
> the behaviour we want.

The subject line should refer to xfs_vm_writepage()
(not xfs_map_blocks()).  Unless I hear otherwise I
will plan to change that for you.

Other than that this looks OK to me.

Signed-off-by: Alex Elder <aelder@sgi.com>


> Signed-off-by: Christoph Hellwig <hch@lst.de>



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 08/27] xfs: improve sync behaviour in the fact of aggressive dirtying
  2011-07-01  9:43 ` [PATCH 08/27] xfs: improve sync behaviour in the fact of aggressive dirtying Christoph Hellwig
@ 2011-07-05 22:36   ` Alex Elder
  2011-07-06  8:15     ` Christoph Hellwig
  0 siblings, 1 reply; 88+ messages in thread
From: Alex Elder @ 2011-07-05 22:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> The following script from Wu Fengguang shows very bad behaviour in XFS
> when aggressively dirtying data during a sync on XFS, with sync times
> up to almost 10 times as long as ext4.

(Note that I skipped over patch 7 for the time being,
trying to skip ahead to simpler changes to review.)

I think the change looks fine but the description doesn't
completely match it (unless I'm missing something).

> A large part of the issue is that XFS writes data out itself two times
> in the ->sync_fs method, overriding the lifelock protection in the core
> writeback code, and another issue is the lock-less xfs_ioend_wait call,
> which doesn't prevent new ioend from beeing queue up while waiting for
> the count to reach zero.

The change affects only the first thing you mention here, not
the second.

Also, if you plan to update the description--some typos:
- "in the face of" in the subject
- "livelock protection" above
- "beeing" -> "being"


> This patch removes the XFS-internal sync calls and relies on the VFS
> to do its work just like all other filesystems do.  Note that the
> i_iocount wait which is rather suboptimal is simply removed here.
> We already do it in ->write_inode, which keeps the current suboptimal
> behaviour.  We'll eventually need to remove that as well, but that's
> material for a separate commit.

The i_iocount wait is not affected by your patch.

> ------------------------------ snip ------------------------------
> #!/bin/sh
> 
> umount /dev/sda7
> mkfs.xfs -f /dev/sda7
> # mkfs.ext4 /dev/sda7
> # mkfs.btrfs /dev/sda7
> mount /dev/sda7 /fs
> 
> echo $((50<<20)) > /proc/sys/vm/dirty_bytes
> 
> pid=
> for i in `seq 10`
> do
> 	dd if=/dev/zero of=/fs/zero-$i bs=1M count=1000 &
> 	pid="$pid $!"
> done
> 
> sleep 1
> 
> tic=$(date +'%s')
> sync
> tac=$(date +'%s')
> 
> echo
> echo sync time: $((tac-tic))
> egrep '(Dirty|Writeback|NFS_Unstable)' /proc/meminfo
> 
> pidof dd > /dev/null && { kill -9 $pid; echo sync NOT livelocked; }
> ------------------------------ snip ------------------------------
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reported-by: Wu Fengguang <fengguang.wu@intel.com>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

I'm OK with the change, but really prefer to have
the description not include stuff that just isn't
there.  If you want me to commit this as-is, just
say so and I will.  Otherwise, post an update and
I'll use that.  In any case, you can consider this
reviewed by me.

Reviewed-by: Alex Elder <aelder@sgi.com>


> Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c	2011-06-29 11:26:14.109219361 +0200
> +++ xfs/fs/xfs/linux-2.6/xfs_sync.c	2011-06-29 11:37:20.642275110 +0200
> @@ -359,14 +359,12 @@ xfs_quiesce_data(
>  {
>  	int			error, error2 = 0;
>  
> -	/* push non-blocking */
> -	xfs_sync_data(mp, 0);
>  	xfs_qm_sync(mp, SYNC_TRYLOCK);
> -
> -	/* push and block till complete */
> -	xfs_sync_data(mp, SYNC_WAIT);
>  	xfs_qm_sync(mp, SYNC_WAIT);
>  
> +	/* force out the newly dirtied log buffers */
> +	xfs_log_force(mp, XFS_LOG_SYNC);
> +
>  	/* write superblock and hoover up shutdown errors */
>  	error = xfs_sync_fsdata(mp);
>  
> 




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 09/27] xfs: fix filesystsem freeze race in xfs_trans_alloc
  2011-07-01  9:43 ` [PATCH 09/27] xfs: fix filesystsem freeze race in xfs_trans_alloc Christoph Hellwig
@ 2011-07-05 22:36   ` Alex Elder
  0 siblings, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-05 22:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> As pointed out by Jan xfs_trans_alloc can race with a concurrent filesystem
> freeze when it sleeps during the memory allocation.  Fix this by moving the
> wait_for_freeze call after the memory allocation.  This means moving the
> freeze into the low-level _xfs_trans_alloc helper, which thus grows a new
> argument.  Also fix up some comments in that area while at it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Dave Chinner <david@fromorbit.com>

Looks good.  The race has to do with the check of
mp->m_active_trans in xfs_quiesce_attr(), which
is called by the freeze_fs method, xfs_fs_freeze().

Reviewed-by: Alex Elder <aelder@sgi.com>



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/27] xfs: remove i_transp
  2011-07-01  9:43 ` [PATCH 10/27] xfs: remove i_transp Christoph Hellwig
@ 2011-07-05 22:36   ` Alex Elder
  0 siblings, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-05 22:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> Remove the transaction pointer in the inode.  It's only used to avoid
> passing down an argument in the bmap code, and for a few asserts in
> the transaction code right now.

Looks OK to me.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 11/27] xfs: kill the unused struct xfs_sync_work
  2011-07-01  9:43 ` [PATCH 11/27] xfs: kill the unused struct xfs_sync_work Christoph Hellwig
@ 2011-07-05 22:36   ` Alex Elder
  0 siblings, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-05 22:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 12/27] xfs: factor out xfs_dir2_leaf_find_entry
  2011-07-01  9:43 ` [PATCH 12/27] xfs: factor out xfs_dir2_leaf_find_entry Christoph Hellwig
@ 2011-07-05 22:36   ` Alex Elder
  0 siblings, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-05 22:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> plain text document attachment (xfs-factor-dir2-leaf-code)
> Add a new xfs_dir2_leaf_find_entry helper to factor out some duplicate code
> from xfs_dir2_leaf_addname and xfs_dir2_leafn_add.  Found by Eric Sandeen using
> an automated code duplication checker.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Nice.

Reviewed-by: Alex Elder <aelder@sgi.com>



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 13/27] xfs: cleanup shortform directory inode number handling
  2011-07-01  9:43 ` [PATCH 13/27] xfs: cleanup shortform directory inode number handling Christoph Hellwig
@ 2011-07-05 22:36   ` Alex Elder
  0 siblings, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-05 22:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> Refactor the shortform directory helpers that deal with the 32-bit vs
> 64-bit wide inode numbers into more sensible helpers, and kill the
> xfs_intino_t typedef that is now superfluous.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Looking at XFS_{GET,PUT}_DIR_INO{4,8}(), they could
maybe benefit from conversion to cpu_to_be32() and
friends.  They're only used in these few spots.

Looks good though.

Reviewed-by: Alex Elder <aelder@sgi.com>




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 14/27] xfs: kill struct xfs_dir2_sf
  2011-07-01  9:43 ` [PATCH 14/27] xfs: kill struct xfs_dir2_sf Christoph Hellwig
@ 2011-07-06  1:57   ` Dave Chinner
  2011-07-06  8:28     ` Christoph Hellwig
  2011-07-06  3:24   ` Alex Elder
  1 sibling, 1 reply; 88+ messages in thread
From: Dave Chinner @ 2011-07-06  1:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jul 01, 2011 at 05:43:35AM -0400, Christoph Hellwig wrote:
> The list field of it is never actually used, so all uses can simply be
> replaced with the xfs_dir2_sf_hdr_t type that it has as first member.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> - * fit into the literal area of the inode.
> + * Small directories are packed as tightly as possible so as to fit into the
> + * literal area of the inode.  They consist of a single xfs_dir2_sf_hdr header
> + * followed by zero or more xfs_dir2_sf_entry structures.  Due the different
> + * inode number storage sized and the variable length name filed in
                           size                               field
> + * the xfs_dir2_sf_entry all these structure are variable length, and the
                                      structures
> + * accessors in this file need to be used to iterate over them.
                             should be

>  static inline int
> -xfs_dir2_sf_entsize_byentry(xfs_dir2_sf_t *sfp, xfs_dir2_sf_entry_t *sfep)
> +xfs_dir2_sf_entsize_byentry(xfs_dir2_sf_hdr_t *sfp, xfs_dir2_sf_entry_t *sfep)
>  {
>  	return ((uint)sizeof(xfs_dir2_sf_entry_t) - 1 + (sfep)->namelen - \
> -		((sfp)->hdr.i8count == 0) * \
> +		((sfp)->i8count == 0) * \
>  		((uint)sizeof(xfs_dir2_ino8_t) - (uint)sizeof(xfs_dir2_ino4_t)));
>  }
>  
> -static inline xfs_dir2_sf_entry_t *xfs_dir2_sf_firstentry(xfs_dir2_sf_t *sfp)
> +static inline xfs_dir2_sf_entry_t *xfs_dir2_sf_firstentry(xfs_dir2_sf_hdr_t *sfp)

Probably should split this onto two lines.

Otherwise looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 15/27] xfs: cleanup the defintion of struct xfs_dir2_sf_entry
  2011-07-01  9:43 ` [PATCH 15/27] xfs: cleanup the defintion of struct xfs_dir2_sf_entry Christoph Hellwig
@ 2011-07-06  2:00   ` Dave Chinner
  2011-07-06  3:33   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Dave Chinner @ 2011-07-06  2:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jul 01, 2011 at 05:43:36AM -0400, Christoph Hellwig wrote:
> Remove the inumber member which is at a variable offset after the actual
> name, and make name a real variable sized C99 array instead of the incorrect
> one-sized array which confuses (not only) gcc.  Based on this clean up
> the helpers to calculate the entry size.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 16/27] xfs: avoid usage of struct xfs_dir2_block
  2011-07-01  9:43 ` [PATCH 16/27] xfs: avoid usage of struct xfs_dir2_block Christoph Hellwig
@ 2011-07-06  2:19   ` Dave Chinner
  2011-07-06  8:35     ` Christoph Hellwig
  2011-07-06  3:36   ` Alex Elder
  1 sibling, 1 reply; 88+ messages in thread
From: Dave Chinner @ 2011-07-06  2:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jul 01, 2011 at 05:43:37AM -0400, Christoph Hellwig wrote:
> In most places we can simply pass around and use the struct xfs_dir2_data_hdr,
> which is the first and most important member of struct xfs_dir2_block instead
> of the full structure.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
....
> @@ -105,13 +105,13 @@ xfs_dir2_block_addname(
>  		return error;
>  	}
>  	ASSERT(bp != NULL);
> -	block = bp->data;
> +	hdr = bp->data;
>  	/*
>  	 * Check the magic number, corrupted if wrong.
>  	 */
> -	if (unlikely(be32_to_cpu(block->hdr.magic) != XFS_DIR2_BLOCK_MAGIC)) {
> +	if (unlikely(hdr->magic != cpu_to_be32(XFS_DIR2_BLOCK_MAGIC))) {

Took me a moment to realise what this does - turns the byte swap
into a compile-time operation rather than a runtime operation.
Nice.

Perhaps we should do that same optimisation in other magic number
checks around the place?
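
For illustration, a minimal sketch of the two comparison forms discussed
above.  The EX_* macros and ex_* functions are hypothetical stand-ins, not
the kernel's cpu_to_be32()/be32_to_cpu() helpers, and the endianness test
assumes a compiler that defines __BYTE_ORDER__ (little-endian is assumed
otherwise).  Swapping the constant rather than the loaded value lets the
compiler fold the swap at build time:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative magic value; any 32-bit constant behaves the same way. */
#define EXAMPLE_MAGIC 0x58443242u

/* Pure constant expression, so a literal argument is folded at compile time. */
#define EX_SWAB32(x) \
	((((uint32_t)(x) & 0x000000ffu) << 24) | \
	 (((uint32_t)(x) & 0x0000ff00u) <<  8) | \
	 (((uint32_t)(x) & 0x00ff0000u) >>  8) | \
	 (((uint32_t)(x) & 0xff000000u) >> 24))

/* Assumption: hosts that do not define __BYTE_ORDER__ are little-endian. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define EX_CPU_TO_BE32(x) ((uint32_t)(x))
#else
#define EX_CPU_TO_BE32(x) EX_SWAB32(x)
#endif
#define EX_BE32_TO_CPU(x) EX_CPU_TO_BE32(x)

struct ex_hdr {
	uint32_t magic;		/* stored big-endian, as it would be on disk */
};

/* Old form: the value loaded from the buffer is swapped on every call. */
static int ex_magic_ok_runtime(const struct ex_hdr *hdr)
{
	return EX_BE32_TO_CPU(hdr->magic) == EXAMPLE_MAGIC;
}

/* New form: only the constant is swapped, which the compiler can precompute. */
static int ex_magic_ok_const(const struct ex_hdr *hdr)
{
	return hdr->magic == EX_CPU_TO_BE32(EXAMPLE_MAGIC);
}

/* Build a header from raw big-endian bytes, as found in a disk buffer. */
static struct ex_hdr ex_hdr_from_disk(const unsigned char bytes[4])
{
	struct ex_hdr hdr;

	assert(bytes != NULL);
	memcpy(&hdr.magic, bytes, sizeof(hdr.magic));
	return hdr;
}
```

Both functions give the same answer for any input; only the place where
the swap happens differs.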

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 17/27] xfs: kill struct xfs_dir2_block
  2011-07-01  9:43 ` [PATCH 17/27] xfs: kill " Christoph Hellwig
@ 2011-07-06  2:31   ` Dave Chinner
  2011-07-06  8:37     ` Christoph Hellwig
  2011-07-06  3:36   ` Alex Elder
  1 sibling, 1 reply; 88+ messages in thread
From: Dave Chinner @ 2011-07-06  2:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jul 01, 2011 at 05:43:38AM -0400, Christoph Hellwig wrote:
> Remove the confusing xfs_dir2_block structure.  It is supposed to describe
> an XFS dir2 block format btree block, but due to the variable sized nature
> of almost all elements in it it can't actually do anything close to that
> job.  In addition to accessing the fixed offset header structure it was
> only used to get a pointer to the first dir or unused entry after it,
> which can be trivially replaced by pointer arithmetics on the header
> pointer.  For most users that is actually more natural anyway, as they
> don't use a typed pointer but rather a character pointer for further
> arithmetics.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
.....
> @@ -471,14 +470,13 @@ xfs_dir2_block_getdents(
>  	 * We'll skip entries before this.
>  	 */
>  	wantoff = xfs_dir2_dataptr_to_off(mp, *offset);
> -	block = bp->data;
> -	hdr = &block->hdr;
> +	hdr = bp->data;
>  	xfs_dir2_data_check(dp, bp);
>  	/*
>  	 * Set up values for the loop.
>  	 */
>  	btp = xfs_dir2_block_tail_p(mp, hdr);
> -	ptr = (char *)block->u;
> +	ptr = (char *)(hdr + 1);
>  	endptr = (char *)xfs_dir2_block_leaf_p(btp);

It is slightly less obvious what that is doing. It's jumping over
the entire header, but could easily be confused with jumping one
byte in.

Perhaps adding a wrapper e.g. xfs_dir2_block_data_p(hdr) to match
the xfs_dir2_block_tail_p() and xfs_dir2_block_leaf_p() wrappers,
and converting all the other cases to use this as well?
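
A minimal sketch of what such a wrapper could look like, to make the
"whole header, not one byte" point concrete.  The struct layout and the
ex_* names below are hypothetical; only the (hdr + 1) idiom is taken from
the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified placeholder header; the real on-disk layout differs. */
struct ex_data_hdr {
	uint32_t	magic;
	uint16_t	bestfree[6];
};

/*
 * Hypothetical wrapper in the spirit of the suggested
 * xfs_dir2_block_data_p(): return the first byte after the header.
 * Pointer arithmetic on a typed pointer advances by sizeof(*hdr).
 */
static inline char *ex_block_data_p(struct ex_data_hdr *hdr)
{
	assert(hdr != NULL);
	return (char *)(hdr + 1);
}

/* Distance in bytes from the start of the header to the first entry. */
static inline ptrdiff_t ex_data_offset(struct ex_data_hdr *hdr)
{
	return ex_block_data_p(hdr) - (char *)hdr;
}
```

With a helper like this the cast lives in one place and callers read
ptr = ex_block_data_p(hdr) rather than open-coding the arithmetic.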

> @@ -1103,7 +1099,7 @@ xfs_dir2_sf_to_block(
>  	 * The whole thing is initialized to free by the init routine.
>  	 * Say we're using the leaf and tail area.
>  	 */
> -	dup = (xfs_dir2_data_unused_t *)block->u;
> +	dup = (xfs_dir2_data_unused_t *)(hdr + 1);

and maybe a xfs_dir2_block_unused_p() wrapper just to avoid the cast
here, though I'm not sure it's worth adding a wrapper just for this
one use.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 18/27] xfs: avoid usage of struct xfs_dir2_data
  2011-07-01  9:43 ` [PATCH 18/27] xfs: avoid usage of struct xfs_dir2_data Christoph Hellwig
@ 2011-07-06  3:02   ` Dave Chinner
  2011-07-06  8:43     ` Christoph Hellwig
  2011-07-06  3:38   ` Alex Elder
  1 sibling, 1 reply; 88+ messages in thread
From: Dave Chinner @ 2011-07-06  3:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jul 01, 2011 at 05:43:39AM -0400, Christoph Hellwig wrote:
> In most places we can simply pass around and use the struct xfs_dir2_data_hdr,
> which is the first and most important member of struct xfs_dir2_data instead
> of the full structure.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Only a couple of minor things.

....

> @@ -251,12 +258,13 @@ xfs_dir2_data_freeinsert(
>  	xfs_dir2_data_free_t	new;		/* new bestfree entry */
>  
>  #ifdef __KERNEL__
> -	ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_DATA_MAGIC ||
> -	       be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
> +	ASSERT(be32_to_cpu(hdr->magic) == XFS_DIR2_DATA_MAGIC ||
> +	       be32_to_cpu(hdr->magic) == XFS_DIR2_BLOCK_MAGIC);
>  #endif

You kill the ifdef __KERNEL__ there.

> @@ -286,36 +294,36 @@ xfs_dir2_data_freeinsert(
>   */
>  STATIC void
>  xfs_dir2_data_freeremove(
> -	xfs_dir2_data_t		*d,		/* data block pointer */
> +	xfs_dir2_data_hdr_t	*hdr,		/* data block header */
>  	xfs_dir2_data_free_t	*dfp,		/* bestfree entry pointer */
>  	int			*loghead)	/* out: log data header */
>  {
>  #ifdef __KERNEL__
> -	ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_DATA_MAGIC ||
> -	       be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
> +	ASSERT(be32_to_cpu(hdr->magic) == XFS_DIR2_DATA_MAGIC ||
> +	       be32_to_cpu(hdr->magic) == XFS_DIR2_BLOCK_MAGIC);
>  #endif

And there.

> @@ -335,23 +344,23 @@ xfs_dir2_data_freescan(
>  	char			*p;		/* current entry pointer */
>  
>  #ifdef __KERNEL__
> -	ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_DATA_MAGIC ||
> -	       be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
> +	ASSERT(hdr->magic == cpu_to_be32(XFS_DIR2_DATA_MAGIC) ||
> +	       hdr->magic == cpu_to_be32(XFS_DIR2_BLOCK_MAGIC));
>  #endif

I'll stop commenting on this now ;)

However, I have noticed that you've converted some of the magic
number compares to cpu_to_be32(XFS_DIR2_DATA_MAGIC) form and not
others. I'm not so concerned about the ASSERT()s, but some of the
real runtime checks are touched but then not changed around.
Anyway, this can probably be done later as a separate cleanup.

>  			if (!needscan) {
> -				xfs_dir2_data_freeremove(d, dfp, needlogp);
> -				(void)xfs_dir2_data_freeinsert(d, newdup,
> +				xfs_dir2_data_freeremove(hdr, dfp, needlogp);
> +				(void)xfs_dir2_data_freeinsert(hdr, newdup,
>  					needlogp);
> -				(void)xfs_dir2_data_freeinsert(d, newdup2,
> +				(void)xfs_dir2_data_freeinsert(hdr, newdup2,
>  					needlogp);
>  			}
>  		}

Kill the (void) casts?

Otherwise looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 19/27] xfs: kill struct xfs_dir2_data
  2011-07-01  9:43 ` [PATCH 19/27] xfs: kill " Christoph Hellwig
@ 2011-07-06  3:05   ` Dave Chinner
  2011-07-06  3:38   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Dave Chinner @ 2011-07-06  3:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jul 01, 2011 at 05:43:40AM -0400, Christoph Hellwig wrote:
> Remove the confusing xfs_dir2_data structure.  It is supposed to describe
> an XFS dir2 data btree block, but due to the variable sized nature of
> almost all elements in it it can't actually do anything close to that
> job.  In addition to accessing the fixed offset header structure it was
> only used to get a pointer to the first dir or unused entry after it,
> which can be trivially replaced by pointer arithmetics on the header
> pointer.  For most users that is actually more natural anyway, as they
> don't use a typed pointer but rather a character pointer for further
> arithmetics.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> @@ -70,10 +69,9 @@ xfs_dir2_data_check(
>  	struct xfs_name		name;
>  
>  	mp = dp->i_mount;
> -	d = bp->data;
> -	hdr = &d->hdr;
> +	hdr = bp->data;
>  	bf = hdr->bestfree;
> -	p = (char *)d->u;
> +	p = (char *)(hdr + 1);

Same comment as the previous patch about using a wrapper for this.

Otherwise looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 20/27] xfs: cleanup the defintion of struct xfs_dir2_data_entry
  2011-07-01  9:43 ` [PATCH 20/27] xfs: cleanup the defintion of struct xfs_dir2_data_entry Christoph Hellwig
@ 2011-07-06  3:06   ` Dave Chinner
  2011-07-06  3:44   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Dave Chinner @ 2011-07-06  3:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jul 01, 2011 at 05:43:41AM -0400, Christoph Hellwig wrote:
> Remove the tag member which is at a variable offset after the actual
> name, and make name a real variable sized C99 array instead of the incorrect
> one-sized array which confuses (not only) gcc.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 21/27] xfs: cleanup struct xfs_dir2_leaf
  2011-07-01  9:43 ` [PATCH 21/27] xfs: cleanup struct xfs_dir2_leaf Christoph Hellwig
@ 2011-07-06  3:13   ` Dave Chinner
  2011-07-06  3:44   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Dave Chinner @ 2011-07-06  3:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jul 01, 2011 at 05:43:42AM -0400, Christoph Hellwig wrote:
> Simplify the confusing xfs_dir2_leaf structure.  It is supposed to describe
> an XFS dir2 leaf format btree block, but due to the variable sized nature
> of almost all elements in it it can't actually do anything close to that
> job.   Remove the members that are after the first variable sized array,
> given that they could only be used for sizeof expressions that can as well
> just use the underlying types directly, and make the ents array a real
> C99 variable sized array.
> 
> Also factor out the xfs_dir2_leaf_size, to make the sizing of a leaf
> entry which already was convoluted somewhat readable after using the
> longer type names in the sizeof expressions.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
.....
> 
> Index: xfs/fs/xfs/xfs_dir2_leaf.h
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_dir2_leaf.h	2011-06-30 09:18:07.263416117 +0200
> +++ xfs/fs/xfs/xfs_dir2_leaf.h	2011-06-30 09:38:44.723400763 +0200
> @@ -72,10 +72,7 @@ typedef struct xfs_dir2_leaf_tail {
>   */
>  typedef struct xfs_dir2_leaf {
>  	xfs_dir2_leaf_hdr_t	hdr;		/* leaf header */
> -	xfs_dir2_leaf_entry_t	ents[1];	/* entries */
> -						/* ... */
> -	xfs_dir2_data_off_t	bests[1];	/* best free counts */
> -	xfs_dir2_leaf_tail_t	tail;		/* leaf tail */
> +	xfs_dir2_leaf_entry_t	ents[];	/* entries */
>  } xfs_dir2_leaf_t;

This needs a comment describing the layout of the structure like
you've done with the other structures that have been cleaned up.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 14/27] xfs: kill struct xfs_dir2_sf
  2011-07-01  9:43 ` [PATCH 14/27] xfs: kill struct xfs_dir2_sf Christoph Hellwig
  2011-07-06  1:57   ` Dave Chinner
@ 2011-07-06  3:24   ` Alex Elder
  2011-07-06  8:33     ` Christoph Hellwig
  1 sibling, 1 reply; 88+ messages in thread
From: Alex Elder @ 2011-07-06  3:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> The list field of it is never actually used, so all uses can simply be
> replaced with the xfs_dir2_sf_hdr_t type that it has as first member.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks like a lot of places could be converted to use
"struct xfs_dir2_sf_hdr" rather than the typedef, but
it's not worth re-posting for that.  (Plus I suspect
such changes may be in forthcoming patches...)

Another few dumb little suggestions below--mostly
regarding a consistent naming scheme--but otherwise
this looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>

. . .

> Index: xfs/fs/xfs/xfs_dir2_block.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_dir2_block.c	2011-06-30 09:32:00.000000000 +0200
> +++ xfs/fs/xfs/xfs_dir2_block.c	2011-06-30 09:35:55.810069526 +0200
. . .
> @@ -1061,32 +1060,30 @@ xfs_dir2_sf_to_block(
>  		ASSERT(XFS_FORCED_SHUTDOWN(mp));
>  		return XFS_ERROR(EIO);
>  	}
> +
> +	oldsfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
> +
>  	ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
>  	ASSERT(dp->i_df.if_u1.if_data != NULL);

	ASSERT(oldsfp != NULL);

> -	sfp = (xfs_dir2_sf_t *)dp->i_df.if_u1.if_data;
> -	ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->hdr.i8count));
> +	ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(oldsfp->i8count));

. . .

> Index: xfs/fs/xfs/xfs_dir2_sf.c
> ===================================================================

. . .

> @@ -67,10 +67,10 @@ static void xfs_dir2_sf_toino8(xfs_da_ar
>   */
>  static xfs_ino_t
>  xfs_dir2_sf_get_ino(
> -	struct xfs_dir2_sf	*sfp,
> +	struct xfs_dir2_sf_hdr	*hdr,

I think I like the name "hdr" better than "sfp";
was it just too widespread a change to do a
similar rename elsewhere?  (xfs_dir2_block_to_sf()
uses "sfhp" already, though I like just "hdr".)

>  	xfs_dir2_inou_t		*from)
>  {
> -	if (sfp->hdr.i8count)
> +	if (hdr->i8count)
>  		return XFS_GET_DIR_INO8(from->i8);
>  	else
>  		return XFS_GET_DIR_INO4(from->i4);

. . .

> @@ -237,7 +237,7 @@ xfs_dir2_block_to_sf(
>  	xfs_mount_t		*mp;		/* filesystem mount point */
>  	char			*ptr;		/* current data pointer */
>  	xfs_dir2_sf_entry_t	*sfep;		/* shortform entry */
> -	xfs_dir2_sf_t		*sfp;		/* shortform structure */
> +	xfs_dir2_sf_hdr_t	*sfp;		/* shortform structure */

    xfs_dir2_sf_hdr_t       *hdr;   /* shortform directory header */
 
>  	trace_xfs_dir2_block_to_sf(args);
>  
. . .



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 15/27] xfs: cleanup the defintion of struct xfs_dir2_sf_entry
  2011-07-01  9:43 ` [PATCH 15/27] xfs: cleanup the defintion of struct xfs_dir2_sf_entry Christoph Hellwig
  2011-07-06  2:00   ` Dave Chinner
@ 2011-07-06  3:33   ` Alex Elder
  2011-07-06  8:34     ` Christoph Hellwig
  1 sibling, 1 reply; 88+ messages in thread
From: Alex Elder @ 2011-07-06  3:33 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> Remove the inumber member which is at a variable offset after the actual
> name, and make name a real variable sized C99 array instead of the incorrect
> one-sized array which confuses (not only) gcc.  Based on this clean up
> the helpers to calculate the entry size.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Why was the inode put after the name in the
first place?

Anyway, looks good.
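
For reference, a minimal sketch of the layout being discussed, assuming a
simplified entry made only of byte-sized fields.  The ex_* names are
hypothetical and this is not the XFS definition; it only shows why a C99
flexible array member makes the size helpers simpler than a one-element
array, and where a trailing field ends up when it follows a variable
length name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old style: a one-element array that pretends the name is one byte long. */
struct ex_entry_old {
	uint8_t	namelen;
	uint8_t	offset[2];
	uint8_t	name[1];
};

/* New style: a C99 flexible array member; sizeof() covers only the fixed part. */
struct ex_entry {
	uint8_t	namelen;
	uint8_t	offset[2];
	uint8_t	name[];
};

/*
 * Entry size: fixed part + name + trailing inode number, which is 4 or
 * 8 bytes wide depending on i8count.  No "- 1" correction is needed
 * because name[] contributes nothing to sizeof().
 */
static size_t ex_entsize(uint8_t namelen, int i8count)
{
	return sizeof(struct ex_entry) + namelen + (i8count ? 8 : 4);
}

/* The trailing field starts right after the name, so its offset varies. */
static uint8_t *ex_inumber_p(struct ex_entry *e)
{
	assert(e != NULL);
	return &e->name[e->namelen];
}
```

Because the trailing field sits at a different offset in every entry, it
cannot be expressed as an ordinary struct member, which is why an accessor
is used instead.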

Reviewed-by: Alex Elder <aelder@sgi.com>



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 16/27] xfs: avoid usage of struct xfs_dir2_block
  2011-07-01  9:43 ` [PATCH 16/27] xfs: avoid usage of struct xfs_dir2_block Christoph Hellwig
  2011-07-06  2:19   ` Dave Chinner
@ 2011-07-06  3:36   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-06  3:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> In most places we can simply pass around and use the struct xfs_dir2_data_hdr,
> which is the first and most important member of struct xfs_dir2_block instead
> of the full structure.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 17/27] xfs: kill struct xfs_dir2_block
  2011-07-01  9:43 ` [PATCH 17/27] xfs: kill " Christoph Hellwig
  2011-07-06  2:31   ` Dave Chinner
@ 2011-07-06  3:36   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-06  3:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> Remove the confusing xfs_dir2_block structure.  It is supposed to describe
> an XFS dir2 block format btree block, but due to the variable sized nature
> of almost all elements in it it can't actually do anything close to that
> job.  In addition to accessing the fixed offset header structure it was
> only used to get a pointer to the first dir or unused entry after it,
> which can be trivially replaced by pointer arithmetics on the header
> pointer.  For most users that is actually more natural anyway, as they
> don't use a typed pointer but rather a character pointer for further
> arithmetics.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Yes, I think this is an improvement.

Reviewed-by: Alex Elder <aelder@sgi.com>



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 18/27] xfs: avoid usage of struct xfs_dir2_data
  2011-07-01  9:43 ` [PATCH 18/27] xfs: avoid usage of struct xfs_dir2_data Christoph Hellwig
  2011-07-06  3:02   ` Dave Chinner
@ 2011-07-06  3:38   ` Alex Elder
  2011-07-06  8:45     ` Christoph Hellwig
  1 sibling, 1 reply; 88+ messages in thread
From: Alex Elder @ 2011-07-06  3:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> In most places we can simply pass around and use the struct xfs_dir2_data_hdr,
> which is the first and most important member of struct xfs_dir2_data instead
> of the full structure.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

In xfs_dir2_data_freescan() you get the address of the
dir2_data_t by forcibly casting the address of the
header to that type.  We all know that's fine, but
it's an unsavory practice.  Why don't you pass the
full dir2_data_t in that case where it's needed?
Are you simply trying to avoid passing *any* pointers
to variable-sized types?

I'm not worried about this, but just wondered
what you thought about this.

(Update: I think you eliminate that line in the next
patch anyway...)

In any case:

Reviewed-by: Alex Elder <aelder@sgi.com>

. . .
		}
> Index: xfs/fs/xfs/xfs_dir2_data.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_dir2_data.c	2011-06-30 09:38:36.586734196 +0200
> +++ xfs/fs/xfs/xfs_dir2_data.c	2011-06-30 09:38:40.133400821 +0200

. . .
 
> @@ -325,9 +333,10 @@ xfs_dir2_data_freeremove(
>  void
>  xfs_dir2_data_freescan(
>  	xfs_mount_t		*mp,		/* filesystem mount point */
> -	xfs_dir2_data_t		*d,		/* data block pointer */
> +	xfs_dir2_data_hdr_t	*hdr,		/* data block header */
>  	int			*loghead)	/* out: log data header */
>  {
> +	xfs_dir2_data_t		*d = (xfs_dir2_data_t *)hdr;
>  	xfs_dir2_block_tail_t	*btp;		/* block tail */
>  	xfs_dir2_data_entry_t	*dep;		/* active data entry */
>  	xfs_dir2_data_unused_t	*dup;		/* unused data entry */

. . .




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 19/27] xfs: kill struct xfs_dir2_data
  2011-07-01  9:43 ` [PATCH 19/27] xfs: kill " Christoph Hellwig
  2011-07-06  3:05   ` Dave Chinner
@ 2011-07-06  3:38   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-06  3:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> Remove the confusing xfs_dir2_data structure.  It is supposed to describe
> an XFS dir2 data btree block, but due to the variable sized nature of
> almost all elements in it, it can't actually do anything close to that
> job.  In addition to accessing the fixed offset header structure it was
> only used to get a pointer to the first dir or unused entry after it,
> which can be trivially replaced by pointer arithmetic on the header
> pointer.  For most users that is actually more natural anyway, as they
> don't use a typed pointer but rather a character pointer for further
> arithmetic.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.  I like the diagrams.

Reviewed-by: Alex Elder <aelder@sgi.com>



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 22/27] xfs: use generic get_unaligned_beXX helpers
  2011-07-01  9:43 ` [PATCH 22/27] xfs: use generic get_unaligned_beXX helpers Christoph Hellwig
@ 2011-07-06  3:44   ` Dave Chinner
  2011-07-06  9:07     ` Christoph Hellwig
  2011-07-06  3:47   ` Alex Elder
  1 sibling, 1 reply; 88+ messages in thread
From: Dave Chinner @ 2011-07-06  3:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jul 01, 2011 at 05:43:43AM -0400, Christoph Hellwig wrote:
> Switch the shortform directory code over to use the generic
> get_unaligned_beXX helpers instead of reinventing them.  As a result
> kill off xfs_arch.h and move the setting of XFS_NATIVE_HOST into
> xfs_linux.h.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
.....
> -/*
> - * In directories inode numbers are stored as unaligned arrays of unsigned
> - * 8bit integers on disk.
> - *
> - * For v1 directories or v2 directories that contain inode numbers that
> - * do not fit into 32bit the array has eight members, but the first member
> - * is always zero:
> - *
> - *  |unused|48-55|40-47|32-39|24-31|16-23| 8-15| 0- 7|

Well, I learnt something today.

So we only support 56 bit inode numbers in shortform directories?
AFAIK, nothing else in the code enforces this limitation. I
just found XFS_MAXINUMBER to define the maximum inode
number to be 56 bits in size, but, well, it's not used anywhere
relevant (like when initialising AGs)......

Hmmmm. I wonder if it is a hold-over from the days of 4GB AGs?
That would have meant inode numbers used 6 bits for the chunk index,
16 bits (22 - 6) for the agbno and 32 bits for the agno, which gives a
54 bit maximum inode number and so XFS_MAXINUMBER @ 56 bits makes sense, as
does the zero high byte in the dir2 inode number.

Now we have 30 bits for the agbno+chunk index, and 32 bits for the
agno, so inode numbers can reach 62 bits, which is outside the range
of the 56-bit MAXINUMBER limit.
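
The bit counting above can be checked with a small userspace sketch.
This is not kernel code: the helper names (sketch_ino_bits,
sketch_max_ino, sketch_fits_in_56_bits) are made up for illustration,
and the only inputs are the bit widths quoted in this mail (22 or 30
bits for the AG-relative part, 32 bits for the agno).

```c
#include <stdint.h>

/* Hypothetical helper: total width of an inode number, given the bits
 * used for the AG-relative part (agbno + chunk index) and for the agno. */
static unsigned int sketch_ino_bits(unsigned int agino_bits,
				    unsigned int agno_bits)
{
	return agino_bits + agno_bits;
}

/* Largest inode number representable in nbits bits (nbits must be < 64). */
static uint64_t sketch_max_ino(unsigned int nbits)
{
	return (UINT64_C(1) << nbits) - 1;
}

/* Returns 1 if every inode number of this width leaves the high byte
 * zero, i.e. fits in 56 bits, and 0 otherwise. */
static int sketch_fits_in_56_bits(unsigned int nbits)
{
	return (sketch_max_ino(nbits) & UINT64_C(0xff00000000000000)) == 0;
}
```

With 22 + 32 = 54 bits the high byte is always zero; with 30 + 32 = 62
bits it is not, which is the mismatch the questions below are about.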

So my questions are now this:
	- is there any other reason for a 56 bit inode number limit?
	- why isn't it enforced for the rest of the directory code?
	- did we lose that checking when we converted the rest of
	  the directory code to use the generic byte swapping
	  functions?
	- do we need to increase XFS_MAXINUMBER to reflect the
	  current reality of 1TB AGs and simply ignore the zero high
	  byte restriction?

As it is, we need to update AG initialisation to disallow inode
allocation in AGs above the XFS_MAXINUMBER if we don't allow them in
the directory structure....


> +++ xfs/fs/xfs/xfs_dir2_sf.c	2011-06-30 20:46:45.366236141 +0200
> @@ -59,11 +59,12 @@ static void xfs_dir2_sf_toino4(xfs_da_ar
>  static void xfs_dir2_sf_toino8(xfs_da_args_t *args);
>  #endif /* XFS_BIG_INUMS */
>  
> -
>  /*
>   * Inode numbers in short-form directories can come in two versions,
>   * either 4 bytes or 8 bytes wide.  These helpers deal with the
>   * two forms transparently by looking at the headers i8count field.
> + *
> + * For 64-bit inode number the most significant byte must be zero.

This comment is what led me down that path....

>   */
>  static xfs_ino_t
>  xfs_dir2_sf_get_ino(
> @@ -71,9 +72,9 @@ xfs_dir2_sf_get_ino(
>  	xfs_dir2_inou_t		*from)
>  {
>  	if (hdr->i8count)
> -		return XFS_GET_DIR_INO8(from->i8);
> +		return get_unaligned_be64(&from->i8.i) & 0x00ffffffffffffffULL;
>  	else
> -		return XFS_GET_DIR_INO4(from->i4);
> +		return get_unaligned_be32(&from->i4.i);
>  }
>  
>  static void
> @@ -82,10 +83,12 @@ xfs_dir2_sf_put_ino(
>  	xfs_dir2_inou_t		*to,
>  	xfs_ino_t		ino)
>  {
> +	ASSERT((ino & 0xff00000000000000ULL) == 0);
> +
>  	if (hdr->i8count)
> -		XFS_PUT_DIR_INO8(ino, to->i8);
> +		put_unaligned_be64(ino, &to->i8.i);
>  	else
> -		XFS_PUT_DIR_INO4(ino, to->i4);
> +		put_unaligned_be32(ino, &to->i4.i);
>  }
>  
>  xfs_ino_t
> Index: xfs/fs/xfs/xfs_dir2_sf.h
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_dir2_sf.h	2011-06-30 20:24:08.732919663 +0200
> +++ xfs/fs/xfs/xfs_dir2_sf.h	2011-06-30 20:38:37.019575543 +0200
> @@ -95,13 +95,13 @@ static inline int xfs_dir2_sf_hdr_size(i
>  static inline xfs_dir2_data_aoff_t
>  xfs_dir2_sf_get_offset(xfs_dir2_sf_entry_t *sfep)
>  {
> -	return INT_GET_UNALIGNED_16_BE(&(sfep)->offset.i);
> +	return get_unaligned_be16(&sfep->offset.i);
>  }
>  
>  static inline void
>  xfs_dir2_sf_put_offset(xfs_dir2_sf_entry_t *sfep, xfs_dir2_data_aoff_t off)
>  {
> -	INT_SET_UNALIGNED_16_BE(&(sfep)->offset.i, off);
> +	put_unaligned_be16(off, &sfep->offset.i);
>  }
>  
>  static inline int

As a straight translation this patch is fine, but I think I'd like
to resolve some of the questions it raises first before anything
else....
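
For readers following along outside the kernel tree, here is a rough
userspace sketch of what the 8 byte get/put pair in the hunk above
does.  It assumes nothing beyond what the diff shows: a big-endian,
possibly unaligned 64-bit value whose most significant byte is masked
off on read and asserted to be zero on write.  The sketch_* names and
DIR_INO_MASK are invented here; the real code uses get_unaligned_be64()
and put_unaligned_be64().

```c
#include <assert.h>
#include <stdint.h>

/* Low 56 bits: the mask applied in the 8 byte read path of the diff. */
#define DIR_INO_MASK	UINT64_C(0x00ffffffffffffff)

/* Read a big-endian 64-bit value byte by byte, so alignment of p
 * does not matter. */
static uint64_t sketch_get_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Store a 64-bit value in big-endian byte order, byte by byte. */
static void sketch_put_be64(uint64_t v, uint8_t *p)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Read an 8 byte directory inode number, ignoring the high byte. */
static uint64_t sketch_get_ino8(const uint8_t *from)
{
	return sketch_get_be64(from) & DIR_INO_MASK;
}

/* Write an 8 byte directory inode number; the high byte must be zero. */
static void sketch_put_ino8(uint8_t *to, uint64_t ino)
{
	assert((ino & ~DIR_INO_MASK) == 0);
	sketch_put_be64(ino, to);
}
```

Note that the read side silently drops the top byte while the write
side only asserts, so an inode number wider than 56 bits would not
round-trip through this format.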

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 20/27] xfs: cleanup the defintion of struct xfs_dir2_data_entry
  2011-07-01  9:43 ` [PATCH 20/27] xfs: cleanup the defintion of struct xfs_dir2_data_entry Christoph Hellwig
  2011-07-06  3:06   ` Dave Chinner
@ 2011-07-06  3:44   ` Alex Elder
  2011-07-06  8:48     ` Christoph Hellwig
  1 sibling, 1 reply; 88+ messages in thread
From: Alex Elder @ 2011-07-06  3:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> Remove the tag member which is at a variable offset after the actual
> name, and make name a real variable sized C99 array instead of the incorrect
> one-sized array which confuses (not only) gcc.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

One comment, but looks good.

Oh, and also, fix spelling of "definition" in the subject
line here and in other patches in this series.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Index: xfs/fs/xfs/xfs_dir2_data.h
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_dir2_data.h	2011-06-29 13:42:35.521563513 +0200
> +++ xfs/fs/xfs/xfs_dir2_data.h	2011-06-29 13:43:03.284746440 +0200
> @@ -98,14 +98,14 @@ typedef struct xfs_dir2_data_hdr {
>  
>  /*
>   * Active entry in a data block.  Aligned to 8 bytes.
> - * Tag appears as the last 2 bytes.
> + *
> + * After the variable length name field there is a 2 byte tag field, which
> + * can be accessed using xfs_dir2_data_entry_tag_p.
>   */
>  typedef struct xfs_dir2_data_entry {
>  	__be64			inumber;	/* inode number */
>  	__u8			namelen;	/* name length */
> -	__u8			name[1];	/* name bytes, no null */
> -						/* variable offset */
> -	__be16			tag;		/* starting offset of us */
> +	__u8			name[];		/* name bytes, no null */

Maybe put the comment about the tag field here, as
was done elsewhere.  (But the pictures are even
better...)
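
As an illustration of why the tag can no longer be a struct member, the
layout can be sketched in userspace roughly as follows.  This is an
assumption-laden sketch, not the XFS definition: struct sketch_entry
mirrors the fields in the hunk with plain integer types, and
sketch_entsize() and sketch_entry_tag_p() are hypothetical stand-ins
that assume the entry is padded to 8 bytes with the 2 byte tag in its
last two bytes, as the comment in the patch describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the data entry: fixed fields, then a C99 flexible array. */
struct sketch_entry {
	uint64_t	inumber;	/* inode number */
	uint8_t		namelen;	/* name length */
	uint8_t		name[];		/* name bytes, no null */
};

/* Entry size: fixed fields + name + 2 byte tag, rounded up to 8 bytes. */
static size_t sketch_entsize(uint8_t namelen)
{
	size_t raw = offsetof(struct sketch_entry, name) + namelen +
		     sizeof(uint16_t);

	return (raw + 7) & ~(size_t)7;
}

/* The tag occupies the last two bytes of the padded entry, so its
 * offset depends on namelen and has to be computed at run time. */
static uint8_t *sketch_entry_tag_p(struct sketch_entry *e)
{
	return (uint8_t *)e + sketch_entsize(e->namelen) - sizeof(uint16_t);
}
```

offsetof() is used rather than sizeof() because the compiler may pad
the struct itself beyond the start of the name array.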

>  } xfs_dir2_data_entry_t;
>  
>  /*
> 




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 21/27] xfs: cleanup struct xfs_dir2_leaf
  2011-07-01  9:43 ` [PATCH 21/27] xfs: cleanup struct xfs_dir2_leaf Christoph Hellwig
  2011-07-06  3:13   ` Dave Chinner
@ 2011-07-06  3:44   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-06  3:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> Simplify the confusing xfs_dir2_leaf structure.  It is supposed to describe
> an XFS dir2 leaf format btree block, but due to the variable sized nature
> of almost all elements in it, it can't actually do anything close to that
> job.   Remove the members that are after the first variable sized array,
> given that they could only be used for sizeof expressions that can as well
> just use the underlying types directly, and make the ents array a real
> C99 variable sized array.
> 
> Also factor out the xfs_dir2_leaf_size, to make the sizing of a leaf
> entry which already was convoluted somewhat readable after using the
> longer type names in the sizeof expressions.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

One comment below, otherwise looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>

. . .

> Index: xfs/fs/xfs/xfs_dir2_leaf.h
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_dir2_leaf.h	2011-06-30 09:18:07.263416117 +0200
> +++ xfs/fs/xfs/xfs_dir2_leaf.h	2011-06-30 09:38:44.723400763 +0200
> @@ -72,10 +72,7 @@ typedef struct xfs_dir2_leaf_tail {
>   */

A comment explaining the implied/variable-offset
fields is needed here.

>  typedef struct xfs_dir2_leaf {
>  	xfs_dir2_leaf_hdr_t	hdr;		/* leaf header */
> -	xfs_dir2_leaf_entry_t	ents[1];	/* entries */
> -						/* ... */
> -	xfs_dir2_data_off_t	bests[1];	/* best free counts */
> -	xfs_dir2_leaf_tail_t	tail;		/* leaf tail */
> +	xfs_dir2_leaf_entry_t	ents[];	/* entries */
>  } xfs_dir2_leaf_t;
>  
>  /*
> 




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 23/27] xfs: remove the unused xfs_bufhash structure
  2011-07-01  9:43 ` [PATCH 23/27] xfs: remove the unused xfs_bufhash structure Christoph Hellwig
@ 2011-07-06  3:44   ` Dave Chinner
  2011-07-06  3:49   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Dave Chinner @ 2011-07-06  3:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jul 01, 2011 at 05:43:44AM -0400, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

My Bad.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 24/27] xfs: clean up buffer locking helpers
  2011-07-01  9:43 ` [PATCH 24/27] xfs: clean up buffer locking helpers Christoph Hellwig
@ 2011-07-06  3:47   ` Dave Chinner
  2011-07-06  3:55   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Dave Chinner @ 2011-07-06  3:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jul 01, 2011 at 05:43:45AM -0400, Christoph Hellwig wrote:
> Rename xfs_buf_cond_lock and reverse its return value to fit most other
> trylock operations in the kernel and XFS (with the exception of down_trylock,
> after which xfs_buf_cond_lock was modelled), and replace xfs_buf_lock_val
> with an xfs_buf_islocked for use in asserts, or an open-coded variant in
> tracing.  Remove the XFS_BUF_* wrappers for all the locking helpers.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 22/27] xfs: use generic get_unaligned_beXX helpers
  2011-07-01  9:43 ` [PATCH 22/27] xfs: use generic get_unaligned_beXX helpers Christoph Hellwig
  2011-07-06  3:44   ` Dave Chinner
@ 2011-07-06  3:47   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-06  3:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> Switch the shortform directory code over to use the generic
> get_unaligned_beXX helpers instead of reinventing them.  As a result
> kill off xfs_arch.h and move the setting of XFS_NATIVE_HOST into
> xfs_linux.h.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Huh, good thinking!  (Just what I suggested in reviewing
patch 13 in this series, but even better.)

Reviewed-by: Alex Elder <aelder@sgi.com>



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 25/27] xfs: return the buffer locked from xfs_buf_get_uncached
  2011-07-01  9:43 ` [PATCH 25/27] xfs: return the buffer locked from xfs_buf_get_uncached Christoph Hellwig
@ 2011-07-06  3:48   ` Dave Chinner
  2011-07-06  3:57   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Dave Chinner @ 2011-07-06  3:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jul 01, 2011 at 05:43:46AM -0400, Christoph Hellwig wrote:
> All other xfs_buf_get/read-like helpers return the buffer locked, make sure
> xfs_buf_get_uncached isn't different for no reason.  Half of the callers
> already lock it directly after, and the others probably should also keep
> it locked if only for consistency and being able to use xfs_buf_rele,
> but I'll leave that for later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 23/27] xfs: remove the unused xfs_bufhash structure
  2011-07-01  9:43 ` [PATCH 23/27] xfs: remove the unused xfs_bufhash structure Christoph Hellwig
  2011-07-06  3:44   ` Dave Chinner
@ 2011-07-06  3:49   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-06  3:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 26/27] xfs: cleanup I/O-related buffer flags
  2011-07-01  9:43 ` [PATCH 26/27] xfs: cleanup I/O-related buffer flags Christoph Hellwig
@ 2011-07-06  3:54   ` Dave Chinner
  2011-07-06  9:11     ` Christoph Hellwig
  2011-07-06  4:09   ` Alex Elder
  1 sibling, 1 reply; 88+ messages in thread
From: Dave Chinner @ 2011-07-06  3:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jul 01, 2011 at 05:43:47AM -0400, Christoph Hellwig wrote:
> Remove the unused and misnamed _XBF_RUN_QUEUES flag, rename XBF_LOG_BUFFER
> to the more fitting XBF_SYNCIO, and split XBF_ORDERED into XBF_FUA and
> XBF_FLUSH to allow more fine grained control over the bio flags.  Also
> cleanup processing of the flags in _xfs_buf_ioapply to make more sense,
> and renumber the sparse flag number space to group flags by purpose.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> -		bp->b_flags &= ~_XBF_RUN_QUEUES;
> -		rw = (bp->b_flags & XBF_WRITE) ? WRITE_SYNC : READ_SYNC;
> -	} else if (bp->b_flags & _XBF_RUN_QUEUES) {
> -		ASSERT(!(bp->b_flags & XBF_READ_AHEAD));
> -		bp->b_flags &= ~_XBF_RUN_QUEUES;
> -		rw = (bp->b_flags & XBF_WRITE) ? WRITE_META : READ_META;
> +	if (bp->b_flags & XBF_WRITE) {
> +		if (bp->b_flags & XBF_SYNCIO)
> +			rw = WRITE_SYNC;
> +		else
> +			rw = WRITE;
> +		if (bp->b_flags & XBF_FUA)
> +			rw |= REQ_FUA;
> +		if (bp->b_flags & XBF_FLUSH)
> +			rw |= REQ_FLUSH;
> +	} else if (bp->b_flags & XBF_READ_AHEAD) {
> +		rw = READA;
>  	} else {
> -		rw = (bp->b_flags & XBF_WRITE) ? WRITE :
> -		     (bp->b_flags & XBF_READ_AHEAD) ? READA : READ;
> +		rw = READ;
>  	}

Is it worthwhile tagging all these as READ_META and WRITE_META?
Though that probably needs to be done as a separate commit...

>  /* flags used only internally */
> -#define _XBF_PAGES	(1 << 18)/* backed by refcounted pages */
> -#define	_XBF_RUN_QUEUES	(1 << 19)/* run block device task queue	*/
> -#define	_XBF_KMEM	(1 << 20)/* backed by heap memory */
> -#define _XBF_DELWRI_Q	(1 << 21)/* buffer on delwri queue */
> +#define _XBF_PAGES	(1 << 20)/* backed by refcounted pages */
> +#define	_XBF_KMEM	(1 << 21)/* backed by heap memory */
> +#define _XBF_DELWRI_Q	(1 << 22)/* buffer on delwri queue */

Might be worthwhile cleaning up the stray tab before _XBF_KMEM
there.

Otherwise looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 24/27] xfs: clean up buffer locking helpers
  2011-07-01  9:43 ` [PATCH 24/27] xfs: clean up buffer locking helpers Christoph Hellwig
  2011-07-06  3:47   ` Dave Chinner
@ 2011-07-06  3:55   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-06  3:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> Rename xfs_buf_cond_lock and reverse its return value to fit most other
> trylock operations in the kernel and XFS (with the exception of down_trylock,
> after which xfs_buf_cond_lock was modelled), and replace xfs_buf_lock_val
> with an xfs_buf_islocked for use in asserts, or an open-coded variant in
> tracing.  Remove the XFS_BUF_* wrappers for all the locking helpers.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Nice change.  More readable.
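
The two return conventions mentioned in the commit message can be
contrasted with a small userspace sketch.  It is only an illustration
built on C11 atomics: struct sketch_lock and the sketch_* functions are
invented names, not the XFS buffer lock.  The one kernel fact relied on
is that down_trylock() returns 0 when the semaphore was acquired, while
most other trylock primitives return nonzero on success.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock: the flag is set while the lock is held. */
struct sketch_lock {
	atomic_flag	held;
};

/* Usual trylock convention: true means the lock was acquired. */
static bool sketch_trylock(struct sketch_lock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

/* down_trylock-style convention: 0 means acquired, nonzero means busy. */
static int sketch_cond_lock(struct sketch_lock *l)
{
	return sketch_trylock(l) ? 0 : 1;
}

static void sketch_unlock(struct sketch_lock *l)
{
	atomic_flag_clear(&l->held);
}
```

A caller written as "if (sketch_trylock(&l))" reads as "if we got the
lock", whereas the inverted convention needs a negation at every call
site, which is presumably the readability gain noted above.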

Reviewed-by: Alex Elder <aelder@sgi.com>



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 27/27] xfs: avoid a few disk cache flushes
  2011-07-01  9:43 ` [PATCH 27/27] xfs: avoid a few disk cache flushes Christoph Hellwig
@ 2011-07-06  3:55   ` Dave Chinner
  2011-07-06  4:11   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Dave Chinner @ 2011-07-06  3:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jul 01, 2011 at 05:43:48AM -0400, Christoph Hellwig wrote:
> There is no need for a pre-flush when writing the second part of a
> split log buffer, and if we are using an external log there is no need
> to do a full cache flush of the log device at all given that all writes
> to it use the FUA flag.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Dave Chinner <dchinner@redhat.com>

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 25/27] xfs: return the buffer locked from xfs_buf_get_uncached
  2011-07-01  9:43 ` [PATCH 25/27] xfs: return the buffer locked from xfs_buf_get_uncached Christoph Hellwig
  2011-07-06  3:48   ` Dave Chinner
@ 2011-07-06  3:57   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-06  3:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> All other xfs_buf_get/read-like helpers return the buffer locked, make sure
> xfs_buf_get_uncached isn't different for no reason.  Half of the callers
> already lock it directly after, and the others probably should also keep
> it locked if only for consistency and being able to use xfs_buf_rele,
> but I'll leave that for later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 26/27] xfs: cleanup I/O-related buffer flags
  2011-07-01  9:43 ` [PATCH 26/27] xfs: cleanup I/O-related buffer flags Christoph Hellwig
  2011-07-06  3:54   ` Dave Chinner
@ 2011-07-06  4:09   ` Alex Elder
  2011-07-06  9:11     ` Christoph Hellwig
  1 sibling, 1 reply; 88+ messages in thread
From: Alex Elder @ 2011-07-06  4:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> Remove the unused and misnamed _XBF_RUN_QUEUES flag, rename XBF_LOG_BUFFER
> to the more fitting XBF_SYNCIO, and split XBF_ORDERED into XBF_FUA and
> XBF_FLUSH to allow more fine grained control over the bio flags.  Also
> cleanup processing of the flags in _xfs_buf_ioapply to make more sense,
> and renumber the sparse flag number space to group flags by purpose.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Interesting that _XBF_RUN_QUEUES was never actually used.
The new names are much more understandable.  Looks like
READ_META and WRITE_META are now effectively gone...

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 27/27] xfs: avoid a few disk cache flushes
  2011-07-01  9:43 ` [PATCH 27/27] xfs: avoid a few disk cache flushes Christoph Hellwig
  2011-07-06  3:55   ` Dave Chinner
@ 2011-07-06  4:11   ` Alex Elder
  1 sibling, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-06  4:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> There is no need for a pre-flush when writing the second part of a
> split log buffer, and if we are using an external log there is no need
> to do a full cache flush of the log device at all given that all writes
> to it use the FUA flag.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 07/27] xfs: split xfs_itruncate_finish
  2011-07-01  9:43 ` [PATCH 07/27] xfs: split xfs_itruncate_finish Christoph Hellwig
@ 2011-07-06  4:35   ` Alex Elder
  2011-07-06  8:11     ` Christoph Hellwig
  0 siblings, 1 reply; 88+ messages in thread
From: Alex Elder @ 2011-07-06  4:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> Split the guts of xfs_itruncate_finish that loop over the existing extents
> and call xfs_bunmapi on them into a new helper, xfs_itruncate_extents.
> Make xfs_attr_inactive call it directly instead of xfs_itruncate_finish,
> which allows simplifying the latter a lot, by only letting it deal with
> the data fork.  As a result xfs_itruncate_finish is renamed to
> xfs_itruncate_data to make its use case more obvious.
> 
> Also remove the sync parameter from xfs_itruncate_data, which has been
> unnecessary since the introduction of the busy extent list in 2002, and
> completely dead code since 2003 when the XFS_BMAPI_ASYNC parameter was
> made a no-op.
> 
> I can't actually see why the xfs_attr_inactive needs to set the transaction
> sync, but let's keep this patch simple and without changes in behaviour.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

OK, finally got through this.  Not with my usual
rigor, but it looks like a pretty reasonable
split-up of the function.

I have one remark but that's it.

Reviewed-by: Alex Elder <aelder@sgi.com>

. . .

> @@ -1390,128 +1274,143 @@ xfs_itruncate_finish(
>  	 * beyond the maximum file size (ie it is the same as last_block),
>  	 * then there is nothing to do.
>  	 */
> +	first_unmap_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)new_size);
>  	last_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_MAXIOFFSET(mp));
> -	ASSERT(first_unmap_block <= last_block);

. . .

> +		if (error)
> +			goto out_bmap_cancel;
>  
>  		/*
>  		 * Duplicate the transaction that has the permanent
>  		 * reservation and commit the old transaction.
>  		 */
> -		error = xfs_bmap_finish(tp, &free_list, &committed);
> -		ntp = *tp;
> +		error = xfs_bmap_finish(&tp, &free_list, &committed);
>  		if (committed)
> -			xfs_trans_ijoin(ntp, ip);
> -
> -		if (error) {
> -			/*
> -			 * If the bmap finish call encounters an error, return
> -			 * to the caller where the transaction can be properly
> -			 * aborted.  We just need to make sure we're not
> -			 * holding any resources that we were not when we came
> -			 * in.
> -			 *
> -			 * Aborting from this point might lose some blocks in
> -			 * the file system, but oh well.

The above comment (if true--I haven't really checked) seems
like something significant to preserve.

> -			 */
> -			xfs_bmap_cancel(&free_list);
> -			return error;
> -		}
> +			xfs_trans_ijoin(tp, ip);
> +		if (error)
> +			goto out_bmap_cancel;

. . .

> +
> +out:
> +	*tpp = tp;
> +	return error;
> +out_bmap_cancel:
> +	/*
> +	 * If the bunmapi call encounters an error, return to the caller where
> +	 * the transaction can be properly aborted.  We just need to make sure
> +	 * we're not holding any resources that we were not when we came in.
> +	 */
> +	xfs_bmap_cancel(&free_list);
> +	goto out;
> +}
> +

. . .


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 00/27] patch queue for Linux 3.1, V2
  2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
                   ` (25 preceding siblings ...)
  2011-07-01  9:43 ` [PATCH 27/27] xfs: avoid a few disk cache flushes Christoph Hellwig
@ 2011-07-06  4:40 ` Alex Elder
  2011-07-06  6:42   ` Christoph Hellwig
  26 siblings, 1 reply; 88+ messages in thread
From: Alex Elder @ 2011-07-06  4:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> This is my current patch queue for Linux 3.1.  Compared to the last
> posting all review comments were incorporated and two additional trivial
> patches were added.  The ->writepages implementation was dropped for now,
> given the bad situation of kswapd-originating writeback, but I'll repost
> the fixed version separately to get feedback on the updated version.

I mentioned a bunch of fairly minor issues with
this series; I don't believe I found anything
that was actually incorrect.  Quite a few of
them consist of small but good cleanups.

I'll give you a chance to re-post them (or
point me at a repository to pull from).  At
this point I trust you'll do the right thing
so I'm ready to use what you give me.

For now I'm testing overnight with this
latest version of the patches.

					-Alex


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 02/27] xfs: re-enable non-blocking behaviour in xfs_map_blocks
  2011-07-05 22:35   ` Alex Elder
@ 2011-07-06  6:37     ` Christoph Hellwig
  2011-07-06 13:36       ` Alex Elder
  0 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  6:37 UTC (permalink / raw)
  To: Alex Elder; +Cc: Christoph Hellwig, xfs

On Tue, Jul 05, 2011 at 05:35:19PM -0500, Alex Elder wrote:
> On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> > The non-blocking behaviour in xfs_map_blocks currently is conditional on
> > having both the WB_SYNC_NONE sync_mode and the nonblocking flag set.
> > The latter used to be used by both pdflush, kswapd and a few other places
> > in older kernels, but has been fading out starting with the introduction
> > of the per-bdi flusher threads.
> > 
> > Enable the non-blocking behaviour for all WB_SYNC_NONE calls to get back
> > the behaviour we want.
> 
> The subject line should refer to xfs_vm_writepage()
> (not xfs_map_blocks()).  Unless I hear otherwise I
> will plan to change that for you.

Well, the actual xfs_ilock_nowait call is in xfs_map_blocks, the logic
controlling it in xfs_vm_writepage, so either one is fine.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 00/27] patch queue for Linux 3.1, V2
  2011-07-06  4:40 ` [PATCH 00/27] patch queue for Linux 3.1, V2 Alex Elder
@ 2011-07-06  6:42   ` Christoph Hellwig
  2011-07-06 13:32     ` Alex Elder
  0 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  6:42 UTC (permalink / raw)
  To: Alex Elder; +Cc: Christoph Hellwig, xfs

On Tue, Jul 05, 2011 at 11:40:17PM -0500, Alex Elder wrote:
> On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> > This is my current patch queue for Linux 3.1.  Compared to the last
> > posting all review comments were incorporated and two additional trivial
> > patches were added.  The ->writepages implementation was dropped for now,
> > given the bad situation of kswapd-originating writeback, but I'll repost
> > the fixed version separately to get feedback on the updated version.
> 
> I mentioned a bunch of fairly minor issues with
> this series; I don't believe I found anything
> that was actually incorrect.  Quite a few of
> them consist of small but good cleanups.
> 
> I'll give you a chance to re-post them (or
> point me at a repository to pull from).  At
> this point I trust you'll do the right thing
> so I'm ready to use what you give me.
> 
> For now I'm testing overnight with this
> latest version of the patches.

I'll redo the series after going through your and Dave's comments.


* Re: [PATCH 07/27] xfs: split xfs_itruncate_finish
  2011-07-06  4:35   ` Alex Elder
@ 2011-07-06  8:11     ` Christoph Hellwig
  2011-07-06 14:05       ` Alex Elder
  0 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  8:11 UTC (permalink / raw)
  To: Alex Elder; +Cc: Christoph Hellwig, xfs

On Tue, Jul 05, 2011 at 11:35:58PM -0500, Alex Elder wrote:
> The above comment (if true--I haven't really checked) seems
> like something significant to preserve.

The comment at the goto label already catches the important bits of it.


* Re: [PATCH 08/27] xfs: improve sync behaviour in the fact of aggressive dirtying
  2011-07-05 22:36   ` Alex Elder
@ 2011-07-06  8:15     ` Christoph Hellwig
  2011-07-06 14:59       ` Alex Elder
  0 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  8:15 UTC (permalink / raw)
  To: Alex Elder; +Cc: Christoph Hellwig, xfs

On Tue, Jul 05, 2011 at 05:36:18PM -0500, Alex Elder wrote:
> > A large part of the issue is that XFS writes data out itself two times
> > in the ->sync_fs method, overriding the livelock protection in the core
> > writeback code, and another issue is the lock-less xfs_ioend_wait call,
> > which doesn't prevent new ioends from being queued up while waiting for
> > the count to reach zero.
> 
> The change affects only the first thing you mention here, not
> the second.

It does.  We're also removing the xfs_ioend_wait done from
xfs_sync_data/xfs_sync_inode_data.  We still have another one in
->write_inode, though.

> The i_iocount wait is not affected by your patch.

We're only removing one of the two we're doing per inode now.

> I'm OK with the change, but really prefer to have
> the description not include stuff that just isn't
> there.  If you want me to commit this as-is, just
> say so and I will.  Otherwise, post an update and
> I'll use that.  In any case, you can consider this
> reviewed by me.

If you have an idea how to reword the description send it my way.


* Re: [PATCH 14/27] xfs: kill struct xfs_dir2_sf
  2011-07-06  1:57   ` Dave Chinner
@ 2011-07-06  8:28     ` Christoph Hellwig
  0 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  8:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Wed, Jul 06, 2011 at 11:57:05AM +1000, Dave Chinner wrote:
> > - * fit into the literal area of the inode.
> > + * Small directories are packed as tightly as possible so as to fit into the
> > + * literal area of the inode.  They consist of a single xfs_dir2_sf_hdr header
> > + * followed by zero or more xfs_dir2_sf_entry structures.  Due the different
> > + * inode number storage sized and the variable length name filed in
>                            size                               field
> > + * the xfs_dir2_sf_entry all these structure are variable length, and the
>                                       structures
> > + * accessors in this file need to be used to iterate over them.
>                              should be

Thanks!

> > -static inline xfs_dir2_sf_entry_t *xfs_dir2_sf_firstentry(xfs_dir2_sf_t *sfp)
> > +static inline xfs_dir2_sf_entry_t *xfs_dir2_sf_firstentry(xfs_dir2_sf_hdr_t *sfp)
> 
> Probably should split this onto two lines.

I'll make sure it'll be fine in the end, not sure which patch it'll get
folded into.


* Re: [PATCH 14/27] xfs: kill struct xfs_dir2_sf
  2011-07-06  3:24   ` Alex Elder
@ 2011-07-06  8:33     ` Christoph Hellwig
  2011-07-06 15:05       ` Alex Elder
  0 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  8:33 UTC (permalink / raw)
  To: Alex Elder; +Cc: Christoph Hellwig, xfs

On Tue, Jul 05, 2011 at 10:24:18PM -0500, Alex Elder wrote:
> On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> > The list field of it is never actually used, so all uses can simply be
> > replaced with the xfs_dir2_sf_hdr_t type that it has as first member.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Looks like a lot of places could be converted to use
> "struct xfs_dir2_sf_hdr" rather than the typedef, but
> it's not worth re-posting for that.  (Plus I suspect
> such changes may be in forthcoming patches...)

In general they should, but I try to avoid that where it means
massive formatting changes, as that just clutters up the patch.

> > +	oldsfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
> > +
> >  	ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
> >  	ASSERT(dp->i_df.if_u1.if_data != NULL);
> 
> 	ASSERT(oldsfp != NULL);

What for?  We'll just dereference it later anyway.

> >  static xfs_ino_t
> >  xfs_dir2_sf_get_ino(
> > -	struct xfs_dir2_sf	*sfp,
> > +	struct xfs_dir2_sf_hdr	*hdr,
> 
> I think I like the name "hdr" better than "sfp";
> was it just too widespread a change to do a
> similar rename elsewhere?  (xfs_dir2_block_to_sf()
> uses "sfhp" already, though I like just "hdr".)

Yeah, I tried to keep the change small in general.  If people like it
I can do a big sweep to convert stuff to struct types and common names
as a follow-on.


* Re: [PATCH 15/27] xfs: cleanup the defintion of struct xfs_dir2_sf_entry
  2011-07-06  3:33   ` Alex Elder
@ 2011-07-06  8:34     ` Christoph Hellwig
  0 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  8:34 UTC (permalink / raw)
  To: Alex Elder; +Cc: Christoph Hellwig, xfs

On Tue, Jul 05, 2011 at 10:33:31PM -0500, Alex Elder wrote:
> On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> > Remove the inumber member which is at a variable offset after the actual
> > name, and make name a real variable sized C99 array instead of the incorrect
> > one-sized array which confuses (not only) gcc.  Based on this clean up
> > the helpers to calculate the entry size.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Why was the inode put after the name in the
> first place?

I've wondered that, too.  But it's been long before my time.
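
For illustration only, here is a minimal userspace sketch of the layout
being discussed: a C99 flexible array member for the name, with the inode
number living at a variable offset right after it.  The struct and helper
names are hypothetical and are not the kernel definitions; the 4-or-8-byte
inode number width is an assumption taken from the "different inode number
storage sizes" mentioned earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a shortform directory entry. */
struct example_sf_entry {
	uint8_t namelen;	/* length of the name in bytes */
	uint8_t offset[2];	/* unaligned 16-bit field, kept as raw bytes */
	uint8_t name[];		/* C99 flexible array member: adds 0 to sizeof */
	/* the inode number follows name[namelen - 1], so it has no fixed offset */
};

/* Size of one entry: fixed part + name + inode number (8 or 4 bytes). */
static inline size_t example_sf_entsize(size_t namelen, int i8)
{
	return sizeof(struct example_sf_entry) + namelen + (i8 ? 8 : 4);
}

/* Address of the inode number, which sits directly after the name. */
static inline uint8_t *example_sf_inumber_p(struct example_sf_entry *sfep)
{
	return &sfep->name[sfep->namelen];
}
```

With a one-sized array (name[1]) the sizeof would already count one name
byte, so every size calculation needs a "- 1" correction; the flexible
array member avoids that.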


* Re: [PATCH 16/27] xfs: avoid usage of struct xfs_dir2_block
  2011-07-06  2:19   ` Dave Chinner
@ 2011-07-06  8:35     ` Christoph Hellwig
  0 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  8:35 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Wed, Jul 06, 2011 at 12:19:35PM +1000, Dave Chinner wrote:
> Took me a moment to realise what this does - turns the byte swap
> into a compile-time operation rather than a runtime operation.
> Nice.
> 
> Perhaps we should do that same optimisation in other magic number
> checks around the place?

We should, and I should have been more consistent about it.  In fact
I probably should split it into a separate patch.
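
A minimal userspace sketch of the optimisation discussed above, for
illustration only.  The macro and function names are hypothetical (the
kernel has its own cpu_to_be32/be32_to_cpu helpers), the magic value is
just an example, and the endianness test assumes a GCC/Clang-style
__BYTE_ORDER__ predefine, falling back to little-endian otherwise.

```c
#include <stdint.h>
#include <string.h>

/* Example magic value; stands in for an on-disk magic number. */
#define EXAMPLE_MAGIC 0x58443242u

/* Byte swap written as a constant expression so the compiler can fold it. */
#define EXAMPLE_SWAB32(x) \
	((((uint32_t)(x) & 0x000000ffu) << 24) | \
	 (((uint32_t)(x) & 0x0000ff00u) <<  8) | \
	 (((uint32_t)(x) & 0x00ff0000u) >>  8) | \
	 (((uint32_t)(x) & 0xff000000u) >> 24))

/* Assumption: __BYTE_ORDER__ is predefined; otherwise treat as little-endian. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define EXAMPLE_CPU_TO_BE32(x) ((uint32_t)(x))
#else
#define EXAMPLE_CPU_TO_BE32(x) EXAMPLE_SWAB32(x)
#endif

struct example_hdr {
	uint32_t magic;		/* stored big-endian, as it would be on disk */
};

/* Runtime conversion of a big-endian value to host order. */
static inline uint32_t example_be32_to_cpu(uint32_t be)
{
	unsigned char b[4];

	memcpy(b, &be, sizeof(b));
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Old style: convert the loaded field on every check. */
static inline int example_magic_ok_runtime(const struct example_hdr *hdr)
{
	return example_be32_to_cpu(hdr->magic) == EXAMPLE_MAGIC;
}

/* New style: the constant is swapped at compile time, the field is compared as-is. */
static inline int example_magic_ok_folded(const struct example_hdr *hdr)
{
	return hdr->magic == EXAMPLE_CPU_TO_BE32(EXAMPLE_MAGIC);
}
```

Both checks accept exactly the same headers; the second one just moves
the swap from the loaded value to the constant.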


* Re: [PATCH 17/27] xfs: kill struct xfs_dir2_block
  2011-07-06  2:31   ` Dave Chinner
@ 2011-07-06  8:37     ` Christoph Hellwig
  2011-07-06 15:11       ` Alex Elder
  0 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  8:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Wed, Jul 06, 2011 at 12:31:57PM +1000, Dave Chinner wrote:
> >  	btp = xfs_dir2_block_tail_p(mp, hdr);
> > -	ptr = (char *)block->u;
> > +	ptr = (char *)(hdr + 1);
> >  	endptr = (char *)xfs_dir2_block_leaf_p(btp);
> 
> That is slightly less obvious what it is doing. It's jumping over
> the entire header, but could easily be confused with jumping one
> byte in.
> 
> Perhaps adding a wrapper e.g. xfs_dir2_block_data_p(hdr) to match
> the xfs_dir2_block_tail_p() and xfs_dir2_block_leaf_p() wrappers,
> and converting all the other cases to use this as well?

I had that in the initial version, but given that we usually use
the result as char, and not as one of the two types of the union, it
just made the code very messy.

I can try it again, maybe as an add-on patch at the end so that we can
decide if it actually improves anything.
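
To make the pointer arithmetic in question concrete, here is a small
userspace sketch.  The header layout and both helper names are
hypothetical; example_block_data_p() only shows what a wrapper of the
kind Dave suggests could look like, not an existing function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block header: a magic number plus three bestfree slots. */
struct example_blk_hdr {
	uint32_t magic;
	struct {
		uint16_t offset;
		uint16_t length;
	} bestfree[3];
};

/*
 * hdr + 1 is scaled by sizeof(*hdr), so this points at the first byte
 * after the whole header, i.e. the start of the data area.
 */
static inline char *example_block_data_p(struct example_blk_hdr *hdr)
{
	return (char *)(hdr + 1);
}

/*
 * For contrast: casting first and then adding 1 advances by one byte
 * only.  This is the reading the review worries about.
 */
static inline char *example_one_byte_in(struct example_blk_hdr *hdr)
{
	return (char *)hdr + 1;
}
```

The wrapper returns char * because, as noted above, that is how the
result is mostly used.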


* Re: [PATCH 18/27] xfs: avoid usage of struct xfs_dir2_data
  2011-07-06  3:02   ` Dave Chinner
@ 2011-07-06  8:43     ` Christoph Hellwig
  0 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  8:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Wed, Jul 06, 2011 at 01:02:28PM +1000, Dave Chinner wrote:
> > @@ -251,12 +258,13 @@ xfs_dir2_data_freeinsert(
> >  	xfs_dir2_data_free_t	new;		/* new bestfree entry */
> >  
> >  #ifdef __KERNEL__
> > -	ASSERT(be32_to_cpu(d->hdr.magic) == XFS_DIR2_DATA_MAGIC ||
> > -	       be32_to_cpu(d->hdr.magic) == XFS_DIR2_BLOCK_MAGIC);
> > +	ASSERT(be32_to_cpu(hdr->magic) == XFS_DIR2_DATA_MAGIC ||
> > +	       be32_to_cpu(hdr->magic) == XFS_DIR2_BLOCK_MAGIC);
> >  #endif
> 
> You kill the ifdef __KERNEL__ there.

If I do it I'd rather do it as a separate patch, and after actually
testing it with xfsprogs.

> >  			if (!needscan) {
> > -				xfs_dir2_data_freeremove(d, dfp, needlogp);
> > -				(void)xfs_dir2_data_freeinsert(d, newdup,
> > +				xfs_dir2_data_freeremove(hdr, dfp, needlogp);
> > +				(void)xfs_dir2_data_freeinsert(hdr, newdup,
> >  					needlogp);
> > -				(void)xfs_dir2_data_freeinsert(d, newdup2,
> > +				(void)xfs_dir2_data_freeinsert(hdr, newdup2,
> >  					needlogp);
> >  			}
> >  		}
> 
> Kill the (void) casts?
> 

Sounds fine.


* Re: [PATCH 18/27] xfs: avoid usage of struct xfs_dir2_data
  2011-07-06  3:38   ` Alex Elder
@ 2011-07-06  8:45     ` Christoph Hellwig
  0 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  8:45 UTC (permalink / raw)
  To: Alex Elder; +Cc: Christoph Hellwig, xfs

On Tue, Jul 05, 2011 at 10:38:22PM -0500, Alex Elder wrote:
> On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> > In most places we can simply pass around and use the struct xfs_dir2_data_hdr,
> > which is the first and most important member of struct xfs_dir2_data instead
> > of the full structure.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> In xfs_dir2_data_freescan() you get the address of the
> dir2_data_t by forcibly casting the address of the
> header to that type.  We all know that's fine, but
> it's an unsavory practice.  Why don't you pass the
> full dir2_data_t in that case where it's needed?
> Are you simply trying to avoid passing *any* pointers
> to variable-sized types?

I'm restricting the dir2_data_t scope to where we needed it,
to kill it (and thus the cast) off entirely in the next patch.

> (Update: I think you eliminate that line in the next
> patch anyway...)

Exactly!


* Re: [PATCH 20/27] xfs: cleanup the defintion of struct xfs_dir2_data_entry
  2011-07-06  3:44   ` Alex Elder
@ 2011-07-06  8:48     ` Christoph Hellwig
  0 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  8:48 UTC (permalink / raw)
  To: Alex Elder; +Cc: Christoph Hellwig, xfs

On Tue, Jul 05, 2011 at 10:44:32PM -0500, Alex Elder wrote:
> Maybe put the comment about the tag field here, as
> was done elsewhere.  (But the pictures are even
> better...)

I'll see what I can do about it.


* Re: [PATCH 22/27] xfs: use generic get_unaligned_beXX helpers
  2011-07-06  3:44   ` Dave Chinner
@ 2011-07-06  9:07     ` Christoph Hellwig
  2011-07-07  8:00       ` Christoph Hellwig
  0 siblings, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  9:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Wed, Jul 06, 2011 at 01:44:21PM +1000, Dave Chinner wrote:
> Hmmmm. I wonder if it is a hold-over from the days of 4GB AGs?

Probably.

> That would have meant inode numbers used 6 bits for the chunk index,
> 2^22 - 2^6 for the agbno and 2^32 for the agno, which gives 54 bits
> maximum inode number and so XFS_MAXINUMBER @ 56 bits makes sense, as
> does the zero high byte in the dir2 inode number.
> 
> Now we have 2^30 bits for the agbno+chunk index, and 32 bits for the
> agno, so inode numbers can reach 62 bits, which is outside the range
> of the 56-bit MAXINUMBER limit.
> 
> So my questions are now this:
> 	- did we lose that checking when we converted the rest of
> 	  the directory code to use the generic byte swapping
> 	  functions?

The history lesson tells us:

Before my commit

	"Fix and streamline directory inode number handling"

we didn't do any capping for them in Linux since the early days
of adding byte swapping support.  However my commit was modelled
after the original IRIX code, which had the same behaviour as my
code in the same XFS_DI_LO/HI helpers.

Even back then XFS_GET_DIR_INO8/XFS_SET_DIR_INO8 were only
used by the dir2_sf code, but the dirv1 used a similar XFS_GET_DIR_INO
macro for all read accesses, which had the same limitation.  Writes
on the other hand were done using XFS_DIR_SF_PUT_DIRINO (even for
non-shortform directories), which did not discard the most significant
bytes.

> 	- do we need to increase XFS_MAXINUMBER to reflect the
> 	  current reality of 1TB AGs and simply ignore the zero high
> 	  byte restriction?

I think so.
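
The bit counting in Dave's question can be written down as a tiny
userspace sketch.  The helper names are hypothetical, and the numbers are
the ones quoted above (a 56-bit XFS_MAXINUMBER, 32 bits of AG number,
and either 6 + 16 or 30 bits for the AG-relative part); nothing here is
taken from the kernel headers.

```c
#include <stdint.h>

/* Width of the limit discussed above: inode numbers capped at 56 bits. */
#define EXAMPLE_MAXINUMBER_BITS 56

/* Total inode number width: AG number bits plus AG-relative inode bits. */
static inline unsigned int example_ino_bits(unsigned int agno_bits,
					    unsigned int agino_bits)
{
	return agno_bits + agino_bits;
}

/* True if the inode number fits in 56 bits, i.e. its high byte is zero. */
static inline int example_ino_in_range(uint64_t ino)
{
	return (ino >> EXAMPLE_MAXINUMBER_BITS) == 0;
}
```

With 6 + 16 AG-relative bits the total is 54, which fits under the 56-bit
limit; with 30 AG-relative bits the total is 62, which does not.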


* Re: [PATCH 26/27] xfs: cleanup I/O-related buffer flags
  2011-07-06  3:54   ` Dave Chinner
@ 2011-07-06  9:11     ` Christoph Hellwig
  0 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  9:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Jul 06, 2011 at 01:54:11PM +1000, Dave Chinner wrote:
> Is it worthwhile tagging all these as READ_META and WRITE_META?
> Though that probably needs to be done as a separate commit...

Right now they preempt synchronous writes, which is not something we
want.  There is a patch out on lkml removing that hack, at which point
we should tag them as _META to make blktrace output more useful.

> >  /* flags used only internally */
> > -#define _XBF_PAGES	(1 << 18)/* backed by refcounted pages */
> > -#define	_XBF_RUN_QUEUES	(1 << 19)/* run block device task queue	*/
> > -#define	_XBF_KMEM	(1 << 20)/* backed by heap memory */
> > -#define _XBF_DELWRI_Q	(1 << 21)/* buffer on delwri queue */
> > +#define _XBF_PAGES	(1 << 20)/* backed by refcounted pages */
> > +#define	_XBF_KMEM	(1 << 21)/* backed by heap memory */
> > +#define _XBF_DELWRI_Q	(1 << 22)/* buffer on delwri queue */
> 
> Might be worthwhile cleaning up the stray tab before _XBF_KMEM
> there.

done.


* Re: [PATCH 26/27] xfs: cleanup I/O-related buffer flags
  2011-07-06  4:09   ` Alex Elder
@ 2011-07-06  9:11     ` Christoph Hellwig
  0 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06  9:11 UTC (permalink / raw)
  To: Alex Elder; +Cc: Christoph Hellwig, xfs

On Tue, Jul 05, 2011 at 11:09:15PM -0500, Alex Elder wrote:
> On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> > Remove the unused and misnamed _XBF_RUN_QUEUES flag, rename XBF_LOG_BUFFER
> > to the more fitting XBF_SYNCIO, and split XBF_ORDERED into XBF_FUA and
> > XBF_FLUSH to allow more fine grained control over the bio flags.  Also
> > cleanup processing of the flags in _xfs_buf_ioapply to make more sense,
> > and renumber the sparse flag number space to group flags by purpose.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Interesting that _XBF_RUN_QUEUES was never actually used.
> The new names are much more understandable.  Looks like
> READ_META and WRITE_META are now effectively gone...

They have been gone for a while.


* Re: [PATCH 00/27] patch queue for Linux 3.1, V2
  2011-07-06  6:42   ` Christoph Hellwig
@ 2011-07-06 13:32     ` Alex Elder
  2011-07-06 13:43       ` Christoph Hellwig
  0 siblings, 1 reply; 88+ messages in thread
From: Alex Elder @ 2011-07-06 13:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, 2011-07-06 at 02:42 -0400, Christoph Hellwig wrote:
> On Tue, Jul 05, 2011 at 11:40:17PM -0500, Alex Elder wrote:
> > On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> > > This is my current patch queue for Linux 3.1.  Compared to the last
> > > posting all review comments were incorporated and two additional trivial
> > > patches were added.  The ->writepages implementation was dropped for now,
> > > given the bad situation of kswap-originating writeback, but I'll repost
> > > the fixed version separately to get feedback on the updated version.
> > 
> > I mentioned a bunch of fairly minor issues with
> > this series; I don't believe I found anything
> > that was actually incorrect.  Quite a few of
> > them consist of small but good cleanups.
> > 
> > I'll give you a chance to re-post them (or
> > point me at a repository to pull from).  At
> > this point I trust you'll do the right thing
> > so I'm ready to use what you give me.
> > 
> > For now I'm testing overnight with this
> > latest version of the patches.
> 
> I'll redo the series after going through your and Dave's comments.

No unexpected errors overnight.  I'm going to try
to fit Dave's for-3.0 patch(es) in before committing
these, but whenever you've got your updates done (today
preferably) I'll take them.

					-Alex


* Re: [PATCH 02/27] xfs: re-enable non-blocking behaviour in xfs_map_blocks
  2011-07-06  6:37     ` Christoph Hellwig
@ 2011-07-06 13:36       ` Alex Elder
  0 siblings, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-06 13:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, 2011-07-06 at 02:37 -0400, Christoph Hellwig wrote:
> On Tue, Jul 05, 2011 at 05:35:19PM -0500, Alex Elder wrote:
> > On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> > > The non-blocking behaviour in xfs_map_blocks currently is conditional on
> > > having both the WB_SYNC_NONE sync_mode and the nonblocking flag set.
> > > The latter used to be used by both pdflush, kswapd and a few other places
> > > in older kernels, but has been fading out starting with the introduction
> > > of the per-bdi flusher threads.
> > > 
> > > Enable the non-blocking behaviour for all WB_SYNC_NONE calls to get back
> > > the behaviour we want.
> > 
> > The subject line should refer to xfs_vm_writepage()
> > (not xfs_map_blocks()).  Unless I hear otherwise I
> > will plan to change that for you.
> 
> Well, the actual xfs_ilock_nowait call is in xfs_map_blocks, the logic
> controlling it in xfs_vm_writepage, so either one is fine.

You're right.  Nevermind.	-Alex


* Re: [PATCH 00/27] patch queue for Linux 3.1, V2
  2011-07-06 13:32     ` Alex Elder
@ 2011-07-06 13:43       ` Christoph Hellwig
  0 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-06 13:43 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Wed, Jul 06, 2011 at 08:32:17AM -0500, Alex Elder wrote:
> No unexpected errors overnight.  I'm going to try
> to fit Dave's for-3.0 patch(es) in before committing
> these, but whenever you've got your updates done (today
> preferably) I'll take them.

I fear you'll have to wait until tomorrow.  While I've implemented
most of the suggestions already I'll be out early today and haven't
even started an xfsqa run yet.


* Re: [PATCH 07/27] xfs: split xfs_itruncate_finish
  2011-07-06  8:11     ` Christoph Hellwig
@ 2011-07-06 14:05       ` Alex Elder
  0 siblings, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-06 14:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, 2011-07-06 at 04:11 -0400, Christoph Hellwig wrote:
> On Tue, Jul 05, 2011 at 11:35:58PM -0500, Alex Elder wrote:
> > The above comment (if true--I haven't really checked) seems
> > like something significant to preserve.
> 
> The comment at the goto label already catches the important bits of it.

I was referring to this:

    Aborting from this point might lose some blocks in
    the file system, but oh well.

I don't actually understand what the comment means, but if
you feel it isn't important that's fine.

					-Alex


* Re: [PATCH 08/27] xfs: improve sync behaviour in the fact of aggressive dirtying
  2011-07-06  8:15     ` Christoph Hellwig
@ 2011-07-06 14:59       ` Alex Elder
  0 siblings, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-06 14:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, 2011-07-06 at 04:15 -0400, Christoph Hellwig wrote:
> On Tue, Jul 05, 2011 at 05:36:18PM -0500, Alex Elder wrote:
> > > A large part of the issue is that XFS writes data out itself two times
> > > in the ->sync_fs method, overriding the livelock protection in the core
> > > writeback code, and another issue is the lock-less xfs_ioend_wait call,
> > > which doesn't prevent new ioends from being queued up while waiting for
> > > the count to reach zero.
> > 
> > The change affects only the first thing you mention here, not
> > the second.
> 
> It does.  We're also removing the xfs_ioend_wait done from
> xfs_sync_data/xfs_sync_inode_data.  We still have another one in
> ->write_inode, though.

OK, now I see what you're talking about.  I guess the way it was
stated I expected that the code would now *prevent* new ioends
from being queued while waiting.

> 
> > The i_iocount wait is not affected by your patch.
> 
> We're only removing one of the two we're doing per inode now.
> 
> > I'm OK with the change, but really prefer to have
> > the description not include stuff that just isn't
> > there.  If you want me to commit this as-is, just
> > say so and I will.  Otherwise, post an update and
> > I'll use that.  In any case, you can consider this
> > reviewed by me.
> 
> If you have an idea how to reword the description send it my way.

Here's an attempt.  (It also gives you a chance to correct
my understanding...)

A large part of the issue is that XFS writes data out itself
two times in the ->sync_fs method, overriding the livelock
protection in the core writeback code.  This patch removes
these XFS-internal sync calls and relies on the VFS to do its
work just like all other filesystems do.

Another issue is the lock-less xfs_ioend_wait() call,
which doesn't prevent new ioends from being queued up
while waiting for the count to reach zero.  Removing
the second SYNC_WAIT call to xfs_sync_data() eliminates
one place this is used unnecessarily by avoiding the
wait request at the end of xfs_sync_inode_data().  In
most cases there is no need to wait for ongoing writes
to make it to disk, as long as those queued at the time
of a sync request get flushed out.

We still wait like this in ->write_inode, and we should
remove that as well, but that's material for a separate
commit.

					-Alex


* Re: [PATCH 14/27] xfs: kill struct xfs_dir2_sf
  2011-07-06  8:33     ` Christoph Hellwig
@ 2011-07-06 15:05       ` Alex Elder
  0 siblings, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-06 15:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, 2011-07-06 at 04:33 -0400, Christoph Hellwig wrote:
> On Tue, Jul 05, 2011 at 10:24:18PM -0500, Alex Elder wrote:
> > On Fri, 2011-07-01 at 05:43 -0400, Christoph Hellwig wrote:
> > > The list field of it is never actually used, so all uses can simply be
> > > replaced with the xfs_dir2_sf_hdr_t type that it has as first member.
> > > 
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > 
> > Looks like a lot of places could be converted to use
> > "struct xfs_dir2_sf_hdr" rather than the typedef, but
> > it's not worth re-posting for that.  (Plus I suspect
> > such changes may be in forthcoming patches...)
> 
> In general they should, but I try to avoid that where it means
> massive formatting changes, as that just clutters up the patch.

Understood.

> > > +	oldsfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
> > > +
> > >  	ASSERT(dp->i_df.if_bytes == dp->i_d.di_size);
> > >  	ASSERT(dp->i_df.if_u1.if_data != NULL);
> > 
> > 	ASSERT(oldsfp != NULL);
> 
> What for?  We'll just dereference it later anyway.

It was simply because you already assigned oldsfp
the value you were asserting was non-null.  Your way
states something about the source value though,
so I guess it more directly states the condition
you're assuming here.

> > >  static xfs_ino_t
> > >  xfs_dir2_sf_get_ino(
> > > -	struct xfs_dir2_sf	*sfp,
> > > +	struct xfs_dir2_sf_hdr	*hdr,
> > 
> > I think I like the name "hdr" better than "sfp";
> > was it just too widespread a change to do a
> > similar rename elsewhere?  (xfs_dir2_block_to_sf()
> > uses "sfhp" already, though I like just "hdr".)
> 
> Yeah, I tried to keep the change small in general.  If people like it
> I can do a big sweep to convert stuff to struct types and common names
> as a follow-on.

No pressing need.  If you're inspired to do it, fine, but
it's readable despite the inconsistency.

					-Alex


* Re: [PATCH 17/27] xfs: kill struct xfs_dir2_block
  2011-07-06  8:37     ` Christoph Hellwig
@ 2011-07-06 15:11       ` Alex Elder
  0 siblings, 0 replies; 88+ messages in thread
From: Alex Elder @ 2011-07-06 15:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, 2011-07-06 at 04:37 -0400, Christoph Hellwig wrote:
> On Wed, Jul 06, 2011 at 12:31:57PM +1000, Dave Chinner wrote:
> > >  	btp = xfs_dir2_block_tail_p(mp, hdr);
> > > -	ptr = (char *)block->u;
> > > +	ptr = (char *)(hdr + 1);
> > >  	endptr = (char *)xfs_dir2_block_leaf_p(btp);
> > 
> > That is slightly less obvious what it is doing. It's jumping over
> > the entire header, but could easily be confused with jumping one
> > byte in.

I actually have become pretty accustomed to this
idiom and have a (small) preference for the way it
is here rather than using a new macro that does
the same thing.  Either way is fine with me though.

					-Alex

> > Perhaps adding a wrapper e.g. xfs_dir2_block_data_p(hdr) to match
> > the xfs_dir2_block_tail_p() and xfs_dir2_block_leaf_p() wrappers,
> > and converting all the other cases to use this as well?
> 
> I had that in the initial version, but given that we usually use
> the result as char, and not one of the two types of the union just
> made the code very messy.
> 
> I can try it again, maybe as a add-on patch at the end so that we can
> decide if it actually improves anything.
> 

* Re: [PATCH 22/27] xfs: use generic get_unaligned_beXX helpers
  2011-07-06  9:07     ` Christoph Hellwig
@ 2011-07-07  8:00       ` Christoph Hellwig
  0 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-07-07  8:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

One interesting caveat about inode numbers is that we use the
XFS_DIR2_DATA_FREE_TAG magic number in the first two bytes of
struct xfs_dir2_data_unused to detect this free space header in all
non-shortform directory formats.  These two bytes overlap with
the inode number in struct xfs_dir2_data_entry.
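
A small userspace sketch of why that overlap matters, for illustration
only.  The helper names are hypothetical, and the tag value 0xffff is an
assumption made for the example rather than something stated in this
thread.

```c
#include <stdint.h>

/* Assumed tag value for this sketch; stands in for XFS_DIR2_DATA_FREE_TAG. */
#define EXAMPLE_FREE_TAG 0xffffu

/* Store a 64-bit value big-endian, one byte at a time (alignment-safe). */
static inline void example_put_be64(unsigned char *p, uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		p[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Load a big-endian 16-bit value, one byte at a time (alignment-safe). */
static inline uint16_t example_get_be16(const unsigned char *p)
{
	return (uint16_t)(((unsigned int)p[0] << 8) | p[1]);
}

/*
 * An entry is taken for free space when its first two bytes match the
 * tag.  In a real entry those bytes are the top of the inode number.
 */
static inline int example_looks_unused(const unsigned char *p)
{
	return example_get_be16(p) == EXAMPLE_FREE_TAG;
}
```

As long as the high byte of the inode number is zero, the first two bytes
can never equal the tag.  An inode number with both top bytes set to 0xff
would be mistaken for a free space header under this scheme.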


* [PATCH 26/27] xfs: cleanup I/O-related buffer flags
  2011-06-29 14:01 [PATCH 00/27] patch queue for Linux 3.1 Christoph Hellwig
@ 2011-06-29 14:01 ` Christoph Hellwig
  0 siblings, 0 replies; 88+ messages in thread
From: Christoph Hellwig @ 2011-06-29 14:01 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-buf-cleanup-flags --]
[-- Type: text/plain, Size: 15219 bytes --]

Remove the unused and misnamed _XBF_RUN_QUEUES flag, rename XBF_LOG_BUFFER
to the more fitting XBF_SYNCIO, and split XBF_ORDERED into XBF_FUA and
XBF_FLUSH to allow more fine grained control over the bio flags.  Also
cleanup processing of the flags in _xfs_buf_ioapply to make more sense,
and renumber the sparse flag number space to group flags by purpose.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_buf.c	2011-06-29 14:04:28.084452749 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_buf.c	2011-06-29 14:13:45.171434748 +0200
@@ -592,10 +592,8 @@ _xfs_buf_read(
 	ASSERT(!(flags & (XBF_DELWRI|XBF_WRITE)));
 	ASSERT(bp->b_bn != XFS_BUF_DADDR_NULL);
 
-	bp->b_flags &= ~(XBF_WRITE | XBF_ASYNC | XBF_DELWRI | \
-			XBF_READ_AHEAD | _XBF_RUN_QUEUES);
-	bp->b_flags |= flags & (XBF_READ | XBF_ASYNC | \
-			XBF_READ_AHEAD | _XBF_RUN_QUEUES);
+	bp->b_flags &= ~(XBF_WRITE | XBF_ASYNC | XBF_DELWRI | XBF_READ_AHEAD);
+	bp->b_flags |= flags & (XBF_READ | XBF_ASYNC | XBF_READ_AHEAD);
 
 	status = xfs_buf_iorequest(bp);
 	if (status || XFS_BUF_ISERROR(bp) || (flags & XBF_ASYNC))
@@ -1211,23 +1209,21 @@ _xfs_buf_ioapply(
 	total_nr_pages = bp->b_page_count;
 	map_i = 0;
 
-	if (bp->b_flags & XBF_ORDERED) {
-		ASSERT(!(bp->b_flags & XBF_READ));
-		rw = WRITE_FLUSH_FUA;
-	} else if (bp->b_flags & XBF_LOG_BUFFER) {
-		ASSERT(!(bp->b_flags & XBF_READ_AHEAD));
-		bp->b_flags &= ~_XBF_RUN_QUEUES;
-		rw = (bp->b_flags & XBF_WRITE) ? WRITE_SYNC : READ_SYNC;
-	} else if (bp->b_flags & _XBF_RUN_QUEUES) {
-		ASSERT(!(bp->b_flags & XBF_READ_AHEAD));
-		bp->b_flags &= ~_XBF_RUN_QUEUES;
-		rw = (bp->b_flags & XBF_WRITE) ? WRITE_META : READ_META;
+	if (bp->b_flags & XBF_WRITE) {
+		if (bp->b_flags & XBF_SYNCIO)
+			rw = WRITE_SYNC;
+		else
+			rw = WRITE;
+		if (bp->b_flags & XBF_FUA)
+			rw |= REQ_FUA;
+		if (bp->b_flags & XBF_FLUSH)
+			rw |= REQ_FLUSH;
+	} else if (bp->b_flags & XBF_READ_AHEAD) {
+		rw = READA;
 	} else {
-		rw = (bp->b_flags & XBF_WRITE) ? WRITE :
-		     (bp->b_flags & XBF_READ_AHEAD) ? READA : READ;
+		rw = READ;
 	}
 
-
 next_chunk:
 	atomic_inc(&bp->b_io_remaining);
 	nr_pages = BIO_MAX_SECTORS >> (PAGE_SHIFT - BBSHIFT);
@@ -1689,8 +1685,7 @@ xfs_buf_delwri_split(
 				break;
 			}
 
-			bp->b_flags &= ~(XBF_DELWRI|_XBF_DELWRI_Q|
-					 _XBF_RUN_QUEUES);
+			bp->b_flags &= ~(XBF_DELWRI | _XBF_DELWRI_Q);
 			bp->b_flags |= XBF_WRITE;
 			list_move_tail(&bp->b_list, list);
 			trace_xfs_buf_delwri_split(bp, _RET_IP_);
Index: xfs/fs/xfs/linux-2.6/xfs_buf.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_buf.h	2011-06-29 14:03:57.994615760 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_buf.h	2011-06-29 14:18:16.806629842 +0200
@@ -46,43 +46,46 @@ typedef enum {
 
 #define XBF_READ	(1 << 0) /* buffer intended for reading from device */
 #define XBF_WRITE	(1 << 1) /* buffer intended for writing to device */
-#define XBF_MAPPED	(1 << 2) /* buffer mapped (b_addr valid) */
+#define XBF_READ_AHEAD	(1 << 2) /* asynchronous read-ahead */
+#define XBF_MAPPED	(1 << 3) /* buffer mapped (b_addr valid) */
 #define XBF_ASYNC	(1 << 4) /* initiator will not wait for completion */
 #define XBF_DONE	(1 << 5) /* all pages in the buffer uptodate */
 #define XBF_DELWRI	(1 << 6) /* buffer has dirty pages */
 #define XBF_STALE	(1 << 7) /* buffer has been staled, do not find it */
-#define XBF_ORDERED	(1 << 11)/* use ordered writes */
-#define XBF_READ_AHEAD	(1 << 12)/* asynchronous read-ahead */
-#define XBF_LOG_BUFFER	(1 << 13)/* this is a buffer used for the log */
+
+/* I/O hints for the BIO layer */
+#define XBF_SYNCIO	(1 << 10)/* treat this buffer as synchronous I/O */
+#define XBF_FUA		(1 << 11)/* force cache write through mode */
+#define XBF_FLUSH	(1 << 12)/* flush the disk cache before a write */
 
 /* flags used only as arguments to access routines */
-#define XBF_LOCK	(1 << 14)/* lock requested */
-#define XBF_TRYLOCK	(1 << 15)/* lock requested, but do not wait */
-#define XBF_DONT_BLOCK	(1 << 16)/* do not block in current thread */
+#define XBF_LOCK	(1 << 15)/* lock requested */
+#define XBF_TRYLOCK	(1 << 16)/* lock requested, but do not wait */
+#define XBF_DONT_BLOCK	(1 << 17)/* do not block in current thread */
 
 /* flags used only internally */
-#define _XBF_PAGES	(1 << 18)/* backed by refcounted pages */
-#define	_XBF_RUN_QUEUES	(1 << 19)/* run block device task queue	*/
-#define	_XBF_KMEM	(1 << 20)/* backed by heap memory */
-#define _XBF_DELWRI_Q	(1 << 21)/* buffer on delwri queue */
+#define _XBF_PAGES	(1 << 20)/* backed by refcounted pages */
+#define	_XBF_KMEM	(1 << 21)/* backed by heap memory */
+#define _XBF_DELWRI_Q	(1 << 22)/* buffer on delwri queue */
 
 typedef unsigned int xfs_buf_flags_t;
 
 #define XFS_BUF_FLAGS \
 	{ XBF_READ,		"READ" }, \
 	{ XBF_WRITE,		"WRITE" }, \
+	{ XBF_READ_AHEAD,	"READ_AHEAD" }, \
 	{ XBF_MAPPED,		"MAPPED" }, \
 	{ XBF_ASYNC,		"ASYNC" }, \
 	{ XBF_DONE,		"DONE" }, \
 	{ XBF_DELWRI,		"DELWRI" }, \
 	{ XBF_STALE,		"STALE" }, \
-	{ XBF_ORDERED,		"ORDERED" }, \
-	{ XBF_READ_AHEAD,	"READ_AHEAD" }, \
+	{ XBF_SYNCIO,		"SYNCIO" }, \
+	{ XBF_FUA,		"FUA" }, \
+	{ XBF_FLUSH,		"FLUSH" }, \
 	{ XBF_LOCK,		"LOCK" },  	/* should never be set */\
 	{ XBF_TRYLOCK,		"TRYLOCK" }, 	/* ditto */\
 	{ XBF_DONT_BLOCK,	"DONT_BLOCK" },	/* ditto */\
 	{ _XBF_PAGES,		"PAGES" }, \
-	{ _XBF_RUN_QUEUES,	"RUN_QUEUES" }, \
 	{ _XBF_KMEM,		"KMEM" }, \
 	{ _XBF_DELWRI_Q,	"DELWRI_Q" }
 
@@ -230,8 +233,9 @@ extern void xfs_buf_terminate(void);
 
 
 #define XFS_BUF_BFLAGS(bp)	((bp)->b_flags)
-#define XFS_BUF_ZEROFLAGS(bp)	((bp)->b_flags &= \
-		~(XBF_READ|XBF_WRITE|XBF_ASYNC|XBF_DELWRI|XBF_ORDERED))
+#define XFS_BUF_ZEROFLAGS(bp) \
+	((bp)->b_flags &= ~(XBF_READ|XBF_WRITE|XBF_ASYNC|XBF_DELWRI| \
+			    XBF_SYNCIO|XBF_FUA|XBF_FLUSH))
 
 void xfs_buf_stale(struct xfs_buf *bp);
 #define XFS_BUF_STALE(bp)	xfs_buf_stale(bp);
@@ -263,10 +267,6 @@ void xfs_buf_stale(struct xfs_buf *bp);
 #define XFS_BUF_UNASYNC(bp)	((bp)->b_flags &= ~XBF_ASYNC)
 #define XFS_BUF_ISASYNC(bp)	((bp)->b_flags & XBF_ASYNC)
 
-#define XFS_BUF_ORDERED(bp)	((bp)->b_flags |= XBF_ORDERED)
-#define XFS_BUF_UNORDERED(bp)	((bp)->b_flags &= ~XBF_ORDERED)
-#define XFS_BUF_ISORDERED(bp)	((bp)->b_flags & XBF_ORDERED)
-
 #define XFS_BUF_HOLD(bp)	xfs_buf_hold(bp)
 #define XFS_BUF_READ(bp)	((bp)->b_flags |= XBF_READ)
 #define XFS_BUF_UNREAD(bp)	((bp)->b_flags &= ~XBF_READ)
Index: xfs/fs/xfs/xfs_log.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log.c	2011-06-29 14:04:18.587837528 +0200
+++ xfs/fs/xfs/xfs_log.c	2011-06-29 14:13:47.761420718 +0200
@@ -1268,7 +1268,6 @@ xlog_bdstrat(
 		return 0;
 	}
 
-	bp->b_flags |= _XBF_RUN_QUEUES;
 	xfs_buf_iorequest(bp);
 	return 0;
 }
@@ -1369,7 +1368,7 @@ xlog_sync(xlog_t		*log,
 	XFS_BUF_ZEROFLAGS(bp);
 	XFS_BUF_BUSY(bp);
 	XFS_BUF_ASYNC(bp);
-	bp->b_flags |= XBF_LOG_BUFFER;
+	bp->b_flags |= XBF_SYNCIO;
 
 	if (log->l_mp->m_flags & XFS_MOUNT_BARRIER) {
 		/*
@@ -1380,7 +1379,7 @@ xlog_sync(xlog_t		*log,
 		 */
 		if (log->l_mp->m_logdev_targp != log->l_mp->m_ddev_targp)
 			xfs_blkdev_issue_flush(log->l_mp->m_ddev_targp);
-		XFS_BUF_ORDERED(bp);
+		bp->b_flags |= XBF_FUA | XBF_FLUSH;
 	}
 
 	ASSERT(XFS_BUF_ADDR(bp) <= log->l_logBBsize-1);
@@ -1413,9 +1412,9 @@ xlog_sync(xlog_t		*log,
 		XFS_BUF_ZEROFLAGS(bp);
 		XFS_BUF_BUSY(bp);
 		XFS_BUF_ASYNC(bp);
-		bp->b_flags |= XBF_LOG_BUFFER;
+		bp->b_flags |= XBF_SYNCIO;
 		if (log->l_mp->m_flags & XFS_MOUNT_BARRIER)
-			XFS_BUF_ORDERED(bp);
+			bp->b_flags |= XBF_FUA | XBF_FLUSH;
 		dptr = XFS_BUF_PTR(bp);
 		/*
 		 * Bump the cycle numbers at the start of each block
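
The write side of the new _xfs_buf_ioapply logic can be sketched in
isolation as below.  This is a userspace illustration, not kernel code:
the XBF_* values are copied from the patched xfs_buf.h, while the SK_*
request bits and the function name write_request_bits are made up for
the example and do not match the kernel's WRITE/REQ_* definitions.

```c
#include <stdint.h>

/* Buffer flags, numbered as in the patched xfs_buf.h. */
#define XBF_READ	(1u << 0)
#define XBF_WRITE	(1u << 1)
#define XBF_SYNCIO	(1u << 10)
#define XBF_FUA		(1u << 11)
#define XBF_FLUSH	(1u << 12)

/* Placeholder request bits: illustrative values, NOT the kernel's. */
#define SK_WRITE	(1u << 0)
#define SK_SYNC		(1u << 1)
#define SK_FUA		(1u << 2)
#define SK_FLUSH	(1u << 3)

/*
 * Translate buffer flags into request bits for a write.  Each I/O hint
 * is tested independently, so a caller can ask for FUA without a cache
 * flush or the other way around.  The hints are ignored for non-writes,
 * mirroring the patch, which only looks at them in the XBF_WRITE branch.
 */
static unsigned int write_request_bits(unsigned int b_flags)
{
	unsigned int rw;

	if (!(b_flags & XBF_WRITE))
		return 0;

	rw = SK_WRITE;
	if (b_flags & XBF_SYNCIO)
		rw |= SK_SYNC;
	if (b_flags & XBF_FUA)
		rw |= SK_FUA;
	if (b_flags & XBF_FLUSH)
		rw |= SK_FLUSH;
	return rw;
}
```

A log buffer set up the way xlog_sync does it in this patch
(XBF_SYNCIO | XBF_FUA | XBF_FLUSH on top of XBF_WRITE) ends up with all
four bits set, which is what the single XBF_ORDERED flag used to imply.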


end of thread, other threads:[~2011-07-07  8:00 UTC | newest]

Thread overview: 88+ messages
2011-07-01  9:43 [PATCH 00/27] patch queue for Linux 3.1, V2 Christoph Hellwig
2011-07-01  9:43 ` [PATCH 01/27] xfs: PF_FSTRANS should never be set in ->writepage Christoph Hellwig
2011-07-01  9:43 ` [PATCH 02/27] xfs: re-enable non-blocking behaviour in xfs_map_blocks Christoph Hellwig
2011-07-05 22:35   ` Alex Elder
2011-07-06  6:37     ` Christoph Hellwig
2011-07-06 13:36       ` Alex Elder
2011-07-01  9:43 ` [PATCH 03/27] xfs: work around bogus gcc warning in xfs_allocbt_init_cursor Christoph Hellwig
2011-07-01  9:43 ` [PATCH 04/27] xfs: split xfs_setattr Christoph Hellwig
2011-07-01  9:43 ` [PATCH 06/27] xfs: kill xfs_itruncate_start Christoph Hellwig
2011-07-01  9:43 ` [PATCH 07/27] xfs: split xfs_itruncate_finish Christoph Hellwig
2011-07-06  4:35   ` Alex Elder
2011-07-06  8:11     ` Christoph Hellwig
2011-07-06 14:05       ` Alex Elder
2011-07-01  9:43 ` [PATCH 08/27] xfs: improve sync behaviour in the fact of aggressive dirtying Christoph Hellwig
2011-07-05 22:36   ` Alex Elder
2011-07-06  8:15     ` Christoph Hellwig
2011-07-06 14:59       ` Alex Elder
2011-07-01  9:43 ` [PATCH 09/27] xfs: fix filesystsem freeze race in xfs_trans_alloc Christoph Hellwig
2011-07-05 22:36   ` Alex Elder
2011-07-01  9:43 ` [PATCH 10/27] xfs: remove i_transp Christoph Hellwig
2011-07-05 22:36   ` Alex Elder
2011-07-01  9:43 ` [PATCH 11/27] xfs: kill the unused struct xfs_sync_work Christoph Hellwig
2011-07-05 22:36   ` Alex Elder
2011-07-01  9:43 ` [PATCH 12/27] xfs: factor out xfs_dir2_leaf_find_entry Christoph Hellwig
2011-07-05 22:36   ` Alex Elder
2011-07-01  9:43 ` [PATCH 13/27] xfs: cleanup shortform directory inode number handling Christoph Hellwig
2011-07-05 22:36   ` Alex Elder
2011-07-01  9:43 ` [PATCH 14/27] xfs: kill struct xfs_dir2_sf Christoph Hellwig
2011-07-06  1:57   ` Dave Chinner
2011-07-06  8:28     ` Christoph Hellwig
2011-07-06  3:24   ` Alex Elder
2011-07-06  8:33     ` Christoph Hellwig
2011-07-06 15:05       ` Alex Elder
2011-07-01  9:43 ` [PATCH 15/27] xfs: cleanup the defintion of struct xfs_dir2_sf_entry Christoph Hellwig
2011-07-06  2:00   ` Dave Chinner
2011-07-06  3:33   ` Alex Elder
2011-07-06  8:34     ` Christoph Hellwig
2011-07-01  9:43 ` [PATCH 16/27] xfs: avoid usage of struct xfs_dir2_block Christoph Hellwig
2011-07-06  2:19   ` Dave Chinner
2011-07-06  8:35     ` Christoph Hellwig
2011-07-06  3:36   ` Alex Elder
2011-07-01  9:43 ` [PATCH 17/27] xfs: kill " Christoph Hellwig
2011-07-06  2:31   ` Dave Chinner
2011-07-06  8:37     ` Christoph Hellwig
2011-07-06 15:11       ` Alex Elder
2011-07-06  3:36   ` Alex Elder
2011-07-01  9:43 ` [PATCH 18/27] xfs: avoid usage of struct xfs_dir2_data Christoph Hellwig
2011-07-06  3:02   ` Dave Chinner
2011-07-06  8:43     ` Christoph Hellwig
2011-07-06  3:38   ` Alex Elder
2011-07-06  8:45     ` Christoph Hellwig
2011-07-01  9:43 ` [PATCH 19/27] xfs: kill " Christoph Hellwig
2011-07-06  3:05   ` Dave Chinner
2011-07-06  3:38   ` Alex Elder
2011-07-01  9:43 ` [PATCH 20/27] xfs: cleanup the defintion of struct xfs_dir2_data_entry Christoph Hellwig
2011-07-06  3:06   ` Dave Chinner
2011-07-06  3:44   ` Alex Elder
2011-07-06  8:48     ` Christoph Hellwig
2011-07-01  9:43 ` [PATCH 21/27] xfs: cleanup struct xfs_dir2_leaf Christoph Hellwig
2011-07-06  3:13   ` Dave Chinner
2011-07-06  3:44   ` Alex Elder
2011-07-01  9:43 ` [PATCH 22/27] xfs: use generic get_unaligned_beXX helpers Christoph Hellwig
2011-07-06  3:44   ` Dave Chinner
2011-07-06  9:07     ` Christoph Hellwig
2011-07-07  8:00       ` Christoph Hellwig
2011-07-06  3:47   ` Alex Elder
2011-07-01  9:43 ` [PATCH 23/27] xfs: remove the unused xfs_bufhash structure Christoph Hellwig
2011-07-06  3:44   ` Dave Chinner
2011-07-06  3:49   ` Alex Elder
2011-07-01  9:43 ` [PATCH 24/27] xfs: clean up buffer locking helpers Christoph Hellwig
2011-07-06  3:47   ` Dave Chinner
2011-07-06  3:55   ` Alex Elder
2011-07-01  9:43 ` [PATCH 25/27] xfs: return the buffer locked from xfs_buf_get_uncached Christoph Hellwig
2011-07-06  3:48   ` Dave Chinner
2011-07-06  3:57   ` Alex Elder
2011-07-01  9:43 ` [PATCH 26/27] xfs: cleanup I/O-related buffer flags Christoph Hellwig
2011-07-06  3:54   ` Dave Chinner
2011-07-06  9:11     ` Christoph Hellwig
2011-07-06  4:09   ` Alex Elder
2011-07-06  9:11     ` Christoph Hellwig
2011-07-01  9:43 ` [PATCH 27/27] xfs: avoid a few disk cache flushes Christoph Hellwig
2011-07-06  3:55   ` Dave Chinner
2011-07-06  4:11   ` Alex Elder
2011-07-06  4:40 ` [PATCH 00/27] patch queue for Linux 3.1, V2 Alex Elder
2011-07-06  6:42   ` Christoph Hellwig
2011-07-06 13:32     ` Alex Elder
2011-07-06 13:43       ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2011-06-29 14:01 [PATCH 00/27] patch queue for Linux 3.1 Christoph Hellwig
2011-06-29 14:01 ` [PATCH 26/27] xfs: cleanup I/O-related buffer flags Christoph Hellwig
