All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@infradead.org>
To: xfs@oss.sgi.com
Subject: [PATCH 6/6] xfs: remove handling of duplicates the busy extent tree
Date: Fri, 21 Jan 2011 04:22:33 -0500	[thread overview]
Message-ID: <20110121092551.629220945@bombadil.infradead.org> (raw)
In-Reply-To: 20110121092227.115815324@bombadil.infradead.org

[-- Attachment #1: xfs-remove-duplicate-extent-handling --]
[-- Type: text/plain, Size: 9754 bytes --]

With the recent changes we never re-used busy extents.  Remove all handling
of them and replace them with asserts.  This also effectively reverts
commit 955833cf2ad0aa39b336e853cad212d867199984

	"xfs: make the log ticket ID available outside the log infrastructure"

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: xfs/fs/xfs/xfs_ag.h
===================================================================
--- xfs.orig/fs/xfs/xfs_ag.h	2011-01-18 13:06:47.588254235 +0100
+++ xfs/fs/xfs/xfs_ag.h	2011-01-18 13:07:05.608012022 +0100
@@ -187,7 +187,6 @@ struct xfs_busy_extent {
 	xfs_agnumber_t	agno;
 	xfs_agblock_t	bno;
 	xfs_extlen_t	length;
-	xlog_tid_t	tid;		/* transaction that created this */
 	int		flags;
 };
 
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c	2011-01-18 13:06:47.595254654 +0100
+++ xfs/fs/xfs/xfs_alloc.c	2011-01-18 13:08:34.000000000 +0100
@@ -2518,83 +2518,7 @@ error0:
 /*
  * Insert a new extent into the busy tree.
  *
- * The busy extent tree is indexed by the start block of the busy extent.
- * there can be multiple overlapping ranges in the busy extent tree but only
- * ever one entry at a given start block. The reason for this is that
- * multi-block extents can be freed, then smaller chunks of that extent
- * allocated and freed again before the first transaction commit is on disk.
- * If the exact same start block is freed a second time, we have to wait for
- * that busy extent to pass out of the tree before the new extent is inserted.
- * There are two main cases we have to handle here.
- *
- * The first case is a transaction that triggers a "free - allocate - free"
- * cycle. This can occur during btree manipulations as a btree block is freed
- * to the freelist, then allocated from the free list, then freed again. In
- * this case, the second extxpnet free is what triggers the duplicate and as
- * such the transaction IDs should match. Because the extent was allocated in
- * this transaction, the transaction must be marked as synchronous. This is
- * true for all cases where the free/alloc/free occurs in the one transaction,
- * hence the addition of the ASSERT(tp->t_flags & XFS_TRANS_SYNC) to this case.
- * This serves to catch violations of the second case quite effectively.
- *
- * The second case is where the free/alloc/free occur in different
- * transactions. In this case, the thread freeing the extent the second time
- * can't mark the extent busy immediately because it is already tracked in a
- * transaction that may be committing.  When the log commit for the existing
- * busy extent completes, the busy extent will be removed from the tree. If we
- * allow the second busy insert to continue using that busy extent structure,
- * it can be freed before this transaction is safely in the log.  Hence our
- * only option in this case is to force the log to remove the existing busy
- * extent from the list before we insert the new one with the current
- * transaction ID.
- *
- * The problem we are trying to avoid in the free-alloc-free in separate
- * transactions is most easily described with a timeline:
- *
- *      Thread 1	Thread 2	Thread 3	xfslogd
- *	xact alloc
- *	free X
- *	mark busy
- *	commit xact
- *	free xact
- *			xact alloc
- *			alloc X
- *			busy search
- *			mark xact sync
- *			commit xact
- *			free xact
- *			force log
- *			checkpoint starts
- *			....
- *					xact alloc
- *					free X
- *					mark busy
- *					finds match
- *					*** KABOOM! ***
- *					....
- *							log IO completes
- *							unbusy X
- *			checkpoint completes
- *
- * By issuing a log force in thread 3 @ "KABOOM", the thread will block until
- * the checkpoint completes, and the busy extent it matched will have been
- * removed from the tree when it is woken. Hence it can then continue safely.
- *
- * However, to ensure this matching process is robust, we need to use the
- * transaction ID for identifying transaction, as delayed logging results in
- * the busy extent and transaction lifecycles being different. i.e. the busy
- * extent is active for a lot longer than the transaction.  Hence the
- * transaction structure can be freed and reallocated, then mark the same
- * extent busy again in the new transaction. In this case the new transaction
- * will have a different tid but can have the same address, and hence we need
- * to check against the tid.
- *
- * Future: for delayed logging, we could avoid the log force if the extent was
- * first freed in the current checkpoint sequence. This, however, requires the
- * ability to pin the current checkpoint in memory until this transaction
- * commits to ensure that both the original free and the current one combine
- * logically into the one checkpoint. If the checkpoint sequences are
- * different, however, we still need to wait on a log force.
+ * The busy extent tree is indexed by the start block of the busy extent
  */
 STATIC void
 xfs_alloc_busy_insert(
@@ -2609,8 +2533,6 @@ xfs_alloc_busy_insert(
 	struct xfs_perag	*pag;
 	struct rb_node		**rbp;
 	struct rb_node		*parent;
-	int			match;
-
 
 	new = kmem_zalloc(sizeof(struct xfs_busy_extent), KM_MAYFAIL);
 	if (!new) {
@@ -2628,74 +2550,35 @@ xfs_alloc_busy_insert(
 	new->bno = bno;
 	new->length = len;
 	new->flags = flags;
-	new->tid = xfs_log_get_trans_ident(tp);
-
 	INIT_LIST_HEAD(&new->list);
 
 	/* trace before insert to be able to see failed inserts */
 	trace_xfs_alloc_busy(tp, agno, bno, len, 0);
 
 	pag = xfs_perag_get(tp->t_mountp, new->agno);
-restart:
 	spin_lock(&pag->pagb_lock);
 	rbp = &pag->pagb_tree.rb_node;
 	parent = NULL;
-	busyp = NULL;
-	match = 0;
-	while (*rbp && match >= 0) {
+	while (*rbp) {
 		parent = *rbp;
 		busyp = rb_entry(parent, struct xfs_busy_extent, rb_node);
 
 		if (new->bno < busyp->bno) {
-			/* may overlap, but exact start block is lower */
+			ASSERT(new->bno + new->length <= busyp->bno);
 			rbp = &(*rbp)->rb_left;
-			if (new->bno + new->length > busyp->bno)
-				match = busyp->tid == new->tid ? 1 : -1;
-		} else if (new->bno > busyp->bno) {
-			/* may overlap, but exact start block is higher */
-			rbp = &(*rbp)->rb_right;
-			if (bno < busyp->bno + busyp->length)
-				match = busyp->tid == new->tid ? 1 : -1;
 		} else {
-			match = busyp->tid == new->tid ? 1 : -1;
-			break;
+			ASSERT(new->bno > busyp->bno);
+			ASSERT(bno >= busyp->bno + busyp->length);
+			rbp = &(*rbp)->rb_right;
 		}
 	}
-	if (match < 0) {
-		/* overlap marked busy in different transaction */
-		spin_unlock(&pag->pagb_lock);
-		xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
-		goto restart;
-	}
-	if (match > 0) {
-		/*
-		 * overlap marked busy in same transaction. Update if exact
-		 * start block match, otherwise combine the busy extents into
-		 * a single range.
-		 */
-		if (busyp->bno == new->bno) {
-			busyp->length = max(busyp->length, new->length);
-			spin_unlock(&pag->pagb_lock);
-			ASSERT(tp->t_flags & XFS_TRANS_SYNC);
-			xfs_perag_put(pag);
-			kmem_free(new);
-			return;
-		}
-		rb_erase(&busyp->rb_node, &pag->pagb_tree);
-		new->length = max(busyp->bno + busyp->length,
-					new->bno + new->length) -
-				min(busyp->bno, new->bno);
-		new->bno = min(busyp->bno, new->bno);
-	} else
-		busyp = NULL;
 
 	rb_link_node(&new->rb_node, parent, rbp);
 	rb_insert_color(&new->rb_node, &pag->pagb_tree);
-
 	list_add(&new->list, &tp->t_busy);
+
 	spin_unlock(&pag->pagb_lock);
 	xfs_perag_put(pag);
-	kmem_free(busyp);
 }
 
 /*
Index: xfs/fs/xfs/xfs_log.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log.c	2011-01-18 13:06:47.612254724 +0100
+++ xfs/fs/xfs/xfs_log.c	2011-01-18 13:07:05.929013420 +0100
@@ -3256,13 +3256,6 @@ xfs_log_ticket_get(
 	return ticket;
 }
 
-xlog_tid_t
-xfs_log_get_trans_ident(
-	struct xfs_trans	*tp)
-{
-	return tp->t_ticket->t_tid;
-}
-
 /*
  * Allocate and initialise a new log ticket.
  */
Index: xfs/fs/xfs/xfs_log.h
===================================================================
--- xfs.orig/fs/xfs/xfs_log.h	2011-01-18 13:06:47.618254514 +0100
+++ xfs/fs/xfs/xfs_log.h	2011-01-18 13:07:05.936017889 +0100
@@ -189,8 +189,6 @@ void	  xlog_iodone(struct xfs_buf *);
 struct xlog_ticket *xfs_log_ticket_get(struct xlog_ticket *ticket);
 void	  xfs_log_ticket_put(struct xlog_ticket *ticket);
 
-xlog_tid_t xfs_log_get_trans_ident(struct xfs_trans *tp);
-
 int	xfs_log_commit_cil(struct xfs_mount *mp, struct xfs_trans *tp,
 				struct xfs_log_vec *log_vector,
 				xfs_lsn_t *commit_lsn, int flags);
Index: xfs/fs/xfs/xfs_log_priv.h
===================================================================
--- xfs.orig/fs/xfs/xfs_log_priv.h	2011-01-18 13:06:47.625253956 +0100
+++ xfs/fs/xfs/xfs_log_priv.h	2011-01-18 13:07:05.951006924 +0100
@@ -148,6 +148,8 @@ static inline uint xlog_get_client_id(__
 #define	XLOG_RECOVERY_NEEDED	0x4	/* log was recovered */
 #define XLOG_IO_ERROR		0x8	/* log hit an I/O error, and being
 					   shutdown */
+typedef __uint32_t xlog_tid_t;
+
 
 #ifdef __KERNEL__
 /*
Index: xfs/fs/xfs/xfs_types.h
===================================================================
--- xfs.orig/fs/xfs/xfs_types.h	2011-01-18 13:06:47.645254026 +0100
+++ xfs/fs/xfs/xfs_types.h	2011-01-18 13:07:05.958005458 +0100
@@ -73,8 +73,6 @@ typedef	__int32_t	xfs_tid_t;	/* transact
 typedef	__uint32_t	xfs_dablk_t;	/* dir/attr block number (in file) */
 typedef	__uint32_t	xfs_dahash_t;	/* dir/attr hash value */
 
-typedef __uint32_t	xlog_tid_t;	/* transaction ID type */
-
 /*
  * These types are 64 bits on disk but are either 32 or 64 bits in memory.
  * Disk based types:

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2011-01-21  9:23 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-21  9:22 [PATCH 0/6] do not reuse busy extents Christoph Hellwig
2011-01-21  9:22 ` [PATCH 1/6] xfs: clean up the xfs_alloc_compute_aligned calling convention Christoph Hellwig
2011-01-25  4:23   ` Dave Chinner
2011-01-27 23:21   ` Alex Elder
2011-01-21  9:22 ` [PATCH 3/6] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
2011-01-27 23:20   ` Alex Elder
2011-01-28  1:58   ` Dave Chinner
2011-01-28 16:19     ` Alex Elder
2011-01-29  0:25       ` Dave Chinner
2011-01-21  9:22 ` [PATCH 4/6] xfs: optimize xfs_alloc_fix_freelist Christoph Hellwig
2011-01-28  5:36   ` Dave Chinner
2011-01-28  5:51     ` Dave Chinner
2011-01-28 22:17   ` Alex Elder
2011-01-21  9:22 ` [PATCH 5/6] xfs: do not classify freed allocation btree blocks as busy Christoph Hellwig
2011-01-28  6:33   ` Dave Chinner
2011-01-28 22:17   ` Alex Elder
2011-02-01 23:02   ` Alex Elder
2011-01-21  9:22 ` Christoph Hellwig [this message]
2011-02-01 23:02   ` [PATCH 6/6] xfs: remove handling of duplicates the busy extent tree Alex Elder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110121092551.629220945@bombadil.infradead.org \
    --to=hch@infradead.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.