* [PATCH 1/8 v2] xfs: log race fixes and cleanups
@ 2019-09-05  8:47 Dave Chinner
  2019-09-05  8:47 ` [PATCH 1/8] xfs: push the AIL in xlog_grant_head_wake Dave Chinner
                   ` (8 more replies)
  0 siblings, 9 replies; 25+ messages in thread
From: Dave Chinner @ 2019-09-05  8:47 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

This is a followup to the original patchset here:

https://lore.kernel.org/linux-xfs/20190905072856.GE1119@dread.disaster.area/T/#m8ae6bdccbf4248b5d219ad40ab5caa92f9e0a979

It is aimed at solving the hangs occurring in generic/530 and
cleaning up the code around iclog completion. This version is
largely just changes in response to review comments, though there
is a new patch (#3) to address what is probably the underlying
cause of all the issues.

Chandan has tested the new unbound workqueue change and it solved
the hang on his test machine, and the series passes all the log
group tests on my test machines.

Cheers,

Dave.

Version 2:
- add patch to yield the CPU in AGI unlinked list processing and to
  allow the CIL push work to be done on any CPU so it doesn't get
  stuck on a CPU that isn't being yielded.
- Added comment to explain the AIL push added to an unsuccessful
  wakeup in xlog_grant_head_wake()
- removed "did callbacks" parameter from
  xlog_state_callback_check_state and cleaned up comments
- fixed detection of the icloglock being dropped during the iclog
  state scan.
- fixed unintended logic change with iclogs in IOERROR state when
  factoring out state processing
- other small whitespace and cleanup bits.




* [PATCH 1/8] xfs: push the AIL in xlog_grant_head_wake
  2019-09-05  8:47 [PATCH 1/8 v2] xfs: log race fixes and cleanups Dave Chinner
@ 2019-09-05  8:47 ` Dave Chinner
  2019-09-05 15:18   ` Darrick J. Wong
  2019-09-05  8:47 ` [PATCH 2/8] xfs: fix missed wakeup on l_flush_wait Dave Chinner
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2019-09-05  8:47 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

In the situation where the log is full and the CIL has not recently
flushed, the AIL push threshold is throttled back to where the
last write of the head of the log was completed. This is stored in
log->l_last_sync_lsn. Hence if the CIL holds > 25% of the log space
pinned by flushes and/or aggregation in progress, we can get the
situation where the head of the log lags a long way behind the
reservation grant head.
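
For reference, the throttling happens in xlog_grant_push_ail(),
which clamps the computed push target to the log head; a condensed
sketch of that clamping (declarations and the threshold calculation
omitted):

	/*
	 * Do not let the push target pass the last written log head.
	 * If the CIL pins a large amount of committing-but-unwritten
	 * log space, this trims the target a long way back from where
	 * the grant head needs the tail to be pushed to.
	 */
	last_sync_lsn = atomic64_read(&log->l_last_sync_lsn);
	if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0)
		threshold_lsn = last_sync_lsn;

	if (!XLOG_FORCED_SHUTDOWN(log))
		xfs_ail_push(log->l_ailp, threshold_lsn);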

When this happens, the AIL push target is trimmed back from where
the reservation grant head wants to push the log tail to, back to
where the head of the log currently is. This means the push target
doesn't reach far enough into the log to actually move the tail
before the transaction reservation goes to sleep.

When the CIL push completes, it moves the log head forward such that
the AIL push target can now be moved, but that has no mechanism for
pushing the log tail. Further, if the next tail movement of the log
is not large enough to wake the waiter (i.e. still not enough space for
it to have a reservation granted), we don't wake anything up, and
hence we do not update the AIL push target to take into account the
head of the log moving and allowing the push target to be moved
forwards.

To avoid this particular condition, if we fail to wake the first
waiter on the grant head because we don't have enough space,
push on the AIL again. This will pick up any movement of the log
head and allow the push target to move forward due to completion of
CIL pushing.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index b159a9e9fef0..5e21450fb8f5 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -214,15 +214,36 @@ xlog_grant_head_wake(
 {
 	struct xlog_ticket	*tic;
 	int			need_bytes;
+	bool			woken_task = false;
 
 	list_for_each_entry(tic, &head->waiters, t_queue) {
+		/*
+		 * The is a chance that the size of the CIL checkpoints in
+		 * progress result at the last AIL push resulted in the log head
+		 * (l_last_sync_lsn) not reflecting where the log head now is.
+		 * Hence when we are woken here, it may be the head of the log
+		 * that has moved rather than the tail. In that case, the tail
+		 * didn't move and there won't be space available until the AIL
+		 * push target is updated and the tail pushed again. If the AIL
+		 * is already pushed to it's target, we will hang here until
+		 * something else updates the AIL push target.
+		 *
+		 * Hence if there isn't space to wake the first waiter on the
+		 * grant head, we need to push the AIL again to ensure the
+		 * target reflects the current log tail and log head position
+		 * before we wait for the tail to move again.
+		 */
 		need_bytes = xlog_ticket_reservation(log, head, tic);
-		if (*free_bytes < need_bytes)
+		if (*free_bytes < need_bytes) {
+			if (!woken_task)
+				xlog_grant_push_ail(log, need_bytes);
 			return false;
+		}
 
 		*free_bytes -= need_bytes;
 		trace_xfs_log_grant_wake_up(log, tic);
 		wake_up_process(tic->t_task);
+		woken_task = true;
 	}
 
 	return true;
-- 
2.23.0.rc1



* [PATCH 2/8] xfs: fix missed wakeup on l_flush_wait
  2019-09-05  8:47 [PATCH 1/8 v2] xfs: log race fixes and cleanups Dave Chinner
  2019-09-05  8:47 ` [PATCH 1/8] xfs: push the AIL in xlog_grant_head_wake Dave Chinner
@ 2019-09-05  8:47 ` Dave Chinner
  2019-09-05 15:21   ` Darrick J. Wong
  2019-09-05  8:47 ` [PATCH 3/8] xfs: prevent CIL push holdoff in log recovery Dave Chinner
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2019-09-05  8:47 UTC (permalink / raw)
  To: linux-xfs

From: Rik van Riel <riel@surriel.com>

The code in xlog_wait uses the spinlock to make adding the task to
the wait queue and setting the task state to UNINTERRUPTIBLE atomic
with respect to the waker.
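
For reference, xlog_wait() in fs/xfs/xfs_log_priv.h is essentially:

	static inline void
	xlog_wait(
		wait_queue_head_t	*wq_head,
		spinlock_t		*lock) __releases(lock)
	{
		DECLARE_WAITQUEUE(wait, current);

		add_wait_queue_exclusive(wq_head, &wait);
		__set_current_state(TASK_UNINTERRUPTIBLE);
		spin_unlock(lock);
		schedule();
		remove_wait_queue(wq_head, &wait);
	}

so the waker is expected to issue the wake_up_all() while holding
that same lock.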

Doing the wakeup after releasing the spinlock opens up the following
race condition:

Task 1					task 2
add task to wait queue
					wake up task
set task state to UNINTERRUPTIBLE

This issue was found through code inspection as a result of kworkers
being observed stuck in UNINTERRUPTIBLE state with an empty
wait queue. It is rare and largely unreproducible.

Simply moving the spin_unlock to after the wake_up_all results
in the waker not being able to see a task on the waitqueue before
it has set its state to UNINTERRUPTIBLE.

This bug dates back to the conversion of this code to generic
waitqueue infrastructure from a counting semaphore back in 2008
which didn't place the wakeups consistently w.r.t. the relevant
spin locks.

[dchinner: Also fix a similar issue in the shutdown path on
xc_commit_wait. Update commit log with more details of the issue.]

Fixes: d748c62367eb ("[XFS] Convert l_flushsema to a sv_t")
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 5e21450fb8f5..8380ed065608 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2644,7 +2644,6 @@ xlog_state_do_callback(
 	int		   funcdidcallbacks; /* flag: function did callbacks */
 	int		   repeats;	/* for issuing console warnings if
 					 * looping too many times */
-	int		   wake = 0;
 
 	spin_lock(&log->l_icloglock);
 	first_iclog = iclog = log->l_iclog;
@@ -2840,11 +2839,9 @@ xlog_state_do_callback(
 #endif
 
 	if (log->l_iclog->ic_state & (XLOG_STATE_ACTIVE|XLOG_STATE_IOERROR))
-		wake = 1;
-	spin_unlock(&log->l_icloglock);
-
-	if (wake)
 		wake_up_all(&log->l_flush_wait);
+
+	spin_unlock(&log->l_icloglock);
 }
 
 
@@ -3944,7 +3941,9 @@ xfs_log_force_umount(
 	 * item committed callback functions will do this again under lock to
 	 * avoid races.
 	 */
+	spin_lock(&log->l_cilp->xc_push_lock);
 	wake_up_all(&log->l_cilp->xc_commit_wait);
+	spin_unlock(&log->l_cilp->xc_push_lock);
 	xlog_state_do_callback(log, true, NULL);
 
 #ifdef XFSERRORDEBUG
-- 
2.23.0.rc1



* [PATCH 3/8] xfs: prevent CIL push holdoff in log recovery
  2019-09-05  8:47 [PATCH 1/8 v2] xfs: log race fixes and cleanups Dave Chinner
  2019-09-05  8:47 ` [PATCH 1/8] xfs: push the AIL in xlog_grant_head_wake Dave Chinner
  2019-09-05  8:47 ` [PATCH 2/8] xfs: fix missed wakeup on l_flush_wait Dave Chinner
@ 2019-09-05  8:47 ` Dave Chinner
  2019-09-05 15:26   ` Darrick J. Wong
  2019-09-05  8:47 ` [PATCH 4/8] xfs: factor debug code out of xlog_state_do_callback() Dave Chinner
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2019-09-05  8:47 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

generic/530 on a machine with enough ram and a non-preemptible
kernel can run the AGI processing phase of log recovery entirely out
of cache. This means it never blocks on locks, never waits for IO
and runs entirely through the unlinked lists until it either
completes or blocks and hangs because it has run out of log space.

It runs out of log space because the background CIL push is
scheduled but never runs. queue_work() queues the CIL work on the
current CPU that is busy, and the workqueue code will not run it on
any other CPU. Hence if the unlinked list processing never yields
the CPU voluntarily, the push work is delayed indefinitely. This
results in the CIL aggregating changes until all the log space is
consumed.

When the log recovery processing eventually blocks, the CIL flushes
but because the last iclog isn't submitted for IO because it isn't
full, the CIL flush never completes and nothing ever moves the log
head forwards, or indeed inserts anything into the tail of the log,
and hence nothing is able to get the log moving again and recovery
hangs.

There are several problems here, but the two obvious ones from
the trace are that:
	a) log recovery does not yield the CPU for over 4 seconds,
	b) binding CIL pushes to a single CPU is a really bad idea.

This patch addresses just these two aspects of the problem, and is
suitable for backporting to work around any issues in older kernels.
The more fundamental problem of preventing the CIL from consuming
more than 50% of the log without committing will take more invasive
and complex work, so will be done as followup work.
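
To summarise the two changes (both in the diff below): yield inside
the recovery loop so queued work gets a chance to run, and let the
CIL push work run on any CPU:

	/* xlog_recover_process_iunlinks(): yield in the hot loop */
	while (agino != NULLAGINO) {
		agino = xlog_recover_process_one_iunlink(mp, agno,
						agino, bucket);
		cond_resched();
	}

	/* xfs_init_mount_workqueues(): don't bind CIL pushes to one CPU */
	mp->m_cil_workqueue = alloc_workqueue("xfs-cil/%s",
			WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND,
			0, mp->m_fsname);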

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 1 +
 fs/xfs/xfs_super.c       | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index f05c6c99c4f3..c9665455431e 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -5080,6 +5080,7 @@ xlog_recover_process_iunlinks(
 			while (agino != NULLAGINO) {
 				agino = xlog_recover_process_one_iunlink(mp,
 							agno, agino, bucket);
+				cond_resched();
 			}
 		}
 		xfs_buf_rele(agibp);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index f9450235533c..55a268997bde 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -818,7 +818,8 @@ xfs_init_mount_workqueues(
 		goto out_destroy_buf;
 
 	mp->m_cil_workqueue = alloc_workqueue("xfs-cil/%s",
-			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
+			WQ_MEM_RECLAIM|WQ_FREEZABLE|WQ_UNBOUND,
+			0, mp->m_fsname);
 	if (!mp->m_cil_workqueue)
 		goto out_destroy_unwritten;
 
-- 
2.23.0.rc1



* [PATCH 4/8] xfs: factor debug code out of xlog_state_do_callback()
  2019-09-05  8:47 [PATCH 1/8 v2] xfs: log race fixes and cleanups Dave Chinner
                   ` (2 preceding siblings ...)
  2019-09-05  8:47 ` [PATCH 3/8] xfs: prevent CIL push holdoff in log recovery Dave Chinner
@ 2019-09-05  8:47 ` Dave Chinner
  2019-09-05 15:30   ` Darrick J. Wong
  2019-09-05  8:47 ` [PATCH 5/8] xfs: factor callbacks " Dave Chinner
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2019-09-05  8:47 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Start making this function readable by lifting the debug code into
a conditional function.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c | 79 +++++++++++++++++++++++++++---------------------
 1 file changed, 44 insertions(+), 35 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 8380ed065608..2904bf0d17f3 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2628,6 +2628,48 @@ xlog_get_lowest_lsn(
 	return lowest_lsn;
 }
 
+#ifdef DEBUG
+/*
+ * Make one last gasp attempt to see if iclogs are being left in limbo.  If the
+ * above loop finds an iclog earlier than the current iclog and in one of the
+ * syncing states, the current iclog is put into DO_CALLBACK and the callbacks
+ * are deferred to the completion of the earlier iclog. Walk the iclogs in order
+ * and make sure that no iclog is in DO_CALLBACK unless an earlier iclog is in
+ * one of the syncing states.
+ *
+ * Note that SYNCING|IOERROR is a valid state so we cannot just check for
+ * ic_state == SYNCING.
+ */
+static void
+xlog_state_callback_check_state(
+	struct xlog		*log)
+{
+	struct xlog_in_core	*first_iclog = log->l_iclog;
+	struct xlog_in_core	*iclog = first_iclog;
+
+	do {
+		ASSERT(iclog->ic_state != XLOG_STATE_DO_CALLBACK);
+		/*
+		 * Terminate the loop if iclogs are found in states
+		 * which will cause other threads to clean up iclogs.
+		 *
+		 * SYNCING - i/o completion will go through logs
+		 * DONE_SYNC - interrupt thread should be waiting for
+		 *              l_icloglock
+		 * IOERROR - give up hope all ye who enter here
+		 */
+		if (iclog->ic_state == XLOG_STATE_WANT_SYNC ||
+		    iclog->ic_state & XLOG_STATE_SYNCING ||
+		    iclog->ic_state == XLOG_STATE_DONE_SYNC ||
+		    iclog->ic_state == XLOG_STATE_IOERROR )
+			break;
+		iclog = iclog->ic_next;
+	} while (first_iclog != iclog);
+}
+#else
+#define xlog_state_callback_check_state(l)	((void)0)
+#endif
+
 STATIC void
 xlog_state_do_callback(
 	struct xlog		*log,
@@ -2802,41 +2844,8 @@ xlog_state_do_callback(
 		}
 	} while (!ioerrors && loopdidcallbacks);
 
-#ifdef DEBUG
-	/*
-	 * Make one last gasp attempt to see if iclogs are being left in limbo.
-	 * If the above loop finds an iclog earlier than the current iclog and
-	 * in one of the syncing states, the current iclog is put into
-	 * DO_CALLBACK and the callbacks are deferred to the completion of the
-	 * earlier iclog. Walk the iclogs in order and make sure that no iclog
-	 * is in DO_CALLBACK unless an earlier iclog is in one of the syncing
-	 * states.
-	 *
-	 * Note that SYNCING|IOABORT is a valid state so we cannot just check
-	 * for ic_state == SYNCING.
-	 */
-	if (funcdidcallbacks) {
-		first_iclog = iclog = log->l_iclog;
-		do {
-			ASSERT(iclog->ic_state != XLOG_STATE_DO_CALLBACK);
-			/*
-			 * Terminate the loop if iclogs are found in states
-			 * which will cause other threads to clean up iclogs.
-			 *
-			 * SYNCING - i/o completion will go through logs
-			 * DONE_SYNC - interrupt thread should be waiting for
-			 *              l_icloglock
-			 * IOERROR - give up hope all ye who enter here
-			 */
-			if (iclog->ic_state == XLOG_STATE_WANT_SYNC ||
-			    iclog->ic_state & XLOG_STATE_SYNCING ||
-			    iclog->ic_state == XLOG_STATE_DONE_SYNC ||
-			    iclog->ic_state == XLOG_STATE_IOERROR )
-				break;
-			iclog = iclog->ic_next;
-		} while (first_iclog != iclog);
-	}
-#endif
+	if (funcdidcallbacks)
+		xlog_state_callback_check_state(log);
 
 	if (log->l_iclog->ic_state & (XLOG_STATE_ACTIVE|XLOG_STATE_IOERROR))
 		wake_up_all(&log->l_flush_wait);
-- 
2.23.0.rc1



* [PATCH 5/8] xfs: factor callbacks out of xlog_state_do_callback()
  2019-09-05  8:47 [PATCH 1/8 v2] xfs: log race fixes and cleanups Dave Chinner
                   ` (3 preceding siblings ...)
  2019-09-05  8:47 ` [PATCH 4/8] xfs: factor debug code out of xlog_state_do_callback() Dave Chinner
@ 2019-09-05  8:47 ` Dave Chinner
  2019-09-05 15:39   ` Darrick J. Wong
  2019-09-05  8:47 ` [PATCH 6/8] xfs: factor iclog state processing " Dave Chinner
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2019-09-05  8:47 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Simplify the code flow by lifting the iclog callback work out of
the main iclog iteration loop. This isolates the log juggling and
callbacks from the iclog state change logic in the loop.

Note that the loopdidcallbacks variable is not actually tracking
whether callbacks were run - it is tracking whether the
icloglock was dropped during the loop and so determines if we
completed the entire iclog scan loop atomically. Hence we know for
certain there are either no more ordered completions to run or
that the next completion will run the remaining ordered iclog
completions. Hence rename that variable appropriately for its
function.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c | 70 +++++++++++++++++++++++++++++++-----------------
 1 file changed, 45 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 2904bf0d17f3..73aa8e152c83 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2628,6 +2628,42 @@ xlog_get_lowest_lsn(
 	return lowest_lsn;
 }
 
+/*
+ * Keep processing entries in the iclog callback list until we come around and
+ * it is empty.  We need to atomically see that the list is empty and change the
+ * state to DIRTY so that we don't miss any more callbacks being added.
+ *
+ * This function is called with the icloglock held and returns with it held. We
+ * drop it while running callbacks, however, as holding it over thousands of
+ * callbacks is unnecessary and causes excessive contention if we do.
+ */
+static void
+xlog_state_do_iclog_callbacks(
+	struct xlog		*log,
+	struct xlog_in_core	*iclog,
+	bool			aborted)
+{
+	spin_unlock(&log->l_icloglock);
+	spin_lock(&iclog->ic_callback_lock);
+	while (!list_empty(&iclog->ic_callbacks)) {
+		LIST_HEAD(tmp);
+
+		list_splice_init(&iclog->ic_callbacks, &tmp);
+
+		spin_unlock(&iclog->ic_callback_lock);
+		xlog_cil_process_committed(&tmp, aborted);
+		spin_lock(&iclog->ic_callback_lock);
+	}
+
+	/*
+	 * Pick up the icloglock while still holding the callback lock so we
+	 * serialise against anyone trying to add more callbacks to this iclog
+	 * now we've finished processing.
+	 */
+	spin_lock(&log->l_icloglock);
+	spin_unlock(&iclog->ic_callback_lock);
+}
+
 #ifdef DEBUG
 /*
  * Make one last gasp attempt to see if iclogs are being left in limbo.  If the
@@ -2682,7 +2718,7 @@ xlog_state_do_callback(
 	int		   flushcnt = 0;
 	xfs_lsn_t	   lowest_lsn;
 	int		   ioerrors;	/* counter: iclogs with errors */
-	int		   loopdidcallbacks; /* flag: inner loop did callbacks*/
+	bool		   cycled_icloglock;
 	int		   funcdidcallbacks; /* flag: function did callbacks */
 	int		   repeats;	/* for issuing console warnings if
 					 * looping too many times */
@@ -2704,7 +2740,7 @@ xlog_state_do_callback(
 		 */
 		first_iclog = log->l_iclog;
 		iclog = log->l_iclog;
-		loopdidcallbacks = 0;
+		cycled_icloglock = false;
 		repeats++;
 
 		do {
@@ -2795,31 +2831,13 @@ xlog_state_do_callback(
 			} else
 				ioerrors++;
 
-			spin_unlock(&log->l_icloglock);
-
 			/*
-			 * Keep processing entries in the callback list until
-			 * we come around and it is empty.  We need to
-			 * atomically see that the list is empty and change the
-			 * state to DIRTY so that we don't miss any more
-			 * callbacks being added.
+			 * Running callbacks will drop the icloglock which means
+			 * we'll have to run at least one more complete loop.
 			 */
-			spin_lock(&iclog->ic_callback_lock);
-			while (!list_empty(&iclog->ic_callbacks)) {
-				LIST_HEAD(tmp);
+			cycled_icloglock = true;
+			xlog_state_do_iclog_callbacks(log, iclog, aborted);
 
-				list_splice_init(&iclog->ic_callbacks, &tmp);
-
-				spin_unlock(&iclog->ic_callback_lock);
-				xlog_cil_process_committed(&tmp, aborted);
-				spin_lock(&iclog->ic_callback_lock);
-			}
-
-			loopdidcallbacks++;
-			funcdidcallbacks++;
-
-			spin_lock(&log->l_icloglock);
-			spin_unlock(&iclog->ic_callback_lock);
 			if (!(iclog->ic_state & XLOG_STATE_IOERROR))
 				iclog->ic_state = XLOG_STATE_DIRTY;
 
@@ -2835,6 +2853,8 @@ xlog_state_do_callback(
 			iclog = iclog->ic_next;
 		} while (first_iclog != iclog);
 
+		funcdidcallbacks += cycled_icloglock;
+
 		if (repeats > 5000) {
 			flushcnt += repeats;
 			repeats = 0;
@@ -2842,7 +2862,7 @@ xlog_state_do_callback(
 				"%s: possible infinite loop (%d iterations)",
 				__func__, flushcnt);
 		}
-	} while (!ioerrors && loopdidcallbacks);
+	} while (!ioerrors && cycled_icloglock);
 
 	if (funcdidcallbacks)
 		xlog_state_callback_check_state(log);
-- 
2.23.0.rc1



* [PATCH 6/8] xfs: factor iclog state processing out of xlog_state_do_callback()
  2019-09-05  8:47 [PATCH 1/8 v2] xfs: log race fixes and cleanups Dave Chinner
                   ` (4 preceding siblings ...)
  2019-09-05  8:47 ` [PATCH 5/8] xfs: factor callbacks " Dave Chinner
@ 2019-09-05  8:47 ` Dave Chinner
  2019-09-05 15:45   ` Darrick J. Wong
  2019-09-05  8:47 ` [PATCH 7/8] xfs: push iclog state cleaning into xlog_state_clean_log Dave Chinner
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2019-09-05  8:47 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The iclog IO completion state processing is somewhat complex, and
because it's inside two nested loops it is highly indented and very
hard to read. Factor it out, flatten the logic flow and clean up the
comments so that it is much easier to see what the code is doing
both in processing the individual iclogs and in the overall
xlog_state_do_callback() operation.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c | 195 ++++++++++++++++++++++++-----------------------
 1 file changed, 98 insertions(+), 97 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 73aa8e152c83..356204ddf865 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2628,6 +2628,90 @@ xlog_get_lowest_lsn(
 	return lowest_lsn;
 }
 
+/*
+ * Return true if we need to stop processing, false to continue to the next
+ * iclog. The caller will need to run callbacks if the iclog is returned in the
+ * XLOG_STATE_CALLBACK state.
+ */
+static bool
+xlog_state_iodone_process_iclog(
+	struct xlog		*log,
+	struct xlog_in_core	*iclog,
+	struct xlog_in_core	*completed_iclog,
+	bool			*ioerror)
+{
+	xfs_lsn_t		lowest_lsn;
+
+	/* Skip all iclogs in the ACTIVE & DIRTY states */
+	if (iclog->ic_state & (XLOG_STATE_ACTIVE | XLOG_STATE_DIRTY))
+		return false;
+
+	/*
+	 * Between marking a filesystem SHUTDOWN and stopping the log, we do
+	 * flush all iclogs to disk (if there wasn't a log I/O error). So, we do
+	 * want things to go smoothly in case of just a SHUTDOWN  w/o a
+	 * LOG_IO_ERROR.
+	 */
+	if (iclog->ic_state & XLOG_STATE_IOERROR) {
+		*ioerror = true;
+		return false;
+	}
+
+	/*
+	 * Can only perform callbacks in order.  Since this iclog is not in the
+	 * DONE_SYNC/ DO_CALLBACK state, we skip the rest and just try to clean
+	 * up.  If we set our iclog to DO_CALLBACK, we will not process it when
+	 * we retry since a previous iclog is in the CALLBACK and the state
+	 * cannot change since we are holding the l_icloglock.
+	 */
+	if (!(iclog->ic_state &
+			(XLOG_STATE_DONE_SYNC | XLOG_STATE_DO_CALLBACK))) {
+		if (completed_iclog &&
+		    (completed_iclog->ic_state == XLOG_STATE_DONE_SYNC)) {
+			completed_iclog->ic_state = XLOG_STATE_DO_CALLBACK;
+		}
+		return true;
+	}
+
+	/*
+	 * We now have an iclog that is in either the DO_CALLBACK or DONE_SYNC
+	 * states. The other states (WANT_SYNC, SYNCING, or CALLBACK were caught
+	 * by the above if and are going to clean (i.e. we aren't doing their
+	 * callbacks) see the above if.
+	 *
+	 * We will do one more check here to see if we have chased our tail
+	 * around.
+	 */
+	lowest_lsn = xlog_get_lowest_lsn(log);
+	if (lowest_lsn &&
+	    XFS_LSN_CMP(lowest_lsn, be64_to_cpu(iclog->ic_header.h_lsn)) < 0)
+		return false; /* Leave this iclog for another thread */
+
+	iclog->ic_state = XLOG_STATE_CALLBACK;
+
+	/*
+	 * Completion of a iclog IO does not imply that a transaction has
+	 * completed, as transactions can be large enough to span many iclogs.
+	 * We cannot change the tail of the log half way through a transaction
+	 * as this may be the only transaction in the log and moving th etail to
+	 * point to the middle of it will prevent recovery from finding the
+	 * start of the transaction.  Hence we should only update the
+	 * last_sync_lsn if this iclog contains transaction completion callbacks
+	 * on it.
+	 *
+	 * We have to do this before we drop the icloglock to ensure we are the
+	 * only one that can update it.
+	 */
+	ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
+			be64_to_cpu(iclog->ic_header.h_lsn)) <= 0);
+	if (!list_empty_careful(&iclog->ic_callbacks))
+		atomic64_set(&log->l_last_sync_lsn,
+			be64_to_cpu(iclog->ic_header.h_lsn));
+
+	return false;
+
+}
+
 /*
  * Keep processing entries in the iclog callback list until we come around and
  * it is empty.  We need to atomically see that the list is empty and change the
@@ -2712,22 +2796,15 @@ xlog_state_do_callback(
 	bool			aborted,
 	struct xlog_in_core	*ciclog)
 {
-	xlog_in_core_t	   *iclog;
-	xlog_in_core_t	   *first_iclog;	/* used to know when we've
-						 * processed all iclogs once */
-	int		   flushcnt = 0;
-	xfs_lsn_t	   lowest_lsn;
-	int		   ioerrors;	/* counter: iclogs with errors */
-	bool		   cycled_icloglock;
-	int		   funcdidcallbacks; /* flag: function did callbacks */
-	int		   repeats;	/* for issuing console warnings if
-					 * looping too many times */
+	struct xlog_in_core	*iclog;
+	struct xlog_in_core	*first_iclog;
+	int			flushcnt = 0;
+	int			funcdidcallbacks = 0;
+	bool			cycled_icloglock;
+	bool			ioerror;
+	int			repeats = 0;
 
 	spin_lock(&log->l_icloglock);
-	first_iclog = iclog = log->l_iclog;
-	ioerrors = 0;
-	funcdidcallbacks = 0;
-	repeats = 0;
 
 	do {
 		/*
@@ -2741,96 +2818,20 @@ xlog_state_do_callback(
 		first_iclog = log->l_iclog;
 		iclog = log->l_iclog;
 		cycled_icloglock = false;
+		ioerror = false;
 		repeats++;
 
 		do {
+			if (xlog_state_iodone_process_iclog(log, iclog,
+							ciclog, &ioerror))
+				break;
 
-			/* skip all iclogs in the ACTIVE & DIRTY states */
-			if (iclog->ic_state &
-			    (XLOG_STATE_ACTIVE|XLOG_STATE_DIRTY)) {
+			if (!(iclog->ic_state &
+			      (XLOG_STATE_CALLBACK | XLOG_STATE_IOERROR))) {
 				iclog = iclog->ic_next;
 				continue;
 			}
 
-			/*
-			 * Between marking a filesystem SHUTDOWN and stopping
-			 * the log, we do flush all iclogs to disk (if there
-			 * wasn't a log I/O error). So, we do want things to
-			 * go smoothly in case of just a SHUTDOWN  w/o a
-			 * LOG_IO_ERROR.
-			 */
-			if (!(iclog->ic_state & XLOG_STATE_IOERROR)) {
-				/*
-				 * Can only perform callbacks in order.  Since
-				 * this iclog is not in the DONE_SYNC/
-				 * DO_CALLBACK state, we skip the rest and
-				 * just try to clean up.  If we set our iclog
-				 * to DO_CALLBACK, we will not process it when
-				 * we retry since a previous iclog is in the
-				 * CALLBACK and the state cannot change since
-				 * we are holding the l_icloglock.
-				 */
-				if (!(iclog->ic_state &
-					(XLOG_STATE_DONE_SYNC |
-						 XLOG_STATE_DO_CALLBACK))) {
-					if (ciclog && (ciclog->ic_state ==
-							XLOG_STATE_DONE_SYNC)) {
-						ciclog->ic_state = XLOG_STATE_DO_CALLBACK;
-					}
-					break;
-				}
-				/*
-				 * We now have an iclog that is in either the
-				 * DO_CALLBACK or DONE_SYNC states. The other
-				 * states (WANT_SYNC, SYNCING, or CALLBACK were
-				 * caught by the above if and are going to
-				 * clean (i.e. we aren't doing their callbacks)
-				 * see the above if.
-				 */
-
-				/*
-				 * We will do one more check here to see if we
-				 * have chased our tail around.
-				 */
-
-				lowest_lsn = xlog_get_lowest_lsn(log);
-				if (lowest_lsn &&
-				    XFS_LSN_CMP(lowest_lsn,
-						be64_to_cpu(iclog->ic_header.h_lsn)) < 0) {
-					iclog = iclog->ic_next;
-					continue; /* Leave this iclog for
-						   * another thread */
-				}
-
-				iclog->ic_state = XLOG_STATE_CALLBACK;
-
-
-				/*
-				 * Completion of a iclog IO does not imply that
-				 * a transaction has completed, as transactions
-				 * can be large enough to span many iclogs. We
-				 * cannot change the tail of the log half way
-				 * through a transaction as this may be the only
-				 * transaction in the log and moving th etail to
-				 * point to the middle of it will prevent
-				 * recovery from finding the start of the
-				 * transaction. Hence we should only update the
-				 * last_sync_lsn if this iclog contains
-				 * transaction completion callbacks on it.
-				 *
-				 * We have to do this before we drop the
-				 * icloglock to ensure we are the only one that
-				 * can update it.
-				 */
-				ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
-					be64_to_cpu(iclog->ic_header.h_lsn)) <= 0);
-				if (!list_empty_careful(&iclog->ic_callbacks))
-					atomic64_set(&log->l_last_sync_lsn,
-						be64_to_cpu(iclog->ic_header.h_lsn));
-
-			} else
-				ioerrors++;
-
 			/*
 			 * Running callbacks will drop the icloglock which means
 			 * we'll have to run at least one more complete loop.
@@ -2862,7 +2863,7 @@ xlog_state_do_callback(
 				"%s: possible infinite loop (%d iterations)",
 				__func__, flushcnt);
 		}
-	} while (!ioerrors && cycled_icloglock);
+	} while (!ioerror && cycled_icloglock);
 
 	if (funcdidcallbacks)
 		xlog_state_callback_check_state(log);
-- 
2.23.0.rc1



* [PATCH 7/8] xfs: push iclog state cleaning into xlog_state_clean_log
  2019-09-05  8:47 [PATCH 1/8 v2] xfs: log race fixes and cleanups Dave Chinner
                   ` (5 preceding siblings ...)
  2019-09-05  8:47 ` [PATCH 6/8] xfs: factor iclog state processing " Dave Chinner
@ 2019-09-05  8:47 ` Dave Chinner
  2019-09-05 15:48   ` Darrick J. Wong
  2019-09-05  8:47 ` [PATCH 8/8] xfs: push the grant head when the log head moves forward Dave Chinner
  2019-09-05 15:44 ` [PATCH 1/8 v2] xfs: log race fixes and cleanups Christoph Hellwig
  8 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2019-09-05  8:47 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

xlog_state_clean_log() is only called from one place, and it occurs
when an iclog is transitioning back to ACTIVE. Prior to calling
xlog_state_clean_log, the iclog we are processing has its state
hard coded to DIRTY so that xlog_state_clean_log() processes it
correctly. We also have a hard coded wakeup after
xlog_state_clean_log() to ensure log force waiters on that iclog
are woken correctly.

Both of these things are operations required to finish processing an
iclog and return it to the ACTIVE state again, so it makes little
sense to separate them from the rest of the clean state transition
code.

Hence push these things inside xlog_state_clean_log(), document the
behaviour and rename it xlog_state_clean_iclog() to indicate that
it's being driven by an iclog state change and does the iclog state
change work itself.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log.c | 57 ++++++++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 24 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 356204ddf865..bef314361bc4 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2521,21 +2521,35 @@ xlog_write(
  *****************************************************************************
  */
 
-/* Clean iclogs starting from the head.  This ordering must be
- * maintained, so an iclog doesn't become ACTIVE beyond one that
- * is SYNCING.  This is also required to maintain the notion that we use
- * a ordered wait queue to hold off would be writers to the log when every
- * iclog is trying to sync to disk.
+/*
+ * An iclog has just finished its completion processing, so we need to update
+ * the iclog state and propagate that up into the overall log state. Hence we
+ * prepare the iclog for cleaning, and then clean all the pending dirty iclogs
+ * starting from the head, and then wake up any threads that are waiting for the
+ * iclog to be marked clean.
+ *
+ * The ordering of marking iclogs ACTIVE must be maintained, so an iclog
+ * doesn't become ACTIVE beyond one that is SYNCING.  This is also required to
+ * maintain the notion that we use an ordered wait queue to hold off would-be
+ * writers to the log when every iclog is trying to sync to disk.
+ *
+ * Caller must hold the icloglock before calling us.
  *
- * State Change: DIRTY -> ACTIVE
+ * State Change: !IOERROR -> DIRTY -> ACTIVE
  */
 STATIC void
-xlog_state_clean_log(
-	struct xlog *log)
+xlog_state_clean_iclog(
+	struct xlog		*log,
+	struct xlog_in_core	*dirty_iclog)
 {
-	xlog_in_core_t	*iclog;
-	int changed = 0;
+	struct xlog_in_core	*iclog;
+	int			changed = 0;
+
+	/* Prepare the completed iclog. */
+	if (!(dirty_iclog->ic_state & XLOG_STATE_IOERROR))
+		dirty_iclog->ic_state = XLOG_STATE_DIRTY;
 
+	/* Walk all the iclogs to update the ordered active state. */
 	iclog = log->l_iclog;
 	do {
 		if (iclog->ic_state == XLOG_STATE_DIRTY) {
@@ -2573,7 +2587,13 @@ xlog_state_clean_log(
 		iclog = iclog->ic_next;
 	} while (iclog != log->l_iclog);
 
-	/* log is locked when we are called */
+
+	/*
+	 * Wake up threads waiting in xfs_log_force() for the dirty iclog
+	 * to be cleaned.
+	 */
+	wake_up_all(&dirty_iclog->ic_force_wait);
+
 	/*
 	 * Change state for the dummy log recording.
 	 * We usually go to NEED. But we go to NEED2 if the changed indicates
@@ -2607,7 +2627,7 @@ xlog_state_clean_log(
 			ASSERT(0);
 		}
 	}
-}	/* xlog_state_clean_log */
+}
 
 STATIC xfs_lsn_t
 xlog_get_lowest_lsn(
@@ -2839,18 +2859,7 @@ xlog_state_do_callback(
 			cycled_icloglock = true;
 			xlog_state_do_iclog_callbacks(log, iclog, aborted);
 
-			if (!(iclog->ic_state & XLOG_STATE_IOERROR))
-				iclog->ic_state = XLOG_STATE_DIRTY;
-
-			/*
-			 * Transition from DIRTY to ACTIVE if applicable.
-			 * NOP if STATE_IOERROR.
-			 */
-			xlog_state_clean_log(log);
-
-			/* wake up threads waiting in xfs_log_force() */
-			wake_up_all(&iclog->ic_force_wait);
-
+			xlog_state_clean_iclog(log, iclog);
 			iclog = iclog->ic_next;
 		} while (first_iclog != iclog);
 
-- 
2.23.0.rc1



* [PATCH 8/8] xfs: push the grant head when the log head moves forward
  2019-09-05  8:47 [PATCH 1/8 v2] xfs: log race fixes and cleanups Dave Chinner
                   ` (6 preceding siblings ...)
  2019-09-05  8:47 ` [PATCH 7/8] xfs: push iclog state cleaning into xlog_state_clean_log Dave Chinner
@ 2019-09-05  8:47 ` Dave Chinner
  2019-09-05 16:00   ` Darrick J. Wong
  2019-09-05 15:44 ` [PATCH 1/8 v2] xfs: log race fixes and cleanups Christoph Hellwig
  8 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2019-09-05  8:47 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When the log fills up, we can get into the state where the
outstanding items in the CIL being committed and aggregated are
larger than the range that the reservation grant head tail pushing
will attempt to clean. This can result in the tail pushing range
being trimmed back to the log head (l_last_sync_lsn) and so
may not actually move the push target at all.

When the iclogs associated with the CIL commit finally land, the
log head moves forward, and this removes the restriction on the AIL
push target. However, if we already have transactions sleeping on
the grant head, and there's nothing in the AIL still to flush from
the current push target, then nothing will move the tail of the log
and trigger a log reservation wakeup.

Hence there is nothing that will trigger xlog_grant_push_ail()
to recalculate the AIL push target and start pushing on the AIL
again to write back the metadata objects that pin the tail of the
log and hence free up space and allow the transaction reservations
to be woken and make progress.

Hence we need to push on the grant head when we move the log head
forward, as this may be the only trigger we have that can move the
AIL push target forwards in this situation.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c | 72 +++++++++++++++++++++++++++++++-----------------
 1 file changed, 47 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index bef314361bc4..f90765af6916 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2648,6 +2648,46 @@ xlog_get_lowest_lsn(
 	return lowest_lsn;
 }
 
+/*
+ * Completion of a iclog IO does not imply that a transaction has completed, as
+ * transactions can be large enough to span many iclogs. We cannot change the
+ * tail of the log half way through a transaction as this may be the only
+ * transaction in the log and moving the tail to point to the middle of it
+ * will prevent recovery from finding the start of the transaction. Hence we
+ * should only update the last_sync_lsn if this iclog contains transaction
+ * completion callbacks on it.
+ *
+ * We have to do this before we drop the icloglock to ensure we are the only one
+ * that can update it.
+ *
+ * If we are moving the last_sync_lsn forwards, we also need to ensure we kick
+ * the reservation grant head pushing. This is due to the fact that the push
+ * target is bound by the current last_sync_lsn value. Hence if we have a large
+ * amount of log space bound up in this committing transaction then the
+ * last_sync_lsn value may be the limiting factor preventing tail pushing from
+ * freeing space in the log. Hence once we've updated the last_sync_lsn we
+ * should push the AIL to ensure the push target (and hence the grant head) is
+ * no longer bound by the old log head location and can move forwards and make
+ * progress again.
+ */
+static void
+xlog_state_set_callback(
+	struct xlog		*log,
+	struct xlog_in_core	*iclog,
+	xfs_lsn_t		header_lsn)
+{
+	iclog->ic_state = XLOG_STATE_CALLBACK;
+
+	ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
+			   header_lsn) <= 0);
+
+	if (list_empty_careful(&iclog->ic_callbacks))
+		return;
+
+	atomic64_set(&log->l_last_sync_lsn, header_lsn);
+	xlog_grant_push_ail(log, 0);
+}
+
 /*
  * Return true if we need to stop processing, false to continue to the next
  * iclog. The caller will need to run callbacks if the iclog is returned in the
@@ -2661,6 +2701,7 @@ xlog_state_iodone_process_iclog(
 	bool			*ioerror)
 {
 	xfs_lsn_t		lowest_lsn;
+	xfs_lsn_t		header_lsn;
 
 	/* Skip all iclogs in the ACTIVE & DIRTY states */
 	if (iclog->ic_state & (XLOG_STATE_ACTIVE | XLOG_STATE_DIRTY))
@@ -2700,34 +2741,15 @@ xlog_state_iodone_process_iclog(
 	 * callbacks) see the above if.
 	 *
 	 * We will do one more check here to see if we have chased our tail
-	 * around.
+	 * around. If this is not the lowest lsn iclog, then we will leave it
+	 * for another completion to process.
 	 */
+	header_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
 	lowest_lsn = xlog_get_lowest_lsn(log);
-	if (lowest_lsn &&
-	    XFS_LSN_CMP(lowest_lsn, be64_to_cpu(iclog->ic_header.h_lsn)) < 0)
-		return false; /* Leave this iclog for another thread */
-
-	iclog->ic_state = XLOG_STATE_CALLBACK;
-
-	/*
-	 * Completion of a iclog IO does not imply that a transaction has
-	 * completed, as transactions can be large enough to span many iclogs.
-	 * We cannot change the tail of the log half way through a transaction
-	 * as this may be the only transaction in the log and moving th etail to
-	 * point to the middle of it will prevent recovery from finding the
-	 * start of the transaction.  Hence we should only update the
-	 * last_sync_lsn if this iclog contains transaction completion callbacks
-	 * on it.
-	 *
-	 * We have to do this before we drop the icloglock to ensure we are the
-	 * only one that can update it.
-	 */
-	ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
-			be64_to_cpu(iclog->ic_header.h_lsn)) <= 0);
-	if (!list_empty_careful(&iclog->ic_callbacks))
-		atomic64_set(&log->l_last_sync_lsn,
-			be64_to_cpu(iclog->ic_header.h_lsn));
+	if (lowest_lsn && XFS_LSN_CMP(lowest_lsn, header_lsn) < 0)
+		return false;
 
+	xlog_state_set_callback(log, iclog, header_lsn);
 	return false;
 
 }
-- 
2.23.0.rc1



* Re: [PATCH 1/8] xfs: push the AIL in xlog_grant_head_wake
  2019-09-05  8:47 ` [PATCH 1/8] xfs: push the AIL in xlog_grant_head_wake Dave Chinner
@ 2019-09-05 15:18   ` Darrick J. Wong
  2019-09-05 22:02     ` Dave Chinner
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2019-09-05 15:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Sep 05, 2019 at 06:47:10PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In the situation where the log is full and the CIL has not recently
> flushed, the AIL push threshold is throttled back to where the
> last write of the head of the log was completed. This is stored in
> log->l_last_sync_lsn. Hence if the CIL holds > 25% of the log space
> pinned by flushes and/or aggregation in progress, we can get the
> situation where the head of the log lags a long way behind the
> reservation grant head.
> 
> When this happens, the AIL push target is trimmed back from where
> the reservation grant head wants to push the log tail to, back to
> where the head of the log currently is. This means the push target
> doesn't reach far enough into the log to actually move the tail
> before the transaction reservation goes to sleep.
> 
> When the CIL push completes, it moves the log head forward such that
> the AIL push target can now be moved, but that has no mechanism for
> pushing the log tail. Further, if the next tail movement of the log
> is not large enough to wake the waiter (i.e. still not enough space for
> it to have a reservation granted), we don't wake anything up, and
> hence we do not update the AIL push target to take into account the
> head of the log moving and allowing the push target to be moved
> forwards.
> 
> To avoid this particular condition, if we fail to wake the first
> waiter on the grant head because we don't have enough space,
> push on the AIL again. This will pick up any movement of the log
> head and allow the push target to move forward due to completion of
> CIL pushing.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index b159a9e9fef0..5e21450fb8f5 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -214,15 +214,36 @@ xlog_grant_head_wake(
>  {
>  	struct xlog_ticket	*tic;
>  	int			need_bytes;
> +	bool			woken_task = false;
>  
>  	list_for_each_entry(tic, &head->waiters, t_queue) {
> +		/*
> +		 * The is a chance that the size of the CIL checkpoints in
> +		 * progress result at the last AIL push resulted in the log head
> +		 * (l_last_sync_lsn) not reflecting where the log head now is.

That's a bit difficult to understand...

"There is a chance that the size of the CIL checkpoints in progress at
the last AIL push results in the last committed log head (l_last_sync_lsn)
not reflecting where the log head is now." ?

(Did I get that right?)

> +		 * Hence when we are woken here, it may be the head of the log
> +		 * that has moved rather than the tail. In that case, the tail
> +		 * didn't move and there won't be space available until the AIL
> +		 * push target is updated and the tail pushed again. If the AIL
> +		 * is already pushed to it's target, we will hang here until

Nit: "its", not "it is".

Other than that I think I can tell what this is doing now. :)

--D

> +		 * something else updates the AIL push target.
> +		 *
> +		 * Hence if there isn't space to wake the first waiter on the
> +		 * grant head, we need to push the AIL again to ensure the
> +		 * target reflects the current log tail and log head position
> +		 * before we wait for the tail to move again.
> +		 */
>  		need_bytes = xlog_ticket_reservation(log, head, tic);
> -		if (*free_bytes < need_bytes)
> +		if (*free_bytes < need_bytes) {
> +			if (!woken_task)
> +				xlog_grant_push_ail(log, need_bytes);
>  			return false;
> +		}
>  
>  		*free_bytes -= need_bytes;
>  		trace_xfs_log_grant_wake_up(log, tic);
>  		wake_up_process(tic->t_task);
> +		woken_task = true;
>  	}
>  
>  	return true;
> -- 
> 2.23.0.rc1
> 


* Re: [PATCH 2/8] xfs: fix missed wakeup on l_flush_wait
  2019-09-05  8:47 ` [PATCH 2/8] xfs: fix missed wakeup on l_flush_wait Dave Chinner
@ 2019-09-05 15:21   ` Darrick J. Wong
  0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2019-09-05 15:21 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Sep 05, 2019 at 06:47:11PM +1000, Dave Chinner wrote:
> From: Rik van Riel <riel@surriel.com>
> 
> The code in xlog_wait uses the spinlock to make adding the task to
> the wait queue and setting the task state to UNINTERRUPTIBLE atomic
> with respect to the waker.
> 
> Doing the wakeup after releasing the spinlock opens up the following
> race condition:
> 
> Task 1					task 2
> add task to wait queue
> 					wake up task
> set task state to UNINTERRUPTIBLE
> 
> This issue was found through code inspection as a result of kworkers
> being observed stuck in UNINTERRUPTIBLE state with an empty
> wait queue. It is rare and largely unreproducible.
> 
> Simply moving the spin_unlock to after the wake_up_all results
> in the waker not being able to see a task on the waitqueue before
> it has set its state to UNINTERRUPTIBLE.
> 
> This bug dates back to the conversion of this code to generic
> waitqueue infrastructure from a counting semaphore back in 2008
> which didn't place the wakeups consistently w.r.t. the relevant
> spin locks.
> 
> [dchinner: Also fix a similar issue in the shutdown path on
> xc_commit_wait. Update commit log with more details of the issue.]
> 
> Fixes: d748c62367eb ("[XFS] Convert l_flushsema to a sv_t")
> Reported-by: Chris Mason <clm@fb.com>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks ok to me,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_log.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 5e21450fb8f5..8380ed065608 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -2644,7 +2644,6 @@ xlog_state_do_callback(
>  	int		   funcdidcallbacks; /* flag: function did callbacks */
>  	int		   repeats;	/* for issuing console warnings if
>  					 * looping too many times */
> -	int		   wake = 0;
>  
>  	spin_lock(&log->l_icloglock);
>  	first_iclog = iclog = log->l_iclog;
> @@ -2840,11 +2839,9 @@ xlog_state_do_callback(
>  #endif
>  
>  	if (log->l_iclog->ic_state & (XLOG_STATE_ACTIVE|XLOG_STATE_IOERROR))
> -		wake = 1;
> -	spin_unlock(&log->l_icloglock);
> -
> -	if (wake)
>  		wake_up_all(&log->l_flush_wait);
> +
> +	spin_unlock(&log->l_icloglock);
>  }
>  
>  
> @@ -3944,7 +3941,9 @@ xfs_log_force_umount(
>  	 * item committed callback functions will do this again under lock to
>  	 * avoid races.
>  	 */
> +	spin_lock(&log->l_cilp->xc_push_lock);
>  	wake_up_all(&log->l_cilp->xc_commit_wait);
> +	spin_unlock(&log->l_cilp->xc_push_lock);
>  	xlog_state_do_callback(log, true, NULL);
>  
>  #ifdef XFSERRORDEBUG
> -- 
> 2.23.0.rc1
> 


* Re: [PATCH 3/8] xfs: prevent CIL push holdoff in log recovery
  2019-09-05  8:47 ` [PATCH 3/8] xfs: prevent CIL push holdoff in log recovery Dave Chinner
@ 2019-09-05 15:26   ` Darrick J. Wong
  2019-09-05 22:10     ` Dave Chinner
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2019-09-05 15:26 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Sep 05, 2019 at 06:47:12PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> generic/530 on a machine with enough ram and a non-preemptible
> kernel can run the AGI processing phase of log recovery entirely out
> of cache. This means it never blocks on locks, never waits for IO
> and runs entirely through the unlinked lists until it either
> completes or blocks and hangs because it has run out of log space.
> 
> It runs out of log space because the background CIL push is
> scheduled but never runs. queue_work() queues the CIL work on the
> current CPU that is busy, and the workqueue code will not run it on
> any other CPU. Hence if the unlinked list processing never yields
> the CPU voluntarily, the push work is delayed indefinitely. This
> results in the CIL aggregating changes until all the log space is
> consumed.
> 
> When the log recovery processing eventually blocks, the CIL flushes
> but because the last iclog isn't submitted for IO because it isn't
> full, the CIL flush never completes and nothing ever moves the log
> head forwards, or indeed inserts anything into the tail of the log,
> and hence nothing is able to get the log moving again and recovery
> hangs.
> 
> There are several problems here, but the two obvious ones from
> the trace are that:
> 	a) log recovery does not yield the CPU for over 4 seconds,
> 	b) binding CIL pushes to a single CPU is a really bad idea.
> 
> This patch addresses just these two aspects of the problem, and is
> suitable for backporting to work around any issues in older kernels.
> The more fundamental problem of preventing the CIL from consuming
> more than 50% of the log without committing will take more invasive
> and complex work, so will be done as followup work.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log_recover.c | 1 +
>  fs/xfs/xfs_super.c       | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index f05c6c99c4f3..c9665455431e 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -5080,6 +5080,7 @@ xlog_recover_process_iunlinks(
>  			while (agino != NULLAGINO) {
>  				agino = xlog_recover_process_one_iunlink(mp,
>  							agno, agino, bucket);
> +				cond_resched();

Funny, I encountered a similar problem in the deferred inactivation
series where iunlinked inodes marked for inactivation pile up until we
OOM or stall in the log.  I solved it by kicking the inactivation
workqueue and going to sleep every ~1000 inodes.
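
(Roughly, with made-up names - count inodes as they are queued for
inactivation, then kick the workqueue and yield once per batch:)

	/* hypothetical sketch of that batching, not actual code */
	unsigned int	batch = 0;

	while ((agino = next_iunlinked(mp, agno, bucket)) != NULLAGINO) {
		queue_inode_inactivation(mp, agino);
		if (++batch % 1000 == 0) {
			flush_workqueue(mp->m_inactive_workqueue);
			cond_resched();
		}
	}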

>  			}
>  		}
>  		xfs_buf_rele(agibp);
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index f9450235533c..55a268997bde 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -818,7 +818,8 @@ xfs_init_mount_workqueues(
>  		goto out_destroy_buf;
>  
>  	mp->m_cil_workqueue = alloc_workqueue("xfs-cil/%s",
> -			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
> +			WQ_MEM_RECLAIM|WQ_FREEZABLE|WQ_UNBOUND,

More stupid nits: spaces between the "|".

Otherwise looks ok to me...

--D

> +			0, mp->m_fsname);
>  	if (!mp->m_cil_workqueue)
>  		goto out_destroy_unwritten;
>  
> -- 
> 2.23.0.rc1
> 


* Re: [PATCH 4/8] xfs: factor debug code out of xlog_state_do_callback()
  2019-09-05  8:47 ` [PATCH 4/8] xfs: factor debug code out of xlog_state_do_callback() Dave Chinner
@ 2019-09-05 15:30   ` Darrick J. Wong
  2019-09-05 22:14     ` Dave Chinner
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2019-09-05 15:30 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Sep 05, 2019 at 06:47:13PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Start making this function readable by lifting the debug code into
> a conditional function.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c | 79 +++++++++++++++++++++++++++---------------------
>  1 file changed, 44 insertions(+), 35 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 8380ed065608..2904bf0d17f3 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -2628,6 +2628,48 @@ xlog_get_lowest_lsn(
>  	return lowest_lsn;
>  }
>  
> +#ifdef DEBUG
> +/*
> + * Make one last gasp attempt to see if iclogs are being left in limbo.  If the
> + * above loop finds an iclog earlier than the current iclog and in one of the
> + * syncing states, the current iclog is put into DO_CALLBACK and the callbacks
> + * are deferred to the completion of the earlier iclog. Walk the iclogs in order
> + * and make sure that no iclog is in DO_CALLBACK unless an earlier iclog is in
> + * one of the syncing states.
> + *
> + * Note that SYNCING|IOERROR is a valid state so we cannot just check for
> + * ic_state == SYNCING.
> + */
> +static void
> +xlog_state_callback_check_state(
> +	struct xlog		*log)
> +{
> +	struct xlog_in_core	*first_iclog = log->l_iclog;
> +	struct xlog_in_core	*iclog = first_iclog;
> +
> +	do {
> +		ASSERT(iclog->ic_state != XLOG_STATE_DO_CALLBACK);
> +		/*
> +		 * Terminate the loop if iclogs are found in states
> +		 * which will cause other threads to clean up iclogs.
> +		 *
> +		 * SYNCING - i/o completion will go through logs
> +		 * DONE_SYNC - interrupt thread should be waiting for
> +		 *              l_icloglock
> +		 * IOERROR - give up hope all ye who enter here
> +		 */
> +		if (iclog->ic_state == XLOG_STATE_WANT_SYNC ||
> +		    iclog->ic_state & XLOG_STATE_SYNCING ||

Code like this makes my eyes twitch (Does ic_state track mutually
exclusive state?  Or is it state flags?  I think it's the second!),
but as this is simply refactoring debugging code...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +		    iclog->ic_state == XLOG_STATE_DONE_SYNC ||
> +		    iclog->ic_state == XLOG_STATE_IOERROR )
> +			break;
> +		iclog = iclog->ic_next;
> +	} while (first_iclog != iclog);
> +}
> +#else
> +#define xlog_state_callback_check_state(l)	((void)0)
> +#endif
> +
>  STATIC void
>  xlog_state_do_callback(
>  	struct xlog		*log,
> @@ -2802,41 +2844,8 @@ xlog_state_do_callback(
>  		}
>  	} while (!ioerrors && loopdidcallbacks);
>  
> -#ifdef DEBUG
> -	/*
> -	 * Make one last gasp attempt to see if iclogs are being left in limbo.
> -	 * If the above loop finds an iclog earlier than the current iclog and
> -	 * in one of the syncing states, the current iclog is put into
> -	 * DO_CALLBACK and the callbacks are deferred to the completion of the
> -	 * earlier iclog. Walk the iclogs in order and make sure that no iclog
> -	 * is in DO_CALLBACK unless an earlier iclog is in one of the syncing
> -	 * states.
> -	 *
> -	 * Note that SYNCING|IOABORT is a valid state so we cannot just check
> -	 * for ic_state == SYNCING.
> -	 */
> -	if (funcdidcallbacks) {
> -		first_iclog = iclog = log->l_iclog;
> -		do {
> -			ASSERT(iclog->ic_state != XLOG_STATE_DO_CALLBACK);
> -			/*
> -			 * Terminate the loop if iclogs are found in states
> -			 * which will cause other threads to clean up iclogs.
> -			 *
> -			 * SYNCING - i/o completion will go through logs
> -			 * DONE_SYNC - interrupt thread should be waiting for
> -			 *              l_icloglock
> -			 * IOERROR - give up hope all ye who enter here
> -			 */
> -			if (iclog->ic_state == XLOG_STATE_WANT_SYNC ||
> -			    iclog->ic_state & XLOG_STATE_SYNCING ||
> -			    iclog->ic_state == XLOG_STATE_DONE_SYNC ||
> -			    iclog->ic_state == XLOG_STATE_IOERROR )
> -				break;
> -			iclog = iclog->ic_next;
> -		} while (first_iclog != iclog);
> -	}
> -#endif
> +	if (funcdidcallbacks)
> +		xlog_state_callback_check_state(log);
>  
>  	if (log->l_iclog->ic_state & (XLOG_STATE_ACTIVE|XLOG_STATE_IOERROR))
>  		wake_up_all(&log->l_flush_wait);
> -- 
> 2.23.0.rc1
> 


* Re: [PATCH 5/8] xfs: factor callbacks out of xlog_state_do_callback()
  2019-09-05  8:47 ` [PATCH 5/8] xfs: factor callbacks " Dave Chinner
@ 2019-09-05 15:39   ` Darrick J. Wong
  2019-09-05 22:17     ` Dave Chinner
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2019-09-05 15:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Sep 05, 2019 at 06:47:14PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Simplify the code flow by lifting the iclog callback work out of
> the main iclog iteration loop. This isolates the log juggling and
> callbacks from the iclog state change logic in the loop.
> 
> Note that the loopdidcallbacks variable is not actually tracking
> whether callbacks are run - it is tracking whether the
> icloglock was dropped during the loop and so determines if we
> completed the entire iclog scan loop atomically. Hence we know for
> certain there are either no more ordered completions to run or
> that the next completion will run the remaining ordered iclog
> completions. Hence rename that variable appropriately for its
> function.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c | 70 +++++++++++++++++++++++++++++++-----------------
>  1 file changed, 45 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 2904bf0d17f3..73aa8e152c83 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -2628,6 +2628,42 @@ xlog_get_lowest_lsn(
>  	return lowest_lsn;
>  }
>  
> +/*
> + * Keep processing entries in the iclog callback list until we come around and
> + * it is empty.  We need to atomically see that the list is empty and change the
> + * state to DIRTY so that we don't miss any more callbacks being added.
> + *
> + * This function is called with the icloglock held and returns with it held. We
> + * drop it while running callbacks, however, as holding it over thousands of
> + * callbacks is unnecessary and causes excessive contention if we do.
> + */
> +static void
> +xlog_state_do_iclog_callbacks(
> +	struct xlog		*log,
> +	struct xlog_in_core	*iclog,
> +	bool			aborted)
> +{
> +	spin_unlock(&log->l_icloglock);
> +	spin_lock(&iclog->ic_callback_lock);
> +	while (!list_empty(&iclog->ic_callbacks)) {
> +		LIST_HEAD(tmp);
> +
> +		list_splice_init(&iclog->ic_callbacks, &tmp);
> +
> +		spin_unlock(&iclog->ic_callback_lock);
> +		xlog_cil_process_committed(&tmp, aborted);
> +		spin_lock(&iclog->ic_callback_lock);
> +	}
> +
> +	/*
> +	 * Pick up the icloglock while still holding the callback lock so we
> +	 * serialise against anyone trying to add more callbacks to this iclog
> +	 * now we've finished processing.
> +	 */
> +	spin_lock(&log->l_icloglock);
> +	spin_unlock(&iclog->ic_callback_lock);
> +}
> +
>  #ifdef DEBUG
>  /*
>   * Make one last gasp attempt to see if iclogs are being left in limbo.  If the
> @@ -2682,7 +2718,7 @@ xlog_state_do_callback(
>  	int		   flushcnt = 0;
>  	xfs_lsn_t	   lowest_lsn;
>  	int		   ioerrors;	/* counter: iclogs with errors */
> -	int		   loopdidcallbacks; /* flag: inner loop did callbacks*/
> +	bool		   cycled_icloglock;
>  	int		   funcdidcallbacks; /* flag: function did callbacks */
>  	int		   repeats;	/* for issuing console warnings if
>  					 * looping too many times */
> @@ -2704,7 +2740,7 @@ xlog_state_do_callback(
>  		 */
>  		first_iclog = log->l_iclog;
>  		iclog = log->l_iclog;
> -		loopdidcallbacks = 0;
> +		cycled_icloglock = false;
>  		repeats++;
>  
>  		do {
> @@ -2795,31 +2831,13 @@ xlog_state_do_callback(
>  			} else
>  				ioerrors++;
>  
> -			spin_unlock(&log->l_icloglock);
> -
>  			/*
> -			 * Keep processing entries in the callback list until
> -			 * we come around and it is empty.  We need to
> -			 * atomically see that the list is empty and change the
> -			 * state to DIRTY so that we don't miss any more
> -			 * callbacks being added.
> +			 * Running callbacks will drop the icloglock which means
> +			 * we'll have to run at least one more complete loop.
>  			 */
> -			spin_lock(&iclog->ic_callback_lock);
> -			while (!list_empty(&iclog->ic_callbacks)) {
> -				LIST_HEAD(tmp);
> +			cycled_icloglock = true;
> +			xlog_state_do_iclog_callbacks(log, iclog, aborted);
>  
> -				list_splice_init(&iclog->ic_callbacks, &tmp);
> -
> -				spin_unlock(&iclog->ic_callback_lock);
> -				xlog_cil_process_committed(&tmp, aborted);
> -				spin_lock(&iclog->ic_callback_lock);
> -			}
> -
> -			loopdidcallbacks++;
> -			funcdidcallbacks++;
> -
> -			spin_lock(&log->l_icloglock);
> -			spin_unlock(&iclog->ic_callback_lock);
>  			if (!(iclog->ic_state & XLOG_STATE_IOERROR))
>  				iclog->ic_state = XLOG_STATE_DIRTY;
>  
> @@ -2835,6 +2853,8 @@ xlog_state_do_callback(
>  			iclog = iclog->ic_next;
>  		} while (first_iclog != iclog);
>  
> +		funcdidcallbacks += cycled_icloglock;

funcdidcallbacks is effectively a yes/no state flag, so maybe it should
be turned into a boolean and this statement becomes:

	funcdidcallbacks |= cycled_icloglock;

Though I guess we're not at huge risk of integer overflow and it
controls whether or not we run a debugging check so maybe we don't care?
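
Roughly, an untested sketch of what I mean (the scan loop body is
elided, nothing here is a real patch):

	bool	cycled_icloglock;
	bool	funcdidcallbacks = false;

	do {
		cycled_icloglock = false;
		/* ... iclog scan loop runs here; running callbacks
		 * drops the icloglock and sets cycled_icloglock ... */
		funcdidcallbacks |= cycled_icloglock;
	} while (!ioerrors && cycled_icloglock);

	if (funcdidcallbacks)
		xlog_state_callback_check_state(log);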

--D

> +
>  		if (repeats > 5000) {
>  			flushcnt += repeats;
>  			repeats = 0;
> @@ -2842,7 +2862,7 @@ xlog_state_do_callback(
>  				"%s: possible infinite loop (%d iterations)",
>  				__func__, flushcnt);
>  		}
> -	} while (!ioerrors && loopdidcallbacks);
> +	} while (!ioerrors && cycled_icloglock);
>  
>  	if (funcdidcallbacks)
>  		xlog_state_callback_check_state(log);
> -- 
> 2.23.0.rc1
> 


* Re: [PATCH 1/8 v2] xfs: log race fixes and cleanups
  2019-09-05  8:47 [PATCH 1/8 v2] xfs: log race fixes and cleanups Dave Chinner
                   ` (7 preceding siblings ...)
  2019-09-05  8:47 ` [PATCH 8/8] xfs: push the grant head when the log head moves forward Dave Chinner
@ 2019-09-05 15:44 ` Christoph Hellwig
  8 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2019-09-05 15:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

With this series generic/530 works for me.


* Re: [PATCH 6/8] xfs: factor iclog state processing out of xlog_state_do_callback()
  2019-09-05  8:47 ` [PATCH 6/8] xfs: factor iclog state processing " Dave Chinner
@ 2019-09-05 15:45   ` Darrick J. Wong
  0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2019-09-05 15:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Sep 05, 2019 at 06:47:15PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The iclog IO completion state processing is somewhat complex, and
> because it's inside two nested loops it is highly indented and very
> hard to read. Factor it out, flatten the logic flow and clean up the
> comments so that it is much easier to see what the code is doing both
> in processing the individual iclogs and in the overall
> xlog_state_do_callback() operation.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_log.c | 195 ++++++++++++++++++++++++-----------------------
>  1 file changed, 98 insertions(+), 97 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 73aa8e152c83..356204ddf865 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -2628,6 +2628,90 @@ xlog_get_lowest_lsn(
>  	return lowest_lsn;
>  }
>  
> +/*
> + * Return true if we need to stop processing, false to continue to the next
> + * iclog. The caller will need to run callbacks if the iclog is returned in the
> + * XLOG_STATE_CALLBACK state.
> + */
> +static bool
> +xlog_state_iodone_process_iclog(
> +	struct xlog		*log,
> +	struct xlog_in_core	*iclog,
> +	struct xlog_in_core	*completed_iclog,
> +	bool			*ioerror)
> +{
> +	xfs_lsn_t		lowest_lsn;
> +
> +	/* Skip all iclogs in the ACTIVE & DIRTY states */
> +	if (iclog->ic_state & (XLOG_STATE_ACTIVE | XLOG_STATE_DIRTY))
> +		return false;
> +
> +	/*
> +	 * Between marking a filesystem SHUTDOWN and stopping the log, we do
> +	 * flush all iclogs to disk (if there wasn't a log I/O error). So, we do
> +	 * want things to go smoothly in case of just a SHUTDOWN  w/o a
> +	 * LOG_IO_ERROR.
> +	 */
> +	if (iclog->ic_state & XLOG_STATE_IOERROR) {
> +		*ioerror = true;
> +		return false;
> +	}
> +
> +	/*
> +	 * Can only perform callbacks in order.  Since this iclog is not in the
> +	 * DONE_SYNC/ DO_CALLBACK state, we skip the rest and just try to clean
> +	 * up.  If we set our iclog to DO_CALLBACK, we will not process it when
> +	 * we retry since a previous iclog is in the CALLBACK and the state
> +	 * cannot change since we are holding the l_icloglock.
> +	 */
> +	if (!(iclog->ic_state &
> +			(XLOG_STATE_DONE_SYNC | XLOG_STATE_DO_CALLBACK))) {
> +		if (completed_iclog &&
> +		    (completed_iclog->ic_state == XLOG_STATE_DONE_SYNC)) {
> +			completed_iclog->ic_state = XLOG_STATE_DO_CALLBACK;
> +		}
> +		return true;
> +	}
> +
> +	/*
> +	 * We now have an iclog that is in either the DO_CALLBACK or DONE_SYNC
> +	 * states. The other states (WANT_SYNC, SYNCING, or CALLBACK were caught
> +	 * by the above if and are going to clean (i.e. we aren't doing their
> +	 * callbacks) see the above if.
> +	 *
> +	 * We will do one more check here to see if we have chased our tail
> +	 * around.
> +	 */
> +	lowest_lsn = xlog_get_lowest_lsn(log);
> +	if (lowest_lsn &&
> +	    XFS_LSN_CMP(lowest_lsn, be64_to_cpu(iclog->ic_header.h_lsn)) < 0)
> +		return false; /* Leave this iclog for another thread */
> +
> +	iclog->ic_state = XLOG_STATE_CALLBACK;
> +
> +	/*
> +	 * Completion of a iclog IO does not imply that a transaction has
> +	 * completed, as transactions can be large enough to span many iclogs.
> +	 * We cannot change the tail of the log half way through a transaction
> +	 * as this may be the only transaction in the log and moving th etail to
> +	 * point to the middle of it will prevent recovery from finding the
> +	 * start of the transaction.  Hence we should only update the
> +	 * last_sync_lsn if this iclog contains transaction completion callbacks
> +	 * on it.
> +	 *
> +	 * We have to do this before we drop the icloglock to ensure we are the
> +	 * only one that can update it.
> +	 */
> +	ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
> +			be64_to_cpu(iclog->ic_header.h_lsn)) <= 0);
> +	if (!list_empty_careful(&iclog->ic_callbacks))
> +		atomic64_set(&log->l_last_sync_lsn,
> +			be64_to_cpu(iclog->ic_header.h_lsn));
> +
> +	return false;
> +
> +}
> +
>  /*
>   * Keep processing entries in the iclog callback list until we come around and
>   * it is empty.  We need to atomically see that the list is empty and change the
> @@ -2712,22 +2796,15 @@ xlog_state_do_callback(
>  	bool			aborted,
>  	struct xlog_in_core	*ciclog)
>  {
> -	xlog_in_core_t	   *iclog;
> -	xlog_in_core_t	   *first_iclog;	/* used to know when we've
> -						 * processed all iclogs once */
> -	int		   flushcnt = 0;
> -	xfs_lsn_t	   lowest_lsn;
> -	int		   ioerrors;	/* counter: iclogs with errors */
> -	bool		   cycled_icloglock;
> -	int		   funcdidcallbacks; /* flag: function did callbacks */
> -	int		   repeats;	/* for issuing console warnings if
> -					 * looping too many times */
> +	struct xlog_in_core	*iclog;
> +	struct xlog_in_core	*first_iclog;
> +	int			flushcnt = 0;
> +	int			funcdidcallbacks = 0;
> +	bool			cycled_icloglock;
> +	bool			ioerror;
> +	int			repeats = 0;
>  
>  	spin_lock(&log->l_icloglock);
> -	first_iclog = iclog = log->l_iclog;
> -	ioerrors = 0;
> -	funcdidcallbacks = 0;
> -	repeats = 0;
>  
>  	do {
>  		/*
> @@ -2741,96 +2818,20 @@ xlog_state_do_callback(
>  		first_iclog = log->l_iclog;
>  		iclog = log->l_iclog;
>  		cycled_icloglock = false;
> +		ioerror = false;
>  		repeats++;
>  
>  		do {
> +			if (xlog_state_iodone_process_iclog(log, iclog,
> +							ciclog, &ioerror))
> +				break;
>  
> -			/* skip all iclogs in the ACTIVE & DIRTY states */
> -			if (iclog->ic_state &
> -			    (XLOG_STATE_ACTIVE|XLOG_STATE_DIRTY)) {
> +			if (!(iclog->ic_state &
> +			      (XLOG_STATE_CALLBACK | XLOG_STATE_IOERROR))) {
>  				iclog = iclog->ic_next;
>  				continue;
>  			}
>  
> -			/*
> -			 * Between marking a filesystem SHUTDOWN and stopping
> -			 * the log, we do flush all iclogs to disk (if there
> -			 * wasn't a log I/O error). So, we do want things to
> -			 * go smoothly in case of just a SHUTDOWN  w/o a
> -			 * LOG_IO_ERROR.
> -			 */
> -			if (!(iclog->ic_state & XLOG_STATE_IOERROR)) {
> -				/*
> -				 * Can only perform callbacks in order.  Since
> -				 * this iclog is not in the DONE_SYNC/
> -				 * DO_CALLBACK state, we skip the rest and
> -				 * just try to clean up.  If we set our iclog
> -				 * to DO_CALLBACK, we will not process it when
> -				 * we retry since a previous iclog is in the
> -				 * CALLBACK and the state cannot change since
> -				 * we are holding the l_icloglock.
> -				 */
> -				if (!(iclog->ic_state &
> -					(XLOG_STATE_DONE_SYNC |
> -						 XLOG_STATE_DO_CALLBACK))) {
> -					if (ciclog && (ciclog->ic_state ==
> -							XLOG_STATE_DONE_SYNC)) {
> -						ciclog->ic_state = XLOG_STATE_DO_CALLBACK;
> -					}
> -					break;
> -				}
> -				/*
> -				 * We now have an iclog that is in either the
> -				 * DO_CALLBACK or DONE_SYNC states. The other
> -				 * states (WANT_SYNC, SYNCING, or CALLBACK were
> -				 * caught by the above if and are going to
> -				 * clean (i.e. we aren't doing their callbacks)
> -				 * see the above if.
> -				 */
> -
> -				/*
> -				 * We will do one more check here to see if we
> -				 * have chased our tail around.
> -				 */
> -
> -				lowest_lsn = xlog_get_lowest_lsn(log);
> -				if (lowest_lsn &&
> -				    XFS_LSN_CMP(lowest_lsn,
> -						be64_to_cpu(iclog->ic_header.h_lsn)) < 0) {
> -					iclog = iclog->ic_next;
> -					continue; /* Leave this iclog for
> -						   * another thread */
> -				}
> -
> -				iclog->ic_state = XLOG_STATE_CALLBACK;
> -
> -
> -				/*
> -				 * Completion of a iclog IO does not imply that
> -				 * a transaction has completed, as transactions
> -				 * can be large enough to span many iclogs. We
> -				 * cannot change the tail of the log half way
> -				 * through a transaction as this may be the only
> -				 * transaction in the log and moving th etail to
> -				 * point to the middle of it will prevent
> -				 * recovery from finding the start of the
> -				 * transaction. Hence we should only update the
> -				 * last_sync_lsn if this iclog contains
> -				 * transaction completion callbacks on it.
> -				 *
> -				 * We have to do this before we drop the
> -				 * icloglock to ensure we are the only one that
> -				 * can update it.
> -				 */
> -				ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
> -					be64_to_cpu(iclog->ic_header.h_lsn)) <= 0);
> -				if (!list_empty_careful(&iclog->ic_callbacks))
> -					atomic64_set(&log->l_last_sync_lsn,
> -						be64_to_cpu(iclog->ic_header.h_lsn));
> -
> -			} else
> -				ioerrors++;
> -
>  			/*
>  			 * Running callbacks will drop the icloglock which means
>  			 * we'll have to run at least one more complete loop.
> @@ -2862,7 +2863,7 @@ xlog_state_do_callback(
>  				"%s: possible infinite loop (%d iterations)",
>  				__func__, flushcnt);
>  		}
> -	} while (!ioerrors && cycled_icloglock);
> +	} while (!ioerror && cycled_icloglock);
>  
>  	if (funcdidcallbacks)
>  		xlog_state_callback_check_state(log);
> -- 
> 2.23.0.rc1
> 


* Re: [PATCH 7/8] xfs: push iclog state cleaning into xlog_state_clean_log
  2019-09-05  8:47 ` [PATCH 7/8] xfs: push iclog state cleaning into xlog_state_clean_log Dave Chinner
@ 2019-09-05 15:48   ` Darrick J. Wong
  2019-09-05 22:28     ` Dave Chinner
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2019-09-05 15:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Sep 05, 2019 at 06:47:16PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xlog_state_clean_log() is only called from one place, and it occurs
> when an iclog is transitioning back to ACTIVE. Prior to calling
> xlog_state_clean_log, the iclog we are processing has a hard coded
> state check to DIRTY so that xlog_state_clean_log() processes it
> correctly. We also have a hard coded wakeup after
> xlog_state_clean_log() to enforce that log force waiters on that iclog
> are woken correctly.
> 
> Both of these things are operations required to finish processing an
> iclog and return it to the ACTIVE state again, so they make little
> sense to be separated from the rest of the clean state transition
> code.
> 
> Hence push these things inside xlog_state_clean_log(), document the
> behaviour and rename it xlog_state_clean_iclog() to indicate that
> it's being driven by an iclog state change and does the iclog state
> change work itself.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_log.c | 57 ++++++++++++++++++++++++++++--------------------
>  1 file changed, 33 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 356204ddf865..bef314361bc4 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -2521,21 +2521,35 @@ xlog_write(
>   *****************************************************************************
>   */
>  
> -/* Clean iclogs starting from the head.  This ordering must be
> - * maintained, so an iclog doesn't become ACTIVE beyond one that
> - * is SYNCING.  This is also required to maintain the notion that we use
> - * a ordered wait queue to hold off would be writers to the log when every
> - * iclog is trying to sync to disk.
> +/*
> + * An iclog has just finished it's completion processing, so we need to update

it's -> its, but I can fix that on import.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> + * the iclog state and propagate that up into the overall log state. Hence we
> + * prepare the iclog for cleaning, and then clean all the pending dirty iclogs
> + * starting from the head, and then wake up any threads that are waiting for the
> + * iclog to be marked clean.
> + *
> + * The ordering of marking iclogs ACTIVE must be maintained, so an iclog
> + * doesn't become ACTIVE beyond one that is SYNCING.  This is also required to
> + * maintain the notion that we use a ordered wait queue to hold off would be
> + * writers to the log when every iclog is trying to sync to disk.
> + *
> + * Caller must hold the icloglock before calling us.
>   *
> - * State Change: DIRTY -> ACTIVE
> + * State Change: !IOERROR -> DIRTY -> ACTIVE
>   */
>  STATIC void
> -xlog_state_clean_log(
> -	struct xlog *log)
> +xlog_state_clean_iclog(
> +	struct xlog		*log,
> +	struct xlog_in_core	*dirty_iclog)
>  {
> -	xlog_in_core_t	*iclog;
> -	int changed = 0;
> +	struct xlog_in_core	*iclog;
> +	int			changed = 0;
> +
> +	/* Prepare the completed iclog. */
> +	if (!(dirty_iclog->ic_state & XLOG_STATE_IOERROR))
> +		dirty_iclog->ic_state = XLOG_STATE_DIRTY;
>  
> +	/* Walk all the iclogs to update the ordered active state. */
>  	iclog = log->l_iclog;
>  	do {
>  		if (iclog->ic_state == XLOG_STATE_DIRTY) {
> @@ -2573,7 +2587,13 @@ xlog_state_clean_log(
>  		iclog = iclog->ic_next;
>  	} while (iclog != log->l_iclog);
>  
> -	/* log is locked when we are called */
> +
> +	/*
> +	 * Wake up threads waiting in xfs_log_force() for the dirty iclog
> +	 * to be cleaned.
> +	 */
> +	wake_up_all(&dirty_iclog->ic_force_wait);
> +
>  	/*
>  	 * Change state for the dummy log recording.
>  	 * We usually go to NEED. But we go to NEED2 if the changed indicates
> @@ -2607,7 +2627,7 @@ xlog_state_clean_log(
>  			ASSERT(0);
>  		}
>  	}
> -}	/* xlog_state_clean_log */
> +}
>  
>  STATIC xfs_lsn_t
>  xlog_get_lowest_lsn(
> @@ -2839,18 +2859,7 @@ xlog_state_do_callback(
>  			cycled_icloglock = true;
>  			xlog_state_do_iclog_callbacks(log, iclog, aborted);
>  
> -			if (!(iclog->ic_state & XLOG_STATE_IOERROR))
> -				iclog->ic_state = XLOG_STATE_DIRTY;
> -
> -			/*
> -			 * Transition from DIRTY to ACTIVE if applicable.
> -			 * NOP if STATE_IOERROR.
> -			 */
> -			xlog_state_clean_log(log);
> -
> -			/* wake up threads waiting in xfs_log_force() */
> -			wake_up_all(&iclog->ic_force_wait);
> -
> +			xlog_state_clean_iclog(log, iclog);
>  			iclog = iclog->ic_next;
>  		} while (first_iclog != iclog);
>  
> -- 
> 2.23.0.rc1
> 


* Re: [PATCH 8/8] xfs: push the grant head when the log head moves forward
  2019-09-05  8:47 ` [PATCH 8/8] xfs: push the grant head when the log head moves forward Dave Chinner
@ 2019-09-05 16:00   ` Darrick J. Wong
  0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2019-09-05 16:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Sep 05, 2019 at 06:47:17PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When the log fills up, we can get into the state where the
> outstanding items in the CIL being committed and aggregated are
> larger than the range that the reservation grant head tail pushing
> will attempt to clean. This can result in the tail pushing range
> being trimmed back to the log head (l_last_sync_lsn) and so
> may not actually move the push target at all.
> 
> When the iclogs associated with the CIL commit finally land, the
> log head moves forward, and this removes the restriction on the AIL
> push target. However, if we already have transactions sleeping on
> the grant head, and there's nothing in the AIL still to flush from
> the current push target, then nothing will move the tail of the log
> and trigger a log reservation wakeup.
> 
> Hence there is nothing that will trigger xlog_grant_push_ail()
> to recalculate the AIL push target and start pushing on the AIL
> again to write back the metadata objects that pin the tail of the
> log and hence free up space and allow the transaction reservations
> to be woken and make progress.
> 
> Hence we need to push on the grant head when we move the log head
> forward, as this may be the only trigger we have that can move the
> AIL push target forwards in this situation.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Seems reasonable to me.

There are two unfortunate twists for applying this series -- there won't be
any new for-next trees from Stephen Rothwell until Sept. 30th, which
means we (XFS developers) are all pretty much on our own for testing
this in the xfs for-next branch.

The second twist of course is that I'm leaving Friday afternoon for a
vacation.  That means either (a) everything passes muster, I fix the
comment nits, and push this into xfs for-next before I go; (b) there are
deeper review comments and so this waits until I return on the 16th; or
(c) I guess Dave could tack it on for-next himself when the patches are
ready since he still has commit access. ;)

Either way this probably means a separate pull request for the log fixes
during the second week of the merge window.  Thoughts/flames?

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_log.c | 72 +++++++++++++++++++++++++++++++-----------------
>  1 file changed, 47 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index bef314361bc4..f90765af6916 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -2648,6 +2648,46 @@ xlog_get_lowest_lsn(
>  	return lowest_lsn;
>  }
>  
> +/*
> + * Completion of a iclog IO does not imply that a transaction has completed, as
> + * transactions can be large enough to span many iclogs. We cannot change the
> + * tail of the log half way through a transaction as this may be the only
> + * transaction in the log and moving the tail to point to the middle of it
> + * will prevent recovery from finding the start of the transaction. Hence we
> + * should only update the last_sync_lsn if this iclog contains transaction
> + * completion callbacks on it.
> + *
> + * We have to do this before we drop the icloglock to ensure we are the only one
> + * that can update it.
> + *
> + * If we are moving the last_sync_lsn forwards, we also need to ensure we kick
> + * the reservation grant head pushing. This is due to the fact that the push
> + * target is bound by the current last_sync_lsn value. Hence if we have a large
> + * amount of log space bound up in this committing transaction then the
> + * last_sync_lsn value may be the limiting factor preventing tail pushing from
> + * freeing space in the log. Hence once we've updated the last_sync_lsn we
> + * should push the AIL to ensure the push target (and hence the grant head) is
> + * no longer bound by the old log head location and can move forwards and make
> + * progress again.
> + */
> +static void
> +xlog_state_set_callback(
> +	struct xlog		*log,
> +	struct xlog_in_core	*iclog,
> +	xfs_lsn_t		header_lsn)
> +{
> +	iclog->ic_state = XLOG_STATE_CALLBACK;
> +
> +	ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
> +			   header_lsn) <= 0);
> +
> +	if (list_empty_careful(&iclog->ic_callbacks))
> +		return;
> +
> +	atomic64_set(&log->l_last_sync_lsn, header_lsn);
> +	xlog_grant_push_ail(log, 0);
> +}
> +
>  /*
>   * Return true if we need to stop processing, false to continue to the next
>   * iclog. The caller will need to run callbacks if the iclog is returned in the
> @@ -2661,6 +2701,7 @@ xlog_state_iodone_process_iclog(
>  	bool			*ioerror)
>  {
>  	xfs_lsn_t		lowest_lsn;
> +	xfs_lsn_t		header_lsn;
>  
>  	/* Skip all iclogs in the ACTIVE & DIRTY states */
>  	if (iclog->ic_state & (XLOG_STATE_ACTIVE | XLOG_STATE_DIRTY))
> @@ -2700,34 +2741,15 @@ xlog_state_iodone_process_iclog(
>  	 * callbacks) see the above if.
>  	 *
>  	 * We will do one more check here to see if we have chased our tail
> -	 * around.
> +	 * around. If this is not the lowest lsn iclog, then we will leave it
> +	 * for another completion to process.
>  	 */
> +	header_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
>  	lowest_lsn = xlog_get_lowest_lsn(log);
> -	if (lowest_lsn &&
> -	    XFS_LSN_CMP(lowest_lsn, be64_to_cpu(iclog->ic_header.h_lsn)) < 0)
> -		return false; /* Leave this iclog for another thread */
> -
> -	iclog->ic_state = XLOG_STATE_CALLBACK;
> -
> -	/*
> -	 * Completion of a iclog IO does not imply that a transaction has
> -	 * completed, as transactions can be large enough to span many iclogs.
> -	 * We cannot change the tail of the log half way through a transaction
> -	 * as this may be the only transaction in the log and moving th etail to
> -	 * point to the middle of it will prevent recovery from finding the
> -	 * start of the transaction.  Hence we should only update the
> -	 * last_sync_lsn if this iclog contains transaction completion callbacks
> -	 * on it.
> -	 *
> -	 * We have to do this before we drop the icloglock to ensure we are the
> -	 * only one that can update it.
> -	 */
> -	ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
> -			be64_to_cpu(iclog->ic_header.h_lsn)) <= 0);
> -	if (!list_empty_careful(&iclog->ic_callbacks))
> -		atomic64_set(&log->l_last_sync_lsn,
> -			be64_to_cpu(iclog->ic_header.h_lsn));
> +	if (lowest_lsn && XFS_LSN_CMP(lowest_lsn, header_lsn) < 0)
> +		return false;
>  
> +	xlog_state_set_callback(log, iclog, header_lsn);
>  	return false;
>  
>  }
> -- 
> 2.23.0.rc1
> 


* Re: [PATCH 1/8] xfs: push the AIL in xlog_grant_head_wake
  2019-09-05 15:18   ` Darrick J. Wong
@ 2019-09-05 22:02     ` Dave Chinner
  0 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2019-09-05 22:02 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Sep 05, 2019 at 08:18:28AM -0700, Darrick J. Wong wrote:
> On Thu, Sep 05, 2019 at 06:47:10PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > In the situation where the log is full and the CIL has not recently
> > flushed, the AIL push threshold is throttled back to where the
> > last write of the head of the log was completed. This is stored in
> > log->l_last_sync_lsn. Hence if the CIL holds > 25% of the log space
> > pinned by flushes and/or aggregation in progress, we can get the
> > situation where the head of the log lags a long way behind the
> > reservation grant head.
> > 
> > When this happens, the AIL push target is trimmed back from where
> > the reservation grant head wants to push the log tail to, back to
> > where the head of the log currently is. This means the push target
> > doesn't reach far enough into the log to actually move the tail
> > before the transaction reservation goes to sleep.
> > 
> > When the CIL push completes, it moves the log head forward such that
> > the AIL push target can now be moved, but that has no mechanism for
> > pushing the log tail. Further, if the next tail movement of the log
> > is not large enough to wake the waiter (i.e. still not enough space for
> > it to have a reservation granted), we don't wake anything up, and
> > hence we do not update the AIL push target to take into account the
> > head of the log moving and allowing the push target to be moved
> > forwards.
> > 
> > To avoid this particular condition, if we fail to wake the first
> > waiter on the grant head because we don't have enough space,
> > push on the AIL again. This will pick up any movement of the log
> > head and allow the push target to move forward due to completion of
> > CIL pushing.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_log.c | 23 ++++++++++++++++++++++-
> >  1 file changed, 22 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index b159a9e9fef0..5e21450fb8f5 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -214,15 +214,36 @@ xlog_grant_head_wake(
> >  {
> >  	struct xlog_ticket	*tic;
> >  	int			need_bytes;
> > +	bool			woken_task = false;
> >  
> >  	list_for_each_entry(tic, &head->waiters, t_queue) {
> > +		/*
> > +		 * The is a chance that the size of the CIL checkpoints in
> > +		 * progress result at the last AIL push resulted in the log head
> > +		 * (l_last_sync_lsn) not reflecting where the log head now is.
> 
> That's a bit difficult to understand...

Yup I failed to edit it properly and left an extra "result" in the
sentence...

> "There is a chance that the size of the CIL checkpoints in progress at
> the last AIL push results in the last committed log head (l_last_sync_lsn)
> not reflecting where the log head is now." ?
> 
> (Did I get that right?)

*nod*.

> > +		 * Hence when we are woken here, it may be the head of the log
> > +		 * that has moved rather than the tail. In that case, the tail
> > +		 * didn't move and there won't be space available until the AIL
> > +		 * push target is updated and the tail pushed again. If the AIL
> > +		 * is already pushed to it's target, we will hang here until
> 
> Nit: "its", not "it is".
> 
> Other than that I think I can tell what this is doing now. :)

In reading this again, it is all a bit clunky. I've revised it a bit
more to more concisely describe the situation:

	/*
	 * The is a chance that the size of the CIL checkpoints in
	 * progress at the last AIL push target calculation resulted in
	 * limiting the target to the log head (l_last_sync_lsn) at the
	 * time. This may not reflect where the log head is now as the
	 * CIL checkpoints may have completed.
	 *
	 * Hence when we are woken here, it may be that the head of the
	 * log that has moved rather than the tail. As the tail didn't
	 * move, there still won't be space available for the
	 * reservation we require.  However, if the AIL has already
	 * pushed to the target defined by the old log head location, we
	 * will hang here waiting for something else to update the AIL
	 * push target.
	 *
	 * Therefore, if there isn't space to wake the first waiter on
	 * the grant head, we need to push the AIL again to ensure the
	 * target reflects both the current log tail and log head
	 * position before we wait for the tail to move again.
	 */

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 3/8] xfs: prevent CIL push holdoff in log recovery
  2019-09-05 15:26   ` Darrick J. Wong
@ 2019-09-05 22:10     ` Dave Chinner
  0 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2019-09-05 22:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Sep 05, 2019 at 08:26:44AM -0700, Darrick J. Wong wrote:
> On Thu, Sep 05, 2019 at 06:47:12PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > generic/530 on a machine with enough ram and a non-preemptible
> > kernel can run the AGI processing phase of log recovery entirely out
> > of cache. This means it never blocks on locks, never waits for IO
> > and runs entirely through the unlinked lists until it either
> > completes or blocks and hangs because it has run out of log space.
> > 
> > It runs out of log space because the background CIL push is
> > scheduled but never runs. queue_work() queues the CIL work on the
> > current CPU that is busy, and the workqueue code will not run it on
> > any other CPU. Hence if the unlinked list processing never yields
> > the CPU voluntarily, the push work is delayed indefinitely. This
> > results in the CIL aggregating changes until all the log space is
> > consumed.
> > 
> > When the log recovery processing eventually blocks, the CIL flushes
> > but because the last iclog isn't submitted for IO because it isn't
> > full, the CIL flush never completes and nothing ever moves the log
> > head forwards, or indeed inserts anything into the tail of the log,
> > and hence nothing is able to get the log moving again and recovery
> > hangs.
> > 
> > There are several problems here, but the two obvious ones from
> > the trace are that:
> > 	a) log recovery does not yield the CPU for over 4 seconds,
> > 	b) binding CIL pushes to a single CPU is a really bad idea.
> > 
> > This patch addresses just these two aspects of the problem, and is
> > suitable for backporting to work around any issues in older kernels.
> > The more fundamental problem of preventing the CIL from consuming
> > more than 50% of the log without committing will take more invasive
> > and complex work, so will be done as followup work.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_log_recover.c | 1 +
> >  fs/xfs/xfs_super.c       | 3 ++-
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index f05c6c99c4f3..c9665455431e 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -5080,6 +5080,7 @@ xlog_recover_process_iunlinks(
> >  			while (agino != NULLAGINO) {
> >  				agino = xlog_recover_process_one_iunlink(mp,
> >  							agno, agino, bucket);
> > +				cond_resched();
> 
> Funny, I encountered a similar problem in the deferred inactivation
> series where iunlinked inodes marked for inactivation pile up until we
> OOM or stall in the log.  I solved it by kicking the inactivation
> workqueue and going to sleep every ~1000 inodes.

If the workqueue had already been kicked, then yielding with
cond_resched() would probably be enough to avoid that.
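
The general shape of that batched yield pattern in the iunlink loop
would be something like this (illustrative sketch only - the batch
size is arbitrary and the counter doesn't exist in the patch below):

	int	processed = 0;

	while (agino != NULLAGINO) {
		agino = xlog_recover_process_one_iunlink(mp, agno,
						agino, bucket);
		/* Yield periodically so queued work can run here. */
		if (!(++processed % 1000))
			cond_resched();
	}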

I think I'm going to have a bit of a look at our use of workqueues -
I didn't realise that the default behaviour of the workqueues was
"cannot run work unless CPU is yeilded" - it kinda makes the "do
async work by workqueue" model somewhat problematic if the work
queued by a single CPU can only be run on that same CPU instead of
concurrently across all idle CPUs...

> >  			}
> >  		}
> >  		xfs_buf_rele(agibp);
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index f9450235533c..55a268997bde 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -818,7 +818,8 @@ xfs_init_mount_workqueues(
> >  		goto out_destroy_buf;
> >  
> >  	mp->m_cil_workqueue = alloc_workqueue("xfs-cil/%s",
> > -			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
> > +			WQ_MEM_RECLAIM|WQ_FREEZABLE|WQ_UNBOUND,
> 
> More stupid nits: spaces between the "|".

It's the same as the rest of the code in that function.

Fixed anyway.

-Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 4/8] xfs: factor debug code out of xlog_state_do_callback()
  2019-09-05 15:30   ` Darrick J. Wong
@ 2019-09-05 22:14     ` Dave Chinner
  0 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2019-09-05 22:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Sep 05, 2019 at 08:30:06AM -0700, Darrick J. Wong wrote:
> On Thu, Sep 05, 2019 at 06:47:13PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Start making this function readable by lifting the debug code into
> > a conditional function.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_log.c | 79 +++++++++++++++++++++++++++---------------------
> >  1 file changed, 44 insertions(+), 35 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index 8380ed065608..2904bf0d17f3 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -2628,6 +2628,48 @@ xlog_get_lowest_lsn(
> >  	return lowest_lsn;
> >  }
> >  
> > +#ifdef DEBUG
> > +/*
> > + * Make one last gasp attempt to see if iclogs are being left in limbo.  If the
> > + * above loop finds an iclog earlier than the current iclog and in one of the
> > + * syncing states, the current iclog is put into DO_CALLBACK and the callbacks
> > + * are deferred to the completion of the earlier iclog. Walk the iclogs in order
> > + * and make sure that no iclog is in DO_CALLBACK unless an earlier iclog is in
> > + * one of the syncing states.
> > + *
> > + * Note that SYNCING|IOERROR is a valid state so we cannot just check for
> > + * ic_state == SYNCING.
> > + */
> > +static void
> > +xlog_state_callback_check_state(
> > +	struct xlog		*log)
> > +{
> > +	struct xlog_in_core	*first_iclog = log->l_iclog;
> > +	struct xlog_in_core	*iclog = first_iclog;
> > +
> > +	do {
> > +		ASSERT(iclog->ic_state != XLOG_STATE_DO_CALLBACK);
> > +		/*
> > +		 * Terminate the loop if iclogs are found in states
> > +		 * which will cause other threads to clean up iclogs.
> > +		 *
> > +		 * SYNCING - i/o completion will go through logs
> > +		 * DONE_SYNC - interrupt thread should be waiting for
> > +		 *              l_icloglock
> > +		 * IOERROR - give up hope all ye who enter here
> > +		 */
> > +		if (iclog->ic_state == XLOG_STATE_WANT_SYNC ||
> > +		    iclog->ic_state & XLOG_STATE_SYNCING ||
> 
> Code like this makes my eyes twitch (Does ic_state track mutually
> exclusive state?  Or is it state flags?  I think it's the second!),
> but as this is simply refactoring debugging code...

Yup, it's just another of those little things about this code that
makes eyes cross and heads explode.  Fixing this is what Christoph
and I were talking about in the past version of this series -
pulling the ioerror state out of the iclog state machine state
flags...
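
The rough shape we were kicking around is something like the sketch
below - illustrative only, nothing like this is in this series - where
ic_state becomes a set of mutually exclusive values and the IO error
condition is carried in a separate flag:

	enum xlog_iclog_state {
		XLOG_STATE_ACTIVE,
		XLOG_STATE_WANT_SYNC,
		XLOG_STATE_SYNCING,
		XLOG_STATE_DONE_SYNC,
		XLOG_STATE_DO_CALLBACK,
		XLOG_STATE_CALLBACK,
		XLOG_STATE_DIRTY,
	};

With that, checks like "ic_state & XLOG_STATE_SYNCING" become simple
equality tests, and the error case is tested separately via something
like a bool ic_ioerror in the iclog.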

> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 5/8] xfs: factor callbacks out of xlog_state_do_callback()
  2019-09-05 15:39   ` Darrick J. Wong
@ 2019-09-05 22:17     ` Dave Chinner
  0 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2019-09-05 22:17 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Sep 05, 2019 at 08:39:07AM -0700, Darrick J. Wong wrote:
> On Thu, Sep 05, 2019 at 06:47:14PM +1000, Dave Chinner wrote:
> > @@ -2795,31 +2831,13 @@ xlog_state_do_callback(
> >  			} else
> >  				ioerrors++;
> >  
> > -			spin_unlock(&log->l_icloglock);
> > -
> >  			/*
> > -			 * Keep processing entries in the callback list until
> > -			 * we come around and it is empty.  We need to
> > -			 * atomically see that the list is empty and change the
> > -			 * state to DIRTY so that we don't miss any more
> > -			 * callbacks being added.
> > +			 * Running callbacks will drop the icloglock which means
> > +			 * we'll have to run at least one more complete loop.
> >  			 */
> > -			spin_lock(&iclog->ic_callback_lock);
> > -			while (!list_empty(&iclog->ic_callbacks)) {
> > -				LIST_HEAD(tmp);
> > +			cycled_icloglock = true;
> > +			xlog_state_do_iclog_callbacks(log, iclog, aborted);
> >  
> > -				list_splice_init(&iclog->ic_callbacks, &tmp);
> > -
> > -				spin_unlock(&iclog->ic_callback_lock);
> > -				xlog_cil_process_committed(&tmp, aborted);
> > -				spin_lock(&iclog->ic_callback_lock);
> > -			}
> > -
> > -			loopdidcallbacks++;
> > -			funcdidcallbacks++;
> > -
> > -			spin_lock(&log->l_icloglock);
> > -			spin_unlock(&iclog->ic_callback_lock);
> >  			if (!(iclog->ic_state & XLOG_STATE_IOERROR))
> >  				iclog->ic_state = XLOG_STATE_DIRTY;
> >  
> > @@ -2835,6 +2853,8 @@ xlog_state_do_callback(
> >  			iclog = iclog->ic_next;
> >  		} while (first_iclog != iclog);
> >  
> > +		funcdidcallbacks += cycled_icloglock;
> 
> funcdidcallbacks is effectively a yes/no state flag, so maybe it should
> be turned into a boolean and this statement becomes:
> 
> 	funcdidcallbacks |= cycled_icloglock;

Fixed. I renamed it to did_callbacks at the same time, just to be a
little less eye-bleedy...

> Though I guess we're not at huge risk of integer overflow and it
> controls whether or not we run a debugging check so maybe we don't care?

All that really matters is we don't need a branch to calculate it :P

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 7/8] xfs: push iclog state cleaning into xlog_state_clean_log
  2019-09-05 15:48   ` Darrick J. Wong
@ 2019-09-05 22:28     ` Dave Chinner
  0 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2019-09-05 22:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Sep 05, 2019 at 08:48:53AM -0700, Darrick J. Wong wrote:
> On Thu, Sep 05, 2019 at 06:47:16PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > xlog_state_clean_log() is only called from one place, and it occurs
> > when an iclog is transitioning back to ACTIVE. Prior to calling
> > xlog_state_clean_log, the iclog we are processing has a hard coded
> > state check to DIRTY so that xlog_state_clean_log() processes it
> > correctly. We also have a hard coded wakeup after
> > xlog_state_clean_log() to enforce that log force waiters on that iclog
> > are woken correctly.
> > 
> > Both of these things are operations required to finish processing an
> > iclog and return it to the ACTIVE state again, so they make little
> > sense to be separated from the rest of the clean state transition
> > code.
> > 
> > Hence push these things inside xlog_state_clean_log(), document the
> > behaviour and rename it xlog_state_clean_iclog() to indicate that
> > it's being driven by an iclog state change and does the iclog state
> > change work itself.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/xfs_log.c | 57 ++++++++++++++++++++++++++++--------------------
> >  1 file changed, 33 insertions(+), 24 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index 356204ddf865..bef314361bc4 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -2521,21 +2521,35 @@ xlog_write(
> >   *****************************************************************************
> >   */
> >  
> > -/* Clean iclogs starting from the head.  This ordering must be
> > - * maintained, so an iclog doesn't become ACTIVE beyond one that
> > - * is SYNCING.  This is also required to maintain the notion that we use
> > - * a ordered wait queue to hold off would be writers to the log when every
> > - * iclog is trying to sync to disk.
> > +/*
> > + * An iclog has just finished it's completion processing, so we need to update
> 
> it's -> its, but I can fix that on import.

Fixed - "just finished IO completion processing"...

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 1/8] xfs: push the AIL in xlog_grant_head_wake
  2019-09-06  0:05 ` [PATCH 1/8] xfs: push the AIL in xlog_grant_head_wake Dave Chinner
@ 2019-09-06  0:12   ` Darrick J. Wong
  0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2019-09-06  0:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Sep 06, 2019 at 10:05:46AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In the situation where the log is full and the CIL has not recently
> flushed, the AIL push threshold is throttled back to where the
> last write of the head of the log was completed. This is stored in
> log->l_last_sync_lsn. Hence if the CIL holds > 25% of the log space
> pinned by flushes and/or aggregation in progress, we can get the
> situation where the head of the log lags a long way behind the
> reservation grant head.
> 
> When this happens, the AIL push target is trimmed back from where
> the reservation grant head wants to push the log tail to, back to
> where the head of the log currently is. This means the push target
> doesn't reach far enough into the log to actually move the tail
> before the transaction reservation goes to sleep.
> 
> When the CIL push completes, it moves the log head forward such that
> the AIL push target can now be moved, but that has no mechanism for
> pushing the log tail. Further, if the next tail movement of the log
> is not large enough to wake the waiter (i.e. still not enough space for
> it to have a reservation granted), we don't wake anything up, and
> hence we do not update the AIL push target to take into account the
> head of the log moving and allowing the push target to be moved
> forwards.
> 
> To avoid this particular condition, if we fail to wake the first
> waiter on the grant head because we don't have enough space,
> push on the AIL again. This will pick up any movement of the log
> head and allow the push target to move forward due to completion of
> CIL pushing.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c | 29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index b159a9e9fef0..c2862b1a2780 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -214,15 +214,42 @@ xlog_grant_head_wake(
>  {
>  	struct xlog_ticket	*tic;
>  	int			need_bytes;
> +	bool			woken_task = false;
>  
>  	list_for_each_entry(tic, &head->waiters, t_queue) {
> +
> +		/*
> +		 * The is a chance that the size of the CIL checkpoints in

"There is a chance..."

Other than that,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +		 * progress at the last AIL push target calculation resulted in
> +		 * limiting the target to the log head (l_last_sync_lsn) at the
> +		 * time. This may not reflect where the log head is now as the
> +		 * CIL checkpoints may have completed.
> +		 *
> +		 * Hence when we are woken here, it may be that the head of the
> +		 * log that has moved rather than the tail. As the tail didn't
> +		 * move, there still won't be space available for the
> +		 * reservation we require.  However, if the AIL has already
> +		 * pushed to the target defined by the old log head location, we
> +		 * will hang here waiting for something else to update the AIL
> +		 * push target.
> +		 *
> +		 * Therefore, if there isn't space to wake the first waiter on
> +		 * the grant head, we need to push the AIL again to ensure the
> +		 * target reflects both the current log tail and log head
> +		 * position before we wait for the tail to move again.
> +		 */
> +
>  		need_bytes = xlog_ticket_reservation(log, head, tic);
> -		if (*free_bytes < need_bytes)
> +		if (*free_bytes < need_bytes) {
> +			if (!woken_task)
> +				xlog_grant_push_ail(log, need_bytes);
>  			return false;
> +		}
>  
>  		*free_bytes -= need_bytes;
>  		trace_xfs_log_grant_wake_up(log, tic);
>  		wake_up_process(tic->t_task);
> +		woken_task = true;
>  	}
>  
>  	return true;
> -- 
> 2.23.0.rc1
> 


* [PATCH 1/8] xfs: push the AIL in xlog_grant_head_wake
  2019-09-06  0:05 [PATCH0/8 v3] " Dave Chinner
@ 2019-09-06  0:05 ` Dave Chinner
  2019-09-06  0:12   ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2019-09-06  0:05 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

In the situation where the log is full and the CIL has not recently
flushed, the AIL push threshold is throttled back to where the
last write of the head of the log was completed. This is stored in
log->l_last_sync_lsn. Hence if the CIL holds > 25% of the log space
pinned by flushes and/or aggregation in progress, we can get the
situation where the head of the log lags a long way behind the
reservation grant head.

When this happens, the AIL push target is trimmed back from where
the reservation grant head wants to push the log tail to, back to
where the head of the log currently is. This means the push target
doesn't reach far enough into the log to actually move the tail
before the transaction reservation goes to sleep.

When the CIL push completes, it moves the log head forward such that
the AIL push target can now be moved, but that has no mechanism for
pushing the log tail. Further, if the next tail movement of the log
is not large enough to wake the waiter (i.e. still not enough space for
it to have a reservation granted), we don't wake anything up, and
hence we do not update the AIL push target to take into account the
head of the log moving and allowing the push target to be moved
forwards.

To avoid this particular condition, if we fail to wake the first
waiter on the grant head because we don't have enough space,
push on the AIL again. This will pick up any movement of the log
head and allow the push target to move forward due to completion of
CIL pushing.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index b159a9e9fef0..c2862b1a2780 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -214,15 +214,42 @@ xlog_grant_head_wake(
 {
 	struct xlog_ticket	*tic;
 	int			need_bytes;
+	bool			woken_task = false;
 
 	list_for_each_entry(tic, &head->waiters, t_queue) {
+
+		/*
+		 * The is a chance that the size of the CIL checkpoints in
+		 * progress at the last AIL push target calculation resulted in
+		 * limiting the target to the log head (l_last_sync_lsn) at the
+		 * time. This may not reflect where the log head is now as the
+		 * CIL checkpoints may have completed.
+		 *
+		 * Hence when we are woken here, it may be that the head of the
+		 * log that has moved rather than the tail. As the tail didn't
+		 * move, there still won't be space available for the
+		 * reservation we require.  However, if the AIL has already
+		 * pushed to the target defined by the old log head location, we
+		 * will hang here waiting for something else to update the AIL
+		 * push target.
+		 *
+		 * Therefore, if there isn't space to wake the first waiter on
+		 * the grant head, we need to push the AIL again to ensure the
+		 * target reflects both the current log tail and log head
+		 * position before we wait for the tail to move again.
+		 */
+
 		need_bytes = xlog_ticket_reservation(log, head, tic);
-		if (*free_bytes < need_bytes)
+		if (*free_bytes < need_bytes) {
+			if (!woken_task)
+				xlog_grant_push_ail(log, need_bytes);
 			return false;
+		}
 
 		*free_bytes -= need_bytes;
 		trace_xfs_log_grant_wake_up(log, tic);
 		wake_up_process(tic->t_task);
+		woken_task = true;
 	}
 
 	return true;
-- 
2.23.0.rc1




Thread overview: 25+ messages
2019-09-05  8:47 [PATCH 1/8 v2] xfs: log race fixes and cleanups Dave Chinner
2019-09-05  8:47 ` [PATCH 1/8] xfs: push the AIL in xlog_grant_head_wake Dave Chinner
2019-09-05 15:18   ` Darrick J. Wong
2019-09-05 22:02     ` Dave Chinner
2019-09-05  8:47 ` [PATCH 2/8] xfs: fix missed wakeup on l_flush_wait Dave Chinner
2019-09-05 15:21   ` Darrick J. Wong
2019-09-05  8:47 ` [PATCH 3/8] xfs: prevent CIL push holdoff in log recovery Dave Chinner
2019-09-05 15:26   ` Darrick J. Wong
2019-09-05 22:10     ` Dave Chinner
2019-09-05  8:47 ` [PATCH 4/8] xfs: factor debug code out of xlog_state_do_callback() Dave Chinner
2019-09-05 15:30   ` Darrick J. Wong
2019-09-05 22:14     ` Dave Chinner
2019-09-05  8:47 ` [PATCH 5/8] xfs: factor callbacks " Dave Chinner
2019-09-05 15:39   ` Darrick J. Wong
2019-09-05 22:17     ` Dave Chinner
2019-09-05  8:47 ` [PATCH 6/8] xfs: factor iclog state processing " Dave Chinner
2019-09-05 15:45   ` Darrick J. Wong
2019-09-05  8:47 ` [PATCH 7/8] xfs: push iclog state cleaning into xlog_state_clean_log Dave Chinner
2019-09-05 15:48   ` Darrick J. Wong
2019-09-05 22:28     ` Dave Chinner
2019-09-05  8:47 ` [PATCH 8/8] xfs: push the grant head when the log head moves forward Dave Chinner
2019-09-05 16:00   ` Darrick J. Wong
2019-09-05 15:44 ` [PATCH 1/8 v2] xfs: log race fixes and cleanups Christoph Hellwig
2019-09-06  0:05 [PATCH0/8 v3] " Dave Chinner
2019-09-06  0:05 ` [PATCH 1/8] xfs: push the AIL in xlog_grant_head_wake Dave Chinner
2019-09-06  0:12   ` Darrick J. Wong
