* [PATCH 0/8 V2] xfs: log fixes for for-next
@ 2021-06-17  8:26 Dave Chinner
  2021-06-17  8:26 ` [PATCH 1/8] xfs: add iclog state trace events Dave Chinner
                   ` (9 more replies)
  0 siblings, 10 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17  8:26 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

This is a followup to the first set of log fixes for for-next that
were posted here:

https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b

The first two patches of this series are updates for those patches,
change log below. The rest is the fix for the bigger issue we
uncovered in investigating the generic/019 failures, being that
we're triggering a zero-day bug in the way log recovery assigns LSNs
to checkpoints.

The "simple" fix of using the same ordering code as the commit
record for the start records in the CIL push turned into a lot of
patches once I started cleaning it up, separating out all the
different bits and finally realising all the things I needed to
change to avoid unintentional logic/behavioural changes. Hence
there's some code movement, some factoring, API changes to
xlog_write(), changing where we attach callbacks to commit iclogs so
they remain correctly ordered if there are multiple commit records
in the one iclog and then, finally, strictly ordering the start
records....

The original "simple fix" I tested last night ran almost a thousand
cycles of generic/019 without a log hang or recovery failure of any
kind. The refactored patchset has run a couple hundred cycles of
g/019 and g/475 over the last few hours without a failure, so I'm
posting this so we can get a review iteration done while I sleep so
we can - hopefully - get this sorted out before the end of the week.

Cheers,

Dave.

Version 2:

- tested on 5.13-rc6 + linux-xfs/for-next
- added strings for XLOG_STATE* variables to tracepoint output.
- rewrote the past/future iclog detection to use iclog header LSNs
  rather than iclog states as the state values do not tell us anything
  useful about the temporal relativity of the iclog in relation to
  the current commit iclog.
- added patches to strictly order checkpoint start records the same
  way we strictly order checkpoint commit records.




* [PATCH 1/8] xfs: add iclog state trace events
  2021-06-17  8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
@ 2021-06-17  8:26 ` Dave Chinner
  2021-06-17 16:45   ` Darrick J. Wong
  2021-06-18 14:09   ` Christoph Hellwig
  2021-06-17  8:26 ` [PATCH 2/8] xfs: don't wait on future iclogs when pushing the CIL Dave Chinner
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17  8:26 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

For the DEBUGS!

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c      | 18 +++++++++++++
 fs/xfs/xfs_log_priv.h | 10 ++++++++
 fs/xfs/xfs_trace.h    | 60 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index e921b554b683..54fd6a695bb5 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -524,6 +524,7 @@ __xlog_state_release_iclog(
 		iclog->ic_header.h_tail_lsn = cpu_to_be64(tail_lsn);
 		xlog_verify_tail_lsn(log, iclog, tail_lsn);
 		/* cycle incremented when incrementing curr_block */
+		trace_xlog_iclog_syncing(iclog, _RET_IP_);
 		return true;
 	}
 
@@ -543,6 +544,7 @@ xlog_state_release_iclog(
 {
 	lockdep_assert_held(&log->l_icloglock);
 
+	trace_xlog_iclog_release(iclog, _RET_IP_);
 	if (iclog->ic_state == XLOG_STATE_IOERROR)
 		return -EIO;
 
@@ -804,6 +806,7 @@ xlog_wait_on_iclog(
 {
 	struct xlog		*log = iclog->ic_log;
 
+	trace_xlog_iclog_wait_on(iclog, _RET_IP_);
 	if (!XLOG_FORCED_SHUTDOWN(log) &&
 	    iclog->ic_state != XLOG_STATE_ACTIVE &&
 	    iclog->ic_state != XLOG_STATE_DIRTY) {
@@ -1804,6 +1807,7 @@ xlog_write_iclog(
 	unsigned int		count)
 {
 	ASSERT(bno < log->l_logBBsize);
+	trace_xlog_iclog_write(iclog, _RET_IP_);
 
 	/*
 	 * We lock the iclogbufs here so that we can serialise against I/O
@@ -1950,6 +1954,7 @@ xlog_sync(
 	unsigned int		size;
 
 	ASSERT(atomic_read(&iclog->ic_refcnt) == 0);
+	trace_xlog_iclog_sync(iclog, _RET_IP_);
 
 	count = xlog_calc_iclog_size(log, iclog, &roundoff);
 
@@ -2488,6 +2493,7 @@ xlog_state_activate_iclog(
 	int			*iclogs_changed)
 {
 	ASSERT(list_empty_careful(&iclog->ic_callbacks));
+	trace_xlog_iclog_activate(iclog, _RET_IP_);
 
 	/*
 	 * If the number of ops in this iclog indicate it just contains the
@@ -2577,6 +2583,8 @@ xlog_state_clean_iclog(
 {
 	int			iclogs_changed = 0;
 
+	trace_xlog_iclog_clean(dirty_iclog, _RET_IP_);
+
 	dirty_iclog->ic_state = XLOG_STATE_DIRTY;
 
 	xlog_state_activate_iclogs(log, &iclogs_changed);
@@ -2636,6 +2644,7 @@ xlog_state_set_callback(
 	struct xlog_in_core	*iclog,
 	xfs_lsn_t		header_lsn)
 {
+	trace_xlog_iclog_callback(iclog, _RET_IP_);
 	iclog->ic_state = XLOG_STATE_CALLBACK;
 
 	ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
@@ -2717,6 +2726,7 @@ xlog_state_do_iclog_callbacks(
 		__releases(&log->l_icloglock)
 		__acquires(&log->l_icloglock)
 {
+	trace_xlog_iclog_callbacks_start(iclog, _RET_IP_);
 	spin_unlock(&log->l_icloglock);
 	spin_lock(&iclog->ic_callback_lock);
 	while (!list_empty(&iclog->ic_callbacks)) {
@@ -2736,6 +2746,7 @@ xlog_state_do_iclog_callbacks(
 	 */
 	spin_lock(&log->l_icloglock);
 	spin_unlock(&iclog->ic_callback_lock);
+	trace_xlog_iclog_callbacks_done(iclog, _RET_IP_);
 }
 
 STATIC void
@@ -2827,6 +2838,7 @@ xlog_state_done_syncing(
 
 	spin_lock(&log->l_icloglock);
 	ASSERT(atomic_read(&iclog->ic_refcnt) == 0);
+	trace_xlog_iclog_sync_done(iclog, _RET_IP_);
 
 	/*
 	 * If we got an error, either on the first buffer, or in the case of
@@ -2899,6 +2911,8 @@ xlog_state_get_iclog_space(
 	atomic_inc(&iclog->ic_refcnt);	/* prevents sync */
 	log_offset = iclog->ic_offset;
 
+	trace_xlog_iclog_get_space(iclog, _RET_IP_);
+
 	/* On the 1st write to an iclog, figure out lsn.  This works
 	 * if iclogs marked XLOG_STATE_WANT_SYNC always write out what they are
 	 * committing to.  If the offset is set, that's how many blocks
@@ -3056,6 +3070,7 @@ xlog_state_switch_iclogs(
 {
 	ASSERT(iclog->ic_state == XLOG_STATE_ACTIVE);
 	assert_spin_locked(&log->l_icloglock);
+	trace_xlog_iclog_switch(iclog, _RET_IP_);
 
 	if (!eventual_size)
 		eventual_size = iclog->ic_offset;
@@ -3138,6 +3153,8 @@ xfs_log_force(
 	if (iclog->ic_state == XLOG_STATE_IOERROR)
 		goto out_error;
 
+	trace_xlog_iclog_force(iclog, _RET_IP_);
+
 	if (iclog->ic_state == XLOG_STATE_DIRTY ||
 	    (iclog->ic_state == XLOG_STATE_ACTIVE &&
 	     atomic_read(&iclog->ic_refcnt) == 0 && iclog->ic_offset == 0)) {
@@ -3225,6 +3242,7 @@ xlog_force_lsn(
 		goto out_error;
 
 	while (be64_to_cpu(iclog->ic_header.h_lsn) != lsn) {
+		trace_xlog_iclog_force_lsn(iclog, _RET_IP_);
 		iclog = iclog->ic_next;
 		if (iclog == log->l_iclog)
 			goto out_unlock;
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index e4e421a70335..330befd9f6be 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -50,6 +50,16 @@ enum xlog_iclog_state {
 	XLOG_STATE_IOERROR,	/* IO error happened in sync'ing log */
 };
 
+#define XLOG_STATE_STRINGS \
+	{ XLOG_STATE_ACTIVE,	"XLOG_STATE_ACTIVE" }, \
+	{ XLOG_STATE_WANT_SYNC,	"XLOG_STATE_WANT_SYNC" }, \
+	{ XLOG_STATE_SYNCING,	"XLOG_STATE_SYNCING" }, \
+	{ XLOG_STATE_DONE_SYNC,	"XLOG_STATE_DONE_SYNC" }, \
+	{ XLOG_STATE_CALLBACK,	"XLOG_STATE_CALLBACK" }, \
+	{ XLOG_STATE_DIRTY,	"XLOG_STATE_DIRTY" }, \
+	{ XLOG_STATE_IOERROR,	"XLOG_STATE_IOERROR" }
+
+
 /*
  * Log ticket flags
  */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 71dca776c110..28d570742000 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -24,6 +24,7 @@ struct xlog_ticket;
 struct xlog_recover;
 struct xlog_recover_item;
 struct xlog_rec_header;
+struct xlog_in_core;
 struct xfs_buf_log_format;
 struct xfs_inode_log_format;
 struct xfs_bmbt_irec;
@@ -3927,6 +3928,65 @@ DEFINE_EVENT(xfs_icwalk_class, name,	\
 DEFINE_ICWALK_EVENT(xfs_ioc_free_eofblocks);
 DEFINE_ICWALK_EVENT(xfs_blockgc_free_space);
 
+TRACE_DEFINE_ENUM(XLOG_STATE_ACTIVE);
+TRACE_DEFINE_ENUM(XLOG_STATE_WANT_SYNC);
+TRACE_DEFINE_ENUM(XLOG_STATE_SYNCING);
+TRACE_DEFINE_ENUM(XLOG_STATE_DONE_SYNC);
+TRACE_DEFINE_ENUM(XLOG_STATE_CALLBACK);
+TRACE_DEFINE_ENUM(XLOG_STATE_DIRTY);
+TRACE_DEFINE_ENUM(XLOG_STATE_IOERROR);
+
+DECLARE_EVENT_CLASS(xlog_iclog_class,
+	TP_PROTO(struct xlog_in_core *iclog, unsigned long caller_ip),
+	TP_ARGS(iclog, caller_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(uint32_t, state)
+		__field(int32_t, refcount)
+		__field(uint32_t, offset)
+		__field(unsigned long long, lsn)
+		__field(unsigned long, caller_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = iclog->ic_log->l_mp->m_super->s_dev;
+		__entry->state = iclog->ic_state;
+		__entry->refcount = atomic_read(&iclog->ic_refcnt);
+		__entry->offset = iclog->ic_offset;
+		__entry->lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+		__entry->caller_ip = caller_ip;
+	),
+	TP_printk("dev %d:%d state %s refcnt %d offset %u lsn 0x%llx caller %pS",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __print_symbolic(__entry->state, XLOG_STATE_STRINGS),
+		  __entry->refcount,
+		  __entry->offset,
+		  __entry->lsn,
+		  (char *)__entry->caller_ip)
+
+);
+
+#define DEFINE_ICLOG_EVENT(name)	\
+DEFINE_EVENT(xlog_iclog_class, name,	\
+	TP_PROTO(struct xlog_in_core *iclog, unsigned long caller_ip), \
+	TP_ARGS(iclog, caller_ip))
+
+DEFINE_ICLOG_EVENT(xlog_iclog_activate);
+DEFINE_ICLOG_EVENT(xlog_iclog_clean);
+DEFINE_ICLOG_EVENT(xlog_iclog_callback);
+DEFINE_ICLOG_EVENT(xlog_iclog_callbacks_start);
+DEFINE_ICLOG_EVENT(xlog_iclog_callbacks_done);
+DEFINE_ICLOG_EVENT(xlog_iclog_force);
+DEFINE_ICLOG_EVENT(xlog_iclog_force_lsn);
+DEFINE_ICLOG_EVENT(xlog_iclog_get_space);
+DEFINE_ICLOG_EVENT(xlog_iclog_release);
+DEFINE_ICLOG_EVENT(xlog_iclog_switch);
+DEFINE_ICLOG_EVENT(xlog_iclog_sync);
+DEFINE_ICLOG_EVENT(xlog_iclog_syncing);
+DEFINE_ICLOG_EVENT(xlog_iclog_sync_done);
+DEFINE_ICLOG_EVENT(xlog_iclog_want_sync);
+DEFINE_ICLOG_EVENT(xlog_iclog_wait_on);
+DEFINE_ICLOG_EVENT(xlog_iclog_write);
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.31.1



* [PATCH 2/8] xfs: don't wait on future iclogs when pushing the CIL
  2021-06-17  8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
  2021-06-17  8:26 ` [PATCH 1/8] xfs: add iclog state trace events Dave Chinner
@ 2021-06-17  8:26 ` Dave Chinner
  2021-06-17 17:49   ` Darrick J. Wong
  2021-06-17  8:26 ` [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c Dave Chinner
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-17  8:26 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The iclogbuf ring attached to the struct xlog is circular, hence the
first and last iclogs in the ring can only be determined by
comparing them against the log->l_iclog pointer.

In xlog_cil_push_work(), we want to wait on previous iclogs that were
issued so that we can flush them to stable storage with the commit
record write. The current code simply waits on the previous iclog in
the ring. This, however, leads to CIL push hangs in generic/019 like so:

task:kworker/u33:0   state:D stack:12680 pid:    7 ppid:     2 flags:0x00004000
Workqueue: xfs-cil/pmem1 xlog_cil_push_work
Call Trace:
 __schedule+0x30b/0x9f0
 schedule+0x68/0xe0
 xlog_wait_on_iclog+0x121/0x190
 ? wake_up_q+0xa0/0xa0
 xlog_cil_push_work+0x994/0xa10
 ? _raw_spin_lock+0x15/0x20
 ? xfs_swap_extents+0x920/0x920
 process_one_work+0x1ab/0x390
 worker_thread+0x56/0x3d0
 ? rescuer_thread+0x3c0/0x3c0
 kthread+0x14d/0x170
 ? __kthread_bind_mask+0x70/0x70
 ret_from_fork+0x1f/0x30

Other threads are blocked in either xlog_state_get_iclog_space()
waiting for iclog space or xlog_grant_head_wait() waiting for log
reservation space.

The problem here is that the previous iclog on the ring might
actually be a future iclog. That is, if log->l_iclog points at
commit_iclog, commit_iclog is the first (oldest) iclog in the ring
and there are no previous iclogs pending as they have all completed
their IO and been activated again. IOWs, commit_iclog->ic_prev
points to an iclog that will be written in the future, not one that
has been written in the past.

Hence, in this case, waiting on the ->ic_prev iclog is incorrect
behaviour, and depending on the state of the future iclog, we can
end up with a circular ABA wait cycle and we hang.

The fix is made more complex by the fact that many iclog states
cannot be used to determine if the iclog is a past or future iclog.
Hence we have to determine past iclogs by checking the LSN of the
iclog rather than their state. A past ACTIVE iclog will have a LSN
of zero, while a future ACTIVE iclog will have a LSN greater than
the current iclog. We don't wait on either of these cases.

Similarly, a future iclog that hasn't completed IO will have an LSN
greater than the current iclog and so we don't wait on them. A past
iclog that is still undergoing IO completion will have a LSN less
than the current iclog and those are the only iclogs that we need to
wait on.

Hence we can use the iclog LSN to determine what iclogs we need to
wait on here.
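
As a rough illustration of the rule described above, here is a minimal
userspace sketch of the wait decision. It is not the kernel code: struct
demo_iclog, demo_lsn_cmp() and demo_count_waits() are hypothetical names
invented for this example, LSNs are modelled as plain 64-bit integers,
and the actual sleeping and locking are left out so that only the
past/future classification is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xlog_in_core. */
struct demo_iclog {
	struct demo_iclog	*prev;	/* previous iclog in the circular ring */
	uint64_t		lsn;	/* header LSN, 0 once reactivated */
};

/* Three-way compare; assumes plain integer ordering is good enough here. */
static int demo_lsn_cmp(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/*
 * Walk backwards around the ring from the commit iclog and count how
 * many iclogs would have to be waited on. The walk stops at the first
 * iclog that is either reactivated (LSN of zero) or in the future (LSN
 * greater than the commit LSN), because everything before it has
 * already completed.
 */
static int demo_count_waits(const struct demo_iclog *commit_iclog,
			    uint64_t commit_lsn)
{
	const struct demo_iclog	*iclog;
	int			waits = 0;

	for (iclog = commit_iclog->prev;
	     iclog != commit_iclog;
	     iclog = iclog->prev) {
		if (!iclog->lsn || demo_lsn_cmp(iclog->lsn, commit_lsn) > 0)
			break;

		/* Only a past iclog that is still in flight gets here. */
		assert(demo_lsn_cmp(iclog->lsn, commit_lsn) < 0);
		waits++;
	}
	return waits;
}
```

When the commit iclog is the oldest in the ring, its ->prev carries a
larger LSN, so the walk stops immediately and nothing is waited on. That
is the case the old unconditional wait got wrong.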

Fixes: 5fd9256ce156 ("xfs: separate CIL commit record IO")
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_cil.c | 51 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 45 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 705619e9dab4..2fb0ab02dda3 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -1075,15 +1075,54 @@ xlog_cil_push_work(
 	ticket = ctx->ticket;
 
 	/*
-	 * If the checkpoint spans multiple iclogs, wait for all previous
-	 * iclogs to complete before we submit the commit_iclog. In this case,
-	 * the commit_iclog write needs to issue a pre-flush so that the
-	 * ordering is correctly preserved down to stable storage.
+	 * If the checkpoint spans multiple iclogs, wait for all previous iclogs
+	 * to complete before we submit the commit_iclog. We can't use state
+	 * checks for this - ACTIVE can be either a past completed iclog or a
+	 * future iclog being filled, while WANT_SYNC through DONE_SYNC can be a
+	 * past or future iclog awaiting IO or ordered IO completion to be run.
+	 * In the latter case, if it's a future iclog and we wait on it, then we
+	 * will hang because it won't get processed through to ic_force_wait
+	 * wakeup until this commit_iclog is written to disk.  Hence we use the
+	 * iclog header lsn and compare it to the commit lsn to determine if we
+	 * need to wait on iclogs or not.
 	 */
 	spin_lock(&log->l_icloglock);
 	if (ctx->start_lsn != commit_lsn) {
-		xlog_wait_on_iclog(commit_iclog->ic_prev);
-		spin_lock(&log->l_icloglock);
+		struct xlog_in_core	*iclog;
+
+		for (iclog = commit_iclog->ic_prev;
+		     iclog != commit_iclog;
+		     iclog = iclog->ic_prev) {
+			xfs_lsn_t	hlsn;
+
+			/*
+			 * If the LSN of the iclog is zero or in the future it
+			 * means it has passed through IO completion and
+			 * activation and hence all previous iclogs have also
+			 * done so. We do not need to wait at all in this case.
+			 */
+			hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
+			if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
+				break;
+
+			/*
+			 * If the LSN of the iclog is older than the commit lsn,
+			 * we have to wait on it. Waiting on this via the
+			 * ic_force_wait should also order the completion of all
+			 * older iclogs, too, but we leave checking that to the
+			 * next loop iteration.
+			 */
+			ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
+			xlog_wait_on_iclog(iclog);
+			spin_lock(&log->l_icloglock);
+		}
+
+		/*
+		 * Regardless of whether we need to wait or not, the
+		 * commit_iclog write needs to issue a pre-flush so that the
+		 * ordering for this checkpoint is correctly preserved down to
+		 * stable storage.
+		 */
 		commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
 	}
 
-- 
2.31.1



* [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c
  2021-06-17  8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
  2021-06-17  8:26 ` [PATCH 1/8] xfs: add iclog state trace events Dave Chinner
  2021-06-17  8:26 ` [PATCH 2/8] xfs: don't wait on future iclogs when pushing the CIL Dave Chinner
@ 2021-06-17  8:26 ` Dave Chinner
  2021-06-17 12:57     ` kernel test robot
                     ` (2 more replies)
  2021-06-17  8:26 ` [PATCH 4/8] xfs: pass a CIL context to xlog_write() Dave Chinner
                   ` (6 subsequent siblings)
  9 siblings, 3 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17  8:26 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

It is only used by the CIL checkpoints, and is the counterpart to
start record formatting and writing that is already local to
xfs_log_cil.c.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c      | 41 ---------------------------------------
 fs/xfs/xfs_log_cil.c  | 45 ++++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_log_priv.h |  2 --
 3 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 54fd6a695bb5..cf661c155786 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1563,47 +1563,6 @@ xlog_alloc_log(
 	return ERR_PTR(error);
 }	/* xlog_alloc_log */
 
-/*
- * Write out the commit record of a transaction associated with the given
- * ticket to close off a running log write. Return the lsn of the commit record.
- */
-int
-xlog_commit_record(
-	struct xlog		*log,
-	struct xlog_ticket	*ticket,
-	struct xlog_in_core	**iclog,
-	xfs_lsn_t		*lsn)
-{
-	struct xlog_op_header	ophdr = {
-		.oh_clientid = XFS_TRANSACTION,
-		.oh_tid = cpu_to_be32(ticket->t_tid),
-		.oh_flags = XLOG_COMMIT_TRANS,
-	};
-	struct xfs_log_iovec reg = {
-		.i_addr = &ophdr,
-		.i_len = sizeof(struct xlog_op_header),
-		.i_type = XLOG_REG_TYPE_COMMIT,
-	};
-	struct xfs_log_vec vec = {
-		.lv_niovecs = 1,
-		.lv_iovecp = &reg,
-	};
-	int	error;
-	LIST_HEAD(lv_chain);
-	INIT_LIST_HEAD(&vec.lv_list);
-	list_add(&vec.lv_list, &lv_chain);
-
-	if (XLOG_FORCED_SHUTDOWN(log))
-		return -EIO;
-
-	/* account for space used by record data */
-	ticket->t_curr_res -= reg.i_len;
-	error = xlog_write(log, &lv_chain, ticket, lsn, iclog, reg.i_len);
-	if (error)
-		xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
-	return error;
-}
-
 /*
  * Compute the LSN that we'd need to push the log tail towards in order to have
  * (a) enough on-disk log space to log the number of bytes specified, (b) at
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 2fb0ab02dda3..2c8b25888c53 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -783,6 +783,48 @@ xlog_cil_build_trans_hdr(
 	tic->t_curr_res -= lvhdr->lv_bytes;
 }
 
+/*
+ * Write out the commit record of a checkpoint transaction associated with the
+ * given ticket to close off a running log write. Return the lsn of the commit
+ * record.
+ */
+int
+xlog_cil_write_commit_record(
+	struct xlog		*log,
+	struct xlog_ticket	*ticket,
+	struct xlog_in_core	**iclog,
+	xfs_lsn_t		*lsn)
+{
+	struct xlog_op_header	ophdr = {
+		.oh_clientid = XFS_TRANSACTION,
+		.oh_tid = cpu_to_be32(ticket->t_tid),
+		.oh_flags = XLOG_COMMIT_TRANS,
+	};
+	struct xfs_log_iovec reg = {
+		.i_addr = &ophdr,
+		.i_len = sizeof(struct xlog_op_header),
+		.i_type = XLOG_REG_TYPE_COMMIT,
+	};
+	struct xfs_log_vec vec = {
+		.lv_niovecs = 1,
+		.lv_iovecp = &reg,
+	};
+	int	error;
+	LIST_HEAD(lv_chain);
+	INIT_LIST_HEAD(&vec.lv_list);
+	list_add(&vec.lv_list, &lv_chain);
+
+	if (XLOG_FORCED_SHUTDOWN(log))
+		return -EIO;
+
+	/* account for space used by record data */
+	ticket->t_curr_res -= reg.i_len;
+	error = xlog_write(log, &lv_chain, ticket, lsn, iclog, reg.i_len);
+	if (error)
+		xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
+	return error;
+}
+
 /*
  * CIL item reordering compare function. We want to order in ascending ID order,
  * but we want to leave items with the same ID in the order they were added to
@@ -1041,7 +1083,8 @@ xlog_cil_push_work(
 	}
 	spin_unlock(&cil->xc_push_lock);
 
-	error = xlog_commit_record(log, ctx->ticket, &commit_iclog, &commit_lsn);
+	error = xlog_cil_write_commit_record(log, ctx->ticket, &commit_iclog,
+			&commit_lsn);
 	if (error)
 		goto out_abort_free_ticket;
 
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 330befd9f6be..26f26769d1c6 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -490,8 +490,6 @@ void	xlog_print_trans(struct xfs_trans *);
 int	xlog_write(struct xlog *log, struct list_head *lv_chain,
 		struct xlog_ticket *tic, xfs_lsn_t *start_lsn,
 		struct xlog_in_core **commit_iclog, uint32_t len);
-int	xlog_commit_record(struct xlog *log, struct xlog_ticket *ticket,
-		struct xlog_in_core **iclog, xfs_lsn_t *lsn);
 
 void	xfs_log_ticket_ungrant(struct xlog *log, struct xlog_ticket *ticket);
 void	xfs_log_ticket_regrant(struct xlog *log, struct xlog_ticket *ticket);
-- 
2.31.1



* [PATCH 4/8] xfs: pass a CIL context to xlog_write()
  2021-06-17  8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
                   ` (2 preceding siblings ...)
  2021-06-17  8:26 ` [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c Dave Chinner
@ 2021-06-17  8:26 ` Dave Chinner
  2021-06-17 14:46     ` kernel test robot
                     ` (2 more replies)
  2021-06-17  8:26 ` [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work() Dave Chinner
                   ` (5 subsequent siblings)
  9 siblings, 3 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17  8:26 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Pass the CIL context to xlog_write() rather than a pointer to a LSN
variable. Only the CIL checkpoint calls to xlog_write() need to know
about the start LSN of the writes, so rework xlog_write to directly
write the LSNs into the CIL context structure.

This removes the commit_lsn variable from xlog_cil_push_work(), so
now we only have to issue the commit record ordering wakeup from
there.
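
To make the new calling convention concrete, here is a small userspace
sketch of the "first write sets start_lsn, later write sets commit_lsn"
behaviour described above. struct demo_cil_ctx and
demo_record_write_lsn() are hypothetical names for this example only;
the real code takes the CIL push lock around the update, which is
omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct xfs_cil_ctx. */
struct demo_cil_ctx {
	uint64_t	start_lsn;	/* 0 until the start record is placed */
	uint64_t	commit_lsn;	/* 0 until the commit record is placed */
};

/*
 * Record the LSN of the iclog a write was just granted space in.
 * Callers that are not CIL checkpoints pass a NULL context and nothing
 * is recorded for them.
 */
static void demo_record_write_lsn(struct demo_cil_ctx *ctx, uint64_t iclog_lsn)
{
	if (ctx == NULL)
		return;

	/* Zero means "not recorded yet" in this model, so reject it. */
	assert(iclog_lsn != 0);

	if (!ctx->start_lsn)
		ctx->start_lsn = iclog_lsn;
	else
		ctx->commit_lsn = iclog_lsn;
}
```

The sketch relies on the same convention as the patch: a zero start_lsn
is the only signal that the start record has not been written yet.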

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c      | 22 +++++++++++++++++-----
 fs/xfs/xfs_log_cil.c  | 19 ++++++++-----------
 fs/xfs/xfs_log_priv.h |  4 ++--
 3 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index cf661c155786..fc0e43c57683 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -871,7 +871,7 @@ xlog_write_unmount_record(
 	 */
 	if (log->l_targ != log->l_mp->m_ddev_targp)
 		blkdev_issue_flush(log->l_targ->bt_bdev);
-	return xlog_write(log, &lv_chain, ticket, NULL, NULL, reg.i_len);
+	return xlog_write(log, NULL, &lv_chain, ticket, NULL, reg.i_len);
 }
 
 /*
@@ -2383,9 +2383,9 @@ xlog_write_partial(
 int
 xlog_write(
 	struct xlog		*log,
+	struct xfs_cil_ctx	*ctx,
 	struct list_head	*lv_chain,
 	struct xlog_ticket	*ticket,
-	xfs_lsn_t		*start_lsn,
 	struct xlog_in_core	**commit_iclog,
 	uint32_t		len)
 {
@@ -2408,9 +2408,21 @@ xlog_write(
 	if (error)
 		return error;
 
-	/* start_lsn is the LSN of the first iclog written to. */
-	if (start_lsn)
-		*start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+	/*
+	 * If we have a CIL context, record the LSN of the iclog we were just
+	 * granted space to start writing into. If the context doesn't have
+	 * a start_lsn recorded, then this iclog will contain the start record
+	 * for the checkpoint. Otherwise this write contains the commit record
+	 * for the checkpoint.
+	 */
+	if (ctx) {
+		spin_lock(&ctx->cil->xc_push_lock);
+		if (!ctx->start_lsn)
+			ctx->start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+		else
+			ctx->commit_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+		spin_unlock(&ctx->cil->xc_push_lock);
+	}
 
 	lv = list_first_entry_or_null(lv_chain, struct xfs_log_vec, lv_list);
 	while (lv) {
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 2c8b25888c53..35fc3e57d870 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -790,14 +790,13 @@ xlog_cil_build_trans_hdr(
  */
 int
 xlog_cil_write_commit_record(
-	struct xlog		*log,
-	struct xlog_ticket	*ticket,
-	struct xlog_in_core	**iclog,
-	xfs_lsn_t		*lsn)
+	struct xfs_cil_ctx	*ctx,
+	struct xlog_in_core	**iclog)
 {
+	struct xlog		*log = ctx->cil->xc_log;
 	struct xlog_op_header	ophdr = {
 		.oh_clientid = XFS_TRANSACTION,
-		.oh_tid = cpu_to_be32(ticket->t_tid),
+		.oh_tid = cpu_to_be32(ctx->ticket->t_tid),
 		.oh_flags = XLOG_COMMIT_TRANS,
 	};
 	struct xfs_log_iovec reg = {
@@ -818,8 +817,8 @@ xlog_cil_write_commit_record(
 		return -EIO;
 
 	/* account for space used by record data */
-	ticket->t_curr_res -= reg.i_len;
-	error = xlog_write(log, &lv_chain, ticket, lsn, iclog, reg.i_len);
+	ctx->ticket->t_curr_res -= reg.i_len;
+	error = xlog_write(log, ctx, &lv_chain, ctx->ticket, iclog, reg.i_len);
 	if (error)
 		xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
 	return error;
@@ -1038,7 +1037,7 @@ xlog_cil_push_work(
 	 * use the commit record lsn then we can move the tail beyond the grant
 	 * write head.
 	 */
-	error = xlog_write(log, &ctx->lv_chain, ctx->ticket, &ctx->start_lsn,
+	error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
 				NULL, num_bytes);
 
 	/*
@@ -1083,8 +1082,7 @@ xlog_cil_push_work(
 	}
 	spin_unlock(&cil->xc_push_lock);
 
-	error = xlog_cil_write_commit_record(log, ctx->ticket, &commit_iclog,
-			&commit_lsn);
+	error = xlog_cil_write_commit_record(ctx, &commit_iclog);
 	if (error)
 		goto out_abort_free_ticket;
 
@@ -1104,7 +1102,6 @@ xlog_cil_push_work(
 	 * and wake up anyone who is waiting for the commit to complete.
 	 */
 	spin_lock(&cil->xc_push_lock);
-	ctx->commit_lsn = commit_lsn;
 	wake_up_all(&cil->xc_commit_wait);
 	spin_unlock(&cil->xc_push_lock);
 
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 26f26769d1c6..af8a9dfa8068 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -487,8 +487,8 @@ xlog_write_adv_cnt(void **ptr, int *len, int *off, size_t bytes)
 
 void	xlog_print_tic_res(struct xfs_mount *mp, struct xlog_ticket *ticket);
 void	xlog_print_trans(struct xfs_trans *);
-int	xlog_write(struct xlog *log, struct list_head *lv_chain,
-		struct xlog_ticket *tic, xfs_lsn_t *start_lsn,
+int	xlog_write(struct xlog *log, struct xfs_cil_ctx *ctx,
+		struct list_head *lv_chain, struct xlog_ticket *tic,
 		struct xlog_in_core **commit_iclog, uint32_t len);
 
 void	xfs_log_ticket_ungrant(struct xlog *log, struct xlog_ticket *ticket);
-- 
2.31.1



* [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work()
  2021-06-17  8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
                   ` (3 preceding siblings ...)
  2021-06-17  8:26 ` [PATCH 4/8] xfs: pass a CIL context to xlog_write() Dave Chinner
@ 2021-06-17  8:26 ` Dave Chinner
  2021-06-17 19:59   ` Darrick J. Wong
  2021-06-17  8:26 ` [PATCH 6/8] xfs: separate out setting CIL context LSNs from xlog_write Dave Chinner
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-17  8:26 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

So we can use it for start record ordering as well as commit record
ordering in future.
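
For readers skimming the series, the ordering rule being factored out
can be sketched in userspace as below. This is only a model under
stated assumptions: struct demo_ctx and demo_must_wait() are
hypothetical names, the committing list is a plain array, and the
sleep/restart loop and shutdown check of the real helper are reduced to
a boolean "would have to wait" answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry on the CIL committing list. */
struct demo_ctx {
	uint64_t	sequence;	/* checkpoint sequence number */
	uint64_t	commit_lsn;	/* 0 while the push is still in progress */
};

/*
 * Return true if a write for @sequence has to wait, i.e. if any
 * checkpoint with a lower sequence has not recorded its commit LSN yet.
 * Equal and higher sequences are skipped: higher ones wait for us, and
 * we never wait on ourselves.
 */
static bool demo_must_wait(const struct demo_ctx *committing, size_t n,
			   uint64_t sequence)
{
	assert(committing != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (committing[i].sequence >= sequence)
			continue;
		if (!committing[i].commit_lsn)
			return true;
	}
	return false;
}
```

In the kernel helper a "true" result corresponds to sleeping on
xc_commit_wait and rescanning the list from the start once woken.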

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_cil.c | 89 ++++++++++++++++++++++++++------------------
 1 file changed, 52 insertions(+), 37 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 35fc3e57d870..f993ec69fc97 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -784,9 +784,54 @@ xlog_cil_build_trans_hdr(
 }
 
 /*
- * Write out the commit record of a checkpoint transaction associated with the
- * given ticket to close off a running log write. Return the lsn of the commit
- * record.
+ * Ensure that the order of log writes follows checkpoint sequence order. This
+ * relies on the context LSN being zero until the log write has guaranteed the
+ * LSN that the log write will start at via xlog_state_get_iclog_space().
+ */
+static int
+xlog_cil_order_write(
+	struct xfs_cil		*cil,
+	xfs_csn_t		sequence)
+{
+	struct xfs_cil_ctx	*ctx;
+
+restart:
+	spin_lock(&cil->xc_push_lock);
+	list_for_each_entry(ctx, &cil->xc_committing, committing) {
+		/*
+		 * Avoid getting stuck in this loop because we were woken by the
+		 * shutdown, but then went back to sleep once already in the
+		 * shutdown state.
+		 */
+		if (XLOG_FORCED_SHUTDOWN(cil->xc_log)) {
+			spin_unlock(&cil->xc_push_lock);
+			return -EIO;
+		}
+
+		/*
+		 * Higher sequences will wait for this one so skip them.
+		 * Don't wait for our own sequence, either.
+		 */
+		if (ctx->sequence >= sequence)
+			continue;
+		if (!ctx->commit_lsn) {
+			/*
+			 * It is still being pushed! Wait for the push to
+			 * complete, then start again from the beginning.
+			 */
+			xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
+			goto restart;
+		}
+	}
+	spin_unlock(&cil->xc_push_lock);
+	return 0;
+}
+
+/*
+ * Write out the commit record of a checkpoint transaction to close off a
+ * running log write. These commit records are strictly ordered in ascending CIL
+ * sequence order so that log recovery will always replay the checkpoints in the
+ * correct order.
  */
 int
 xlog_cil_write_commit_record(
@@ -816,6 +861,10 @@ xlog_cil_write_commit_record(
 	if (XLOG_FORCED_SHUTDOWN(log))
 		return -EIO;
 
+	error = xlog_cil_order_write(ctx->cil, ctx->sequence);
+	if (error)
+		return error;
+
 	/* account for space used by record data */
 	ctx->ticket->t_curr_res -= reg.i_len;
 	error = xlog_write(log, ctx, &lv_chain, ctx->ticket, iclog, reg.i_len);
@@ -1048,40 +1097,6 @@ xlog_cil_push_work(
 	if (error)
 		goto out_abort_free_ticket;
 
-	/*
-	 * now that we've written the checkpoint into the log, strictly
-	 * order the commit records so replay will get them in the right order.
-	 */
-restart:
-	spin_lock(&cil->xc_push_lock);
-	list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
-		/*
-		 * Avoid getting stuck in this loop because we were woken by the
-		 * shutdown, but then went back to sleep once already in the
-		 * shutdown state.
-		 */
-		if (XLOG_FORCED_SHUTDOWN(log)) {
-			spin_unlock(&cil->xc_push_lock);
-			goto out_abort_free_ticket;
-		}
-
-		/*
-		 * Higher sequences will wait for this one so skip them.
-		 * Don't wait for our own sequence, either.
-		 */
-		if (new_ctx->sequence >= ctx->sequence)
-			continue;
-		if (!new_ctx->commit_lsn) {
-			/*
-			 * It is still being pushed! Wait for the push to
-			 * complete, then start again from the beginning.
-			 */
-			xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
-			goto restart;
-		}
-	}
-	spin_unlock(&cil->xc_push_lock);
-
 	error = xlog_cil_write_commit_record(ctx, &commit_iclog);
 	if (error)
 		goto out_abort_free_ticket;
-- 
2.31.1



* [PATCH 6/8] xfs: separate out setting CIL context LSNs from xlog_write
  2021-06-17  8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
                   ` (4 preceding siblings ...)
  2021-06-17  8:26 ` [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work() Dave Chinner
@ 2021-06-17  8:26 ` Dave Chinner
  2021-06-17 20:28   ` Darrick J. Wong
  2021-06-17  8:26 ` [PATCH 7/8] xfs: attached iclog callbacks in xlog_cil_set_ctx_write_state() Dave Chinner
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-17  8:26 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

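Move the setting of the CIL context start and commit LSNs out of
xlog_write() and into a new helper, xlog_cil_set_ctx_write_state(),
in preparation for moving more CIL context specific functionality
into this helper.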

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c      | 17 ++---------------
 fs/xfs/xfs_log_cil.c  | 23 +++++++++++++++++++++++
 fs/xfs/xfs_log_priv.h |  2 ++
 3 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index fc0e43c57683..1c214b395223 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2408,21 +2408,8 @@ xlog_write(
 	if (error)
 		return error;
 
-	/*
-	 * If we have a CIL context, record the LSN of the iclog we were just
-	 * granted space to start writing into. If the context doesn't have
-	 * a start_lsn recorded, then this iclog will contain the start record
-	 * for the checkpoint. Otherwise this write contains the commit record
-	 * for the checkpoint.
-	 */
-	if (ctx) {
-		spin_lock(&ctx->cil->xc_push_lock);
-		if (!ctx->start_lsn)
-			ctx->start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
-		else
-			ctx->commit_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
-		spin_unlock(&ctx->cil->xc_push_lock);
-	}
+	if (ctx)
+		xlog_cil_set_ctx_write_state(ctx, iclog);
 
 	lv = list_first_entry_or_null(lv_chain, struct xfs_log_vec, lv_list);
 	while (lv) {
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index f993ec69fc97..2d8d904ffb78 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -783,6 +783,29 @@ xlog_cil_build_trans_hdr(
 	tic->t_curr_res -= lvhdr->lv_bytes;
 }
 
+/*
+ * Record the LSN of the iclog we were just granted space to start writing into.
+ * If the context doesn't have a start_lsn recorded, then this iclog will
+ * contain the start record for the checkpoint. Otherwise this write contains
+ * the commit record for the checkpoint.
+ */
+void
+xlog_cil_set_ctx_write_state(
+	struct xfs_cil_ctx	*ctx,
+	struct xlog_in_core	*iclog)
+{
+	struct xfs_cil		*cil = ctx->cil;
+	xfs_lsn_t		lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+
+	ASSERT(!ctx->commit_lsn);
+	spin_lock(&cil->xc_push_lock);
+	if (!ctx->start_lsn)
+		ctx->start_lsn = lsn;
+	else
+		ctx->commit_lsn = lsn;
+	spin_unlock(&cil->xc_push_lock);
+}
+
 /*
  * Ensure that the order of log writes follows checkpoint sequence order. This
  * relies on the context LSN being zero until the log write has guaranteed the
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index af8a9dfa8068..849ba2eb3483 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -563,6 +563,8 @@ void	xlog_cil_destroy(struct xlog *log);
 bool	xlog_cil_empty(struct xlog *log);
 void	xlog_cil_commit(struct xlog *log, struct xfs_trans *tp,
 			xfs_csn_t *commit_seq, bool regrant);
+void	xlog_cil_set_ctx_write_state(struct xfs_cil_ctx *ctx,
+			struct xlog_in_core *iclog);
 
 /*
  * CIL force routines
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 7/8] xfs: attached iclog callbacks in xlog_cil_set_ctx_write_state()
  2021-06-17  8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
                   ` (5 preceding siblings ...)
  2021-06-17  8:26 ` [PATCH 6/8] xfs: separate out setting CIL context LSNs from xlog_write Dave Chinner
@ 2021-06-17  8:26 ` Dave Chinner
  2021-06-17 20:55   ` Darrick J. Wong
  2021-06-17  8:26 ` [PATCH 8/8] xfs: order CIL checkpoint start records Dave Chinner
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-17  8:26 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We currently attach iclog callbacks for the CIL when the commit
iclog is returned from xlog_write. Because
xlog_state_get_iclog_space() always guarantees that the commit
record will fit in the iclog it returns, we can move this IO
callback setting to xlog_cil_set_ctx_write_state(), record the
commit iclog in the context and remove the need for the commit iclog
to be returned by xlog_write() altogether.
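
As a rough illustration of the reference handling described above,
here is a minimal userspace sketch. It is not kernel code: the
model_* names are hypothetical, and a plain int stands in for
iclog->ic_refcnt. It only shows why the context taking its own
reference keeps the commit iclog from going to IO when the writer
drops its reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct model_iclog {
	int	refcnt;		/* models iclog->ic_refcnt */
	int	submitted;	/* set once the last reference is dropped */
};

struct model_ctx {
	struct model_iclog	*commit_iclog;	/* models ctx->commit_iclog */
};

/* Drop one reference; the iclog only goes to IO when the count hits zero. */
static void
model_release(
	struct model_iclog	*iclog)
{
	assert(iclog->refcnt > 0);
	if (--iclog->refcnt == 0)
		iclog->submitted = 1;
}

/*
 * Model of writing a commit record: the writer holds a reference for the
 * duration of the write, the context takes an extra one, and the writer
 * then drops its own. The iclog stays pinned by the context.
 */
static void
model_write_commit(
	struct model_ctx	*ctx,
	struct model_iclog	*iclog)
{
	iclog->refcnt++;		/* reference held by the writer */
	iclog->refcnt++;		/* extra reference for the context */
	ctx->commit_iclog = iclog;
	model_release(iclog);		/* writer is done, context still holds it */
}
```

In this model the iclog is only marked as submitted once the context
calls model_release() on ctx->commit_iclog, which mirrors the intent
that the context controls when the iclog is released for IO.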


Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c      |  8 ++----
 fs/xfs/xfs_log_cil.c  | 65 +++++++++++++++++++++++++------------------
 fs/xfs/xfs_log_priv.h |  3 +-
 3 files changed, 42 insertions(+), 34 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 1c214b395223..359246d54db7 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -871,7 +871,7 @@ xlog_write_unmount_record(
 	 */
 	if (log->l_targ != log->l_mp->m_ddev_targp)
 		blkdev_issue_flush(log->l_targ->bt_bdev);
-	return xlog_write(log, NULL, &lv_chain, ticket, NULL, reg.i_len);
+	return xlog_write(log, NULL, &lv_chain, ticket, reg.i_len);
 }
 
 /*
@@ -2386,7 +2386,6 @@ xlog_write(
 	struct xfs_cil_ctx	*ctx,
 	struct list_head	*lv_chain,
 	struct xlog_ticket	*ticket,
-	struct xlog_in_core	**commit_iclog,
 	uint32_t		len)
 {
 	struct xlog_in_core	*iclog = NULL;
@@ -2436,10 +2435,7 @@ xlog_write(
 	 */
 	spin_lock(&log->l_icloglock);
 	xlog_state_finish_copy(log, iclog, record_cnt, 0);
-	if (commit_iclog)
-		*commit_iclog = iclog;
-	else
-		error = xlog_state_release_iclog(log, iclog, ticket);
+	error = xlog_state_release_iclog(log, iclog, ticket);
 	spin_unlock(&log->l_icloglock);
 
 	return error;
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 2d8d904ffb78..87e30917ce2e 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -799,11 +799,34 @@ xlog_cil_set_ctx_write_state(
 
 	ASSERT(!ctx->commit_lsn);
 	spin_lock(&cil->xc_push_lock);
-	if (!ctx->start_lsn)
+	if (!ctx->start_lsn) {
 		ctx->start_lsn = lsn;
-	else
-		ctx->commit_lsn = lsn;
+		spin_unlock(&cil->xc_push_lock);
+		return;
+	}
+
+	/*
+	 * Take a reference to the iclog for the context so that we still hold
+	 * it when xlog_write is done and has released it. This means the
+	 * context controls when the iclog is released for IO.
+	 */
+	atomic_inc(&iclog->ic_refcnt);
+	ctx->commit_iclog = iclog;
+	ctx->commit_lsn = lsn;
 	spin_unlock(&cil->xc_push_lock);
+
+	/*
+	 * xlog_state_get_iclog_space() guarantees there is enough space in the
+	 * iclog for an entire commit record, so attach the context callbacks to
+	 * the iclog at this time if we are not already in a shutdown state.
+	 */
+	spin_lock(&iclog->ic_callback_lock);
+	if (iclog->ic_state == XLOG_STATE_IOERROR) {
+		spin_unlock(&iclog->ic_callback_lock);
+		return;
+	}
+	list_add_tail(&ctx->iclog_entry, &iclog->ic_callbacks);
+	spin_unlock(&iclog->ic_callback_lock);
 }
 
 /*
@@ -858,8 +881,7 @@ xlog_cil_order_write(
  */
 int
 xlog_cil_write_commit_record(
-	struct xfs_cil_ctx	*ctx,
-	struct xlog_in_core	**iclog)
+	struct xfs_cil_ctx	*ctx)
 {
 	struct xlog		*log = ctx->cil->xc_log;
 	struct xlog_op_header	ophdr = {
@@ -890,7 +912,7 @@ xlog_cil_write_commit_record(
 
 	/* account for space used by record data */
 	ctx->ticket->t_curr_res -= reg.i_len;
-	error = xlog_write(log, ctx, &lv_chain, ctx->ticket, iclog, reg.i_len);
+	error = xlog_write(log, ctx, &lv_chain, ctx->ticket, reg.i_len);
 	if (error)
 		xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
 	return error;
@@ -940,7 +962,6 @@ xlog_cil_push_work(
 	struct xlog		*log = cil->xc_log;
 	struct xfs_log_vec	*lv;
 	struct xfs_cil_ctx	*new_ctx;
-	struct xlog_in_core	*commit_iclog;
 	int			num_iovecs = 0;
 	int			num_bytes = 0;
 	int			error = 0;
@@ -1109,8 +1130,7 @@ xlog_cil_push_work(
 	 * use the commit record lsn then we can move the tail beyond the grant
 	 * write head.
 	 */
-	error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
-				NULL, num_bytes);
+	error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket, num_bytes);
 
 	/*
 	 * Take the lvhdr back off the lv_chain as it should not be passed
@@ -1120,20 +1140,10 @@ xlog_cil_push_work(
 	if (error)
 		goto out_abort_free_ticket;
 
-	error = xlog_cil_write_commit_record(ctx, &commit_iclog);
+	error = xlog_cil_write_commit_record(ctx);
 	if (error)
 		goto out_abort_free_ticket;
 
-	spin_lock(&commit_iclog->ic_callback_lock);
-	if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
-		spin_unlock(&commit_iclog->ic_callback_lock);
-		goto out_abort_free_ticket;
-	}
-	ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
-		      commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
-	list_add_tail(&ctx->iclog_entry, &commit_iclog->ic_callbacks);
-	spin_unlock(&commit_iclog->ic_callback_lock);
-
 	/*
 	 * now the checkpoint commit is complete and we've attached the
 	 * callbacks to the iclog we can assign the commit LSN to the context
@@ -1168,8 +1178,8 @@ xlog_cil_push_work(
 	if (ctx->start_lsn != commit_lsn) {
 		struct xlog_in_core	*iclog;
 
-		for (iclog = commit_iclog->ic_prev;
-		     iclog != commit_iclog;
+		for (iclog = ctx->commit_iclog->ic_prev;
+		     iclog != ctx->commit_iclog;
 		     iclog = iclog->ic_prev) {
 			xfs_lsn_t	hlsn;
 
@@ -1201,7 +1211,7 @@ xlog_cil_push_work(
 		 * ordering for this checkpoint is correctly preserved down to
 		 * stable storage.
 		 */
-		commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
+		ctx->commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
 	}
 
 	/*
@@ -1214,10 +1224,11 @@ xlog_cil_push_work(
 	 * will be written when released, switch it's state to WANT_SYNC right
 	 * now.
 	 */
-	commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
-	if (push_commit_stable && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
-		xlog_state_switch_iclogs(log, commit_iclog, 0);
-	xlog_state_release_iclog(log, commit_iclog, ticket);
+	ctx->commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
+	if (push_commit_stable &&
+	    ctx->commit_iclog->ic_state == XLOG_STATE_ACTIVE)
+		xlog_state_switch_iclogs(log, ctx->commit_iclog, 0);
+	xlog_state_release_iclog(log, ctx->commit_iclog, ticket);
 	spin_unlock(&log->l_icloglock);
 
 	xfs_log_ticket_ungrant(log, ticket);
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 849ba2eb3483..72dfa3b89513 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -237,6 +237,7 @@ struct xfs_cil_ctx {
 	struct work_struct	discard_endio_work;
 	struct work_struct	push_work;
 	atomic_t		order_id;
+	struct xlog_in_core	*commit_iclog;
 };
 
 /*
@@ -489,7 +490,7 @@ void	xlog_print_tic_res(struct xfs_mount *mp, struct xlog_ticket *ticket);
 void	xlog_print_trans(struct xfs_trans *);
 int	xlog_write(struct xlog *log, struct xfs_cil_ctx *ctx,
 		struct list_head *lv_chain, struct xlog_ticket *tic,
-		struct xlog_in_core **commit_iclog, uint32_t len);
+		uint32_t len);
 
 void	xfs_log_ticket_ungrant(struct xlog *log, struct xlog_ticket *ticket);
 void	xfs_log_ticket_regrant(struct xlog *log, struct xlog_ticket *ticket);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 8/8] xfs: order CIL checkpoint start records
  2021-06-17  8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
                   ` (6 preceding siblings ...)
  2021-06-17  8:26 ` [PATCH 7/8] xfs: attached iclog callbacks in xlog_cil_set_ctx_write_state() Dave Chinner
@ 2021-06-17  8:26 ` Dave Chinner
  2021-06-17 21:31   ` Darrick J. Wong
  2021-06-17 18:32 ` [PATCH 0/8 V2] xfs: log fixes for for-next Brian Foster
  2021-06-18 22:48 ` Dave Chinner
  9 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-17  8:26 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Because log recovery depends on strictly ordered start records as
well as strictly ordered commit records.

This is a zero-day bug in the way XFS writes pipelined transactions
to the journal. It is exposed by commit facd77e4e38b ("xfs: CIL
work is serialised, not pipelined"), which re-introduces explicit
concurrent commits into the on-disk journal.

The XFS journal commit code has never ordered start records and we
have relied on strict commit record ordering for correct recovery
ordering of concurrently written transactions. Unfortunately, root
cause analysis uncovered the fact that log recovery uses the LSN of
the start record for transaction commit processing. Hence the
commits are processed in strict order by recovery, but the LSNs
associated with the commits can be out of order and so recovery may
stamp incorrect LSNs into objects and/or misorder intents in the AIL
for later processing. This can result in log recovery failures
and/or on disk corruption, sometimes silent.
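
To make the failure mode concrete, the sketch below walks a list of
checkpoints in commit record order and counts the places where the
start LSN goes backwards. This is a userspace illustration only: the
model_ckpt structure and the function name are hypothetical, and the
LSN values a caller passes in are made up rather than taken from a
real log.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one checkpoint as recovery would see it. */
struct model_ckpt {
	uint64_t	sequence;	/* CIL checkpoint sequence */
	uint64_t	start_lsn;	/* LSN of the start record */
	uint64_t	commit_lsn;	/* LSN of the commit record */
};

/*
 * The array is assumed to be sorted by commit record order, which is the
 * order recovery processes the commits in. Count adjacent pairs where the
 * later commit has an earlier start LSN, i.e. where the LSN associated
 * with the commit would be out of order.
 */
static size_t
model_count_start_inversions(
	const struct model_ckpt	*ckpts,
	size_t			nr)
{
	size_t	i;
	size_t	bad = 0;

	for (i = 1; i < nr; i++) {
		if (ckpts[i].start_lsn < ckpts[i - 1].start_lsn)
			bad++;
	}
	return bad;
}
```

With commit records ordered but start records unordered, sequence 2
can have a lower start LSN than sequence 1, and the function reports
one inversion for that pair.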

Because this is a long standing log recovery issue, we can't just
fix log recovery and call it good. This still leaves older kernels
susceptible to recovery failures and corruption when replaying a log
from a kernel that pipelines checkpoints. There is also the issue
that in-memory ordering for AIL pushing and data integrity
operations is based on checkpoint start LSNs, and if the start LSN
is incorrect in the journal, it is also incorrect in memory.

Hence there's really only one choice for fixing this zero-day bug:
we need to strictly order checkpoint start records in ascending
sequence order in the log, the same way we already strictly order
commit records.
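
The ordering rule can be sketched as a simple predicate: a context may
write a given record only once every lower-sequence context on the
committing list has a non-zero LSN for that same record type. The code
below is a userspace model under that assumption. The model_* names
are hypothetical, and the real code sleeps on a wait queue under
xc_push_lock and restarts the scan rather than returning a boolean.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; these names do not exist in the kernel. */
enum model_record {
	MODEL_START_RECORD,
	MODEL_COMMIT_RECORD,
};

struct model_cil_ctx {
	uint64_t	sequence;	/* checkpoint sequence number */
	uint64_t	start_lsn;	/* zero until the start record has an LSN */
	uint64_t	commit_lsn;	/* zero until the commit record has an LSN */
};

/*
 * Return true if the context with the given sequence must wait before
 * writing the given record type.
 */
static bool
model_must_wait(
	const struct model_cil_ctx	*committing,
	size_t				nr,
	uint64_t			sequence,
	enum model_record		record)
{
	size_t	i;

	for (i = 0; i < nr; i++) {
		uint64_t	lsn;

		/* Higher sequences wait for us; never wait for ourselves. */
		if (committing[i].sequence >= sequence)
			continue;

		lsn = (record == MODEL_START_RECORD) ?
				committing[i].start_lsn :
				committing[i].commit_lsn;
		if (!lsn)
			return true;
	}
	return false;
}
```

Using the same predicate for both record types is the point of the
change: start records become ordered by the same rule that already
orders commit records.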

Fixes: facd77e4e38b ("xfs: CIL work is serialised, not pipelined")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c      |   1 +
 fs/xfs/xfs_log_cil.c  | 101 +++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_log_priv.h |   1 +
 3 files changed, 71 insertions(+), 32 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 359246d54db7..94b6bccb9de9 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -3743,6 +3743,7 @@ xfs_log_force_umount(
 	 * avoid races.
 	 */
 	spin_lock(&log->l_cilp->xc_push_lock);
+	wake_up_all(&log->l_cilp->xc_start_wait);
 	wake_up_all(&log->l_cilp->xc_commit_wait);
 	spin_unlock(&log->l_cilp->xc_push_lock);
 	xlog_state_do_callback(log);
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 87e30917ce2e..722c21f21b81 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -684,6 +684,7 @@ xlog_cil_committed(
 	 */
 	if (abort) {
 		spin_lock(&ctx->cil->xc_push_lock);
+		wake_up_all(&ctx->cil->xc_start_wait);
 		wake_up_all(&ctx->cil->xc_commit_wait);
 		spin_unlock(&ctx->cil->xc_push_lock);
 	}
@@ -788,6 +789,10 @@ xlog_cil_build_trans_hdr(
  * If the context doesn't have a start_lsn recorded, then this iclog will
  * contain the start record for the checkpoint. Otherwise this write contains
  * the commit record for the checkpoint.
+ *
+ * Once we've set the LSN for the given operation, wake up any ordered write
+ * waiters that can make progress now that we have a stable LSN for write
+ * ordering purposes.
  */
 void
 xlog_cil_set_ctx_write_state(
@@ -798,9 +803,16 @@ xlog_cil_set_ctx_write_state(
 	xfs_lsn_t		lsn = be64_to_cpu(iclog->ic_header.h_lsn);
 
 	ASSERT(!ctx->commit_lsn);
-	spin_lock(&cil->xc_push_lock);
 	if (!ctx->start_lsn) {
+		spin_lock(&cil->xc_push_lock);
+		/*
+		 * The LSN we need to pass to the log items on transaction
+		 * commit is the LSN reported by the first log vector write, not
+		 * the commit lsn. If we use the commit record lsn then we can
+		 * move the tail beyond the grant write head.
+		 */
 		ctx->start_lsn = lsn;
+		wake_up_all(&cil->xc_start_wait);
 		spin_unlock(&cil->xc_push_lock);
 		return;
 	}
@@ -811,9 +823,6 @@ xlog_cil_set_ctx_write_state(
 	 * context controls when the iclog is released for IO.
 	 */
 	atomic_inc(&iclog->ic_refcnt);
-	ctx->commit_iclog = iclog;
-	ctx->commit_lsn = lsn;
-	spin_unlock(&cil->xc_push_lock);
 
 	/*
 	 * xlog_state_get_iclog_space() guarantees there is enough space in the
@@ -827,6 +836,12 @@ xlog_cil_set_ctx_write_state(
 	}
 	list_add_tail(&ctx->iclog_entry, &iclog->ic_callbacks);
 	spin_unlock(&iclog->ic_callback_lock);
+
+	spin_lock(&cil->xc_push_lock);
+	ctx->commit_iclog = iclog;
+	ctx->commit_lsn = lsn;
+	wake_up_all(&cil->xc_commit_wait);
+	spin_unlock(&cil->xc_push_lock);
 }
 
 /*
@@ -834,10 +849,16 @@ xlog_cil_set_ctx_write_state(
  * relies on the context LSN being zero until the log write has guaranteed the
  * LSN that the log write will start at via xlog_state_get_iclog_space().
  */
+enum {
+	_START_RECORD,
+	_COMMIT_RECORD,
+};
+
 static int
 xlog_cil_order_write(
 	struct xfs_cil		*cil,
-	xfs_csn_t		sequence)
+	xfs_csn_t		sequence,
+	int			record)
 {
 	struct xfs_cil_ctx	*ctx;
 
@@ -860,19 +881,50 @@ xlog_cil_order_write(
 		 */
 		if (ctx->sequence >= sequence)
 			continue;
-		if (!ctx->commit_lsn) {
-			/*
-			 * It is still being pushed! Wait for the push to
-			 * complete, then start again from the beginning.
-			 */
-			xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
-			goto restart;
+
+		/* Wait until the LSN for the record has been recorded. */
+		switch (record) {
+		case _START_RECORD:
+			if (!ctx->start_lsn) {
+				xlog_wait(&cil->xc_start_wait, &cil->xc_push_lock);
+				goto restart;
+			}
+			break;
+		case _COMMIT_RECORD:
+			if (!ctx->commit_lsn) {
+				xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
+				goto restart;
+			}
+			break;
+		default:
+			ASSERT(0);
+			break;
 		}
 	}
 	spin_unlock(&cil->xc_push_lock);
 	return 0;
 }
 
+/*
+ * Write out the log vector change now attached to the CIL context. This will
+ * write a start record that needs to be strictly ordered in ascending CIL
+ * sequence order so that log recovery will always use in-order start LSNs when
+ * replaying checkpoints.
+ */
+static int
+xlog_cil_write_chain(
+	struct xfs_cil_ctx	*ctx,
+	uint32_t		num_bytes)
+{
+	struct xlog		*log = ctx->cil->xc_log;
+	int			error;
+
+	error = xlog_cil_order_write(ctx->cil, ctx->sequence, _START_RECORD);
+	if (error)
+		return error;
+	return xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket, num_bytes);
+}
+
 /*
  * Write out the commit record of a checkpoint transaction to close off a
  * running log write. These commit records are strictly ordered in ascending CIL
@@ -906,7 +958,7 @@ xlog_cil_write_commit_record(
 	if (XLOG_FORCED_SHUTDOWN(log))
 		return -EIO;
 
-	error = xlog_cil_order_write(ctx->cil, ctx->sequence);
+	error = xlog_cil_order_write(ctx->cil, ctx->sequence, _COMMIT_RECORD);
 	if (error)
 		return error;
 
@@ -1125,17 +1177,10 @@ xlog_cil_push_work(
 	wait_for_completion(&bdev_flush);
 
 	/*
-	 * The LSN we need to pass to the log items on transaction commit is the
-	 * LSN reported by the first log vector write, not the commit lsn. If we
-	 * use the commit record lsn then we can move the tail beyond the grant
-	 * write head.
-	 */
-	error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket, num_bytes);
-
-	/*
-	 * Take the lvhdr back off the lv_chain as it should not be passed
-	 * to log IO completion.
+	 * Once we write the log vector chain, take the lvhdr back off it as it
+	 * must not be passed to log IO completion.
 	 */
+	error = xlog_cil_write_chain(ctx, num_bytes);
 	list_del(&lvhdr.lv_list);
 	if (error)
 		goto out_abort_free_ticket;
@@ -1144,15 +1189,6 @@ xlog_cil_push_work(
 	if (error)
 		goto out_abort_free_ticket;
 
-	/*
-	 * now the checkpoint commit is complete and we've attached the
-	 * callbacks to the iclog we can assign the commit LSN to the context
-	 * and wake up anyone who is waiting for the commit to complete.
-	 */
-	spin_lock(&cil->xc_push_lock);
-	wake_up_all(&cil->xc_commit_wait);
-	spin_unlock(&cil->xc_push_lock);
-
 	/*
 	 * Pull the ticket off the ctx so we can ungrant it after releasing the
 	 * commit_iclog. The ctx may be freed by the time we return from
@@ -1728,6 +1764,7 @@ xlog_cil_init(
 	init_waitqueue_head(&cil->xc_push_wait);
 	init_rwsem(&cil->xc_ctx_lock);
 	init_waitqueue_head(&cil->xc_commit_wait);
+	init_waitqueue_head(&cil->xc_start_wait);
 	log->l_cilp = cil;
 
 	ctx = xlog_cil_ctx_alloc();
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 72dfa3b89513..b807a179b916 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -279,6 +279,7 @@ struct xfs_cil {
 	bool			xc_push_commit_stable;
 	struct list_head	xc_committing;
 	wait_queue_head_t	xc_commit_wait;
+	wait_queue_head_t	xc_start_wait;
 	xfs_csn_t		xc_current_sequence;
 	wait_queue_head_t	xc_push_wait;	/* background push throttle */
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c
  2021-06-17  8:26 ` [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c Dave Chinner
@ 2021-06-17 12:57     ` kernel test robot
  2021-06-17 17:50   ` Darrick J. Wong
  2021-06-18 14:16   ` Christoph Hellwig
  2 siblings, 0 replies; 50+ messages in thread
From: kernel test robot @ 2021-06-17 12:57 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs; +Cc: kbuild-all, clang-built-linux

[-- Attachment #1: Type: text/plain, Size: 3475 bytes --]

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on next-20210616]
[cannot apply to v5.13-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: x86_64-randconfig-a011-20210617 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 64720f57bea6a6bf033feef4a5751ab9c0c3b401)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/8634f301cb32bdc5ebbfcf0671509ca5fa857edd
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
        git checkout 8634f301cb32bdc5ebbfcf0671509ca5fa857edd
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/xfs/xfs_log_cil.c:792:1: warning: no previous prototype for function 'xlog_cil_write_commit_record' [-Wmissing-prototypes]
   xlog_cil_write_commit_record(
   ^
   fs/xfs/xfs_log_cil.c:791:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int
   ^
   static 
   1 warning generated.
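
For reference, -Wmissing-prototypes is generally resolved in one of
two ways, sketched below with hypothetical function names. Which one
applies to xlog_cil_write_commit_record() depends on whether it is
meant to be called from outside xfs_log_cil.c; that is not something
this report settles.

```c
/*
 * Option 1: the function is called from another file, so a prototype
 * must be visible before the definition. It would normally live in a
 * shared header; it is placed inline here to keep the sketch short.
 */
int model_write_commit_record(int len);

int
model_write_commit_record(
	int	len)
{
	return len > 0 ? 0 : -1;
}

/*
 * Option 2: the function is only used in this translation unit, so it
 * is declared static and needs no separate prototype.
 */
static int
model_local_helper(
	int	len)
{
	return model_write_commit_record(len);
}
```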


vim +/xlog_cil_write_commit_record +792 fs/xfs/xfs_log_cil.c

   785	
   786	/*
   787	 * Write out the commit record of a checkpoint transaction associated with the
   788	 * given ticket to close off a running log write. Return the lsn of the commit
   789	 * record.
   790	 */
   791	int
 > 792	xlog_cil_write_commit_record(
   793		struct xlog		*log,
   794		struct xlog_ticket	*ticket,
   795		struct xlog_in_core	**iclog,
   796		xfs_lsn_t		*lsn)
   797	{
   798		struct xlog_op_header	ophdr = {
   799			.oh_clientid = XFS_TRANSACTION,
   800			.oh_tid = cpu_to_be32(ticket->t_tid),
   801			.oh_flags = XLOG_COMMIT_TRANS,
   802		};
   803		struct xfs_log_iovec reg = {
   804			.i_addr = &ophdr,
   805			.i_len = sizeof(struct xlog_op_header),
   806			.i_type = XLOG_REG_TYPE_COMMIT,
   807		};
   808		struct xfs_log_vec vec = {
   809			.lv_niovecs = 1,
   810			.lv_iovecp = &reg,
   811		};
   812		int	error;
   813		LIST_HEAD(lv_chain);
   814		INIT_LIST_HEAD(&vec.lv_list);
   815		list_add(&vec.lv_list, &lv_chain);
   816	
   817		if (XLOG_FORCED_SHUTDOWN(log))
   818			return -EIO;
   819	
   820		/* account for space used by record data */
   821		ticket->t_curr_res -= reg.i_len;
   822		error = xlog_write(log, &lv_chain, ticket, lsn, iclog, reg.i_len);
   823		if (error)
   824			xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
   825		return error;
   826	}
   827	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34451 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
  2021-06-17  8:26 ` [PATCH 4/8] xfs: pass a CIL context to xlog_write() Dave Chinner
@ 2021-06-17 14:46     ` kernel test robot
  2021-06-17 20:24   ` Darrick J. Wong
  2021-06-18 14:23   ` Christoph Hellwig
  2 siblings, 0 replies; 50+ messages in thread
From: kernel test robot @ 2021-06-17 14:46 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs; +Cc: kbuild-all, clang-built-linux

[-- Attachment #1: Type: text/plain, Size: 33372 bytes --]

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on next-20210617]
[cannot apply to v5.13-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: x86_64-randconfig-a011-20210617 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 64720f57bea6a6bf033feef4a5751ab9c0c3b401)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/fc3370002b56bcb25440b96ef5099f508c48360e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
        git checkout fc3370002b56bcb25440b96ef5099f508c48360e
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   fs/xfs/xfs_log_cil.c:792:1: warning: no previous prototype for function 'xlog_cil_write_commit_record' [-Wmissing-prototypes]
   xlog_cil_write_commit_record(
   ^
   fs/xfs/xfs_log_cil.c:791:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int
   ^
   static 
>> fs/xfs/xfs_log_cil.c:1130:24: warning: variable 'commit_lsn' is uninitialized when used here [-Wuninitialized]
           if (ctx->start_lsn != commit_lsn) {
                                 ^~~~~~~~~~
   fs/xfs/xfs_log_cil.c:877:23: note: initialize the variable 'commit_lsn' to silence this warning
           xfs_lsn_t               commit_lsn;
                                             ^
                                              = 0
   2 warnings generated.


vim +/commit_lsn +1130 fs/xfs/xfs_log_cil.c

be05dd0e68ac999 Dave Chinner      2021-06-08   846  
71e330b593905e4 Dave Chinner      2010-05-21   847  /*
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   848   * Push the Committed Item List to the log.
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   849   *
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   850   * If the current sequence is the same as xc_push_seq we need to do a flush. If
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   851   * xc_push_seq is less than the current sequence, then it has already been
a44f13edf0ebb4e Dave Chinner      2010-08-24   852   * flushed and we don't need to do anything - the caller will wait for it to
a44f13edf0ebb4e Dave Chinner      2010-08-24   853   * complete if necessary.
a44f13edf0ebb4e Dave Chinner      2010-08-24   854   *
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   855   * xc_push_seq is checked unlocked against the sequence number for a match.
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   856   * Hence we can allow log forces to run racily and not issue pushes for the
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   857   * same sequence twice.  If we get a race between multiple pushes for the same
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   858   * sequence they will block on the first one and then abort, hence avoiding
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   859   * needless pushes.
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   860   */
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   861  static void
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   862  xlog_cil_push_work(
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   863  	struct work_struct	*work)
71e330b593905e4 Dave Chinner      2010-05-21   864  {
facd77e4e38b8f0 Dave Chinner      2021-06-04   865  	struct xfs_cil_ctx	*ctx =
facd77e4e38b8f0 Dave Chinner      2021-06-04   866  		container_of(work, struct xfs_cil_ctx, push_work);
facd77e4e38b8f0 Dave Chinner      2021-06-04   867  	struct xfs_cil		*cil = ctx->cil;
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   868  	struct xlog		*log = cil->xc_log;
71e330b593905e4 Dave Chinner      2010-05-21   869  	struct xfs_log_vec	*lv;
71e330b593905e4 Dave Chinner      2010-05-21   870  	struct xfs_cil_ctx	*new_ctx;
71e330b593905e4 Dave Chinner      2010-05-21   871  	struct xlog_in_core	*commit_iclog;
66fc9ffa8638be2 Dave Chinner      2021-06-04   872  	int			num_iovecs = 0;
66fc9ffa8638be2 Dave Chinner      2021-06-04   873  	int			num_bytes = 0;
71e330b593905e4 Dave Chinner      2010-05-21   874  	int			error = 0;
877cf3473914ae4 Dave Chinner      2021-06-04   875  	struct xlog_cil_trans_hdr thdr;
a47518453bf9581 Dave Chinner      2021-06-08   876  	struct xfs_log_vec	lvhdr = {};
71e330b593905e4 Dave Chinner      2010-05-21   877  	xfs_lsn_t		commit_lsn;
4c2d542f2e78653 Dave Chinner      2012-04-23   878  	xfs_lsn_t		push_seq;
0279bbbbc03f2ce Dave Chinner      2021-06-03   879  	struct bio		bio;
0279bbbbc03f2ce Dave Chinner      2021-06-03   880  	DECLARE_COMPLETION_ONSTACK(bdev_flush);
e12213ba5d909a3 Dave Chinner      2021-06-04   881  	bool			push_commit_stable;
e469cbe84f4ade9 Dave Chinner      2021-06-08   882  	struct xlog_ticket	*ticket;
71e330b593905e4 Dave Chinner      2010-05-21   883  
facd77e4e38b8f0 Dave Chinner      2021-06-04   884  	new_ctx = xlog_cil_ctx_alloc();
71e330b593905e4 Dave Chinner      2010-05-21   885  	new_ctx->ticket = xlog_cil_ticket_alloc(log);
71e330b593905e4 Dave Chinner      2010-05-21   886  
71e330b593905e4 Dave Chinner      2010-05-21   887  	down_write(&cil->xc_ctx_lock);
71e330b593905e4 Dave Chinner      2010-05-21   888  
4bb928cdb900d06 Dave Chinner      2013-08-12   889  	spin_lock(&cil->xc_push_lock);
4c2d542f2e78653 Dave Chinner      2012-04-23   890  	push_seq = cil->xc_push_seq;
4c2d542f2e78653 Dave Chinner      2012-04-23   891  	ASSERT(push_seq <= ctx->sequence);
e12213ba5d909a3 Dave Chinner      2021-06-04   892  	push_commit_stable = cil->xc_push_commit_stable;
e12213ba5d909a3 Dave Chinner      2021-06-04   893  	cil->xc_push_commit_stable = false;
71e330b593905e4 Dave Chinner      2010-05-21   894  
0e7ab7efe77451c Dave Chinner      2020-03-24   895  	/*
3682277520d6f4a Dave Chinner      2021-06-04   896  	 * As we are about to switch to a new, empty CIL context, we no longer
3682277520d6f4a Dave Chinner      2021-06-04   897  	 * need to throttle tasks on CIL space overruns. Wake any waiters that
3682277520d6f4a Dave Chinner      2021-06-04   898  	 * the hard push throttle may have caught so they can start committing
3682277520d6f4a Dave Chinner      2021-06-04   899  	 * to the new context. The ctx->xc_push_lock provides the serialisation
3682277520d6f4a Dave Chinner      2021-06-04   900  	 * necessary for safely using the lockless waitqueue_active() check in
3682277520d6f4a Dave Chinner      2021-06-04   901  	 * this context.
3682277520d6f4a Dave Chinner      2021-06-04   902  	 */
3682277520d6f4a Dave Chinner      2021-06-04   903  	if (waitqueue_active(&cil->xc_push_wait))
c7f87f3984cfa1e Dave Chinner      2020-06-16   904  		wake_up_all(&cil->xc_push_wait);
0e7ab7efe77451c Dave Chinner      2020-03-24   905  
4c2d542f2e78653 Dave Chinner      2012-04-23   906  	/*
4c2d542f2e78653 Dave Chinner      2012-04-23   907  	 * Check if we've anything to push. If there is nothing, then we don't
4c2d542f2e78653 Dave Chinner      2012-04-23   908  	 * move on to a new sequence number and so we have to be able to push
4c2d542f2e78653 Dave Chinner      2012-04-23   909  	 * this sequence again later.
4c2d542f2e78653 Dave Chinner      2012-04-23   910  	 */
0d11bae4bcf4aa9 Dave Chinner      2021-06-04   911  	if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) {
4c2d542f2e78653 Dave Chinner      2012-04-23   912  		cil->xc_push_seq = 0;
4bb928cdb900d06 Dave Chinner      2013-08-12   913  		spin_unlock(&cil->xc_push_lock);
a44f13edf0ebb4e Dave Chinner      2010-08-24   914  		goto out_skip;
4c2d542f2e78653 Dave Chinner      2012-04-23   915  	}
4c2d542f2e78653 Dave Chinner      2012-04-23   916  
a44f13edf0ebb4e Dave Chinner      2010-08-24   917  
cf085a1b5d22144 Joe Perches       2019-11-07   918  	/* check for a previously pushed sequence */
facd77e4e38b8f0 Dave Chinner      2021-06-04   919  	if (push_seq < ctx->sequence) {
8af3dcd3c89aef1 Dave Chinner      2014-09-23   920  		spin_unlock(&cil->xc_push_lock);
df806158b0f6eb2 Dave Chinner      2010-05-17   921  		goto out_skip;
8af3dcd3c89aef1 Dave Chinner      2014-09-23   922  	}
8af3dcd3c89aef1 Dave Chinner      2014-09-23   923  
8af3dcd3c89aef1 Dave Chinner      2014-09-23   924  	/*
8af3dcd3c89aef1 Dave Chinner      2014-09-23   925  	 * We are now going to push this context, so add it to the committing
8af3dcd3c89aef1 Dave Chinner      2014-09-23   926  	 * list before we do anything else. This ensures that anyone waiting on
8af3dcd3c89aef1 Dave Chinner      2014-09-23   927  	 * this push can easily detect the difference between a "push in
8af3dcd3c89aef1 Dave Chinner      2014-09-23   928  	 * progress" and "CIL is empty, nothing to do".
8af3dcd3c89aef1 Dave Chinner      2014-09-23   929  	 *
8af3dcd3c89aef1 Dave Chinner      2014-09-23   930  	 * IOWs, a wait loop can now check for:
8af3dcd3c89aef1 Dave Chinner      2014-09-23   931  	 *	the current sequence not being found on the committing list;
8af3dcd3c89aef1 Dave Chinner      2014-09-23   932  	 *	an empty CIL; and
8af3dcd3c89aef1 Dave Chinner      2014-09-23   933  	 *	an unchanged sequence number
8af3dcd3c89aef1 Dave Chinner      2014-09-23   934  	 * to detect a push that had nothing to do and therefore does not need
8af3dcd3c89aef1 Dave Chinner      2014-09-23   935  	 * waiting on. If the CIL is not empty, we get put on the committing
8af3dcd3c89aef1 Dave Chinner      2014-09-23   936  	 * list before emptying the CIL and bumping the sequence number. Hence
8af3dcd3c89aef1 Dave Chinner      2014-09-23   937  	 * an empty CIL and an unchanged sequence number means we jumped out
8af3dcd3c89aef1 Dave Chinner      2014-09-23   938  	 * above after doing nothing.
8af3dcd3c89aef1 Dave Chinner      2014-09-23   939  	 *
8af3dcd3c89aef1 Dave Chinner      2014-09-23   940  	 * Hence the waiter will either find the commit sequence on the
8af3dcd3c89aef1 Dave Chinner      2014-09-23   941  	 * committing list or the sequence number will be unchanged and the CIL
8af3dcd3c89aef1 Dave Chinner      2014-09-23   942  	 * still dirty. In that latter case, the push has not yet started, and
8af3dcd3c89aef1 Dave Chinner      2014-09-23   943  	 * so the waiter will have to continue trying to check the CIL
8af3dcd3c89aef1 Dave Chinner      2014-09-23   944  	 * committing list until it is found. In extreme cases of delay, the
8af3dcd3c89aef1 Dave Chinner      2014-09-23   945  	 * sequence may fully commit between the attempts the wait makes to wait
8af3dcd3c89aef1 Dave Chinner      2014-09-23   946  	 * on the commit sequence.
8af3dcd3c89aef1 Dave Chinner      2014-09-23   947  	 */
8af3dcd3c89aef1 Dave Chinner      2014-09-23   948  	list_add(&ctx->committing, &cil->xc_committing);
8af3dcd3c89aef1 Dave Chinner      2014-09-23   949  	spin_unlock(&cil->xc_push_lock);
df806158b0f6eb2 Dave Chinner      2010-05-17   950  
71e330b593905e4 Dave Chinner      2010-05-21   951  	/*
0279bbbbc03f2ce Dave Chinner      2021-06-03   952  	 * The CIL is stable at this point - nothing new will be added to it
0279bbbbc03f2ce Dave Chinner      2021-06-03   953  	 * because we hold the flush lock exclusively. Hence we can now issue
0279bbbbc03f2ce Dave Chinner      2021-06-03   954  	 * a cache flush to ensure all the completed metadata in the journal we
0279bbbbc03f2ce Dave Chinner      2021-06-03   955  	 * are about to overwrite is on stable storage.
0279bbbbc03f2ce Dave Chinner      2021-06-03   956  	 */
0279bbbbc03f2ce Dave Chinner      2021-06-03   957  	xfs_flush_bdev_async(&bio, log->l_mp->m_ddev_targp->bt_bdev,
0279bbbbc03f2ce Dave Chinner      2021-06-03   958  				&bdev_flush);
0279bbbbc03f2ce Dave Chinner      2021-06-03   959  
a8613836d99e627 Dave Chinner      2021-06-08   960  	xlog_cil_pcp_aggregate(cil, ctx);
a8613836d99e627 Dave Chinner      2021-06-08   961  
1f18c0c4b78cfb1 Dave Chinner      2021-06-08   962  	while (!list_empty(&ctx->log_items)) {
71e330b593905e4 Dave Chinner      2010-05-21   963  		struct xfs_log_item	*item;
71e330b593905e4 Dave Chinner      2010-05-21   964  
1f18c0c4b78cfb1 Dave Chinner      2021-06-08   965  		item = list_first_entry(&ctx->log_items,
71e330b593905e4 Dave Chinner      2010-05-21   966  					struct xfs_log_item, li_cil);
a47518453bf9581 Dave Chinner      2021-06-08   967  		lv = item->li_lv;
a1785f597c8b060 Dave Chinner      2021-06-08   968  		lv->lv_order_id = item->li_order_id;
a47518453bf9581 Dave Chinner      2021-06-08   969  		num_iovecs += lv->lv_niovecs;
66fc9ffa8638be2 Dave Chinner      2021-06-04   970  		/* we don't write ordered log vectors */
66fc9ffa8638be2 Dave Chinner      2021-06-04   971  		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
66fc9ffa8638be2 Dave Chinner      2021-06-04   972  			num_bytes += lv->lv_bytes;
a47518453bf9581 Dave Chinner      2021-06-08   973  
a47518453bf9581 Dave Chinner      2021-06-08   974  		list_add_tail(&lv->lv_list, &ctx->lv_chain);
a1785f597c8b060 Dave Chinner      2021-06-08   975  		list_del_init(&item->li_cil);
a1785f597c8b060 Dave Chinner      2021-06-08   976  		item->li_order_id = 0;
a1785f597c8b060 Dave Chinner      2021-06-08   977  		item->li_lv = NULL;
71e330b593905e4 Dave Chinner      2010-05-21   978  	}
71e330b593905e4 Dave Chinner      2010-05-21   979  
71e330b593905e4 Dave Chinner      2010-05-21   980  	/*
facd77e4e38b8f0 Dave Chinner      2021-06-04   981  	 * Switch the contexts so we can drop the context lock and move out
71e330b593905e4 Dave Chinner      2010-05-21   982  	 * of a shared context. We can't just go straight to the commit record,
71e330b593905e4 Dave Chinner      2010-05-21   983  	 * though - we need to synchronise with previous and future commits so
71e330b593905e4 Dave Chinner      2010-05-21   984  	 * that the commit records are correctly ordered in the log to ensure
71e330b593905e4 Dave Chinner      2010-05-21   985  	 * that we process items during log IO completion in the correct order.
71e330b593905e4 Dave Chinner      2010-05-21   986  	 *
71e330b593905e4 Dave Chinner      2010-05-21   987  	 * For example, if we get an EFI in one checkpoint and the EFD in the
71e330b593905e4 Dave Chinner      2010-05-21   988  	 * next (e.g. due to log forces), we do not want the checkpoint with
71e330b593905e4 Dave Chinner      2010-05-21   989  	 * the EFD to be committed before the checkpoint with the EFI.  Hence
71e330b593905e4 Dave Chinner      2010-05-21   990  	 * we must strictly order the commit records of the checkpoints so
71e330b593905e4 Dave Chinner      2010-05-21   991  	 * that: a) the checkpoint callbacks are attached to the iclogs in the
71e330b593905e4 Dave Chinner      2010-05-21   992  	 * correct order; and b) the checkpoints are replayed in correct order
71e330b593905e4 Dave Chinner      2010-05-21   993  	 * in log recovery.
71e330b593905e4 Dave Chinner      2010-05-21   994  	 *
71e330b593905e4 Dave Chinner      2010-05-21   995  	 * Hence we need to add this context to the committing context list so
71e330b593905e4 Dave Chinner      2010-05-21   996  	 * that higher sequences will wait for us to write out a commit record
71e330b593905e4 Dave Chinner      2010-05-21   997  	 * before they do.
f876e44603ad091 Dave Chinner      2014-02-27   998  	 *
f39ae5297c5ce2f Dave Chinner      2021-06-04   999  	 * xfs_log_force_seq requires us to mirror the new sequence into the cil
f876e44603ad091 Dave Chinner      2014-02-27  1000  	 * structure atomically with the addition of this sequence to the
f876e44603ad091 Dave Chinner      2014-02-27  1001  	 * committing list. This also ensures that we can do unlocked checks
f876e44603ad091 Dave Chinner      2014-02-27  1002  	 * against the current sequence in log forces without risking
f876e44603ad091 Dave Chinner      2014-02-27  1003  	 * dereferencing a freed context pointer.
71e330b593905e4 Dave Chinner      2010-05-21  1004  	 */
4bb928cdb900d06 Dave Chinner      2013-08-12  1005  	spin_lock(&cil->xc_push_lock);
facd77e4e38b8f0 Dave Chinner      2021-06-04  1006  	xlog_cil_ctx_switch(cil, new_ctx);
4bb928cdb900d06 Dave Chinner      2013-08-12  1007  	spin_unlock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1008  	up_write(&cil->xc_ctx_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1009  
a1785f597c8b060 Dave Chinner      2021-06-08  1010  	/*
a1785f597c8b060 Dave Chinner      2021-06-08  1011  	 * Sort the log vector chain before we add the transaction headers.
a1785f597c8b060 Dave Chinner      2021-06-08  1012  	 * This ensures we always have the transaction headers at the start
a1785f597c8b060 Dave Chinner      2021-06-08  1013  	 * of the chain.
a1785f597c8b060 Dave Chinner      2021-06-08  1014  	 */
a1785f597c8b060 Dave Chinner      2021-06-08  1015  	list_sort(NULL, &ctx->lv_chain, xlog_cil_order_cmp);
a1785f597c8b060 Dave Chinner      2021-06-08  1016  
71e330b593905e4 Dave Chinner      2010-05-21  1017  	/*
71e330b593905e4 Dave Chinner      2010-05-21  1018  	 * Build a checkpoint transaction header and write it to the log to
71e330b593905e4 Dave Chinner      2010-05-21  1019  	 * begin the transaction. We need to account for the space used by the
71e330b593905e4 Dave Chinner      2010-05-21  1020  	 * transaction header here as it is not accounted for in xlog_write().
a47518453bf9581 Dave Chinner      2021-06-08  1021  	 * Add the lvhdr to the head of the lv chain we pass to xlog_write() so
a47518453bf9581 Dave Chinner      2021-06-08  1022  	 * it gets written into the iclog first.
71e330b593905e4 Dave Chinner      2010-05-21  1023  	 */
877cf3473914ae4 Dave Chinner      2021-06-04  1024  	xlog_cil_build_trans_hdr(ctx, &thdr, &lvhdr, num_iovecs);
66fc9ffa8638be2 Dave Chinner      2021-06-04  1025  	num_bytes += lvhdr.lv_bytes;
a47518453bf9581 Dave Chinner      2021-06-08  1026  	list_add(&lvhdr.lv_list, &ctx->lv_chain);
71e330b593905e4 Dave Chinner      2010-05-21  1027  
0279bbbbc03f2ce Dave Chinner      2021-06-03  1028  	/*
0279bbbbc03f2ce Dave Chinner      2021-06-03  1029  	 * Before we format and submit the first iclog, we have to ensure that
0279bbbbc03f2ce Dave Chinner      2021-06-03  1030  	 * the metadata writeback ordering cache flush is complete.
0279bbbbc03f2ce Dave Chinner      2021-06-03  1031  	 */
0279bbbbc03f2ce Dave Chinner      2021-06-03  1032  	wait_for_completion(&bdev_flush);
0279bbbbc03f2ce Dave Chinner      2021-06-03  1033  
877cf3473914ae4 Dave Chinner      2021-06-04  1034  	/*
877cf3473914ae4 Dave Chinner      2021-06-04  1035  	 * The LSN we need to pass to the log items on transaction commit is the
877cf3473914ae4 Dave Chinner      2021-06-04  1036  	 * LSN reported by the first log vector write, not the commit lsn. If we
877cf3473914ae4 Dave Chinner      2021-06-04  1037  	 * use the commit record lsn then we can move the tail beyond the grant
877cf3473914ae4 Dave Chinner      2021-06-04  1038  	 * write head.
877cf3473914ae4 Dave Chinner      2021-06-04  1039  	 */
fc3370002b56bcb Dave Chinner      2021-06-17  1040  	error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
a47518453bf9581 Dave Chinner      2021-06-08  1041  				NULL, num_bytes);
a47518453bf9581 Dave Chinner      2021-06-08  1042  
a47518453bf9581 Dave Chinner      2021-06-08  1043  	/*
a47518453bf9581 Dave Chinner      2021-06-08  1044  	 * Take the lvhdr back off the lv_chain as it should not be passed
a47518453bf9581 Dave Chinner      2021-06-08  1045  	 * to log IO completion.
a47518453bf9581 Dave Chinner      2021-06-08  1046  	 */
a47518453bf9581 Dave Chinner      2021-06-08  1047  	list_del(&lvhdr.lv_list);
71e330b593905e4 Dave Chinner      2010-05-21  1048  	if (error)
7db37c5e6575b22 Dave Chinner      2011-01-27  1049  		goto out_abort_free_ticket;
71e330b593905e4 Dave Chinner      2010-05-21  1050  
71e330b593905e4 Dave Chinner      2010-05-21  1051  	/*
71e330b593905e4 Dave Chinner      2010-05-21  1052  	 * now that we've written the checkpoint into the log, strictly
71e330b593905e4 Dave Chinner      2010-05-21  1053  	 * order the commit records so replay will get them in the right order.
71e330b593905e4 Dave Chinner      2010-05-21  1054  	 */
71e330b593905e4 Dave Chinner      2010-05-21  1055  restart:
4bb928cdb900d06 Dave Chinner      2013-08-12  1056  	spin_lock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1057  	list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
ac983517ec5941d Dave Chinner      2014-05-07  1058  		/*
ac983517ec5941d Dave Chinner      2014-05-07  1059  		 * Avoid getting stuck in this loop because we were woken by the
ac983517ec5941d Dave Chinner      2014-05-07  1060  		 * shutdown, but then went back to sleep once already in the
ac983517ec5941d Dave Chinner      2014-05-07  1061  		 * shutdown state.
ac983517ec5941d Dave Chinner      2014-05-07  1062  		 */
ac983517ec5941d Dave Chinner      2014-05-07  1063  		if (XLOG_FORCED_SHUTDOWN(log)) {
ac983517ec5941d Dave Chinner      2014-05-07  1064  			spin_unlock(&cil->xc_push_lock);
ac983517ec5941d Dave Chinner      2014-05-07  1065  			goto out_abort_free_ticket;
ac983517ec5941d Dave Chinner      2014-05-07  1066  		}
ac983517ec5941d Dave Chinner      2014-05-07  1067  
71e330b593905e4 Dave Chinner      2010-05-21  1068  		/*
71e330b593905e4 Dave Chinner      2010-05-21  1069  		 * Higher sequences will wait for this one so skip them.
ac983517ec5941d Dave Chinner      2014-05-07  1070  		 * Don't wait for our own sequence, either.
71e330b593905e4 Dave Chinner      2010-05-21  1071  		 */
71e330b593905e4 Dave Chinner      2010-05-21  1072  		if (new_ctx->sequence >= ctx->sequence)
71e330b593905e4 Dave Chinner      2010-05-21  1073  			continue;
71e330b593905e4 Dave Chinner      2010-05-21  1074  		if (!new_ctx->commit_lsn) {
71e330b593905e4 Dave Chinner      2010-05-21  1075  			/*
71e330b593905e4 Dave Chinner      2010-05-21  1076  			 * It is still being pushed! Wait for the push to
71e330b593905e4 Dave Chinner      2010-05-21  1077  			 * complete, then start again from the beginning.
71e330b593905e4 Dave Chinner      2010-05-21  1078  			 */
4bb928cdb900d06 Dave Chinner      2013-08-12  1079  			xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1080  			goto restart;
71e330b593905e4 Dave Chinner      2010-05-21  1081  		}
71e330b593905e4 Dave Chinner      2010-05-21  1082  	}
4bb928cdb900d06 Dave Chinner      2013-08-12  1083  	spin_unlock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1084  
fc3370002b56bcb Dave Chinner      2021-06-17  1085  	error = xlog_cil_write_commit_record(ctx, &commit_iclog);
dd401770b0ff68f Dave Chinner      2020-03-25  1086  	if (error)
dd401770b0ff68f Dave Chinner      2020-03-25  1087  		goto out_abort_free_ticket;
dd401770b0ff68f Dave Chinner      2020-03-25  1088  
89ae379d564c5d8 Christoph Hellwig 2019-06-28  1089  	spin_lock(&commit_iclog->ic_callback_lock);
1858bb0bec612df Christoph Hellwig 2019-10-14  1090  	if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
89ae379d564c5d8 Christoph Hellwig 2019-06-28  1091  		spin_unlock(&commit_iclog->ic_callback_lock);
e469cbe84f4ade9 Dave Chinner      2021-06-08  1092  		goto out_abort_free_ticket;
89ae379d564c5d8 Christoph Hellwig 2019-06-28  1093  	}
89ae379d564c5d8 Christoph Hellwig 2019-06-28  1094  	ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
89ae379d564c5d8 Christoph Hellwig 2019-06-28  1095  		      commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
89ae379d564c5d8 Christoph Hellwig 2019-06-28  1096  	list_add_tail(&ctx->iclog_entry, &commit_iclog->ic_callbacks);
89ae379d564c5d8 Christoph Hellwig 2019-06-28  1097  	spin_unlock(&commit_iclog->ic_callback_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1098  
71e330b593905e4 Dave Chinner      2010-05-21  1099  	/*
71e330b593905e4 Dave Chinner      2010-05-21  1100  	 * now the checkpoint commit is complete and we've attached the
71e330b593905e4 Dave Chinner      2010-05-21  1101  	 * callbacks to the iclog we can assign the commit LSN to the context
71e330b593905e4 Dave Chinner      2010-05-21  1102  	 * and wake up anyone who is waiting for the commit to complete.
71e330b593905e4 Dave Chinner      2010-05-21  1103  	 */
4bb928cdb900d06 Dave Chinner      2013-08-12  1104  	spin_lock(&cil->xc_push_lock);
eb40a87500ac2f6 Dave Chinner      2010-12-21  1105  	wake_up_all(&cil->xc_commit_wait);
4bb928cdb900d06 Dave Chinner      2013-08-12  1106  	spin_unlock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1107  
e469cbe84f4ade9 Dave Chinner      2021-06-08  1108  	/*
e469cbe84f4ade9 Dave Chinner      2021-06-08  1109  	 * Pull the ticket off the ctx so we can ungrant it after releasing the
e469cbe84f4ade9 Dave Chinner      2021-06-08  1110  	 * commit_iclog. The ctx may be freed by the time we return from
e469cbe84f4ade9 Dave Chinner      2021-06-08  1111  	 * releasing the commit_iclog (i.e. checkpoint has been completed and
e469cbe84f4ade9 Dave Chinner      2021-06-08  1112  	 * callback run) so we can't reference the ctx after the call to
e469cbe84f4ade9 Dave Chinner      2021-06-08  1113  	 * xlog_state_release_iclog().
e469cbe84f4ade9 Dave Chinner      2021-06-08  1114  	 */
e469cbe84f4ade9 Dave Chinner      2021-06-08  1115  	ticket = ctx->ticket;
e469cbe84f4ade9 Dave Chinner      2021-06-08  1116  
5fd9256ce156ef7 Dave Chinner      2021-06-03  1117  	/*
815753dc16bbca2 Dave Chinner      2021-06-17  1118  	 * If the checkpoint spans multiple iclogs, wait for all previous iclogs
815753dc16bbca2 Dave Chinner      2021-06-17  1119  	 * to complete before we submit the commit_iclog. We can't use state
815753dc16bbca2 Dave Chinner      2021-06-17  1120  	 * checks for this - ACTIVE can be either a past completed iclog or a
815753dc16bbca2 Dave Chinner      2021-06-17  1121  	 * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a
815753dc16bbca2 Dave Chinner      2021-06-17  1122  	 * past or future iclog awaiting IO or ordered IO completion to be run.
815753dc16bbca2 Dave Chinner      2021-06-17  1123  	 * In the latter case, if it's a future iclog and we wait on it, then we
815753dc16bbca2 Dave Chinner      2021-06-17  1124  	 * will hang because it won't get processed through to ic_force_wait
815753dc16bbca2 Dave Chinner      2021-06-17  1125  	 * wakeup until this commit_iclog is written to disk.  Hence we use the
815753dc16bbca2 Dave Chinner      2021-06-17  1126  	 * iclog header lsn and compare it to the commit lsn to determine if we
815753dc16bbca2 Dave Chinner      2021-06-17  1127  	 * need to wait on iclogs or not.
5fd9256ce156ef7 Dave Chinner      2021-06-03  1128  	 */
5fd9256ce156ef7 Dave Chinner      2021-06-03  1129  	spin_lock(&log->l_icloglock);
cb1acb3f3246368 Dave Chinner      2021-06-04 @1130  	if (ctx->start_lsn != commit_lsn) {
815753dc16bbca2 Dave Chinner      2021-06-17  1131  		struct xlog_in_core	*iclog;
815753dc16bbca2 Dave Chinner      2021-06-17  1132  
815753dc16bbca2 Dave Chinner      2021-06-17  1133  		for (iclog = commit_iclog->ic_prev;
815753dc16bbca2 Dave Chinner      2021-06-17  1134  		     iclog != commit_iclog;
815753dc16bbca2 Dave Chinner      2021-06-17  1135  		     iclog = iclog->ic_prev) {
815753dc16bbca2 Dave Chinner      2021-06-17  1136  			xfs_lsn_t	hlsn;
815753dc16bbca2 Dave Chinner      2021-06-17  1137  
815753dc16bbca2 Dave Chinner      2021-06-17  1138  			/*
815753dc16bbca2 Dave Chinner      2021-06-17  1139  			 * If the LSN of the iclog is zero or in the future it
815753dc16bbca2 Dave Chinner      2021-06-17  1140  			 * means it has passed through IO completion and
815753dc16bbca2 Dave Chinner      2021-06-17  1141  			 * activation and hence all previous iclogs have also
815753dc16bbca2 Dave Chinner      2021-06-17  1142  			 * done so. We do not need to wait at all in this case.
815753dc16bbca2 Dave Chinner      2021-06-17  1143  			 */
815753dc16bbca2 Dave Chinner      2021-06-17  1144  			hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
815753dc16bbca2 Dave Chinner      2021-06-17  1145  			if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
815753dc16bbca2 Dave Chinner      2021-06-17  1146  				break;
815753dc16bbca2 Dave Chinner      2021-06-17  1147  
815753dc16bbca2 Dave Chinner      2021-06-17  1148  			/*
815753dc16bbca2 Dave Chinner      2021-06-17  1149  			 * If the LSN of the iclog is older than the commit lsn,
815753dc16bbca2 Dave Chinner      2021-06-17  1150  			 * we have to wait on it. Waiting on this via the
815753dc16bbca2 Dave Chinner      2021-06-17  1151  			 * ic_force_wait should also order the completion of all
815753dc16bbca2 Dave Chinner      2021-06-17  1152  			 * older iclogs, too, but we leave checking that to the
815753dc16bbca2 Dave Chinner      2021-06-17  1153  			 * next loop iteration.
815753dc16bbca2 Dave Chinner      2021-06-17  1154  			 */
815753dc16bbca2 Dave Chinner      2021-06-17  1155  			ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
815753dc16bbca2 Dave Chinner      2021-06-17  1156  			xlog_wait_on_iclog(iclog);
cb1acb3f3246368 Dave Chinner      2021-06-04  1157  			spin_lock(&log->l_icloglock);
815753dc16bbca2 Dave Chinner      2021-06-17  1158  		}
815753dc16bbca2 Dave Chinner      2021-06-17  1159  
815753dc16bbca2 Dave Chinner      2021-06-17  1160  		/*
815753dc16bbca2 Dave Chinner      2021-06-17  1161  		 * Regardless of whether we need to wait or not, the
815753dc16bbca2 Dave Chinner      2021-06-17  1162  		 * commit_iclog write needs to issue a pre-flush so that the
815753dc16bbca2 Dave Chinner      2021-06-17  1163  		 * ordering for this checkpoint is correctly preserved down to
815753dc16bbca2 Dave Chinner      2021-06-17  1164  		 * stable storage.
815753dc16bbca2 Dave Chinner      2021-06-17  1165  		 */
cb1acb3f3246368 Dave Chinner      2021-06-04  1166  		commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
5fd9256ce156ef7 Dave Chinner      2021-06-03  1167  	}
5fd9256ce156ef7 Dave Chinner      2021-06-03  1168  
cb1acb3f3246368 Dave Chinner      2021-06-04  1169  	/*
cb1acb3f3246368 Dave Chinner      2021-06-04  1170  	 * The commit iclog must be written to stable storage to guarantee
cb1acb3f3246368 Dave Chinner      2021-06-04  1171  	 * journal IO vs metadata writeback IO is correctly ordered on stable
cb1acb3f3246368 Dave Chinner      2021-06-04  1172  	 * storage.
e12213ba5d909a3 Dave Chinner      2021-06-04  1173  	 *
e12213ba5d909a3 Dave Chinner      2021-06-04  1174  	 * If the push caller needs the commit to be immediately stable and the
e12213ba5d909a3 Dave Chinner      2021-06-04  1175  	 * commit_iclog is not yet marked as XLOG_STATE_WANT_SYNC to indicate it
e12213ba5d909a3 Dave Chinner      2021-06-04  1176  	 * will be written when released, switch its state to WANT_SYNC right
e12213ba5d909a3 Dave Chinner      2021-06-04  1177  	 * now.
cb1acb3f3246368 Dave Chinner      2021-06-04  1178  	 */
cb1acb3f3246368 Dave Chinner      2021-06-04  1179  	commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
e12213ba5d909a3 Dave Chinner      2021-06-04  1180  	if (push_commit_stable && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
e12213ba5d909a3 Dave Chinner      2021-06-04  1181  		xlog_state_switch_iclogs(log, commit_iclog, 0);
e469cbe84f4ade9 Dave Chinner      2021-06-08  1182  	xlog_state_release_iclog(log, commit_iclog, ticket);
cb1acb3f3246368 Dave Chinner      2021-06-04  1183  	spin_unlock(&log->l_icloglock);
e469cbe84f4ade9 Dave Chinner      2021-06-08  1184  
e469cbe84f4ade9 Dave Chinner      2021-06-08  1185  	xfs_log_ticket_ungrant(log, ticket);
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20  1186  	return;
71e330b593905e4 Dave Chinner      2010-05-21  1187  
71e330b593905e4 Dave Chinner      2010-05-21  1188  out_skip:
71e330b593905e4 Dave Chinner      2010-05-21  1189  	up_write(&cil->xc_ctx_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1190  	xfs_log_ticket_put(new_ctx->ticket);
71e330b593905e4 Dave Chinner      2010-05-21  1191  	kmem_free(new_ctx);
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20  1192  	return;
71e330b593905e4 Dave Chinner      2010-05-21  1193  
7db37c5e6575b22 Dave Chinner      2011-01-27  1194  out_abort_free_ticket:
877cf3473914ae4 Dave Chinner      2021-06-04  1195  	xfs_log_ticket_ungrant(log, ctx->ticket);
12e6a0f449d585b Christoph Hellwig 2020-03-20  1196  	ASSERT(XLOG_FORCED_SHUTDOWN(log));
12e6a0f449d585b Christoph Hellwig 2020-03-20  1197  	xlog_cil_committed(ctx);
4c2d542f2e78653 Dave Chinner      2012-04-23  1198  }
4c2d542f2e78653 Dave Chinner      2012-04-23  1199  
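
The past/future iclog detection described in the comment at lines 1117-1127 of the listing can be modelled outside the kernel. The following is a rough sketch under stated assumptions: an LSN is treated as a 64-bit value with the cycle number in the upper 32 bits and the block number in the lower 32 bits, the ring is a plain array, and waiting is replaced by counting. All demo_* names are hypothetical; this is not the XFS implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an iclog: only what the backwards walk needs. */
struct demo_iclog {
	struct demo_iclog	*prev;	/* previous buffer in the ring */
	uint64_t		hlsn;	/* header LSN; 0 models a reactivated iclog */
};

/* Link n array elements into a ring so that r[i].prev == r[i - 1]. */
static void
demo_ring_init(struct demo_iclog *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		r[i].prev = &r[(i + n - 1) % n];
		r[i].hlsn = 0;
	}
}

/* Order two LSNs by cycle first, then by block within the cycle. */
static int
demo_lsn_cmp(uint64_t a, uint64_t b)
{
	uint32_t	ca = (uint32_t)(a >> 32), cb = (uint32_t)(b >> 32);
	uint32_t	ba = (uint32_t)a, bb = (uint32_t)b;

	if (ca != cb)
		return ca < cb ? -1 : 1;
	if (ba != bb)
		return ba < bb ? -1 : 1;
	return 0;
}

/*
 * Walk backwards from the commit iclog. A zero or newer header LSN means
 * that buffer, and everything before it, has already completed, so the
 * walk stops. An older header LSN is one the caller would have to wait on.
 */
static int
demo_count_waits(const struct demo_iclog *commit, uint64_t commit_lsn)
{
	int	waits = 0;

	for (const struct demo_iclog *ic = commit->prev; ic != commit;
	     ic = ic->prev) {
		if (!ic->hlsn || demo_lsn_cmp(ic->hlsn, commit_lsn) > 0)
			break;
		/* Mirrors the ASSERT in the listing: strictly older here. */
		assert(demo_lsn_cmp(ic->hlsn, commit_lsn) < 0);
		waits++;	/* the listing calls xlog_wait_on_iclog() here */
	}
	return waits;
}
```

With a four-entry ring where the two buffers before the commit iclog carry older LSNs and the remaining one is zero, demo_count_waits() reports two waits. If the buffer immediately before the commit iclog carries an LSN from a later cycle, the walk stops at once and reports none.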

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34451 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
@ 2021-06-17 14:46     ` kernel test robot
  0 siblings, 0 replies; 50+ messages in thread
From: kernel test robot @ 2021-06-17 14:46 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 33783 bytes --]

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on next-20210617]
[cannot apply to v5.13-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: x86_64-randconfig-a011-20210617 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 64720f57bea6a6bf033feef4a5751ab9c0c3b401)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/fc3370002b56bcb25440b96ef5099f508c48360e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
        git checkout fc3370002b56bcb25440b96ef5099f508c48360e
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   fs/xfs/xfs_log_cil.c:792:1: warning: no previous prototype for function 'xlog_cil_write_commit_record' [-Wmissing-prototypes]
   xlog_cil_write_commit_record(
   ^
   fs/xfs/xfs_log_cil.c:791:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int
   ^
   static 
>> fs/xfs/xfs_log_cil.c:1130:24: warning: variable 'commit_lsn' is uninitialized when used here [-Wuninitialized]
           if (ctx->start_lsn != commit_lsn) {
                                 ^~~~~~~~~~
   fs/xfs/xfs_log_cil.c:877:23: note: initialize the variable 'commit_lsn' to silence this warning
           xfs_lsn_t               commit_lsn;
                                             ^
                                              = 0
   2 warnings generated.
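
For readers following along, the second warning comes down to a local variable being read before any path has assigned it. Below is a minimal sketch in plain C of one way to avoid that pattern: the helper reports the LSN through an out-parameter, so the later comparison always sees a defined value. All names here (`lsn_t`, `write_commit_record`, `checkpoint_spans_iclogs`) are hypothetical stand-ins, not kernel functions, and this is not the fix used in the series. The helper is also declared `static`, which is what the first warning suggests for a function that is only used within one translation unit.

```c
#include <stdint.h>

typedef int64_t lsn_t;	/* hypothetical stand-in for xfs_lsn_t */

/*
 * Hypothetical stand-in for writing a commit record. It reports the LSN
 * the record landed at through an out-parameter. Declared static because
 * it is only used inside this translation unit.
 */
static int write_commit_record(lsn_t record_lsn, lsn_t *commit_lsn)
{
	*commit_lsn = record_lsn;
	return 0;
}

/*
 * Returns 1 if the checkpoint's start and commit records have different
 * LSNs, 0 if they match, and -1 if the commit record write failed.
 */
static int checkpoint_spans_iclogs(lsn_t start_lsn, lsn_t record_lsn)
{
	lsn_t commit_lsn;	/* assigned by the helper before it is read */

	if (write_commit_record(record_lsn, &commit_lsn))
		return -1;
	return start_lsn != commit_lsn;
}
```

Initialising the variable to 0, as the compiler note proposes, would silence the warning but would still compare against a placeholder rather than the real commit LSN, which is why the sketch assigns it from the write instead.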


vim +/commit_lsn +1130 fs/xfs/xfs_log_cil.c

be05dd0e68ac999 Dave Chinner      2021-06-08   846  
71e330b593905e4 Dave Chinner      2010-05-21   847  /*
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   848   * Push the Committed Item List to the log.
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   849   *
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   850   * If the current sequence is the same as xc_push_seq we need to do a flush. If
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   851   * xc_push_seq is less than the current sequence, then it has already been
a44f13edf0ebb4e Dave Chinner      2010-08-24   852   * flushed and we don't need to do anything - the caller will wait for it to
a44f13edf0ebb4e Dave Chinner      2010-08-24   853   * complete if necessary.
a44f13edf0ebb4e Dave Chinner      2010-08-24   854   *
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   855   * xc_push_seq is checked unlocked against the sequence number for a match.
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   856   * Hence we can allow log forces to run racily and not issue pushes for the
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   857   * same sequence twice.  If we get a race between multiple pushes for the same
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   858   * sequence they will block on the first one and then abort, hence avoiding
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   859   * needless pushes.
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   860   */
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   861  static void
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   862  xlog_cil_push_work(
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   863  	struct work_struct	*work)
71e330b593905e4 Dave Chinner      2010-05-21   864  {
facd77e4e38b8f0 Dave Chinner      2021-06-04   865  	struct xfs_cil_ctx	*ctx =
facd77e4e38b8f0 Dave Chinner      2021-06-04   866  		container_of(work, struct xfs_cil_ctx, push_work);
facd77e4e38b8f0 Dave Chinner      2021-06-04   867  	struct xfs_cil		*cil = ctx->cil;
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20   868  	struct xlog		*log = cil->xc_log;
71e330b593905e4 Dave Chinner      2010-05-21   869  	struct xfs_log_vec	*lv;
71e330b593905e4 Dave Chinner      2010-05-21   870  	struct xfs_cil_ctx	*new_ctx;
71e330b593905e4 Dave Chinner      2010-05-21   871  	struct xlog_in_core	*commit_iclog;
66fc9ffa8638be2 Dave Chinner      2021-06-04   872  	int			num_iovecs = 0;
66fc9ffa8638be2 Dave Chinner      2021-06-04   873  	int			num_bytes = 0;
71e330b593905e4 Dave Chinner      2010-05-21   874  	int			error = 0;
877cf3473914ae4 Dave Chinner      2021-06-04   875  	struct xlog_cil_trans_hdr thdr;
a47518453bf9581 Dave Chinner      2021-06-08   876  	struct xfs_log_vec	lvhdr = {};
71e330b593905e4 Dave Chinner      2010-05-21   877  	xfs_lsn_t		commit_lsn;
4c2d542f2e78653 Dave Chinner      2012-04-23   878  	xfs_lsn_t		push_seq;
0279bbbbc03f2ce Dave Chinner      2021-06-03   879  	struct bio		bio;
0279bbbbc03f2ce Dave Chinner      2021-06-03   880  	DECLARE_COMPLETION_ONSTACK(bdev_flush);
e12213ba5d909a3 Dave Chinner      2021-06-04   881  	bool			push_commit_stable;
e469cbe84f4ade9 Dave Chinner      2021-06-08   882  	struct xlog_ticket	*ticket;
71e330b593905e4 Dave Chinner      2010-05-21   883  
facd77e4e38b8f0 Dave Chinner      2021-06-04   884  	new_ctx = xlog_cil_ctx_alloc();
71e330b593905e4 Dave Chinner      2010-05-21   885  	new_ctx->ticket = xlog_cil_ticket_alloc(log);
71e330b593905e4 Dave Chinner      2010-05-21   886  
71e330b593905e4 Dave Chinner      2010-05-21   887  	down_write(&cil->xc_ctx_lock);
71e330b593905e4 Dave Chinner      2010-05-21   888  
4bb928cdb900d06 Dave Chinner      2013-08-12   889  	spin_lock(&cil->xc_push_lock);
4c2d542f2e78653 Dave Chinner      2012-04-23   890  	push_seq = cil->xc_push_seq;
4c2d542f2e78653 Dave Chinner      2012-04-23   891  	ASSERT(push_seq <= ctx->sequence);
e12213ba5d909a3 Dave Chinner      2021-06-04   892  	push_commit_stable = cil->xc_push_commit_stable;
e12213ba5d909a3 Dave Chinner      2021-06-04   893  	cil->xc_push_commit_stable = false;
71e330b593905e4 Dave Chinner      2010-05-21   894  
0e7ab7efe77451c Dave Chinner      2020-03-24   895  	/*
3682277520d6f4a Dave Chinner      2021-06-04   896  	 * As we are about to switch to a new, empty CIL context, we no longer
3682277520d6f4a Dave Chinner      2021-06-04   897  	 * need to throttle tasks on CIL space overruns. Wake any waiters that
3682277520d6f4a Dave Chinner      2021-06-04   898  	 * the hard push throttle may have caught so they can start committing
3682277520d6f4a Dave Chinner      2021-06-04   899  	 * to the new context. The ctx->xc_push_lock provides the serialisation
3682277520d6f4a Dave Chinner      2021-06-04   900  	 * necessary for safely using the lockless waitqueue_active() check in
3682277520d6f4a Dave Chinner      2021-06-04   901  	 * this context.
3682277520d6f4a Dave Chinner      2021-06-04   902  	 */
3682277520d6f4a Dave Chinner      2021-06-04   903  	if (waitqueue_active(&cil->xc_push_wait))
c7f87f3984cfa1e Dave Chinner      2020-06-16   904  		wake_up_all(&cil->xc_push_wait);
0e7ab7efe77451c Dave Chinner      2020-03-24   905  
4c2d542f2e78653 Dave Chinner      2012-04-23   906  	/*
4c2d542f2e78653 Dave Chinner      2012-04-23   907  	 * Check if we've anything to push. If there is nothing, then we don't
4c2d542f2e78653 Dave Chinner      2012-04-23   908  	 * move on to a new sequence number and so we have to be able to push
4c2d542f2e78653 Dave Chinner      2012-04-23   909  	 * this sequence again later.
4c2d542f2e78653 Dave Chinner      2012-04-23   910  	 */
0d11bae4bcf4aa9 Dave Chinner      2021-06-04   911  	if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) {
4c2d542f2e78653 Dave Chinner      2012-04-23   912  		cil->xc_push_seq = 0;
4bb928cdb900d06 Dave Chinner      2013-08-12   913  		spin_unlock(&cil->xc_push_lock);
a44f13edf0ebb4e Dave Chinner      2010-08-24   914  		goto out_skip;
4c2d542f2e78653 Dave Chinner      2012-04-23   915  	}
4c2d542f2e78653 Dave Chinner      2012-04-23   916  
a44f13edf0ebb4e Dave Chinner      2010-08-24   917  
cf085a1b5d22144 Joe Perches       2019-11-07   918  	/* check for a previously pushed sequence */
facd77e4e38b8f0 Dave Chinner      2021-06-04   919  	if (push_seq < ctx->sequence) {
8af3dcd3c89aef1 Dave Chinner      2014-09-23   920  		spin_unlock(&cil->xc_push_lock);
df806158b0f6eb2 Dave Chinner      2010-05-17   921  		goto out_skip;
8af3dcd3c89aef1 Dave Chinner      2014-09-23   922  	}
8af3dcd3c89aef1 Dave Chinner      2014-09-23   923  
8af3dcd3c89aef1 Dave Chinner      2014-09-23   924  	/*
8af3dcd3c89aef1 Dave Chinner      2014-09-23   925  	 * We are now going to push this context, so add it to the committing
8af3dcd3c89aef1 Dave Chinner      2014-09-23   926  	 * list before we do anything else. This ensures that anyone waiting on
8af3dcd3c89aef1 Dave Chinner      2014-09-23   927  	 * this push can easily detect the difference between a "push in
8af3dcd3c89aef1 Dave Chinner      2014-09-23   928  	 * progress" and "CIL is empty, nothing to do".
8af3dcd3c89aef1 Dave Chinner      2014-09-23   929  	 *
8af3dcd3c89aef1 Dave Chinner      2014-09-23   930  	 * IOWs, a wait loop can now check for:
8af3dcd3c89aef1 Dave Chinner      2014-09-23   931  	 *	the current sequence not being found on the committing list;
8af3dcd3c89aef1 Dave Chinner      2014-09-23   932  	 *	an empty CIL; and
8af3dcd3c89aef1 Dave Chinner      2014-09-23   933  	 *	an unchanged sequence number
8af3dcd3c89aef1 Dave Chinner      2014-09-23   934  	 * to detect a push that had nothing to do and therefore does not need
8af3dcd3c89aef1 Dave Chinner      2014-09-23   935  	 * waiting on. If the CIL is not empty, we get put on the committing
8af3dcd3c89aef1 Dave Chinner      2014-09-23   936  	 * list before emptying the CIL and bumping the sequence number. Hence
8af3dcd3c89aef1 Dave Chinner      2014-09-23   937  	 * an empty CIL and an unchanged sequence number means we jumped out
8af3dcd3c89aef1 Dave Chinner      2014-09-23   938  	 * above after doing nothing.
8af3dcd3c89aef1 Dave Chinner      2014-09-23   939  	 *
8af3dcd3c89aef1 Dave Chinner      2014-09-23   940  	 * Hence the waiter will either find the commit sequence on the
8af3dcd3c89aef1 Dave Chinner      2014-09-23   941  	 * committing list or the sequence number will be unchanged and the CIL
8af3dcd3c89aef1 Dave Chinner      2014-09-23   942  	 * still dirty. In that latter case, the push has not yet started, and
8af3dcd3c89aef1 Dave Chinner      2014-09-23   943  	 * so the waiter will have to continue trying to check the CIL
8af3dcd3c89aef1 Dave Chinner      2014-09-23   944  	 * committing list until it is found. In extreme cases of delay, the
8af3dcd3c89aef1 Dave Chinner      2014-09-23   945  	 * sequence may fully commit between the attempts the wait makes to wait
8af3dcd3c89aef1 Dave Chinner      2014-09-23   946  	 * on the commit sequence.
8af3dcd3c89aef1 Dave Chinner      2014-09-23   947  	 */
8af3dcd3c89aef1 Dave Chinner      2014-09-23   948  	list_add(&ctx->committing, &cil->xc_committing);
8af3dcd3c89aef1 Dave Chinner      2014-09-23   949  	spin_unlock(&cil->xc_push_lock);
df806158b0f6eb2 Dave Chinner      2010-05-17   950  
71e330b593905e4 Dave Chinner      2010-05-21   951  	/*
0279bbbbc03f2ce Dave Chinner      2021-06-03   952  	 * The CIL is stable at this point - nothing new will be added to it
0279bbbbc03f2ce Dave Chinner      2021-06-03   953  	 * because we hold the flush lock exclusively. Hence we can now issue
0279bbbbc03f2ce Dave Chinner      2021-06-03   954  	 * a cache flush to ensure all the completed metadata in the journal we
0279bbbbc03f2ce Dave Chinner      2021-06-03   955  	 * are about to overwrite is on stable storage.
0279bbbbc03f2ce Dave Chinner      2021-06-03   956  	 */
0279bbbbc03f2ce Dave Chinner      2021-06-03   957  	xfs_flush_bdev_async(&bio, log->l_mp->m_ddev_targp->bt_bdev,
0279bbbbc03f2ce Dave Chinner      2021-06-03   958  				&bdev_flush);
0279bbbbc03f2ce Dave Chinner      2021-06-03   959  
a8613836d99e627 Dave Chinner      2021-06-08   960  	xlog_cil_pcp_aggregate(cil, ctx);
a8613836d99e627 Dave Chinner      2021-06-08   961  
1f18c0c4b78cfb1 Dave Chinner      2021-06-08   962  	while (!list_empty(&ctx->log_items)) {
71e330b593905e4 Dave Chinner      2010-05-21   963  		struct xfs_log_item	*item;
71e330b593905e4 Dave Chinner      2010-05-21   964  
1f18c0c4b78cfb1 Dave Chinner      2021-06-08   965  		item = list_first_entry(&ctx->log_items,
71e330b593905e4 Dave Chinner      2010-05-21   966  					struct xfs_log_item, li_cil);
a47518453bf9581 Dave Chinner      2021-06-08   967  		lv = item->li_lv;
a1785f597c8b060 Dave Chinner      2021-06-08   968  		lv->lv_order_id = item->li_order_id;
a47518453bf9581 Dave Chinner      2021-06-08   969  		num_iovecs += lv->lv_niovecs;
66fc9ffa8638be2 Dave Chinner      2021-06-04   970  		/* we don't write ordered log vectors */
66fc9ffa8638be2 Dave Chinner      2021-06-04   971  		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
66fc9ffa8638be2 Dave Chinner      2021-06-04   972  			num_bytes += lv->lv_bytes;
a47518453bf9581 Dave Chinner      2021-06-08   973  
a47518453bf9581 Dave Chinner      2021-06-08   974  		list_add_tail(&lv->lv_list, &ctx->lv_chain);
a1785f597c8b060 Dave Chinner      2021-06-08   975  		list_del_init(&item->li_cil);
a1785f597c8b060 Dave Chinner      2021-06-08   976  		item->li_order_id = 0;
a1785f597c8b060 Dave Chinner      2021-06-08   977  		item->li_lv = NULL;
71e330b593905e4 Dave Chinner      2010-05-21   978  	}
71e330b593905e4 Dave Chinner      2010-05-21   979  
71e330b593905e4 Dave Chinner      2010-05-21   980  	/*
facd77e4e38b8f0 Dave Chinner      2021-06-04   981  	 * Switch the contexts so we can drop the context lock and move out
71e330b593905e4 Dave Chinner      2010-05-21   982  	 * of a shared context. We can't just go straight to the commit record,
71e330b593905e4 Dave Chinner      2010-05-21   983  	 * though - we need to synchronise with previous and future commits so
71e330b593905e4 Dave Chinner      2010-05-21   984  	 * that the commit records are correctly ordered in the log to ensure
71e330b593905e4 Dave Chinner      2010-05-21   985  	 * that we process items during log IO completion in the correct order.
71e330b593905e4 Dave Chinner      2010-05-21   986  	 *
71e330b593905e4 Dave Chinner      2010-05-21   987  	 * For example, if we get an EFI in one checkpoint and the EFD in the
71e330b593905e4 Dave Chinner      2010-05-21   988  	 * next (e.g. due to log forces), we do not want the checkpoint with
71e330b593905e4 Dave Chinner      2010-05-21   989  	 * the EFD to be committed before the checkpoint with the EFI.  Hence
71e330b593905e4 Dave Chinner      2010-05-21   990  	 * we must strictly order the commit records of the checkpoints so
71e330b593905e4 Dave Chinner      2010-05-21   991  	 * that: a) the checkpoint callbacks are attached to the iclogs in the
71e330b593905e4 Dave Chinner      2010-05-21   992  	 * correct order; and b) the checkpoints are replayed in correct order
71e330b593905e4 Dave Chinner      2010-05-21   993  	 * in log recovery.
71e330b593905e4 Dave Chinner      2010-05-21   994  	 *
71e330b593905e4 Dave Chinner      2010-05-21   995  	 * Hence we need to add this context to the committing context list so
71e330b593905e4 Dave Chinner      2010-05-21   996  	 * that higher sequences will wait for us to write out a commit record
71e330b593905e4 Dave Chinner      2010-05-21   997  	 * before they do.
f876e44603ad091 Dave Chinner      2014-02-27   998  	 *
f39ae5297c5ce2f Dave Chinner      2021-06-04   999  	 * xfs_log_force_seq requires us to mirror the new sequence into the cil
f876e44603ad091 Dave Chinner      2014-02-27  1000  	 * structure atomically with the addition of this sequence to the
f876e44603ad091 Dave Chinner      2014-02-27  1001  	 * committing list. This also ensures that we can do unlocked checks
f876e44603ad091 Dave Chinner      2014-02-27  1002  	 * against the current sequence in log forces without risking
f876e44603ad091 Dave Chinner      2014-02-27  1003  	 * dereferencing a freed context pointer.
71e330b593905e4 Dave Chinner      2010-05-21  1004  	 */
4bb928cdb900d06 Dave Chinner      2013-08-12  1005  	spin_lock(&cil->xc_push_lock);
facd77e4e38b8f0 Dave Chinner      2021-06-04  1006  	xlog_cil_ctx_switch(cil, new_ctx);
4bb928cdb900d06 Dave Chinner      2013-08-12  1007  	spin_unlock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1008  	up_write(&cil->xc_ctx_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1009  
a1785f597c8b060 Dave Chinner      2021-06-08  1010  	/*
a1785f597c8b060 Dave Chinner      2021-06-08  1011  	 * Sort the log vector chain before we add the transaction headers.
a1785f597c8b060 Dave Chinner      2021-06-08  1012  	 * This ensures we always have the transaction headers at the start
a1785f597c8b060 Dave Chinner      2021-06-08  1013  	 * of the chain.
a1785f597c8b060 Dave Chinner      2021-06-08  1014  	 */
a1785f597c8b060 Dave Chinner      2021-06-08  1015  	list_sort(NULL, &ctx->lv_chain, xlog_cil_order_cmp);
a1785f597c8b060 Dave Chinner      2021-06-08  1016  
71e330b593905e4 Dave Chinner      2010-05-21  1017  	/*
71e330b593905e4 Dave Chinner      2010-05-21  1018  	 * Build a checkpoint transaction header and write it to the log to
71e330b593905e4 Dave Chinner      2010-05-21  1019  	 * begin the transaction. We need to account for the space used by the
71e330b593905e4 Dave Chinner      2010-05-21  1020  	 * transaction header here as it is not accounted for in xlog_write().
a47518453bf9581 Dave Chinner      2021-06-08  1021  	 * Add the lvhdr to the head of the lv chain we pass to xlog_write() so
a47518453bf9581 Dave Chinner      2021-06-08  1022  	 * it gets written into the iclog first.
71e330b593905e4 Dave Chinner      2010-05-21  1023  	 */
877cf3473914ae4 Dave Chinner      2021-06-04  1024  	xlog_cil_build_trans_hdr(ctx, &thdr, &lvhdr, num_iovecs);
66fc9ffa8638be2 Dave Chinner      2021-06-04  1025  	num_bytes += lvhdr.lv_bytes;
a47518453bf9581 Dave Chinner      2021-06-08  1026  	list_add(&lvhdr.lv_list, &ctx->lv_chain);
71e330b593905e4 Dave Chinner      2010-05-21  1027  
0279bbbbc03f2ce Dave Chinner      2021-06-03  1028  	/*
0279bbbbc03f2ce Dave Chinner      2021-06-03  1029  	 * Before we format and submit the first iclog, we have to ensure that
0279bbbbc03f2ce Dave Chinner      2021-06-03  1030  	 * the metadata writeback ordering cache flush is complete.
0279bbbbc03f2ce Dave Chinner      2021-06-03  1031  	 */
0279bbbbc03f2ce Dave Chinner      2021-06-03  1032  	wait_for_completion(&bdev_flush);
0279bbbbc03f2ce Dave Chinner      2021-06-03  1033  
877cf3473914ae4 Dave Chinner      2021-06-04  1034  	/*
877cf3473914ae4 Dave Chinner      2021-06-04  1035  	 * The LSN we need to pass to the log items on transaction commit is the
877cf3473914ae4 Dave Chinner      2021-06-04  1036  	 * LSN reported by the first log vector write, not the commit lsn. If we
877cf3473914ae4 Dave Chinner      2021-06-04  1037  	 * use the commit record lsn then we can move the tail beyond the grant
877cf3473914ae4 Dave Chinner      2021-06-04  1038  	 * write head.
877cf3473914ae4 Dave Chinner      2021-06-04  1039  	 */
fc3370002b56bcb Dave Chinner      2021-06-17  1040  	error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
a47518453bf9581 Dave Chinner      2021-06-08  1041  				NULL, num_bytes);
a47518453bf9581 Dave Chinner      2021-06-08  1042  
a47518453bf9581 Dave Chinner      2021-06-08  1043  	/*
a47518453bf9581 Dave Chinner      2021-06-08  1044  	 * Take the lvhdr back off the lv_chain as it should not be passed
a47518453bf9581 Dave Chinner      2021-06-08  1045  	 * to log IO completion.
a47518453bf9581 Dave Chinner      2021-06-08  1046  	 */
a47518453bf9581 Dave Chinner      2021-06-08  1047  	list_del(&lvhdr.lv_list);
71e330b593905e4 Dave Chinner      2010-05-21  1048  	if (error)
7db37c5e6575b22 Dave Chinner      2011-01-27  1049  		goto out_abort_free_ticket;
71e330b593905e4 Dave Chinner      2010-05-21  1050  
71e330b593905e4 Dave Chinner      2010-05-21  1051  	/*
71e330b593905e4 Dave Chinner      2010-05-21  1052  	 * now that we've written the checkpoint into the log, strictly
71e330b593905e4 Dave Chinner      2010-05-21  1053  	 * order the commit records so replay will get them in the right order.
71e330b593905e4 Dave Chinner      2010-05-21  1054  	 */
71e330b593905e4 Dave Chinner      2010-05-21  1055  restart:
4bb928cdb900d06 Dave Chinner      2013-08-12  1056  	spin_lock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1057  	list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
ac983517ec5941d Dave Chinner      2014-05-07  1058  		/*
ac983517ec5941d Dave Chinner      2014-05-07  1059  		 * Avoid getting stuck in this loop because we were woken by the
ac983517ec5941d Dave Chinner      2014-05-07  1060  		 * shutdown, but then went back to sleep once already in the
ac983517ec5941d Dave Chinner      2014-05-07  1061  		 * shutdown state.
ac983517ec5941d Dave Chinner      2014-05-07  1062  		 */
ac983517ec5941d Dave Chinner      2014-05-07  1063  		if (XLOG_FORCED_SHUTDOWN(log)) {
ac983517ec5941d Dave Chinner      2014-05-07  1064  			spin_unlock(&cil->xc_push_lock);
ac983517ec5941d Dave Chinner      2014-05-07  1065  			goto out_abort_free_ticket;
ac983517ec5941d Dave Chinner      2014-05-07  1066  		}
ac983517ec5941d Dave Chinner      2014-05-07  1067  
71e330b593905e4 Dave Chinner      2010-05-21  1068  		/*
71e330b593905e4 Dave Chinner      2010-05-21  1069  		 * Higher sequences will wait for this one so skip them.
ac983517ec5941d Dave Chinner      2014-05-07  1070  		 * Don't wait for our own sequence, either.
71e330b593905e4 Dave Chinner      2010-05-21  1071  		 */
71e330b593905e4 Dave Chinner      2010-05-21  1072  		if (new_ctx->sequence >= ctx->sequence)
71e330b593905e4 Dave Chinner      2010-05-21  1073  			continue;
71e330b593905e4 Dave Chinner      2010-05-21  1074  		if (!new_ctx->commit_lsn) {
71e330b593905e4 Dave Chinner      2010-05-21  1075  			/*
71e330b593905e4 Dave Chinner      2010-05-21  1076  			 * It is still being pushed! Wait for the push to
71e330b593905e4 Dave Chinner      2010-05-21  1077  			 * complete, then start again from the beginning.
71e330b593905e4 Dave Chinner      2010-05-21  1078  			 */
4bb928cdb900d06 Dave Chinner      2013-08-12  1079  			xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1080  			goto restart;
71e330b593905e4 Dave Chinner      2010-05-21  1081  		}
71e330b593905e4 Dave Chinner      2010-05-21  1082  	}
4bb928cdb900d06 Dave Chinner      2013-08-12  1083  	spin_unlock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1084  
fc3370002b56bcb Dave Chinner      2021-06-17  1085  	error = xlog_cil_write_commit_record(ctx, &commit_iclog);
dd401770b0ff68f Dave Chinner      2020-03-25  1086  	if (error)
dd401770b0ff68f Dave Chinner      2020-03-25  1087  		goto out_abort_free_ticket;
dd401770b0ff68f Dave Chinner      2020-03-25  1088  
89ae379d564c5d8 Christoph Hellwig 2019-06-28  1089  	spin_lock(&commit_iclog->ic_callback_lock);
1858bb0bec612df Christoph Hellwig 2019-10-14  1090  	if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
89ae379d564c5d8 Christoph Hellwig 2019-06-28  1091  		spin_unlock(&commit_iclog->ic_callback_lock);
e469cbe84f4ade9 Dave Chinner      2021-06-08  1092  		goto out_abort_free_ticket;
89ae379d564c5d8 Christoph Hellwig 2019-06-28  1093  	}
89ae379d564c5d8 Christoph Hellwig 2019-06-28  1094  	ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
89ae379d564c5d8 Christoph Hellwig 2019-06-28  1095  		      commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
89ae379d564c5d8 Christoph Hellwig 2019-06-28  1096  	list_add_tail(&ctx->iclog_entry, &commit_iclog->ic_callbacks);
89ae379d564c5d8 Christoph Hellwig 2019-06-28  1097  	spin_unlock(&commit_iclog->ic_callback_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1098  
71e330b593905e4 Dave Chinner      2010-05-21  1099  	/*
71e330b593905e4 Dave Chinner      2010-05-21  1100  	 * now the checkpoint commit is complete and we've attached the
71e330b593905e4 Dave Chinner      2010-05-21  1101  	 * callbacks to the iclog we can assign the commit LSN to the context
71e330b593905e4 Dave Chinner      2010-05-21  1102  	 * and wake up anyone who is waiting for the commit to complete.
71e330b593905e4 Dave Chinner      2010-05-21  1103  	 */
4bb928cdb900d06 Dave Chinner      2013-08-12  1104  	spin_lock(&cil->xc_push_lock);
eb40a87500ac2f6 Dave Chinner      2010-12-21  1105  	wake_up_all(&cil->xc_commit_wait);
4bb928cdb900d06 Dave Chinner      2013-08-12  1106  	spin_unlock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1107  
e469cbe84f4ade9 Dave Chinner      2021-06-08  1108  	/*
e469cbe84f4ade9 Dave Chinner      2021-06-08  1109  	 * Pull the ticket off the ctx so we can ungrant it after releasing the
e469cbe84f4ade9 Dave Chinner      2021-06-08  1110  	 * commit_iclog. The ctx may be freed by the time we return from
e469cbe84f4ade9 Dave Chinner      2021-06-08  1111  	 * releasing the commit_iclog (i.e. checkpoint has been completed and
e469cbe84f4ade9 Dave Chinner      2021-06-08  1112  	 * callback run) so we can't reference the ctx after the call to
e469cbe84f4ade9 Dave Chinner      2021-06-08  1113  	 * xlog_state_release_iclog().
e469cbe84f4ade9 Dave Chinner      2021-06-08  1114  	 */
e469cbe84f4ade9 Dave Chinner      2021-06-08  1115  	ticket = ctx->ticket;
e469cbe84f4ade9 Dave Chinner      2021-06-08  1116  
5fd9256ce156ef7 Dave Chinner      2021-06-03  1117  	/*
815753dc16bbca2 Dave Chinner      2021-06-17  1118  	 * If the checkpoint spans multiple iclogs, wait for all previous iclogs
815753dc16bbca2 Dave Chinner      2021-06-17  1119  	 * to complete before we submit the commit_iclog. We can't use state
815753dc16bbca2 Dave Chinner      2021-06-17  1120  	 * checks for this - ACTIVE can be either a past completed iclog or a
815753dc16bbca2 Dave Chinner      2021-06-17  1121  	 * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a
815753dc16bbca2 Dave Chinner      2021-06-17  1122  	 * past or future iclog awaiting IO or ordered IO completion to be run.
815753dc16bbca2 Dave Chinner      2021-06-17  1123  	 * In the latter case, if it's a future iclog and we wait on it, then we
815753dc16bbca2 Dave Chinner      2021-06-17  1124  	 * will hang because it won't get processed through to ic_force_wait
815753dc16bbca2 Dave Chinner      2021-06-17  1125  	 * wakeup until this commit_iclog is written to disk.  Hence we use the
815753dc16bbca2 Dave Chinner      2021-06-17  1126  	 * iclog header lsn and compare it to the commit lsn to determine if we
815753dc16bbca2 Dave Chinner      2021-06-17  1127  	 * need to wait on iclogs or not.
5fd9256ce156ef7 Dave Chinner      2021-06-03  1128  	 */
5fd9256ce156ef7 Dave Chinner      2021-06-03  1129  	spin_lock(&log->l_icloglock);
cb1acb3f3246368 Dave Chinner      2021-06-04 @1130  	if (ctx->start_lsn != commit_lsn) {
815753dc16bbca2 Dave Chinner      2021-06-17  1131  		struct xlog_in_core	*iclog;
815753dc16bbca2 Dave Chinner      2021-06-17  1132  
815753dc16bbca2 Dave Chinner      2021-06-17  1133  		for (iclog = commit_iclog->ic_prev;
815753dc16bbca2 Dave Chinner      2021-06-17  1134  		     iclog != commit_iclog;
815753dc16bbca2 Dave Chinner      2021-06-17  1135  		     iclog = iclog->ic_prev) {
815753dc16bbca2 Dave Chinner      2021-06-17  1136  			xfs_lsn_t	hlsn;
815753dc16bbca2 Dave Chinner      2021-06-17  1137  
815753dc16bbca2 Dave Chinner      2021-06-17  1138  			/*
815753dc16bbca2 Dave Chinner      2021-06-17  1139  			 * If the LSN of the iclog is zero or in the future it
815753dc16bbca2 Dave Chinner      2021-06-17  1140  			 * means it has passed through IO completion and
815753dc16bbca2 Dave Chinner      2021-06-17  1141  			 * activation and hence all previous iclogs have also
815753dc16bbca2 Dave Chinner      2021-06-17  1142  			 * done so. We do not need to wait at all in this case.
815753dc16bbca2 Dave Chinner      2021-06-17  1143  			 */
815753dc16bbca2 Dave Chinner      2021-06-17  1144  			hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
815753dc16bbca2 Dave Chinner      2021-06-17  1145  			if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
815753dc16bbca2 Dave Chinner      2021-06-17  1146  				break;
815753dc16bbca2 Dave Chinner      2021-06-17  1147  
815753dc16bbca2 Dave Chinner      2021-06-17  1148  			/*
815753dc16bbca2 Dave Chinner      2021-06-17  1149  			 * If the LSN of the iclog is older than the commit lsn,
815753dc16bbca2 Dave Chinner      2021-06-17  1150  			 * we have to wait on it. Waiting on this via the
815753dc16bbca2 Dave Chinner      2021-06-17  1151  			 * ic_force_wait should also order the completion of all
815753dc16bbca2 Dave Chinner      2021-06-17  1152  			 * older iclogs, too, but we leave checking that to the
815753dc16bbca2 Dave Chinner      2021-06-17  1153  			 * next loop iteration.
815753dc16bbca2 Dave Chinner      2021-06-17  1154  			 */
815753dc16bbca2 Dave Chinner      2021-06-17  1155  			ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
815753dc16bbca2 Dave Chinner      2021-06-17  1156  			xlog_wait_on_iclog(iclog);
cb1acb3f3246368 Dave Chinner      2021-06-04  1157  			spin_lock(&log->l_icloglock);
815753dc16bbca2 Dave Chinner      2021-06-17  1158  		}
815753dc16bbca2 Dave Chinner      2021-06-17  1159  
815753dc16bbca2 Dave Chinner      2021-06-17  1160  		/*
815753dc16bbca2 Dave Chinner      2021-06-17  1161  		 * Regardless of whether we need to wait or not, the
815753dc16bbca2 Dave Chinner      2021-06-17  1162  		 * commit_iclog write needs to issue a pre-flush so that the
815753dc16bbca2 Dave Chinner      2021-06-17  1163  		 * ordering for this checkpoint is correctly preserved down to
815753dc16bbca2 Dave Chinner      2021-06-17  1164  		 * stable storage.
815753dc16bbca2 Dave Chinner      2021-06-17  1165  		 */
cb1acb3f3246368 Dave Chinner      2021-06-04  1166  		commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
5fd9256ce156ef7 Dave Chinner      2021-06-03  1167  	}
5fd9256ce156ef7 Dave Chinner      2021-06-03  1168  
cb1acb3f3246368 Dave Chinner      2021-06-04  1169  	/*
cb1acb3f3246368 Dave Chinner      2021-06-04  1170  	 * The commit iclog must be written to stable storage to guarantee
cb1acb3f3246368 Dave Chinner      2021-06-04  1171  	 * journal IO vs metadata writeback IO is correctly ordered on stable
cb1acb3f3246368 Dave Chinner      2021-06-04  1172  	 * storage.
e12213ba5d909a3 Dave Chinner      2021-06-04  1173  	 *
e12213ba5d909a3 Dave Chinner      2021-06-04  1174  	 * If the push caller needs the commit to be immediately stable and the
e12213ba5d909a3 Dave Chinner      2021-06-04  1175  	 * commit_iclog is not yet marked as XLOG_STATE_WANT_SYNC to indicate it
e12213ba5d909a3 Dave Chinner      2021-06-04  1176  	 * will be written when released, switch its state to WANT_SYNC right
e12213ba5d909a3 Dave Chinner      2021-06-04  1177  	 * now.
cb1acb3f3246368 Dave Chinner      2021-06-04  1178  	 */
cb1acb3f3246368 Dave Chinner      2021-06-04  1179  	commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
e12213ba5d909a3 Dave Chinner      2021-06-04  1180  	if (push_commit_stable && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
e12213ba5d909a3 Dave Chinner      2021-06-04  1181  		xlog_state_switch_iclogs(log, commit_iclog, 0);
e469cbe84f4ade9 Dave Chinner      2021-06-08  1182  	xlog_state_release_iclog(log, commit_iclog, ticket);
cb1acb3f3246368 Dave Chinner      2021-06-04  1183  	spin_unlock(&log->l_icloglock);
e469cbe84f4ade9 Dave Chinner      2021-06-08  1184  
e469cbe84f4ade9 Dave Chinner      2021-06-08  1185  	xfs_log_ticket_ungrant(log, ticket);
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20  1186  	return;
71e330b593905e4 Dave Chinner      2010-05-21  1187  
71e330b593905e4 Dave Chinner      2010-05-21  1188  out_skip:
71e330b593905e4 Dave Chinner      2010-05-21  1189  	up_write(&cil->xc_ctx_lock);
71e330b593905e4 Dave Chinner      2010-05-21  1190  	xfs_log_ticket_put(new_ctx->ticket);
71e330b593905e4 Dave Chinner      2010-05-21  1191  	kmem_free(new_ctx);
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20  1192  	return;
71e330b593905e4 Dave Chinner      2010-05-21  1193  
7db37c5e6575b22 Dave Chinner      2011-01-27  1194  out_abort_free_ticket:
877cf3473914ae4 Dave Chinner      2021-06-04  1195  	xfs_log_ticket_ungrant(log, ctx->ticket);
12e6a0f449d585b Christoph Hellwig 2020-03-20  1196  	ASSERT(XLOG_FORCED_SHUTDOWN(log));
12e6a0f449d585b Christoph Hellwig 2020-03-20  1197  	xlog_cil_committed(ctx);
4c2d542f2e78653 Dave Chinner      2012-04-23  1198  }
4c2d542f2e78653 Dave Chinner      2012-04-23  1199  
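
The past/future iclog test in the loop at lines 1133-1158 of the listing can be sketched in isolation. This is a rough illustration only: it assumes an LSN packs a cycle number in the upper 32 bits and a block number in the lower 32 bits, and every name in it (`lsn_t`, `make_lsn`, `lsn_cmp`, `must_wait_on_iclog`) is hypothetical rather than the kernel's own helper. It shows the decision the loop makes for each previous iclog: stop walking if the header LSN is zero or newer than the commit LSN, otherwise wait on that iclog.

```c
#include <stdint.h>

typedef int64_t lsn_t;	/* hypothetical stand-in for xfs_lsn_t */

/* Assumed layout: cycle in the upper 32 bits, block in the lower 32. */
static uint32_t lsn_cycle(lsn_t lsn)
{
	return (uint32_t)((uint64_t)lsn >> 32);
}

static uint32_t lsn_block(lsn_t lsn)
{
	return (uint32_t)lsn;
}

static lsn_t make_lsn(uint32_t cycle, uint32_t block)
{
	return (lsn_t)(((uint64_t)cycle << 32) | block);
}

/* Negative if a is older than b, positive if newer, 0 if equal. */
static int lsn_cmp(lsn_t a, lsn_t b)
{
	if (lsn_cycle(a) != lsn_cycle(b))
		return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
	if (lsn_block(a) != lsn_block(b))
		return lsn_block(a) < lsn_block(b) ? -1 : 1;
	return 0;
}

/*
 * Per-iclog decision: 0 means stop walking (the iclog has been
 * reactivated or belongs to the future), 1 means wait on it.
 */
static int must_wait_on_iclog(lsn_t header_lsn, lsn_t commit_lsn)
{
	if (!header_lsn || lsn_cmp(header_lsn, commit_lsn) > 0)
		return 0;
	return 1;
}
```

Comparing the cycle first matters because the log is circular: a block number that is numerically smaller can still belong to a newer LSN once the cycle has advanced.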

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 34451 bytes --]

* Re: [PATCH 1/8] xfs: add iclog state trace events
  2021-06-17  8:26 ` [PATCH 1/8] xfs: add iclog state trace events Dave Chinner
@ 2021-06-17 16:45   ` Darrick J. Wong
  2021-06-18 14:09   ` Christoph Hellwig
  1 sibling, 0 replies; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 16:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 06:26:10PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> For the DEBUGS!
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Still looks fine to me.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_log.c      | 18 +++++++++++++
>  fs/xfs/xfs_log_priv.h | 10 ++++++++
>  fs/xfs/xfs_trace.h    | 60 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 88 insertions(+)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index e921b554b683..54fd6a695bb5 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -524,6 +524,7 @@ __xlog_state_release_iclog(
>  		iclog->ic_header.h_tail_lsn = cpu_to_be64(tail_lsn);
>  		xlog_verify_tail_lsn(log, iclog, tail_lsn);
>  		/* cycle incremented when incrementing curr_block */
> +		trace_xlog_iclog_syncing(iclog, _RET_IP_);
>  		return true;
>  	}
>  
> @@ -543,6 +544,7 @@ xlog_state_release_iclog(
>  {
>  	lockdep_assert_held(&log->l_icloglock);
>  
> +	trace_xlog_iclog_release(iclog, _RET_IP_);
>  	if (iclog->ic_state == XLOG_STATE_IOERROR)
>  		return -EIO;
>  
> @@ -804,6 +806,7 @@ xlog_wait_on_iclog(
>  {
>  	struct xlog		*log = iclog->ic_log;
>  
> +	trace_xlog_iclog_wait_on(iclog, _RET_IP_);
>  	if (!XLOG_FORCED_SHUTDOWN(log) &&
>  	    iclog->ic_state != XLOG_STATE_ACTIVE &&
>  	    iclog->ic_state != XLOG_STATE_DIRTY) {
> @@ -1804,6 +1807,7 @@ xlog_write_iclog(
>  	unsigned int		count)
>  {
>  	ASSERT(bno < log->l_logBBsize);
> +	trace_xlog_iclog_write(iclog, _RET_IP_);
>  
>  	/*
>  	 * We lock the iclogbufs here so that we can serialise against I/O
> @@ -1950,6 +1954,7 @@ xlog_sync(
>  	unsigned int		size;
>  
>  	ASSERT(atomic_read(&iclog->ic_refcnt) == 0);
> +	trace_xlog_iclog_sync(iclog, _RET_IP_);
>  
>  	count = xlog_calc_iclog_size(log, iclog, &roundoff);
>  
> @@ -2488,6 +2493,7 @@ xlog_state_activate_iclog(
>  	int			*iclogs_changed)
>  {
>  	ASSERT(list_empty_careful(&iclog->ic_callbacks));
> +	trace_xlog_iclog_activate(iclog, _RET_IP_);
>  
>  	/*
>  	 * If the number of ops in this iclog indicate it just contains the
> @@ -2577,6 +2583,8 @@ xlog_state_clean_iclog(
>  {
>  	int			iclogs_changed = 0;
>  
> +	trace_xlog_iclog_clean(dirty_iclog, _RET_IP_);
> +
>  	dirty_iclog->ic_state = XLOG_STATE_DIRTY;
>  
>  	xlog_state_activate_iclogs(log, &iclogs_changed);
> @@ -2636,6 +2644,7 @@ xlog_state_set_callback(
>  	struct xlog_in_core	*iclog,
>  	xfs_lsn_t		header_lsn)
>  {
> +	trace_xlog_iclog_callback(iclog, _RET_IP_);
>  	iclog->ic_state = XLOG_STATE_CALLBACK;
>  
>  	ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
> @@ -2717,6 +2726,7 @@ xlog_state_do_iclog_callbacks(
>  		__releases(&log->l_icloglock)
>  		__acquires(&log->l_icloglock)
>  {
> +	trace_xlog_iclog_callbacks_start(iclog, _RET_IP_);
>  	spin_unlock(&log->l_icloglock);
>  	spin_lock(&iclog->ic_callback_lock);
>  	while (!list_empty(&iclog->ic_callbacks)) {
> @@ -2736,6 +2746,7 @@ xlog_state_do_iclog_callbacks(
>  	 */
>  	spin_lock(&log->l_icloglock);
>  	spin_unlock(&iclog->ic_callback_lock);
> +	trace_xlog_iclog_callbacks_done(iclog, _RET_IP_);
>  }
>  
>  STATIC void
> @@ -2827,6 +2838,7 @@ xlog_state_done_syncing(
>  
>  	spin_lock(&log->l_icloglock);
>  	ASSERT(atomic_read(&iclog->ic_refcnt) == 0);
> +	trace_xlog_iclog_sync_done(iclog, _RET_IP_);
>  
>  	/*
>  	 * If we got an error, either on the first buffer, or in the case of
> @@ -2899,6 +2911,8 @@ xlog_state_get_iclog_space(
>  	atomic_inc(&iclog->ic_refcnt);	/* prevents sync */
>  	log_offset = iclog->ic_offset;
>  
> +	trace_xlog_iclog_get_space(iclog, _RET_IP_);
> +
>  	/* On the 1st write to an iclog, figure out lsn.  This works
>  	 * if iclogs marked XLOG_STATE_WANT_SYNC always write out what they are
>  	 * committing to.  If the offset is set, that's how many blocks
> @@ -3056,6 +3070,7 @@ xlog_state_switch_iclogs(
>  {
>  	ASSERT(iclog->ic_state == XLOG_STATE_ACTIVE);
>  	assert_spin_locked(&log->l_icloglock);
> +	trace_xlog_iclog_switch(iclog, _RET_IP_);
>  
>  	if (!eventual_size)
>  		eventual_size = iclog->ic_offset;
> @@ -3138,6 +3153,8 @@ xfs_log_force(
>  	if (iclog->ic_state == XLOG_STATE_IOERROR)
>  		goto out_error;
>  
> +	trace_xlog_iclog_force(iclog, _RET_IP_);
> +
>  	if (iclog->ic_state == XLOG_STATE_DIRTY ||
>  	    (iclog->ic_state == XLOG_STATE_ACTIVE &&
>  	     atomic_read(&iclog->ic_refcnt) == 0 && iclog->ic_offset == 0)) {
> @@ -3225,6 +3242,7 @@ xlog_force_lsn(
>  		goto out_error;
>  
>  	while (be64_to_cpu(iclog->ic_header.h_lsn) != lsn) {
> +		trace_xlog_iclog_force_lsn(iclog, _RET_IP_);
>  		iclog = iclog->ic_next;
>  		if (iclog == log->l_iclog)
>  			goto out_unlock;
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index e4e421a70335..330befd9f6be 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -50,6 +50,16 @@ enum xlog_iclog_state {
>  	XLOG_STATE_IOERROR,	/* IO error happened in sync'ing log */
>  };
>  
> +#define XLOG_STATE_STRINGS \
> +	{ XLOG_STATE_ACTIVE,	"XLOG_STATE_ACTIVE" }, \
> +	{ XLOG_STATE_WANT_SYNC,	"XLOG_STATE_WANT_SYNC" }, \
> +	{ XLOG_STATE_SYNCING,	"XLOG_STATE_SYNCING" }, \
> +	{ XLOG_STATE_DONE_SYNC,	"XLOG_STATE_DONE_SYNC" }, \
> +	{ XLOG_STATE_CALLBACK,	"XLOG_STATE_CALLBACK" }, \
> +	{ XLOG_STATE_DIRTY,	"XLOG_STATE_DIRTY" }, \
> +	{ XLOG_STATE_IOERROR,	"XLOG_STATE_IOERROR" }
> +
> +
>  /*
>   * Log ticket flags
>   */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 71dca776c110..28d570742000 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -24,6 +24,7 @@ struct xlog_ticket;
>  struct xlog_recover;
>  struct xlog_recover_item;
>  struct xlog_rec_header;
> +struct xlog_in_core;
>  struct xfs_buf_log_format;
>  struct xfs_inode_log_format;
>  struct xfs_bmbt_irec;
> @@ -3927,6 +3928,65 @@ DEFINE_EVENT(xfs_icwalk_class, name,	\
>  DEFINE_ICWALK_EVENT(xfs_ioc_free_eofblocks);
>  DEFINE_ICWALK_EVENT(xfs_blockgc_free_space);
>  
> +TRACE_DEFINE_ENUM(XLOG_STATE_ACTIVE);
> +TRACE_DEFINE_ENUM(XLOG_STATE_WANT_SYNC);
> +TRACE_DEFINE_ENUM(XLOG_STATE_SYNCING);
> +TRACE_DEFINE_ENUM(XLOG_STATE_DONE_SYNC);
> +TRACE_DEFINE_ENUM(XLOG_STATE_CALLBACK);
> +TRACE_DEFINE_ENUM(XLOG_STATE_DIRTY);
> +TRACE_DEFINE_ENUM(XLOG_STATE_IOERROR);
> +
> +DECLARE_EVENT_CLASS(xlog_iclog_class,
> +	TP_PROTO(struct xlog_in_core *iclog, unsigned long caller_ip),
> +	TP_ARGS(iclog, caller_ip),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(uint32_t, state)
> +		__field(int32_t, refcount)
> +		__field(uint32_t, offset)
> +		__field(unsigned long long, lsn)
> +		__field(unsigned long, caller_ip)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = iclog->ic_log->l_mp->m_super->s_dev;
> +		__entry->state = iclog->ic_state;
> +		__entry->refcount = atomic_read(&iclog->ic_refcnt);
> +		__entry->offset = iclog->ic_offset;
> +		__entry->lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> +		__entry->caller_ip = caller_ip;
> +	),
> +	TP_printk("dev %d:%d state %s refcnt %d offset %u lsn 0x%llx caller %pS",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __print_symbolic(__entry->state, XLOG_STATE_STRINGS),
> +		  __entry->refcount,
> +		  __entry->offset,
> +		  __entry->lsn,
> +		  (char *)__entry->caller_ip)
> +
> +);
> +
> +#define DEFINE_ICLOG_EVENT(name)	\
> +DEFINE_EVENT(xlog_iclog_class, name,	\
> +	TP_PROTO(struct xlog_in_core *iclog, unsigned long caller_ip), \
> +	TP_ARGS(iclog, caller_ip))
> +
> +DEFINE_ICLOG_EVENT(xlog_iclog_activate);
> +DEFINE_ICLOG_EVENT(xlog_iclog_clean);
> +DEFINE_ICLOG_EVENT(xlog_iclog_callback);
> +DEFINE_ICLOG_EVENT(xlog_iclog_callbacks_start);
> +DEFINE_ICLOG_EVENT(xlog_iclog_callbacks_done);
> +DEFINE_ICLOG_EVENT(xlog_iclog_force);
> +DEFINE_ICLOG_EVENT(xlog_iclog_force_lsn);
> +DEFINE_ICLOG_EVENT(xlog_iclog_get_space);
> +DEFINE_ICLOG_EVENT(xlog_iclog_release);
> +DEFINE_ICLOG_EVENT(xlog_iclog_switch);
> +DEFINE_ICLOG_EVENT(xlog_iclog_sync);
> +DEFINE_ICLOG_EVENT(xlog_iclog_syncing);
> +DEFINE_ICLOG_EVENT(xlog_iclog_sync_done);
> +DEFINE_ICLOG_EVENT(xlog_iclog_want_sync);
> +DEFINE_ICLOG_EVENT(xlog_iclog_wait_on);
> +DEFINE_ICLOG_EVENT(xlog_iclog_write);
> +
>  #endif /* _TRACE_XFS_H */
>  
>  #undef TRACE_INCLUDE_PATH
> -- 
> 2.31.1
> 
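
For readers following the thread: the XLOG_STATE_STRINGS table in the patch above is a list of { value, name } pairs that __print_symbolic() searches when formatting the "state" field. A minimal user-space sketch of that lookup is below. It is not kernel code; the model_* names and the NULL return for unknown values are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for enum xlog_iclog_state. */
enum model_iclog_state {
	MODEL_STATE_ACTIVE,
	MODEL_STATE_WANT_SYNC,
	MODEL_STATE_SYNCING,
	MODEL_STATE_DONE_SYNC,
	MODEL_STATE_CALLBACK,
	MODEL_STATE_DIRTY,
	MODEL_STATE_IOERROR,
};

/* One { value, name } pair, the shape the strings table expands to. */
struct model_sym {
	unsigned long	value;
	const char	*name;
};

#define MODEL_STATE_STRINGS \
	{ MODEL_STATE_ACTIVE,		"XLOG_STATE_ACTIVE" }, \
	{ MODEL_STATE_WANT_SYNC,	"XLOG_STATE_WANT_SYNC" }, \
	{ MODEL_STATE_SYNCING,		"XLOG_STATE_SYNCING" }, \
	{ MODEL_STATE_DONE_SYNC,	"XLOG_STATE_DONE_SYNC" }, \
	{ MODEL_STATE_CALLBACK,		"XLOG_STATE_CALLBACK" }, \
	{ MODEL_STATE_DIRTY,		"XLOG_STATE_DIRTY" }, \
	{ MODEL_STATE_IOERROR,		"XLOG_STATE_IOERROR" }

/* Linear search of the table; returns NULL when the value is not listed. */
static const char *
model_state_name(unsigned long value)
{
	static const struct model_sym	syms[] = { MODEL_STATE_STRINGS };
	size_t				i;

	for (i = 0; i < sizeof(syms) / sizeof(syms[0]); i++) {
		if (syms[i].value == value)
			return syms[i].name;
	}
	return NULL;
}
```

Keeping the table as a macro next to the enum means a new state only needs one extra pair for the trace output to print its name rather than a bare number.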

* Re: [PATCH 2/8] xfs: don't wait on future iclogs when pushing the CIL
  2021-06-17  8:26 ` [PATCH 2/8] xfs: don't wait on future iclogs when pushing the CIL Dave Chinner
@ 2021-06-17 17:49   ` Darrick J. Wong
  2021-06-17 21:55     ` Dave Chinner
  0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 17:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 06:26:11PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The iclogbuf ring attached to the struct xlog is circular, hence the
> first and last iclogs in the ring can only be determined by
> comparing them against the log->l_iclog pointer.
> 
> In xfs_cil_push_work(), we want to wait on previous iclogs that were
> issued so that we can flush them to stable storage with the commit
> record write, and it simply waits on the previous iclog in the ring.
> This, however, leads to CIL push hangs in generic/019 like so:
> 
> task:kworker/u33:0   state:D stack:12680 pid:    7 ppid:     2 flags:0x00004000
> Workqueue: xfs-cil/pmem1 xlog_cil_push_work
> Call Trace:
>  __schedule+0x30b/0x9f0
>  schedule+0x68/0xe0
>  xlog_wait_on_iclog+0x121/0x190
>  ? wake_up_q+0xa0/0xa0
>  xlog_cil_push_work+0x994/0xa10
>  ? _raw_spin_lock+0x15/0x20
>  ? xfs_swap_extents+0x920/0x920
>  process_one_work+0x1ab/0x390
>  worker_thread+0x56/0x3d0
>  ? rescuer_thread+0x3c0/0x3c0
>  kthread+0x14d/0x170
>  ? __kthread_bind_mask+0x70/0x70
>  ret_from_fork+0x1f/0x30
> 
> With other threads blocking in either xlog_state_get_iclog_space()
> waiting for iclog space or xlog_grant_head_wait() waiting for log
> reservation space.
> 
> The problem here is that the previous iclog on the ring might
> actually be a future iclog. That is, if log->l_iclog points at
> commit_iclog, commit_iclog is the first (oldest) iclog in the ring
> and there are no previous iclogs pending as they have all completed
> their IO and been activated again. IOWs, commit_iclog->ic_prev
> points to an iclog that will be written in the future, not one that
> has been written in the past.
> 
> Hence, in this case, waiting on the ->ic_prev iclog is incorrect
> behaviour, and depending on the state of the future iclog, we can
> end up with a circular ABA wait cycle and we hang.
> 
> The fix is made more complex by the fact that many iclog states
> cannot be used to determine if the iclog is a past or future iclog.
> Hence we have to determine past iclogs by checking the LSN of the
> iclog rather than their state. A past ACTIVE iclog will have a LSN
> of zero, while a future ACTIVE iclog will have a LSN greater than
> the current iclog. We don't wait on either of these cases.
> 
> Similarly, a future iclog that hasn't completed IO will have an LSN
> greater than the current iclog and so we don't wait on them. A past
> iclog that is still undergoing IO completion will have a LSN less
> than the current iclog and those are the only iclogs that we need to
> wait on.
> 
> Hence we can use the iclog LSN to determine what iclogs we need to
> wait on here.
> 
> Fixes: 5fd9256ce156 ("xfs: separate CIL commit record IO")
> Reported-by: Brian Foster <bfoster@redhat.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log_cil.c | 51 ++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 45 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 705619e9dab4..2fb0ab02dda3 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -1075,15 +1075,54 @@ xlog_cil_push_work(
>  	ticket = ctx->ticket;
>  
>  	/*
> -	 * If the checkpoint spans multiple iclogs, wait for all previous
> -	 * iclogs to complete before we submit the commit_iclog. In this case,
> -	 * the commit_iclog write needs to issue a pre-flush so that the
> -	 * ordering is correctly preserved down to stable storage.
> +	 * If the checkpoint spans multiple iclogs, wait for all previous iclogs
> +	 * to complete before we submit the commit_iclog. We can't use state
> +	 * checks for this - ACTIVE can be either a past completed iclog or a
> +	 * future iclog being filled, while WANT_SYNC through DONE_SYNC can be a
> +	 * past or future iclog awaiting IO or ordered IO completion to be run.
> +	 * In the latter case, if it's a future iclog and we wait on it, then we
> +	 * will hang because it won't get processed through to ic_force_wait
> +	 * wakeup until this commit_iclog is written to disk.  Hence we use the
> +	 * iclog header lsn and compare it to the commit lsn to determine if we
> +	 * need to wait on iclogs or not.
>  	 */
>  	spin_lock(&log->l_icloglock);
>  	if (ctx->start_lsn != commit_lsn) {
> -		xlog_wait_on_iclog(commit_iclog->ic_prev);
> -		spin_lock(&log->l_icloglock);
> +		struct xlog_in_core	*iclog;
> +
> +		for (iclog = commit_iclog->ic_prev;
> +		     iclog != commit_iclog;
> +		     iclog = iclog->ic_prev) {
> +			xfs_lsn_t	hlsn;
> +
> +			/*
> +			 * If the LSN of the iclog is zero or in the future it
> +			 * means it has passed through IO completion and
> +			 * activation and hence all previous iclogs have also
> +			 * done so. We do not need to wait at all in this case.
> +			 */
> +			hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
> +			if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
> +				break;
> +
> +			/*
> +			 * If the LSN of the iclog is older than the commit lsn,
> +			 * we have to wait on it. Waiting on this via the
> +			 * ic_force_wait should also order the completion of all
> +			 * older iclogs, too, but we leave checking that to the
> +			 * next loop iteration.
> +			 */
> +			ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
> +			xlog_wait_on_iclog(iclog);
> +			spin_lock(&log->l_icloglock);

The presence of a loop here confuses me a bit -- we really only need to
check and wait on commit->ic_prev since xlog_wait_on_iclog waits for
both the iclog that it is given as well as all previous iclogs, right?

Does "we leave checking that to the next loop iteration" mean that once
we've waited on commit->ic_prev, the next iclog iterated (i.e.
commit->ic_prev->ic_prev) should break out of the loop?

--D

> +		}
> +
> +		/*
> +		 * Regardless of whether we need to wait or not, the
> +		 * commit_iclog write needs to issue a pre-flush so that the
> +		 * ordering for this checkpoint is correctly preserved down to
> +		 * stable storage.
> +		 */
>  		commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
>  	}
>  
> -- 
> 2.31.1
> 
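
For readers following the thread: the wait decision described in the commit message above comes down to comparing an iclog's header LSN against the commit LSN. The sketch below models that decision in user space. It is not the kernel implementation; the model_* names are hypothetical, and it assumes an LSN packs the log cycle into the upper 32 bits and the block number into the lower 32 bits, compared cycle first and then block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for xfs_lsn_t. */
typedef int64_t model_lsn_t;

/* Assumed layout: cycle in the upper 32 bits, block in the lower 32. */
static model_lsn_t
model_make_lsn(uint32_t cycle, uint32_t block)
{
	return (model_lsn_t)(((uint64_t)cycle << 32) | block);
}

/* Returns <0, 0 or >0: compare by cycle first, then by block. */
static int
model_lsn_cmp(model_lsn_t a, model_lsn_t b)
{
	uint32_t	cycle_a = (uint32_t)((uint64_t)a >> 32);
	uint32_t	cycle_b = (uint32_t)((uint64_t)b >> 32);
	uint32_t	block_a = (uint32_t)(uint64_t)a;
	uint32_t	block_b = (uint32_t)(uint64_t)b;

	if (cycle_a != cycle_b)
		return cycle_a < cycle_b ? -1 : 1;
	if (block_a != block_b)
		return block_a < block_b ? -1 : 1;
	return 0;
}

/*
 * A zero header LSN models a past iclog that has completed IO and been
 * reactivated. A header LSN above the commit LSN models a future iclog.
 * Neither is waited on. Only a non-zero header LSN below the commit LSN
 * models a past iclog that may still be undergoing IO completion.
 */
static bool
model_must_wait(model_lsn_t hlsn, model_lsn_t commit_lsn)
{
	if (hlsn == 0)
		return false;
	return model_lsn_cmp(hlsn, commit_lsn) < 0;
}
```

The model returns false when the two LSNs are equal. The patch instead asserts that a waited-on iclog is strictly older than the commit LSN, so that case is not expected to arise there.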

* Re: [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c
  2021-06-17  8:26 ` [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c Dave Chinner
  2021-06-17 12:57     ` kernel test robot
@ 2021-06-17 17:50   ` Darrick J. Wong
  2021-06-17 21:56     ` Dave Chinner
  2021-06-18 14:16   ` Christoph Hellwig
  2 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 17:50 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 06:26:12PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> It is only used by the CIL checkpoints, and is the counterpart to
> start record formatting and writing that is already local to
> xfs_log_cil.c.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c      | 41 ---------------------------------------
>  fs/xfs/xfs_log_cil.c  | 45 ++++++++++++++++++++++++++++++++++++++++++-
>  fs/xfs/xfs_log_priv.h |  2 --
>  3 files changed, 44 insertions(+), 44 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 54fd6a695bb5..cf661c155786 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1563,47 +1563,6 @@ xlog_alloc_log(
>  	return ERR_PTR(error);
>  }	/* xlog_alloc_log */
>  
> -/*
> - * Write out the commit record of a transaction associated with the given
> - * ticket to close off a running log write. Return the lsn of the commit record.
> - */
> -int
> -xlog_commit_record(
> -	struct xlog		*log,
> -	struct xlog_ticket	*ticket,
> -	struct xlog_in_core	**iclog,
> -	xfs_lsn_t		*lsn)
> -{
> -	struct xlog_op_header	ophdr = {
> -		.oh_clientid = XFS_TRANSACTION,
> -		.oh_tid = cpu_to_be32(ticket->t_tid),
> -		.oh_flags = XLOG_COMMIT_TRANS,
> -	};
> -	struct xfs_log_iovec reg = {
> -		.i_addr = &ophdr,
> -		.i_len = sizeof(struct xlog_op_header),
> -		.i_type = XLOG_REG_TYPE_COMMIT,
> -	};
> -	struct xfs_log_vec vec = {
> -		.lv_niovecs = 1,
> -		.lv_iovecp = &reg,
> -	};
> -	int	error;
> -	LIST_HEAD(lv_chain);
> -	INIT_LIST_HEAD(&vec.lv_list);
> -	list_add(&vec.lv_list, &lv_chain);
> -
> -	if (XLOG_FORCED_SHUTDOWN(log))
> -		return -EIO;
> -
> -	/* account for space used by record data */
> -	ticket->t_curr_res -= reg.i_len;
> -	error = xlog_write(log, &lv_chain, ticket, lsn, iclog, reg.i_len);
> -	if (error)
> -		xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
> -	return error;
> -}
> -
>  /*
>   * Compute the LSN that we'd need to push the log tail towards in order to have
>   * (a) enough on-disk log space to log the number of bytes specified, (b) at
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 2fb0ab02dda3..2c8b25888c53 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -783,6 +783,48 @@ xlog_cil_build_trans_hdr(
>  	tic->t_curr_res -= lvhdr->lv_bytes;
>  }
>  
> +/*
> + * Write out the commit record of a checkpoint transaction associated with the
> + * given ticket to close off a running log write. Return the lsn of the commit
> + * record.
> + */
> +int

static int, like the robot suggests?

With that fixed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +xlog_cil_write_commit_record(
> +	struct xlog		*log,
> +	struct xlog_ticket	*ticket,
> +	struct xlog_in_core	**iclog,
> +	xfs_lsn_t		*lsn)
> +{
> +	struct xlog_op_header	ophdr = {
> +		.oh_clientid = XFS_TRANSACTION,
> +		.oh_tid = cpu_to_be32(ticket->t_tid),
> +		.oh_flags = XLOG_COMMIT_TRANS,
> +	};
> +	struct xfs_log_iovec reg = {
> +		.i_addr = &ophdr,
> +		.i_len = sizeof(struct xlog_op_header),
> +		.i_type = XLOG_REG_TYPE_COMMIT,
> +	};
> +	struct xfs_log_vec vec = {
> +		.lv_niovecs = 1,
> +		.lv_iovecp = &reg,
> +	};
> +	int	error;
> +	LIST_HEAD(lv_chain);
> +	INIT_LIST_HEAD(&vec.lv_list);
> +	list_add(&vec.lv_list, &lv_chain);
> +
> +	if (XLOG_FORCED_SHUTDOWN(log))
> +		return -EIO;
> +
> +	/* account for space used by record data */
> +	ticket->t_curr_res -= reg.i_len;
> +	error = xlog_write(log, &lv_chain, ticket, lsn, iclog, reg.i_len);
> +	if (error)
> +		xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
> +	return error;
> +}
> +
>  /*
>   * CIL item reordering compare function. We want to order in ascending ID order,
>   * but we want to leave items with the same ID in the order they were added to
> @@ -1041,7 +1083,8 @@ xlog_cil_push_work(
>  	}
>  	spin_unlock(&cil->xc_push_lock);
>  
> -	error = xlog_commit_record(log, ctx->ticket, &commit_iclog, &commit_lsn);
> +	error = xlog_cil_write_commit_record(log, ctx->ticket, &commit_iclog,
> +			&commit_lsn);
>  	if (error)
>  		goto out_abort_free_ticket;
>  
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 330befd9f6be..26f26769d1c6 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -490,8 +490,6 @@ void	xlog_print_trans(struct xfs_trans *);
>  int	xlog_write(struct xlog *log, struct list_head *lv_chain,
>  		struct xlog_ticket *tic, xfs_lsn_t *start_lsn,
>  		struct xlog_in_core **commit_iclog, uint32_t len);
> -int	xlog_commit_record(struct xlog *log, struct xlog_ticket *ticket,
> -		struct xlog_in_core **iclog, xfs_lsn_t *lsn);
>  
>  void	xfs_log_ticket_ungrant(struct xlog *log, struct xlog_ticket *ticket);
>  void	xfs_log_ticket_regrant(struct xlog *log, struct xlog_ticket *ticket);
> -- 
> 2.31.1
> 

* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
  2021-06-17  8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
                   ` (7 preceding siblings ...)
  2021-06-17  8:26 ` [PATCH 8/8] xfs: order CIL checkpoint start records Dave Chinner
@ 2021-06-17 18:32 ` Brian Foster
  2021-06-17 19:05   ` Darrick J. Wong
  2021-06-18 22:48 ` Dave Chinner
  9 siblings, 1 reply; 50+ messages in thread
From: Brian Foster @ 2021-06-17 18:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> Hi folks,
> 
> This is followup from the first set of log fixes for for-next that
> were posted here:
> 
> https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> 
> The first two patches of this series are updates for those patches,
> change log below. The rest is the fix for the bigger issue we
> uncovered in investigating the generic/019 failures, being that
> we're triggering a zero-day bug in the way log recovery assigns LSNs
> to checkpoints.
> 
> The "simple" fix of using the same ordering code as the commit
> record for the start records in the CIL push turned into a lot of
> patches once I started cleaning it up, separating out all the
> different bits and finally realising all the things I needed to
> change to avoid unintentional logic/behavioural changes. Hence
> there's some code movement, some factoring, API changes to
> xlog_write(), changing where we attach callbacks to commit iclogs so
> they remain correctly ordered if there are multiple commit records
> in the one iclog and then, finally, strictly ordering the start
> records....
> 
> The original "simple fix" I tested last night ran almost a thousand
> cycles of generic/019 without a log hang or recovery failure of any
> kind. The refactored patchset has run a couple hundred cycles of
> g/019 and g/475 over the last few hours without a failure, so I'm
> posting this so we can get a review iteration done while I sleep so
> we can - hopefully - get this sorted out before the end of the week.
> 

My first spin of this included generic/019 and generic/475, ran for 18
or so iterations and 475 exploded with a stream of asserts followed by a
NULL pointer crash:

# grep -e Assertion -e BUG dmesg.out
...
[ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
[ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
[ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
[ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
[ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
[ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98

I don't know if this is a regression, but I've not seen it before. I've
attempted to spin generic/475 since then to see if it reproduces again,
but so far I'm only running into some of the preexisting issues
associated with that test. I'll let it go a while more and probably
switch it back to running both sometime before the end of the day for an
overnight test.

A full copy of the assert and NULL pointer BUG splat is included below
for reference. It looks like the fault BUG splat ended up interspersed
or otherwise mangled, but I suspect that one is just fallout from the
immediately previous crash.

Brian

--- 8< ---

[ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
[ 7953.037737] ------------[ cut here ]------------
[ 7953.042358] WARNING: CPU: 0 PID: 131627 at fs/xfs/xfs_message.c:112 assfail+0x25/0x28 [xfs]
[ 7953.050782] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
[ 7953.129548] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G        W I       5.13.0-rc4+ #70
[ 7953.138243] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
[ 7953.145818] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
[ 7953.151554] RIP: 0010:assfail+0x25/0x28 [xfs]
[ 7953.155991] Code: ff ff 0f 0b c3 0f 1f 44 00 00 41 89 c8 48 89 d1 48 89 f2 48 c7 c6 d8 eb c3 c0 e8 cf fa ff ff 80 3d f1 d4 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24 18
[ 7953.174745] RSP: 0018:ffffa57ccf99fa50 EFLAGS: 00010246
[ 7953.179982] RAX: 00000000ffffffea RBX: 0000000500003977 RCX: 0000000000000000
[ 7953.187121] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
[ 7953.194264] RBP: ffff91f685725040 R08: 0000000000000000 R09: 000000000000000a
[ 7953.201405] R10: 000000000000000a R11: f000000000000000 R12: ffff91f685725040
[ 7953.208546] R13: 0000000000000000 R14: ffff91f66abed140 R15: ffff91c76dfccb40
[ 7953.215686] FS:  0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
[ 7953.223781] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7953.229533] CR2: 00000000020e2108 CR3: 0000003d02826003 CR4: 00000000007706f0
[ 7953.236667] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7953.243809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7953.250949] PKRU: 55555554
[ 7953.253669] Call Trace:
[ 7953.256123]  xfs_bui_release+0x4b/0x50 [xfs]
[ 7953.260466]  xfs_trans_committed_bulk+0x158/0x2c0 [xfs]
[ 7953.265762]  ? lock_release+0x1cd/0x2a0
[ 7953.269610]  ? _raw_spin_unlock+0x1f/0x30
[ 7953.273630]  ? xlog_write+0x1e2/0x630 [xfs]
[ 7953.277886]  ? lock_acquire+0x15d/0x380
[ 7953.281732]  ? lock_acquire+0x15d/0x380
[ 7953.285582]  ? lock_release+0x1cd/0x2a0
[ 7953.289428]  ? trace_hardirqs_on+0x1b/0xd0
[ 7953.293536]  ? _raw_spin_unlock_irqrestore+0x37/0x40
[ 7953.298511]  ? __wake_up_common_lock+0x7a/0x90
[ 7953.302966]  ? lock_release+0x1cd/0x2a0
[ 7953.306813]  xlog_cil_committed+0x34f/0x390 [xfs]
[ 7953.311593]  ? xlog_cil_push_work+0x715/0x8d0 [xfs]
[ 7953.316547]  xlog_cil_push_work+0x740/0x8d0 [xfs]
[ 7953.321321]  ? _raw_spin_unlock_irq+0x24/0x40
[ 7953.325689]  ? finish_task_switch.isra.0+0xa0/0x2c0
[ 7953.330580]  ? kmem_cache_free+0x247/0x5c0
[ 7953.334685]  ? fsnotify_final_mark_destroy+0x1c/0x30
[ 7953.339658]  ? lock_acquire+0x15d/0x380
[ 7953.343505]  ? lock_acquire+0x15d/0x380
[ 7953.347353]  ? lock_release+0x1cd/0x2a0
[ 7953.351203]  process_one_work+0x26e/0x560
[ 7953.355225]  worker_thread+0x52/0x3b0
[ 7953.358898]  ? process_one_work+0x560/0x560
[ 7953.363094]  kthread+0x12c/0x150
[ 7953.366335]  ? __kthread_bind_mask+0x60/0x60
[ 7953.370617]  ret_from_fork+0x22/0x30
[ 7953.374206] irq event stamp: 0
[ 7953.377268] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[ 7953.383544] hardirqs last disabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
[ 7953.391724] softirqs last  enabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
[ 7953.399907] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 7953.406179] ---[ end trace f04c960f66265f3a ]---
[ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
[ 7953.417760] #PF: supervisor read access in kernel mode
[ 7953.422900] #PF: error_code(0x0000) - not-present page
[ 7953.428038] PGD 0 P4D 0 
[ 7953.430579] Oops: 0000 [#1] SMP PTI
[ 7953.434070] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G        W I       5.13.0-rc4+ #70
[ 7953.442764] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
[ 7953.450330] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
[ 7953.456058] RIP: 0010:xfs_trans_committed_bulk+0xcc/0x2c0 [xfs]
[ 7953.462036] Code: 41 83 c5 01 48 89 54 c4 50 41 83 fd 1f 0f 8f 11 01 00 00 4d 8b 36 4c 3b 34 24 74 28 4d 8b 66 20 40 84 ed 75 54 49 8b 44 24 60 <f6> 00 01 74 91 48 8b 40 38 4c 89 e7 e8 63 6b 42 f5 4d 8b 36 4c 3b
[ 7953.480783] RSP: 0018:ffffa57ccf99fa68 EFLAGS: 00010202
[ 7953.486009] RAX: 000000000000031f RBX: 0000000500003977 RCX: 0000000000000000
[ 7953.493141] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
[ 7953.500274] RBP: 0000000000000001 R08: 0000000000000000 R09: 000000000000000a
[ 7953.507404] R10: 000000000000000a R11: f000000000000000 R12: ffff91c759fedb20
[ 7953.514536] R13: 0000000000000000 R14: ffff91c759fedb00 R15: ffff91c76dfccb40
[ 7953.521671] FS:  0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
[ 7953.529757] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7953.535501] CR2: 000000000000031f CR3: 0000003d02826003 CR4: 00000000007706f0
[ 7953.542633] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7953.549768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7953.556899] PKRU: 55555554
[ 7953.559612] Call Trace:
[ 7953.562064]  ? lock_release+0x1cd/0x2a0
[ 7953.565902]  ? _raw_spin_unlock+0x1f/0x30
[ 7953.569917]  ? xlog_write+0x1e2/0x630 [xfs]
[ 7953.574162]  ? lock_acquire+0x15d/0x380
[ 7953.578000]  ? lock_acquire+0x15d/0x380
[ 7953.581841]  ? lock_release+0x1cd/0x2a0
[ 7953.585680]  ? trace_hardirqs_on+0x1b/0xd0
[ 7953.589780]  ? _raw_spin_unlock_irqrestore+0x37/0x40
[ 7953.594744]  ? __wake_up_common_lock+0x7a/0x90
[ 7953.599192]  ? lock_release+0x1cd/0x2a0
[ 7953.603031]  xlog_cil_committed+0x34f/0x390 [xfs]
[ 7953.607798]  ? xlog_cil_push_work+0x715/0x8d0 [xfs]
[ 7953.612738]  xlog_cil_push_work+0x740/0x8d0 [xfs]
[ 7953.617504]  ? _raw_spin_unlock_irq+0x24/0x40
[ 7953.621862]  ? finish_task_switch.isra.0+0xa0/0x2c0
[ 7953.626745]  ? kmem_cache_free+0x247/0x5c0
[ 7953.630839]  ? fsnotify_final_mark_destroy+0x1c/0x30
[ 7953.635806]  ? lock_acquire+0x15d/0x380
[ 7953.639646]  ? lock_acquire+0x15d/0x380
[ 7953.643484]  ? lock_release+0x1cd/0x2a0
[ 7953.647323]  process_one_work+0x26e/0x560
[ 7953.651337]  worker_thread+0x52/0x3b0
[ 7953.655003]  ? process_one_work+0x560/0x560
[ 7953.659188]  kthread+0x12c/0x150
[ 7953.662421]  ? __kthread_bind_mask+0x60/0x60
[ 7953.666694]  ret_from_fork+0x22/0x30
[ 7953.670273] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
[ 7953.749025] CR2: 000000000000031f
[ 7953.752345] ---[ end trace f04c960f66265f3b ]---



* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
  2021-06-17 18:32 ` [PATCH 0/8 V2] xfs: log fixes for for-next Brian Foster
@ 2021-06-17 19:05   ` Darrick J. Wong
  2021-06-17 20:06     ` Brian Foster
  2021-06-17 23:43     ` Dave Chinner
  0 siblings, 2 replies; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 19:05 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, linux-xfs

On Thu, Jun 17, 2021 at 02:32:30PM -0400, Brian Foster wrote:
> On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > Hi folks,
> > 
> > This is followup from the first set of log fixes for for-next that
> > were posted here:
> > 
> > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > 
> > The first two patches of this series are updates for those patches,
> > change log below. The rest is the fix for the bigger issue we
> > uncovered in investigating the generic/019 failures, being that
> > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > to checkpoints.
> > 
> > The "simple" fix of using the same ordering code as the commit
> > record for the start records in the CIL push turned into a lot of
> > patches once I started cleaning it up, separating out all the
> > different bits and finally realising all the things I needed to
> > change to avoid unintentional logic/behavioural changes. Hence
> > there's some code movement, some factoring, API changes to
> > xlog_write(), changing where we attach callbacks to commit iclogs so
> > they remain correctly ordered if there are multiple commit records
> > in the one iclog and then, finally, strictly ordering the start
> > records....
> > 
> > The original "simple fix" I tested last night ran almost a thousand
> > cycles of generic/019 without a log hang or recovery failure of any
> > kind. The refactored patchset has run a couple hundred cycles of
> > g/019 and g/475 over the last few hours without a failure, so I'm
> > posting this so we can get a review iteration done while I sleep so
> > we can - hopefully - get this sorted out before the end of the week.
> > 
> 
> My first spin of this included generic/019 and generic/475, ran for 18
> or so iterations and 475 exploded with a stream of asserts followed by a
> NULL pointer crash:
> 
> # grep -e Assertion -e BUG dmesg.out
> ...
> [ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> [ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> [ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> [ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
> 
> I don't know if this is a regression, but I've not seen it before. I've
> attempted to spin generic/475 since then to see if it reproduces again,
> but so far I'm only running into some of the preexisting issues
> associated with that test.

By any chance, do the two log recovery fixes I sent yesterday make those
problems go away?

> I'll let it go a while more and probably
> switch it back to running both sometime before the end of the day for an
> overnight test.

Also, do the CIL livelocks go away if you apply only patches 1-2?

> A full copy of the assert and NULL pointer BUG splat is included below
> for reference. It looks like the fault BUG splat ended up interspersed
> or otherwise mangled, but I suspect that one is just fallout from the
> immediately previous crash.

I have a question about the composition of this 8-patch series:
which patches fix the new CIL code, and which ones fix the out-of-order
recovery problems?  I suspect that patches 1-2 are for the new CIL code,
and 3-8 are to fix the recovery problems.

Thinking with my distro kernel not-maintainer hat on, I'm considering
how to backport whatever fixes emerge for the recovery ordering issue
into existing kernels.  The way I see things right now, the CIL changes
(+ fixes) and the ordering bug fixes are separate issues.  The log
ordering problems should get fixed as soon as we have a practical
solution, and the CIL changes could be deferred if need be since they're
medium-high risk.  The real question is how to sequence all this.

(Or to put it another way: I'm still stuck going "oh wowwww this is a
lot more change" while trying to understand patch 4)

--D

> 
> Brian
> 
> --- 8< ---
> 
> [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> [ 7953.037737] ------------[ cut here ]------------
> [ 7953.042358] WARNING: CPU: 0 PID: 131627 at fs/xfs/xfs_message.c:112 assfail+0x25/0x28 [xfs]
> [ 7953.050782] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
> [ 7953.129548] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G        W I       5.13.0-rc4+ #70
> [ 7953.138243] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> [ 7953.145818] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
> [ 7953.151554] RIP: 0010:assfail+0x25/0x28 [xfs]
> [ 7953.155991] Code: ff ff 0f 0b c3 0f 1f 44 00 00 41 89 c8 48 89 d1 48 89 f2 48 c7 c6 d8 eb c3 c0 e8 cf fa ff ff 80 3d f1 d4 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24 18
> [ 7953.174745] RSP: 0018:ffffa57ccf99fa50 EFLAGS: 00010246
> [ 7953.179982] RAX: 00000000ffffffea RBX: 0000000500003977 RCX: 0000000000000000
> [ 7953.187121] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
> [ 7953.194264] RBP: ffff91f685725040 R08: 0000000000000000 R09: 000000000000000a
> [ 7953.201405] R10: 000000000000000a R11: f000000000000000 R12: ffff91f685725040
> [ 7953.208546] R13: 0000000000000000 R14: ffff91f66abed140 R15: ffff91c76dfccb40
> [ 7953.215686] FS:  0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
> [ 7953.223781] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7953.229533] CR2: 00000000020e2108 CR3: 0000003d02826003 CR4: 00000000007706f0
> [ 7953.236667] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 7953.243809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 7953.250949] PKRU: 55555554
> [ 7953.253669] Call Trace:
> [ 7953.256123]  xfs_bui_release+0x4b/0x50 [xfs]
> [ 7953.260466]  xfs_trans_committed_bulk+0x158/0x2c0 [xfs]
> [ 7953.265762]  ? lock_release+0x1cd/0x2a0
> [ 7953.269610]  ? _raw_spin_unlock+0x1f/0x30
> [ 7953.273630]  ? xlog_write+0x1e2/0x630 [xfs]
> [ 7953.277886]  ? lock_acquire+0x15d/0x380
> [ 7953.281732]  ? lock_acquire+0x15d/0x380
> [ 7953.285582]  ? lock_release+0x1cd/0x2a0
> [ 7953.289428]  ? trace_hardirqs_on+0x1b/0xd0
> [ 7953.293536]  ? _raw_spin_unlock_irqrestore+0x37/0x40
> [ 7953.298511]  ? __wake_up_common_lock+0x7a/0x90
> [ 7953.302966]  ? lock_release+0x1cd/0x2a0
> [ 7953.306813]  xlog_cil_committed+0x34f/0x390 [xfs]
> [ 7953.311593]  ? xlog_cil_push_work+0x715/0x8d0 [xfs]
> [ 7953.316547]  xlog_cil_push_work+0x740/0x8d0 [xfs]
> [ 7953.321321]  ? _raw_spin_unlock_irq+0x24/0x40
> [ 7953.325689]  ? finish_task_switch.isra.0+0xa0/0x2c0
> [ 7953.330580]  ? kmem_cache_free+0x247/0x5c0
> [ 7953.334685]  ? fsnotify_final_mark_destroy+0x1c/0x30
> [ 7953.339658]  ? lock_acquire+0x15d/0x380
> [ 7953.343505]  ? lock_acquire+0x15d/0x380
> [ 7953.347353]  ? lock_release+0x1cd/0x2a0
> [ 7953.351203]  process_one_work+0x26e/0x560
> [ 7953.355225]  worker_thread+0x52/0x3b0
> [ 7953.358898]  ? process_one_work+0x560/0x560
> [ 7953.363094]  kthread+0x12c/0x150
> [ 7953.366335]  ? __kthread_bind_mask+0x60/0x60
> [ 7953.370617]  ret_from_fork+0x22/0x30
> [ 7953.374206] irq event stamp: 0
> [ 7953.377268] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [ 7953.383544] hardirqs last disabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
> [ 7953.391724] softirqs last  enabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
> [ 7953.399907] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [ 7953.406179] ---[ end trace f04c960f66265f3a ]---
> [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> [ 7953.417760] #PF: supervisor read access in kernel mode
> [ 7953.422900] #PF: error_code(0x0000) - not-present page
> [ 7953.428038] PGD 0 P4D 0 
> [ 7953.430579] Oops: 0000 [#1] SMP PTI
> [ 7953.434070] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G        W I       5.13.0-rc4+ #70
> [ 7953.442764] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> [ 7953.450330] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
> [ 7953.456058] RIP: 0010:xfs_trans_committed_bulk+0xcc/0x2c0 [xfs]
> [ 7953.462036] Code: 41 83 c5 01 48 89 54 c4 50 41 83 fd 1f 0f 8f 11 01 00 00 4d 8b 36 4c 3b 34 24 74 28 4d 8b 66 20 40 84 ed 75 54 49 8b 44 24 60 <f6> 00 01 74 91 48 8b 40 38 4c 89 e7 e8 63 6b 42 f5 4d 8b 36 4c 3b
> [ 7953.480783] RSP: 0018:ffffa57ccf99fa68 EFLAGS: 00010202
> [ 7953.486009] RAX: 000000000000031f RBX: 0000000500003977 RCX: 0000000000000000
> [ 7953.493141] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
> [ 7953.500274] RBP: 0000000000000001 R08: 0000000000000000 R09: 000000000000000a
> [ 7953.507404] R10: 000000000000000a R11: f000000000000000 R12: ffff91c759fedb20
> [ 7953.514536] R13: 0000000000000000 R14: ffff91c759fedb00 R15: ffff91c76dfccb40
> [ 7953.521671] FS:  0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
> [ 7953.529757] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7953.535501] CR2: 000000000000031f CR3: 0000003d02826003 CR4: 00000000007706f0
> [ 7953.542633] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 7953.549768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 7953.556899] PKRU: 55555554
> [ 7953.559612] Call Trace:
> [ 7953.562064]  ? lock_release+0x1cd/0x2a0
> [ 7953.565902]  ? _raw_spin_unlock+0x1f/0x30
> [ 7953.569917]  ? xlog_write+0x1e2/0x630 [xfs]
> [ 7953.574162]  ? lock_acquire+0x15d/0x380
> [ 7953.578000]  ? lock_acquire+0x15d/0x380
> [ 7953.581841]  ? lock_release+0x1cd/0x2a0
> [ 7953.585680]  ? trace_hardirqs_on+0x1b/0xd0
> [ 7953.589780]  ? _raw_spin_unlock_irqrestore+0x37/0x40
> [ 7953.594744]  ? __wake_up_common_lock+0x7a/0x90
> [ 7953.599192]  ? lock_release+0x1cd/0x2a0
> [ 7953.603031]  xlog_cil_committed+0x34f/0x390 [xfs]
> [ 7953.607798]  ? xlog_cil_push_work+0x715/0x8d0 [xfs]
> [ 7953.612738]  xlog_cil_push_work+0x740/0x8d0 [xfs]
> [ 7953.617504]  ? _raw_spin_unlock_irq+0x24/0x40
> [ 7953.621862]  ? finish_task_switch.isra.0+0xa0/0x2c0
> [ 7953.626745]  ? kmem_cache_free+0x247/0x5c0
> [ 7953.630839]  ? fsnotify_final_mark_destroy+0x1c/0x30
> [ 7953.635806]  ? lock_acquire+0x15d/0x380
> [ 7953.639646]  ? lock_acquire+0x15d/0x380
> [ 7953.643484]  ? lock_release+0x1cd/0x2a0
> [ 7953.647323]  process_one_work+0x26e/0x560
> [ 7953.651337]  worker_thread+0x52/0x3b0
> [ 7953.655003]  ? process_one_work+0x560/0x560
> [ 7953.659188]  kthread+0x12c/0x150
> [ 7953.662421]  ? __kthread_bind_mask+0x60/0x60
> [ 7953.666694]  ret_from_fork+0x22/0x30
> [ 7953.670273] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
> [ 7953.749025] CR2: 000000000000031f
> [ 7953.752345] ---[ end trace f04c960f66265f3b ]---
> 


* Re: [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work()
  2021-06-17  8:26 ` [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work() Dave Chinner
@ 2021-06-17 19:59   ` Darrick J. Wong
  2021-06-18 14:27     ` Christoph Hellwig
  0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 19:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 06:26:14PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> So we can use it for start record ordering as well as commit record
> ordering in future.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

This tricked me for a second until I realized that xlog_cil_order_write()
is the chunk of code that used to sit in xlog_cil_push_work() just before
the xlog_cil_write_commit_record() call.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_log_cil.c | 89 ++++++++++++++++++++++++++------------------
>  1 file changed, 52 insertions(+), 37 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 35fc3e57d870..f993ec69fc97 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -784,9 +784,54 @@ xlog_cil_build_trans_hdr(
>  }
>  
>  /*
> - * Write out the commit record of a checkpoint transaction associated with the
> - * given ticket to close off a running log write. Return the lsn of the commit
> - * record.
> + * Ensure that the order of log writes follows checkpoint sequence order. This
> + * relies on the context LSN being zero until the log write has guaranteed the
> + * LSN that the log write will start at via xlog_state_get_iclog_space().
> + */
> +static int
> +xlog_cil_order_write(
> +	struct xfs_cil		*cil,
> +	xfs_csn_t		sequence)
> +{
> +	struct xfs_cil_ctx	*ctx;
> +
> +restart:
> +	spin_lock(&cil->xc_push_lock);
> +	list_for_each_entry(ctx, &cil->xc_committing, committing) {
> +		/*
> +		 * Avoid getting stuck in this loop because we were woken by the
> +		 * shutdown, but then went back to sleep once already in the
> +		 * shutdown state.
> +		 */
> +		if (XLOG_FORCED_SHUTDOWN(cil->xc_log)) {
> +			spin_unlock(&cil->xc_push_lock);
> +			return -EIO;
> +		}
> +
> +		/*
> +		 * Higher sequences will wait for this one so skip them.
> +		 * Don't wait for our own sequence, either.
> +		 */
> +		if (ctx->sequence >= sequence)
> +			continue;
> +		if (!ctx->commit_lsn) {
> +			/*
> +			 * It is still being pushed! Wait for the push to
> +			 * complete, then start again from the beginning.
> +			 */
> +			xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
> +			goto restart;
> +		}
> +	}
> +	spin_unlock(&cil->xc_push_lock);
> +	return 0;
> +}
> +
> +/*
> + * Write out the commit record of a checkpoint transaction to close off a
> + * running log write. These commit records are strictly ordered in ascending CIL
> + * sequence order so that log recovery will always replay the checkpoints in the
> + * correct order.
>   */
>  int
>  xlog_cil_write_commit_record(
> @@ -816,6 +861,10 @@ xlog_cil_write_commit_record(
>  	if (XLOG_FORCED_SHUTDOWN(log))
>  		return -EIO;
>  
> +	error = xlog_cil_order_write(ctx->cil, ctx->sequence);
> +	if (error)
> +		return error;
> +
>  	/* account for space used by record data */
>  	ctx->ticket->t_curr_res -= reg.i_len;
>  	error = xlog_write(log, ctx, &lv_chain, ctx->ticket, iclog, reg.i_len);
> @@ -1048,40 +1097,6 @@ xlog_cil_push_work(
>  	if (error)
>  		goto out_abort_free_ticket;
>  
> -	/*
> -	 * now that we've written the checkpoint into the log, strictly
> -	 * order the commit records so replay will get them in the right order.
> -	 */
> -restart:
> -	spin_lock(&cil->xc_push_lock);
> -	list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
> -		/*
> -		 * Avoid getting stuck in this loop because we were woken by the
> -		 * shutdown, but then went back to sleep once already in the
> -		 * shutdown state.
> -		 */
> -		if (XLOG_FORCED_SHUTDOWN(log)) {
> -			spin_unlock(&cil->xc_push_lock);
> -			goto out_abort_free_ticket;
> -		}
> -
> -		/*
> -		 * Higher sequences will wait for this one so skip them.
> -		 * Don't wait for our own sequence, either.
> -		 */
> -		if (new_ctx->sequence >= ctx->sequence)
> -			continue;
> -		if (!new_ctx->commit_lsn) {
> -			/*
> -			 * It is still being pushed! Wait for the push to
> -			 * complete, then start again from the beginning.
> -			 */
> -			xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
> -			goto restart;
> -		}
> -	}
> -	spin_unlock(&cil->xc_push_lock);
> -
>  	error = xlog_cil_write_commit_record(ctx, &commit_iclog);
>  	if (error)
>  		goto out_abort_free_ticket;
> -- 
> 2.31.1
> 
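
For anyone following along, the ordering rule that the factored-out helper
enforces can be sketched in user space roughly as below. This is only an
illustration under stated assumptions: struct demo_ctx and
demo_order_write() are hypothetical names, the committing list is modelled
as a plain array, there is no locking, and where the kernel code sleeps on
xc_commit_wait and rescans, the sketch simply returns -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xfs_cil_ctx. */
struct demo_ctx {
	uint64_t	sequence;	/* checkpoint sequence number */
	uint64_t	commit_lsn;	/* 0 until the commit record is written */
};

/*
 * Decide whether the log write for @sequence may go ahead.
 *
 * Returns 0 if every older checkpoint on the committing list already has
 * a commit LSN, -EAGAIN if an older checkpoint is still being pushed (the
 * kernel would sleep and restart the scan at this point), or -EIO if the
 * log has been shut down while entries are still on the list.
 */
int
demo_order_write(
	const struct demo_ctx	*committing,
	size_t			nr,
	uint64_t		sequence,
	int			shutdown)
{
	size_t			i;

	for (i = 0; i < nr; i++) {
		/* Bail out rather than wait forever after a shutdown. */
		if (shutdown)
			return -EIO;

		/*
		 * Higher sequences will wait for this one so skip them.
		 * Don't wait for our own sequence, either.
		 */
		if (committing[i].sequence >= sequence)
			continue;

		/* An older checkpoint has not been committed yet. */
		if (!committing[i].commit_lsn)
			return -EAGAIN;
	}
	return 0;
}
```

With a list holding sequences 1 (committed), 2 and 3 (both still being
pushed), a write for sequence 2 may proceed while a write for sequence 3
has to wait until sequence 2 gets a nonzero commit LSN.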


* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
  2021-06-17 19:05   ` Darrick J. Wong
@ 2021-06-17 20:06     ` Brian Foster
  2021-06-17 20:26       ` Darrick J. Wong
  2021-06-17 23:43     ` Dave Chinner
  1 sibling, 1 reply; 50+ messages in thread
From: Brian Foster @ 2021-06-17 20:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs

On Thu, Jun 17, 2021 at 12:05:19PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 02:32:30PM -0400, Brian Foster wrote:
> > On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > > Hi folks,
> > > 
> > > This is followup from the first set of log fixes for for-next that
> > > were posted here:
> > > 
> > > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > > 
> > > The first two patches of this series are updates for those patches,
> > > change log below. The rest is the fix for the bigger issue we
> > > uncovered in investigating the generic/019 failures, being that
> > > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > > to checkpoints.
> > > 
> > > The "simple" fix of using the same ordering code as the commit
> > > record for the start records in the CIL push turned into a lot of
> > > patches once I started cleaning it up, separating out all the
> > > different bits and finally realising all the things I needed to
> > > change to avoid unintentional logic/behavioural changes. Hence
> > > there's some code movement, some factoring, API changes to
> > > xlog_write(), changing where we attach callbacks to commit iclogs so
> > > they remain correctly ordered if there are multiple commit records
> > > in the one iclog and then, finally, strictly ordering the start
> > > records....
> > > 
> > > The original "simple fix" I tested last night ran almost a thousand
> > > cycles of generic/019 without a log hang or recovery failure of any
> > > kind. The refactored patchset has run a couple hundred cycles of
> > > g/019 and g/475 over the last few hours without a failure, so I'm
> > > posting this so we can get a review iteration done while I sleep so
> > > we can - hopefully - get this sorted out before the end of the week.
> > > 
> > 
> > My first spin of this included generic/019 and generic/475, ran for 18
> > or so iterations and 475 exploded with a stream of asserts followed by a
> > NULL pointer crash:
> > 
> > # grep -e Assertion -e BUG dmesg.out
> > ...
> > [ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > [ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
> > 
> > I don't know if this is a regression, but I've not seen it before. I've
> > attempted to spin generic/475 since then to see if it reproduces again,
> > but so far I'm only running into some of the preexisting issues
> > associated with that test.
> 
> By any chance, do the two log recovery fixes I sent yesterday make those
> problems go away?
> 

Hadn't got to those ones yet...

> > I'll let it go a while more and probably
> > switch it back to running both sometime before the end of the day for an
> > overnight test.
> 
> Also, do the CIL livelocks go away if you apply only patches 1-2?
> 

It's kind of hard to discern the effect of individual fixes when
multiple corruptions are at play. :/ I suppose I could switch up my
planned overnight test to include the aforementioned two recovery fixes
and patches 1-2 from this series, if that is preferable? I suspect that
would still leave the originally reported generic/019 corruption,
presumably caused by the start LSN ordering issue, but we could see
whether the deadlock is addressed and whether 475 survives any longer.

Brian



* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
  2021-06-17  8:26 ` [PATCH 4/8] xfs: pass a CIL context to xlog_write() Dave Chinner
  2021-06-17 14:46     ` kernel test robot
@ 2021-06-17 20:24   ` Darrick J. Wong
  2021-06-17 22:03     ` Dave Chinner
  2021-06-18 14:23   ` Christoph Hellwig
  2 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 20:24 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 06:26:13PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Pass the CIL context to xlog_write() rather than a pointer to a LSN
> variable. Only the CIL checkpoint calls to xlog_write() need to know
> about the start LSN of the writes, so rework xlog_write to directly
> write the LSNs into the CIL context structure.
> 
> This removes the commit_lsn variable from xlog_cil_push_work(), so
> now we only have to issue the commit record ordering wakeup from
> there.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c      | 22 +++++++++++++++++-----
>  fs/xfs/xfs_log_cil.c  | 19 ++++++++-----------
>  fs/xfs/xfs_log_priv.h |  4 ++--
>  3 files changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index cf661c155786..fc0e43c57683 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -871,7 +871,7 @@ xlog_write_unmount_record(
>  	 */
>  	if (log->l_targ != log->l_mp->m_ddev_targp)
>  		blkdev_issue_flush(log->l_targ->bt_bdev);
> -	return xlog_write(log, &lv_chain, ticket, NULL, NULL, reg.i_len);
> +	return xlog_write(log, NULL, &lv_chain, ticket, NULL, reg.i_len);
>  }
>  
>  /*
> @@ -2383,9 +2383,9 @@ xlog_write_partial(
>  int
>  xlog_write(
>  	struct xlog		*log,
> +	struct xfs_cil_ctx	*ctx,
>  	struct list_head	*lv_chain,
>  	struct xlog_ticket	*ticket,
> -	xfs_lsn_t		*start_lsn,
>  	struct xlog_in_core	**commit_iclog,
>  	uint32_t		len)
>  {
> @@ -2408,9 +2408,21 @@ xlog_write(
>  	if (error)
>  		return error;
>  
> -	/* start_lsn is the LSN of the first iclog written to. */
> -	if (start_lsn)
> -		*start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> +	/*
> +	 * If we have a CIL context, record the LSN of the iclog we were just
> +	 * granted space to start writing into. If the context doesn't have
> +	 * a start_lsn recorded, then this iclog will contain the start record
> +	 * for the checkpoint. Otherwise this write contains the commit record
> +	 * for the checkpoint.
> +	 */
> +	if (ctx) {
> +		spin_lock(&ctx->cil->xc_push_lock);
> +		if (!ctx->start_lsn)
> +			ctx->start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> +		else
> +			ctx->commit_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> +		spin_unlock(&ctx->cil->xc_push_lock);

This cycling of the push lock when setting start_lsn is new.  What are
we protecting against here by taking the lock?

Also, just to check my assumptions: why do we take the push lock when
setting commit_lsn?  Is that to synchronize with the xc_committing loop
that looks for contexts that need pushing?
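
For reference while reading the hunk above, here is a minimal userspace
sketch of what it does, assuming a C11 atomic_flag spinlock stands in for
xc_push_lock and a plain 64-bit integer stands in for the iclog header
LSN. The names model_ctx, model_spin_lock and model_record_lsn are
hypothetical and do not exist in XFS. It only shows that the first write
through a context records the start LSN and a later write records the
commit LSN, both under the same lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel spinlock; not the kernel API. */
struct model_spinlock {
	atomic_flag	held;
};

static void
model_spin_lock(struct model_spinlock *lock)
{
	/* Busy-wait until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&lock->held,
						 memory_order_acquire))
		;
}

static void
model_spin_unlock(struct model_spinlock *lock)
{
	atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

/* Hypothetical stand-in for struct xfs_cil_ctx; not the kernel structure. */
struct model_ctx {
	struct model_spinlock	push_lock;	/* models cil->xc_push_lock */
	uint64_t		start_lsn;	/* zero until the first write */
	uint64_t		commit_lsn;	/* zero until the second write */
};

static void
model_ctx_init(struct model_ctx *ctx)
{
	atomic_flag_clear(&ctx->push_lock.held);
	ctx->start_lsn = 0;
	ctx->commit_lsn = 0;
}

/*
 * Mirror of the "if (ctx)" block in the hunk: the first call records the
 * start LSN, any later call records the commit LSN.
 */
static void
model_record_lsn(struct model_ctx *ctx, uint64_t iclog_lsn)
{
	model_spin_lock(&ctx->push_lock);
	if (!ctx->start_lsn)
		ctx->start_lsn = iclog_lsn;
	else
		ctx->commit_lsn = iclog_lsn;
	model_spin_unlock(&ctx->push_lock);
}
```

In this model a reader that takes push_lock sees each LSN either unset
(zero) or fully written. Whether the kernel readers of start_lsn need
that guarantee is exactly the open question.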

--D

> +	}
>  
>  	lv = list_first_entry_or_null(lv_chain, struct xfs_log_vec, lv_list);
>  	while (lv) {
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 2c8b25888c53..35fc3e57d870 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -790,14 +790,13 @@ xlog_cil_build_trans_hdr(
>   */
>  int
>  xlog_cil_write_commit_record(
> -	struct xlog		*log,
> -	struct xlog_ticket	*ticket,
> -	struct xlog_in_core	**iclog,
> -	xfs_lsn_t		*lsn)
> +	struct xfs_cil_ctx	*ctx,
> +	struct xlog_in_core	**iclog)
>  {
> +	struct xlog		*log = ctx->cil->xc_log;
>  	struct xlog_op_header	ophdr = {
>  		.oh_clientid = XFS_TRANSACTION,
> -		.oh_tid = cpu_to_be32(ticket->t_tid),
> +		.oh_tid = cpu_to_be32(ctx->ticket->t_tid),
>  		.oh_flags = XLOG_COMMIT_TRANS,
>  	};
>  	struct xfs_log_iovec reg = {
> @@ -818,8 +817,8 @@ xlog_cil_write_commit_record(
>  		return -EIO;
>  
>  	/* account for space used by record data */
> -	ticket->t_curr_res -= reg.i_len;
> -	error = xlog_write(log, &lv_chain, ticket, lsn, iclog, reg.i_len);
> +	ctx->ticket->t_curr_res -= reg.i_len;
> +	error = xlog_write(log, ctx, &lv_chain, ctx->ticket, iclog, reg.i_len);
>  	if (error)
>  		xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
>  	return error;
> @@ -1038,7 +1037,7 @@ xlog_cil_push_work(
>  	 * use the commit record lsn then we can move the tail beyond the grant
>  	 * write head.
>  	 */
> -	error = xlog_write(log, &ctx->lv_chain, ctx->ticket, &ctx->start_lsn,
> +	error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
>  				NULL, num_bytes);
>  
>  	/*
> @@ -1083,8 +1082,7 @@ xlog_cil_push_work(
>  	}
>  	spin_unlock(&cil->xc_push_lock);
>  
> -	error = xlog_cil_write_commit_record(log, ctx->ticket, &commit_iclog,
> -			&commit_lsn);
> +	error = xlog_cil_write_commit_record(ctx, &commit_iclog);
>  	if (error)
>  		goto out_abort_free_ticket;
>  
> @@ -1104,7 +1102,6 @@ xlog_cil_push_work(
>  	 * and wake up anyone who is waiting for the commit to complete.
>  	 */
>  	spin_lock(&cil->xc_push_lock);
> -	ctx->commit_lsn = commit_lsn;
>  	wake_up_all(&cil->xc_commit_wait);
>  	spin_unlock(&cil->xc_push_lock);
>  
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 26f26769d1c6..af8a9dfa8068 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -487,8 +487,8 @@ xlog_write_adv_cnt(void **ptr, int *len, int *off, size_t bytes)
>  
>  void	xlog_print_tic_res(struct xfs_mount *mp, struct xlog_ticket *ticket);
>  void	xlog_print_trans(struct xfs_trans *);
> -int	xlog_write(struct xlog *log, struct list_head *lv_chain,
> -		struct xlog_ticket *tic, xfs_lsn_t *start_lsn,
> +int	xlog_write(struct xlog *log, struct xfs_cil_ctx *ctx,
> +		struct list_head *lv_chain, struct xlog_ticket *tic,
>  		struct xlog_in_core **commit_iclog, uint32_t len);
>  
>  void	xfs_log_ticket_ungrant(struct xlog *log, struct xlog_ticket *ticket);
> -- 
> 2.31.1
> 

* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
  2021-06-17 20:06     ` Brian Foster
@ 2021-06-17 20:26       ` Darrick J. Wong
  2021-06-17 23:31         ` Brian Foster
  0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 20:26 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, linux-xfs

On Thu, Jun 17, 2021 at 04:06:24PM -0400, Brian Foster wrote:
> On Thu, Jun 17, 2021 at 12:05:19PM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 17, 2021 at 02:32:30PM -0400, Brian Foster wrote:
> > > On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > > > Hi folks,
> > > > 
> > > > This is followup from the first set of log fixes for for-next that
> > > > were posted here:
> > > > 
> > > > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > > > 
> > > > The first two patches of this series are updates for those patches,
> > > > change log below. The rest is the fix for the bigger issue we
> > > > uncovered in investigating the generic/019 failures, being that
> > > > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > > > to checkpoints.
> > > > 
> > > > The "simple" fix of using the same ordering code as the commit
> > > > record for the start records in the CIL push turned into a lot of
> > > > patches once I started cleaning it up, separating out all the
> > > > different bits and finally realising all the things I needed to
> > > > change to avoid unintentional logic/behavioural changes. Hence
> > > > there's some code movement, some factoring, API changes to
> > > > xlog_write(), changing where we attach callbacks to commit iclogs so
> > > > they remain correctly ordered if there are multiple commit records
> > > > in the one iclog and then, finally, strictly ordering the start
> > > > records....
> > > > 
> > > > The original "simple fix" I tested last night ran almost a thousand
> > > > cycles of generic/019 without a log hang or recovery failure of any
> > > > kind. The refactored patchset has run a couple hundred cycles of
> > > > g/019 and g/475 over the last few hours without a failure, so I'm
> > > > posting this so we can get a review iteration done while I sleep so
> > > > we can - hopefully - get this sorted out before the end of the week.
> > > > 
> > > 
> > > My first spin of this included generic/019 and generic/475, ran for 18
> > > or so iterations and 475 exploded with a stream of asserts followed by a
> > > NULL pointer crash:
> > > 
> > > # grep -e Assertion -e BUG dmesg.out
> > > ...
> > > [ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > > [ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
> > > 
> > > I don't know if this is a regression, but I've not seen it before. I've
> > > attempted to spin generic/475 since then to see if it reproduces again,
> > > but so far I'm only running into some of the preexisting issues
> > > associated with that test.
> > 
> > By any chance, do the two log recovery fixes I sent yesterday make those
> > problems go away?
> > 
> 
> Hadn't got to those ones yet...

<nod>

> > > I'll let it go a while more and probably
> > > switch it back to running both sometime before the end of the day for an
> > > overnight test.
> > 
> > Also, do the CIL livelocks go away if you apply only patches 1-2?
> > 
> 
> It's kind of hard to discern the effect of individual fixes when
> multiple corruptions are at play. :/ I suppose I could switch up my
> planned overnight test to include the aforementioned 2 recovery fixes
> and 1-2 from this series, if that is preferable..?

I dunno about overnight, but at least ~20 or so iterations?

> I suspect that would
> leave around the originally reported generic/019 corruption presumably
> caused by the start LSN ordering issue, but we could see if the deadlock
> is addressed and whether 475 survives any longer.

Might be a useful data point to figure out if these pieces are separate
or if they really do belong in an 8 patch series, since I think ~20 or
so iterations shouldn't take too long (though I guess it is nearly 16:30
your time, isn't it...)  Well, do whatever you think is the best use of
machine time.

--D

> 
> Brian
> 
> > > A full copy of the assert and NULL pointer BUG splat is included below
> > > for reference. It looks like the fault BUG splat ended up interspersed
> > > or otherwise mangled, but I suspect that one is just fallout from the
> > > immediately previous crash.
> > 
> > I have a question about the composition of this 8-patch series --
> > which patches fix the new cil code, and which ones fix the out of order
> > recovery problems?  I suspect that patches 1-2 are for the new CIL code,
> > and 3-8 are to fix the recovery problems.
> > 
> > Thinking with my distro kernel not-maintainer hat on, I'm considering
> > how to backport whatever fixes emerge for the recovery ordering issue
> > into existing kernels.  The way I see things right now, the CIL changes
> > (+ fixes) and the ordering bug fixes are separate issues.  The log
> > ordering problems should get fixed as soon as we have a practical
> > solution; the CIL changes could get deferred if need be since it's a
> > medium-high risk; and the real question is how to sequence all this?
> > 
> > (Or to put it another way: I'm still stuck going "oh wowwww this is a
> > lot more change" while trying to understand patch 4)
> > 
> > --D
> > 
> > > 
> > > Brian
> > > 
> > > --- 8< ---
> > > 
> > > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7953.037737] ------------[ cut here ]------------
> > > [ 7953.042358] WARNING: CPU: 0 PID: 131627 at fs/xfs/xfs_message.c:112 assfail+0x25/0x28 [xfs]
> > > [ 7953.050782] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
> > > [ 7953.129548] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G        W I       5.13.0-rc4+ #70
> > > [ 7953.138243] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> > > [ 7953.145818] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
> > > [ 7953.151554] RIP: 0010:assfail+0x25/0x28 [xfs]
> > > [ 7953.155991] Code: ff ff 0f 0b c3 0f 1f 44 00 00 41 89 c8 48 89 d1 48 89 f2 48 c7 c6 d8 eb c3 c0 e8 cf fa ff ff 80 3d f1 d4 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24 18
> > > [ 7953.174745] RSP: 0018:ffffa57ccf99fa50 EFLAGS: 00010246
> > > [ 7953.179982] RAX: 00000000ffffffea RBX: 0000000500003977 RCX: 0000000000000000
> > > [ 7953.187121] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
> > > [ 7953.194264] RBP: ffff91f685725040 R08: 0000000000000000 R09: 000000000000000a
> > > [ 7953.201405] R10: 000000000000000a R11: f000000000000000 R12: ffff91f685725040
> > > [ 7953.208546] R13: 0000000000000000 R14: ffff91f66abed140 R15: ffff91c76dfccb40
> > > [ 7953.215686] FS:  0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
> > > [ 7953.223781] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 7953.229533] CR2: 00000000020e2108 CR3: 0000003d02826003 CR4: 00000000007706f0
> > > [ 7953.236667] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 7953.243809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 7953.250949] PKRU: 55555554
> > > [ 7953.253669] Call Trace:
> > > [ 7953.256123]  xfs_bui_release+0x4b/0x50 [xfs]
> > > [ 7953.260466]  xfs_trans_committed_bulk+0x158/0x2c0 [xfs]
> > > [ 7953.265762]  ? lock_release+0x1cd/0x2a0
> > > [ 7953.269610]  ? _raw_spin_unlock+0x1f/0x30
> > > [ 7953.273630]  ? xlog_write+0x1e2/0x630 [xfs]
> > > [ 7953.277886]  ? lock_acquire+0x15d/0x380
> > > [ 7953.281732]  ? lock_acquire+0x15d/0x380
> > > [ 7953.285582]  ? lock_release+0x1cd/0x2a0
> > > [ 7953.289428]  ? trace_hardirqs_on+0x1b/0xd0
> > > [ 7953.293536]  ? _raw_spin_unlock_irqrestore+0x37/0x40
> > > [ 7953.298511]  ? __wake_up_common_lock+0x7a/0x90
> > > [ 7953.302966]  ? lock_release+0x1cd/0x2a0
> > > [ 7953.306813]  xlog_cil_committed+0x34f/0x390 [xfs]
> > > [ 7953.311593]  ? xlog_cil_push_work+0x715/0x8d0 [xfs]
> > > [ 7953.316547]  xlog_cil_push_work+0x740/0x8d0 [xfs]
> > > [ 7953.321321]  ? _raw_spin_unlock_irq+0x24/0x40
> > > [ 7953.325689]  ? finish_task_switch.isra.0+0xa0/0x2c0
> > > [ 7953.330580]  ? kmem_cache_free+0x247/0x5c0
> > > [ 7953.334685]  ? fsnotify_final_mark_destroy+0x1c/0x30
> > > [ 7953.339658]  ? lock_acquire+0x15d/0x380
> > > [ 7953.343505]  ? lock_acquire+0x15d/0x380
> > > [ 7953.347353]  ? lock_release+0x1cd/0x2a0
> > > [ 7953.351203]  process_one_work+0x26e/0x560
> > > [ 7953.355225]  worker_thread+0x52/0x3b0
> > > [ 7953.358898]  ? process_one_work+0x560/0x560
> > > [ 7953.363094]  kthread+0x12c/0x150
> > > [ 7953.366335]  ? __kthread_bind_mask+0x60/0x60
> > > [ 7953.370617]  ret_from_fork+0x22/0x30
> > > [ 7953.374206] irq event stamp: 0
> > > [ 7953.377268] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> > > [ 7953.383544] hardirqs last disabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
> > > [ 7953.391724] softirqs last  enabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
> > > [ 7953.399907] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > > [ 7953.406179] ---[ end trace f04c960f66265f3a ]---
> > > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > > [ 7953.417760] #PF: supervisor read access in kernel mode
> > > [ 7953.422900] #PF: error_code(0x0000) - not-present page
> > > [ 7953.428038] PGD 0 P4D 0 
> > > [ 7953.430579] Oops: 0000 [#1] SMP PTI
> > > [ 7953.434070] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G        W I       5.13.0-rc4+ #70
> > > [ 7953.442764] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> > > [ 7953.450330] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
> > > [ 7953.456058] RIP: 0010:xfs_trans_committed_bulk+0xcc/0x2c0 [xfs]
> > > [ 7953.462036] Code: 41 83 c5 01 48 89 54 c4 50 41 83 fd 1f 0f 8f 11 01 00 00 4d 8b 36 4c 3b 34 24 74 28 4d 8b 66 20 40 84 ed 75 54 49 8b 44 24 60 <f6> 00 01 74 91 48 8b 40 38 4c 89 e7 e8 63 6b 42 f5 4d 8b 36 4c 3b
> > > [ 7953.480783] RSP: 0018:ffffa57ccf99fa68 EFLAGS: 00010202
> > > [ 7953.486009] RAX: 000000000000031f RBX: 0000000500003977 RCX: 0000000000000000
> > > [ 7953.493141] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
> > > [ 7953.500274] RBP: 0000000000000001 R08: 0000000000000000 R09: 000000000000000a
> > > [ 7953.507404] R10: 000000000000000a R11: f000000000000000 R12: ffff91c759fedb20
> > > [ 7953.514536] R13: 0000000000000000 R14: ffff91c759fedb00 R15: ffff91c76dfccb40
> > > [ 7953.521671] FS:  0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
> > > [ 7953.529757] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 7953.535501] CR2: 000000000000031f CR3: 0000003d02826003 CR4: 00000000007706f0
> > > [ 7953.542633] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 7953.549768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 7953.556899] PKRU: 55555554
> > > [ 7953.559612] Call Trace:
> > > [ 7953.562064]  ? lock_release+0x1cd/0x2a0
> > > [ 7953.565902]  ? _raw_spin_unlock+0x1f/0x30
> > > [ 7953.569917]  ? xlog_write+0x1e2/0x630 [xfs]
> > > [ 7953.574162]  ? lock_acquire+0x15d/0x380
> > > [ 7953.578000]  ? lock_acquire+0x15d/0x380
> > > [ 7953.581841]  ? lock_release+0x1cd/0x2a0
> > > [ 7953.585680]  ? trace_hardirqs_on+0x1b/0xd0
> > > [ 7953.589780]  ? _raw_spin_unlock_irqrestore+0x37/0x40
> > > [ 7953.594744]  ? __wake_up_common_lock+0x7a/0x90
> > > [ 7953.599192]  ? lock_release+0x1cd/0x2a0
> > > [ 7953.603031]  xlog_cil_committed+0x34f/0x390 [xfs]
> > > [ 7953.607798]  ? xlog_cil_push_work+0x715/0x8d0 [xfs]
> > > [ 7953.612738]  xlog_cil_push_work+0x740/0x8d0 [xfs]
> > > [ 7953.617504]  ? _raw_spin_unlock_irq+0x24/0x40
> > > [ 7953.621862]  ? finish_task_switch.isra.0+0xa0/0x2c0
> > > [ 7953.626745]  ? kmem_cache_free+0x247/0x5c0
> > > [ 7953.630839]  ? fsnotify_final_mark_destroy+0x1c/0x30
> > > [ 7953.635806]  ? lock_acquire+0x15d/0x380
> > > [ 7953.639646]  ? lock_acquire+0x15d/0x380
> > > [ 7953.643484]  ? lock_release+0x1cd/0x2a0
> > > [ 7953.647323]  process_one_work+0x26e/0x560
> > > [ 7953.651337]  worker_thread+0x52/0x3b0
> > > [ 7953.655003]  ? process_one_work+0x560/0x560
> > > [ 7953.659188]  kthread+0x12c/0x150
> > > [ 7953.662421]  ? __kthread_bind_mask+0x60/0x60
> > > [ 7953.666694]  ret_from_fork+0x22/0x30
> > > [ 7953.670273] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
> > > [ 7953.749025] CR2: 000000000000031f
> > > [ 7953.752345] ---[ end trace f04c960f66265f3b ]---
> > > 
> > 
> 

* Re: [PATCH 6/8] xfs: separate out setting CIL context LSNs from xlog_write
  2021-06-17  8:26 ` [PATCH 6/8] xfs: separate out setting CIL context LSNs from xlog_write Dave Chinner
@ 2021-06-17 20:28   ` Darrick J. Wong
  2021-06-17 22:10     ` Dave Chinner
  0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 20:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 06:26:15PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In preparation for moving more CIL context specific functionality
> into these operations.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks fine as a hoist, though I wonder why you didn't do this in patch
4?

--D

> ---
>  fs/xfs/xfs_log.c      | 17 ++---------------
>  fs/xfs/xfs_log_cil.c  | 23 +++++++++++++++++++++++
>  fs/xfs/xfs_log_priv.h |  2 ++
>  3 files changed, 27 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index fc0e43c57683..1c214b395223 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -2408,21 +2408,8 @@ xlog_write(
>  	if (error)
>  		return error;
>  
> -	/*
> -	 * If we have a CIL context, record the LSN of the iclog we were just
> -	 * granted space to start writing into. If the context doesn't have
> -	 * a start_lsn recorded, then this iclog will contain the start record
> -	 * for the checkpoint. Otherwise this write contains the commit record
> -	 * for the checkpoint.
> -	 */
> -	if (ctx) {
> -		spin_lock(&ctx->cil->xc_push_lock);
> -		if (!ctx->start_lsn)
> -			ctx->start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> -		else
> -			ctx->commit_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> -		spin_unlock(&ctx->cil->xc_push_lock);
> -	}
> +	if (ctx)
> +		xlog_cil_set_ctx_write_state(ctx, iclog);
>  
>  	lv = list_first_entry_or_null(lv_chain, struct xfs_log_vec, lv_list);
>  	while (lv) {
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index f993ec69fc97..2d8d904ffb78 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -783,6 +783,29 @@ xlog_cil_build_trans_hdr(
>  	tic->t_curr_res -= lvhdr->lv_bytes;
>  }
>  
> +/*
> + * Record the LSN of the iclog we were just granted space to start writing into.
> + * If the context doesn't have a start_lsn recorded, then this iclog will
> + * contain the start record for the checkpoint. Otherwise this write contains
> + * the commit record for the checkpoint.
> + */
> +void
> +xlog_cil_set_ctx_write_state(
> +	struct xfs_cil_ctx	*ctx,
> +	struct xlog_in_core	*iclog)
> +{
> +	struct xfs_cil		*cil = ctx->cil;
> +	xfs_lsn_t		lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> +
> +	ASSERT(!ctx->commit_lsn);
> +	spin_lock(&cil->xc_push_lock);
> +	if (!ctx->start_lsn)
> +		ctx->start_lsn = lsn;
> +	else
> +		ctx->commit_lsn = lsn;
> +	spin_unlock(&cil->xc_push_lock);
> +}
> +
>  /*
>   * Ensure that the order of log writes follows checkpoint sequence order. This
>   * relies on the context LSN being zero until the log write has guaranteed the
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index af8a9dfa8068..849ba2eb3483 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -563,6 +563,8 @@ void	xlog_cil_destroy(struct xlog *log);
>  bool	xlog_cil_empty(struct xlog *log);
>  void	xlog_cil_commit(struct xlog *log, struct xfs_trans *tp,
>  			xfs_csn_t *commit_seq, bool regrant);
> +void	xlog_cil_set_ctx_write_state(struct xfs_cil_ctx *ctx,
> +			struct xlog_in_core *iclog);
>  
>  /*
>   * CIL force routines
> -- 
> 2.31.1
> 

* Re: [PATCH 7/8] xfs: attached iclog callbacks in xlog_cil_set_ctx_write_state()
  2021-06-17  8:26 ` [PATCH 7/8] xfs: attached iclog callbacks in xlog_cil_set_ctx_write_state() Dave Chinner
@ 2021-06-17 20:55   ` Darrick J. Wong
  2021-06-17 22:20     ` Dave Chinner
  0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 20:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 06:26:16PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We currently attach iclog callbacks for the CIL when the commit
> iclog is returned from xlog_write. Because
> xlog_state_get_iclog_space() always guarantees that the commit
> record will fit in the iclog it returns, we can move this IO
> callback setting to xlog_cil_set_ctx_write_state(), record the
> commit iclog in the context and remove the need for the commit iclog
> to be returned by xlog_write() altogether.
> 
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c      |  8 ++----
>  fs/xfs/xfs_log_cil.c  | 65 +++++++++++++++++++++++++------------------
>  fs/xfs/xfs_log_priv.h |  3 +-
>  3 files changed, 42 insertions(+), 34 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 1c214b395223..359246d54db7 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -871,7 +871,7 @@ xlog_write_unmount_record(
>  	 */
>  	if (log->l_targ != log->l_mp->m_ddev_targp)
>  		blkdev_issue_flush(log->l_targ->bt_bdev);
> -	return xlog_write(log, NULL, &lv_chain, ticket, NULL, reg.i_len);
> +	return xlog_write(log, NULL, &lv_chain, ticket, reg.i_len);
>  }
>  
>  /*
> @@ -2386,7 +2386,6 @@ xlog_write(
>  	struct xfs_cil_ctx	*ctx,
>  	struct list_head	*lv_chain,
>  	struct xlog_ticket	*ticket,
> -	struct xlog_in_core	**commit_iclog,
>  	uint32_t		len)
>  {
>  	struct xlog_in_core	*iclog = NULL;
> @@ -2436,10 +2435,7 @@ xlog_write(
>  	 */
>  	spin_lock(&log->l_icloglock);
>  	xlog_state_finish_copy(log, iclog, record_cnt, 0);
> -	if (commit_iclog)
> -		*commit_iclog = iclog;
> -	else
> -		error = xlog_state_release_iclog(log, iclog, ticket);
> +	error = xlog_state_release_iclog(log, iclog, ticket);
>  	spin_unlock(&log->l_icloglock);
>  
>  	return error;
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 2d8d904ffb78..87e30917ce2e 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -799,11 +799,34 @@ xlog_cil_set_ctx_write_state(
>  
>  	ASSERT(!ctx->commit_lsn);
>  	spin_lock(&cil->xc_push_lock);
> -	if (!ctx->start_lsn)
> +	if (!ctx->start_lsn) {
>  		ctx->start_lsn = lsn;
> -	else
> -		ctx->commit_lsn = lsn;
> +		spin_unlock(&cil->xc_push_lock);
> +		return;
> +	}
> +
> +	/*
> +	 * Take a reference to the iclog for the context so that we still hold
> +	 * it when xlog_write is done and has released it. This means the
> +	 * context controls when the iclog is released for IO.
> +	 */
> +	atomic_inc(&iclog->ic_refcnt);

Where do we drop this refcount?  Is this the accounting adjustment that
we have to make because xlog_write always decrements the iclog refcount
now?
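
The reference-counting pattern in question can be sketched in userspace
as follows, assuming a C11 atomic counter stands in for ic_refcnt. The
names model_iclog, model_iclog_get and model_iclog_put are hypothetical
and do not exist in XFS. The sketch only illustrates that an extra
reference taken on behalf of the context keeps the buffer from being
submitted when the writer drops its own reference. It makes no claim
about where the kernel code drops the matching reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct xlog_in_core; not the kernel structure. */
struct model_iclog {
	atomic_int	refcnt;		/* models iclog->ic_refcnt */
	bool		submitted;	/* set when the last reference goes away */
};

/* Start with one reference, held by the writer (the xlog_write() role). */
static void
model_iclog_init(struct model_iclog *iclog)
{
	atomic_init(&iclog->refcnt, 1);
	iclog->submitted = false;
}

/* Take an additional reference, as the context does in the hunk above. */
static void
model_iclog_get(struct model_iclog *iclog)
{
	atomic_fetch_add(&iclog->refcnt, 1);
}

/*
 * Drop one reference. Only the final drop "submits" the buffer; returns
 * true in that case and false while other holders remain.
 */
static bool
model_iclog_put(struct model_iclog *iclog)
{
	if (atomic_fetch_sub(&iclog->refcnt, 1) == 1) {
		iclog->submitted = true;
		return true;
	}
	return false;
}
```

With this model, the writer's put after the context's get returns false
and leaves the buffer unsubmitted; only the context's later put submits
it.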

> +	ctx->commit_iclog = iclog;
> +	ctx->commit_lsn = lsn;
>  	spin_unlock(&cil->xc_push_lock);

I've noticed how the setting of ctx->commit_lsn has moved to before the
point where we splice callback lists, only to move it back below in
the next patch.  That has made it harder for me to understand this
series.

I /think/ the goal of this patch is not really a functional change so
much as a refactoring to make the cil context track the commit iclog
directly and then smooth out some of the refcounting code, but the
shuffling around of these variables makes me wonder if I'm missing some
other subtlety.

--D

> +
> +	/*
> +	 * xlog_state_get_iclog_space() guarantees there is enough space in the
> +	 * iclog for an entire commit record, so attach the context callbacks to
> +	 * the iclog at this time if we are not already in a shutdown state.
> +	 */
> +	spin_lock(&iclog->ic_callback_lock);
> +	if (iclog->ic_state == XLOG_STATE_IOERROR) {
> +		spin_unlock(&iclog->ic_callback_lock);
> +		return;
> +	}
> +	list_add_tail(&ctx->iclog_entry, &iclog->ic_callbacks);
> +	spin_unlock(&iclog->ic_callback_lock);
>  }
>  
>  /*
> @@ -858,8 +881,7 @@ xlog_cil_order_write(
>   */
>  int
>  xlog_cil_write_commit_record(
> -	struct xfs_cil_ctx	*ctx,
> -	struct xlog_in_core	**iclog)
> +	struct xfs_cil_ctx	*ctx)
>  {
>  	struct xlog		*log = ctx->cil->xc_log;
>  	struct xlog_op_header	ophdr = {
> @@ -890,7 +912,7 @@ xlog_cil_write_commit_record(
>  
>  	/* account for space used by record data */
>  	ctx->ticket->t_curr_res -= reg.i_len;
> -	error = xlog_write(log, ctx, &lv_chain, ctx->ticket, iclog, reg.i_len);
> +	error = xlog_write(log, ctx, &lv_chain, ctx->ticket, reg.i_len);
>  	if (error)
>  		xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
>  	return error;
> @@ -940,7 +962,6 @@ xlog_cil_push_work(
>  	struct xlog		*log = cil->xc_log;
>  	struct xfs_log_vec	*lv;
>  	struct xfs_cil_ctx	*new_ctx;
> -	struct xlog_in_core	*commit_iclog;
>  	int			num_iovecs = 0;
>  	int			num_bytes = 0;
>  	int			error = 0;
> @@ -1109,8 +1130,7 @@ xlog_cil_push_work(
>  	 * use the commit record lsn then we can move the tail beyond the grant
>  	 * write head.
>  	 */
> -	error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
> -				NULL, num_bytes);
> +	error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket, num_bytes);
>  
>  	/*
>  	 * Take the lvhdr back off the lv_chain as it should not be passed
> @@ -1120,20 +1140,10 @@ xlog_cil_push_work(
>  	if (error)
>  		goto out_abort_free_ticket;
>  
> -	error = xlog_cil_write_commit_record(ctx, &commit_iclog);
> +	error = xlog_cil_write_commit_record(ctx);
>  	if (error)
>  		goto out_abort_free_ticket;
>  
> -	spin_lock(&commit_iclog->ic_callback_lock);
> -	if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
> -		spin_unlock(&commit_iclog->ic_callback_lock);
> -		goto out_abort_free_ticket;
> -	}
> -	ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
> -		      commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
> -	list_add_tail(&ctx->iclog_entry, &commit_iclog->ic_callbacks);
> -	spin_unlock(&commit_iclog->ic_callback_lock);
> -
>  	/*
>  	 * now the checkpoint commit is complete and we've attached the
>  	 * callbacks to the iclog we can assign the commit LSN to the context
> @@ -1168,8 +1178,8 @@ xlog_cil_push_work(
>  	if (ctx->start_lsn != commit_lsn) {
>  		struct xlog_in_core	*iclog;
>  
> -		for (iclog = commit_iclog->ic_prev;
> -		     iclog != commit_iclog;
> +		for (iclog = ctx->commit_iclog->ic_prev;
> +		     iclog != ctx->commit_iclog;
>  		     iclog = iclog->ic_prev) {
>  			xfs_lsn_t	hlsn;
>  
> @@ -1201,7 +1211,7 @@ xlog_cil_push_work(
>  		 * ordering for this checkpoint is correctly preserved down to
>  		 * stable storage.
>  		 */
> -		commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
> +		ctx->commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
>  	}
>  
>  	/*
> @@ -1214,10 +1224,11 @@ xlog_cil_push_work(
>  	 * will be written when released, switch it's state to WANT_SYNC right
>  	 * now.
>  	 */
> -	commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
> -	if (push_commit_stable && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
> -		xlog_state_switch_iclogs(log, commit_iclog, 0);
> -	xlog_state_release_iclog(log, commit_iclog, ticket);
> +	ctx->commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
> +	if (push_commit_stable &&
> +	    ctx->commit_iclog->ic_state == XLOG_STATE_ACTIVE)
> +		xlog_state_switch_iclogs(log, ctx->commit_iclog, 0);
> +	xlog_state_release_iclog(log, ctx->commit_iclog, ticket);
>  	spin_unlock(&log->l_icloglock);
>  
>  	xfs_log_ticket_ungrant(log, ticket);
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 849ba2eb3483..72dfa3b89513 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -237,6 +237,7 @@ struct xfs_cil_ctx {
>  	struct work_struct	discard_endio_work;
>  	struct work_struct	push_work;
>  	atomic_t		order_id;
> +	struct xlog_in_core	*commit_iclog;
>  };
>  
>  /*
> @@ -489,7 +490,7 @@ void	xlog_print_tic_res(struct xfs_mount *mp, struct xlog_ticket *ticket);
>  void	xlog_print_trans(struct xfs_trans *);
>  int	xlog_write(struct xlog *log, struct xfs_cil_ctx *ctx,
>  		struct list_head *lv_chain, struct xlog_ticket *tic,
> -		struct xlog_in_core **commit_iclog, uint32_t len);
> +		uint32_t len);
>  
>  void	xfs_log_ticket_ungrant(struct xlog *log, struct xlog_ticket *ticket);
>  void	xfs_log_ticket_regrant(struct xlog *log, struct xlog_ticket *ticket);
> -- 
> 2.31.1
> 

* Re: [PATCH 8/8] xfs: order CIL checkpoint start records
  2021-06-17  8:26 ` [PATCH 8/8] xfs: order CIL checkpoint start records Dave Chinner
@ 2021-06-17 21:31   ` Darrick J. Wong
  2021-06-17 22:49     ` Dave Chinner
  0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 21:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 06:26:17PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Because log recovery depends on strictly ordered start records as
> well as strictly ordered commit records.
> 
> This is a zero day bug in the way XFS writes pipelined transactions
> to the journal which is exposed by commit facd77e4e38b ("xfs: CIL
> work is serialised, not pipelined") which re-introduces explicit
> concurrent commits back into the on-disk journal.
> 
> The XFS journal commit code has never ordered start records and we
> have relied on strict commit record ordering for correct recovery
> ordering of concurrently written transactions. Unfortunately, root
> cause analysis uncovered the fact that log recovery uses the LSN of
> the start record for transaction commit processing. Hence the
> commits are processed in strict orderi by recovery, but the LSNs

s/orderi/order/ ?

> associated with the commits can be out of order and so recovery may
> stamp incorrect LSNs into objects and/or misorder intents in the AIL
> for later processing. This can result in log recovery failures
> and/or on disk corruption, sometimes silent.
> 
> Because this is a long standing log recovery issue, we can't just
> fix log recovery and call it good.

Could there be production filesystems out there that have this
mismatched ordering of start lsn and commit lsn?  This still leaves the
mystery of crashed customer filesystems containing btree blocks where
128 bytes in the middle clearly contain contents that don't match or
duplicate the rest of the block, as though someone forgot to replay a
buffer vector or something.

What would a fix to log recovery entail?  Not skipping recovered items
if the start/commit sequencing is not the same?  Or am I not
understanding the problem correctly?

> This still leaves older kernels
> susceptible to recovery failures and corruption when replaying a log
> from a kernel that pipelines checkpoints.

> There is also the issue
> that in-memory ordering for AIL pushing and data integrity
> operations are based on checkpoint start LSNs, and if the start LSN
> is incorrect in the journal, it is also incorrect in memory.
> 
> Hence there's really only one choice for fixing this zero-day bug:
> we need to strictly order checkpoint start records in ascending
> sequence order in the log, the same way we already strictly order
> commit records.
> 
> Fixes: facd77e4e38b ("xfs: CIL work is serialised, not pipelined")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c      |   1 +
>  fs/xfs/xfs_log_cil.c  | 101 +++++++++++++++++++++++++++++-------------
>  fs/xfs/xfs_log_priv.h |   1 +
>  3 files changed, 71 insertions(+), 32 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 359246d54db7..94b6bccb9de9 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -3743,6 +3743,7 @@ xfs_log_force_umount(
>  	 * avoid races.
>  	 */
>  	spin_lock(&log->l_cilp->xc_push_lock);
> +	wake_up_all(&log->l_cilp->xc_start_wait);
>  	wake_up_all(&log->l_cilp->xc_commit_wait);
>  	spin_unlock(&log->l_cilp->xc_push_lock);
>  	xlog_state_do_callback(log);
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 87e30917ce2e..722c21f21b81 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -684,6 +684,7 @@ xlog_cil_committed(
>  	 */
>  	if (abort) {
>  		spin_lock(&ctx->cil->xc_push_lock);
> +		wake_up_all(&ctx->cil->xc_start_wait);
>  		wake_up_all(&ctx->cil->xc_commit_wait);
>  		spin_unlock(&ctx->cil->xc_push_lock);
>  	}
> @@ -788,6 +789,10 @@ xlog_cil_build_trans_hdr(
>   * If the context doesn't have a start_lsn recorded, then this iclog will
>   * contain the start record for the checkpoint. Otherwise this write contains
>   * the commit record for the checkpoint.
> + *
> + * Once we've set the LSN for the given operation, wake up any ordered write
> + * waiters that can make progress now that we have a stable LSN for write
> + * ordering purposes.
>   */
>  void
>  xlog_cil_set_ctx_write_state(
> @@ -798,9 +803,16 @@ xlog_cil_set_ctx_write_state(
>  	xfs_lsn_t		lsn = be64_to_cpu(iclog->ic_header.h_lsn);
>  
>  	ASSERT(!ctx->commit_lsn);
> -	spin_lock(&cil->xc_push_lock);
>  	if (!ctx->start_lsn) {
> +		spin_lock(&cil->xc_push_lock);
> +		/*
> +		 * The LSN we need to pass to the log items on transaction
> +		 * commit is the LSN reported by the first log vector write, not
> +		 * the commit lsn. If we use the commit record lsn then we can
> +		 * move the tail beyond the grant write head.
> +		 */
>  		ctx->start_lsn = lsn;
> +		wake_up_all(&cil->xc_start_wait);
>  		spin_unlock(&cil->xc_push_lock);
>  		return;
>  	}
> @@ -811,9 +823,6 @@ xlog_cil_set_ctx_write_state(
>  	 * context controls when the iclog is released for IO.
>  	 */
>  	atomic_inc(&iclog->ic_refcnt);
> -	ctx->commit_iclog = iclog;
> -	ctx->commit_lsn = lsn;
> -	spin_unlock(&cil->xc_push_lock);
>  
>  	/*
>  	 * xlog_state_get_iclog_space() guarantees there is enough space in the
> @@ -827,6 +836,12 @@ xlog_cil_set_ctx_write_state(
>  	}
>  	list_add_tail(&ctx->iclog_entry, &iclog->ic_callbacks);
>  	spin_unlock(&iclog->ic_callback_lock);
> +
> +	spin_lock(&cil->xc_push_lock);
> +	ctx->commit_iclog = iclog;
> +	ctx->commit_lsn = lsn;
> +	wake_up_all(&cil->xc_commit_wait);
> +	spin_unlock(&cil->xc_push_lock);
>  }
>  
>  /*
> @@ -834,10 +849,16 @@ xlog_cil_set_ctx_write_state(
>   * relies on the context LSN being zero until the log write has guaranteed the
>   * LSN that the log write will start at via xlog_state_get_iclog_space().
>   */
> +enum {
> +	_START_RECORD,
> +	_COMMIT_RECORD,
> +};

Stupid nit: If this enum had a name you could skip the default clause
below because the compiler would typecheck the usage for you.
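
To illustrate the nit, here is a minimal userspace sketch, assuming a
compiler with -Wswitch enabled (GCC and Clang include it in -Wall). The
names cil_record_type, ctx_lsns and lsn_is_recorded are hypothetical and
do not appear in the patch. With a named enum as the parameter type and
no default clause, the compiler can warn when a switch misses an
enumerator:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; the patch under review uses an anonymous enum. */
enum cil_record_type {
	CIL_START_RECORD,
	CIL_COMMIT_RECORD,
};

/* Simplified stand-in for the two LSN fields of struct xfs_cil_ctx. */
struct ctx_lsns {
	int64_t	start_lsn;
	int64_t	commit_lsn;
};

/*
 * No default clause: with -Wswitch, adding a third enumerator without
 * handling it here produces a compiler warning.
 */
bool
lsn_is_recorded(const struct ctx_lsns *ctx, enum cil_record_type record)
{
	switch (record) {
	case CIL_START_RECORD:
		return ctx->start_lsn != 0;
	case CIL_COMMIT_RECORD:
		return ctx->commit_lsn != 0;
	}
	return false;	/* not reached for valid enumerators */
}
```

Note that C still lets a plain int be passed where the enum is expected,
so this is a warning aid rather than strict type checking.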

I think I grok how the code changes introduce a new ordering
requirement, at least.

--D

> +
>  static int
>  xlog_cil_order_write(
>  	struct xfs_cil		*cil,
> -	xfs_csn_t		sequence)
> +	xfs_csn_t		sequence,
> +	int			record)
>  {
>  	struct xfs_cil_ctx	*ctx;
>  
> @@ -860,19 +881,50 @@ xlog_cil_order_write(
>  		 */
>  		if (ctx->sequence >= sequence)
>  			continue;
> -		if (!ctx->commit_lsn) {
> -			/*
> -			 * It is still being pushed! Wait for the push to
> -			 * complete, then start again from the beginning.
> -			 */
> -			xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
> -			goto restart;
> +
> +		/* Wait until the LSN for the record has been recorded. */
> +		switch (record) {
> +		case _START_RECORD:
> +			if (!ctx->start_lsn) {
> +				xlog_wait(&cil->xc_start_wait, &cil->xc_push_lock);
> +				goto restart;
> +			}
> +			break;
> +		case _COMMIT_RECORD:
> +			if (!ctx->commit_lsn) {
> +				xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
> +				goto restart;
> +			}
> +			break;
> +		default:
> +			ASSERT(0);
> +			break;
>  		}
>  	}
>  	spin_unlock(&cil->xc_push_lock);
>  	return 0;
>  }
>  
> +/*
> + * Write out the log vector change now attached to the CIL context. This will
> + * write a start record that needs to be strictly ordered in ascending CIL
> + * sequence order so that log recovery will always use in-order start LSNs when
> + * replaying checkpoints.
> + */
> +static int
> +xlog_cil_write_chain(
> +	struct xfs_cil_ctx	*ctx,
> +	uint32_t		num_bytes)
> +{
> +	struct xlog		*log = ctx->cil->xc_log;
> +	int			error;
> +
> +	error = xlog_cil_order_write(ctx->cil, ctx->sequence, _START_RECORD);
> +	if (error)
> +		return error;
> +	return xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket, num_bytes);
> +}
> +
>  /*
>   * Write out the commit record of a checkpoint transaction to close off a
>   * running log write. These commit records are strictly ordered in ascending CIL
> @@ -906,7 +958,7 @@ xlog_cil_write_commit_record(
>  	if (XLOG_FORCED_SHUTDOWN(log))
>  		return -EIO;
>  
> -	error = xlog_cil_order_write(ctx->cil, ctx->sequence);
> +	error = xlog_cil_order_write(ctx->cil, ctx->sequence, _COMMIT_RECORD);
>  	if (error)
>  		return error;
>  
> @@ -1125,17 +1177,10 @@ xlog_cil_push_work(
>  	wait_for_completion(&bdev_flush);
>  
>  	/*
> -	 * The LSN we need to pass to the log items on transaction commit is the
> -	 * LSN reported by the first log vector write, not the commit lsn. If we
> -	 * use the commit record lsn then we can move the tail beyond the grant
> -	 * write head.
> -	 */
> -	error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket, num_bytes);
> -
> -	/*
> -	 * Take the lvhdr back off the lv_chain as it should not be passed
> -	 * to log IO completion.
> +	 * Once we write the log vector chain, take the lvhdr back off it as it
> +	 * must not be passed to log IO completion.
>  	 */
> +	error = xlog_cil_write_chain(ctx, num_bytes);
>  	list_del(&lvhdr.lv_list);
>  	if (error)
>  		goto out_abort_free_ticket;
> @@ -1144,15 +1189,6 @@ xlog_cil_push_work(
>  	if (error)
>  		goto out_abort_free_ticket;
>  
> -	/*
> -	 * now the checkpoint commit is complete and we've attached the
> -	 * callbacks to the iclog we can assign the commit LSN to the context
> -	 * and wake up anyone who is waiting for the commit to complete.
> -	 */
> -	spin_lock(&cil->xc_push_lock);
> -	wake_up_all(&cil->xc_commit_wait);
> -	spin_unlock(&cil->xc_push_lock);
> -
>  	/*
>  	 * Pull the ticket off the ctx so we can ungrant it after releasing the
>  	 * commit_iclog. The ctx may be freed by the time we return from
> @@ -1728,6 +1764,7 @@ xlog_cil_init(
>  	init_waitqueue_head(&cil->xc_push_wait);
>  	init_rwsem(&cil->xc_ctx_lock);
>  	init_waitqueue_head(&cil->xc_commit_wait);
> +	init_waitqueue_head(&cil->xc_start_wait);
>  	log->l_cilp = cil;
>  
>  	ctx = xlog_cil_ctx_alloc();
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 72dfa3b89513..b807a179b916 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -279,6 +279,7 @@ struct xfs_cil {
>  	bool			xc_push_commit_stable;
>  	struct list_head	xc_committing;
>  	wait_queue_head_t	xc_commit_wait;
> +	wait_queue_head_t	xc_start_wait;
>  	xfs_csn_t		xc_current_sequence;
>  	wait_queue_head_t	xc_push_wait;	/* background push throttle */
>  
> -- 
> 2.31.1
> 

* Re: [PATCH 2/8] xfs: don't wait on future iclogs when pushing the CIL
  2021-06-17 17:49   ` Darrick J. Wong
@ 2021-06-17 21:55     ` Dave Chinner
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 21:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 10:49:10AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 06:26:11PM +1000, Dave Chinner wrote:
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index 705619e9dab4..2fb0ab02dda3 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -1075,15 +1075,54 @@ xlog_cil_push_work(
> >  	ticket = ctx->ticket;
> >  
> >  	/*
> > -	 * If the checkpoint spans multiple iclogs, wait for all previous
> > -	 * iclogs to complete before we submit the commit_iclog. In this case,
> > -	 * the commit_iclog write needs to issue a pre-flush so that the
> > -	 * ordering is correctly preserved down to stable storage.
> > +	 * If the checkpoint spans multiple iclogs, wait for all previous iclogs
> > +	 * to complete before we submit the commit_iclog. We can't use state
> > +	 * checks for this - ACTIVE can be either a past completed iclog or a
> > +	 * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a
> > +	 * past or future iclog awaiting IO or ordered IO completion to be run.
> > > +	 * In the latter case, if it's a future iclog and we wait on it, then we
> > +	 * will hang because it won't get processed through to ic_force_wait
> > +	 * wakeup until this commit_iclog is written to disk.  Hence we use the
> > +	 * iclog header lsn and compare it to the commit lsn to determine if we
> > +	 * need to wait on iclogs or not.
> >  	 */
> >  	spin_lock(&log->l_icloglock);
> >  	if (ctx->start_lsn != commit_lsn) {
> > -		xlog_wait_on_iclog(commit_iclog->ic_prev);
> > -		spin_lock(&log->l_icloglock);
> > +		struct xlog_in_core	*iclog;
> > +
> > +		for (iclog = commit_iclog->ic_prev;
> > +		     iclog != commit_iclog;
> > +		     iclog = iclog->ic_prev) {
> > +			xfs_lsn_t	hlsn;
> > +
> > +			/*
> > +			 * If the LSN of the iclog is zero or in the future it
> > +			 * means it has passed through IO completion and
> > +			 * activation and hence all previous iclogs have also
> > +			 * done so. We do not need to wait at all in this case.
> > +			 */
> > +			hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > +			if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
> > +				break;
> > +
> > +			/*
> > +			 * If the LSN of the iclog is older than the commit lsn,
> > +			 * we have to wait on it. Waiting on this via the
> > +			 * ic_force_wait should also order the completion of all
> > +			 * older iclogs, too, but we leave checking that to the
> > +			 * next loop iteration.
> > +			 */
> > +			ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
> > +			xlog_wait_on_iclog(iclog);
> > +			spin_lock(&log->l_icloglock);
> 
> The presence of a loop here confuses me a bit -- we really only need to
> check and wait on commit->ic_prev since xlog_wait_on_iclog waits for
> both the iclog that it is given as well as all previous iclogs, right?

I originally wrote this thinking about using the ic_write_wait queue
which would require checking all iclogs in the ring because the
completion signalled at the DONE_SYNC state is not ordered against
other iclogs. Hence I had planned to walk all the iclogs. Then I
realised that checking the LSN could tell us past/future and so we
only needed to wait on the first iclog with a LSN less than the
commit iclog.

And so I left the loop in place to ensure that, even if my assertion
about the ring aging order was incorrect, this code would Do The
Right Thing.

> we've waited on commit->ic_prev, the next iclog iterated (i.e.
> commit->ic_prev->ic_prev) should break out of the loop?

Yes, that is what it does.
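
The past/future test described above can be sketched in userspace as
follows. This is a simplified model, not the kernel code: lsn_t,
make_lsn, lsn_cmp and must_wait_on_iclog are hypothetical names, and the
sketch assumes the usual LSN layout with the log cycle in the upper 32
bits and the block number in the lower 32 bits:

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t lsn_t;	/* stand-in for xfs_lsn_t */

/* Assumed layout: cycle in the high 32 bits, block in the low 32 bits. */
static inline uint32_t lsn_cycle(lsn_t lsn)
{
	return (uint32_t)((uint64_t)lsn >> 32);
}

static inline uint32_t lsn_block(lsn_t lsn)
{
	return (uint32_t)lsn;
}

lsn_t make_lsn(uint32_t cycle, uint32_t block)
{
	return (lsn_t)(((uint64_t)cycle << 32) | block);
}

/* Returns <0, 0 or >0, comparing the cycle first and then the block. */
int lsn_cmp(lsn_t a, lsn_t b)
{
	if (lsn_cycle(a) != lsn_cycle(b))
		return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
	if (lsn_block(a) != lsn_block(b))
		return lsn_block(a) < lsn_block(b) ? -1 : 1;
	return 0;
}

/*
 * A zero header LSN or one newer than the commit LSN means the iclog has
 * already been through IO completion, so there is nothing to wait for.
 * Only an iclog older than the commit LSN needs a wait. Equality is not
 * expected for an iclog other than the commit iclog (the patch asserts
 * this), so this model treats it as "no wait".
 */
bool must_wait_on_iclog(lsn_t hlsn, lsn_t commit_lsn)
{
	return hlsn != 0 && lsn_cmp(hlsn, commit_lsn) < 0;
}
```

With a commit LSN of cycle 3, block 100, an iclog stamped cycle 3,
block 50 has to be waited on, while one stamped cycle 4, block 0 or one
with a zero LSN does not.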

I can strip this all out - it was really just being defensive
because I wanted to make sure things were working as I expected them
to be working...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c
  2021-06-17 17:50   ` Darrick J. Wong
@ 2021-06-17 21:56     ` Dave Chinner
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 21:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 10:50:39AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 06:26:12PM +1000, Dave Chinner wrote:
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index 2fb0ab02dda3..2c8b25888c53 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -783,6 +783,48 @@ xlog_cil_build_trans_hdr(
> >  	tic->t_curr_res -= lvhdr->lv_bytes;
> >  }
> >  
> > +/*
> > + * Write out the commit record of a checkpoint transaction associated with the
> > + * given ticket to close off a running log write. Return the lsn of the commit
> > + * record.
> > + */
> > +int
> 
> static int, like the robot suggests?

Huh. How did that get dropped? I definitely made this static in the
original patch....

> With that fixed,
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Ta.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
  2021-06-17 20:24   ` Darrick J. Wong
@ 2021-06-17 22:03     ` Dave Chinner
  2021-06-17 22:18       ` Darrick J. Wong
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 22:03 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 01:24:02PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 06:26:13PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Pass the CIL context to xlog_write() rather than a pointer to a LSN
> > variable. Only the CIL checkpoint calls to xlog_write() need to know
> > about the start LSN of the writes, so rework xlog_write to directly
> > write the LSNs into the CIL context structure.
> > 
> > This removes the commit_lsn variable from xlog_cil_push_work(), so
> > now we only have to issue the commit record ordering wakeup from
> > there.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_log.c      | 22 +++++++++++++++++-----
> >  fs/xfs/xfs_log_cil.c  | 19 ++++++++-----------
> >  fs/xfs/xfs_log_priv.h |  4 ++--
> >  3 files changed, 27 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index cf661c155786..fc0e43c57683 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -871,7 +871,7 @@ xlog_write_unmount_record(
> >  	 */
> >  	if (log->l_targ != log->l_mp->m_ddev_targp)
> >  		blkdev_issue_flush(log->l_targ->bt_bdev);
> > -	return xlog_write(log, &lv_chain, ticket, NULL, NULL, reg.i_len);
> > +	return xlog_write(log, NULL, &lv_chain, ticket, NULL, reg.i_len);
> >  }
> >  
> >  /*
> > @@ -2383,9 +2383,9 @@ xlog_write_partial(
> >  int
> >  xlog_write(
> >  	struct xlog		*log,
> > +	struct xfs_cil_ctx	*ctx,
> >  	struct list_head	*lv_chain,
> >  	struct xlog_ticket	*ticket,
> > -	xfs_lsn_t		*start_lsn,
> >  	struct xlog_in_core	**commit_iclog,
> >  	uint32_t		len)
> >  {
> > @@ -2408,9 +2408,21 @@ xlog_write(
> >  	if (error)
> >  		return error;
> >  
> > -	/* start_lsn is the LSN of the first iclog written to. */
> > -	if (start_lsn)
> > -		*start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > +	/*
> > +	 * If we have a CIL context, record the LSN of the iclog we were just
> > +	 * granted space to start writing into. If the context doesn't have
> > +	 * a start_lsn recorded, then this iclog will contain the start record
> > +	 * for the checkpoint. Otherwise this write contains the commit record
> > +	 * for the checkpoint.
> > +	 */
> > +	if (ctx) {
> > +		spin_lock(&ctx->cil->xc_push_lock);
> > +		if (!ctx->start_lsn)
> > +			ctx->start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > +		else
> > +			ctx->commit_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > +		spin_unlock(&ctx->cil->xc_push_lock);
> 
> This cycling of the push lock when setting start_lsn is new.  What are
> we protecting against here by taking the lock?

Later in the series it will be the ordering wakeup when we set the
start_lsn. The ordering ends with both start_lsn and commit_lsn
being treated the same way w.r.t. wakeups, so I just started it off
the same way here.
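
As a rough illustration of what the ordering wakeup is for, the sketch
below models only the condition a waiter re-checks after being woken. It
is a single-threaded userspace simplification with hypothetical names
(ckpt_ctx, record_type, can_write_record); the kernel code sleeps on a
wait queue under xc_push_lock and also handles aborted contexts, which
this omits:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum record_type { START_RECORD, COMMIT_RECORD };

/* Simplified stand-in for the fields of struct xfs_cil_ctx used here. */
struct ckpt_ctx {
	uint64_t	sequence;
	int64_t		start_lsn;	/* zero until the start record has an LSN */
	int64_t		commit_lsn;	/* zero until the commit record has an LSN */
};

/*
 * Returns true when every context with a lower sequence number has
 * recorded the LSN for the given record type, i.e. when writing this
 * record now cannot place it ahead of an older checkpoint's record.
 */
bool can_write_record(const struct ckpt_ctx *committing, size_t nr,
		      uint64_t sequence, enum record_type record)
{
	for (size_t i = 0; i < nr; i++) {
		const struct ckpt_ctx *ctx = &committing[i];

		if (ctx->sequence >= sequence)
			continue;
		if (record == START_RECORD && !ctx->start_lsn)
			return false;
		if (record == COMMIT_RECORD && !ctx->commit_lsn)
			return false;
	}
	return true;
}
```

Both record types go through the same check, which is why setting either
LSN is paired with a wakeup under the same lock.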

> Also, just to check my assumptions: why do we take the push lock when
> setting commit_lsn?  Is that to synchronize with the xc_committing loop
> that looks for contexts that need pushing?

Yes - the spinlock provides the memory barriers for access to the
variable. I could use WRITE_ONCE/READ_ONCE here for this specific patch,
but the lock is necessary for compound operations in upcoming
patches so it didn't make any sense to use _ONCE macros here only to
remove them again later.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 6/8] xfs: separate out setting CIL context LSNs from xlog_write
  2021-06-17 20:28   ` Darrick J. Wong
@ 2021-06-17 22:10     ` Dave Chinner
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 22:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 01:28:24PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 06:26:15PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > In preparation for moving more CIL context specific functionality
> > into these operations.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Looks fine as a hoist, though I wonder why you didn't do this in patch
> 4?

Because I wanted to keep the xlog_write() api change separate to
relocating the lsn code out of xlog_write().

There are enough review comments of "don't move and modify in the
one patch" that I won't even bother trying to do even simple "move
and modify" operations in a single patch anymore.

I can combine them if you want, but then someone is bound to pop up
in another review cycle and say "please separate....". :/

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
  2021-06-17 22:03     ` Dave Chinner
@ 2021-06-17 22:18       ` Darrick J. Wong
  0 siblings, 0 replies; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 22:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Jun 18, 2021 at 08:03:37AM +1000, Dave Chinner wrote:
> On Thu, Jun 17, 2021 at 01:24:02PM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 17, 2021 at 06:26:13PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Pass the CIL context to xlog_write() rather than a pointer to a LSN
> > > variable. Only the CIL checkpoint calls to xlog_write() need to know
> > > about the start LSN of the writes, so rework xlog_write to directly
> > > write the LSNs into the CIL context structure.
> > > 
> > > This removes the commit_lsn variable from xlog_cil_push_work(), so
> > > now we only have to issue the commit record ordering wakeup from
> > > there.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/xfs_log.c      | 22 +++++++++++++++++-----
> > >  fs/xfs/xfs_log_cil.c  | 19 ++++++++-----------
> > >  fs/xfs/xfs_log_priv.h |  4 ++--
> > >  3 files changed, 27 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > index cf661c155786..fc0e43c57683 100644
> > > --- a/fs/xfs/xfs_log.c
> > > +++ b/fs/xfs/xfs_log.c
> > > @@ -871,7 +871,7 @@ xlog_write_unmount_record(
> > >  	 */
> > >  	if (log->l_targ != log->l_mp->m_ddev_targp)
> > >  		blkdev_issue_flush(log->l_targ->bt_bdev);
> > > -	return xlog_write(log, &lv_chain, ticket, NULL, NULL, reg.i_len);
> > > +	return xlog_write(log, NULL, &lv_chain, ticket, NULL, reg.i_len);
> > >  }
> > >  
> > >  /*
> > > @@ -2383,9 +2383,9 @@ xlog_write_partial(
> > >  int
> > >  xlog_write(
> > >  	struct xlog		*log,
> > > +	struct xfs_cil_ctx	*ctx,
> > >  	struct list_head	*lv_chain,
> > >  	struct xlog_ticket	*ticket,
> > > -	xfs_lsn_t		*start_lsn,
> > >  	struct xlog_in_core	**commit_iclog,
> > >  	uint32_t		len)
> > >  {
> > > @@ -2408,9 +2408,21 @@ xlog_write(
> > >  	if (error)
> > >  		return error;
> > >  
> > > -	/* start_lsn is the LSN of the first iclog written to. */
> > > -	if (start_lsn)
> > > -		*start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > > +	/*
> > > +	 * If we have a CIL context, record the LSN of the iclog we were just
> > > +	 * granted space to start writing into. If the context doesn't have
> > > +	 * a start_lsn recorded, then this iclog will contain the start record
> > > +	 * for the checkpoint. Otherwise this write contains the commit record
> > > +	 * for the checkpoint.
> > > +	 */
> > > +	if (ctx) {
> > > +		spin_lock(&ctx->cil->xc_push_lock);
> > > +		if (!ctx->start_lsn)
> > > +			ctx->start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > > +		else
> > > +			ctx->commit_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > > +		spin_unlock(&ctx->cil->xc_push_lock);
> > 
> > This cycling of the push lock when setting start_lsn is new.  What are
> > we protecting against here by taking the lock?
> 
> Later in the series it will be the ordering wakeup when we set the
> start_lsn. The ordering ends with both start_lsn and commit_lsn
> being treated the same way w.r.t. wakeups, so I just started it off
> the same way here.

Ah, right, I see that now that I've gotten to patch 8.

> > Also, just to check my assumptions: why do we take the push lock when
> > setting commit_lsn?  Is that to synchronize with the xc_committing loop
> > that looks for contexts that need pushing?
> 
> Yes - the spinlock provides the memory barriers for access to the
> variable. I could use WRITE_ONCE/READ_ONCE here for this specific patch,
> but the lock is necessary for compound operations in upcoming
> patches so it didn't make any sense to use _ONCE macros here only to
> remove them again later.

Nah, I'd leave it, especially since it's already a little strange that
the place where we set ctx->commit_lsn bounces around relative to the
callback list splicing...

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

* Re: [PATCH 7/8] xfs: attached iclog callbacks in xlog_cil_set_ctx_write_state()
  2021-06-17 20:55   ` Darrick J. Wong
@ 2021-06-17 22:20     ` Dave Chinner
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 22:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 01:55:52PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 06:26:16PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > We currently attach iclog callbacks for the CIL when the commit
> > iclog is returned from xlog_write. Because
> > xlog_state_get_iclog_space() always guarantees that the commit
> > record will fit in the iclog it returns, we can move this IO
> > callback setting to xlog_cil_set_ctx_write_state(), record the
> > commit iclog in the context and remove the need for the commit iclog
> > to be returned by xlog_write() altogether.
> > 
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_log.c      |  8 ++----
> >  fs/xfs/xfs_log_cil.c  | 65 +++++++++++++++++++++++++------------------
> >  fs/xfs/xfs_log_priv.h |  3 +-
> >  3 files changed, 42 insertions(+), 34 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index 1c214b395223..359246d54db7 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -871,7 +871,7 @@ xlog_write_unmount_record(
> >  	 */
> >  	if (log->l_targ != log->l_mp->m_ddev_targp)
> >  		blkdev_issue_flush(log->l_targ->bt_bdev);
> > -	return xlog_write(log, NULL, &lv_chain, ticket, NULL, reg.i_len);
> > +	return xlog_write(log, NULL, &lv_chain, ticket, reg.i_len);
> >  }
> >  
> >  /*
> > @@ -2386,7 +2386,6 @@ xlog_write(
> >  	struct xfs_cil_ctx	*ctx,
> >  	struct list_head	*lv_chain,
> >  	struct xlog_ticket	*ticket,
> > -	struct xlog_in_core	**commit_iclog,
> >  	uint32_t		len)
> >  {
> >  	struct xlog_in_core	*iclog = NULL;
> > @@ -2436,10 +2435,7 @@ xlog_write(
> >  	 */
> >  	spin_lock(&log->l_icloglock);
> >  	xlog_state_finish_copy(log, iclog, record_cnt, 0);
> > -	if (commit_iclog)
> > -		*commit_iclog = iclog;
> > -	else
> > -		error = xlog_state_release_iclog(log, iclog, ticket);
> > +	error = xlog_state_release_iclog(log, iclog, ticket);
> >  	spin_unlock(&log->l_icloglock);
> >  
> >  	return error;
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index 2d8d904ffb78..87e30917ce2e 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -799,11 +799,34 @@ xlog_cil_set_ctx_write_state(
> >  
> >  	ASSERT(!ctx->commit_lsn);
> >  	spin_lock(&cil->xc_push_lock);
> > -	if (!ctx->start_lsn)
> > +	if (!ctx->start_lsn) {
> >  		ctx->start_lsn = lsn;
> > -	else
> > -		ctx->commit_lsn = lsn;
> > +		spin_unlock(&cil->xc_push_lock);
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * Take a reference to the iclog for the context so that we still hold
> > +	 * it when xlog_write is done and has released it. This means the
> > +	 * context controls when the iclog is released for IO.
> > +	 */
> > +	atomic_inc(&iclog->ic_refcnt);
> 
> Where do we drop this refcount?

In xlog_cil_push_work() where we call xlog_state_release_iclog().

> Is this the accounting adjustment that
> we have to make because xlog_write always decrements the iclog refcount
> now?

Yes.
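
The reference handoff can be pictured with a small userspace model. This
is only a sketch under simplifying assumptions: the struct and function
names (iclog, ckpt_ctx, writer_get, ctx_pin_commit_iclog, iclog_put) are
hypothetical, and C11 atomics stand in for the kernel's atomic_t:

```c
#include <stdatomic.h>
#include <stddef.h>

struct iclog {
	atomic_int	refcnt;
};

struct ckpt_ctx {
	struct iclog	*commit_iclog;
};

/* Models the writer being granted space in the iclog: one reference. */
void writer_get(struct iclog *ic)
{
	atomic_fetch_add(&ic->refcnt, 1);
}

/*
 * Models the context taking its own reference while the writer still
 * holds one, so the iclog stays pinned after the writer releases it.
 */
void ctx_pin_commit_iclog(struct ckpt_ctx *ctx, struct iclog *ic)
{
	atomic_fetch_add(&ic->refcnt, 1);
	ctx->commit_iclog = ic;
}

/* Drops one reference and returns the number of references remaining. */
int iclog_put(struct iclog *ic)
{
	return atomic_fetch_sub(&ic->refcnt, 1) - 1;
}
```

In this model the writer's put leaves a count of one, and only the later
put by the context's owner brings it to zero, which corresponds to the
push work deciding when the commit iclog is released.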

> > +	ctx->commit_iclog = iclog;
> > +	ctx->commit_lsn = lsn;
> >  	spin_unlock(&cil->xc_push_lock);
> 
> I've noticed how the setting of ctx->commit_lsn has moved to before the
> point where we splice callback lists, only to move them back below in
> the next patch.  That has made it harder for me to understand this
> series.
> 
> I /think/ the goal of this patch is not really a functional change so
> much as a refactoring to make the cil context track the commit iclog
> directly and then smooth out some of the refcounting code, but the
> shuffling around of these variables makes me wonder if I'm missing some
> other subtlety.

The subtlety is that we can't issue the wakeup on the commit_lsn
until after the callbacks are attached to the commit iclog. When we
set ctx->commit_lsn doesn't really matter - I'm trying to keep the
order of "callbacks attached before we issue the wakeup" so that
when the waiter is woken and then adds its callbacks to the same
iclog they will be appended to the list after the first commit
record's callbacks and hence they get processed in the correct order
when journal IO completion runs the callbacks on that iclog.

This patch leaves the wakeup where it was, after the xlog_write()
call completes, so the ordering of setting ctx->commit_lsn and
attaching the callbacks inside xlog_write() doesn't really matter.
In the next patch, the wakeups move inside
xlog_write()->xlog_cil_set_ctx_write_state(), and so now it has to
ensure that the ordering is correct.

I'll rework the patches so that this one sets up the order the next
patch requires rather than minimal change in this patch and reorder
in the next patch...
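
A toy model of why "attach, then wake" matters: callbacks run in the
order they were appended to the iclog, so the older checkpoint has to
append before the newer one is allowed to proceed. The sketch is
single-threaded userspace code with hypothetical names (iclog,
iclog_attach, iclog_callbacks_ordered) and a fixed-size array in place of
the kernel list:

```c
#include <stddef.h>

#define MAX_CALLBACKS	8

struct iclog {
	/* Checkpoint sequence numbers in the order they were attached. */
	unsigned long	callbacks[MAX_CALLBACKS];
	size_t		nr;
};

/* Append at the tail, as list_add_tail() would. Returns -1 if full. */
int iclog_attach(struct iclog *ic, unsigned long sequence)
{
	if (ic->nr >= MAX_CALLBACKS)
		return -1;
	ic->callbacks[ic->nr++] = sequence;
	return 0;
}

/* Returns 1 if running the callbacks in list order is sequence order. */
int iclog_callbacks_ordered(const struct iclog *ic)
{
	for (size_t i = 1; i < ic->nr; i++) {
		if (ic->callbacks[i - 1] >= ic->callbacks[i])
			return 0;
	}
	return 1;
}
```

If checkpoint 2 were woken before checkpoint 1 had attached, it could
append first and the list would read 2, 1, which is the misordering the
wakeup placement is meant to rule out.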

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 8/8] xfs: order CIL checkpoint start records
  2021-06-17 21:31   ` Darrick J. Wong
@ 2021-06-17 22:49     ` Dave Chinner
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 22:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 02:31:43PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 06:26:17PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Because log recovery depends on strictly ordered start records as
> > well as strictly ordered commit records.
> > 
> > This is a zero day bug in the way XFS writes pipelined transactions
> > to the journal which is exposed by commit facd77e4e38b ("xfs: CIL
> > work is serialised, not pipelined") which re-introduces explicit
> > concurrent commits back into the on-disk journal.
> > 
> > The XFS journal commit code has never ordered start records and we
> > have relied on strict commit record ordering for correct recovery
> > ordering of concurrently written transactions. Unfortunately, root
> > cause analysis uncovered the fact that log recovery uses the LSN of
> > the start record for transaction commit processing. Hence the
> > commits are processed in strict orderi by recovery, but the LSNs
> 
> s/orderi/order/ ?
> 
> > associated with the commits can be out of order and so recovery may
> > stamp incorrect LSNs into objects and/or misorder intents in the AIL
> > for later processing. This can result in log recovery failures
> > and/or on disk corruption, sometimes silent.
> > 
> > Because this is a long standing log recovery issue, we can't just
> > fix log recovery and call it good.
> 
> Could there be production filesystems out there that have this
> mismatched ordering of start lsn and commit lsn?  This still leaves the
> mystery of crashed customer filesystems containing btree blocks where
> 128 bytes in the middle clearly contain contents that don't match or
> duplicate the rest of the block, as though someone forgot to replay a
> buffer vector or something.

Modulo bugs in delayed logging, I doubt there's any delayed logging
filesystems out there that have the problem. Older, non-delayed
logging filesystems are almost certain to see it, but they have much
smaller transactions and only EFIs to deal with so the corruption
risk is much, much, much lower.

> What would a fix to log recovery entail?  Not skipping recovered items
> if the start/commit sequencing is not the same?  Or am I not
> understanding the problem correctly?

I've been going back and forth on this trying to come up with a sane
solution, but I haven't come up with anything practical.

We could use the commit record LSN for recovery, but we write start
record LSNs into on-disk metadata when we flush it to disk and that
forces checkpoints that need recovery to use the same LSN in the
metadata it recovers and writes back as we use for runtime
writeback. Hence we then get problems with recovered filesystems not
having the same on-disk state as they would if the metadata was
written back from in-memory. i.e. two pieces of metadata in the same
atomic transaction could have different LSNs stamped in them
depending on whether they were written back at runtime or recovered
by log recovery at mount time...

And then my head explodes trying to work out what happens when we
have overlapping checkpoints and partial metadata writeback and
different LSN values for recovery vs writeback and recovery retries
after a failed recovery and <BOOM>

However, given that there are runtime integrity issues with out of
order start LSNs (log head can overwrite the log tail - I can give
more detail if you want), the only way out of this I can see is to
ensure that the start records are properly ordered at runtime to
avoid all the potential runtime issues that exist.  This also has
the nice "side effect" of avoiding the log recovery LSN ordering
problem.
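
To make the ordering problem concrete, here is a small sketch of the
invariant being enforced. It is only a model: toy_checkpoint and
toy_first_misordered_start() are hypothetical names, and LSNs are
plain integers rather than the cycle/block pairs the log really uses.

```c
#include <assert.h>
#include <stddef.h>

/* Toy checkpoint: the LSNs of its start record and its commit record. */
struct toy_checkpoint {
	long long	start_lsn;
	long long	commit_lsn;
};

/*
 * Given checkpoints listed in commit record order, return the index of the
 * first one whose start LSN goes backwards relative to its predecessor, or
 * -1 if the start LSNs are ordered the same way as the commit LSNs.
 */
static int toy_first_misordered_start(const struct toy_checkpoint *cp,
				      size_t nr)
{
	for (size_t i = 1; i < nr; i++) {
		/* Commit records are assumed to be strictly ordered already. */
		assert(cp[i - 1].commit_lsn < cp[i].commit_lsn);
		if (cp[i].start_lsn < cp[i - 1].start_lsn)
			return (int)i;
	}
	return -1;
}
```

With two checkpoints { start 15, commit 20 } and { start 10, commit 30 },
the commits are in order but the second start LSN is behind the first,
so the helper returns 1. That is the shape of the misordering where
anything keyed on the start LSN sees the checkpoints in the opposite
order to their commit records.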

IOWs, I'm not looking at this as a log recovery bug that needs fixing.
Yes, there is a log recovery issue there (and has been forever), but
the more I think on this, the more I'm concerned about the potential
runtime impacts on data integrity correctness and potential
head-tail journal overwrite corruption. 

> > +	ctx->commit_lsn = lsn;
> > +	wake_up_all(&cil->xc_commit_wait);
> > +	spin_unlock(&cil->xc_push_lock);
> >  }
> >  
> >  /*
> > @@ -834,10 +849,16 @@ xlog_cil_set_ctx_write_state(
> >   * relies on the context LSN being zero until the log write has guaranteed the
> >   * LSN that the log write will start at via xlog_state_get_iclog_space().
> >   */
> > +enum {
> > +	_START_RECORD,
> > +	_COMMIT_RECORD,
> > +};
> 
> Stupid nit: If this enum had a name you could skip the default clause
> below because the compiler would typecheck the usage for you.

OK.
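
For reference, a minimal sketch of what the named enum buys us. The
names here (toy_record_type, toy_record_name) are placeholders for
illustration and not what the reworked patch will use.

```c
#include <assert.h>

/* Hypothetical name; the patch under review uses an anonymous enum. */
enum toy_record_type {
	TOY_START_RECORD,
	TOY_COMMIT_RECORD,
};

static const char *toy_record_name(enum toy_record_type type)
{
	/*
	 * No default clause: because the switch is on a named enum type,
	 * GCC and Clang (-Wswitch, part of -Wall) warn if an enumerator
	 * is added later and not handled here.
	 */
	switch (type) {
	case TOY_START_RECORD:
		return "start";
	case TOY_COMMIT_RECORD:
		return "commit";
	}
	/* Not reached for valid enumerators; keeps -Wreturn-type quiet. */
	return "unknown";
}
```

The trade-off is that a default clause would hide exactly the warning
we want, so the fallthrough return after the switch handles the
"impossible" value instead.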

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
  2021-06-17 20:26       ` Darrick J. Wong
@ 2021-06-17 23:31         ` Brian Foster
  0 siblings, 0 replies; 50+ messages in thread
From: Brian Foster @ 2021-06-17 23:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs

On Thu, Jun 17, 2021 at 01:26:42PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 04:06:24PM -0400, Brian Foster wrote:
> > On Thu, Jun 17, 2021 at 12:05:19PM -0700, Darrick J. Wong wrote:
> > > On Thu, Jun 17, 2021 at 02:32:30PM -0400, Brian Foster wrote:
> > > > On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > > > > Hi folks,
> > > > > 
> > > > > This is followup from the first set of log fixes for for-next that
> > > > > were posted here:
> > > > > 
> > > > > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > > > > 
> > > > > The first two patches of this series are updates for those patches,
> > > > > change log below. The rest is the fix for the bigger issue we
> > > > > uncovered in investigating the generic/019 failures, being that
> > > > > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > > > > to checkpoints.
> > > > > 
> > > > > The "simple" fix of using the same ordering code as the commit
> > > > > record for the start records in the CIL push turned into a lot of
> > > > > patches once I started cleaning it up, separating out all the
> > > > > different bits and finally realising all the things I needed to
> > > > > change to avoid unintentional logic/behavioural changes. Hence
> > > > > there's some code movement, some factoring, API changes to
> > > > > xlog_write(), changing where we attach callbacks to commit iclogs so
> > > > > they remain correctly ordered if there are multiple commit records
> > > > > in the one iclog and then, finally, strictly ordering the start
> > > > > records....
> > > > > 
> > > > > The original "simple fix" I tested last night ran almost a thousand
> > > > > cycles of generic/019 without a log hang or recovery failure of any
> > > > > kind. The refactored patchset has run a couple hundred cycles of
> > > > > g/019 and g/475 over the last few hours without a failure, so I'm
> > > > > posting this so we can get a review iteration done while I sleep so
> > > > > we can - hopefully - get this sorted out before the end of the week.
> > > > > 
> > > > 
> > > > My first spin of this included generic/019 and generic/475, ran for 18
> > > > or so iterations and 475 exploded with a stream of asserts followed by a
> > > > NULL pointer crash:
> > > > 
> > > > # grep -e Assertion -e BUG dmesg.out
> > > > ...
> > > > [ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > > > [ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
> > > > 
> > > > I don't know if this is a regression, but I've not seen it before. I've
> > > > attempted to spin generic/475 since then to see if it reproduces again,
> > > > but so far I'm only running into some of the preexisting issues
> > > > associated with that test.
> > > 
> > > By any chance, do the two log recovery fixes I sent yesterday make those
> > > problems go away?
> > > 
> > 
> > Hadn't got to those ones yet...
> 
> <nod>
> 
> > > > I'll let it go a while more and probably
> > > > switch it back to running both sometime before the end of the day for an
> > > > overnight test.
> > > 
> > > Also, do the CIL livelocks go away if you apply only patches 1-2?
> > > 
> > 
> > It's kind of hard to discern the effect of individual fixes when
> > multiple corruptions are at play. :/ I suppose I could switch up my
> > planned overnight test to include the aforementioned 2 recovery fixes
> > and 1-2 from this series, if that is preferable..?
> 
> I dunno about overnight, but at least ~20 or so iterations?
> 
> > I suspect that would
> > leave around the originally reported generic/019 corruption presumably
> > caused by the start LSN ordering issue, but we could see if the deadlock
> > is addressed and whether 475 survives any longer.
> 
> Might be a useful data point to figure out if these pieces are separate
> or if they really do belong in an 8 patch series, since I think ~20 or
> so iterations shouldn't take too long (though I guess it is nearly 16:30
> your time, isn't it...)  Well, do whatever you think is best use of
> machine time.
> 

With the above combination of the first two patches in this series and
your two separate patches, I see no occurrence of a hang in ~50 iters of
generic/019 and do hit the preexisting generic/475 corruption in ~20
iters.

Brian

> --D
> 
> > 
> > Brian
> > 
> > > > A full copy of the assert and NULL pointer BUG splat is included below
> > > > for reference. It looks like the fault BUG splat ended up interspersed
> > > > or otherwise mangled, but I suspect that one is just fallout from the
> > > > immediately previous crash.
> > > 
> > > I have a question about the composition of this 8-patch series --
> > > which patches fix the new cil code, and which ones fix the out of order
> > > recovery problems?  I suspect that patches 1-2 are for the new CIL code,
> > > and 3-8 are to fix the recovery problems.
> > > 
> > > Thinking with my distro kernel not-maintainer hat on, I'm considering
> > > how to backport whatever fixes emerge for the recovery ordering issue
> > > into existing kernels.  The way I see things right now, the CIL changes
> > > (+ fixes) and the ordering bug fixes are separate issues.  The log
> > > ordering problems should get fixed as soon as we have a practical
> > > solution; the CIL changes could get deferred if need be since it's a
> > > medium-high risk; and the real question is how to sequence all this?
> > > 
> > > (Or to put it another way: I'm still stuck going "oh wowwww this is a
> > > lot more change" while trying to understand patch 4)
> > > 
> > > --D
> > > 
> > > > 
> > > > Brian
> > > > 
> > > > --- 8< ---
> > > > 
> > > > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7953.037737] ------------[ cut here ]------------
> > > > [ 7953.042358] WARNING: CPU: 0 PID: 131627 at fs/xfs/xfs_message.c:112 assfail+0x25/0x28 [xfs]
> > > > [ 7953.050782] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
> > > > [ 7953.129548] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G        W I       5.13.0-rc4+ #70
> > > > [ 7953.138243] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> > > > [ 7953.145818] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
> > > > [ 7953.151554] RIP: 0010:assfail+0x25/0x28 [xfs]
> > > > [ 7953.155991] Code: ff ff 0f 0b c3 0f 1f 44 00 00 41 89 c8 48 89 d1 48 89 f2 48 c7 c6 d8 eb c3 c0 e8 cf fa ff ff 80 3d f1 d4 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24 18
> > > > [ 7953.174745] RSP: 0018:ffffa57ccf99fa50 EFLAGS: 00010246
> > > > [ 7953.179982] RAX: 00000000ffffffea RBX: 0000000500003977 RCX: 0000000000000000
> > > > [ 7953.187121] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
> > > > [ 7953.194264] RBP: ffff91f685725040 R08: 0000000000000000 R09: 000000000000000a
> > > > [ 7953.201405] R10: 000000000000000a R11: f000000000000000 R12: ffff91f685725040
> > > > [ 7953.208546] R13: 0000000000000000 R14: ffff91f66abed140 R15: ffff91c76dfccb40
> > > > [ 7953.215686] FS:  0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
> > > > [ 7953.223781] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 7953.229533] CR2: 00000000020e2108 CR3: 0000003d02826003 CR4: 00000000007706f0
> > > > [ 7953.236667] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [ 7953.243809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > [ 7953.250949] PKRU: 55555554
> > > > [ 7953.253669] Call Trace:
> > > > [ 7953.256123]  xfs_bui_release+0x4b/0x50 [xfs]
> > > > [ 7953.260466]  xfs_trans_committed_bulk+0x158/0x2c0 [xfs]
> > > > [ 7953.265762]  ? lock_release+0x1cd/0x2a0
> > > > [ 7953.269610]  ? _raw_spin_unlock+0x1f/0x30
> > > > [ 7953.273630]  ? xlog_write+0x1e2/0x630 [xfs]
> > > > [ 7953.277886]  ? lock_acquire+0x15d/0x380
> > > > [ 7953.281732]  ? lock_acquire+0x15d/0x380
> > > > [ 7953.285582]  ? lock_release+0x1cd/0x2a0
> > > > [ 7953.289428]  ? trace_hardirqs_on+0x1b/0xd0
> > > > [ 7953.293536]  ? _raw_spin_unlock_irqrestore+0x37/0x40
> > > > [ 7953.298511]  ? __wake_up_common_lock+0x7a/0x90
> > > > [ 7953.302966]  ? lock_release+0x1cd/0x2a0
> > > > [ 7953.306813]  xlog_cil_committed+0x34f/0x390 [xfs]
> > > > [ 7953.311593]  ? xlog_cil_push_work+0x715/0x8d0 [xfs]
> > > > [ 7953.316547]  xlog_cil_push_work+0x740/0x8d0 [xfs]
> > > > [ 7953.321321]  ? _raw_spin_unlock_irq+0x24/0x40
> > > > [ 7953.325689]  ? finish_task_switch.isra.0+0xa0/0x2c0
> > > > [ 7953.330580]  ? kmem_cache_free+0x247/0x5c0
> > > > [ 7953.334685]  ? fsnotify_final_mark_destroy+0x1c/0x30
> > > > [ 7953.339658]  ? lock_acquire+0x15d/0x380
> > > > [ 7953.343505]  ? lock_acquire+0x15d/0x380
> > > > [ 7953.347353]  ? lock_release+0x1cd/0x2a0
> > > > [ 7953.351203]  process_one_work+0x26e/0x560
> > > > [ 7953.355225]  worker_thread+0x52/0x3b0
> > > > [ 7953.358898]  ? process_one_work+0x560/0x560
> > > > [ 7953.363094]  kthread+0x12c/0x150
> > > > [ 7953.366335]  ? __kthread_bind_mask+0x60/0x60
> > > > [ 7953.370617]  ret_from_fork+0x22/0x30
> > > > [ 7953.374206] irq event stamp: 0
> > > > [ 7953.377268] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> > > > [ 7953.383544] hardirqs last disabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
> > > > [ 7953.391724] softirqs last  enabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
> > > > [ 7953.399907] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > > > [ 7953.406179] ---[ end trace f04c960f66265f3a ]---
> > > > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > > > [ 7953.417760] #PF: supervisor read access in kernel mode
> > > > [ 7953.422900] #PF: error_code(0x0000) - not-present page
> > > > [ 7953.428038] PGD 0 P4D 0 
> > > > [ 7953.430579] Oops: 0000 [#1] SMP PTI
> > > > [ 7953.434070] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G        W I       5.13.0-rc4+ #70
> > > > [ 7953.442764] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> > > > [ 7953.450330] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
> > > > [ 7953.456058] RIP: 0010:xfs_trans_committed_bulk+0xcc/0x2c0 [xfs]
> > > > [ 7953.462036] Code: 41 83 c5 01 48 89 54 c4 50 41 83 fd 1f 0f 8f 11 01 00 00 4d 8b 36 4c 3b 34 24 74 28 4d 8b 66 20 40 84 ed 75 54 49 8b 44 24 60 <f6> 00 01 74 91 48 8b 40 38 4c 89 e7 e8 63 6b 42 f5 4d 8b 36 4c 3b
> > > > [ 7953.480783] RSP: 0018:ffffa57ccf99fa68 EFLAGS: 00010202
> > > > [ 7953.486009] RAX: 000000000000031f RBX: 0000000500003977 RCX: 0000000000000000
> > > > [ 7953.493141] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
> > > > [ 7953.500274] RBP: 0000000000000001 R08: 0000000000000000 R09: 000000000000000a
> > > > [ 7953.507404] R10: 000000000000000a R11: f000000000000000 R12: ffff91c759fedb20
> > > > [ 7953.514536] R13: 0000000000000000 R14: ffff91c759fedb00 R15: ffff91c76dfccb40
> > > > [ 7953.521671] FS:  0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
> > > > [ 7953.529757] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 7953.535501] CR2: 000000000000031f CR3: 0000003d02826003 CR4: 00000000007706f0
> > > > [ 7953.542633] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [ 7953.549768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > [ 7953.556899] PKRU: 55555554
> > > > [ 7953.559612] Call Trace:
> > > > [ 7953.562064]  ? lock_release+0x1cd/0x2a0
> > > > [ 7953.565902]  ? _raw_spin_unlock+0x1f/0x30
> > > > [ 7953.569917]  ? xlog_write+0x1e2/0x630 [xfs]
> > > > [ 7953.574162]  ? lock_acquire+0x15d/0x380
> > > > [ 7953.578000]  ? lock_acquire+0x15d/0x380
> > > > [ 7953.581841]  ? lock_release+0x1cd/0x2a0
> > > > [ 7953.585680]  ? trace_hardirqs_on+0x1b/0xd0
> > > > [ 7953.589780]  ? _raw_spin_unlock_irqrestore+0x37/0x40
> > > > [ 7953.594744]  ? __wake_up_common_lock+0x7a/0x90
> > > > [ 7953.599192]  ? lock_release+0x1cd/0x2a0
> > > > [ 7953.603031]  xlog_cil_committed+0x34f/0x390 [xfs]
> > > > [ 7953.607798]  ? xlog_cil_push_work+0x715/0x8d0 [xfs]
> > > > [ 7953.612738]  xlog_cil_push_work+0x740/0x8d0 [xfs]
> > > > [ 7953.617504]  ? _raw_spin_unlock_irq+0x24/0x40
> > > > [ 7953.621862]  ? finish_task_switch.isra.0+0xa0/0x2c0
> > > > [ 7953.626745]  ? kmem_cache_free+0x247/0x5c0
> > > > [ 7953.630839]  ? fsnotify_final_mark_destroy+0x1c/0x30
> > > > [ 7953.635806]  ? lock_acquire+0x15d/0x380
> > > > [ 7953.639646]  ? lock_acquire+0x15d/0x380
> > > > [ 7953.643484]  ? lock_release+0x1cd/0x2a0
> > > > [ 7953.647323]  process_one_work+0x26e/0x560
> > > > [ 7953.651337]  worker_thread+0x52/0x3b0
> > > > [ 7953.655003]  ? process_one_work+0x560/0x560
> > > > [ 7953.659188]  kthread+0x12c/0x150
> > > > [ 7953.662421]  ? __kthread_bind_mask+0x60/0x60
> > > > [ 7953.666694]  ret_from_fork+0x22/0x30
> > > > [ 7953.670273] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
> > > > [ 7953.749025] CR2: 000000000000031f
> > > > [ 7953.752345] ---[ end trace f04c960f66265f3b ]---
> > > > 
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
  2021-06-17 19:05   ` Darrick J. Wong
  2021-06-17 20:06     ` Brian Foster
@ 2021-06-17 23:43     ` Dave Chinner
  2021-06-18 13:08       ` Brian Foster
  1 sibling, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 23:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, linux-xfs

On Thu, Jun 17, 2021 at 12:05:19PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 02:32:30PM -0400, Brian Foster wrote:
> > On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > > Hi folks,
> > > 
> > > This is followup from the first set of log fixes for for-next that
> > > were posted here:
> > > 
> > > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > > 
> > > The first two patches of this series are updates for those patches,
> > > change log below. The rest is the fix for the bigger issue we
> > > uncovered in investigating the generic/019 failures, being that
> > > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > > to checkpoints.
> > > 
> > > The "simple" fix of using the same ordering code as the commit
> > > record for the start records in the CIL push turned into a lot of
> > > patches once I started cleaning it up, separating out all the
> > > different bits and finally realising all the things I needed to
> > > change to avoid unintentional logic/behavioural changes. Hence
> > > there's some code movement, some factoring, API changes to
> > > xlog_write(), changing where we attach callbacks to commit iclogs so
> > > they remain correctly ordered if there are multiple commit records
> > > in the one iclog and then, finally, strictly ordering the start
> > > records....
> > > 
> > > The original "simple fix" I tested last night ran almost a thousand
> > > cycles of generic/019 without a log hang or recovery failure of any
> > > kind. The refactored patchset has run a couple hundred cycles of
> > > g/019 and g/475 over the last few hours without a failure, so I'm
> > > posting this so we can get a review iteration done while I sleep so
> > > we can - hopefully - get this sorted out before the end of the week.
> > > 
> > 
> > My first spin of this included generic/019 and generic/475, ran for 18
> > or so iterations and 475 exploded with a stream of asserts followed by a
> > NULL pointer crash:
> > 
> > # grep -e Assertion -e BUG dmesg.out
> > ...
> > [ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > [ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
> > 
> > I don't know if this is a regression, but I've not seen it before. I've
> > attempted to spin generic/475 since then to see if it reproduces again,
> > but so far I'm only running into some of the preexisting issues
> > associated with that test.

I've not seen anything like that. I can't see how the changes in the
patchset would affect BUI reference counting in any way. That seems
more like an underlying intent item shutdown reference count issue
to me (and we've had a *lot* of them in the past)....

> By any chance, do the two log recovery fixes I sent yesterday make those
> problems go away?
> 
> > I'll let it go a while more and probably
> > switch it back to running both sometime before the end of the day for an
> > overnight test.
> 
> Also, do the CIL livelocks go away if you apply only patches 1-2?
> 
> > A full copy of the assert and NULL pointer BUG splat is included below
> > for reference. It looks like the fault BUG splat ended up interspersed
> > or otherwise mangled, but I suspect that one is just fallout from the
> > immediately previous crash.
> 
> I have a question about the composition of this 8-patch series --
> which patches fix the new cil code, and which ones fix the out of order
> recovery problems?  I suspect that patches 1-2 are for the new CIL code,
> and 3-8 are to fix the recovery problems.

Yes. But don't think of 3-8 as fixing recovery problems - they are
fixing potential runtime data integrity issues (log force lsns for
fsync are based on start LSNs) and journal head->tail overwrite
issues (because AIL ordering is start LSN based).

So, basically, we get the recovery fixes for free when we fix the
runtime start LSN ordering issues...

> Thinking with my distro kernel not-maintainer hat on, I'm considering
> how to backport whatever fixes emerge for the recovery ordering issue
> into existing kernels.  The way I see things right now, the CIL changes
> (+ fixes) and the ordering bug fixes are separate issues.  The log
> ordering problems should get fixed as soon as we have a practical
> solution; the CIL changes could get deferred if need be since it's a
> medium-high risk; and the real question is how to sequence all this?

The CIL changes in patches 1-2 are low risk - that's just a hang
because of a logic error and we fix that sort of thing all the time.

> (Or to put it another way: I'm still stuck going "oh wowwww this is a
> lot more change" while trying to understand patch 4)

It's not unreasonable given the amount of change that was made in
the first place. Really, though, once you take the tracing and code
movement out of it, the actual logic change is much, much smaller...

/me wonders if anyone remembers that I said up front that I
considered the changes to the log code completely unreviewable and
that there would be bugs that slip through both my testing and
review?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
  2021-06-17 23:43     ` Dave Chinner
@ 2021-06-18 13:08       ` Brian Foster
  2021-06-18 13:55         ` Christoph Hellwig
  2021-06-18 22:15         ` Dave Chinner
  0 siblings, 2 replies; 50+ messages in thread
From: Brian Foster @ 2021-06-18 13:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs

On Fri, Jun 18, 2021 at 09:43:08AM +1000, Dave Chinner wrote:
> On Thu, Jun 17, 2021 at 12:05:19PM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 17, 2021 at 02:32:30PM -0400, Brian Foster wrote:
> > > On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > > > Hi folks,
> > > > 
> > > > This is followup from the first set of log fixes for for-next that
> > > > were posted here:
> > > > 
> > > > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > > > 
> > > > The first two patches of this series are updates for those patches,
> > > > change log below. The rest is the fix for the bigger issue we
> > > > uncovered in investigating the generic/019 failures, being that
> > > > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > > > to checkpoints.
> > > > 
> > > > The "simple" fix of using the same ordering code as the commit
> > > > record for the start records in the CIL push turned into a lot of
> > > > patches once I started cleaning it up, separating out all the
> > > > different bits and finally realising all the things I needed to
> > > > change to avoid unintentional logic/behavioural changes. Hence
> > > > there's some code movement, some factoring, API changes to
> > > > xlog_write(), changing where we attach callbacks to commit iclogs so
> > > > they remain correctly ordered if there are multiple commit records
> > > > in the one iclog and then, finally, strictly ordering the start
> > > > records....
> > > > 
> > > > The original "simple fix" I tested last night ran almost a thousand
> > > > cycles of generic/019 without a log hang or recovery failure of any
> > > > kind. The refactored patchset has run a couple hundred cycles of
> > > > g/019 and g/475 over the last few hours without a failure, so I'm
> > > > posting this so we can get a review iteration done while I sleep so
> > > > we can - hopefully - get this sorted out before the end of the week.
> > > > 
> > > 
> > > My first spin of this included generic/019 and generic/475, ran for 18
> > > or so iterations and 475 exploded with a stream of asserts followed by a
> > > NULL pointer crash:
> > > 
> > > # grep -e Assertion -e BUG dmesg.out
> > > ...
> > > [ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > > [ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
> > > 
> > > I don't know if this is a regression, but I've not seen it before. I've
> > > attempted to spin generic/475 since then to see if it reproduces again,
> > > but so far I'm only running into some of the preexisting issues
> > > associated with that test.
> 
> I've not seen anything like that. I can't see how the changes in the
> patchset would affect BUI reference counting in any way. That seems
> more like an underlying intent item shutdown reference count issue
> to me (and we've had a *lot* of them in the past)....
> 

I've not made sense of it either, but at the same time, I've not seen it
in all my testing thus far up until targeting this series, and now I've
seen it twice in as many test runs as my overnight run fell into some
kind of similar haywire state. Unfortunately it seemed to be
spinning/streaming assert output so I lost any record of the initial
crash signature. It wouldn't surprise me if the fundamental problem is
some older bug in another area of code, but it's hard to believe it's
not at least related to this series somehow.

Also FYI, earlier iterations of generic/475 triggered a couple instances
of the following assert failure before things broke down more severely:

 XFS: Assertion failed: *log_offset + *len <= iclog->ic_size || iclog->ic_state == XLOG_STATE_WANT_SYNC, file: fs/xfs/xfs_log.c, line: 2115
 ...
 ------------[ cut here ]------------
 WARNING: CPU: 45 PID: 951355 at fs/xfs/xfs_message.c:112 assfail+0x25/0x28 [xfs]
 Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_umad ib_ipoib rdma_cm iw_cm ib_cm intel_rapl_msr mlx5_ib intel_rapl_common ib_uverbs isst_if_common ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp mlx5_core kvm_intel kvm ipmi_ssif irqbypass iTCO_wdt intel_pmc_bxt rapl psample intel_cstate iTCO_vendor_support acpi_ipmi mlxfw intel_uncore pci_hyperv_intf pcspkr wmi_bmof tg3 mei_me ipmi_si i2c_i801 mei ipmi_devintf i2c_smbus lpc_ich intel_pch_thermal ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec crct10dif_pclmul nvme_fc crc32_pclmul drm nvme_fabrics crc32c_intel nvme_core ghash_clmulni_intel megaraid_sas scsi_transport_fc i2c_algo_bit wmi
 CPU: 45 PID: 951355 Comm: kworker/u162:5 Tainted: G          I       5.13.0-rc4+ #70
 Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
 Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
 RIP: 0010:assfail+0x25/0x28 [xfs]
 Code: ff ff 0f 0b c3 0f 1f 44 00 00 41 89 c8 48 89 d1 48 89 f2 48 c7 c6 d8 db 36 c0 e8 cf fa ff ff 80 3d f1 d4 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24 18
 RSP: 0018:ffffa59c80ce3bb0 EFLAGS: 00010246
 RAX: 00000000ffffffea RBX: ffff8b2671dddc00 RCX: 0000000000000000
 RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc035f0e2
 RBP: 0000000000015d60 R08: 0000000000000000 R09: 000000000000000a
 R10: 000000000000000a R11: f000000000000000 R12: ffff8b241716e6c0
 R13: 000000000000003c R14: ffff8b241716e6c0 R15: ffff8b24d9d17000
 FS:  0000000000000000(0000) GS:ffff8b52ff980000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f0d0e270910 CR3: 00000031a2826002 CR4: 00000000007706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  xlog_write+0x567/0x630 [xfs]
  xlog_cil_push_work+0x5bd/0x8d0 [xfs]
  ? load_balance+0x179/0xd60
  ? lock_acquire+0x15d/0x380
  ? lock_release+0x1cd/0x2a0
  ? lock_acquire+0x15d/0x380
  ? lock_release+0x1cd/0x2a0
  ? finish_task_switch.isra.0+0xa0/0x2c0
  process_one_work+0x26e/0x560
  worker_thread+0x52/0x3b0
  ? process_one_work+0x560/0x560
  kthread+0x12c/0x150
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x22/0x30
 irq event stamp: 0
 hardirqs last  enabled at (0): [<0000000000000000>] 0x0
 hardirqs last disabled at (0): [<ffffffffa10da3f4>] copy_process+0x754/0x1d00
 softirqs last  enabled at (0): [<ffffffffa10da3f4>] copy_process+0x754/0x1d00
 softirqs last disabled at (0): [<0000000000000000>] 0x0
 ---[ end trace 275cd74c3f62be17 ]---

Brian


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
  2021-06-18 13:08       ` Brian Foster
@ 2021-06-18 13:55         ` Christoph Hellwig
  2021-06-18 14:02           ` Christoph Hellwig
  2021-06-18 22:28           ` Dave Chinner
  2021-06-18 22:15         ` Dave Chinner
  1 sibling, 2 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-06-18 13:55 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, Darrick J. Wong, linux-xfs

On Fri, Jun 18, 2021 at 09:08:15AM -0400, Brian Foster wrote:
> Also FYI, earlier iterations of generic/475 triggered a couple instances
> of the following assert failure before things broke down more severely:
> 
>  XFS: Assertion failed: *log_offset + *len <= iclog->ic_size || iclog->ic_state == XLOG_STATE_WANT_SYNC, file: fs/xfs/xfs_log.c, line: 2115

As you mentioned the placement of this exact assert in my cleanups
series:  after looking at a right place to move it, I'm really not sure
this assert makes much sense in this form.

xlog_write_single is always entered first by xlog_write, so we also
get here for something that later gets handled by xlog_write_partial.
Which means it could be way bigger than the current iclog, and I see no
reason why that iclog would have to be XLOG_STATE_WANT_SYNC.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
  2021-06-18 13:55         ` Christoph Hellwig
@ 2021-06-18 14:02           ` Christoph Hellwig
  2021-06-18 22:28           ` Dave Chinner
  1 sibling, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-06-18 14:02 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, Darrick J. Wong, linux-xfs

On Fri, Jun 18, 2021 at 02:55:03PM +0100, Christoph Hellwig wrote:
> xlog_write_single is always entered first by xlog_write, so we also
> get here for something that later gets handled by xlog_write_partial.
> Which means it could be way bigger than the current iclog, and I see no
> reason why that iclog would have to be XLOG_STATE_WANT_SYNC.

Actually I'll take that back.  There is a second call to
xlog_state_switch_iclogs which we should hit and thus have moved to
XLOG_STATE_WANT_SYNC.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] xfs: add iclog state trace events
  2021-06-17  8:26 ` [PATCH 1/8] xfs: add iclog state trace events Dave Chinner
  2021-06-17 16:45   ` Darrick J. Wong
@ 2021-06-18 14:09   ` Christoph Hellwig
  1 sibling, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-06-18 14:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 06:26:10PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> For the DEBUGS!
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

(although I wouldn't mind a more useful commit message)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c
  2021-06-17  8:26 ` [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c Dave Chinner
  2021-06-17 12:57     ` kernel test robot
  2021-06-17 17:50   ` Darrick J. Wong
@ 2021-06-18 14:16   ` Christoph Hellwig
  2 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-06-18 14:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 17, 2021 at 06:26:12PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> It is only used by the CIL checkpoints, and is the counterpart to
> start record formatting and writing that is already local to
> xfs_log_cil.c.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
  2021-06-17  8:26 ` [PATCH 4/8] xfs: pass a CIL context to xlog_write() Dave Chinner
  2021-06-17 14:46     ` kernel test robot
  2021-06-17 20:24   ` Darrick J. Wong
@ 2021-06-18 14:23   ` Christoph Hellwig
  2 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-06-18 14:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

> +	/*
> +	 * If we have a CIL context, record the LSN of the iclog we were just
> +	 * granted space to start writing into. If the context doesn't have
> +	 * a start_lsn recorded, then this iclog will contain the start record
> +	 * for the checkpoint. Otherwise this write contains the commit record
> +	 * for the checkpoint.
> +	 */
> +	if (ctx) {
> +		spin_lock(&ctx->cil->xc_push_lock);
> +		if (!ctx->start_lsn)
> +			ctx->start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> +		else
> +			ctx->commit_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> +		spin_unlock(&ctx->cil->xc_push_lock);
> +	}

I have to say that having this cil_ctx specific logic that somehow
reverse engineers what the caller is doing here seems pretty awkward.
To me the logical interface would be to pass a function pointer and
private data except for the performance penalty of indirect calls.

But to make this somewhat bearable I think you should start with the
above block as a helper implemented in xfs_log_cil.c.
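
A minimal userspace sketch of what such a helper could look like. The
names model_cil_ctx, model_cil_set_ctx_write_state() and
model_log_write() are hypothetical, and the atomic_flag spinlock only
stands in for ctx->cil->xc_push_lock; this is not the code from the
series, just the quoted block pulled out behind a function so the
write path only makes one call:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t model_lsn_t;

/* Hypothetical stand-in for the CIL context; only the fields used here. */
struct model_cil_ctx {
	atomic_flag	lock;		/* models cil->xc_push_lock */
	model_lsn_t	start_lsn;	/* 0 means "not recorded yet" */
	model_lsn_t	commit_lsn;
};

static void
model_spin_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void
model_spin_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Helper that would live next to the CIL code: the first write for a
 * context records the start LSN, any later write records the commit LSN.
 */
static void
model_cil_set_ctx_write_state(struct model_cil_ctx *ctx, model_lsn_t iclog_lsn)
{
	/* An LSN of zero could not be told apart from "unset". */
	assert(iclog_lsn != 0);

	model_spin_lock(&ctx->lock);
	if (!ctx->start_lsn)
		ctx->start_lsn = iclog_lsn;
	else
		ctx->commit_lsn = iclog_lsn;
	model_spin_unlock(&ctx->lock);
}

/* The generic write path then needs no knowledge of the ctx layout. */
static void
model_log_write(struct model_cil_ctx *ctx, model_lsn_t iclog_lsn)
{
	if (ctx)
		model_cil_set_ctx_write_state(ctx, iclog_lsn);
	/* ... copying the log vectors into the iclog would follow here ... */
}
```

Callers that are not CIL checkpoints would pass a NULL ctx, as in the
quoted hunk.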

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work()
  2021-06-17 19:59   ` Darrick J. Wong
@ 2021-06-18 14:27     ` Christoph Hellwig
  2021-06-18 22:34       ` Dave Chinner
  0 siblings, 1 reply; 50+ messages in thread
From: Christoph Hellwig @ 2021-06-18 14:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs

On Thu, Jun 17, 2021 at 12:59:04PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 06:26:14PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > So we can use it for start record ordering as well as commit record
> > ordering in future.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> This tricked me for a second until I realized that xlog_cil_order_write
> is the chunk of code just prior to the xlog_cil_write_commit_record
> call.

Yeah, moving the caller at the same time as the factoring is a trick
test for every reader.  I think this needs to be documented in the
commit log.  Or even better moved to a separate log, but it seems you
get shot for that kind of suggestion on the xfs list these days..

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
  2021-06-18 13:08       ` Brian Foster
  2021-06-18 13:55         ` Christoph Hellwig
@ 2021-06-18 22:15         ` Dave Chinner
  1 sibling, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-18 22:15 UTC (permalink / raw)
  To: Brian Foster; +Cc: Darrick J. Wong, linux-xfs

On Fri, Jun 18, 2021 at 09:08:15AM -0400, Brian Foster wrote:
> On Fri, Jun 18, 2021 at 09:43:08AM +1000, Dave Chinner wrote:
> > On Thu, Jun 17, 2021 at 12:05:19PM -0700, Darrick J. Wong wrote:
> > > On Thu, Jun 17, 2021 at 02:32:30PM -0400, Brian Foster wrote:
> > > > On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > > > > Hi folks,
> > > > > 
> > > > > This is followup from the first set of log fixes for for-next that
> > > > > were posted here:
> > > > > 
> > > > > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > > > > 
> > > > > The first two patches of this series are updates for those patches,
> > > > > change log below. The rest is the fix for the bigger issue we
> > > > > uncovered in investigating the generic/019 failures, being that
> > > > > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > > > > to checkpoints.
> > > > > 
> > > > > The "simple" fix of using the same ordering code as the commit
> > > > > record for the start records in the CIL push turned into a lot of
> > > > > patches once I started cleaning it up, separating out all the
> > > > > different bits and finally realising all the things I needed to
> > > > > change to avoid unintentional logic/behavioural changes. Hence
> > > > > there's some code movement, some factoring, API changes to
> > > > > xlog_write(), changing where we attach callbacks to commit iclogs so
> > > > > they remain correctly ordered if there are multiple commit records
> > > > > in the one iclog and then, finally, strictly ordering the start
> > > > > records....
> > > > > 
> > > > > The original "simple fix" I tested last night ran almost a thousand
> > > > > cycles of generic/019 without a log hang or recovery failure of any
> > > > > kind. The refactored patchset has run a couple hundred cycles of
> > > > > g/019 and g/475 over the last few hours without a failure, so I'm
> > > > > posting this so we can get a review iteration done while I sleep so
> > > > > we can - hopefully - get this sorted out before the end of the week.
> > > > > 
> > > > 
> > > > My first spin of this included generic/019 and generic/475, ran for 18
> > > > or so iterations and 475 exploded with a stream of asserts followed by a
> > > > NULL pointer crash:
> > > > 
> > > > # grep -e Assertion -e BUG dmesg.out
> > > > ...
> > > > [ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > > > [ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
> > > > 
> > > > I don't know if this is a regression, but I've not seen it before. I've
> > > > attempted to spin generic/475 since then to see if it reproduces again,
> > > > but so far I'm only running into some of the preexisting issues
> > > > associated with that test.
> > 
> > I've not seen anything like that. I can't see how the changes in the
> > patchset would affect BUI reference counting in any way. That seems
> > more like an underlying intent item shutdown reference count issue
> > to me (and we've had a *lot* of them in the past)....
> > 
> 
> I've not made sense of it either, but at the same time, I've not seen it
> in all my testing thus far up until targeting this series, and now I've
> seen it twice in as many test runs as my overnight run fell into some
> kind of similar haywire state. Unfortunately it seemed to be
> spinning/streaming assert output so I lost any record of the initial
> crash signature. It wouldn't surprise me if the fundamental problem is
> some older bug in another area of code, but it's hard to believe it's
> not at least related to this series somehow.
> 
> Also FYI, earlier iterations of generic/475 triggered a couple instances
> of the following assert failure before things broke down more severely:
> 
>  XFS: Assertion failed: *log_offset + *len <= iclog->ic_size || iclog->ic_state == XLOG_STATE_WANT_SYNC, file: fs/xfs/xfs_log.c, line: 2115

Yup, that's a bogus state check in the assert. I already have a
patch to fix that - the async shutdown can change the iclog state
to XLOG_STATE_IOERROR at any time, so any iclog state assert outside of
the log->l_icloglock needs also to allow for XLOG_STATE_IOERROR as
a valid state.

This is one of the problems I was alluding to on #xfs when I said:

[18/6/21 14:42] <dchinner> I'm really not liking getting repeatedly
caught out by racing, unreferenced iclog state changes during
shutdown and having to handle them everywhere.

Patch, FYI, below.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

xfs: fix incorrect assert in xlog_write_single

From: Dave Chinner <dchinner@redhat.com>

generic/475 failed with this assert after a log shutdown:

[ 3953.166235] XFS: Assertion failed: *log_offset + *len <= iclog->ic_size || iclog->ic_state == XLOG_STATE_WANT_SYNC, file: fs/xfs/xfs_log.c, line: 2115

The problem is that after the log has shut down, the iclog state is
XLOG_STATE_IOERROR. The shutdown can change the iclog state at any
time while we are writing to it, so we need to add IOERROR to the
valid states here.

Note that we already have similar IOERROR state checks in asserts
in the xlog_write() code for this reason (e.g. in
xlog_write_get_more_iclog_space()) so this is just a case where the
IOERROR state check was missed. The IOERROR state will be processed
when we release the iclog, so just add the state into the assert and
let the iclog release code handle the error.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 94b6bccb9de9..221c080df305 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2113,7 +2113,8 @@ xlog_write_single(
 	int			index;
 
 	ASSERT(*log_offset + *len <= iclog->ic_size ||
-		iclog->ic_state == XLOG_STATE_WANT_SYNC);
+		iclog->ic_state == XLOG_STATE_WANT_SYNC ||
+		iclog->ic_state == XLOG_STATE_IOERROR);
 
 	ptr = iclog->ic_datap + *log_offset;
 	for (lv = log_vector;

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
  2021-06-18 13:55         ` Christoph Hellwig
  2021-06-18 14:02           ` Christoph Hellwig
@ 2021-06-18 22:28           ` Dave Chinner
  1 sibling, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-18 22:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Brian Foster, Darrick J. Wong, linux-xfs

On Fri, Jun 18, 2021 at 02:55:03PM +0100, Christoph Hellwig wrote:
> On Fri, Jun 18, 2021 at 09:08:15AM -0400, Brian Foster wrote:
> > Also FYI, earlier iterations of generic/475 triggered a couple instances
> > of the following assert failure before things broke down more severely:
> > 
> >  XFS: Assertion failed: *log_offset + *len <= iclog->ic_size || iclog->ic_state == XLOG_STATE_WANT_SYNC, file: fs/xfs/xfs_log.c, line: 2115
> 
> As you mentioned the placement of this exact assert in my cleanups
> series:  after looking at a right place to move it, I'm really not sure
> this assert makes much sense in this form.

It actually makes perfect sense when you look at the iclog state
transitions in xlog_state_get_iclog_space() w.r.t. the length that
is passed to it.

> xlog_write_single is always entered first by xlog_write, so we also
> get here for something that later gets handled by xlog_write_partial.
> Which means it could be way bigger than the current iclog, and I see no
> reason why that iclog would have to be XLOG_STATE_WANT_SYNC.

Yup, completely intentional and if len is larger than can fit in the
iclog we are writing into, the iclog *must* be in
XLOG_STATE_WANT_SYNC.

That is, if the length requested in xlog_state_get_iclog_space()
fits entirely in the iclog that is returned, _get_space() will
increment the offset of the iclog to exclusively reserve that amount
of space for the write we are going to do. It then leaves the state
as ACTIVE so another process can then also reserve some/all of the
remaining unused space in the iclog. Hence here in
xlog_write_single() we will have *log_offset + *len <=
iclog->ic_size and ic_state = ACTIVE as true for a write that fits
entirely in the iclog.

If _get_space() finds that the len is larger than will fit in the
iclog, it will reserve the entire remaining space in the iclog for
the current caller by switching out the iclog and moving the state to
XLOG_STATE_WANT_SYNC. This means no other caller to _get_space() will
be able to reserve space in this iclog because the state is no
longer ACTIVE.

IOWs, if *log_offset + *len > iclog->ic_size, then _get_space()
*must* have set the state of the iclog to _WANT_SYNC so that the
owner of the iclog has exclusive use of the space in the iclog from
*log_offset all the way to the end of the iclog. The overlap beyond
the end of this iclog will be handled by xlog_write_partial(),
and it will release this iclog and get a new one to continue the
write.
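
The reservation rules above can be modelled in a few lines of userspace
C. This is a simplified sketch, not the kernel code: model_iclog,
model_get_space() and model_write_single_ok() are made-up names, there
is no locking, and the real xlog_state_get_iclog_space() handles more
cases than the two shown here. It only illustrates why the assert holds
for both the "fits" and the "does not fit" case, and why IOERROR has to
be tolerated as well:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the iclog states; not the kernel's enum. */
enum model_state { MODEL_ACTIVE, MODEL_WANT_SYNC, MODEL_IOERROR };

struct model_iclog {
	int			size;	/* usable bytes, like ic_size */
	int			offset;	/* next unreserved byte */
	enum model_state	state;
};

/*
 * Reserve space for a write of len bytes and return the offset at which
 * the caller starts writing.
 */
static int
model_get_space(struct model_iclog *iclog, int len)
{
	int	log_offset = iclog->offset;

	assert(iclog->state == MODEL_ACTIVE);

	if (log_offset + len <= iclog->size) {
		/* Fits: reserve exactly len, others may use the rest. */
		iclog->offset += len;
	} else {
		/* Does not fit: take the remainder and close the iclog. */
		iclog->offset = iclog->size;
		iclog->state = MODEL_WANT_SYNC;
	}
	return log_offset;
}

/* The condition checked on entry to the single-iclog write path. */
static bool
model_write_single_ok(const struct model_iclog *iclog, int log_offset, int len)
{
	return log_offset + len <= iclog->size ||
	       iclog->state == MODEL_WANT_SYNC ||
	       iclog->state == MODEL_IOERROR;	/* shutdown raced with us */
}
```

With a 100 byte iclog, a 60 byte write leaves it ACTIVE at offset 60,
and a following 70 byte write gets offset 60 with the iclog switched to
WANT_SYNC, so the check passes in both cases.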

Long story short, the assert is valid, but asynchronous shutdown
changing ic_state without having references to the iclogs or caring
about how they are being used is turning out to be a massive Charlie
Foxtrot right now...

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work()
  2021-06-18 14:27     ` Christoph Hellwig
@ 2021-06-18 22:34       ` Dave Chinner
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-18 22:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Darrick J. Wong, linux-xfs

On Fri, Jun 18, 2021 at 03:27:49PM +0100, Christoph Hellwig wrote:
> On Thu, Jun 17, 2021 at 12:59:04PM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 17, 2021 at 06:26:14PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > So we can use it for start record ordering as well as commit record
> > > ordering in future.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > 
> > This tricked me for a second until I realized that xlog_cil_order_write
> > is the chunk of code just prior to the xlog_cil_write_commit_record
> > call.
> 
> Yeah, moving the caller at the same time as the factoring is a trick
> test for every reader.  I think this needs to be documented in the
> commit log.  Or even better moved to a separate log, but it seems you
> get shot for that kind of suggestion on the xfs list these days..

Sorry, what? This should be a straight factoring - the place we do
the ordering check must not change because that'll break shit.

Ngggh.

Yeah, thanks git. When I rebased the patch, it's merged the hunk
into the wrong place. It gets fixed up later when I move the ordering
inside the xlog_cil_write_commit_record() function, but this patch
by itself was silently broken by the tooling.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
  2021-06-17  8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
                   ` (8 preceding siblings ...)
  2021-06-17 18:32 ` [PATCH 0/8 V2] xfs: log fixes for for-next Brian Foster
@ 2021-06-18 22:48 ` Dave Chinner
  2021-06-19 20:22   ` Darrick J. Wong
  9 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-18 22:48 UTC (permalink / raw)
  To: linux-xfs

On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> Hi folks,
> 
> This is followup from the first set of log fixes for for-next that
> were posted here:
> 
> https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> 
> The first two patches of this series are updates for those patches,
> change log below. The rest is the fix for the bigger issue we
> uncovered in investigating the generic/019 failures, being that
> we're triggering a zero-day bug in the way log recovery assigns LSNs
> to checkpoints.
> 
> The "simple" fix of using the same ordering code as the commit
> record for the start records in the CIL push turned into a lot of
> patches once I started cleaning it up, separating out all the
> different bits and finally realising all the things I needed to
> change to avoid unintentional logic/behavioural changes. Hence
> there's some code movement, some factoring, API changes to
> xlog_write(), changing where we attach callbacks to commit iclogs so
> they remain correctly ordered if there are multiple commit records
> in the one iclog and then, finally, strictly ordering the start
> records....
> 
> The original "simple fix" I tested last night ran almost a thousand
> cycles of generic/019 without a log hang or recovery failure of any
> kind. The refactored patchset has run a couple hundred cycles of
> g/019 and g/475 over the last few hours without a failure, so I'm
> posting this so we can get a review iteration done while I sleep so
> we can - hopefully - get this sorted out before the end of the week.

Update on this so people know what's happening.

Yesterday I found another zero-day bug in the CIL code that triggers
when a shutdown occurs.

The shutdown processing runs asynchronously and without caring about
the current state or users of the iclogs. So when it runs
xlog_state_do_callbacks() after changing the state of all iclogs to
XLOG_STATE_IOERROR, it runs the callbacks on all the iclogs and
frees everything associated with them.

That includes the CIL context structure that xlog_cil_push_now() is
still working on because it has a referenced iclog that it hasn't
yet released.

Hence the initial CIL commit that stamps the CIL context with the
commit lsn -after- it has attached the context to the commit_iclog
callback list can race with shutdown. This results in a UAF
situation and an 8 byte memory corruption when we stamp the LSN into
the context.
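
To make the ownership problem concrete, here is a single-threaded
userspace model. All names (model_ctx, model_iclog, model_commit(),
model_run_callbacks()) are hypothetical, the "released" flag stands in
for the actual free, and this is not the fix for the race, only an
illustration of the rule being violated: once the context is on the
iclog callback list, whoever runs the callbacks owns it, so every store
to the context has to be finished before it is attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_ctx {
	uint64_t		commit_lsn;
	bool			released;  /* set where the kernel would free */
	struct model_ctx	*next;
};

struct model_iclog {
	uint64_t		lsn;
	struct model_ctx	*callbacks;
};

/*
 * What I/O completion or shutdown does: run and drop every attached
 * context. Returns how many contexts were processed.
 */
static int
model_run_callbacks(struct model_iclog *iclog)
{
	struct model_ctx	*ctx;
	int			count = 0;

	while ((ctx = iclog->callbacks) != NULL) {
		iclog->callbacks = ctx->next;
		ctx->released = true;	/* ctx must not be touched again */
		count++;
	}
	return count;
}

/* Finish all stores to the context first, then publish it. */
static void
model_commit(struct model_iclog *iclog, struct model_ctx *ctx)
{
	assert(!ctx->released);

	ctx->commit_lsn = iclog->lsn;	/* stamp while we still own ctx */
	ctx->next = iclog->callbacks;
	iclog->callbacks = ctx;		/* callback path owns ctx from here */
}
```

The sequence described above does the stamp after the attach, which is
exactly the window in which a concurrent model_run_callbacks() would
already have released the context.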

The current for-next tree does *much more* with the context after
the callbacks are attached, which opens up this UAF to both reads
and writes of free memory. The fix in patch 2, which adds a sleep on
the previous iclog after attaching the callbacks to the commit iclog,
opens this window even further.

And then the start record ordering patch set moves the commit iclog
into the CIL context structure, which we dereference after waiting on
the previous iclog, which means we are dereferencing pointers to freed
memory.

So, basically, before any of these fixes can go forwards, I first
need to fix the pre-existing CIL push/shutdown race.

And then, after I've rebased all these fixes on that fix and we're
back to square one and before we do anything else in the log, we
need to fix the mess that is caused by unco-ordinated shutdown
changing iclog state and running completions while we still have
active references to the iclogs and are preparing the iclog for IO.
XLOG_STATE_IOERROR must be considered harmful at this point in time.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
  2021-06-18 22:48 ` Dave Chinner
@ 2021-06-19 20:22   ` Darrick J. Wong
  2021-06-20 22:18     ` Dave Chinner
  0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-19 20:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Sat, Jun 19, 2021 at 08:48:30AM +1000, Dave Chinner wrote:
> On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > Hi folks,
> > 
> > This is followup from the first set of log fixes for for-next that
> > were posted here:
> > 
> > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > 
> > The first two patches of this series are updates for those patches,
> > change log below. The rest is the fix for the bigger issue we
> > uncovered in investigating the generic/019 failures, being that
> > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > to checkpoints.
> > 
> > The "simple" fix of using the same ordering code as the commit
> > record for the start records in the CIL push turned into a lot of
> > patches once I started cleaning it up, separating out all the
> > different bits and finally realising all the things I needed to
> > change to avoid unintentional logic/behavioural changes. Hence
> > there's some code movement, some factoring, API changes to
> > xlog_write(), changing where we attach callbacks to commit iclogs so
> > they remain correctly ordered if there are multiple commit records
> > in the one iclog and then, finally, strictly ordering the start
> > records....
> > 
> > The original "simple fix" I tested last night ran almost a thousand
> > cycles of generic/019 without a log hang or recovery failure of any
> > kind. The refactored patchset has run a couple hundred cycles of
> > g/019 and g/475 over the last few hours without a failure, so I'm
> > posting this so we can get a review iteration done while I sleep so
> > we can - hopefully - get this sorted out before the end of the week.
> 
> Update on this so people know what's happening.
> 
> Yesterday I found another zero-day bug in the CIL code that triggers
> when a shutdown occurs.
> 
> The shutdown processing runs asynchronously and without caring about
> the current state or users of the iclogs. So when it runs
> xlog_state_do_callbacks() after changing the state of all iclogs to
> XLOG_STATE_IOERROR, it runs the callbacks on all the iclogs and
> frees everything associated with them.
> 
> That includes the CIL context structure that xlog_cil_push_now() is
> still working on because it has a referenced iclog that it hasn't
> yet released.
> 
> Hence the initial CIL commit that stamps the CIL context with the
> commit lsn -after- it has attached the context to the commit_iclog
> callback list can race with shutdown. This results in a UAF
> situation and an 8 byte memory corruption when we stamp the LSN into
> the context.
> 
> The current for-next tree does *much more* with the context after
> the callbacks are attached, which opens up this UAF to both reads
> and writes of free memory. The fix in patch 2, which adds a sleep on
> the previous iclog after attaching the callbacks to the commit iclog,
> opens this window even further.
> 
> And then the start record ordering patch set moves the commit iclog
> into the CIL context structure, which we dereference after waiting on
> the previous iclog, which means we are dereferencing pointers to freed
> memory.
> 
> So, basically, before any of these fixes can go forwards, I first
> need to fix the pre-existing CIL push/shutdown race.
> 
> And then, after I've rebased all these fixes on that fix and we're
> back to square one and before we do anything else in the log, we
> need to fix the mess that is caused by unco-ordinated shutdown
> changing iclog state and running completions while we still have
> active references to the iclogs and are preparing the iclog for IO.
> XLOG_STATE_IOERROR must be considered harmful at this point in time.

This puts me in a difficult spot.  We're past -rc6, which means that
Linus could tag 5.13.0 tomorrow, and if he does that, whatever's in
for-next needs to have had at least a few days to soak before Linus will
want to pull it upstream.

Or this could be yet another one of those crazy kernels that goes all
the way to -rc8, in which case there's still time to make small
adjustments.  But who knows, I have no schedule visibility.

However, this doesn't sound like small adjustments.  I think it's best
that I withdraw the CIL changes from for-next until we have more time to
fix these issues and make sure that there aren't any bugs that are
easily found by developers.  I feel confident enough about everything
between "xfs: log stripe roundoff is a property of the log" and
"xfs: xfs_log_force_lsn isn't passed a LSN" to keep them in for-next.

I'll also throw in the random fixes that got reviewed this week.

--D

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
  2021-06-19 20:22   ` Darrick J. Wong
@ 2021-06-20 22:18     ` Dave Chinner
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-20 22:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sat, Jun 19, 2021 at 01:22:49PM -0700, Darrick J. Wong wrote:
> On Sat, Jun 19, 2021 at 08:48:30AM +1000, Dave Chinner wrote:
> > On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > > Hi folks,
> > > 
> > > This is followup from the first set of log fixes for for-next that
> > > were posted here:
> > > 
> > > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > > 
> > > The first two patches of this series are updates for those patches,
> > > change log below. The rest is the fix for the bigger issue we
> > > uncovered in investigating the generic/019 failures, being that
> > > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > > to checkpoints.
> > > 
> > > The "simple" fix of using the same ordering code as the commit
> > > record for the start records in the CIL push turned into a lot of
> > > patches once I started cleaning it up, separating out all the
> > > different bits and finally realising all the things I needed to
> > > change to avoid unintentional logic/behavioural changes. Hence
> > > there's some code movement, some factoring, API changes to
> > > xlog_write(), changing where we attach callbacks to commit iclogs so
> > > they remain correctly ordered if there are multiple commit records
> > > in the one iclog and then, finally, strictly ordering the start
> > > records....
> > > 
> > > The original "simple fix" I tested last night ran almost a thousand
> > > cycles of generic/019 without a log hang or recovery failure of any
> > > kind. The refactored patchset has run a couple hundred cycles of
> > > g/019 and g/475 over the last few hours without a failure, so I'm
> > > posting this so we can get a review iteration done while I sleep so
> > > we can - hopefully - get this sorted out before the end of the week.
> > 
> > Update on this so people know what's happening.
> > 
> > Yesterday I found another zero-day bug in the CIL code that triggers
> > when a shutdown occurs.
> > 
> > The shutdown processing runs asynchronously and without caring about
> > the current state or users of the iclogs. So when it runs
> > xlog_state_do_callbacks() after changing the state of all iclogs to
> > XLOG_STATE_IOERROR, it runs the callbacks on all the iclogs and
> > frees everything associated with them.
> > 
> > That includes the CIL context structure that xlog_cil_push_now() is
> > still working on because it has a referenced iclog that it hasn't
> > yet released.
> > 
> > Hence the initial CIL commit that stamps the CIL context with the
> > commit lsn -after- it has attached the context to the commit_iclog
> > callback list can race with shutdown. This results in a UAF
> > situation and an 8 byte memory corruption when we stamp the LSN into
> > the context.
> > 
> > The current for-next tree does *much more* with the context after
> > the callbacks are attached, which opens up this UAF to both reads
> > and writes of free memory. The fix in patch 2, which adds a sleep on
> > the previous iclog after attaching the callbacks to the commit iclog,
> > opens this window even further.
> > 
> > And then the start record ordering patch set moves the commit iclog
> > into the CIL context structure, which we dereference after waiting on
> > the previous iclog. That means we are dereferencing pointers to freed
> > memory.
> > 
> > So, basically, before any of these fixes can go forwards, I first
> > need to fix the pre-existing CIL push/shutdown race.
> > 
> > And then, after I've rebased all these fixes on that fix and we're
> > back to square one and before we do anything else in the log, we
> > need to fix the mess that is caused by unco-ordinated shutdown
> > changing iclog state and running completions while we still have
> > active references to the iclogs and are preparing the iclog for IO.
> > XLOG_STATE_IOERROR must be considered harmful at this point in time.
> 
> This puts me in a difficult spot.  We're past -rc6, which means that
> Linus could tag 5.13.0 tomorrow, and if he does that, whatever's in
> for-next needs to have had at least a few days to soak before Linus will
> want to pull it upstream.
> 
> Or this could be yet another one of those crazy kernels that goes all
> the way to -rc8, in which case there's still time to make small
> adjustments.  But who knows, I have no schedule visibility.
> 
> However, this doesn't sound like small adjustments.  I think it's best
> that I withdraw the CIL changes from for-next until we have more time to
> fix these issues and make sure that there aren't any bugs that are
> easily found by developers.  I feel confident enough about everything
> between "xfs: log stripe roundoff is a property of the log" and
> "xfs: xfs_log_force_lsn isn't passed a LSN" to keep them in for-next.

Yup, that's a fair call. I was going to ask you to do this anyway
this morning (Monday) because I haven't been able to come up with a
magic bullet that fixes everything and makes it all better over the
weekend.

I'll start a new branch that fixes the UAF bug and the start record
ordering, and then rebase the CIL/log scalability patchset on top of
that. I'll also pull Christoph's cleanups for the new xlog_write()
code on top of that, too.
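
For anyone following along, here is a minimal user-space sketch of the
ordering hazard described in the quoted update. It is not kernel code
and not necessarily the shape the eventual fix will take: struct ctx,
struct iclog, run_callbacks() and commit_checkpoint() are hypothetical
stand-ins, it is single-threaded, and it simulates the worst case by
running the callbacks immediately after the context is attached. The
point it illustrates is only that the LSN has to be stamped, and
anything needed later copied out, before the context is attached,
because the context may be freed at any time after that.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the CIL context. */
struct ctx {
	int64_t	commit_lsn;
	int	ticket;
};

/* Hypothetical stand-in for an iclog with a single callback slot. */
struct iclog {
	struct ctx	*attached;
	int64_t		completed_lsn;	/* what the callback observed */
};

/* Simulates shutdown running the callbacks: consumes and frees the ctx. */
static void run_callbacks(struct iclog *ic)
{
	if (!ic->attached)
		return;
	ic->completed_lsn = ic->attached->commit_lsn;
	free(ic->attached);
	ic->attached = NULL;
}

/*
 * Safe ordering: stamp the LSN and copy out anything needed later
 * before attaching, then never touch ctx again.
 */
static int commit_checkpoint(struct iclog *ic, int64_t lsn)
{
	struct ctx *ctx = malloc(sizeof(*ctx));
	int ticket;

	if (!ctx)
		return -1;
	ctx->ticket = 7;
	ctx->commit_lsn = lsn;	/* stamp first */
	ticket = ctx->ticket;	/* copy out what we still need */

	ic->attached = ctx;	/* ownership passes to the iclog here */
	run_callbacks(ic);	/* worst case: shutdown fires immediately */

	return ticket;		/* uses only the local copy */
}
```

Swapping the stamp and the attach in commit_checkpoint() would
reproduce the 8 byte write to freed memory: the callback would free
ctx before commit_lsn is stored. The real code additionally needs
locking to make the attach and the state check atomic, which this
sketch does not attempt to model.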

Oh, well, good thing I hadn't deleted the merged branches yet....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
  2021-06-17  8:26 ` [PATCH 4/8] xfs: pass a CIL context to xlog_write() Dave Chinner
  2021-06-17 14:46     ` kernel test robot
@ 2021-06-28  8:58 ` Dan Carpenter
  2021-06-18 14:23   ` Christoph Hellwig
  2 siblings, 0 replies; 50+ messages in thread
From: kernel test robot @ 2021-06-26 23:10 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 32323 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210617082617.971602-5-david@fromorbit.com>
References: <20210617082617.971602-5-david@fromorbit.com>
TO: Dave Chinner <david@fromorbit.com>
TO: linux-xfs(a)vger.kernel.org

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[cannot apply to v5.13-rc7 next-20210625]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
:::::: branch date: 10 days ago
:::::: commit date: 10 days ago
config: h8300-randconfig-m031-20210625 (attached as .config)
compiler: h8300-linux-gcc (GCC) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
fs/xfs/xfs_log_cil.c:1130 xlog_cil_push_work() error: uninitialized symbol 'commit_lsn'.
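
In the excerpt quoted in this report, commit_lsn is declared at line
877 and first read at line 1130, with no assignment visible in
between. A minimal stand-alone sketch of that pattern and one way to
avoid it follows. The names write_commit_record() and
checkpoint_spans_iclogs() and the lsn_t typedef are hypothetical and
are not the XFS API; how the patch should actually obtain the commit
LSN is up to the author.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t lsn_t;	/* stand-in for xfs_lsn_t */

/* Hypothetical helper: reports the commit LSN via an out-parameter. */
static int write_commit_record(lsn_t *commit_lsn)
{
	*commit_lsn = 42;
	return 0;
}

/*
 * Returns 1 if the checkpoint spans more than one iclog, 0 if it does
 * not, and -1 on error.
 */
static int checkpoint_spans_iclogs(lsn_t start_lsn)
{
	lsn_t commit_lsn = 0;	/* initialised, so no path reads garbage */

	/*
	 * The warning pattern is to declare commit_lsn, call a helper
	 * that no longer fills it in, and then compare against it. Here
	 * the helper is handed the variable explicitly.
	 */
	if (write_commit_record(&commit_lsn))
		return -1;

	return start_lsn != commit_lsn;
}
```

With the helper as written, checkpoint_spans_iclogs(42) returns 0 and
any other start LSN returns 1.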

Old smatch warnings:
fs/xfs/xfs_log_cil.c:644 xlog_discard_busy_extents() warn: should '(busyp->length) << mp->m_blkbb_log' be a 64 bit type?

vim +/commit_lsn +1130 fs/xfs/xfs_log_cil.c

be05dd0e68ac99 Dave Chinner      2021-06-08   846  
71e330b593905e Dave Chinner      2010-05-21   847  /*
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   848   * Push the Committed Item List to the log.
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   849   *
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   850   * If the current sequence is the same as xc_push_seq we need to do a flush. If
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   851   * xc_push_seq is less than the current sequence, then it has already been
a44f13edf0ebb4 Dave Chinner      2010-08-24   852   * flushed and we don't need to do anything - the caller will wait for it to
a44f13edf0ebb4 Dave Chinner      2010-08-24   853   * complete if necessary.
a44f13edf0ebb4 Dave Chinner      2010-08-24   854   *
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   855   * xc_push_seq is checked unlocked against the sequence number for a match.
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   856   * Hence we can allow log forces to run racily and not issue pushes for the
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   857   * same sequence twice.  If we get a race between multiple pushes for the same
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   858   * sequence they will block on the first one and then abort, hence avoiding
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   859   * needless pushes.
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   860   */
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   861  static void
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   862  xlog_cil_push_work(
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   863  	struct work_struct	*work)
71e330b593905e Dave Chinner      2010-05-21   864  {
facd77e4e38b8f Dave Chinner      2021-06-04   865  	struct xfs_cil_ctx	*ctx =
facd77e4e38b8f Dave Chinner      2021-06-04   866  		container_of(work, struct xfs_cil_ctx, push_work);
facd77e4e38b8f Dave Chinner      2021-06-04   867  	struct xfs_cil		*cil = ctx->cil;
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   868  	struct xlog		*log = cil->xc_log;
71e330b593905e Dave Chinner      2010-05-21   869  	struct xfs_log_vec	*lv;
71e330b593905e Dave Chinner      2010-05-21   870  	struct xfs_cil_ctx	*new_ctx;
71e330b593905e Dave Chinner      2010-05-21   871  	struct xlog_in_core	*commit_iclog;
66fc9ffa8638be Dave Chinner      2021-06-04   872  	int			num_iovecs = 0;
66fc9ffa8638be Dave Chinner      2021-06-04   873  	int			num_bytes = 0;
71e330b593905e Dave Chinner      2010-05-21   874  	int			error = 0;
877cf3473914ae Dave Chinner      2021-06-04   875  	struct xlog_cil_trans_hdr thdr;
a47518453bf958 Dave Chinner      2021-06-08   876  	struct xfs_log_vec	lvhdr = {};
71e330b593905e Dave Chinner      2010-05-21   877  	xfs_lsn_t		commit_lsn;
4c2d542f2e7865 Dave Chinner      2012-04-23   878  	xfs_lsn_t		push_seq;
0279bbbbc03f2c Dave Chinner      2021-06-03   879  	struct bio		bio;
0279bbbbc03f2c Dave Chinner      2021-06-03   880  	DECLARE_COMPLETION_ONSTACK(bdev_flush);
e12213ba5d909a Dave Chinner      2021-06-04   881  	bool			push_commit_stable;
e469cbe84f4ade Dave Chinner      2021-06-08   882  	struct xlog_ticket	*ticket;
71e330b593905e Dave Chinner      2010-05-21   883  
facd77e4e38b8f Dave Chinner      2021-06-04   884  	new_ctx = xlog_cil_ctx_alloc();
71e330b593905e Dave Chinner      2010-05-21   885  	new_ctx->ticket = xlog_cil_ticket_alloc(log);
71e330b593905e Dave Chinner      2010-05-21   886  
71e330b593905e Dave Chinner      2010-05-21   887  	down_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner      2010-05-21   888  
4bb928cdb900d0 Dave Chinner      2013-08-12   889  	spin_lock(&cil->xc_push_lock);
4c2d542f2e7865 Dave Chinner      2012-04-23   890  	push_seq = cil->xc_push_seq;
4c2d542f2e7865 Dave Chinner      2012-04-23   891  	ASSERT(push_seq <= ctx->sequence);
e12213ba5d909a Dave Chinner      2021-06-04   892  	push_commit_stable = cil->xc_push_commit_stable;
e12213ba5d909a Dave Chinner      2021-06-04   893  	cil->xc_push_commit_stable = false;
71e330b593905e Dave Chinner      2010-05-21   894  
0e7ab7efe77451 Dave Chinner      2020-03-24   895  	/*
3682277520d6f4 Dave Chinner      2021-06-04   896  	 * As we are about to switch to a new, empty CIL context, we no longer
3682277520d6f4 Dave Chinner      2021-06-04   897  	 * need to throttle tasks on CIL space overruns. Wake any waiters that
3682277520d6f4 Dave Chinner      2021-06-04   898  	 * the hard push throttle may have caught so they can start committing
3682277520d6f4 Dave Chinner      2021-06-04   899  	 * to the new context. The ctx->xc_push_lock provides the serialisation
3682277520d6f4 Dave Chinner      2021-06-04   900  	 * necessary for safely using the lockless waitqueue_active() check in
3682277520d6f4 Dave Chinner      2021-06-04   901  	 * this context.
3682277520d6f4 Dave Chinner      2021-06-04   902  	 */
3682277520d6f4 Dave Chinner      2021-06-04   903  	if (waitqueue_active(&cil->xc_push_wait))
c7f87f3984cfa1 Dave Chinner      2020-06-16   904  		wake_up_all(&cil->xc_push_wait);
0e7ab7efe77451 Dave Chinner      2020-03-24   905  
4c2d542f2e7865 Dave Chinner      2012-04-23   906  	/*
4c2d542f2e7865 Dave Chinner      2012-04-23   907  	 * Check if we've anything to push. If there is nothing, then we don't
4c2d542f2e7865 Dave Chinner      2012-04-23   908  	 * move on to a new sequence number and so we have to be able to push
4c2d542f2e7865 Dave Chinner      2012-04-23   909  	 * this sequence again later.
4c2d542f2e7865 Dave Chinner      2012-04-23   910  	 */
0d11bae4bcf4aa Dave Chinner      2021-06-04   911  	if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) {
4c2d542f2e7865 Dave Chinner      2012-04-23   912  		cil->xc_push_seq = 0;
4bb928cdb900d0 Dave Chinner      2013-08-12   913  		spin_unlock(&cil->xc_push_lock);
a44f13edf0ebb4 Dave Chinner      2010-08-24   914  		goto out_skip;
4c2d542f2e7865 Dave Chinner      2012-04-23   915  	}
4c2d542f2e7865 Dave Chinner      2012-04-23   916  
a44f13edf0ebb4 Dave Chinner      2010-08-24   917  
cf085a1b5d2214 Joe Perches       2019-11-07   918  	/* check for a previously pushed sequence */
facd77e4e38b8f Dave Chinner      2021-06-04   919  	if (push_seq < ctx->sequence) {
8af3dcd3c89aef Dave Chinner      2014-09-23   920  		spin_unlock(&cil->xc_push_lock);
df806158b0f6eb Dave Chinner      2010-05-17   921  		goto out_skip;
8af3dcd3c89aef Dave Chinner      2014-09-23   922  	}
8af3dcd3c89aef Dave Chinner      2014-09-23   923  
8af3dcd3c89aef Dave Chinner      2014-09-23   924  	/*
8af3dcd3c89aef Dave Chinner      2014-09-23   925  	 * We are now going to push this context, so add it to the committing
8af3dcd3c89aef Dave Chinner      2014-09-23   926  	 * list before we do anything else. This ensures that anyone waiting on
8af3dcd3c89aef Dave Chinner      2014-09-23   927  	 * this push can easily detect the difference between a "push in
8af3dcd3c89aef Dave Chinner      2014-09-23   928  	 * progress" and "CIL is empty, nothing to do".
8af3dcd3c89aef Dave Chinner      2014-09-23   929  	 *
8af3dcd3c89aef Dave Chinner      2014-09-23   930  	 * IOWs, a wait loop can now check for:
8af3dcd3c89aef Dave Chinner      2014-09-23   931  	 *	the current sequence not being found on the committing list;
8af3dcd3c89aef Dave Chinner      2014-09-23   932  	 *	an empty CIL; and
8af3dcd3c89aef Dave Chinner      2014-09-23   933  	 *	an unchanged sequence number
8af3dcd3c89aef Dave Chinner      2014-09-23   934  	 * to detect a push that had nothing to do and therefore does not need
8af3dcd3c89aef Dave Chinner      2014-09-23   935  	 * waiting on. If the CIL is not empty, we get put on the committing
8af3dcd3c89aef Dave Chinner      2014-09-23   936  	 * list before emptying the CIL and bumping the sequence number. Hence
8af3dcd3c89aef Dave Chinner      2014-09-23   937  	 * an empty CIL and an unchanged sequence number means we jumped out
8af3dcd3c89aef Dave Chinner      2014-09-23   938  	 * above after doing nothing.
8af3dcd3c89aef Dave Chinner      2014-09-23   939  	 *
8af3dcd3c89aef Dave Chinner      2014-09-23   940  	 * Hence the waiter will either find the commit sequence on the
8af3dcd3c89aef Dave Chinner      2014-09-23   941  	 * committing list or the sequence number will be unchanged and the CIL
8af3dcd3c89aef Dave Chinner      2014-09-23   942  	 * still dirty. In that latter case, the push has not yet started, and
8af3dcd3c89aef Dave Chinner      2014-09-23   943  	 * so the waiter will have to continue trying to check the CIL
8af3dcd3c89aef Dave Chinner      2014-09-23   944  	 * committing list until it is found. In extreme cases of delay, the
8af3dcd3c89aef Dave Chinner      2014-09-23   945  	 * sequence may fully commit between the attempts the wait makes to wait
8af3dcd3c89aef Dave Chinner      2014-09-23   946  	 * on the commit sequence.
8af3dcd3c89aef Dave Chinner      2014-09-23   947  	 */
8af3dcd3c89aef Dave Chinner      2014-09-23   948  	list_add(&ctx->committing, &cil->xc_committing);
8af3dcd3c89aef Dave Chinner      2014-09-23   949  	spin_unlock(&cil->xc_push_lock);
df806158b0f6eb Dave Chinner      2010-05-17   950  
71e330b593905e Dave Chinner      2010-05-21   951  	/*
0279bbbbc03f2c Dave Chinner      2021-06-03   952  	 * The CIL is stable at this point - nothing new will be added to it
0279bbbbc03f2c Dave Chinner      2021-06-03   953  	 * because we hold the flush lock exclusively. Hence we can now issue
0279bbbbc03f2c Dave Chinner      2021-06-03   954  	 * a cache flush to ensure all the completed metadata in the journal we
0279bbbbc03f2c Dave Chinner      2021-06-03   955  	 * are about to overwrite is on stable storage.
0279bbbbc03f2c Dave Chinner      2021-06-03   956  	 */
0279bbbbc03f2c Dave Chinner      2021-06-03   957  	xfs_flush_bdev_async(&bio, log->l_mp->m_ddev_targp->bt_bdev,
0279bbbbc03f2c Dave Chinner      2021-06-03   958  				&bdev_flush);
0279bbbbc03f2c Dave Chinner      2021-06-03   959  
a8613836d99e62 Dave Chinner      2021-06-08   960  	xlog_cil_pcp_aggregate(cil, ctx);
a8613836d99e62 Dave Chinner      2021-06-08   961  
1f18c0c4b78cfb Dave Chinner      2021-06-08   962  	while (!list_empty(&ctx->log_items)) {
71e330b593905e Dave Chinner      2010-05-21   963  		struct xfs_log_item	*item;
71e330b593905e Dave Chinner      2010-05-21   964  
1f18c0c4b78cfb Dave Chinner      2021-06-08   965  		item = list_first_entry(&ctx->log_items,
71e330b593905e Dave Chinner      2010-05-21   966  					struct xfs_log_item, li_cil);
a47518453bf958 Dave Chinner      2021-06-08   967  		lv = item->li_lv;
a1785f597c8b06 Dave Chinner      2021-06-08   968  		lv->lv_order_id = item->li_order_id;
a47518453bf958 Dave Chinner      2021-06-08   969  		num_iovecs += lv->lv_niovecs;
66fc9ffa8638be Dave Chinner      2021-06-04   970  		/* we don't write ordered log vectors */
66fc9ffa8638be Dave Chinner      2021-06-04   971  		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
66fc9ffa8638be Dave Chinner      2021-06-04   972  			num_bytes += lv->lv_bytes;
a47518453bf958 Dave Chinner      2021-06-08   973  
a47518453bf958 Dave Chinner      2021-06-08   974  		list_add_tail(&lv->lv_list, &ctx->lv_chain);
a1785f597c8b06 Dave Chinner      2021-06-08   975  		list_del_init(&item->li_cil);
a1785f597c8b06 Dave Chinner      2021-06-08   976  		item->li_order_id = 0;
a1785f597c8b06 Dave Chinner      2021-06-08   977  		item->li_lv = NULL;
71e330b593905e Dave Chinner      2010-05-21   978  	}
71e330b593905e Dave Chinner      2010-05-21   979  
71e330b593905e Dave Chinner      2010-05-21   980  	/*
facd77e4e38b8f Dave Chinner      2021-06-04   981  	 * Switch the contexts so we can drop the context lock and move out
71e330b593905e Dave Chinner      2010-05-21   982  	 * of a shared context. We can't just go straight to the commit record,
71e330b593905e Dave Chinner      2010-05-21   983  	 * though - we need to synchronise with previous and future commits so
71e330b593905e Dave Chinner      2010-05-21   984  	 * that the commit records are correctly ordered in the log to ensure
71e330b593905e Dave Chinner      2010-05-21   985  	 * that we process items during log IO completion in the correct order.
71e330b593905e Dave Chinner      2010-05-21   986  	 *
71e330b593905e Dave Chinner      2010-05-21   987  	 * For example, if we get an EFI in one checkpoint and the EFD in the
71e330b593905e Dave Chinner      2010-05-21   988  	 * next (e.g. due to log forces), we do not want the checkpoint with
71e330b593905e Dave Chinner      2010-05-21   989  	 * the EFD to be committed before the checkpoint with the EFI.  Hence
71e330b593905e Dave Chinner      2010-05-21   990  	 * we must strictly order the commit records of the checkpoints so
71e330b593905e Dave Chinner      2010-05-21   991  	 * that: a) the checkpoint callbacks are attached to the iclogs in the
71e330b593905e Dave Chinner      2010-05-21   992  	 * correct order; and b) the checkpoints are replayed in correct order
71e330b593905e Dave Chinner      2010-05-21   993  	 * in log recovery.
71e330b593905e Dave Chinner      2010-05-21   994  	 *
71e330b593905e Dave Chinner      2010-05-21   995  	 * Hence we need to add this context to the committing context list so
71e330b593905e Dave Chinner      2010-05-21   996  	 * that higher sequences will wait for us to write out a commit record
71e330b593905e Dave Chinner      2010-05-21   997  	 * before they do.
f876e44603ad09 Dave Chinner      2014-02-27   998  	 *
f39ae5297c5ce2 Dave Chinner      2021-06-04   999  	 * xfs_log_force_seq requires us to mirror the new sequence into the cil
f876e44603ad09 Dave Chinner      2014-02-27  1000  	 * structure atomically with the addition of this sequence to the
f876e44603ad09 Dave Chinner      2014-02-27  1001  	 * committing list. This also ensures that we can do unlocked checks
f876e44603ad09 Dave Chinner      2014-02-27  1002  	 * against the current sequence in log forces without risking
f876e44603ad09 Dave Chinner      2014-02-27  1003  	 * dereferencing a freed context pointer.
71e330b593905e Dave Chinner      2010-05-21  1004  	 */
4bb928cdb900d0 Dave Chinner      2013-08-12  1005  	spin_lock(&cil->xc_push_lock);
facd77e4e38b8f Dave Chinner      2021-06-04  1006  	xlog_cil_ctx_switch(cil, new_ctx);
4bb928cdb900d0 Dave Chinner      2013-08-12  1007  	spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1008  	up_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner      2010-05-21  1009  
a1785f597c8b06 Dave Chinner      2021-06-08  1010  	/*
a1785f597c8b06 Dave Chinner      2021-06-08  1011  	 * Sort the log vector chain before we add the transaction headers.
a1785f597c8b06 Dave Chinner      2021-06-08  1012  	 * This ensures we always have the transaction headers at the start
a1785f597c8b06 Dave Chinner      2021-06-08  1013  	 * of the chain.
a1785f597c8b06 Dave Chinner      2021-06-08  1014  	 */
a1785f597c8b06 Dave Chinner      2021-06-08  1015  	list_sort(NULL, &ctx->lv_chain, xlog_cil_order_cmp);
a1785f597c8b06 Dave Chinner      2021-06-08  1016  
71e330b593905e Dave Chinner      2010-05-21  1017  	/*
71e330b593905e Dave Chinner      2010-05-21  1018  	 * Build a checkpoint transaction header and write it to the log to
71e330b593905e Dave Chinner      2010-05-21  1019  	 * begin the transaction. We need to account for the space used by the
71e330b593905e Dave Chinner      2010-05-21  1020  	 * transaction header here as it is not accounted for in xlog_write().
a47518453bf958 Dave Chinner      2021-06-08  1021  	 * Add the lvhdr to the head of the lv chain we pass to xlog_write() so
a47518453bf958 Dave Chinner      2021-06-08  1022  	 * it gets written into the iclog first.
71e330b593905e Dave Chinner      2010-05-21  1023  	 */
877cf3473914ae Dave Chinner      2021-06-04  1024  	xlog_cil_build_trans_hdr(ctx, &thdr, &lvhdr, num_iovecs);
66fc9ffa8638be Dave Chinner      2021-06-04  1025  	num_bytes += lvhdr.lv_bytes;
a47518453bf958 Dave Chinner      2021-06-08  1026  	list_add(&lvhdr.lv_list, &ctx->lv_chain);
71e330b593905e Dave Chinner      2010-05-21  1027  
0279bbbbc03f2c Dave Chinner      2021-06-03  1028  	/*
0279bbbbc03f2c Dave Chinner      2021-06-03  1029  	 * Before we format and submit the first iclog, we have to ensure that
0279bbbbc03f2c Dave Chinner      2021-06-03  1030  	 * the metadata writeback ordering cache flush is complete.
0279bbbbc03f2c Dave Chinner      2021-06-03  1031  	 */
0279bbbbc03f2c Dave Chinner      2021-06-03  1032  	wait_for_completion(&bdev_flush);
0279bbbbc03f2c Dave Chinner      2021-06-03  1033  
877cf3473914ae Dave Chinner      2021-06-04  1034  	/*
877cf3473914ae Dave Chinner      2021-06-04  1035  	 * The LSN we need to pass to the log items on transaction commit is the
877cf3473914ae Dave Chinner      2021-06-04  1036  	 * LSN reported by the first log vector write, not the commit lsn. If we
877cf3473914ae Dave Chinner      2021-06-04  1037  	 * use the commit record lsn then we can move the tail beyond the grant
877cf3473914ae Dave Chinner      2021-06-04  1038  	 * write head.
877cf3473914ae Dave Chinner      2021-06-04  1039  	 */
fc3370002b56bc Dave Chinner      2021-06-17  1040  	error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
a47518453bf958 Dave Chinner      2021-06-08  1041  				NULL, num_bytes);
a47518453bf958 Dave Chinner      2021-06-08  1042  
a47518453bf958 Dave Chinner      2021-06-08  1043  	/*
a47518453bf958 Dave Chinner      2021-06-08  1044  	 * Take the lvhdr back off the lv_chain as it should not be passed
a47518453bf958 Dave Chinner      2021-06-08  1045  	 * to log IO completion.
a47518453bf958 Dave Chinner      2021-06-08  1046  	 */
a47518453bf958 Dave Chinner      2021-06-08  1047  	list_del(&lvhdr.lv_list);
71e330b593905e Dave Chinner      2010-05-21  1048  	if (error)
7db37c5e6575b2 Dave Chinner      2011-01-27  1049  		goto out_abort_free_ticket;
71e330b593905e Dave Chinner      2010-05-21  1050  
71e330b593905e Dave Chinner      2010-05-21  1051  	/*
71e330b593905e Dave Chinner      2010-05-21  1052  	 * now that we've written the checkpoint into the log, strictly
71e330b593905e Dave Chinner      2010-05-21  1053  	 * order the commit records so replay will get them in the right order.
71e330b593905e Dave Chinner      2010-05-21  1054  	 */
71e330b593905e Dave Chinner      2010-05-21  1055  restart:
4bb928cdb900d0 Dave Chinner      2013-08-12  1056  	spin_lock(&cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1057  	list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
ac983517ec5941 Dave Chinner      2014-05-07  1058  		/*
ac983517ec5941 Dave Chinner      2014-05-07  1059  		 * Avoid getting stuck in this loop because we were woken by the
ac983517ec5941 Dave Chinner      2014-05-07  1060  		 * shutdown, but then went back to sleep once already in the
ac983517ec5941 Dave Chinner      2014-05-07  1061  		 * shutdown state.
ac983517ec5941 Dave Chinner      2014-05-07  1062  		 */
ac983517ec5941 Dave Chinner      2014-05-07  1063  		if (XLOG_FORCED_SHUTDOWN(log)) {
ac983517ec5941 Dave Chinner      2014-05-07  1064  			spin_unlock(&cil->xc_push_lock);
ac983517ec5941 Dave Chinner      2014-05-07  1065  			goto out_abort_free_ticket;
ac983517ec5941 Dave Chinner      2014-05-07  1066  		}
ac983517ec5941 Dave Chinner      2014-05-07  1067  
71e330b593905e Dave Chinner      2010-05-21  1068  		/*
71e330b593905e Dave Chinner      2010-05-21  1069  		 * Higher sequences will wait for this one so skip them.
ac983517ec5941 Dave Chinner      2014-05-07  1070  		 * Don't wait for our own sequence, either.
71e330b593905e Dave Chinner      2010-05-21  1071  		 */
71e330b593905e Dave Chinner      2010-05-21  1072  		if (new_ctx->sequence >= ctx->sequence)
71e330b593905e Dave Chinner      2010-05-21  1073  			continue;
71e330b593905e Dave Chinner      2010-05-21  1074  		if (!new_ctx->commit_lsn) {
71e330b593905e Dave Chinner      2010-05-21  1075  			/*
71e330b593905e Dave Chinner      2010-05-21  1076  			 * It is still being pushed! Wait for the push to
71e330b593905e Dave Chinner      2010-05-21  1077  			 * complete, then start again from the beginning.
71e330b593905e Dave Chinner      2010-05-21  1078  			 */
4bb928cdb900d0 Dave Chinner      2013-08-12  1079  			xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1080  			goto restart;
71e330b593905e Dave Chinner      2010-05-21  1081  		}
71e330b593905e Dave Chinner      2010-05-21  1082  	}
4bb928cdb900d0 Dave Chinner      2013-08-12  1083  	spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1084  
fc3370002b56bc Dave Chinner      2021-06-17  1085  	error = xlog_cil_write_commit_record(ctx, &commit_iclog);
dd401770b0ff68 Dave Chinner      2020-03-25  1086  	if (error)
dd401770b0ff68 Dave Chinner      2020-03-25  1087  		goto out_abort_free_ticket;
dd401770b0ff68 Dave Chinner      2020-03-25  1088  
89ae379d564c5d Christoph Hellwig 2019-06-28  1089  	spin_lock(&commit_iclog->ic_callback_lock);
1858bb0bec612d Christoph Hellwig 2019-10-14  1090  	if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
89ae379d564c5d Christoph Hellwig 2019-06-28  1091  		spin_unlock(&commit_iclog->ic_callback_lock);
e469cbe84f4ade Dave Chinner      2021-06-08  1092  		goto out_abort_free_ticket;
89ae379d564c5d Christoph Hellwig 2019-06-28  1093  	}
89ae379d564c5d Christoph Hellwig 2019-06-28  1094  	ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
89ae379d564c5d Christoph Hellwig 2019-06-28  1095  		      commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
89ae379d564c5d Christoph Hellwig 2019-06-28  1096  	list_add_tail(&ctx->iclog_entry, &commit_iclog->ic_callbacks);
89ae379d564c5d Christoph Hellwig 2019-06-28  1097  	spin_unlock(&commit_iclog->ic_callback_lock);
71e330b593905e Dave Chinner      2010-05-21  1098  
71e330b593905e Dave Chinner      2010-05-21  1099  	/*
71e330b593905e Dave Chinner      2010-05-21  1100  	 * now the checkpoint commit is complete and we've attached the
71e330b593905e Dave Chinner      2010-05-21  1101  	 * callbacks to the iclog we can assign the commit LSN to the context
71e330b593905e Dave Chinner      2010-05-21  1102  	 * and wake up anyone who is waiting for the commit to complete.
71e330b593905e Dave Chinner      2010-05-21  1103  	 */
4bb928cdb900d0 Dave Chinner      2013-08-12  1104  	spin_lock(&cil->xc_push_lock);
eb40a87500ac2f Dave Chinner      2010-12-21  1105  	wake_up_all(&cil->xc_commit_wait);
4bb928cdb900d0 Dave Chinner      2013-08-12  1106  	spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1107  
e469cbe84f4ade Dave Chinner      2021-06-08  1108  	/*
e469cbe84f4ade Dave Chinner      2021-06-08  1109  	 * Pull the ticket off the ctx so we can ungrant it after releasing the
e469cbe84f4ade Dave Chinner      2021-06-08  1110  	 * commit_iclog. The ctx may be freed by the time we return from
e469cbe84f4ade Dave Chinner      2021-06-08  1111  	 * releasing the commit_iclog (i.e. checkpoint has been completed and
e469cbe84f4ade Dave Chinner      2021-06-08  1112  	 * callback run) so we can't reference the ctx after the call to
e469cbe84f4ade Dave Chinner      2021-06-08  1113  	 * xlog_state_release_iclog().
e469cbe84f4ade Dave Chinner      2021-06-08  1114  	 */
e469cbe84f4ade Dave Chinner      2021-06-08  1115  	ticket = ctx->ticket;
e469cbe84f4ade Dave Chinner      2021-06-08  1116  
5fd9256ce156ef Dave Chinner      2021-06-03  1117  	/*
815753dc16bbca Dave Chinner      2021-06-17  1118  	 * If the checkpoint spans multiple iclogs, wait for all previous iclogs
815753dc16bbca Dave Chinner      2021-06-17  1119  	 * to complete before we submit the commit_iclog. We can't use state
815753dc16bbca Dave Chinner      2021-06-17  1120  	 * checks for this - ACTIVE can be either a past completed iclog or a
815753dc16bbca Dave Chinner      2021-06-17  1121  	 * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a
815753dc16bbca Dave Chinner      2021-06-17  1122  	 * past or future iclog awaiting IO or ordered IO completion to be run.
815753dc16bbca Dave Chinner      2021-06-17  1123  	 * In the latter case, if it's a future iclog and we wait on it, then we
815753dc16bbca Dave Chinner      2021-06-17  1124  	 * will hang because it won't get processed through to ic_force_wait
815753dc16bbca Dave Chinner      2021-06-17  1125  	 * wakeup until this commit_iclog is written to disk.  Hence we use the
815753dc16bbca Dave Chinner      2021-06-17  1126  	 * iclog header lsn and compare it to the commit lsn to determine if we
815753dc16bbca Dave Chinner      2021-06-17  1127  	 * need to wait on iclogs or not.
5fd9256ce156ef Dave Chinner      2021-06-03  1128  	 */
5fd9256ce156ef Dave Chinner      2021-06-03  1129  	spin_lock(&log->l_icloglock);
cb1acb3f324636 Dave Chinner      2021-06-04 @1130  	if (ctx->start_lsn != commit_lsn) {
815753dc16bbca Dave Chinner      2021-06-17  1131  		struct xlog_in_core	*iclog;
815753dc16bbca Dave Chinner      2021-06-17  1132  
815753dc16bbca Dave Chinner      2021-06-17  1133  		for (iclog = commit_iclog->ic_prev;
815753dc16bbca Dave Chinner      2021-06-17  1134  		     iclog != commit_iclog;
815753dc16bbca Dave Chinner      2021-06-17  1135  		     iclog = iclog->ic_prev) {
815753dc16bbca Dave Chinner      2021-06-17  1136  			xfs_lsn_t	hlsn;
815753dc16bbca Dave Chinner      2021-06-17  1137  
815753dc16bbca Dave Chinner      2021-06-17  1138  			/*
815753dc16bbca Dave Chinner      2021-06-17  1139  			 * If the LSN of the iclog is zero or in the future it
815753dc16bbca Dave Chinner      2021-06-17  1140  			 * means it has passed through IO completion and
815753dc16bbca Dave Chinner      2021-06-17  1141  			 * activation and hence all previous iclogs have also
815753dc16bbca Dave Chinner      2021-06-17  1142  			 * done so. We do not need to wait at all in this case.
815753dc16bbca Dave Chinner      2021-06-17  1143  			 */
815753dc16bbca Dave Chinner      2021-06-17  1144  			hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
815753dc16bbca Dave Chinner      2021-06-17  1145  			if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
815753dc16bbca Dave Chinner      2021-06-17  1146  				break;
815753dc16bbca Dave Chinner      2021-06-17  1147  
815753dc16bbca Dave Chinner      2021-06-17  1148  			/*
815753dc16bbca Dave Chinner      2021-06-17  1149  			 * If the LSN of the iclog is older than the commit lsn,
815753dc16bbca Dave Chinner      2021-06-17  1150  			 * we have to wait on it. Waiting on this via the
815753dc16bbca Dave Chinner      2021-06-17  1151  			 * ic_force_wait should also order the completion of all
815753dc16bbca Dave Chinner      2021-06-17  1152  			 * older iclogs, too, but we leave checking that to the
815753dc16bbca Dave Chinner      2021-06-17  1153  			 * next loop iteration.
815753dc16bbca Dave Chinner      2021-06-17  1154  			 */
815753dc16bbca Dave Chinner      2021-06-17  1155  			ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
815753dc16bbca Dave Chinner      2021-06-17  1156  			xlog_wait_on_iclog(iclog);
cb1acb3f324636 Dave Chinner      2021-06-04  1157  			spin_lock(&log->l_icloglock);
815753dc16bbca Dave Chinner      2021-06-17  1158  		}
815753dc16bbca Dave Chinner      2021-06-17  1159  
815753dc16bbca Dave Chinner      2021-06-17  1160  		/*
815753dc16bbca Dave Chinner      2021-06-17  1161  		 * Regardless of whether we need to wait or not, the
815753dc16bbca Dave Chinner      2021-06-17  1162  		 * commit_iclog write needs to issue a pre-flush so that the
815753dc16bbca Dave Chinner      2021-06-17  1163  		 * ordering for this checkpoint is correctly preserved down to
815753dc16bbca Dave Chinner      2021-06-17  1164  		 * stable storage.
815753dc16bbca Dave Chinner      2021-06-17  1165  		 */
cb1acb3f324636 Dave Chinner      2021-06-04  1166  		commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
5fd9256ce156ef Dave Chinner      2021-06-03  1167  	}
5fd9256ce156ef Dave Chinner      2021-06-03  1168  
cb1acb3f324636 Dave Chinner      2021-06-04  1169  	/*
cb1acb3f324636 Dave Chinner      2021-06-04  1170  	 * The commit iclog must be written to stable storage to guarantee
cb1acb3f324636 Dave Chinner      2021-06-04  1171  	 * journal IO vs metadata writeback IO is correctly ordered on stable
cb1acb3f324636 Dave Chinner      2021-06-04  1172  	 * storage.
e12213ba5d909a Dave Chinner      2021-06-04  1173  	 *
e12213ba5d909a Dave Chinner      2021-06-04  1174  	 * If the push caller needs the commit to be immediately stable and the
e12213ba5d909a Dave Chinner      2021-06-04  1175  	 * commit_iclog is not yet marked as XLOG_STATE_WANT_SYNC to indicate it
e12213ba5d909a Dave Chinner      2021-06-04  1176  	 * will be written when released, switch its state to WANT_SYNC right
e12213ba5d909a Dave Chinner      2021-06-04  1177  	 * now.
cb1acb3f324636 Dave Chinner      2021-06-04  1178  	 */
cb1acb3f324636 Dave Chinner      2021-06-04  1179  	commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
e12213ba5d909a Dave Chinner      2021-06-04  1180  	if (push_commit_stable && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
e12213ba5d909a Dave Chinner      2021-06-04  1181  		xlog_state_switch_iclogs(log, commit_iclog, 0);
e469cbe84f4ade Dave Chinner      2021-06-08  1182  	xlog_state_release_iclog(log, commit_iclog, ticket);
cb1acb3f324636 Dave Chinner      2021-06-04  1183  	spin_unlock(&log->l_icloglock);
e469cbe84f4ade Dave Chinner      2021-06-08  1184  
e469cbe84f4ade Dave Chinner      2021-06-08  1185  	xfs_log_ticket_ungrant(log, ticket);
c7cc296ddd1f6d Christoph Hellwig 2020-03-20  1186  	return;
71e330b593905e Dave Chinner      2010-05-21  1187  
71e330b593905e Dave Chinner      2010-05-21  1188  out_skip:
71e330b593905e Dave Chinner      2010-05-21  1189  	up_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner      2010-05-21  1190  	xfs_log_ticket_put(new_ctx->ticket);
71e330b593905e Dave Chinner      2010-05-21  1191  	kmem_free(new_ctx);
c7cc296ddd1f6d Christoph Hellwig 2020-03-20  1192  	return;
71e330b593905e Dave Chinner      2010-05-21  1193  
7db37c5e6575b2 Dave Chinner      2011-01-27  1194  out_abort_free_ticket:
877cf3473914ae Dave Chinner      2021-06-04  1195  	xfs_log_ticket_ungrant(log, ctx->ticket);
12e6a0f449d585 Christoph Hellwig 2020-03-20  1196  	ASSERT(XLOG_FORCED_SHUTDOWN(log));
12e6a0f449d585 Christoph Hellwig 2020-03-20  1197  	xlog_cil_committed(ctx);
4c2d542f2e7865 Dave Chinner      2012-04-23  1198  }
4c2d542f2e7865 Dave Chinner      2012-04-23  1199  
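
The iclog walk at lines 1133-1158 above decides whether to wait by comparing each iclog's header LSN against the commit LSN. A minimal, self-contained sketch of that decision follows. It assumes an LSN packs a 32-bit cycle number above a 32-bit block number, and it only counts the iclogs that would be waited on instead of sleeping. The names toy_lsn, toy_lsn_cmp, toy_iclog and count_waits are hypothetical stand-ins for illustration, not kernel APIs.

```c
#include <stdint.h>

/* Toy LSN type; an unsigned type is used here to keep the shifts simple. */
typedef uint64_t toy_lsn_t;

/* Assumed layout: cycle in the high 32 bits, block in the low 32 bits. */
toy_lsn_t toy_lsn(uint32_t cycle, uint32_t block)
{
	return ((toy_lsn_t)cycle << 32) | block;
}

/* Returns <0, 0 or >0 when a is older than, equal to, or newer than b. */
int toy_lsn_cmp(toy_lsn_t a, toy_lsn_t b)
{
	uint32_t cycle_a = (uint32_t)(a >> 32), cycle_b = (uint32_t)(b >> 32);
	uint32_t block_a = (uint32_t)a, block_b = (uint32_t)b;

	if (cycle_a != cycle_b)
		return cycle_a < cycle_b ? -1 : 1;
	if (block_a != block_b)
		return block_a < block_b ? -1 : 1;
	return 0;
}

struct toy_iclog {
	toy_lsn_t		header_lsn;	/* 0: activated and not yet reused */
	struct toy_iclog	*prev;		/* previous iclog in the ring */
};

/*
 * Walk the ring backwards from the commit iclog and count the iclogs that
 * would need waiting on. The walk stops at the first iclog whose header
 * LSN is zero or newer than commit_lsn, because that iclog (and every
 * iclog before it) has already been through IO completion.
 */
int count_waits(struct toy_iclog *commit, toy_lsn_t commit_lsn)
{
	struct toy_iclog *ic;
	int waits = 0;

	for (ic = commit->prev; ic != commit; ic = ic->prev) {
		if (!ic->header_lsn ||
		    toy_lsn_cmp(ic->header_lsn, commit_lsn) > 0)
			break;
		waits++;	/* the real code sleeps on the iclog here */
	}
	return waits;
}
```

Comparing LSNs rather than iclog states is the point of the comment at lines 1118-1127: a state such as ACTIVE cannot tell a completed past iclog from a future one still being filled, while the header LSN can.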

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 21700 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
@ 2021-06-28  8:58 ` Dan Carpenter
  0 siblings, 0 replies; 50+ messages in thread
From: Dan Carpenter @ 2021-06-28  8:58 UTC (permalink / raw)
  To: kbuild, Dave Chinner, linux-xfs; +Cc: lkp, kbuild-all

Hi Dave,

url:    https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: h8300-randconfig-m031-20210625 (attached as .config)
compiler: h8300-linux-gcc (GCC) 9.3.0

If you fix the issue, kindly add the following tags as appropriate:
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
fs/xfs/xfs_log_cil.c:1130 xlog_cil_push_work() error: uninitialized symbol 'commit_lsn'.


vim +/commit_lsn +1130 fs/xfs/xfs_log_cil.c

c7cc296ddd1f6d Christoph Hellwig 2020-03-20   861  static void
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   862  xlog_cil_push_work(
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   863  	struct work_struct	*work)
71e330b593905e Dave Chinner      2010-05-21   864  {
facd77e4e38b8f Dave Chinner      2021-06-04   865  	struct xfs_cil_ctx	*ctx =
facd77e4e38b8f Dave Chinner      2021-06-04   866  		container_of(work, struct xfs_cil_ctx, push_work);
facd77e4e38b8f Dave Chinner      2021-06-04   867  	struct xfs_cil		*cil = ctx->cil;
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   868  	struct xlog		*log = cil->xc_log;
71e330b593905e Dave Chinner      2010-05-21   869  	struct xfs_log_vec	*lv;
71e330b593905e Dave Chinner      2010-05-21   870  	struct xfs_cil_ctx	*new_ctx;
71e330b593905e Dave Chinner      2010-05-21   871  	struct xlog_in_core	*commit_iclog;
66fc9ffa8638be Dave Chinner      2021-06-04   872  	int			num_iovecs = 0;
66fc9ffa8638be Dave Chinner      2021-06-04   873  	int			num_bytes = 0;
71e330b593905e Dave Chinner      2010-05-21   874  	int			error = 0;
877cf3473914ae Dave Chinner      2021-06-04   875  	struct xlog_cil_trans_hdr thdr;
a47518453bf958 Dave Chinner      2021-06-08   876  	struct xfs_log_vec	lvhdr = {};
71e330b593905e Dave Chinner      2010-05-21   877  	xfs_lsn_t		commit_lsn;
                                                                                ^^^^^^^^^^

4c2d542f2e7865 Dave Chinner      2012-04-23   878  	xfs_lsn_t		push_seq;
0279bbbbc03f2c Dave Chinner      2021-06-03   879  	struct bio		bio;
0279bbbbc03f2c Dave Chinner      2021-06-03   880  	DECLARE_COMPLETION_ONSTACK(bdev_flush);
e12213ba5d909a Dave Chinner      2021-06-04   881  	bool			push_commit_stable;
e469cbe84f4ade Dave Chinner      2021-06-08   882  	struct xlog_ticket	*ticket;
71e330b593905e Dave Chinner      2010-05-21   883  
facd77e4e38b8f Dave Chinner      2021-06-04   884  	new_ctx = xlog_cil_ctx_alloc();
71e330b593905e Dave Chinner      2010-05-21   885  	new_ctx->ticket = xlog_cil_ticket_alloc(log);
71e330b593905e Dave Chinner      2010-05-21   886  
71e330b593905e Dave Chinner      2010-05-21   887  	down_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner      2010-05-21   888  
4bb928cdb900d0 Dave Chinner      2013-08-12   889  	spin_lock(&cil->xc_push_lock);
4c2d542f2e7865 Dave Chinner      2012-04-23   890  	push_seq = cil->xc_push_seq;
4c2d542f2e7865 Dave Chinner      2012-04-23   891  	ASSERT(push_seq <= ctx->sequence);
e12213ba5d909a Dave Chinner      2021-06-04   892  	push_commit_stable = cil->xc_push_commit_stable;
e12213ba5d909a Dave Chinner      2021-06-04   893  	cil->xc_push_commit_stable = false;
71e330b593905e Dave Chinner      2010-05-21   894  
0e7ab7efe77451 Dave Chinner      2020-03-24   895  	/*
3682277520d6f4 Dave Chinner      2021-06-04   896  	 * As we are about to switch to a new, empty CIL context, we no longer
3682277520d6f4 Dave Chinner      2021-06-04   897  	 * need to throttle tasks on CIL space overruns. Wake any waiters that
3682277520d6f4 Dave Chinner      2021-06-04   898  	 * the hard push throttle may have caught so they can start committing
3682277520d6f4 Dave Chinner      2021-06-04   899  	 * to the new context. The ctx->xc_push_lock provides the serialisation
3682277520d6f4 Dave Chinner      2021-06-04   900  	 * necessary for safely using the lockless waitqueue_active() check in
3682277520d6f4 Dave Chinner      2021-06-04   901  	 * this context.
3682277520d6f4 Dave Chinner      2021-06-04   902  	 */
3682277520d6f4 Dave Chinner      2021-06-04   903  	if (waitqueue_active(&cil->xc_push_wait))
c7f87f3984cfa1 Dave Chinner      2020-06-16   904  		wake_up_all(&cil->xc_push_wait);
0e7ab7efe77451 Dave Chinner      2020-03-24   905  
4c2d542f2e7865 Dave Chinner      2012-04-23   906  	/*
4c2d542f2e7865 Dave Chinner      2012-04-23   907  	 * Check if we've anything to push. If there is nothing, then we don't
4c2d542f2e7865 Dave Chinner      2012-04-23   908  	 * move on to a new sequence number and so we have to be able to push
4c2d542f2e7865 Dave Chinner      2012-04-23   909  	 * this sequence again later.
4c2d542f2e7865 Dave Chinner      2012-04-23   910  	 */
0d11bae4bcf4aa Dave Chinner      2021-06-04   911  	if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) {
4c2d542f2e7865 Dave Chinner      2012-04-23   912  		cil->xc_push_seq = 0;
4bb928cdb900d0 Dave Chinner      2013-08-12   913  		spin_unlock(&cil->xc_push_lock);
a44f13edf0ebb4 Dave Chinner      2010-08-24   914  		goto out_skip;
4c2d542f2e7865 Dave Chinner      2012-04-23   915  	}
4c2d542f2e7865 Dave Chinner      2012-04-23   916  
a44f13edf0ebb4 Dave Chinner      2010-08-24   917  
cf085a1b5d2214 Joe Perches       2019-11-07   918  	/* check for a previously pushed sequence */
facd77e4e38b8f Dave Chinner      2021-06-04   919  	if (push_seq < ctx->sequence) {
8af3dcd3c89aef Dave Chinner      2014-09-23   920  		spin_unlock(&cil->xc_push_lock);
df806158b0f6eb Dave Chinner      2010-05-17   921  		goto out_skip;
8af3dcd3c89aef Dave Chinner      2014-09-23   922  	}
8af3dcd3c89aef Dave Chinner      2014-09-23   923  
8af3dcd3c89aef Dave Chinner      2014-09-23   924  	/*
8af3dcd3c89aef Dave Chinner      2014-09-23   925  	 * We are now going to push this context, so add it to the committing
8af3dcd3c89aef Dave Chinner      2014-09-23   926  	 * list before we do anything else. This ensures that anyone waiting on
8af3dcd3c89aef Dave Chinner      2014-09-23   927  	 * this push can easily detect the difference between a "push in
8af3dcd3c89aef Dave Chinner      2014-09-23   928  	 * progress" and "CIL is empty, nothing to do".
8af3dcd3c89aef Dave Chinner      2014-09-23   929  	 *
8af3dcd3c89aef Dave Chinner      2014-09-23   930  	 * IOWs, a wait loop can now check for:
8af3dcd3c89aef Dave Chinner      2014-09-23   931  	 *	the current sequence not being found on the committing list;
8af3dcd3c89aef Dave Chinner      2014-09-23   932  	 *	an empty CIL; and
8af3dcd3c89aef Dave Chinner      2014-09-23   933  	 *	an unchanged sequence number
8af3dcd3c89aef Dave Chinner      2014-09-23   934  	 * to detect a push that had nothing to do and therefore does not need
8af3dcd3c89aef Dave Chinner      2014-09-23   935  	 * waiting on. If the CIL is not empty, we get put on the committing
8af3dcd3c89aef Dave Chinner      2014-09-23   936  	 * list before emptying the CIL and bumping the sequence number. Hence
8af3dcd3c89aef Dave Chinner      2014-09-23   937  	 * an empty CIL and an unchanged sequence number means we jumped out
8af3dcd3c89aef Dave Chinner      2014-09-23   938  	 * above after doing nothing.
8af3dcd3c89aef Dave Chinner      2014-09-23   939  	 *
8af3dcd3c89aef Dave Chinner      2014-09-23   940  	 * Hence the waiter will either find the commit sequence on the
8af3dcd3c89aef Dave Chinner      2014-09-23   941  	 * committing list or the sequence number will be unchanged and the CIL
8af3dcd3c89aef Dave Chinner      2014-09-23   942  	 * still dirty. In that latter case, the push has not yet started, and
8af3dcd3c89aef Dave Chinner      2014-09-23   943  	 * so the waiter will have to continue trying to check the CIL
8af3dcd3c89aef Dave Chinner      2014-09-23   944  	 * committing list until it is found. In extreme cases of delay, the
8af3dcd3c89aef Dave Chinner      2014-09-23   945  	 * sequence may fully commit between the attempts the wait makes to wait
8af3dcd3c89aef Dave Chinner      2014-09-23   946  	 * on the commit sequence.
8af3dcd3c89aef Dave Chinner      2014-09-23   947  	 */
8af3dcd3c89aef Dave Chinner      2014-09-23   948  	list_add(&ctx->committing, &cil->xc_committing);
8af3dcd3c89aef Dave Chinner      2014-09-23   949  	spin_unlock(&cil->xc_push_lock);
df806158b0f6eb Dave Chinner      2010-05-17   950  
71e330b593905e Dave Chinner      2010-05-21   951  	/*
0279bbbbc03f2c Dave Chinner      2021-06-03   952  	 * The CIL is stable at this point - nothing new will be added to it
0279bbbbc03f2c Dave Chinner      2021-06-03   953  	 * because we hold the flush lock exclusively. Hence we can now issue
0279bbbbc03f2c Dave Chinner      2021-06-03   954  	 * a cache flush to ensure all the completed metadata in the journal we
0279bbbbc03f2c Dave Chinner      2021-06-03   955  	 * are about to overwrite is on stable storage.
0279bbbbc03f2c Dave Chinner      2021-06-03   956  	 */
0279bbbbc03f2c Dave Chinner      2021-06-03   957  	xfs_flush_bdev_async(&bio, log->l_mp->m_ddev_targp->bt_bdev,
0279bbbbc03f2c Dave Chinner      2021-06-03   958  				&bdev_flush);
0279bbbbc03f2c Dave Chinner      2021-06-03   959  
a8613836d99e62 Dave Chinner      2021-06-08   960  	xlog_cil_pcp_aggregate(cil, ctx);
a8613836d99e62 Dave Chinner      2021-06-08   961  
1f18c0c4b78cfb Dave Chinner      2021-06-08   962  	while (!list_empty(&ctx->log_items)) {
71e330b593905e Dave Chinner      2010-05-21   963  		struct xfs_log_item	*item;
71e330b593905e Dave Chinner      2010-05-21   964  
1f18c0c4b78cfb Dave Chinner      2021-06-08   965  		item = list_first_entry(&ctx->log_items,
71e330b593905e Dave Chinner      2010-05-21   966  					struct xfs_log_item, li_cil);
a47518453bf958 Dave Chinner      2021-06-08   967  		lv = item->li_lv;
a1785f597c8b06 Dave Chinner      2021-06-08   968  		lv->lv_order_id = item->li_order_id;
a47518453bf958 Dave Chinner      2021-06-08   969  		num_iovecs += lv->lv_niovecs;
66fc9ffa8638be Dave Chinner      2021-06-04   970  		/* we don't write ordered log vectors */
66fc9ffa8638be Dave Chinner      2021-06-04   971  		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
66fc9ffa8638be Dave Chinner      2021-06-04   972  			num_bytes += lv->lv_bytes;
a47518453bf958 Dave Chinner      2021-06-08   973  
a47518453bf958 Dave Chinner      2021-06-08   974  		list_add_tail(&lv->lv_list, &ctx->lv_chain);
a1785f597c8b06 Dave Chinner      2021-06-08   975  		list_del_init(&item->li_cil);
a1785f597c8b06 Dave Chinner      2021-06-08   976  		item->li_order_id = 0;
a1785f597c8b06 Dave Chinner      2021-06-08   977  		item->li_lv = NULL;
71e330b593905e Dave Chinner      2010-05-21   978  	}
71e330b593905e Dave Chinner      2010-05-21   979  
71e330b593905e Dave Chinner      2010-05-21   980  	/*
facd77e4e38b8f Dave Chinner      2021-06-04   981  	 * Switch the contexts so we can drop the context lock and move out
71e330b593905e Dave Chinner      2010-05-21   982  	 * of a shared context. We can't just go straight to the commit record,
71e330b593905e Dave Chinner      2010-05-21   983  	 * though - we need to synchronise with previous and future commits so
71e330b593905e Dave Chinner      2010-05-21   984  	 * that the commit records are correctly ordered in the log to ensure
71e330b593905e Dave Chinner      2010-05-21   985  	 * that we process items during log IO completion in the correct order.
71e330b593905e Dave Chinner      2010-05-21   986  	 *
71e330b593905e Dave Chinner      2010-05-21   987  	 * For example, if we get an EFI in one checkpoint and the EFD in the
71e330b593905e Dave Chinner      2010-05-21   988  	 * next (e.g. due to log forces), we do not want the checkpoint with
71e330b593905e Dave Chinner      2010-05-21   989  	 * the EFD to be committed before the checkpoint with the EFI.  Hence
71e330b593905e Dave Chinner      2010-05-21   990  	 * we must strictly order the commit records of the checkpoints so
71e330b593905e Dave Chinner      2010-05-21   991  	 * that: a) the checkpoint callbacks are attached to the iclogs in the
71e330b593905e Dave Chinner      2010-05-21   992  	 * correct order; and b) the checkpoints are replayed in correct order
71e330b593905e Dave Chinner      2010-05-21   993  	 * in log recovery.
71e330b593905e Dave Chinner      2010-05-21   994  	 *
71e330b593905e Dave Chinner      2010-05-21   995  	 * Hence we need to add this context to the committing context list so
71e330b593905e Dave Chinner      2010-05-21   996  	 * that higher sequences will wait for us to write out a commit record
71e330b593905e Dave Chinner      2010-05-21   997  	 * before they do.
f876e44603ad09 Dave Chinner      2014-02-27   998  	 *
f39ae5297c5ce2 Dave Chinner      2021-06-04   999  	 * xfs_log_force_seq requires us to mirror the new sequence into the cil
f876e44603ad09 Dave Chinner      2014-02-27  1000  	 * structure atomically with the addition of this sequence to the
f876e44603ad09 Dave Chinner      2014-02-27  1001  	 * committing list. This also ensures that we can do unlocked checks
f876e44603ad09 Dave Chinner      2014-02-27  1002  	 * against the current sequence in log forces without risking
f876e44603ad09 Dave Chinner      2014-02-27  1003  	 * dereferencing a freed context pointer.
71e330b593905e Dave Chinner      2010-05-21  1004  	 */
4bb928cdb900d0 Dave Chinner      2013-08-12  1005  	spin_lock(&cil->xc_push_lock);
facd77e4e38b8f Dave Chinner      2021-06-04  1006  	xlog_cil_ctx_switch(cil, new_ctx);
4bb928cdb900d0 Dave Chinner      2013-08-12  1007  	spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1008  	up_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner      2010-05-21  1009  
a1785f597c8b06 Dave Chinner      2021-06-08  1010  	/*
a1785f597c8b06 Dave Chinner      2021-06-08  1011  	 * Sort the log vector chain before we add the transaction headers.
a1785f597c8b06 Dave Chinner      2021-06-08  1012  	 * This ensures we always have the transaction headers at the start
a1785f597c8b06 Dave Chinner      2021-06-08  1013  	 * of the chain.
a1785f597c8b06 Dave Chinner      2021-06-08  1014  	 */
a1785f597c8b06 Dave Chinner      2021-06-08  1015  	list_sort(NULL, &ctx->lv_chain, xlog_cil_order_cmp);
a1785f597c8b06 Dave Chinner      2021-06-08  1016  
71e330b593905e Dave Chinner      2010-05-21  1017  	/*
71e330b593905e Dave Chinner      2010-05-21  1018  	 * Build a checkpoint transaction header and write it to the log to
71e330b593905e Dave Chinner      2010-05-21  1019  	 * begin the transaction. We need to account for the space used by the
71e330b593905e Dave Chinner      2010-05-21  1020  	 * transaction header here as it is not accounted for in xlog_write().
a47518453bf958 Dave Chinner      2021-06-08  1021  	 * Add the lvhdr to the head of the lv chain we pass to xlog_write() so
a47518453bf958 Dave Chinner      2021-06-08  1022  	 * it gets written into the iclog first.
71e330b593905e Dave Chinner      2010-05-21  1023  	 */
877cf3473914ae Dave Chinner      2021-06-04  1024  	xlog_cil_build_trans_hdr(ctx, &thdr, &lvhdr, num_iovecs);
66fc9ffa8638be Dave Chinner      2021-06-04  1025  	num_bytes += lvhdr.lv_bytes;
a47518453bf958 Dave Chinner      2021-06-08  1026  	list_add(&lvhdr.lv_list, &ctx->lv_chain);
71e330b593905e Dave Chinner      2010-05-21  1027  
0279bbbbc03f2c Dave Chinner      2021-06-03  1028  	/*
0279bbbbc03f2c Dave Chinner      2021-06-03  1029  	 * Before we format and submit the first iclog, we have to ensure that
0279bbbbc03f2c Dave Chinner      2021-06-03  1030  	 * the metadata writeback ordering cache flush is complete.
0279bbbbc03f2c Dave Chinner      2021-06-03  1031  	 */
0279bbbbc03f2c Dave Chinner      2021-06-03  1032  	wait_for_completion(&bdev_flush);
0279bbbbc03f2c Dave Chinner      2021-06-03  1033  
877cf3473914ae Dave Chinner      2021-06-04  1034  	/*
877cf3473914ae Dave Chinner      2021-06-04  1035  	 * The LSN we need to pass to the log items on transaction commit is the
877cf3473914ae Dave Chinner      2021-06-04  1036  	 * LSN reported by the first log vector write, not the commit lsn. If we
877cf3473914ae Dave Chinner      2021-06-04  1037  	 * use the commit record lsn then we can move the tail beyond the grant
877cf3473914ae Dave Chinner      2021-06-04  1038  	 * write head.
877cf3473914ae Dave Chinner      2021-06-04  1039  	 */
fc3370002b56bc Dave Chinner      2021-06-17  1040  	error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
a47518453bf958 Dave Chinner      2021-06-08  1041  				NULL, num_bytes);
a47518453bf958 Dave Chinner      2021-06-08  1042  
a47518453bf958 Dave Chinner      2021-06-08  1043  	/*
a47518453bf958 Dave Chinner      2021-06-08  1044  	 * Take the lvhdr back off the lv_chain as it should not be passed
a47518453bf958 Dave Chinner      2021-06-08  1045  	 * to log IO completion.
a47518453bf958 Dave Chinner      2021-06-08  1046  	 */
a47518453bf958 Dave Chinner      2021-06-08  1047  	list_del(&lvhdr.lv_list);
71e330b593905e Dave Chinner      2010-05-21  1048  	if (error)
7db37c5e6575b2 Dave Chinner      2011-01-27  1049  		goto out_abort_free_ticket;
71e330b593905e Dave Chinner      2010-05-21  1050  
71e330b593905e Dave Chinner      2010-05-21  1051  	/*
71e330b593905e Dave Chinner      2010-05-21  1052  	 * now that we've written the checkpoint into the log, strictly
71e330b593905e Dave Chinner      2010-05-21  1053  	 * order the commit records so replay will get them in the right order.
71e330b593905e Dave Chinner      2010-05-21  1054  	 */
71e330b593905e Dave Chinner      2010-05-21  1055  restart:
4bb928cdb900d0 Dave Chinner      2013-08-12  1056  	spin_lock(&cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1057  	list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
ac983517ec5941 Dave Chinner      2014-05-07  1058  		/*
ac983517ec5941 Dave Chinner      2014-05-07  1059  		 * Avoid getting stuck in this loop because we were woken by the
ac983517ec5941 Dave Chinner      2014-05-07  1060  		 * shutdown, but then went back to sleep once already in the
ac983517ec5941 Dave Chinner      2014-05-07  1061  		 * shutdown state.
ac983517ec5941 Dave Chinner      2014-05-07  1062  		 */
ac983517ec5941 Dave Chinner      2014-05-07  1063  		if (XLOG_FORCED_SHUTDOWN(log)) {
ac983517ec5941 Dave Chinner      2014-05-07  1064  			spin_unlock(&cil->xc_push_lock);
ac983517ec5941 Dave Chinner      2014-05-07  1065  			goto out_abort_free_ticket;
ac983517ec5941 Dave Chinner      2014-05-07  1066  		}
ac983517ec5941 Dave Chinner      2014-05-07  1067  
71e330b593905e Dave Chinner      2010-05-21  1068  		/*
71e330b593905e Dave Chinner      2010-05-21  1069  		 * Higher sequences will wait for this one so skip them.
ac983517ec5941 Dave Chinner      2014-05-07  1070  		 * Don't wait for our own sequence, either.
71e330b593905e Dave Chinner      2010-05-21  1071  		 */
71e330b593905e Dave Chinner      2010-05-21  1072  		if (new_ctx->sequence >= ctx->sequence)
71e330b593905e Dave Chinner      2010-05-21  1073  			continue;
71e330b593905e Dave Chinner      2010-05-21  1074  		if (!new_ctx->commit_lsn) {
71e330b593905e Dave Chinner      2010-05-21  1075  			/*
71e330b593905e Dave Chinner      2010-05-21  1076  			 * It is still being pushed! Wait for the push to
71e330b593905e Dave Chinner      2010-05-21  1077  			 * complete, then start again from the beginning.
71e330b593905e Dave Chinner      2010-05-21  1078  			 */
4bb928cdb900d0 Dave Chinner      2013-08-12  1079  			xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1080  			goto restart;
71e330b593905e Dave Chinner      2010-05-21  1081  		}
71e330b593905e Dave Chinner      2010-05-21  1082  	}
4bb928cdb900d0 Dave Chinner      2013-08-12  1083  	spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1084  
fc3370002b56bc Dave Chinner      2021-06-17  1085  	error = xlog_cil_write_commit_record(ctx, &commit_iclog);
dd401770b0ff68 Dave Chinner      2020-03-25  1086  	if (error)
dd401770b0ff68 Dave Chinner      2020-03-25  1087  		goto out_abort_free_ticket;
dd401770b0ff68 Dave Chinner      2020-03-25  1088  
89ae379d564c5d Christoph Hellwig 2019-06-28  1089  	spin_lock(&commit_iclog->ic_callback_lock);
1858bb0bec612d Christoph Hellwig 2019-10-14  1090  	if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
89ae379d564c5d Christoph Hellwig 2019-06-28  1091  		spin_unlock(&commit_iclog->ic_callback_lock);
e469cbe84f4ade Dave Chinner      2021-06-08  1092  		goto out_abort_free_ticket;
89ae379d564c5d Christoph Hellwig 2019-06-28  1093  	}
89ae379d564c5d Christoph Hellwig 2019-06-28  1094  	ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
89ae379d564c5d Christoph Hellwig 2019-06-28  1095  		      commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
89ae379d564c5d Christoph Hellwig 2019-06-28  1096  	list_add_tail(&ctx->iclog_entry, &commit_iclog->ic_callbacks);
89ae379d564c5d Christoph Hellwig 2019-06-28  1097  	spin_unlock(&commit_iclog->ic_callback_lock);
71e330b593905e Dave Chinner      2010-05-21  1098  
71e330b593905e Dave Chinner      2010-05-21  1099  	/*
71e330b593905e Dave Chinner      2010-05-21  1100  	 * now the checkpoint commit is complete and we've attached the
71e330b593905e Dave Chinner      2010-05-21  1101  	 * callbacks to the iclog we can assign the commit LSN to the context
71e330b593905e Dave Chinner      2010-05-21  1102  	 * and wake up anyone who is waiting for the commit to complete.
71e330b593905e Dave Chinner      2010-05-21  1103  	 */
4bb928cdb900d0 Dave Chinner      2013-08-12  1104  	spin_lock(&cil->xc_push_lock);
eb40a87500ac2f Dave Chinner      2010-12-21  1105  	wake_up_all(&cil->xc_commit_wait);
4bb928cdb900d0 Dave Chinner      2013-08-12  1106  	spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1107  
e469cbe84f4ade Dave Chinner      2021-06-08  1108  	/*
e469cbe84f4ade Dave Chinner      2021-06-08  1109  	 * Pull the ticket off the ctx so we can ungrant it after releasing the
e469cbe84f4ade Dave Chinner      2021-06-08  1110  	 * commit_iclog. The ctx may be freed by the time we return from
e469cbe84f4ade Dave Chinner      2021-06-08  1111  	 * releasing the commit_iclog (i.e. checkpoint has been completed and
e469cbe84f4ade Dave Chinner      2021-06-08  1112  	 * callback run) so we can't reference the ctx after the call to
e469cbe84f4ade Dave Chinner      2021-06-08  1113  	 * xlog_state_release_iclog().
e469cbe84f4ade Dave Chinner      2021-06-08  1114  	 */
e469cbe84f4ade Dave Chinner      2021-06-08  1115  	ticket = ctx->ticket;
e469cbe84f4ade Dave Chinner      2021-06-08  1116  
5fd9256ce156ef Dave Chinner      2021-06-03  1117  	/*
815753dc16bbca Dave Chinner      2021-06-17  1118  	 * If the checkpoint spans multiple iclogs, wait for all previous iclogs
815753dc16bbca Dave Chinner      2021-06-17  1119  	 * to complete before we submit the commit_iclog. We can't use state
815753dc16bbca Dave Chinner      2021-06-17  1120  	 * checks for this - ACTIVE can be either a past completed iclog or a
815753dc16bbca Dave Chinner      2021-06-17  1121  	 * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a
815753dc16bbca Dave Chinner      2021-06-17  1122  	 * past or future iclog awaiting IO or ordered IO completion to be run.
815753dc16bbca Dave Chinner      2021-06-17  1123  	 * In the latter case, if it's a future iclog and we wait on it, then we
815753dc16bbca Dave Chinner      2021-06-17  1124  	 * will hang because it won't get processed through to ic_force_wait
815753dc16bbca Dave Chinner      2021-06-17  1125  	 * wakeup until this commit_iclog is written to disk.  Hence we use the
815753dc16bbca Dave Chinner      2021-06-17  1126  	 * iclog header lsn and compare it to the commit lsn to determine if we
815753dc16bbca Dave Chinner      2021-06-17  1127  	 * need to wait on iclogs or not.
5fd9256ce156ef Dave Chinner      2021-06-03  1128  	 */
5fd9256ce156ef Dave Chinner      2021-06-03  1129  	spin_lock(&log->l_icloglock);
cb1acb3f324636 Dave Chinner      2021-06-04 @1130  	if (ctx->start_lsn != commit_lsn) {
                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
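Never initialized: commit_lsn is declared at line 877 and read here, but nothing in the function as listed assigns it first.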

815753dc16bbca Dave Chinner      2021-06-17  1131  		struct xlog_in_core	*iclog;
815753dc16bbca Dave Chinner      2021-06-17  1132  
815753dc16bbca Dave Chinner      2021-06-17  1133  		for (iclog = commit_iclog->ic_prev;
815753dc16bbca Dave Chinner      2021-06-17  1134  		     iclog != commit_iclog;
815753dc16bbca Dave Chinner      2021-06-17  1135  		     iclog = iclog->ic_prev) {
815753dc16bbca Dave Chinner      2021-06-17  1136  			xfs_lsn_t	hlsn;
815753dc16bbca Dave Chinner      2021-06-17  1137  
815753dc16bbca Dave Chinner      2021-06-17  1138  			/*
815753dc16bbca Dave Chinner      2021-06-17  1139  			 * If the LSN of the iclog is zero or in the future it
815753dc16bbca Dave Chinner      2021-06-17  1140  			 * means it has passed through IO completion and
815753dc16bbca Dave Chinner      2021-06-17  1141  			 * activation and hence all previous iclogs have also
815753dc16bbca Dave Chinner      2021-06-17  1142  			 * done so. We do not need to wait at all in this case.
815753dc16bbca Dave Chinner      2021-06-17  1143  			 */
815753dc16bbca Dave Chinner      2021-06-17  1144  			hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
815753dc16bbca Dave Chinner      2021-06-17  1145  			if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
815753dc16bbca Dave Chinner      2021-06-17  1146  				break;
815753dc16bbca Dave Chinner      2021-06-17  1147  
815753dc16bbca Dave Chinner      2021-06-17  1148  			/*
815753dc16bbca Dave Chinner      2021-06-17  1149  			 * If the LSN of the iclog is older than the commit lsn,
815753dc16bbca Dave Chinner      2021-06-17  1150  			 * we have to wait on it. Waiting on this via the
815753dc16bbca Dave Chinner      2021-06-17  1151  			 * ic_force_wait should also order the completion of all
815753dc16bbca Dave Chinner      2021-06-17  1152  			 * older iclogs, too, but we leave checking that to the
815753dc16bbca Dave Chinner      2021-06-17  1153  			 * next loop iteration.
815753dc16bbca Dave Chinner      2021-06-17  1154  			 */
815753dc16bbca Dave Chinner      2021-06-17  1155  			ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
815753dc16bbca Dave Chinner      2021-06-17  1156  			xlog_wait_on_iclog(iclog);
cb1acb3f324636 Dave Chinner      2021-06-04  1157  			spin_lock(&log->l_icloglock);
815753dc16bbca Dave Chinner      2021-06-17  1158  		}
815753dc16bbca Dave Chinner      2021-06-17  1159  
815753dc16bbca Dave Chinner      2021-06-17  1160  		/*
815753dc16bbca Dave Chinner      2021-06-17  1161  		 * Regardless of whether we need to wait or not, the
815753dc16bbca Dave Chinner      2021-06-17  1162  		 * commit_iclog write needs to issue a pre-flush so that the
815753dc16bbca Dave Chinner      2021-06-17  1163  		 * ordering for this checkpoint is correctly preserved down to
815753dc16bbca Dave Chinner      2021-06-17  1164  		 * stable storage.
815753dc16bbca Dave Chinner      2021-06-17  1165  		 */
cb1acb3f324636 Dave Chinner      2021-06-04  1166  		commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
5fd9256ce156ef Dave Chinner      2021-06-03  1167  	}
5fd9256ce156ef Dave Chinner      2021-06-03  1168  
cb1acb3f324636 Dave Chinner      2021-06-04  1169  	/*
cb1acb3f324636 Dave Chinner      2021-06-04  1170  	 * The commit iclog must be written to stable storage to guarantee
cb1acb3f324636 Dave Chinner      2021-06-04  1171  	 * journal IO vs metadata writeback IO is correctly ordered on stable
cb1acb3f324636 Dave Chinner      2021-06-04  1172  	 * storage.
e12213ba5d909a Dave Chinner      2021-06-04  1173  	 *
e12213ba5d909a Dave Chinner      2021-06-04  1174  	 * If the push caller needs the commit to be immediately stable and the
e12213ba5d909a Dave Chinner      2021-06-04  1175  	 * commit_iclog is not yet marked as XLOG_STATE_WANT_SYNC to indicate it
e12213ba5d909a Dave Chinner      2021-06-04  1176  	 * will be written when released, switch its state to WANT_SYNC right
e12213ba5d909a Dave Chinner      2021-06-04  1177  	 * now.
cb1acb3f324636 Dave Chinner      2021-06-04  1178  	 */
cb1acb3f324636 Dave Chinner      2021-06-04  1179  	commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
e12213ba5d909a Dave Chinner      2021-06-04  1180  	if (push_commit_stable && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
e12213ba5d909a Dave Chinner      2021-06-04  1181  		xlog_state_switch_iclogs(log, commit_iclog, 0);
e469cbe84f4ade Dave Chinner      2021-06-08  1182  	xlog_state_release_iclog(log, commit_iclog, ticket);
cb1acb3f324636 Dave Chinner      2021-06-04  1183  	spin_unlock(&log->l_icloglock);
e469cbe84f4ade Dave Chinner      2021-06-08  1184  
e469cbe84f4ade Dave Chinner      2021-06-08  1185  	xfs_log_ticket_ungrant(log, ticket);
c7cc296ddd1f6d Christoph Hellwig 2020-03-20  1186  	return;
71e330b593905e Dave Chinner      2010-05-21  1187  
71e330b593905e Dave Chinner      2010-05-21  1188  out_skip:
71e330b593905e Dave Chinner      2010-05-21  1189  	up_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner      2010-05-21  1190  	xfs_log_ticket_put(new_ctx->ticket);
71e330b593905e Dave Chinner      2010-05-21  1191  	kmem_free(new_ctx);
c7cc296ddd1f6d Christoph Hellwig 2020-03-20  1192  	return;
71e330b593905e Dave Chinner      2010-05-21  1193  
7db37c5e6575b2 Dave Chinner      2011-01-27  1194  out_abort_free_ticket:
877cf3473914ae Dave Chinner      2021-06-04  1195  	xfs_log_ticket_ungrant(log, ctx->ticket);
12e6a0f449d585 Christoph Hellwig 2020-03-20  1196  	ASSERT(XLOG_FORCED_SHUTDOWN(log));
12e6a0f449d585 Christoph Hellwig 2020-03-20  1197  	xlog_cil_committed(ctx);
4c2d542f2e7865 Dave Chinner      2012-04-23  1198  }
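
The XFS_LSN_CMP(hlsn, commit_lsn) checks in the loop above decide whether a
previous iclog is in the past (wait on it) or is unused or in the future
(stop scanning). Below is a minimal userspace sketch of that decision. It
assumes the usual XFS LSN layout of cycle number in the upper 32 bits and
block number in the lower 32 bits. lsn_t, make_lsn(), lsn_cmp(),
classify_iclog() and the enum are hypothetical names for illustration only,
not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t lsn_t;		/* stand-in for xfs_lsn_t */

/* Assumed layout: cycle in the upper 32 bits, block in the lower 32 bits. */
static uint32_t lsn_cycle(lsn_t lsn) { return (uint32_t)((uint64_t)lsn >> 32); }
static uint32_t lsn_block(lsn_t lsn) { return (uint32_t)lsn; }

/* Hypothetical helper to build an LSN for the examples. */
static lsn_t make_lsn(uint32_t cycle, uint32_t block)
{
	return (lsn_t)(((uint64_t)cycle << 32) | block);
}

/* Returns <0, 0 or >0; compares the cycle first, then the block. */
static int lsn_cmp(lsn_t a, lsn_t b)
{
	if (lsn_cycle(a) != lsn_cycle(b))
		return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
	if (lsn_block(a) != lsn_block(b))
		return lsn_block(a) < lsn_block(b) ? -1 : 1;
	return 0;
}

enum iclog_action { ICLOG_STOP, ICLOG_WAIT };

/*
 * Mirror of the loop body's decision: a zero header LSN or one newer than
 * the commit LSN means there is nothing older left to wait for; an older
 * header LSN means the iclog must be waited on. An equal LSN is not
 * expected in the loop (it skips the commit iclog itself); this sketch
 * simply treats it as "wait".
 */
static enum iclog_action classify_iclog(lsn_t hlsn, lsn_t commit_lsn)
{
	if (hlsn == 0 || lsn_cmp(hlsn, commit_lsn) > 0)
		return ICLOG_STOP;
	return ICLOG_WAIT;
}
```

With a commit LSN of cycle 3, block 10, a header LSN of cycle 3, block 8 is
classified as ICLOG_WAIT, while a header LSN of zero or of cycle 4, block 0
is classified as ICLOG_STOP.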

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org



* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
@ 2021-06-28  8:58 ` Dan Carpenter
  0 siblings, 0 replies; 50+ messages in thread
From: Dan Carpenter @ 2021-06-28  8:58 UTC (permalink / raw)
  To: kbuild-all

Hi Dave,

url:    https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: h8300-randconfig-m031-20210625 (attached as .config)
compiler: h8300-linux-gcc (GCC) 9.3.0

If you fix the issue, kindly add the following tags as appropriate:
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
fs/xfs/xfs_log_cil.c:1130 xlog_cil_push_work() error: uninitialized symbol 'commit_lsn'.


vim +/commit_lsn +1130 fs/xfs/xfs_log_cil.c

c7cc296ddd1f6d Christoph Hellwig 2020-03-20   861  static void
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   862  xlog_cil_push_work(
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   863  	struct work_struct	*work)
71e330b593905e Dave Chinner      2010-05-21   864  {
facd77e4e38b8f Dave Chinner      2021-06-04   865  	struct xfs_cil_ctx	*ctx =
facd77e4e38b8f Dave Chinner      2021-06-04   866  		container_of(work, struct xfs_cil_ctx, push_work);
facd77e4e38b8f Dave Chinner      2021-06-04   867  	struct xfs_cil		*cil = ctx->cil;
c7cc296ddd1f6d Christoph Hellwig 2020-03-20   868  	struct xlog		*log = cil->xc_log;
71e330b593905e Dave Chinner      2010-05-21   869  	struct xfs_log_vec	*lv;
71e330b593905e Dave Chinner      2010-05-21   870  	struct xfs_cil_ctx	*new_ctx;
71e330b593905e Dave Chinner      2010-05-21   871  	struct xlog_in_core	*commit_iclog;
66fc9ffa8638be Dave Chinner      2021-06-04   872  	int			num_iovecs = 0;
66fc9ffa8638be Dave Chinner      2021-06-04   873  	int			num_bytes = 0;
71e330b593905e Dave Chinner      2010-05-21   874  	int			error = 0;
877cf3473914ae Dave Chinner      2021-06-04   875  	struct xlog_cil_trans_hdr thdr;
a47518453bf958 Dave Chinner      2021-06-08   876  	struct xfs_log_vec	lvhdr = {};
71e330b593905e Dave Chinner      2010-05-21   877  	xfs_lsn_t		commit_lsn;
                                                                                ^^^^^^^^^^

4c2d542f2e7865 Dave Chinner      2012-04-23   878  	xfs_lsn_t		push_seq;
0279bbbbc03f2c Dave Chinner      2021-06-03   879  	struct bio		bio;
0279bbbbc03f2c Dave Chinner      2021-06-03   880  	DECLARE_COMPLETION_ONSTACK(bdev_flush);
e12213ba5d909a Dave Chinner      2021-06-04   881  	bool			push_commit_stable;
e469cbe84f4ade Dave Chinner      2021-06-08   882  	struct xlog_ticket	*ticket;
71e330b593905e Dave Chinner      2010-05-21   883  
facd77e4e38b8f Dave Chinner      2021-06-04   884  	new_ctx = xlog_cil_ctx_alloc();
71e330b593905e Dave Chinner      2010-05-21   885  	new_ctx->ticket = xlog_cil_ticket_alloc(log);
71e330b593905e Dave Chinner      2010-05-21   886  
71e330b593905e Dave Chinner      2010-05-21   887  	down_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner      2010-05-21   888  
4bb928cdb900d0 Dave Chinner      2013-08-12   889  	spin_lock(&cil->xc_push_lock);
4c2d542f2e7865 Dave Chinner      2012-04-23   890  	push_seq = cil->xc_push_seq;
4c2d542f2e7865 Dave Chinner      2012-04-23   891  	ASSERT(push_seq <= ctx->sequence);
e12213ba5d909a Dave Chinner      2021-06-04   892  	push_commit_stable = cil->xc_push_commit_stable;
e12213ba5d909a Dave Chinner      2021-06-04   893  	cil->xc_push_commit_stable = false;
71e330b593905e Dave Chinner      2010-05-21   894  
0e7ab7efe77451 Dave Chinner      2020-03-24   895  	/*
3682277520d6f4 Dave Chinner      2021-06-04   896  	 * As we are about to switch to a new, empty CIL context, we no longer
3682277520d6f4 Dave Chinner      2021-06-04   897  	 * need to throttle tasks on CIL space overruns. Wake any waiters that
3682277520d6f4 Dave Chinner      2021-06-04   898  	 * the hard push throttle may have caught so they can start committing
3682277520d6f4 Dave Chinner      2021-06-04   899  	 * to the new context. The ctx->xc_push_lock provides the serialisation
3682277520d6f4 Dave Chinner      2021-06-04   900  	 * necessary for safely using the lockless waitqueue_active() check in
3682277520d6f4 Dave Chinner      2021-06-04   901  	 * this context.
3682277520d6f4 Dave Chinner      2021-06-04   902  	 */
3682277520d6f4 Dave Chinner      2021-06-04   903  	if (waitqueue_active(&cil->xc_push_wait))
c7f87f3984cfa1 Dave Chinner      2020-06-16   904  		wake_up_all(&cil->xc_push_wait);
0e7ab7efe77451 Dave Chinner      2020-03-24   905  
4c2d542f2e7865 Dave Chinner      2012-04-23   906  	/*
4c2d542f2e7865 Dave Chinner      2012-04-23   907  	 * Check if we've anything to push. If there is nothing, then we don't
4c2d542f2e7865 Dave Chinner      2012-04-23   908  	 * move on to a new sequence number and so we have to be able to push
4c2d542f2e7865 Dave Chinner      2012-04-23   909  	 * this sequence again later.
4c2d542f2e7865 Dave Chinner      2012-04-23   910  	 */
0d11bae4bcf4aa Dave Chinner      2021-06-04   911  	if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) {
4c2d542f2e7865 Dave Chinner      2012-04-23   912  		cil->xc_push_seq = 0;
4bb928cdb900d0 Dave Chinner      2013-08-12   913  		spin_unlock(&cil->xc_push_lock);
a44f13edf0ebb4 Dave Chinner      2010-08-24   914  		goto out_skip;
4c2d542f2e7865 Dave Chinner      2012-04-23   915  	}
4c2d542f2e7865 Dave Chinner      2012-04-23   916  
a44f13edf0ebb4 Dave Chinner      2010-08-24   917  
cf085a1b5d2214 Joe Perches       2019-11-07   918  	/* check for a previously pushed sequence */
facd77e4e38b8f Dave Chinner      2021-06-04   919  	if (push_seq < ctx->sequence) {
8af3dcd3c89aef Dave Chinner      2014-09-23   920  		spin_unlock(&cil->xc_push_lock);
df806158b0f6eb Dave Chinner      2010-05-17   921  		goto out_skip;
8af3dcd3c89aef Dave Chinner      2014-09-23   922  	}
8af3dcd3c89aef Dave Chinner      2014-09-23   923  
8af3dcd3c89aef Dave Chinner      2014-09-23   924  	/*
8af3dcd3c89aef Dave Chinner      2014-09-23   925  	 * We are now going to push this context, so add it to the committing
8af3dcd3c89aef Dave Chinner      2014-09-23   926  	 * list before we do anything else. This ensures that anyone waiting on
8af3dcd3c89aef Dave Chinner      2014-09-23   927  	 * this push can easily detect the difference between a "push in
8af3dcd3c89aef Dave Chinner      2014-09-23   928  	 * progress" and "CIL is empty, nothing to do".
8af3dcd3c89aef Dave Chinner      2014-09-23   929  	 *
8af3dcd3c89aef Dave Chinner      2014-09-23   930  	 * IOWs, a wait loop can now check for:
8af3dcd3c89aef Dave Chinner      2014-09-23   931  	 *	the current sequence not being found on the committing list;
8af3dcd3c89aef Dave Chinner      2014-09-23   932  	 *	an empty CIL; and
8af3dcd3c89aef Dave Chinner      2014-09-23   933  	 *	an unchanged sequence number
8af3dcd3c89aef Dave Chinner      2014-09-23   934  	 * to detect a push that had nothing to do and therefore does not need
8af3dcd3c89aef Dave Chinner      2014-09-23   935  	 * waiting on. If the CIL is not empty, we get put on the committing
8af3dcd3c89aef Dave Chinner      2014-09-23   936  	 * list before emptying the CIL and bumping the sequence number. Hence
8af3dcd3c89aef Dave Chinner      2014-09-23   937  	 * an empty CIL and an unchanged sequence number means we jumped out
8af3dcd3c89aef Dave Chinner      2014-09-23   938  	 * above after doing nothing.
8af3dcd3c89aef Dave Chinner      2014-09-23   939  	 *
8af3dcd3c89aef Dave Chinner      2014-09-23   940  	 * Hence the waiter will either find the commit sequence on the
8af3dcd3c89aef Dave Chinner      2014-09-23   941  	 * committing list or the sequence number will be unchanged and the CIL
8af3dcd3c89aef Dave Chinner      2014-09-23   942  	 * still dirty. In that latter case, the push has not yet started, and
8af3dcd3c89aef Dave Chinner      2014-09-23   943  	 * so the waiter will have to continue trying to check the CIL
8af3dcd3c89aef Dave Chinner      2014-09-23   944  	 * committing list until it is found. In extreme cases of delay, the
8af3dcd3c89aef Dave Chinner      2014-09-23   945  	 * sequence may fully commit between the attempts the wait makes to wait
8af3dcd3c89aef Dave Chinner      2014-09-23   946  	 * on the commit sequence.
8af3dcd3c89aef Dave Chinner      2014-09-23   947  	 */
8af3dcd3c89aef Dave Chinner      2014-09-23   948  	list_add(&ctx->committing, &cil->xc_committing);
8af3dcd3c89aef Dave Chinner      2014-09-23   949  	spin_unlock(&cil->xc_push_lock);
df806158b0f6eb Dave Chinner      2010-05-17   950  
71e330b593905e Dave Chinner      2010-05-21   951  	/*
0279bbbbc03f2c Dave Chinner      2021-06-03   952  	 * The CIL is stable at this point - nothing new will be added to it
0279bbbbc03f2c Dave Chinner      2021-06-03   953  	 * because we hold the flush lock exclusively. Hence we can now issue
0279bbbbc03f2c Dave Chinner      2021-06-03   954  	 * a cache flush to ensure all the completed metadata in the journal we
0279bbbbc03f2c Dave Chinner      2021-06-03   955  	 * are about to overwrite is on stable storage.
0279bbbbc03f2c Dave Chinner      2021-06-03   956  	 */
0279bbbbc03f2c Dave Chinner      2021-06-03   957  	xfs_flush_bdev_async(&bio, log->l_mp->m_ddev_targp->bt_bdev,
0279bbbbc03f2c Dave Chinner      2021-06-03   958  				&bdev_flush);
0279bbbbc03f2c Dave Chinner      2021-06-03   959  
a8613836d99e62 Dave Chinner      2021-06-08   960  	xlog_cil_pcp_aggregate(cil, ctx);
a8613836d99e62 Dave Chinner      2021-06-08   961  
1f18c0c4b78cfb Dave Chinner      2021-06-08   962  	while (!list_empty(&ctx->log_items)) {
71e330b593905e Dave Chinner      2010-05-21   963  		struct xfs_log_item	*item;
71e330b593905e Dave Chinner      2010-05-21   964  
1f18c0c4b78cfb Dave Chinner      2021-06-08   965  		item = list_first_entry(&ctx->log_items,
71e330b593905e Dave Chinner      2010-05-21   966  					struct xfs_log_item, li_cil);
a47518453bf958 Dave Chinner      2021-06-08   967  		lv = item->li_lv;
a1785f597c8b06 Dave Chinner      2021-06-08   968  		lv->lv_order_id = item->li_order_id;
a47518453bf958 Dave Chinner      2021-06-08   969  		num_iovecs += lv->lv_niovecs;
66fc9ffa8638be Dave Chinner      2021-06-04   970  		/* we don't write ordered log vectors */
66fc9ffa8638be Dave Chinner      2021-06-04   971  		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
66fc9ffa8638be Dave Chinner      2021-06-04   972  			num_bytes += lv->lv_bytes;
a47518453bf958 Dave Chinner      2021-06-08   973  
a47518453bf958 Dave Chinner      2021-06-08   974  		list_add_tail(&lv->lv_list, &ctx->lv_chain);
a1785f597c8b06 Dave Chinner      2021-06-08   975  		list_del_init(&item->li_cil);
a1785f597c8b06 Dave Chinner      2021-06-08   976  		item->li_order_id = 0;
a1785f597c8b06 Dave Chinner      2021-06-08   977  		item->li_lv = NULL;
71e330b593905e Dave Chinner      2010-05-21   978  	}
71e330b593905e Dave Chinner      2010-05-21   979  
71e330b593905e Dave Chinner      2010-05-21   980  	/*
facd77e4e38b8f Dave Chinner      2021-06-04   981  	 * Switch the contexts so we can drop the context lock and move out
71e330b593905e Dave Chinner      2010-05-21   982  	 * of a shared context. We can't just go straight to the commit record,
71e330b593905e Dave Chinner      2010-05-21   983  	 * though - we need to synchronise with previous and future commits so
71e330b593905e Dave Chinner      2010-05-21   984  	 * that the commit records are correctly ordered in the log to ensure
71e330b593905e Dave Chinner      2010-05-21   985  	 * that we process items during log IO completion in the correct order.
71e330b593905e Dave Chinner      2010-05-21   986  	 *
71e330b593905e Dave Chinner      2010-05-21   987  	 * For example, if we get an EFI in one checkpoint and the EFD in the
71e330b593905e Dave Chinner      2010-05-21   988  	 * next (e.g. due to log forces), we do not want the checkpoint with
71e330b593905e Dave Chinner      2010-05-21   989  	 * the EFD to be committed before the checkpoint with the EFI.  Hence
71e330b593905e Dave Chinner      2010-05-21   990  	 * we must strictly order the commit records of the checkpoints so
71e330b593905e Dave Chinner      2010-05-21   991  	 * that: a) the checkpoint callbacks are attached to the iclogs in the
71e330b593905e Dave Chinner      2010-05-21   992  	 * correct order; and b) the checkpoints are replayed in correct order
71e330b593905e Dave Chinner      2010-05-21   993  	 * in log recovery.
71e330b593905e Dave Chinner      2010-05-21   994  	 *
71e330b593905e Dave Chinner      2010-05-21   995  	 * Hence we need to add this context to the committing context list so
71e330b593905e Dave Chinner      2010-05-21   996  	 * that higher sequences will wait for us to write out a commit record
71e330b593905e Dave Chinner      2010-05-21   997  	 * before they do.
f876e44603ad09 Dave Chinner      2014-02-27   998  	 *
f39ae5297c5ce2 Dave Chinner      2021-06-04   999  	 * xfs_log_force_seq requires us to mirror the new sequence into the cil
f876e44603ad09 Dave Chinner      2014-02-27  1000  	 * structure atomically with the addition of this sequence to the
f876e44603ad09 Dave Chinner      2014-02-27  1001  	 * committing list. This also ensures that we can do unlocked checks
f876e44603ad09 Dave Chinner      2014-02-27  1002  	 * against the current sequence in log forces without risking
f876e44603ad09 Dave Chinner      2014-02-27  1003  	 * dereferencing a freed context pointer.
71e330b593905e Dave Chinner      2010-05-21  1004  	 */
4bb928cdb900d0 Dave Chinner      2013-08-12  1005  	spin_lock(&cil->xc_push_lock);
facd77e4e38b8f Dave Chinner      2021-06-04  1006  	xlog_cil_ctx_switch(cil, new_ctx);
4bb928cdb900d0 Dave Chinner      2013-08-12  1007  	spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1008  	up_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner      2010-05-21  1009  
a1785f597c8b06 Dave Chinner      2021-06-08  1010  	/*
a1785f597c8b06 Dave Chinner      2021-06-08  1011  	 * Sort the log vector chain before we add the transaction headers.
a1785f597c8b06 Dave Chinner      2021-06-08  1012  	 * This ensures we always have the transaction headers at the start
a1785f597c8b06 Dave Chinner      2021-06-08  1013  	 * of the chain.
a1785f597c8b06 Dave Chinner      2021-06-08  1014  	 */
a1785f597c8b06 Dave Chinner      2021-06-08  1015  	list_sort(NULL, &ctx->lv_chain, xlog_cil_order_cmp);
a1785f597c8b06 Dave Chinner      2021-06-08  1016  
71e330b593905e Dave Chinner      2010-05-21  1017  	/*
71e330b593905e Dave Chinner      2010-05-21  1018  	 * Build a checkpoint transaction header and write it to the log to
71e330b593905e Dave Chinner      2010-05-21  1019  	 * begin the transaction. We need to account for the space used by the
71e330b593905e Dave Chinner      2010-05-21  1020  	 * transaction header here as it is not accounted for in xlog_write().
a47518453bf958 Dave Chinner      2021-06-08  1021  	 * Add the lvhdr to the head of the lv chain we pass to xlog_write() so
a47518453bf958 Dave Chinner      2021-06-08  1022  	 * it gets written into the iclog first.
71e330b593905e Dave Chinner      2010-05-21  1023  	 */
877cf3473914ae Dave Chinner      2021-06-04  1024  	xlog_cil_build_trans_hdr(ctx, &thdr, &lvhdr, num_iovecs);
66fc9ffa8638be Dave Chinner      2021-06-04  1025  	num_bytes += lvhdr.lv_bytes;
a47518453bf958 Dave Chinner      2021-06-08  1026  	list_add(&lvhdr.lv_list, &ctx->lv_chain);
71e330b593905e Dave Chinner      2010-05-21  1027  
0279bbbbc03f2c Dave Chinner      2021-06-03  1028  	/*
0279bbbbc03f2c Dave Chinner      2021-06-03  1029  	 * Before we format and submit the first iclog, we have to ensure that
0279bbbbc03f2c Dave Chinner      2021-06-03  1030  	 * the metadata writeback ordering cache flush is complete.
0279bbbbc03f2c Dave Chinner      2021-06-03  1031  	 */
0279bbbbc03f2c Dave Chinner      2021-06-03  1032  	wait_for_completion(&bdev_flush);
0279bbbbc03f2c Dave Chinner      2021-06-03  1033  
877cf3473914ae Dave Chinner      2021-06-04  1034  	/*
877cf3473914ae Dave Chinner      2021-06-04  1035  	 * The LSN we need to pass to the log items on transaction commit is the
877cf3473914ae Dave Chinner      2021-06-04  1036  	 * LSN reported by the first log vector write, not the commit lsn. If we
877cf3473914ae Dave Chinner      2021-06-04  1037  	 * use the commit record lsn then we can move the tail beyond the grant
877cf3473914ae Dave Chinner      2021-06-04  1038  	 * write head.
877cf3473914ae Dave Chinner      2021-06-04  1039  	 */
fc3370002b56bc Dave Chinner      2021-06-17  1040  	error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
a47518453bf958 Dave Chinner      2021-06-08  1041  				NULL, num_bytes);
a47518453bf958 Dave Chinner      2021-06-08  1042  
a47518453bf958 Dave Chinner      2021-06-08  1043  	/*
a47518453bf958 Dave Chinner      2021-06-08  1044  	 * Take the lvhdr back off the lv_chain as it should not be passed
a47518453bf958 Dave Chinner      2021-06-08  1045  	 * to log IO completion.
a47518453bf958 Dave Chinner      2021-06-08  1046  	 */
a47518453bf958 Dave Chinner      2021-06-08  1047  	list_del(&lvhdr.lv_list);
71e330b593905e Dave Chinner      2010-05-21  1048  	if (error)
7db37c5e6575b2 Dave Chinner      2011-01-27  1049  		goto out_abort_free_ticket;
71e330b593905e Dave Chinner      2010-05-21  1050  
71e330b593905e Dave Chinner      2010-05-21  1051  	/*
71e330b593905e Dave Chinner      2010-05-21  1052  	 * now that we've written the checkpoint into the log, strictly
71e330b593905e Dave Chinner      2010-05-21  1053  	 * order the commit records so replay will get them in the right order.
71e330b593905e Dave Chinner      2010-05-21  1054  	 */
71e330b593905e Dave Chinner      2010-05-21  1055  restart:
4bb928cdb900d0 Dave Chinner      2013-08-12  1056  	spin_lock(&cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1057  	list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
ac983517ec5941 Dave Chinner      2014-05-07  1058  		/*
ac983517ec5941 Dave Chinner      2014-05-07  1059  		 * Avoid getting stuck in this loop because we were woken by the
ac983517ec5941 Dave Chinner      2014-05-07  1060  		 * shutdown, but then went back to sleep once already in the
ac983517ec5941 Dave Chinner      2014-05-07  1061  		 * shutdown state.
ac983517ec5941 Dave Chinner      2014-05-07  1062  		 */
ac983517ec5941 Dave Chinner      2014-05-07  1063  		if (XLOG_FORCED_SHUTDOWN(log)) {
ac983517ec5941 Dave Chinner      2014-05-07  1064  			spin_unlock(&cil->xc_push_lock);
ac983517ec5941 Dave Chinner      2014-05-07  1065  			goto out_abort_free_ticket;
ac983517ec5941 Dave Chinner      2014-05-07  1066  		}
ac983517ec5941 Dave Chinner      2014-05-07  1067  
71e330b593905e Dave Chinner      2010-05-21  1068  		/*
71e330b593905e Dave Chinner      2010-05-21  1069  		 * Higher sequences will wait for this one so skip them.
ac983517ec5941 Dave Chinner      2014-05-07  1070  		 * Don't wait for our own sequence, either.
71e330b593905e Dave Chinner      2010-05-21  1071  		 */
71e330b593905e Dave Chinner      2010-05-21  1072  		if (new_ctx->sequence >= ctx->sequence)
71e330b593905e Dave Chinner      2010-05-21  1073  			continue;
71e330b593905e Dave Chinner      2010-05-21  1074  		if (!new_ctx->commit_lsn) {
71e330b593905e Dave Chinner      2010-05-21  1075  			/*
71e330b593905e Dave Chinner      2010-05-21  1076  			 * It is still being pushed! Wait for the push to
71e330b593905e Dave Chinner      2010-05-21  1077  			 * complete, then start again from the beginning.
71e330b593905e Dave Chinner      2010-05-21  1078  			 */
4bb928cdb900d0 Dave Chinner      2013-08-12  1079  			xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1080  			goto restart;
71e330b593905e Dave Chinner      2010-05-21  1081  		}
71e330b593905e Dave Chinner      2010-05-21  1082  	}
4bb928cdb900d0 Dave Chinner      2013-08-12  1083  	spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1084  
fc3370002b56bc Dave Chinner      2021-06-17  1085  	error = xlog_cil_write_commit_record(ctx, &commit_iclog);
dd401770b0ff68 Dave Chinner      2020-03-25  1086  	if (error)
dd401770b0ff68 Dave Chinner      2020-03-25  1087  		goto out_abort_free_ticket;
dd401770b0ff68 Dave Chinner      2020-03-25  1088  
89ae379d564c5d Christoph Hellwig 2019-06-28  1089  	spin_lock(&commit_iclog->ic_callback_lock);
1858bb0bec612d Christoph Hellwig 2019-10-14  1090  	if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
89ae379d564c5d Christoph Hellwig 2019-06-28  1091  		spin_unlock(&commit_iclog->ic_callback_lock);
e469cbe84f4ade Dave Chinner      2021-06-08  1092  		goto out_abort_free_ticket;
89ae379d564c5d Christoph Hellwig 2019-06-28  1093  	}
89ae379d564c5d Christoph Hellwig 2019-06-28  1094  	ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
89ae379d564c5d Christoph Hellwig 2019-06-28  1095  		      commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
89ae379d564c5d Christoph Hellwig 2019-06-28  1096  	list_add_tail(&ctx->iclog_entry, &commit_iclog->ic_callbacks);
89ae379d564c5d Christoph Hellwig 2019-06-28  1097  	spin_unlock(&commit_iclog->ic_callback_lock);
71e330b593905e Dave Chinner      2010-05-21  1098  
71e330b593905e Dave Chinner      2010-05-21  1099  	/*
71e330b593905e Dave Chinner      2010-05-21  1100  	 * now the checkpoint commit is complete and we've attached the
71e330b593905e Dave Chinner      2010-05-21  1101  	 * callbacks to the iclog we can assign the commit LSN to the context
71e330b593905e Dave Chinner      2010-05-21  1102  	 * and wake up anyone who is waiting for the commit to complete.
71e330b593905e Dave Chinner      2010-05-21  1103  	 */
4bb928cdb900d0 Dave Chinner      2013-08-12  1104  	spin_lock(&cil->xc_push_lock);
eb40a87500ac2f Dave Chinner      2010-12-21  1105  	wake_up_all(&cil->xc_commit_wait);
4bb928cdb900d0 Dave Chinner      2013-08-12  1106  	spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner      2010-05-21  1107  
e469cbe84f4ade Dave Chinner      2021-06-08  1108  	/*
e469cbe84f4ade Dave Chinner      2021-06-08  1109  	 * Pull the ticket off the ctx so we can ungrant it after releasing the
e469cbe84f4ade Dave Chinner      2021-06-08  1110  	 * commit_iclog. The ctx may be freed by the time we return from
e469cbe84f4ade Dave Chinner      2021-06-08  1111  	 * releasing the commit_iclog (i.e. checkpoint has been completed and
e469cbe84f4ade Dave Chinner      2021-06-08  1112  	 * callback run) so we can't reference the ctx after the call to
e469cbe84f4ade Dave Chinner      2021-06-08  1113  	 * xlog_state_release_iclog().
e469cbe84f4ade Dave Chinner      2021-06-08  1114  	 */
e469cbe84f4ade Dave Chinner      2021-06-08  1115  	ticket = ctx->ticket;
e469cbe84f4ade Dave Chinner      2021-06-08  1116  
5fd9256ce156ef Dave Chinner      2021-06-03  1117  	/*
815753dc16bbca Dave Chinner      2021-06-17  1118  	 * If the checkpoint spans multiple iclogs, wait for all previous iclogs
815753dc16bbca Dave Chinner      2021-06-17  1119  	 * to complete before we submit the commit_iclog. We can't use state
815753dc16bbca Dave Chinner      2021-06-17  1120  	 * checks for this - ACTIVE can be either a past completed iclog or a
815753dc16bbca Dave Chinner      2021-06-17  1121  	 * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a
815753dc16bbca Dave Chinner      2021-06-17  1122  	 * past or future iclog awaiting IO or ordered IO completion to be run.
815753dc16bbca Dave Chinner      2021-06-17  1123  	 * In the latter case, if it's a future iclog and we wait on it, then we
815753dc16bbca Dave Chinner      2021-06-17  1124  	 * will hang because it won't get processed through to ic_force_wait
815753dc16bbca Dave Chinner      2021-06-17  1125  	 * wakeup until this commit_iclog is written to disk.  Hence we use the
815753dc16bbca Dave Chinner      2021-06-17  1126  	 * iclog header lsn and compare it to the commit lsn to determine if we
815753dc16bbca Dave Chinner      2021-06-17  1127  	 * need to wait on iclogs or not.
5fd9256ce156ef Dave Chinner      2021-06-03  1128  	 */
5fd9256ce156ef Dave Chinner      2021-06-03  1129  	spin_lock(&log->l_icloglock);
cb1acb3f324636 Dave Chinner      2021-06-04 @1130  	if (ctx->start_lsn != commit_lsn) {
                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Never initialized.

815753dc16bbca Dave Chinner      2021-06-17  1131  		struct xlog_in_core	*iclog;
815753dc16bbca Dave Chinner      2021-06-17  1132  
815753dc16bbca Dave Chinner      2021-06-17  1133  		for (iclog = commit_iclog->ic_prev;
815753dc16bbca Dave Chinner      2021-06-17  1134  		     iclog != commit_iclog;
815753dc16bbca Dave Chinner      2021-06-17  1135  		     iclog = iclog->ic_prev) {
815753dc16bbca Dave Chinner      2021-06-17  1136  			xfs_lsn_t	hlsn;
815753dc16bbca Dave Chinner      2021-06-17  1137  
815753dc16bbca Dave Chinner      2021-06-17  1138  			/*
815753dc16bbca Dave Chinner      2021-06-17  1139  			 * If the LSN of the iclog is zero or in the future it
815753dc16bbca Dave Chinner      2021-06-17  1140  			 * means it has passed through IO completion and
815753dc16bbca Dave Chinner      2021-06-17  1141  			 * activation and hence all previous iclogs have also
815753dc16bbca Dave Chinner      2021-06-17  1142  			 * done so. We do not need to wait at all in this case.
815753dc16bbca Dave Chinner      2021-06-17  1143  			 */
815753dc16bbca Dave Chinner      2021-06-17  1144  			hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
815753dc16bbca Dave Chinner      2021-06-17  1145  			if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
815753dc16bbca Dave Chinner      2021-06-17  1146  				break;
815753dc16bbca Dave Chinner      2021-06-17  1147  
815753dc16bbca Dave Chinner      2021-06-17  1148  			/*
815753dc16bbca Dave Chinner      2021-06-17  1149  			 * If the LSN of the iclog is older than the commit lsn,
815753dc16bbca Dave Chinner      2021-06-17  1150  			 * we have to wait on it. Waiting on this via the
815753dc16bbca Dave Chinner      2021-06-17  1151  			 * ic_force_wait should also order the completion of all
815753dc16bbca Dave Chinner      2021-06-17  1152  			 * older iclogs, too, but we leave checking that to the
815753dc16bbca Dave Chinner      2021-06-17  1153  			 * next loop iteration.
815753dc16bbca Dave Chinner      2021-06-17  1154  			 */
815753dc16bbca Dave Chinner      2021-06-17  1155  			ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
815753dc16bbca Dave Chinner      2021-06-17  1156  			xlog_wait_on_iclog(iclog);
cb1acb3f324636 Dave Chinner      2021-06-04  1157  			spin_lock(&log->l_icloglock);
815753dc16bbca Dave Chinner      2021-06-17  1158  		}
815753dc16bbca Dave Chinner      2021-06-17  1159  
815753dc16bbca Dave Chinner      2021-06-17  1160  		/*
815753dc16bbca Dave Chinner      2021-06-17  1161  		 * Regardless of whether we need to wait or not, the
815753dc16bbca Dave Chinner      2021-06-17  1162  		 * commit_iclog write needs to issue a pre-flush so that the
815753dc16bbca Dave Chinner      2021-06-17  1163  		 * ordering for this checkpoint is correctly preserved down to
815753dc16bbca Dave Chinner      2021-06-17  1164  		 * stable storage.
815753dc16bbca Dave Chinner      2021-06-17  1165  		 */
cb1acb3f324636 Dave Chinner      2021-06-04  1166  		commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
5fd9256ce156ef Dave Chinner      2021-06-03  1167  	}
5fd9256ce156ef Dave Chinner      2021-06-03  1168  
cb1acb3f324636 Dave Chinner      2021-06-04  1169  	/*
cb1acb3f324636 Dave Chinner      2021-06-04  1170  	 * The commit iclog must be written to stable storage to guarantee
cb1acb3f324636 Dave Chinner      2021-06-04  1171  	 * journal IO vs metadata writeback IO is correctly ordered on stable
cb1acb3f324636 Dave Chinner      2021-06-04  1172  	 * storage.
e12213ba5d909a Dave Chinner      2021-06-04  1173  	 *
e12213ba5d909a Dave Chinner      2021-06-04  1174  	 * If the push caller needs the commit to be immediately stable and the
e12213ba5d909a Dave Chinner      2021-06-04  1175  	 * commit_iclog is not yet marked as XLOG_STATE_WANT_SYNC to indicate it
e12213ba5d909a Dave Chinner      2021-06-04  1176  	 * will be written when released, switch its state to WANT_SYNC right
e12213ba5d909a Dave Chinner      2021-06-04  1177  	 * now.
cb1acb3f324636 Dave Chinner      2021-06-04  1178  	 */
cb1acb3f324636 Dave Chinner      2021-06-04  1179  	commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
e12213ba5d909a Dave Chinner      2021-06-04  1180  	if (push_commit_stable && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
e12213ba5d909a Dave Chinner      2021-06-04  1181  		xlog_state_switch_iclogs(log, commit_iclog, 0);
e469cbe84f4ade Dave Chinner      2021-06-08  1182  	xlog_state_release_iclog(log, commit_iclog, ticket);
cb1acb3f324636 Dave Chinner      2021-06-04  1183  	spin_unlock(&log->l_icloglock);
e469cbe84f4ade Dave Chinner      2021-06-08  1184  
e469cbe84f4ade Dave Chinner      2021-06-08  1185  	xfs_log_ticket_ungrant(log, ticket);
c7cc296ddd1f6d Christoph Hellwig 2020-03-20  1186  	return;
71e330b593905e Dave Chinner      2010-05-21  1187  
71e330b593905e Dave Chinner      2010-05-21  1188  out_skip:
71e330b593905e Dave Chinner      2010-05-21  1189  	up_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner      2010-05-21  1190  	xfs_log_ticket_put(new_ctx->ticket);
71e330b593905e Dave Chinner      2010-05-21  1191  	kmem_free(new_ctx);
c7cc296ddd1f6d Christoph Hellwig 2020-03-20  1192  	return;
71e330b593905e Dave Chinner      2010-05-21  1193  
7db37c5e6575b2 Dave Chinner      2011-01-27  1194  out_abort_free_ticket:
877cf3473914ae Dave Chinner      2021-06-04  1195  	xfs_log_ticket_ungrant(log, ctx->ticket);
12e6a0f449d585 Christoph Hellwig 2020-03-20  1196  	ASSERT(XLOG_FORCED_SHUTDOWN(log));
12e6a0f449d585 Christoph Hellwig 2020-03-20  1197  	xlog_cil_committed(ctx);
4c2d542f2e7865 Dave Chinner      2012-04-23  1198  }
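
One possible way to address the warning, sketched in userspace C: compare
against the commit LSN held in the context instead of the local variable
that is no longer assigned. This assumes the commit record write stores the
LSN in ctx->commit_lsn (the field the ordering loop at line 1074 already
tests); whether that holds at this point in the series needs checking
against the patch itself. lsn_t, struct cil_ctx, write_commit_record() and
checkpoint_spans_iclogs() are hypothetical stand-ins, not the kernel
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t lsn_t;		/* stand-in for xfs_lsn_t */

/* Reduced, hypothetical model of struct xfs_cil_ctx. */
struct cil_ctx {
	lsn_t	start_lsn;	/* LSN of the first iclog of the checkpoint */
	lsn_t	commit_lsn;	/* LSN of the iclog holding the commit record */
};

/*
 * Stand-in for the commit record write. The assumption being illustrated
 * is that the write path records the commit LSN in the context, so the
 * caller has no separate local that can be left unset.
 */
static int write_commit_record(struct cil_ctx *ctx, lsn_t lsn)
{
	ctx->commit_lsn = lsn;
	return 0;
}

/* The check at line 1130, reading only state that has been initialised. */
static bool checkpoint_spans_iclogs(const struct cil_ctx *ctx)
{
	return ctx->start_lsn != ctx->commit_lsn;
}
```

In this model a checkpoint whose start and commit records land in the same
iclog reports false, and one whose commit record lands in a later iclog
reports true, without reading a stack variable that was never written.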

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


Thread overview: 50+ messages
2021-06-17  8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
2021-06-17  8:26 ` [PATCH 1/8] xfs: add iclog state trace events Dave Chinner
2021-06-17 16:45   ` Darrick J. Wong
2021-06-18 14:09   ` Christoph Hellwig
2021-06-17  8:26 ` [PATCH 2/8] xfs: don't wait on future iclogs when pushing the CIL Dave Chinner
2021-06-17 17:49   ` Darrick J. Wong
2021-06-17 21:55     ` Dave Chinner
2021-06-17  8:26 ` [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c Dave Chinner
2021-06-17 12:57   ` kernel test robot
2021-06-17 12:57     ` kernel test robot
2021-06-17 17:50   ` Darrick J. Wong
2021-06-17 21:56     ` Dave Chinner
2021-06-18 14:16   ` Christoph Hellwig
2021-06-17  8:26 ` [PATCH 4/8] xfs: pass a CIL context to xlog_write() Dave Chinner
2021-06-17 14:46   ` kernel test robot
2021-06-17 14:46     ` kernel test robot
2021-06-17 20:24   ` Darrick J. Wong
2021-06-17 22:03     ` Dave Chinner
2021-06-17 22:18       ` Darrick J. Wong
2021-06-18 14:23   ` Christoph Hellwig
2021-06-17  8:26 ` [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work() Dave Chinner
2021-06-17 19:59   ` Darrick J. Wong
2021-06-18 14:27     ` Christoph Hellwig
2021-06-18 22:34       ` Dave Chinner
2021-06-17  8:26 ` [PATCH 6/8] xfs: separate out setting CIL context LSNs from xlog_write Dave Chinner
2021-06-17 20:28   ` Darrick J. Wong
2021-06-17 22:10     ` Dave Chinner
2021-06-17  8:26 ` [PATCH 7/8] xfs: attached iclog callbacks in xlog_cil_set_ctx_write_state() Dave Chinner
2021-06-17 20:55   ` Darrick J. Wong
2021-06-17 22:20     ` Dave Chinner
2021-06-17  8:26 ` [PATCH 8/8] xfs: order CIL checkpoint start records Dave Chinner
2021-06-17 21:31   ` Darrick J. Wong
2021-06-17 22:49     ` Dave Chinner
2021-06-17 18:32 ` [PATCH 0/8 V2] xfs: log fixes for for-next Brian Foster
2021-06-17 19:05   ` Darrick J. Wong
2021-06-17 20:06     ` Brian Foster
2021-06-17 20:26       ` Darrick J. Wong
2021-06-17 23:31         ` Brian Foster
2021-06-17 23:43     ` Dave Chinner
2021-06-18 13:08       ` Brian Foster
2021-06-18 13:55         ` Christoph Hellwig
2021-06-18 14:02           ` Christoph Hellwig
2021-06-18 22:28           ` Dave Chinner
2021-06-18 22:15         ` Dave Chinner
2021-06-18 22:48 ` Dave Chinner
2021-06-19 20:22   ` Darrick J. Wong
2021-06-20 22:18     ` Dave Chinner
2021-06-26 23:10 [PATCH 4/8] xfs: pass a CIL context to xlog_write() kernel test robot
2021-06-28  8:58 ` Dan Carpenter
2021-06-28  8:58 ` Dan Carpenter
