* [PATCH 0/8 V2] xfs: log fixes for for-next
@ 2021-06-17 8:26 Dave Chinner
2021-06-17 8:26 ` [PATCH 1/8] xfs: add iclog state trace events Dave Chinner
` (9 more replies)
0 siblings, 10 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 8:26 UTC (permalink / raw)
To: linux-xfs
Hi folks,
This is a followup to the first set of log fixes for for-next that
were posted here:
https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
The first two patches of this series are updates for those patches,
change log below. The rest is the fix for the bigger issue we
uncovered in investigating the generic/019 failures, being that
we're triggering a zero-day bug in the way log recovery assigns LSNs
to checkpoints.
The "simple" fix of using the same ordering code as the commit
record for the start records in the CIL push turned into a lot of
patches once I started cleaning it up, separating out all the
different bits and finally realising all the things I needed to
change to avoid unintentional logic/behavioural changes. Hence
there's some code movement, some factoring, API changes to
xlog_write(), changing where we attach callbacks to commit iclogs so
they remain correctly ordered if there are multiple commit records
in the one iclog and then, finally, strictly ordering the start
records....
The original "simple fix" I tested last night ran almost a thousand
cycles of generic/019 without a log hang or recovery failure of any
kind. The refactored patchset has run a couple hundred cycles of
g/019 and g/475 over the last few hours without a failure, so I'm
posting this so we can get a review iteration done while I sleep so
we can - hopefully - get this sorted out before the end of the week.
Cheers,
Dave.
Version 2:
- tested on 5.13-rc6 + linux-xfs/for-next
- added strings for XLOG_STATE* variables to tracepoint output.
- rewrote the past/future iclog detection to use iclog header LSNs
rather than iclog states as the state values do not tell us anything
useful about the temporal relativity of the iclog in relation to
the current commit iclog.
- added patches to strictly order checkpoint start records the same
way we strictly order checkpoint commit records.
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH 1/8] xfs: add iclog state trace events
2021-06-17 8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
@ 2021-06-17 8:26 ` Dave Chinner
2021-06-17 16:45 ` Darrick J. Wong
2021-06-18 14:09 ` Christoph Hellwig
2021-06-17 8:26 ` [PATCH 2/8] xfs: don't wait on future iclogs when pushing the CIL Dave Chinner
` (8 subsequent siblings)
9 siblings, 2 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 8:26 UTC (permalink / raw)
To: linux-xfs
From: Dave Chinner <dchinner@redhat.com>
For the DEBUGS!
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log.c | 18 +++++++++++++
fs/xfs/xfs_log_priv.h | 10 ++++++++
fs/xfs/xfs_trace.h | 60 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 88 insertions(+)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index e921b554b683..54fd6a695bb5 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -524,6 +524,7 @@ __xlog_state_release_iclog(
iclog->ic_header.h_tail_lsn = cpu_to_be64(tail_lsn);
xlog_verify_tail_lsn(log, iclog, tail_lsn);
/* cycle incremented when incrementing curr_block */
+ trace_xlog_iclog_syncing(iclog, _RET_IP_);
return true;
}
@@ -543,6 +544,7 @@ xlog_state_release_iclog(
{
lockdep_assert_held(&log->l_icloglock);
+ trace_xlog_iclog_release(iclog, _RET_IP_);
if (iclog->ic_state == XLOG_STATE_IOERROR)
return -EIO;
@@ -804,6 +806,7 @@ xlog_wait_on_iclog(
{
struct xlog *log = iclog->ic_log;
+ trace_xlog_iclog_wait_on(iclog, _RET_IP_);
if (!XLOG_FORCED_SHUTDOWN(log) &&
iclog->ic_state != XLOG_STATE_ACTIVE &&
iclog->ic_state != XLOG_STATE_DIRTY) {
@@ -1804,6 +1807,7 @@ xlog_write_iclog(
unsigned int count)
{
ASSERT(bno < log->l_logBBsize);
+ trace_xlog_iclog_write(iclog, _RET_IP_);
/*
* We lock the iclogbufs here so that we can serialise against I/O
@@ -1950,6 +1954,7 @@ xlog_sync(
unsigned int size;
ASSERT(atomic_read(&iclog->ic_refcnt) == 0);
+ trace_xlog_iclog_sync(iclog, _RET_IP_);
count = xlog_calc_iclog_size(log, iclog, &roundoff);
@@ -2488,6 +2493,7 @@ xlog_state_activate_iclog(
int *iclogs_changed)
{
ASSERT(list_empty_careful(&iclog->ic_callbacks));
+ trace_xlog_iclog_activate(iclog, _RET_IP_);
/*
* If the number of ops in this iclog indicate it just contains the
@@ -2577,6 +2583,8 @@ xlog_state_clean_iclog(
{
int iclogs_changed = 0;
+ trace_xlog_iclog_clean(dirty_iclog, _RET_IP_);
+
dirty_iclog->ic_state = XLOG_STATE_DIRTY;
xlog_state_activate_iclogs(log, &iclogs_changed);
@@ -2636,6 +2644,7 @@ xlog_state_set_callback(
struct xlog_in_core *iclog,
xfs_lsn_t header_lsn)
{
+ trace_xlog_iclog_callback(iclog, _RET_IP_);
iclog->ic_state = XLOG_STATE_CALLBACK;
ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
@@ -2717,6 +2726,7 @@ xlog_state_do_iclog_callbacks(
__releases(&log->l_icloglock)
__acquires(&log->l_icloglock)
{
+ trace_xlog_iclog_callbacks_start(iclog, _RET_IP_);
spin_unlock(&log->l_icloglock);
spin_lock(&iclog->ic_callback_lock);
while (!list_empty(&iclog->ic_callbacks)) {
@@ -2736,6 +2746,7 @@ xlog_state_do_iclog_callbacks(
*/
spin_lock(&log->l_icloglock);
spin_unlock(&iclog->ic_callback_lock);
+ trace_xlog_iclog_callbacks_done(iclog, _RET_IP_);
}
STATIC void
@@ -2827,6 +2838,7 @@ xlog_state_done_syncing(
spin_lock(&log->l_icloglock);
ASSERT(atomic_read(&iclog->ic_refcnt) == 0);
+ trace_xlog_iclog_sync_done(iclog, _RET_IP_);
/*
* If we got an error, either on the first buffer, or in the case of
@@ -2899,6 +2911,8 @@ xlog_state_get_iclog_space(
atomic_inc(&iclog->ic_refcnt); /* prevents sync */
log_offset = iclog->ic_offset;
+ trace_xlog_iclog_get_space(iclog, _RET_IP_);
+
/* On the 1st write to an iclog, figure out lsn. This works
* if iclogs marked XLOG_STATE_WANT_SYNC always write out what they are
* committing to. If the offset is set, that's how many blocks
@@ -3056,6 +3070,7 @@ xlog_state_switch_iclogs(
{
ASSERT(iclog->ic_state == XLOG_STATE_ACTIVE);
assert_spin_locked(&log->l_icloglock);
+ trace_xlog_iclog_switch(iclog, _RET_IP_);
if (!eventual_size)
eventual_size = iclog->ic_offset;
@@ -3138,6 +3153,8 @@ xfs_log_force(
if (iclog->ic_state == XLOG_STATE_IOERROR)
goto out_error;
+ trace_xlog_iclog_force(iclog, _RET_IP_);
+
if (iclog->ic_state == XLOG_STATE_DIRTY ||
(iclog->ic_state == XLOG_STATE_ACTIVE &&
atomic_read(&iclog->ic_refcnt) == 0 && iclog->ic_offset == 0)) {
@@ -3225,6 +3242,7 @@ xlog_force_lsn(
goto out_error;
while (be64_to_cpu(iclog->ic_header.h_lsn) != lsn) {
+ trace_xlog_iclog_force_lsn(iclog, _RET_IP_);
iclog = iclog->ic_next;
if (iclog == log->l_iclog)
goto out_unlock;
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index e4e421a70335..330befd9f6be 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -50,6 +50,16 @@ enum xlog_iclog_state {
XLOG_STATE_IOERROR, /* IO error happened in sync'ing log */
};
+#define XLOG_STATE_STRINGS \
+ { XLOG_STATE_ACTIVE, "XLOG_STATE_ACTIVE" }, \
+ { XLOG_STATE_WANT_SYNC, "XLOG_STATE_WANT_SYNC" }, \
+ { XLOG_STATE_SYNCING, "XLOG_STATE_SYNCING" }, \
+ { XLOG_STATE_DONE_SYNC, "XLOG_STATE_DONE_SYNC" }, \
+ { XLOG_STATE_CALLBACK, "XLOG_STATE_CALLBACK" }, \
+ { XLOG_STATE_DIRTY, "XLOG_STATE_DIRTY" }, \
+ { XLOG_STATE_IOERROR, "XLOG_STATE_IOERROR" }
+
+
/*
* Log ticket flags
*/
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 71dca776c110..28d570742000 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -24,6 +24,7 @@ struct xlog_ticket;
struct xlog_recover;
struct xlog_recover_item;
struct xlog_rec_header;
+struct xlog_in_core;
struct xfs_buf_log_format;
struct xfs_inode_log_format;
struct xfs_bmbt_irec;
@@ -3927,6 +3928,65 @@ DEFINE_EVENT(xfs_icwalk_class, name, \
DEFINE_ICWALK_EVENT(xfs_ioc_free_eofblocks);
DEFINE_ICWALK_EVENT(xfs_blockgc_free_space);
+TRACE_DEFINE_ENUM(XLOG_STATE_ACTIVE);
+TRACE_DEFINE_ENUM(XLOG_STATE_WANT_SYNC);
+TRACE_DEFINE_ENUM(XLOG_STATE_SYNCING);
+TRACE_DEFINE_ENUM(XLOG_STATE_DONE_SYNC);
+TRACE_DEFINE_ENUM(XLOG_STATE_CALLBACK);
+TRACE_DEFINE_ENUM(XLOG_STATE_DIRTY);
+TRACE_DEFINE_ENUM(XLOG_STATE_IOERROR);
+
+DECLARE_EVENT_CLASS(xlog_iclog_class,
+ TP_PROTO(struct xlog_in_core *iclog, unsigned long caller_ip),
+ TP_ARGS(iclog, caller_ip),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(uint32_t, state)
+ __field(int32_t, refcount)
+ __field(uint32_t, offset)
+ __field(unsigned long long, lsn)
+ __field(unsigned long, caller_ip)
+ ),
+ TP_fast_assign(
+ __entry->dev = iclog->ic_log->l_mp->m_super->s_dev;
+ __entry->state = iclog->ic_state;
+ __entry->refcount = atomic_read(&iclog->ic_refcnt);
+ __entry->offset = iclog->ic_offset;
+ __entry->lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+ __entry->caller_ip = caller_ip;
+ ),
+ TP_printk("dev %d:%d state %s refcnt %d offset %u lsn 0x%llx caller %pS",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __print_symbolic(__entry->state, XLOG_STATE_STRINGS),
+ __entry->refcount,
+ __entry->offset,
+ __entry->lsn,
+ (char *)__entry->caller_ip)
+
+);
+
+#define DEFINE_ICLOG_EVENT(name) \
+DEFINE_EVENT(xlog_iclog_class, name, \
+ TP_PROTO(struct xlog_in_core *iclog, unsigned long caller_ip), \
+ TP_ARGS(iclog, caller_ip))
+
+DEFINE_ICLOG_EVENT(xlog_iclog_activate);
+DEFINE_ICLOG_EVENT(xlog_iclog_clean);
+DEFINE_ICLOG_EVENT(xlog_iclog_callback);
+DEFINE_ICLOG_EVENT(xlog_iclog_callbacks_start);
+DEFINE_ICLOG_EVENT(xlog_iclog_callbacks_done);
+DEFINE_ICLOG_EVENT(xlog_iclog_force);
+DEFINE_ICLOG_EVENT(xlog_iclog_force_lsn);
+DEFINE_ICLOG_EVENT(xlog_iclog_get_space);
+DEFINE_ICLOG_EVENT(xlog_iclog_release);
+DEFINE_ICLOG_EVENT(xlog_iclog_switch);
+DEFINE_ICLOG_EVENT(xlog_iclog_sync);
+DEFINE_ICLOG_EVENT(xlog_iclog_syncing);
+DEFINE_ICLOG_EVENT(xlog_iclog_sync_done);
+DEFINE_ICLOG_EVENT(xlog_iclog_want_sync);
+DEFINE_ICLOG_EVENT(xlog_iclog_wait_on);
+DEFINE_ICLOG_EVENT(xlog_iclog_write);
+
#endif /* _TRACE_XFS_H */
#undef TRACE_INCLUDE_PATH
--
2.31.1
^ permalink raw reply related [flat|nested] 50+ messages in thread
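The state-to-string table that patch 1/8 feeds to `__print_symbolic()` can be modelled in userspace as a plain value/name lookup. This is only a sketch: the `iclog_state` enum, `state_names` table and `iclog_state_name()` helper are hypothetical names invented for illustration, the enum ordering is assumed to match `enum xlog_iclog_state`, and the "(unknown)" fallback is this sketch's own choice rather than the tracing code's behaviour.

```c
#include <stddef.h>

/* Userspace stand-in for enum xlog_iclog_state; ordering is an assumption. */
enum iclog_state {
	STATE_ACTIVE,
	STATE_WANT_SYNC,
	STATE_SYNCING,
	STATE_DONE_SYNC,
	STATE_CALLBACK,
	STATE_DIRTY,
	STATE_IOERROR,
};

struct state_name {
	int		value;
	const char	*name;
};

/* Same shape as the XLOG_STATE_STRINGS value/name pairs in the patch. */
static const struct state_name state_names[] = {
	{ STATE_ACTIVE,		"XLOG_STATE_ACTIVE" },
	{ STATE_WANT_SYNC,	"XLOG_STATE_WANT_SYNC" },
	{ STATE_SYNCING,	"XLOG_STATE_SYNCING" },
	{ STATE_DONE_SYNC,	"XLOG_STATE_DONE_SYNC" },
	{ STATE_CALLBACK,	"XLOG_STATE_CALLBACK" },
	{ STATE_DIRTY,		"XLOG_STATE_DIRTY" },
	{ STATE_IOERROR,	"XLOG_STATE_IOERROR" },
};

/* Hypothetical helper: linear search of the table for a matching value. */
static const char *
iclog_state_name(int state)
{
	size_t	i;

	for (i = 0; i < sizeof(state_names) / sizeof(state_names[0]); i++) {
		if (state_names[i].value == state)
			return state_names[i].name;
	}
	return "(unknown)";	/* fallback chosen for this sketch only */
}
```

Keeping the pairs in one table means the trace output and the enum cannot drift apart silently as long as new states are added to both in the same patch.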
* [PATCH 2/8] xfs: don't wait on future iclogs when pushing the CIL
2021-06-17 8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
2021-06-17 8:26 ` [PATCH 1/8] xfs: add iclog state trace events Dave Chinner
@ 2021-06-17 8:26 ` Dave Chinner
2021-06-17 17:49 ` Darrick J. Wong
2021-06-17 8:26 ` [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c Dave Chinner
` (7 subsequent siblings)
9 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 8:26 UTC (permalink / raw)
To: linux-xfs
From: Dave Chinner <dchinner@redhat.com>
The iclogbuf ring attached to the struct xlog is circular, hence the
first and last iclogs in the ring can only be determined by
comparing them against the log->l_iclog pointer.
In xfs_cil_push_work(), we want to wait on previous iclogs that were
issued so that we can flush them to stable storage with the commit
record write, and it simply waits on the previous iclog in the ring.
This, however, leads to CIL push hangs in generic/019 like so:
task:kworker/u33:0 state:D stack:12680 pid: 7 ppid: 2 flags:0x00004000
Workqueue: xfs-cil/pmem1 xlog_cil_push_work
Call Trace:
__schedule+0x30b/0x9f0
schedule+0x68/0xe0
xlog_wait_on_iclog+0x121/0x190
? wake_up_q+0xa0/0xa0
xlog_cil_push_work+0x994/0xa10
? _raw_spin_lock+0x15/0x20
? xfs_swap_extents+0x920/0x920
process_one_work+0x1ab/0x390
worker_thread+0x56/0x3d0
? rescuer_thread+0x3c0/0x3c0
kthread+0x14d/0x170
? __kthread_bind_mask+0x70/0x70
ret_from_fork+0x1f/0x30
With other threads blocking in either xlog_state_get_iclog_space()
waiting for iclog space or xlog_grant_head_wait() waiting for log
reservation space.
The problem here is that the previous iclog on the ring might
actually be a future iclog. That is, if log->l_iclog points at
commit_iclog, commit_iclog is the first (oldest) iclog in the ring
and there are no previous iclogs pending as they have all completed
their IO and been activated again. IOWs, commit_iclog->ic_prev
points to an iclog that will be written in the future, not one that
has been written in the past.
Hence, in this case, waiting on the ->ic_prev iclog is incorrect
behaviour, and depending on the state of the future iclog, we can
end up with a circular ABA wait cycle and we hang.
The fix is made more complex by the fact that many iclog states
cannot be used to determine if the iclog is a past or future iclog.
Hence we have to determine past iclogs by checking the LSN of the
iclog rather than its state. A past ACTIVE iclog will have an LSN
of zero, while a future ACTIVE iclog will have an LSN greater than
the current iclog. We don't wait on either of these cases.
Similarly, a future iclog that hasn't completed IO will have an LSN
greater than the current iclog and so we don't wait on it. A past
iclog that is still undergoing IO completion will have an LSN less
than the current iclog and those are the only iclogs that we need to
wait on.
Hence we can use the iclog LSN to determine what iclogs we need to
wait on here.
Fixes: 5fd9256ce156 ("xfs: separate CIL commit record IO")
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log_cil.c | 51 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 45 insertions(+), 6 deletions(-)
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 705619e9dab4..2fb0ab02dda3 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -1075,15 +1075,54 @@ xlog_cil_push_work(
ticket = ctx->ticket;
/*
- * If the checkpoint spans multiple iclogs, wait for all previous
- * iclogs to complete before we submit the commit_iclog. In this case,
- * the commit_iclog write needs to issue a pre-flush so that the
- * ordering is correctly preserved down to stable storage.
+ * If the checkpoint spans multiple iclogs, wait for all previous iclogs
+ * to complete before we submit the commit_iclog. We can't use state
+ * checks for this - ACTIVE can be either a past completed iclog or a
+ * future iclog being filled, while WANT_SYNC through DONE_SYNC can be a
+ * past or future iclog awaiting IO or ordered IO completion to be run.
+ * In the latter case, if it's a future iclog and we wait on it, then we
+ * will hang because it won't get processed through to ic_force_wait
+ * wakeup until this commit_iclog is written to disk. Hence we use the
+ * iclog header lsn and compare it to the commit lsn to determine if we
+ * need to wait on iclogs or not.
*/
spin_lock(&log->l_icloglock);
if (ctx->start_lsn != commit_lsn) {
- xlog_wait_on_iclog(commit_iclog->ic_prev);
- spin_lock(&log->l_icloglock);
+ struct xlog_in_core *iclog;
+
+ for (iclog = commit_iclog->ic_prev;
+ iclog != commit_iclog;
+ iclog = iclog->ic_prev) {
+ xfs_lsn_t hlsn;
+
+ /*
+ * If the LSN of the iclog is zero or in the future it
+ * means it has passed through IO completion and
+ * activation and hence all previous iclogs have also
+ * done so. We do not need to wait at all in this case.
+ */
+ hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
+ if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
+ break;
+
+ /*
+ * If the LSN of the iclog is older than the commit lsn,
+ * we have to wait on it. Waiting on this via the
+ * ic_force_wait should also order the completion of all
+ * older iclogs, too, but we leave checking that to the
+ * next loop iteration.
+ */
+ ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
+ xlog_wait_on_iclog(iclog);
+ spin_lock(&log->l_icloglock);
+ }
+
+ /*
+ * Regardless of whether we need to wait or not, the
+ * commit_iclog write needs to issue a pre-flush so that the
+ * ordering for this checkpoint is correctly preserved down to
+ * stable storage.
+ */
commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
}
--
2.31.1
^ permalink raw reply related [flat|nested] 50+ messages in thread
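The past/future iclog test described in patch 2/8 can be modelled in userspace as below. This is a sketch under stated assumptions, not the kernel code: `lsn_t`, `make_lsn()`, `lsn_cmp()` and `must_wait_on_iclog()` are hypothetical names, and the LSN is assumed to carry the cycle number in the upper 32 bits and the block number in the lower 32 bits. The patch asserts that a non-zero, non-future header LSN is strictly older than the commit LSN; the sketch simply returns false for the equal case instead.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t lsn_t;

/* Assumed LSN layout: cycle in the high 32 bits, block in the low 32. */
static uint32_t lsn_cycle(lsn_t lsn) { return (uint32_t)((uint64_t)lsn >> 32); }
static uint32_t lsn_block(lsn_t lsn) { return (uint32_t)lsn; }

static lsn_t
make_lsn(uint32_t cycle, uint32_t block)
{
	return (lsn_t)(((uint64_t)cycle << 32) | block);
}

/* Compare by cycle first, then by block; returns <0, 0 or >0. */
static int
lsn_cmp(lsn_t a, lsn_t b)
{
	if (lsn_cycle(a) != lsn_cycle(b))
		return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
	if (lsn_block(a) != lsn_block(b))
		return lsn_block(a) < lsn_block(b) ? -1 : 1;
	return 0;
}

/*
 * Decide whether a CIL push has to wait on an iclog it found while
 * walking backwards from the commit iclog:
 *  - header LSN of zero: past iclog that was already reactivated, no wait
 *  - header LSN newer than the commit LSN: future iclog, no wait
 *  - header LSN older than the commit LSN: past iclog still in flight
 */
static bool
must_wait_on_iclog(lsn_t hlsn, lsn_t commit_lsn)
{
	if (hlsn == 0)
		return false;
	if (lsn_cmp(hlsn, commit_lsn) > 0)
		return false;
	return lsn_cmp(hlsn, commit_lsn) < 0;
}
```

In the patch the first "no wait" answer also terminates the backwards walk, because every iclog further back must have completed as well.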
* [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c
2021-06-17 8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
2021-06-17 8:26 ` [PATCH 1/8] xfs: add iclog state trace events Dave Chinner
2021-06-17 8:26 ` [PATCH 2/8] xfs: don't wait on future iclogs when pushing the CIL Dave Chinner
@ 2021-06-17 8:26 ` Dave Chinner
2021-06-17 12:57 ` kernel test robot
` (2 more replies)
2021-06-17 8:26 ` [PATCH 4/8] xfs: pass a CIL context to xlog_write() Dave Chinner
` (6 subsequent siblings)
9 siblings, 3 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 8:26 UTC (permalink / raw)
To: linux-xfs
From: Dave Chinner <dchinner@redhat.com>
It is only used by the CIL checkpoints, and is the counterpart to
start record formatting and writing that is already local to
xfs_log_cil.c.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log.c | 41 ---------------------------------------
fs/xfs/xfs_log_cil.c | 45 ++++++++++++++++++++++++++++++++++++++++++-
fs/xfs/xfs_log_priv.h | 2 --
3 files changed, 44 insertions(+), 44 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 54fd6a695bb5..cf661c155786 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1563,47 +1563,6 @@ xlog_alloc_log(
return ERR_PTR(error);
} /* xlog_alloc_log */
-/*
- * Write out the commit record of a transaction associated with the given
- * ticket to close off a running log write. Return the lsn of the commit record.
- */
-int
-xlog_commit_record(
- struct xlog *log,
- struct xlog_ticket *ticket,
- struct xlog_in_core **iclog,
- xfs_lsn_t *lsn)
-{
- struct xlog_op_header ophdr = {
- .oh_clientid = XFS_TRANSACTION,
- .oh_tid = cpu_to_be32(ticket->t_tid),
- .oh_flags = XLOG_COMMIT_TRANS,
- };
- struct xfs_log_iovec reg = {
- .i_addr = &ophdr,
- .i_len = sizeof(struct xlog_op_header),
- .i_type = XLOG_REG_TYPE_COMMIT,
- };
- struct xfs_log_vec vec = {
- .lv_niovecs = 1,
- .lv_iovecp = &reg,
- };
- int error;
- LIST_HEAD(lv_chain);
- INIT_LIST_HEAD(&vec.lv_list);
- list_add(&vec.lv_list, &lv_chain);
-
- if (XLOG_FORCED_SHUTDOWN(log))
- return -EIO;
-
- /* account for space used by record data */
- ticket->t_curr_res -= reg.i_len;
- error = xlog_write(log, &lv_chain, ticket, lsn, iclog, reg.i_len);
- if (error)
- xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
- return error;
-}
-
/*
* Compute the LSN that we'd need to push the log tail towards in order to have
* (a) enough on-disk log space to log the number of bytes specified, (b) at
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 2fb0ab02dda3..2c8b25888c53 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -783,6 +783,48 @@ xlog_cil_build_trans_hdr(
tic->t_curr_res -= lvhdr->lv_bytes;
}
+/*
+ * Write out the commit record of a checkpoint transaction associated with the
+ * given ticket to close off a running log write. Return the lsn of the commit
+ * record.
+ */
+int
+xlog_cil_write_commit_record(
+ struct xlog *log,
+ struct xlog_ticket *ticket,
+ struct xlog_in_core **iclog,
+ xfs_lsn_t *lsn)
+{
+ struct xlog_op_header ophdr = {
+ .oh_clientid = XFS_TRANSACTION,
+ .oh_tid = cpu_to_be32(ticket->t_tid),
+ .oh_flags = XLOG_COMMIT_TRANS,
+ };
+ struct xfs_log_iovec reg = {
+ .i_addr = &ophdr,
+ .i_len = sizeof(struct xlog_op_header),
+ .i_type = XLOG_REG_TYPE_COMMIT,
+ };
+ struct xfs_log_vec vec = {
+ .lv_niovecs = 1,
+ .lv_iovecp = &reg,
+ };
+ int error;
+ LIST_HEAD(lv_chain);
+ INIT_LIST_HEAD(&vec.lv_list);
+ list_add(&vec.lv_list, &lv_chain);
+
+ if (XLOG_FORCED_SHUTDOWN(log))
+ return -EIO;
+
+ /* account for space used by record data */
+ ticket->t_curr_res -= reg.i_len;
+ error = xlog_write(log, &lv_chain, ticket, lsn, iclog, reg.i_len);
+ if (error)
+ xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
+ return error;
+}
+
/*
* CIL item reordering compare function. We want to order in ascending ID order,
* but we want to leave items with the same ID in the order they were added to
@@ -1041,7 +1083,8 @@ xlog_cil_push_work(
}
spin_unlock(&cil->xc_push_lock);
- error = xlog_commit_record(log, ctx->ticket, &commit_iclog, &commit_lsn);
+ error = xlog_cil_write_commit_record(log, ctx->ticket, &commit_iclog,
+ &commit_lsn);
if (error)
goto out_abort_free_ticket;
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 330befd9f6be..26f26769d1c6 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -490,8 +490,6 @@ void xlog_print_trans(struct xfs_trans *);
int xlog_write(struct xlog *log, struct list_head *lv_chain,
struct xlog_ticket *tic, xfs_lsn_t *start_lsn,
struct xlog_in_core **commit_iclog, uint32_t len);
-int xlog_commit_record(struct xlog *log, struct xlog_ticket *ticket,
- struct xlog_in_core **iclog, xfs_lsn_t *lsn);
void xfs_log_ticket_ungrant(struct xlog *log, struct xlog_ticket *ticket);
void xfs_log_ticket_regrant(struct xlog *log, struct xlog_ticket *ticket);
--
2.31.1
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 4/8] xfs: pass a CIL context to xlog_write()
2021-06-17 8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
` (2 preceding siblings ...)
2021-06-17 8:26 ` [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c Dave Chinner
@ 2021-06-17 8:26 ` Dave Chinner
2021-06-17 14:46 ` kernel test robot
` (2 more replies)
2021-06-17 8:26 ` [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work() Dave Chinner
` (5 subsequent siblings)
9 siblings, 3 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 8:26 UTC (permalink / raw)
To: linux-xfs
From: Dave Chinner <dchinner@redhat.com>
Pass the CIL context to xlog_write() rather than a pointer to a LSN
variable. Only the CIL checkpoint calls to xlog_write() need to know
about the start LSN of the writes, so rework xlog_write to directly
write the LSNs into the CIL context structure.
This removes the commit_lsn variable from xlog_cil_push_work(), so
now we only have to issue the commit record ordering wakeup from
there.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log.c | 22 +++++++++++++++++-----
fs/xfs/xfs_log_cil.c | 19 ++++++++-----------
fs/xfs/xfs_log_priv.h | 4 ++--
3 files changed, 27 insertions(+), 18 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index cf661c155786..fc0e43c57683 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -871,7 +871,7 @@ xlog_write_unmount_record(
*/
if (log->l_targ != log->l_mp->m_ddev_targp)
blkdev_issue_flush(log->l_targ->bt_bdev);
- return xlog_write(log, &lv_chain, ticket, NULL, NULL, reg.i_len);
+ return xlog_write(log, NULL, &lv_chain, ticket, NULL, reg.i_len);
}
/*
@@ -2383,9 +2383,9 @@ xlog_write_partial(
int
xlog_write(
struct xlog *log,
+ struct xfs_cil_ctx *ctx,
struct list_head *lv_chain,
struct xlog_ticket *ticket,
- xfs_lsn_t *start_lsn,
struct xlog_in_core **commit_iclog,
uint32_t len)
{
@@ -2408,9 +2408,21 @@ xlog_write(
if (error)
return error;
- /* start_lsn is the LSN of the first iclog written to. */
- if (start_lsn)
- *start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+ /*
+ * If we have a CIL context, record the LSN of the iclog we were just
+ * granted space to start writing into. If the context doesn't have
+ * a start_lsn recorded, then this iclog will contain the start record
+ * for the checkpoint. Otherwise this write contains the commit record
+ * for the checkpoint.
+ */
+ if (ctx) {
+ spin_lock(&ctx->cil->xc_push_lock);
+ if (!ctx->start_lsn)
+ ctx->start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+ else
+ ctx->commit_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+ spin_unlock(&ctx->cil->xc_push_lock);
+ }
lv = list_first_entry_or_null(lv_chain, struct xfs_log_vec, lv_list);
while (lv) {
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 2c8b25888c53..35fc3e57d870 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -790,14 +790,13 @@ xlog_cil_build_trans_hdr(
*/
int
xlog_cil_write_commit_record(
- struct xlog *log,
- struct xlog_ticket *ticket,
- struct xlog_in_core **iclog,
- xfs_lsn_t *lsn)
+ struct xfs_cil_ctx *ctx,
+ struct xlog_in_core **iclog)
{
+ struct xlog *log = ctx->cil->xc_log;
struct xlog_op_header ophdr = {
.oh_clientid = XFS_TRANSACTION,
- .oh_tid = cpu_to_be32(ticket->t_tid),
+ .oh_tid = cpu_to_be32(ctx->ticket->t_tid),
.oh_flags = XLOG_COMMIT_TRANS,
};
struct xfs_log_iovec reg = {
@@ -818,8 +817,8 @@ xlog_cil_write_commit_record(
return -EIO;
/* account for space used by record data */
- ticket->t_curr_res -= reg.i_len;
- error = xlog_write(log, &lv_chain, ticket, lsn, iclog, reg.i_len);
+ ctx->ticket->t_curr_res -= reg.i_len;
+ error = xlog_write(log, ctx, &lv_chain, ctx->ticket, iclog, reg.i_len);
if (error)
xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
return error;
@@ -1038,7 +1037,7 @@ xlog_cil_push_work(
* use the commit record lsn then we can move the tail beyond the grant
* write head.
*/
- error = xlog_write(log, &ctx->lv_chain, ctx->ticket, &ctx->start_lsn,
+ error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
NULL, num_bytes);
/*
@@ -1083,8 +1082,7 @@ xlog_cil_push_work(
}
spin_unlock(&cil->xc_push_lock);
- error = xlog_cil_write_commit_record(log, ctx->ticket, &commit_iclog,
- &commit_lsn);
+ error = xlog_cil_write_commit_record(ctx, &commit_iclog);
if (error)
goto out_abort_free_ticket;
@@ -1104,7 +1102,6 @@ xlog_cil_push_work(
* and wake up anyone who is waiting for the commit to complete.
*/
spin_lock(&cil->xc_push_lock);
- ctx->commit_lsn = commit_lsn;
wake_up_all(&cil->xc_commit_wait);
spin_unlock(&cil->xc_push_lock);
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 26f26769d1c6..af8a9dfa8068 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -487,8 +487,8 @@ xlog_write_adv_cnt(void **ptr, int *len, int *off, size_t bytes)
void xlog_print_tic_res(struct xfs_mount *mp, struct xlog_ticket *ticket);
void xlog_print_trans(struct xfs_trans *);
-int xlog_write(struct xlog *log, struct list_head *lv_chain,
- struct xlog_ticket *tic, xfs_lsn_t *start_lsn,
+int xlog_write(struct xlog *log, struct xfs_cil_ctx *ctx,
+ struct list_head *lv_chain, struct xlog_ticket *tic,
struct xlog_in_core **commit_iclog, uint32_t len);
void xfs_log_ticket_ungrant(struct xlog *log, struct xlog_ticket *ticket);
--
2.31.1
^ permalink raw reply related [flat|nested] 50+ messages in thread
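The way patch 4/8 has xlog_write() record LSNs in the CIL context can be sketched in userspace as follows. The `cil_ctx_model` structure and `ctx_record_write()` helper are hypothetical stand-ins, and the `xc_push_lock` locking from the patch is deliberately left out, so this only illustrates the "first write sets start_lsn, second write sets commit_lsn" rule.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the two LSN fields of struct xfs_cil_ctx. */
struct cil_ctx_model {
	int64_t	start_lsn;	/* zero until the start record is written */
	int64_t	commit_lsn;	/* zero until the commit record is written */
};

/*
 * Record the LSN of the iclog a write was granted space in. A NULL
 * context models callers that are not CIL checkpoints, such as the
 * unmount record write, and is ignored.
 */
static void
ctx_record_write(struct cil_ctx_model *ctx, int64_t iclog_lsn)
{
	if (ctx == NULL)
		return;
	if (!ctx->start_lsn)
		ctx->start_lsn = iclog_lsn;	/* first write: start record */
	else
		ctx->commit_lsn = iclog_lsn;	/* later write: commit record */
}
```

The rule depends on a valid iclog header LSN never being zero, which is why a zero `start_lsn` can double as the "not yet written" marker.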
* [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work()
2021-06-17 8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
` (3 preceding siblings ...)
2021-06-17 8:26 ` [PATCH 4/8] xfs: pass a CIL context to xlog_write() Dave Chinner
@ 2021-06-17 8:26 ` Dave Chinner
2021-06-17 19:59 ` Darrick J. Wong
2021-06-17 8:26 ` [PATCH 6/8] xfs: separate out setting CIL context LSNs from xlog_write Dave Chinner
` (4 subsequent siblings)
9 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 8:26 UTC (permalink / raw)
To: linux-xfs
From: Dave Chinner <dchinner@redhat.com>
So we can use it for start record ordering as well as commit record
ordering in future.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log_cil.c | 89 ++++++++++++++++++++++++++------------------
1 file changed, 52 insertions(+), 37 deletions(-)
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 35fc3e57d870..f993ec69fc97 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -784,9 +784,54 @@ xlog_cil_build_trans_hdr(
}
/*
- * Write out the commit record of a checkpoint transaction associated with the
- * given ticket to close off a running log write. Return the lsn of the commit
- * record.
+ * Ensure that the order of log writes follows checkpoint sequence order. This
+ * relies on the context LSN being zero until the log write has guaranteed the
+ * LSN that the log write will start at via xlog_state_get_iclog_space().
+ */
+static int
+xlog_cil_order_write(
+ struct xfs_cil *cil,
+ xfs_csn_t sequence)
+{
+ struct xfs_cil_ctx *ctx;
+
+restart:
+ spin_lock(&cil->xc_push_lock);
+ list_for_each_entry(ctx, &cil->xc_committing, committing) {
+ /*
+ * Avoid getting stuck in this loop because we were woken by the
+ * shutdown, but then went back to sleep once already in the
+ * shutdown state.
+ */
+ if (XLOG_FORCED_SHUTDOWN(cil->xc_log)) {
+ spin_unlock(&cil->xc_push_lock);
+ return -EIO;
+ }
+
+ /*
+ * Higher sequences will wait for this one so skip them.
+ * Don't wait for our own sequence, either.
+ */
+ if (ctx->sequence >= sequence)
+ continue;
+ if (!ctx->commit_lsn) {
+ /*
+ * It is still being pushed! Wait for the push to
+ * complete, then start again from the beginning.
+ */
+ xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
+ goto restart;
+ }
+ }
+ spin_unlock(&cil->xc_push_lock);
+ return 0;
+}
+
+/*
+ * Write out the commit record of a checkpoint transaction to close off a
+ * running log write. These commit records are strictly ordered in ascending CIL
+ * sequence order so that log recovery will always replay the checkpoints in the
+ * correct order.
*/
int
xlog_cil_write_commit_record(
@@ -816,6 +861,10 @@ xlog_cil_write_commit_record(
if (XLOG_FORCED_SHUTDOWN(log))
return -EIO;
+ error = xlog_cil_order_write(ctx->cil, ctx->sequence);
+ if (error)
+ return error;
+
/* account for space used by record data */
ctx->ticket->t_curr_res -= reg.i_len;
error = xlog_write(log, ctx, &lv_chain, ctx->ticket, iclog, reg.i_len);
@@ -1048,40 +1097,6 @@ xlog_cil_push_work(
if (error)
goto out_abort_free_ticket;
- /*
- * now that we've written the checkpoint into the log, strictly
- * order the commit records so replay will get them in the right order.
- */
-restart:
- spin_lock(&cil->xc_push_lock);
- list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
- /*
- * Avoid getting stuck in this loop because we were woken by the
- * shutdown, but then went back to sleep once already in the
- * shutdown state.
- */
- if (XLOG_FORCED_SHUTDOWN(log)) {
- spin_unlock(&cil->xc_push_lock);
- goto out_abort_free_ticket;
- }
-
- /*
- * Higher sequences will wait for this one so skip them.
- * Don't wait for our own sequence, either.
- */
- if (new_ctx->sequence >= ctx->sequence)
- continue;
- if (!new_ctx->commit_lsn) {
- /*
- * It is still being pushed! Wait for the push to
- * complete, then start again from the beginning.
- */
- xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
- goto restart;
- }
- }
- spin_unlock(&cil->xc_push_lock);
-
error = xlog_cil_write_commit_record(ctx, &commit_iclog);
if (error)
goto out_abort_free_ticket;
--
2.31.1
^ permalink raw reply related [flat|nested] 50+ messages in thread
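The ordering check that patch 5/8 factors out into xlog_cil_order_write() boils down to one question: does any checkpoint with a lower sequence number still lack a commit LSN? A minimal userspace sketch of that predicate is below. The `ctx_model` structure and `must_wait_for_older()` helper are hypothetical, an array stands in for the `xc_committing` list, and the sleep/restart loop, locking and shutdown check of the real function are omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the fields of struct xfs_cil_ctx used by the ordering check. */
struct ctx_model {
	uint64_t	sequence;	/* checkpoint sequence number */
	int64_t		commit_lsn;	/* zero while the push is in progress */
};

/*
 * Return true if the checkpoint with the given sequence must wait
 * before writing its commit record, i.e. an older checkpoint on the
 * committing list has not recorded its commit LSN yet.
 */
static bool
must_wait_for_older(const struct ctx_model *committing, size_t nr,
		    uint64_t sequence)
{
	size_t	i;

	for (i = 0; i < nr; i++) {
		/* Newer checkpoints, and our own, wait for us instead. */
		if (committing[i].sequence >= sequence)
			continue;
		if (!committing[i].commit_lsn)
			return true;
	}
	return false;
}
```

In the kernel function a true answer leads to sleeping on `xc_commit_wait` and rescanning the list from the start, since the list may have changed while the lock was dropped.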
* [PATCH 6/8] xfs: separate out setting CIL context LSNs from xlog_write
2021-06-17 8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
` (4 preceding siblings ...)
2021-06-17 8:26 ` [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work() Dave Chinner
@ 2021-06-17 8:26 ` Dave Chinner
2021-06-17 20:28 ` Darrick J. Wong
2021-06-17 8:26 ` [PATCH 7/8] xfs: attached iclog callbacks in xlog_cil_set_ctx_write_state() Dave Chinner
` (3 subsequent siblings)
9 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 8:26 UTC (permalink / raw)
To: linux-xfs
From: Dave Chinner <dchinner@redhat.com>
In preparation for moving more CIL context specific functionality
into these operations.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log.c | 17 ++---------------
fs/xfs/xfs_log_cil.c | 23 +++++++++++++++++++++++
fs/xfs/xfs_log_priv.h | 2 ++
3 files changed, 27 insertions(+), 15 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index fc0e43c57683..1c214b395223 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2408,21 +2408,8 @@ xlog_write(
if (error)
return error;
- /*
- * If we have a CIL context, record the LSN of the iclog we were just
- * granted space to start writing into. If the context doesn't have
- * a start_lsn recorded, then this iclog will contain the start record
- * for the checkpoint. Otherwise this write contains the commit record
- * for the checkpoint.
- */
- if (ctx) {
- spin_lock(&ctx->cil->xc_push_lock);
- if (!ctx->start_lsn)
- ctx->start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
- else
- ctx->commit_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
- spin_unlock(&ctx->cil->xc_push_lock);
- }
+ if (ctx)
+ xlog_cil_set_ctx_write_state(ctx, iclog);
lv = list_first_entry_or_null(lv_chain, struct xfs_log_vec, lv_list);
while (lv) {
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index f993ec69fc97..2d8d904ffb78 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -783,6 +783,29 @@ xlog_cil_build_trans_hdr(
tic->t_curr_res -= lvhdr->lv_bytes;
}
+/*
+ * Record the LSN of the iclog we were just granted space to start writing into.
+ * If the context doesn't have a start_lsn recorded, then this iclog will
+ * contain the start record for the checkpoint. Otherwise this write contains
+ * the commit record for the checkpoint.
+ */
+void
+xlog_cil_set_ctx_write_state(
+ struct xfs_cil_ctx *ctx,
+ struct xlog_in_core *iclog)
+{
+ struct xfs_cil *cil = ctx->cil;
+ xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+
+ ASSERT(!ctx->commit_lsn);
+ spin_lock(&cil->xc_push_lock);
+ if (!ctx->start_lsn)
+ ctx->start_lsn = lsn;
+ else
+ ctx->commit_lsn = lsn;
+ spin_unlock(&cil->xc_push_lock);
+}
+
/*
* Ensure that the order of log writes follows checkpoint sequence order. This
* relies on the context LSN being zero until the log write has guaranteed the
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index af8a9dfa8068..849ba2eb3483 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -563,6 +563,8 @@ void xlog_cil_destroy(struct xlog *log);
bool xlog_cil_empty(struct xlog *log);
void xlog_cil_commit(struct xlog *log, struct xfs_trans *tp,
xfs_csn_t *commit_seq, bool regrant);
+void xlog_cil_set_ctx_write_state(struct xfs_cil_ctx *ctx,
+ struct xlog_in_core *iclog);
/*
* CIL force routines
--
2.31.1
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 7/8] xfs: attached iclog callbacks in xlog_cil_set_ctx_write_state()
2021-06-17 8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
` (5 preceding siblings ...)
2021-06-17 8:26 ` [PATCH 6/8] xfs: separate out setting CIL context LSNs from xlog_write Dave Chinner
@ 2021-06-17 8:26 ` Dave Chinner
2021-06-17 20:55 ` Darrick J. Wong
2021-06-17 8:26 ` [PATCH 8/8] xfs: order CIL checkpoint start records Dave Chinner
` (2 subsequent siblings)
9 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 8:26 UTC (permalink / raw)
To: linux-xfs
From: Dave Chinner <dchinner@redhat.com>
We currently attach iclog callbacks for the CIL when the commit
iclog is returned from xlog_write. Because
xlog_state_get_iclog_space() always guarantees that the commit
record will fit in the iclog it returns, we can move this IO
callback setting to xlog_cil_set_ctx_write_state(), record the
commit iclog in the context and remove the need for the commit iclog
to be returned by xlog_write() altogether.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
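Note for readers: the reference handling described above can be
sketched in userspace as below. It is a simplified model under stated
assumptions, not the kernel implementation: the model_* names are
hypothetical, only the reference count is modelled, and "submitted"
stands in for the iclog being released for IO once the last reference
is dropped. The real release path also depends on the iclog state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xlog_in_core. */
struct model_iclog {
	atomic_int	refcnt;
	bool		submitted;	/* set when the last reference goes away */
};

struct model_ctx {
	struct model_iclog	*commit_iclog;
};

/* Drop one reference; the iclog is only submitted when none remain. */
static void
model_release_iclog(
	struct model_iclog	*iclog)
{
	if (atomic_fetch_sub(&iclog->refcnt, 1) == 1)
		iclog->submitted = true;
}

/* Take a reference on behalf of the context and remember the iclog. */
static void
model_ctx_hold_commit_iclog(
	struct model_ctx	*ctx,
	struct model_iclog	*iclog)
{
	atomic_fetch_add(&iclog->refcnt, 1);
	ctx->commit_iclog = iclog;
}

/*
 * Models the writer: it holds its own reference for the duration of the
 * write and always drops it at the end, so it no longer needs to hand
 * the iclog back to the caller.
 */
static void
model_write_commit_record(
	struct model_ctx	*ctx,
	struct model_iclog	*iclog)
{
	atomic_fetch_add(&iclog->refcnt, 1);	/* writer's reference */
	model_ctx_hold_commit_iclog(ctx, iclog);
	model_release_iclog(iclog);		/* writer is finished */
}
```

Because the context holds its own reference, the writer dropping its
reference does not submit the iclog; that only happens when the context
releases ctx->commit_iclog later in the push.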
fs/xfs/xfs_log.c | 8 ++----
fs/xfs/xfs_log_cil.c | 65 +++++++++++++++++++++++++------------------
fs/xfs/xfs_log_priv.h | 3 +-
3 files changed, 42 insertions(+), 34 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 1c214b395223..359246d54db7 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -871,7 +871,7 @@ xlog_write_unmount_record(
*/
if (log->l_targ != log->l_mp->m_ddev_targp)
blkdev_issue_flush(log->l_targ->bt_bdev);
- return xlog_write(log, NULL, &lv_chain, ticket, NULL, reg.i_len);
+ return xlog_write(log, NULL, &lv_chain, ticket, reg.i_len);
}
/*
@@ -2386,7 +2386,6 @@ xlog_write(
struct xfs_cil_ctx *ctx,
struct list_head *lv_chain,
struct xlog_ticket *ticket,
- struct xlog_in_core **commit_iclog,
uint32_t len)
{
struct xlog_in_core *iclog = NULL;
@@ -2436,10 +2435,7 @@ xlog_write(
*/
spin_lock(&log->l_icloglock);
xlog_state_finish_copy(log, iclog, record_cnt, 0);
- if (commit_iclog)
- *commit_iclog = iclog;
- else
- error = xlog_state_release_iclog(log, iclog, ticket);
+ error = xlog_state_release_iclog(log, iclog, ticket);
spin_unlock(&log->l_icloglock);
return error;
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 2d8d904ffb78..87e30917ce2e 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -799,11 +799,34 @@ xlog_cil_set_ctx_write_state(
ASSERT(!ctx->commit_lsn);
spin_lock(&cil->xc_push_lock);
- if (!ctx->start_lsn)
+ if (!ctx->start_lsn) {
ctx->start_lsn = lsn;
- else
- ctx->commit_lsn = lsn;
+ spin_unlock(&cil->xc_push_lock);
+ return;
+ }
+
+ /*
+ * Take a reference to the iclog for the context so that we still hold
+ * it when xlog_write is done and has released it. This means the
+ * context controls when the iclog is released for IO.
+ */
+ atomic_inc(&iclog->ic_refcnt);
+ ctx->commit_iclog = iclog;
+ ctx->commit_lsn = lsn;
spin_unlock(&cil->xc_push_lock);
+
+ /*
+ * xlog_state_get_iclog_space() guarantees there is enough space in the
+ * iclog for an entire commit record, so attach the context callbacks to
+ * the iclog at this time if we are not already in a shutdown state.
+ */
+ spin_lock(&iclog->ic_callback_lock);
+ if (iclog->ic_state == XLOG_STATE_IOERROR) {
+ spin_unlock(&iclog->ic_callback_lock);
+ return;
+ }
+ list_add_tail(&ctx->iclog_entry, &iclog->ic_callbacks);
+ spin_unlock(&iclog->ic_callback_lock);
}
/*
@@ -858,8 +881,7 @@ xlog_cil_order_write(
*/
int
xlog_cil_write_commit_record(
- struct xfs_cil_ctx *ctx,
- struct xlog_in_core **iclog)
+ struct xfs_cil_ctx *ctx)
{
struct xlog *log = ctx->cil->xc_log;
struct xlog_op_header ophdr = {
@@ -890,7 +912,7 @@ xlog_cil_write_commit_record(
/* account for space used by record data */
ctx->ticket->t_curr_res -= reg.i_len;
- error = xlog_write(log, ctx, &lv_chain, ctx->ticket, iclog, reg.i_len);
+ error = xlog_write(log, ctx, &lv_chain, ctx->ticket, reg.i_len);
if (error)
xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
return error;
@@ -940,7 +962,6 @@ xlog_cil_push_work(
struct xlog *log = cil->xc_log;
struct xfs_log_vec *lv;
struct xfs_cil_ctx *new_ctx;
- struct xlog_in_core *commit_iclog;
int num_iovecs = 0;
int num_bytes = 0;
int error = 0;
@@ -1109,8 +1130,7 @@ xlog_cil_push_work(
* use the commit record lsn then we can move the tail beyond the grant
* write head.
*/
- error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
- NULL, num_bytes);
+ error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket, num_bytes);
/*
* Take the lvhdr back off the lv_chain as it should not be passed
@@ -1120,20 +1140,10 @@ xlog_cil_push_work(
if (error)
goto out_abort_free_ticket;
- error = xlog_cil_write_commit_record(ctx, &commit_iclog);
+ error = xlog_cil_write_commit_record(ctx);
if (error)
goto out_abort_free_ticket;
- spin_lock(&commit_iclog->ic_callback_lock);
- if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
- spin_unlock(&commit_iclog->ic_callback_lock);
- goto out_abort_free_ticket;
- }
- ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
- commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
- list_add_tail(&ctx->iclog_entry, &commit_iclog->ic_callbacks);
- spin_unlock(&commit_iclog->ic_callback_lock);
-
/*
* now the checkpoint commit is complete and we've attached the
* callbacks to the iclog we can assign the commit LSN to the context
@@ -1168,8 +1178,8 @@ xlog_cil_push_work(
if (ctx->start_lsn != commit_lsn) {
struct xlog_in_core *iclog;
- for (iclog = commit_iclog->ic_prev;
- iclog != commit_iclog;
+ for (iclog = ctx->commit_iclog->ic_prev;
+ iclog != ctx->commit_iclog;
iclog = iclog->ic_prev) {
xfs_lsn_t hlsn;
@@ -1201,7 +1211,7 @@ xlog_cil_push_work(
* ordering for this checkpoint is correctly preserved down to
* stable storage.
*/
- commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
+ ctx->commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
}
/*
@@ -1214,10 +1224,11 @@ xlog_cil_push_work(
* will be written when released, switch it's state to WANT_SYNC right
* now.
*/
- commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
- if (push_commit_stable && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
- xlog_state_switch_iclogs(log, commit_iclog, 0);
- xlog_state_release_iclog(log, commit_iclog, ticket);
+ ctx->commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
+ if (push_commit_stable &&
+ ctx->commit_iclog->ic_state == XLOG_STATE_ACTIVE)
+ xlog_state_switch_iclogs(log, ctx->commit_iclog, 0);
+ xlog_state_release_iclog(log, ctx->commit_iclog, ticket);
spin_unlock(&log->l_icloglock);
xfs_log_ticket_ungrant(log, ticket);
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 849ba2eb3483..72dfa3b89513 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -237,6 +237,7 @@ struct xfs_cil_ctx {
struct work_struct discard_endio_work;
struct work_struct push_work;
atomic_t order_id;
+ struct xlog_in_core *commit_iclog;
};
/*
@@ -489,7 +490,7 @@ void xlog_print_tic_res(struct xfs_mount *mp, struct xlog_ticket *ticket);
void xlog_print_trans(struct xfs_trans *);
int xlog_write(struct xlog *log, struct xfs_cil_ctx *ctx,
struct list_head *lv_chain, struct xlog_ticket *tic,
- struct xlog_in_core **commit_iclog, uint32_t len);
+ uint32_t len);
void xfs_log_ticket_ungrant(struct xlog *log, struct xlog_ticket *ticket);
void xfs_log_ticket_regrant(struct xlog *log, struct xlog_ticket *ticket);
--
2.31.1
^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 8/8] xfs: order CIL checkpoint start records
2021-06-17 8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
` (6 preceding siblings ...)
2021-06-17 8:26 ` [PATCH 7/8] xfs: attached iclog callbacks in xlog_cil_set_ctx_write_state() Dave Chinner
@ 2021-06-17 8:26 ` Dave Chinner
2021-06-17 21:31 ` Darrick J. Wong
2021-06-17 18:32 ` [PATCH 0/8 V2] xfs: log fixes for for-next Brian Foster
2021-06-18 22:48 ` Dave Chinner
9 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 8:26 UTC (permalink / raw)
To: linux-xfs
From: Dave Chinner <dchinner@redhat.com>
Order CIL checkpoint start records, because log recovery depends on
strictly ordered start records as well as strictly ordered commit
records.
This is a zero-day bug in the way XFS writes pipelined transactions
to the journal. It is exposed by commit facd77e4e38b ("xfs: CIL
work is serialised, not pipelined"), which reintroduces explicit
concurrent commits into the on-disk journal.
The XFS journal commit code has never ordered start records and we
have relied on strict commit record ordering for correct recovery
ordering of concurrently written transactions. Unfortunately, root
cause analysis uncovered the fact that log recovery uses the LSN of
the start record for transaction commit processing. Hence the
commits are processed in strict order by recovery, but the LSNs
associated with the commits can be out of order and so recovery may
stamp incorrect LSNs into objects and/or misorder intents in the AIL
for later processing. This can result in log recovery failures
and/or on disk corruption, sometimes silent.
Because this is a long standing log recovery issue, we can't just
fix log recovery and call it good. This still leaves older kernels
susceptible to recovery failures and corruption when replaying a log
from a kernel that pipelines checkpoints. There is also the issue
that in-memory ordering for AIL pushing and data integrity
operations is based on checkpoint start LSNs, and if the start LSN
is incorrect in the journal, it is also incorrect in memory.
Hence there's really only one choice for fixing this zero-day bug:
we need to strictly order checkpoint start records in ascending
sequence order in the log, the same way we already strictly order
commit records.
Fixes: facd77e4e38b ("xfs: CIL work is serialised, not pipelined")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
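Note for readers: the failure mode described above can be illustrated
with a small userspace check. This is a sketch under assumptions, not
recovery code: struct model_checkpoint and the function name are
hypothetical, and LSNs are treated as plain integers. It walks
checkpoints in commit record order, which is the order recovery
processes them in, and reports whether the start LSNs that recovery
would use ever go backwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one checkpoint as it appears in the journal. */
struct model_checkpoint {
	uint64_t	sequence;	/* CIL sequence number */
	int64_t		start_lsn;	/* LSN of the iclog holding the start record */
	int64_t		commit_lsn;	/* LSN of the iclog holding the commit record */
};

/*
 * cp[] is sorted in commit record order. Return true if the start LSNs
 * never decrease across that order. Equal start LSNs are allowed because
 * two start records may land in the same iclog.
 */
static bool
model_start_lsns_ordered(
	const struct model_checkpoint	*cp,
	size_t				n)
{
	for (size_t i = 1; i < n; i++) {
		if (cp[i].start_lsn < cp[i - 1].start_lsn)
			return false;
	}
	return true;
}
```

With two pipelined checkpoints whose commit records are ordered but
whose start records raced, for example sequence 1 starting at LSN 120
and sequence 2 starting at LSN 100, the check fails. Once start records
are written in ascending sequence order, it passes.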
fs/xfs/xfs_log.c | 1 +
fs/xfs/xfs_log_cil.c | 101 +++++++++++++++++++++++++++++-------------
fs/xfs/xfs_log_priv.h | 1 +
3 files changed, 71 insertions(+), 32 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 359246d54db7..94b6bccb9de9 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -3743,6 +3743,7 @@ xfs_log_force_umount(
* avoid races.
*/
spin_lock(&log->l_cilp->xc_push_lock);
+ wake_up_all(&log->l_cilp->xc_start_wait);
wake_up_all(&log->l_cilp->xc_commit_wait);
spin_unlock(&log->l_cilp->xc_push_lock);
xlog_state_do_callback(log);
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 87e30917ce2e..722c21f21b81 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -684,6 +684,7 @@ xlog_cil_committed(
*/
if (abort) {
spin_lock(&ctx->cil->xc_push_lock);
+ wake_up_all(&ctx->cil->xc_start_wait);
wake_up_all(&ctx->cil->xc_commit_wait);
spin_unlock(&ctx->cil->xc_push_lock);
}
@@ -788,6 +789,10 @@ xlog_cil_build_trans_hdr(
* If the context doesn't have a start_lsn recorded, then this iclog will
* contain the start record for the checkpoint. Otherwise this write contains
* the commit record for the checkpoint.
+ *
+ * Once we've set the LSN for the given operation, wake up any ordered write
+ * waiters that can make progress now that we have a stable LSN for write
+ * ordering purposes.
*/
void
xlog_cil_set_ctx_write_state(
@@ -798,9 +803,16 @@ xlog_cil_set_ctx_write_state(
xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header.h_lsn);
ASSERT(!ctx->commit_lsn);
- spin_lock(&cil->xc_push_lock);
if (!ctx->start_lsn) {
+ spin_lock(&cil->xc_push_lock);
+ /*
+ * The LSN we need to pass to the log items on transaction
+ * commit is the LSN reported by the first log vector write, not
+ * the commit lsn. If we use the commit record lsn then we can
+ * move the tail beyond the grant write head.
+ */
ctx->start_lsn = lsn;
+ wake_up_all(&cil->xc_start_wait);
spin_unlock(&cil->xc_push_lock);
return;
}
@@ -811,9 +823,6 @@ xlog_cil_set_ctx_write_state(
* context controls when the iclog is released for IO.
*/
atomic_inc(&iclog->ic_refcnt);
- ctx->commit_iclog = iclog;
- ctx->commit_lsn = lsn;
- spin_unlock(&cil->xc_push_lock);
/*
* xlog_state_get_iclog_space() guarantees there is enough space in the
@@ -827,6 +836,12 @@ xlog_cil_set_ctx_write_state(
}
list_add_tail(&ctx->iclog_entry, &iclog->ic_callbacks);
spin_unlock(&iclog->ic_callback_lock);
+
+ spin_lock(&cil->xc_push_lock);
+ ctx->commit_iclog = iclog;
+ ctx->commit_lsn = lsn;
+ wake_up_all(&cil->xc_commit_wait);
+ spin_unlock(&cil->xc_push_lock);
}
/*
@@ -834,10 +849,16 @@ xlog_cil_set_ctx_write_state(
* relies on the context LSN being zero until the log write has guaranteed the
* LSN that the log write will start at via xlog_state_get_iclog_space().
*/
+enum {
+ _START_RECORD,
+ _COMMIT_RECORD,
+};
+
static int
xlog_cil_order_write(
struct xfs_cil *cil,
- xfs_csn_t sequence)
+ xfs_csn_t sequence,
+ int record)
{
struct xfs_cil_ctx *ctx;
@@ -860,19 +881,50 @@ xlog_cil_order_write(
*/
if (ctx->sequence >= sequence)
continue;
- if (!ctx->commit_lsn) {
- /*
- * It is still being pushed! Wait for the push to
- * complete, then start again from the beginning.
- */
- xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
- goto restart;
+
+ /* Wait until the LSN for the record has been recorded. */
+ switch (record) {
+ case _START_RECORD:
+ if (!ctx->start_lsn) {
+ xlog_wait(&cil->xc_start_wait, &cil->xc_push_lock);
+ goto restart;
+ }
+ break;
+ case _COMMIT_RECORD:
+ if (!ctx->commit_lsn) {
+ xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
+ goto restart;
+ }
+ break;
+ default:
+ ASSERT(0);
+ break;
}
}
spin_unlock(&cil->xc_push_lock);
return 0;
}
+/*
+ * Write out the log vector change now attached to the CIL context. This will
+ * write a start record that needs to be strictly ordered in ascending CIL
+ * sequence order so that log recovery will always use in-order start LSNs when
+ * replaying checkpoints.
+ */
+static int
+xlog_cil_write_chain(
+ struct xfs_cil_ctx *ctx,
+ uint32_t num_bytes)
+{
+ struct xlog *log = ctx->cil->xc_log;
+ int error;
+
+ error = xlog_cil_order_write(ctx->cil, ctx->sequence, _START_RECORD);
+ if (error)
+ return error;
+ return xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket, num_bytes);
+}
+
/*
* Write out the commit record of a checkpoint transaction to close off a
* running log write. These commit records are strictly ordered in ascending CIL
@@ -906,7 +958,7 @@ xlog_cil_write_commit_record(
if (XLOG_FORCED_SHUTDOWN(log))
return -EIO;
- error = xlog_cil_order_write(ctx->cil, ctx->sequence);
+ error = xlog_cil_order_write(ctx->cil, ctx->sequence, _COMMIT_RECORD);
if (error)
return error;
@@ -1125,17 +1177,10 @@ xlog_cil_push_work(
wait_for_completion(&bdev_flush);
/*
- * The LSN we need to pass to the log items on transaction commit is the
- * LSN reported by the first log vector write, not the commit lsn. If we
- * use the commit record lsn then we can move the tail beyond the grant
- * write head.
- */
- error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket, num_bytes);
-
- /*
- * Take the lvhdr back off the lv_chain as it should not be passed
- * to log IO completion.
+ * Once we write the log vector chain, take the lvhdr back off it as it
+ * must not be passed to log IO completion.
*/
+ error = xlog_cil_write_chain(ctx, num_bytes);
list_del(&lvhdr.lv_list);
if (error)
goto out_abort_free_ticket;
@@ -1144,15 +1189,6 @@ xlog_cil_push_work(
if (error)
goto out_abort_free_ticket;
- /*
- * now the checkpoint commit is complete and we've attached the
- * callbacks to the iclog we can assign the commit LSN to the context
- * and wake up anyone who is waiting for the commit to complete.
- */
- spin_lock(&cil->xc_push_lock);
- wake_up_all(&cil->xc_commit_wait);
- spin_unlock(&cil->xc_push_lock);
-
/*
* Pull the ticket off the ctx so we can ungrant it after releasing the
* commit_iclog. The ctx may be freed by the time we return from
@@ -1728,6 +1764,7 @@ xlog_cil_init(
init_waitqueue_head(&cil->xc_push_wait);
init_rwsem(&cil->xc_ctx_lock);
init_waitqueue_head(&cil->xc_commit_wait);
+ init_waitqueue_head(&cil->xc_start_wait);
log->l_cilp = cil;
ctx = xlog_cil_ctx_alloc();
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 72dfa3b89513..b807a179b916 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -279,6 +279,7 @@ struct xfs_cil {
bool xc_push_commit_stable;
struct list_head xc_committing;
wait_queue_head_t xc_commit_wait;
+ wait_queue_head_t xc_start_wait;
xfs_csn_t xc_current_sequence;
wait_queue_head_t xc_push_wait; /* background push throttle */
--
2.31.1
^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c
2021-06-17 8:26 ` [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c Dave Chinner
@ 2021-06-17 12:57 ` kernel test robot
2021-06-17 17:50 ` Darrick J. Wong
2021-06-18 14:16 ` Christoph Hellwig
2 siblings, 0 replies; 50+ messages in thread
From: kernel test robot @ 2021-06-17 12:57 UTC (permalink / raw)
To: Dave Chinner, linux-xfs; +Cc: kbuild-all, clang-built-linux
[-- Attachment #1: Type: text/plain, Size: 3475 bytes --]
Hi Dave,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on next-20210616]
[cannot apply to v5.13-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: x86_64-randconfig-a011-20210617 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 64720f57bea6a6bf033feef4a5751ab9c0c3b401)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# https://github.com/0day-ci/linux/commit/8634f301cb32bdc5ebbfcf0671509ca5fa857edd
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
git checkout 8634f301cb32bdc5ebbfcf0671509ca5fa857edd
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> fs/xfs/xfs_log_cil.c:792:1: warning: no previous prototype for function 'xlog_cil_write_commit_record' [-Wmissing-prototypes]
xlog_cil_write_commit_record(
^
fs/xfs/xfs_log_cil.c:791:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int
^
static
1 warning generated.
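For reference, -Wmissing-prototypes is generally silenced in one of two
ways, sketched below with hypothetical function names. Which of the two
fits this series is the author's call; the sketch only shows the
generic C pattern and is not a proposed change to the patch.

```c
#include <assert.h>

/*
 * Option 1: a function used only within this translation unit gets
 * internal linkage, as the compiler note suggests.
 */
static int
model_local_helper(
	int	value)
{
	return value + 1;
}

/*
 * Option 2: a function with external linkage gets a prototype before
 * its definition. In a real tree the prototype normally lives in a
 * shared header rather than in the .c file.
 */
int model_exported_helper(int value);

int
model_exported_helper(
	int	value)
{
	return model_local_helper(value) * 2;
}
```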
vim +/xlog_cil_write_commit_record +792 fs/xfs/xfs_log_cil.c
785
786 /*
787 * Write out the commit record of a checkpoint transaction associated with the
788 * given ticket to close off a running log write. Return the lsn of the commit
789 * record.
790 */
791 int
> 792 xlog_cil_write_commit_record(
793 struct xlog *log,
794 struct xlog_ticket *ticket,
795 struct xlog_in_core **iclog,
796 xfs_lsn_t *lsn)
797 {
798 struct xlog_op_header ophdr = {
799 .oh_clientid = XFS_TRANSACTION,
800 .oh_tid = cpu_to_be32(ticket->t_tid),
801 .oh_flags = XLOG_COMMIT_TRANS,
802 };
803 struct xfs_log_iovec reg = {
804 .i_addr = &ophdr,
805 .i_len = sizeof(struct xlog_op_header),
806 .i_type = XLOG_REG_TYPE_COMMIT,
807 };
808 struct xfs_log_vec vec = {
809 .lv_niovecs = 1,
809 .lv_iovecp = &reg,
811 };
812 int error;
813 LIST_HEAD(lv_chain);
814 INIT_LIST_HEAD(&vec.lv_list);
815 list_add(&vec.lv_list, &lv_chain);
816
817 if (XLOG_FORCED_SHUTDOWN(log))
818 return -EIO;
819
820 /* account for space used by record data */
821 ticket->t_curr_res -= reg.i_len;
822 error = xlog_write(log, &lv_chain, ticket, lsn, iclog, reg.i_len);
823 if (error)
824 xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
825 return error;
826 }
827
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34451 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
2021-06-17 8:26 ` [PATCH 4/8] xfs: pass a CIL context to xlog_write() Dave Chinner
@ 2021-06-17 14:46 ` kernel test robot
2021-06-17 20:24 ` Darrick J. Wong
2021-06-18 14:23 ` Christoph Hellwig
2 siblings, 0 replies; 50+ messages in thread
From: kernel test robot @ 2021-06-17 14:46 UTC (permalink / raw)
To: Dave Chinner, linux-xfs; +Cc: kbuild-all, clang-built-linux
[-- Attachment #1: Type: text/plain, Size: 33372 bytes --]
Hi Dave,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on next-20210617]
[cannot apply to v5.13-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: x86_64-randconfig-a011-20210617 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 64720f57bea6a6bf033feef4a5751ab9c0c3b401)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# https://github.com/0day-ci/linux/commit/fc3370002b56bcb25440b96ef5099f508c48360e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
git checkout fc3370002b56bcb25440b96ef5099f508c48360e
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
fs/xfs/xfs_log_cil.c:792:1: warning: no previous prototype for function 'xlog_cil_write_commit_record' [-Wmissing-prototypes]
xlog_cil_write_commit_record(
^
fs/xfs/xfs_log_cil.c:791:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int
^
static
>> fs/xfs/xfs_log_cil.c:1130:24: warning: variable 'commit_lsn' is uninitialized when used here [-Wuninitialized]
if (ctx->start_lsn != commit_lsn) {
^~~~~~~~~~
fs/xfs/xfs_log_cil.c:877:23: note: initialize the variable 'commit_lsn' to silence this warning
xfs_lsn_t commit_lsn;
^
= 0
2 warnings generated.
vim +/commit_lsn +1130 fs/xfs/xfs_log_cil.c
be05dd0e68ac999 Dave Chinner 2021-06-08 846
71e330b593905e4 Dave Chinner 2010-05-21 847 /*
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 848 * Push the Committed Item List to the log.
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 849 *
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 850 * If the current sequence is the same as xc_push_seq we need to do a flush. If
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 851 * xc_push_seq is less than the current sequence, then it has already been
a44f13edf0ebb4e Dave Chinner 2010-08-24 852 * flushed and we don't need to do anything - the caller will wait for it to
a44f13edf0ebb4e Dave Chinner 2010-08-24 853 * complete if necessary.
a44f13edf0ebb4e Dave Chinner 2010-08-24 854 *
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 855 * xc_push_seq is checked unlocked against the sequence number for a match.
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 856 * Hence we can allow log forces to run racily and not issue pushes for the
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 857 * same sequence twice. If we get a race between multiple pushes for the same
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 858 * sequence they will block on the first one and then abort, hence avoiding
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 859 * needless pushes.
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 860 */
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 861 static void
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 862 xlog_cil_push_work(
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 863 struct work_struct *work)
71e330b593905e4 Dave Chinner 2010-05-21 864 {
facd77e4e38b8f0 Dave Chinner 2021-06-04 865 struct xfs_cil_ctx *ctx =
facd77e4e38b8f0 Dave Chinner 2021-06-04 866 container_of(work, struct xfs_cil_ctx, push_work);
facd77e4e38b8f0 Dave Chinner 2021-06-04 867 struct xfs_cil *cil = ctx->cil;
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 868 struct xlog *log = cil->xc_log;
71e330b593905e4 Dave Chinner 2010-05-21 869 struct xfs_log_vec *lv;
71e330b593905e4 Dave Chinner 2010-05-21 870 struct xfs_cil_ctx *new_ctx;
71e330b593905e4 Dave Chinner 2010-05-21 871 struct xlog_in_core *commit_iclog;
66fc9ffa8638be2 Dave Chinner 2021-06-04 872 int num_iovecs = 0;
66fc9ffa8638be2 Dave Chinner 2021-06-04 873 int num_bytes = 0;
71e330b593905e4 Dave Chinner 2010-05-21 874 int error = 0;
877cf3473914ae4 Dave Chinner 2021-06-04 875 struct xlog_cil_trans_hdr thdr;
a47518453bf9581 Dave Chinner 2021-06-08 876 struct xfs_log_vec lvhdr = {};
71e330b593905e4 Dave Chinner 2010-05-21 877 xfs_lsn_t commit_lsn;
4c2d542f2e78653 Dave Chinner 2012-04-23 878 xfs_lsn_t push_seq;
0279bbbbc03f2ce Dave Chinner 2021-06-03 879 struct bio bio;
0279bbbbc03f2ce Dave Chinner 2021-06-03 880 DECLARE_COMPLETION_ONSTACK(bdev_flush);
e12213ba5d909a3 Dave Chinner 2021-06-04 881 bool push_commit_stable;
e469cbe84f4ade9 Dave Chinner 2021-06-08 882 struct xlog_ticket *ticket;
71e330b593905e4 Dave Chinner 2010-05-21 883
facd77e4e38b8f0 Dave Chinner 2021-06-04 884 new_ctx = xlog_cil_ctx_alloc();
71e330b593905e4 Dave Chinner 2010-05-21 885 new_ctx->ticket = xlog_cil_ticket_alloc(log);
71e330b593905e4 Dave Chinner 2010-05-21 886
71e330b593905e4 Dave Chinner 2010-05-21 887 down_write(&cil->xc_ctx_lock);
71e330b593905e4 Dave Chinner 2010-05-21 888
4bb928cdb900d06 Dave Chinner 2013-08-12 889 spin_lock(&cil->xc_push_lock);
4c2d542f2e78653 Dave Chinner 2012-04-23 890 push_seq = cil->xc_push_seq;
4c2d542f2e78653 Dave Chinner 2012-04-23 891 ASSERT(push_seq <= ctx->sequence);
e12213ba5d909a3 Dave Chinner 2021-06-04 892 push_commit_stable = cil->xc_push_commit_stable;
e12213ba5d909a3 Dave Chinner 2021-06-04 893 cil->xc_push_commit_stable = false;
71e330b593905e4 Dave Chinner 2010-05-21 894
0e7ab7efe77451c Dave Chinner 2020-03-24 895 /*
3682277520d6f4a Dave Chinner 2021-06-04 896 * As we are about to switch to a new, empty CIL context, we no longer
3682277520d6f4a Dave Chinner 2021-06-04 897 * need to throttle tasks on CIL space overruns. Wake any waiters that
3682277520d6f4a Dave Chinner 2021-06-04 898 * the hard push throttle may have caught so they can start committing
3682277520d6f4a Dave Chinner 2021-06-04 899 * to the new context. The cil->xc_push_lock provides the serialisation
3682277520d6f4a Dave Chinner 2021-06-04 900 * necessary for safely using the lockless waitqueue_active() check in
3682277520d6f4a Dave Chinner 2021-06-04 901 * this context.
3682277520d6f4a Dave Chinner 2021-06-04 902 */
3682277520d6f4a Dave Chinner 2021-06-04 903 if (waitqueue_active(&cil->xc_push_wait))
c7f87f3984cfa1e Dave Chinner 2020-06-16 904 wake_up_all(&cil->xc_push_wait);
0e7ab7efe77451c Dave Chinner 2020-03-24 905
4c2d542f2e78653 Dave Chinner 2012-04-23 906 /*
4c2d542f2e78653 Dave Chinner 2012-04-23 907 * Check if we've anything to push. If there is nothing, then we don't
4c2d542f2e78653 Dave Chinner 2012-04-23 908 * move on to a new sequence number and so we have to be able to push
4c2d542f2e78653 Dave Chinner 2012-04-23 909 * this sequence again later.
4c2d542f2e78653 Dave Chinner 2012-04-23 910 */
0d11bae4bcf4aa9 Dave Chinner 2021-06-04 911 if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) {
4c2d542f2e78653 Dave Chinner 2012-04-23 912 cil->xc_push_seq = 0;
4bb928cdb900d06 Dave Chinner 2013-08-12 913 spin_unlock(&cil->xc_push_lock);
a44f13edf0ebb4e Dave Chinner 2010-08-24 914 goto out_skip;
4c2d542f2e78653 Dave Chinner 2012-04-23 915 }
4c2d542f2e78653 Dave Chinner 2012-04-23 916
a44f13edf0ebb4e Dave Chinner 2010-08-24 917
cf085a1b5d22144 Joe Perches 2019-11-07 918 /* check for a previously pushed sequence */
facd77e4e38b8f0 Dave Chinner 2021-06-04 919 if (push_seq < ctx->sequence) {
8af3dcd3c89aef1 Dave Chinner 2014-09-23 920 spin_unlock(&cil->xc_push_lock);
df806158b0f6eb2 Dave Chinner 2010-05-17 921 goto out_skip;
8af3dcd3c89aef1 Dave Chinner 2014-09-23 922 }
8af3dcd3c89aef1 Dave Chinner 2014-09-23 923
8af3dcd3c89aef1 Dave Chinner 2014-09-23 924 /*
8af3dcd3c89aef1 Dave Chinner 2014-09-23 925 * We are now going to push this context, so add it to the committing
8af3dcd3c89aef1 Dave Chinner 2014-09-23 926 * list before we do anything else. This ensures that anyone waiting on
8af3dcd3c89aef1 Dave Chinner 2014-09-23 927 * this push can easily detect the difference between a "push in
8af3dcd3c89aef1 Dave Chinner 2014-09-23 928 * progress" and "CIL is empty, nothing to do".
8af3dcd3c89aef1 Dave Chinner 2014-09-23 929 *
8af3dcd3c89aef1 Dave Chinner 2014-09-23 930 * IOWs, a wait loop can now check for:
8af3dcd3c89aef1 Dave Chinner 2014-09-23 931 * the current sequence not being found on the committing list;
8af3dcd3c89aef1 Dave Chinner 2014-09-23 932 * an empty CIL; and
8af3dcd3c89aef1 Dave Chinner 2014-09-23 933 * an unchanged sequence number
8af3dcd3c89aef1 Dave Chinner 2014-09-23 934 * to detect a push that had nothing to do and therefore does not need
8af3dcd3c89aef1 Dave Chinner 2014-09-23 935 * waiting on. If the CIL is not empty, we get put on the committing
8af3dcd3c89aef1 Dave Chinner 2014-09-23 936 * list before emptying the CIL and bumping the sequence number. Hence
8af3dcd3c89aef1 Dave Chinner 2014-09-23 937 * an empty CIL and an unchanged sequence number means we jumped out
8af3dcd3c89aef1 Dave Chinner 2014-09-23 938 * above after doing nothing.
8af3dcd3c89aef1 Dave Chinner 2014-09-23 939 *
8af3dcd3c89aef1 Dave Chinner 2014-09-23 940 * Hence the waiter will either find the commit sequence on the
8af3dcd3c89aef1 Dave Chinner 2014-09-23 941 * committing list or the sequence number will be unchanged and the CIL
8af3dcd3c89aef1 Dave Chinner 2014-09-23 942 * still dirty. In that latter case, the push has not yet started, and
8af3dcd3c89aef1 Dave Chinner 2014-09-23 943 * so the waiter will have to continue trying to check the CIL
8af3dcd3c89aef1 Dave Chinner 2014-09-23 944 * committing list until it is found. In extreme cases of delay, the
8af3dcd3c89aef1 Dave Chinner 2014-09-23 945 * sequence may fully commit between the attempts the waiter makes to wait
8af3dcd3c89aef1 Dave Chinner 2014-09-23 946 * on the commit sequence.
8af3dcd3c89aef1 Dave Chinner 2014-09-23 947 */
8af3dcd3c89aef1 Dave Chinner 2014-09-23 948 list_add(&ctx->committing, &cil->xc_committing);
8af3dcd3c89aef1 Dave Chinner 2014-09-23 949 spin_unlock(&cil->xc_push_lock);
df806158b0f6eb2 Dave Chinner 2010-05-17 950
71e330b593905e4 Dave Chinner 2010-05-21 951 /*
0279bbbbc03f2ce Dave Chinner 2021-06-03 952 * The CIL is stable at this point - nothing new will be added to it
0279bbbbc03f2ce Dave Chinner 2021-06-03 953 * because we hold the flush lock exclusively. Hence we can now issue
0279bbbbc03f2ce Dave Chinner 2021-06-03 954 * a cache flush to ensure all the completed metadata in the journal we
0279bbbbc03f2ce Dave Chinner 2021-06-03 955 * are about to overwrite is on stable storage.
0279bbbbc03f2ce Dave Chinner 2021-06-03 956 */
0279bbbbc03f2ce Dave Chinner 2021-06-03 957 xfs_flush_bdev_async(&bio, log->l_mp->m_ddev_targp->bt_bdev,
0279bbbbc03f2ce Dave Chinner 2021-06-03 958 &bdev_flush);
0279bbbbc03f2ce Dave Chinner 2021-06-03 959
a8613836d99e627 Dave Chinner 2021-06-08 960 xlog_cil_pcp_aggregate(cil, ctx);
a8613836d99e627 Dave Chinner 2021-06-08 961
1f18c0c4b78cfb1 Dave Chinner 2021-06-08 962 while (!list_empty(&ctx->log_items)) {
71e330b593905e4 Dave Chinner 2010-05-21 963 struct xfs_log_item *item;
71e330b593905e4 Dave Chinner 2010-05-21 964
1f18c0c4b78cfb1 Dave Chinner 2021-06-08 965 item = list_first_entry(&ctx->log_items,
71e330b593905e4 Dave Chinner 2010-05-21 966 struct xfs_log_item, li_cil);
a47518453bf9581 Dave Chinner 2021-06-08 967 lv = item->li_lv;
a1785f597c8b060 Dave Chinner 2021-06-08 968 lv->lv_order_id = item->li_order_id;
a47518453bf9581 Dave Chinner 2021-06-08 969 num_iovecs += lv->lv_niovecs;
66fc9ffa8638be2 Dave Chinner 2021-06-04 970 /* we don't write ordered log vectors */
66fc9ffa8638be2 Dave Chinner 2021-06-04 971 if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
66fc9ffa8638be2 Dave Chinner 2021-06-04 972 num_bytes += lv->lv_bytes;
a47518453bf9581 Dave Chinner 2021-06-08 973
a47518453bf9581 Dave Chinner 2021-06-08 974 list_add_tail(&lv->lv_list, &ctx->lv_chain);
a1785f597c8b060 Dave Chinner 2021-06-08 975 list_del_init(&item->li_cil);
a1785f597c8b060 Dave Chinner 2021-06-08 976 item->li_order_id = 0;
a1785f597c8b060 Dave Chinner 2021-06-08 977 item->li_lv = NULL;
71e330b593905e4 Dave Chinner 2010-05-21 978 }
71e330b593905e4 Dave Chinner 2010-05-21 979
71e330b593905e4 Dave Chinner 2010-05-21 980 /*
facd77e4e38b8f0 Dave Chinner 2021-06-04 981 * Switch the contexts so we can drop the context lock and move out
71e330b593905e4 Dave Chinner 2010-05-21 982 * of a shared context. We can't just go straight to the commit record,
71e330b593905e4 Dave Chinner 2010-05-21 983 * though - we need to synchronise with previous and future commits so
71e330b593905e4 Dave Chinner 2010-05-21 984 * that the commit records are correctly ordered in the log to ensure
71e330b593905e4 Dave Chinner 2010-05-21 985 * that we process items during log IO completion in the correct order.
71e330b593905e4 Dave Chinner 2010-05-21 986 *
71e330b593905e4 Dave Chinner 2010-05-21 987 * For example, if we get an EFI in one checkpoint and the EFD in the
71e330b593905e4 Dave Chinner 2010-05-21 988 * next (e.g. due to log forces), we do not want the checkpoint with
71e330b593905e4 Dave Chinner 2010-05-21 989 * the EFD to be committed before the checkpoint with the EFI. Hence
71e330b593905e4 Dave Chinner 2010-05-21 990 * we must strictly order the commit records of the checkpoints so
71e330b593905e4 Dave Chinner 2010-05-21 991 * that: a) the checkpoint callbacks are attached to the iclogs in the
71e330b593905e4 Dave Chinner 2010-05-21 992 * correct order; and b) the checkpoints are replayed in correct order
71e330b593905e4 Dave Chinner 2010-05-21 993 * in log recovery.
71e330b593905e4 Dave Chinner 2010-05-21 994 *
71e330b593905e4 Dave Chinner 2010-05-21 995 * Hence we need to add this context to the committing context list so
71e330b593905e4 Dave Chinner 2010-05-21 996 * that higher sequences will wait for us to write out a commit record
71e330b593905e4 Dave Chinner 2010-05-21 997 * before they do.
f876e44603ad091 Dave Chinner 2014-02-27 998 *
f39ae5297c5ce2f Dave Chinner 2021-06-04 999 * xfs_log_force_seq requires us to mirror the new sequence into the cil
f876e44603ad091 Dave Chinner 2014-02-27 1000 * structure atomically with the addition of this sequence to the
f876e44603ad091 Dave Chinner 2014-02-27 1001 * committing list. This also ensures that we can do unlocked checks
f876e44603ad091 Dave Chinner 2014-02-27 1002 * against the current sequence in log forces without risking
f876e44603ad091 Dave Chinner 2014-02-27 1003 * dereferencing a freed context pointer.
71e330b593905e4 Dave Chinner 2010-05-21 1004 */
4bb928cdb900d06 Dave Chinner 2013-08-12 1005 spin_lock(&cil->xc_push_lock);
facd77e4e38b8f0 Dave Chinner 2021-06-04 1006 xlog_cil_ctx_switch(cil, new_ctx);
4bb928cdb900d06 Dave Chinner 2013-08-12 1007 spin_unlock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1008 up_write(&cil->xc_ctx_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1009
a1785f597c8b060 Dave Chinner 2021-06-08 1010 /*
a1785f597c8b060 Dave Chinner 2021-06-08 1011 * Sort the log vector chain before we add the transaction headers.
a1785f597c8b060 Dave Chinner 2021-06-08 1012 * This ensures we always have the transaction headers at the start
a1785f597c8b060 Dave Chinner 2021-06-08 1013 * of the chain.
a1785f597c8b060 Dave Chinner 2021-06-08 1014 */
a1785f597c8b060 Dave Chinner 2021-06-08 1015 list_sort(NULL, &ctx->lv_chain, xlog_cil_order_cmp);
a1785f597c8b060 Dave Chinner 2021-06-08 1016
71e330b593905e4 Dave Chinner 2010-05-21 1017 /*
71e330b593905e4 Dave Chinner 2010-05-21 1018 * Build a checkpoint transaction header and write it to the log to
71e330b593905e4 Dave Chinner 2010-05-21 1019 * begin the transaction. We need to account for the space used by the
71e330b593905e4 Dave Chinner 2010-05-21 1020 * transaction header here as it is not accounted for in xlog_write().
a47518453bf9581 Dave Chinner 2021-06-08 1021 * Add the lvhdr to the head of the lv chain we pass to xlog_write() so
a47518453bf9581 Dave Chinner 2021-06-08 1022 * it gets written into the iclog first.
71e330b593905e4 Dave Chinner 2010-05-21 1023 */
877cf3473914ae4 Dave Chinner 2021-06-04 1024 xlog_cil_build_trans_hdr(ctx, &thdr, &lvhdr, num_iovecs);
66fc9ffa8638be2 Dave Chinner 2021-06-04 1025 num_bytes += lvhdr.lv_bytes;
a47518453bf9581 Dave Chinner 2021-06-08 1026 list_add(&lvhdr.lv_list, &ctx->lv_chain);
71e330b593905e4 Dave Chinner 2010-05-21 1027
0279bbbbc03f2ce Dave Chinner 2021-06-03 1028 /*
0279bbbbc03f2ce Dave Chinner 2021-06-03 1029 * Before we format and submit the first iclog, we have to ensure that
0279bbbbc03f2ce Dave Chinner 2021-06-03 1030 * the metadata writeback ordering cache flush is complete.
0279bbbbc03f2ce Dave Chinner 2021-06-03 1031 */
0279bbbbc03f2ce Dave Chinner 2021-06-03 1032 wait_for_completion(&bdev_flush);
0279bbbbc03f2ce Dave Chinner 2021-06-03 1033
877cf3473914ae4 Dave Chinner 2021-06-04 1034 /*
877cf3473914ae4 Dave Chinner 2021-06-04 1035 * The LSN we need to pass to the log items on transaction commit is the
877cf3473914ae4 Dave Chinner 2021-06-04 1036 * LSN reported by the first log vector write, not the commit lsn. If we
877cf3473914ae4 Dave Chinner 2021-06-04 1037 * use the commit record lsn then we can move the tail beyond the grant
877cf3473914ae4 Dave Chinner 2021-06-04 1038 * write head.
877cf3473914ae4 Dave Chinner 2021-06-04 1039 */
fc3370002b56bcb Dave Chinner 2021-06-17 1040 error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
a47518453bf9581 Dave Chinner 2021-06-08 1041 NULL, num_bytes);
a47518453bf9581 Dave Chinner 2021-06-08 1042
a47518453bf9581 Dave Chinner 2021-06-08 1043 /*
a47518453bf9581 Dave Chinner 2021-06-08 1044 * Take the lvhdr back off the lv_chain as it should not be passed
a47518453bf9581 Dave Chinner 2021-06-08 1045 * to log IO completion.
a47518453bf9581 Dave Chinner 2021-06-08 1046 */
a47518453bf9581 Dave Chinner 2021-06-08 1047 list_del(&lvhdr.lv_list);
71e330b593905e4 Dave Chinner 2010-05-21 1048 if (error)
7db37c5e6575b22 Dave Chinner 2011-01-27 1049 goto out_abort_free_ticket;
71e330b593905e4 Dave Chinner 2010-05-21 1050
71e330b593905e4 Dave Chinner 2010-05-21 1051 /*
71e330b593905e4 Dave Chinner 2010-05-21 1052 * now that we've written the checkpoint into the log, strictly
71e330b593905e4 Dave Chinner 2010-05-21 1053 * order the commit records so replay will get them in the right order.
71e330b593905e4 Dave Chinner 2010-05-21 1054 */
71e330b593905e4 Dave Chinner 2010-05-21 1055 restart:
4bb928cdb900d06 Dave Chinner 2013-08-12 1056 spin_lock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1057 list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
ac983517ec5941d Dave Chinner 2014-05-07 1058 /*
ac983517ec5941d Dave Chinner 2014-05-07 1059 * Avoid getting stuck in this loop because we were woken by the
ac983517ec5941d Dave Chinner 2014-05-07 1060 * shutdown, but then went back to sleep once already in the
ac983517ec5941d Dave Chinner 2014-05-07 1061 * shutdown state.
ac983517ec5941d Dave Chinner 2014-05-07 1062 */
ac983517ec5941d Dave Chinner 2014-05-07 1063 if (XLOG_FORCED_SHUTDOWN(log)) {
ac983517ec5941d Dave Chinner 2014-05-07 1064 spin_unlock(&cil->xc_push_lock);
ac983517ec5941d Dave Chinner 2014-05-07 1065 goto out_abort_free_ticket;
ac983517ec5941d Dave Chinner 2014-05-07 1066 }
ac983517ec5941d Dave Chinner 2014-05-07 1067
71e330b593905e4 Dave Chinner 2010-05-21 1068 /*
71e330b593905e4 Dave Chinner 2010-05-21 1069 * Higher sequences will wait for this one so skip them.
ac983517ec5941d Dave Chinner 2014-05-07 1070 * Don't wait for our own sequence, either.
71e330b593905e4 Dave Chinner 2010-05-21 1071 */
71e330b593905e4 Dave Chinner 2010-05-21 1072 if (new_ctx->sequence >= ctx->sequence)
71e330b593905e4 Dave Chinner 2010-05-21 1073 continue;
71e330b593905e4 Dave Chinner 2010-05-21 1074 if (!new_ctx->commit_lsn) {
71e330b593905e4 Dave Chinner 2010-05-21 1075 /*
71e330b593905e4 Dave Chinner 2010-05-21 1076 * It is still being pushed! Wait for the push to
71e330b593905e4 Dave Chinner 2010-05-21 1077 * complete, then start again from the beginning.
71e330b593905e4 Dave Chinner 2010-05-21 1078 */
4bb928cdb900d06 Dave Chinner 2013-08-12 1079 xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1080 goto restart;
71e330b593905e4 Dave Chinner 2010-05-21 1081 }
71e330b593905e4 Dave Chinner 2010-05-21 1082 }
4bb928cdb900d06 Dave Chinner 2013-08-12 1083 spin_unlock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1084
fc3370002b56bcb Dave Chinner 2021-06-17 1085 error = xlog_cil_write_commit_record(ctx, &commit_iclog);
dd401770b0ff68f Dave Chinner 2020-03-25 1086 if (error)
dd401770b0ff68f Dave Chinner 2020-03-25 1087 goto out_abort_free_ticket;
dd401770b0ff68f Dave Chinner 2020-03-25 1088
89ae379d564c5d8 Christoph Hellwig 2019-06-28 1089 spin_lock(&commit_iclog->ic_callback_lock);
1858bb0bec612df Christoph Hellwig 2019-10-14 1090 if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
89ae379d564c5d8 Christoph Hellwig 2019-06-28 1091 spin_unlock(&commit_iclog->ic_callback_lock);
e469cbe84f4ade9 Dave Chinner 2021-06-08 1092 goto out_abort_free_ticket;
89ae379d564c5d8 Christoph Hellwig 2019-06-28 1093 }
89ae379d564c5d8 Christoph Hellwig 2019-06-28 1094 ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
89ae379d564c5d8 Christoph Hellwig 2019-06-28 1095 commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
89ae379d564c5d8 Christoph Hellwig 2019-06-28 1096 list_add_tail(&ctx->iclog_entry, &commit_iclog->ic_callbacks);
89ae379d564c5d8 Christoph Hellwig 2019-06-28 1097 spin_unlock(&commit_iclog->ic_callback_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1098
71e330b593905e4 Dave Chinner 2010-05-21 1099 /*
71e330b593905e4 Dave Chinner 2010-05-21 1100 * now the checkpoint commit is complete and we've attached the
71e330b593905e4 Dave Chinner 2010-05-21 1101 * callbacks to the iclog we can assign the commit LSN to the context
71e330b593905e4 Dave Chinner 2010-05-21 1102 * and wake up anyone who is waiting for the commit to complete.
71e330b593905e4 Dave Chinner 2010-05-21 1103 */
4bb928cdb900d06 Dave Chinner 2013-08-12 1104 spin_lock(&cil->xc_push_lock);
eb40a87500ac2f6 Dave Chinner 2010-12-21 1105 wake_up_all(&cil->xc_commit_wait);
4bb928cdb900d06 Dave Chinner 2013-08-12 1106 spin_unlock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1107
e469cbe84f4ade9 Dave Chinner 2021-06-08 1108 /*
e469cbe84f4ade9 Dave Chinner 2021-06-08 1109 * Pull the ticket off the ctx so we can ungrant it after releasing the
e469cbe84f4ade9 Dave Chinner 2021-06-08 1110 * commit_iclog. The ctx may be freed by the time we return from
e469cbe84f4ade9 Dave Chinner 2021-06-08 1111 * releasing the commit_iclog (i.e. checkpoint has been completed and
e469cbe84f4ade9 Dave Chinner 2021-06-08 1112 * callback run) so we can't reference the ctx after the call to
e469cbe84f4ade9 Dave Chinner 2021-06-08 1113 * xlog_state_release_iclog().
e469cbe84f4ade9 Dave Chinner 2021-06-08 1114 */
e469cbe84f4ade9 Dave Chinner 2021-06-08 1115 ticket = ctx->ticket;
e469cbe84f4ade9 Dave Chinner 2021-06-08 1116
5fd9256ce156ef7 Dave Chinner 2021-06-03 1117 /*
815753dc16bbca2 Dave Chinner 2021-06-17 1118 * If the checkpoint spans multiple iclogs, wait for all previous iclogs
815753dc16bbca2 Dave Chinner 2021-06-17 1119 * to complete before we submit the commit_iclog. We can't use state
815753dc16bbca2 Dave Chinner 2021-06-17 1120 * checks for this - ACTIVE can be either a past completed iclog or a
815753dc16bbca2 Dave Chinner 2021-06-17 1121 * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a
815753dc16bbca2 Dave Chinner 2021-06-17 1122 * past or future iclog awaiting IO or ordered IO completion to be run.
815753dc16bbca2 Dave Chinner 2021-06-17 1123 * In the latter case, if it's a future iclog and we wait on it, then we
815753dc16bbca2 Dave Chinner 2021-06-17 1124 * will hang because it won't get processed through to ic_force_wait
815753dc16bbca2 Dave Chinner 2021-06-17 1125 * wakeup until this commit_iclog is written to disk. Hence we use the
815753dc16bbca2 Dave Chinner 2021-06-17 1126 * iclog header lsn and compare it to the commit lsn to determine if we
815753dc16bbca2 Dave Chinner 2021-06-17 1127 * need to wait on iclogs or not.
5fd9256ce156ef7 Dave Chinner 2021-06-03 1128 */
5fd9256ce156ef7 Dave Chinner 2021-06-03 1129 spin_lock(&log->l_icloglock);
cb1acb3f3246368 Dave Chinner 2021-06-04 @1130 if (ctx->start_lsn != commit_lsn) {
815753dc16bbca2 Dave Chinner 2021-06-17 1131 struct xlog_in_core *iclog;
815753dc16bbca2 Dave Chinner 2021-06-17 1132
815753dc16bbca2 Dave Chinner 2021-06-17 1133 for (iclog = commit_iclog->ic_prev;
815753dc16bbca2 Dave Chinner 2021-06-17 1134 iclog != commit_iclog;
815753dc16bbca2 Dave Chinner 2021-06-17 1135 iclog = iclog->ic_prev) {
815753dc16bbca2 Dave Chinner 2021-06-17 1136 xfs_lsn_t hlsn;
815753dc16bbca2 Dave Chinner 2021-06-17 1137
815753dc16bbca2 Dave Chinner 2021-06-17 1138 /*
815753dc16bbca2 Dave Chinner 2021-06-17 1139 * If the LSN of the iclog is zero or in the future it
815753dc16bbca2 Dave Chinner 2021-06-17 1140 * means it has passed through IO completion and
815753dc16bbca2 Dave Chinner 2021-06-17 1141 * activation and hence all previous iclogs have also
815753dc16bbca2 Dave Chinner 2021-06-17 1142 * done so. We do not need to wait at all in this case.
815753dc16bbca2 Dave Chinner 2021-06-17 1143 */
815753dc16bbca2 Dave Chinner 2021-06-17 1144 hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
815753dc16bbca2 Dave Chinner 2021-06-17 1145 if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
815753dc16bbca2 Dave Chinner 2021-06-17 1146 break;
815753dc16bbca2 Dave Chinner 2021-06-17 1147
815753dc16bbca2 Dave Chinner 2021-06-17 1148 /*
815753dc16bbca2 Dave Chinner 2021-06-17 1149 * If the LSN of the iclog is older than the commit lsn,
815753dc16bbca2 Dave Chinner 2021-06-17 1150 * we have to wait on it. Waiting on this via the
815753dc16bbca2 Dave Chinner 2021-06-17 1151 * ic_force_wait should also order the completion of all
815753dc16bbca2 Dave Chinner 2021-06-17 1152 * older iclogs, too, but we leave checking that to the
815753dc16bbca2 Dave Chinner 2021-06-17 1153 * next loop iteration.
815753dc16bbca2 Dave Chinner 2021-06-17 1154 */
815753dc16bbca2 Dave Chinner 2021-06-17 1155 ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
815753dc16bbca2 Dave Chinner 2021-06-17 1156 xlog_wait_on_iclog(iclog);
cb1acb3f3246368 Dave Chinner 2021-06-04 1157 spin_lock(&log->l_icloglock);
815753dc16bbca2 Dave Chinner 2021-06-17 1158 }
815753dc16bbca2 Dave Chinner 2021-06-17 1159
815753dc16bbca2 Dave Chinner 2021-06-17 1160 /*
815753dc16bbca2 Dave Chinner 2021-06-17 1161 * Regardless of whether we need to wait or not, the
815753dc16bbca2 Dave Chinner 2021-06-17 1162 * commit_iclog write needs to issue a pre-flush so that the
815753dc16bbca2 Dave Chinner 2021-06-17 1163 * ordering for this checkpoint is correctly preserved down to
815753dc16bbca2 Dave Chinner 2021-06-17 1164 * stable storage.
815753dc16bbca2 Dave Chinner 2021-06-17 1165 */
cb1acb3f3246368 Dave Chinner 2021-06-04 1166 commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
5fd9256ce156ef7 Dave Chinner 2021-06-03 1167 }
5fd9256ce156ef7 Dave Chinner 2021-06-03 1168
cb1acb3f3246368 Dave Chinner 2021-06-04 1169 /*
cb1acb3f3246368 Dave Chinner 2021-06-04 1170 * The commit iclog must be written to stable storage to guarantee
cb1acb3f3246368 Dave Chinner 2021-06-04 1171 * journal IO vs metadata writeback IO is correctly ordered on stable
cb1acb3f3246368 Dave Chinner 2021-06-04 1172 * storage.
e12213ba5d909a3 Dave Chinner 2021-06-04 1173 *
e12213ba5d909a3 Dave Chinner 2021-06-04 1174 * If the push caller needs the commit to be immediately stable and the
e12213ba5d909a3 Dave Chinner 2021-06-04 1175 * commit_iclog is not yet marked as XLOG_STATE_WANT_SYNC to indicate it
e12213ba5d909a3 Dave Chinner 2021-06-04 1176 * will be written when released, switch its state to WANT_SYNC right
e12213ba5d909a3 Dave Chinner 2021-06-04 1177 * now.
cb1acb3f3246368 Dave Chinner 2021-06-04 1178 */
cb1acb3f3246368 Dave Chinner 2021-06-04 1179 commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
e12213ba5d909a3 Dave Chinner 2021-06-04 1180 if (push_commit_stable && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
e12213ba5d909a3 Dave Chinner 2021-06-04 1181 xlog_state_switch_iclogs(log, commit_iclog, 0);
e469cbe84f4ade9 Dave Chinner 2021-06-08 1182 xlog_state_release_iclog(log, commit_iclog, ticket);
cb1acb3f3246368 Dave Chinner 2021-06-04 1183 spin_unlock(&log->l_icloglock);
e469cbe84f4ade9 Dave Chinner 2021-06-08 1184
e469cbe84f4ade9 Dave Chinner 2021-06-08 1185 xfs_log_ticket_ungrant(log, ticket);
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 1186 return;
71e330b593905e4 Dave Chinner 2010-05-21 1187
71e330b593905e4 Dave Chinner 2010-05-21 1188 out_skip:
71e330b593905e4 Dave Chinner 2010-05-21 1189 up_write(&cil->xc_ctx_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1190 xfs_log_ticket_put(new_ctx->ticket);
71e330b593905e4 Dave Chinner 2010-05-21 1191 kmem_free(new_ctx);
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 1192 return;
71e330b593905e4 Dave Chinner 2010-05-21 1193
7db37c5e6575b22 Dave Chinner 2011-01-27 1194 out_abort_free_ticket:
877cf3473914ae4 Dave Chinner 2021-06-04 1195 xfs_log_ticket_ungrant(log, ctx->ticket);
12e6a0f449d585b Christoph Hellwig 2020-03-20 1196 ASSERT(XLOG_FORCED_SHUTDOWN(log));
12e6a0f449d585b Christoph Hellwig 2020-03-20 1197 xlog_cil_committed(ctx);
4c2d542f2e78653 Dave Chinner 2012-04-23 1198 }
4c2d542f2e78653 Dave Chinner 2012-04-23 1199
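The past/future iclog test at lines 1144-1155 of the listing above can be
sketched in isolation as follows. This is only an illustration under stated
assumptions: lsn_t, make_lsn(), lsn_cmp() and must_wait_on_iclog() are
hypothetical stand-ins and not the kernel's definitions. The sketch relies
on an LSN carrying the log cycle in its upper 32 bits and the block number
in its lower 32 bits, and on the comparison ordering by cycle first, which
is what XFS_LSN_CMP is expected to do.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t lsn_t;	/* hypothetical stand-in for xfs_lsn_t */

/* Cycle lives in the upper 32 bits, block number in the lower 32 bits. */
static inline uint32_t lsn_cycle(lsn_t lsn)
{
	return (uint32_t)((uint64_t)lsn >> 32);
}

static inline uint32_t lsn_block(lsn_t lsn)
{
	return (uint32_t)lsn;
}

static inline lsn_t make_lsn(uint32_t cycle, uint32_t block)
{
	return (lsn_t)(((uint64_t)cycle << 32) | block);
}

/* Three-way compare: order by cycle first, then by block. */
static inline int lsn_cmp(lsn_t a, lsn_t b)
{
	if (lsn_cycle(a) != lsn_cycle(b))
		return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
	if (lsn_block(a) != lsn_block(b))
		return lsn_block(a) < lsn_block(b) ? -1 : 1;
	return 0;
}

/*
 * Decide whether a previous iclog with header LSN hlsn has to be waited
 * on before the commit iclog is submitted. A zero header LSN means the
 * iclog has already been through IO completion and reactivation; a
 * header LSN newer than commit_lsn means it is a future iclog. Neither
 * needs waiting on.
 */
static inline int must_wait_on_iclog(lsn_t hlsn, lsn_t commit_lsn)
{
	/* the caller must pass a commit LSN that has actually been set */
	assert(commit_lsn != 0);

	if (!hlsn)
		return 0;
	return lsn_cmp(hlsn, commit_lsn) < 0;
}
```

The assert in must_wait_on_iclog() is deliberate: the comparison only means
something if the commit LSN handed in has been assigned, so a caller has to
read it from wherever the commit record write stored it.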
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34451 bytes --]
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
@ 2021-06-17 14:46 ` kernel test robot
0 siblings, 0 replies; 50+ messages in thread
From: kernel test robot @ 2021-06-17 14:46 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 33783 bytes --]
Hi Dave,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on next-20210617]
[cannot apply to v5.13-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: x86_64-randconfig-a011-20210617 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 64720f57bea6a6bf033feef4a5751ab9c0c3b401)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# https://github.com/0day-ci/linux/commit/fc3370002b56bcb25440b96ef5099f508c48360e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
git checkout fc3370002b56bcb25440b96ef5099f508c48360e
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64
If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
fs/xfs/xfs_log_cil.c:792:1: warning: no previous prototype for function 'xlog_cil_write_commit_record' [-Wmissing-prototypes]
xlog_cil_write_commit_record(
^
fs/xfs/xfs_log_cil.c:791:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int
^
static
>> fs/xfs/xfs_log_cil.c:1130:24: warning: variable 'commit_lsn' is uninitialized when used here [-Wuninitialized]
if (ctx->start_lsn != commit_lsn) {
^~~~~~~~~~
fs/xfs/xfs_log_cil.c:877:23: note: initialize the variable 'commit_lsn' to silence this warning
xfs_lsn_t commit_lsn;
^
= 0
2 warnings generated.
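Both warnings follow a common pattern: a helper with external linkage but no
prototype, and a caller comparing against a local variable that nothing
assigns. A minimal sketch of one way to avoid both, assuming the helper
records the commit LSN in the context it is passed, is shown here. The names
ckpt_ctx, write_commit_record() and checkpoint_spans_iclogs() are
hypothetical and are not taken from the patch; the actual fix may differ.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t lsn_t;	/* hypothetical stand-in for xfs_lsn_t */

/* Hypothetical, cut-down checkpoint context. */
struct ckpt_ctx {
	lsn_t	start_lsn;	/* LSN of the first write of the checkpoint */
	lsn_t	commit_lsn;	/* zero until the commit record is written */
};

/*
 * Declared static because it is only used in this translation unit, so
 * no separate prototype is required (-Wmissing-prototypes). The commit
 * LSN is stored in the context rather than in a caller-local variable.
 */
static int write_commit_record(struct ckpt_ctx *ctx, lsn_t lsn)
{
	assert(lsn != 0);	/* zero is reserved for "not yet written" */
	ctx->commit_lsn = lsn;
	return 0;
}

/*
 * The caller reads the value back from the context, so there is no
 * uninitialised local to compare against (-Wuninitialized).
 */
static int checkpoint_spans_iclogs(const struct ckpt_ctx *ctx)
{
	return ctx->start_lsn != ctx->commit_lsn;
}
```

Initialising the local to zero, as the compiler note suggests, would only
silence the diagnostic: the comparison would then be made against a value
that is never the commit LSN.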
vim +/commit_lsn +1130 fs/xfs/xfs_log_cil.c
be05dd0e68ac999 Dave Chinner 2021-06-08 846
71e330b593905e4 Dave Chinner 2010-05-21 847 /*
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 848 * Push the Committed Item List to the log.
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 849 *
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 850 * If the current sequence is the same as xc_push_seq we need to do a flush. If
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 851 * xc_push_seq is less than the current sequence, then it has already been
a44f13edf0ebb4e Dave Chinner 2010-08-24 852 * flushed and we don't need to do anything - the caller will wait for it to
a44f13edf0ebb4e Dave Chinner 2010-08-24 853 * complete if necessary.
a44f13edf0ebb4e Dave Chinner 2010-08-24 854 *
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 855 * xc_push_seq is checked unlocked against the sequence number for a match.
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 856 * Hence we can allow log forces to run racily and not issue pushes for the
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 857 * same sequence twice. If we get a race between multiple pushes for the same
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 858 * sequence they will block on the first one and then abort, hence avoiding
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 859 * needless pushes.
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 860 */
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 861 static void
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 862 xlog_cil_push_work(
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 863 struct work_struct *work)
71e330b593905e4 Dave Chinner 2010-05-21 864 {
facd77e4e38b8f0 Dave Chinner 2021-06-04 865 struct xfs_cil_ctx *ctx =
facd77e4e38b8f0 Dave Chinner 2021-06-04 866 container_of(work, struct xfs_cil_ctx, push_work);
facd77e4e38b8f0 Dave Chinner 2021-06-04 867 struct xfs_cil *cil = ctx->cil;
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 868 struct xlog *log = cil->xc_log;
71e330b593905e4 Dave Chinner 2010-05-21 869 struct xfs_log_vec *lv;
71e330b593905e4 Dave Chinner 2010-05-21 870 struct xfs_cil_ctx *new_ctx;
71e330b593905e4 Dave Chinner 2010-05-21 871 struct xlog_in_core *commit_iclog;
66fc9ffa8638be2 Dave Chinner 2021-06-04 872 int num_iovecs = 0;
66fc9ffa8638be2 Dave Chinner 2021-06-04 873 int num_bytes = 0;
71e330b593905e4 Dave Chinner 2010-05-21 874 int error = 0;
877cf3473914ae4 Dave Chinner 2021-06-04 875 struct xlog_cil_trans_hdr thdr;
a47518453bf9581 Dave Chinner 2021-06-08 876 struct xfs_log_vec lvhdr = {};
71e330b593905e4 Dave Chinner 2010-05-21 877 xfs_lsn_t commit_lsn;
4c2d542f2e78653 Dave Chinner 2012-04-23 878 xfs_lsn_t push_seq;
0279bbbbc03f2ce Dave Chinner 2021-06-03 879 struct bio bio;
0279bbbbc03f2ce Dave Chinner 2021-06-03 880 DECLARE_COMPLETION_ONSTACK(bdev_flush);
e12213ba5d909a3 Dave Chinner 2021-06-04 881 bool push_commit_stable;
e469cbe84f4ade9 Dave Chinner 2021-06-08 882 struct xlog_ticket *ticket;
71e330b593905e4 Dave Chinner 2010-05-21 883
facd77e4e38b8f0 Dave Chinner 2021-06-04 884 new_ctx = xlog_cil_ctx_alloc();
71e330b593905e4 Dave Chinner 2010-05-21 885 new_ctx->ticket = xlog_cil_ticket_alloc(log);
71e330b593905e4 Dave Chinner 2010-05-21 886
71e330b593905e4 Dave Chinner 2010-05-21 887 down_write(&cil->xc_ctx_lock);
71e330b593905e4 Dave Chinner 2010-05-21 888
4bb928cdb900d06 Dave Chinner 2013-08-12 889 spin_lock(&cil->xc_push_lock);
4c2d542f2e78653 Dave Chinner 2012-04-23 890 push_seq = cil->xc_push_seq;
4c2d542f2e78653 Dave Chinner 2012-04-23 891 ASSERT(push_seq <= ctx->sequence);
e12213ba5d909a3 Dave Chinner 2021-06-04 892 push_commit_stable = cil->xc_push_commit_stable;
e12213ba5d909a3 Dave Chinner 2021-06-04 893 cil->xc_push_commit_stable = false;
71e330b593905e4 Dave Chinner 2010-05-21 894
0e7ab7efe77451c Dave Chinner 2020-03-24 895 /*
3682277520d6f4a Dave Chinner 2021-06-04 896 * As we are about to switch to a new, empty CIL context, we no longer
3682277520d6f4a Dave Chinner 2021-06-04 897 * need to throttle tasks on CIL space overruns. Wake any waiters that
3682277520d6f4a Dave Chinner 2021-06-04 898 * the hard push throttle may have caught so they can start committing
3682277520d6f4a Dave Chinner 2021-06-04 899 * to the new context. The cil->xc_push_lock provides the serialisation
3682277520d6f4a Dave Chinner 2021-06-04 900 * necessary for safely using the lockless waitqueue_active() check in
3682277520d6f4a Dave Chinner 2021-06-04 901 * this context.
3682277520d6f4a Dave Chinner 2021-06-04 902 */
3682277520d6f4a Dave Chinner 2021-06-04 903 if (waitqueue_active(&cil->xc_push_wait))
c7f87f3984cfa1e Dave Chinner 2020-06-16 904 wake_up_all(&cil->xc_push_wait);
0e7ab7efe77451c Dave Chinner 2020-03-24 905
4c2d542f2e78653 Dave Chinner 2012-04-23 906 /*
4c2d542f2e78653 Dave Chinner 2012-04-23 907 * Check if we've anything to push. If there is nothing, then we don't
4c2d542f2e78653 Dave Chinner 2012-04-23 908 * move on to a new sequence number and so we have to be able to push
4c2d542f2e78653 Dave Chinner 2012-04-23 909 * this sequence again later.
4c2d542f2e78653 Dave Chinner 2012-04-23 910 */
0d11bae4bcf4aa9 Dave Chinner 2021-06-04 911 if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) {
4c2d542f2e78653 Dave Chinner 2012-04-23 912 cil->xc_push_seq = 0;
4bb928cdb900d06 Dave Chinner 2013-08-12 913 spin_unlock(&cil->xc_push_lock);
a44f13edf0ebb4e Dave Chinner 2010-08-24 914 goto out_skip;
4c2d542f2e78653 Dave Chinner 2012-04-23 915 }
4c2d542f2e78653 Dave Chinner 2012-04-23 916
a44f13edf0ebb4e Dave Chinner 2010-08-24 917
cf085a1b5d22144 Joe Perches 2019-11-07 918 /* check for a previously pushed sequence */
facd77e4e38b8f0 Dave Chinner 2021-06-04 919 if (push_seq < ctx->sequence) {
8af3dcd3c89aef1 Dave Chinner 2014-09-23 920 spin_unlock(&cil->xc_push_lock);
df806158b0f6eb2 Dave Chinner 2010-05-17 921 goto out_skip;
8af3dcd3c89aef1 Dave Chinner 2014-09-23 922 }
8af3dcd3c89aef1 Dave Chinner 2014-09-23 923
8af3dcd3c89aef1 Dave Chinner 2014-09-23 924 /*
8af3dcd3c89aef1 Dave Chinner 2014-09-23 925 * We are now going to push this context, so add it to the committing
8af3dcd3c89aef1 Dave Chinner 2014-09-23 926 * list before we do anything else. This ensures that anyone waiting on
8af3dcd3c89aef1 Dave Chinner 2014-09-23 927 * this push can easily detect the difference between a "push in
8af3dcd3c89aef1 Dave Chinner 2014-09-23 928 * progress" and "CIL is empty, nothing to do".
8af3dcd3c89aef1 Dave Chinner 2014-09-23 929 *
8af3dcd3c89aef1 Dave Chinner 2014-09-23 930 * IOWs, a wait loop can now check for:
8af3dcd3c89aef1 Dave Chinner 2014-09-23 931 * the current sequence not being found on the committing list;
8af3dcd3c89aef1 Dave Chinner 2014-09-23 932 * an empty CIL; and
8af3dcd3c89aef1 Dave Chinner 2014-09-23 933 * an unchanged sequence number
8af3dcd3c89aef1 Dave Chinner 2014-09-23 934 * to detect a push that had nothing to do and therefore does not need
8af3dcd3c89aef1 Dave Chinner 2014-09-23 935 * waiting on. If the CIL is not empty, we get put on the committing
8af3dcd3c89aef1 Dave Chinner 2014-09-23 936 * list before emptying the CIL and bumping the sequence number. Hence
8af3dcd3c89aef1 Dave Chinner 2014-09-23 937 * an empty CIL and an unchanged sequence number means we jumped out
8af3dcd3c89aef1 Dave Chinner 2014-09-23 938 * above after doing nothing.
8af3dcd3c89aef1 Dave Chinner 2014-09-23 939 *
8af3dcd3c89aef1 Dave Chinner 2014-09-23 940 * Hence the waiter will either find the commit sequence on the
8af3dcd3c89aef1 Dave Chinner 2014-09-23 941 * committing list or the sequence number will be unchanged and the CIL
8af3dcd3c89aef1 Dave Chinner 2014-09-23 942 * still dirty. In that latter case, the push has not yet started, and
8af3dcd3c89aef1 Dave Chinner 2014-09-23 943 * so the waiter will have to continue trying to check the CIL
8af3dcd3c89aef1 Dave Chinner 2014-09-23 944 * committing list until it is found. In extreme cases of delay, the
8af3dcd3c89aef1 Dave Chinner 2014-09-23 945 * sequence may fully commit between the attempts the wait makes to wait
8af3dcd3c89aef1 Dave Chinner 2014-09-23 946 * on the commit sequence.
8af3dcd3c89aef1 Dave Chinner 2014-09-23 947 */
8af3dcd3c89aef1 Dave Chinner 2014-09-23 948 list_add(&ctx->committing, &cil->xc_committing);
8af3dcd3c89aef1 Dave Chinner 2014-09-23 949 spin_unlock(&cil->xc_push_lock);
df806158b0f6eb2 Dave Chinner 2010-05-17 950
71e330b593905e4 Dave Chinner 2010-05-21 951 /*
0279bbbbc03f2ce Dave Chinner 2021-06-03 952 * The CIL is stable at this point - nothing new will be added to it
0279bbbbc03f2ce Dave Chinner 2021-06-03 953 * because we hold the flush lock exclusively. Hence we can now issue
0279bbbbc03f2ce Dave Chinner 2021-06-03 954 * a cache flush to ensure all the completed metadata in the journal we
0279bbbbc03f2ce Dave Chinner 2021-06-03 955 * are about to overwrite is on stable storage.
0279bbbbc03f2ce Dave Chinner 2021-06-03 956 */
0279bbbbc03f2ce Dave Chinner 2021-06-03 957 xfs_flush_bdev_async(&bio, log->l_mp->m_ddev_targp->bt_bdev,
0279bbbbc03f2ce Dave Chinner 2021-06-03 958 &bdev_flush);
0279bbbbc03f2ce Dave Chinner 2021-06-03 959
a8613836d99e627 Dave Chinner 2021-06-08 960 xlog_cil_pcp_aggregate(cil, ctx);
a8613836d99e627 Dave Chinner 2021-06-08 961
1f18c0c4b78cfb1 Dave Chinner 2021-06-08 962 while (!list_empty(&ctx->log_items)) {
71e330b593905e4 Dave Chinner 2010-05-21 963 struct xfs_log_item *item;
71e330b593905e4 Dave Chinner 2010-05-21 964
1f18c0c4b78cfb1 Dave Chinner 2021-06-08 965 item = list_first_entry(&ctx->log_items,
71e330b593905e4 Dave Chinner 2010-05-21 966 struct xfs_log_item, li_cil);
a47518453bf9581 Dave Chinner 2021-06-08 967 lv = item->li_lv;
a1785f597c8b060 Dave Chinner 2021-06-08 968 lv->lv_order_id = item->li_order_id;
a47518453bf9581 Dave Chinner 2021-06-08 969 num_iovecs += lv->lv_niovecs;
66fc9ffa8638be2 Dave Chinner 2021-06-04 970 /* we don't write ordered log vectors */
66fc9ffa8638be2 Dave Chinner 2021-06-04 971 if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
66fc9ffa8638be2 Dave Chinner 2021-06-04 972 num_bytes += lv->lv_bytes;
a47518453bf9581 Dave Chinner 2021-06-08 973
a47518453bf9581 Dave Chinner 2021-06-08 974 list_add_tail(&lv->lv_list, &ctx->lv_chain);
a1785f597c8b060 Dave Chinner 2021-06-08 975 list_del_init(&item->li_cil);
a1785f597c8b060 Dave Chinner 2021-06-08 976 item->li_order_id = 0;
a1785f597c8b060 Dave Chinner 2021-06-08 977 item->li_lv = NULL;
71e330b593905e4 Dave Chinner 2010-05-21 978 }
71e330b593905e4 Dave Chinner 2010-05-21 979
71e330b593905e4 Dave Chinner 2010-05-21 980 /*
facd77e4e38b8f0 Dave Chinner 2021-06-04 981 * Switch the contexts so we can drop the context lock and move out
71e330b593905e4 Dave Chinner 2010-05-21 982 * of a shared context. We can't just go straight to the commit record,
71e330b593905e4 Dave Chinner 2010-05-21 983 * though - we need to synchronise with previous and future commits so
71e330b593905e4 Dave Chinner 2010-05-21 984 * that the commit records are correctly ordered in the log to ensure
71e330b593905e4 Dave Chinner 2010-05-21 985 * that we process items during log IO completion in the correct order.
71e330b593905e4 Dave Chinner 2010-05-21 986 *
71e330b593905e4 Dave Chinner 2010-05-21 987 * For example, if we get an EFI in one checkpoint and the EFD in the
71e330b593905e4 Dave Chinner 2010-05-21 988 * next (e.g. due to log forces), we do not want the checkpoint with
71e330b593905e4 Dave Chinner 2010-05-21 989 * the EFD to be committed before the checkpoint with the EFI. Hence
71e330b593905e4 Dave Chinner 2010-05-21 990 * we must strictly order the commit records of the checkpoints so
71e330b593905e4 Dave Chinner 2010-05-21 991 * that: a) the checkpoint callbacks are attached to the iclogs in the
71e330b593905e4 Dave Chinner 2010-05-21 992 * correct order; and b) the checkpoints are replayed in correct order
71e330b593905e4 Dave Chinner 2010-05-21 993 * in log recovery.
71e330b593905e4 Dave Chinner 2010-05-21 994 *
71e330b593905e4 Dave Chinner 2010-05-21 995 * Hence we need to add this context to the committing context list so
71e330b593905e4 Dave Chinner 2010-05-21 996 * that higher sequences will wait for us to write out a commit record
71e330b593905e4 Dave Chinner 2010-05-21 997 * before they do.
f876e44603ad091 Dave Chinner 2014-02-27 998 *
f39ae5297c5ce2f Dave Chinner 2021-06-04 999 * xfs_log_force_seq requires us to mirror the new sequence into the cil
f876e44603ad091 Dave Chinner 2014-02-27 1000 * structure atomically with the addition of this sequence to the
f876e44603ad091 Dave Chinner 2014-02-27 1001 * committing list. This also ensures that we can do unlocked checks
f876e44603ad091 Dave Chinner 2014-02-27 1002 * against the current sequence in log forces without risking
f876e44603ad091 Dave Chinner 2014-02-27 1003 * dereferencing a freed context pointer.
71e330b593905e4 Dave Chinner 2010-05-21 1004 */
4bb928cdb900d06 Dave Chinner 2013-08-12 1005 spin_lock(&cil->xc_push_lock);
facd77e4e38b8f0 Dave Chinner 2021-06-04 1006 xlog_cil_ctx_switch(cil, new_ctx);
4bb928cdb900d06 Dave Chinner 2013-08-12 1007 spin_unlock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1008 up_write(&cil->xc_ctx_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1009
a1785f597c8b060 Dave Chinner 2021-06-08 1010 /*
a1785f597c8b060 Dave Chinner 2021-06-08 1011 * Sort the log vector chain before we add the transaction headers.
a1785f597c8b060 Dave Chinner 2021-06-08 1012 * This ensures we always have the transaction headers at the start
a1785f597c8b060 Dave Chinner 2021-06-08 1013 * of the chain.
a1785f597c8b060 Dave Chinner 2021-06-08 1014 */
a1785f597c8b060 Dave Chinner 2021-06-08 1015 list_sort(NULL, &ctx->lv_chain, xlog_cil_order_cmp);
a1785f597c8b060 Dave Chinner 2021-06-08 1016
71e330b593905e4 Dave Chinner 2010-05-21 1017 /*
71e330b593905e4 Dave Chinner 2010-05-21 1018 * Build a checkpoint transaction header and write it to the log to
71e330b593905e4 Dave Chinner 2010-05-21 1019 * begin the transaction. We need to account for the space used by the
71e330b593905e4 Dave Chinner 2010-05-21 1020 * transaction header here as it is not accounted for in xlog_write().
a47518453bf9581 Dave Chinner 2021-06-08 1021 * Add the lvhdr to the head of the lv chain we pass to xlog_write() so
a47518453bf9581 Dave Chinner 2021-06-08 1022 * it gets written into the iclog first.
71e330b593905e4 Dave Chinner 2010-05-21 1023 */
877cf3473914ae4 Dave Chinner 2021-06-04 1024 xlog_cil_build_trans_hdr(ctx, &thdr, &lvhdr, num_iovecs);
66fc9ffa8638be2 Dave Chinner 2021-06-04 1025 num_bytes += lvhdr.lv_bytes;
a47518453bf9581 Dave Chinner 2021-06-08 1026 list_add(&lvhdr.lv_list, &ctx->lv_chain);
71e330b593905e4 Dave Chinner 2010-05-21 1027
0279bbbbc03f2ce Dave Chinner 2021-06-03 1028 /*
0279bbbbc03f2ce Dave Chinner 2021-06-03 1029 * Before we format and submit the first iclog, we have to ensure that
0279bbbbc03f2ce Dave Chinner 2021-06-03 1030 * the metadata writeback ordering cache flush is complete.
0279bbbbc03f2ce Dave Chinner 2021-06-03 1031 */
0279bbbbc03f2ce Dave Chinner 2021-06-03 1032 wait_for_completion(&bdev_flush);
0279bbbbc03f2ce Dave Chinner 2021-06-03 1033
877cf3473914ae4 Dave Chinner 2021-06-04 1034 /*
877cf3473914ae4 Dave Chinner 2021-06-04 1035 * The LSN we need to pass to the log items on transaction commit is the
877cf3473914ae4 Dave Chinner 2021-06-04 1036 * LSN reported by the first log vector write, not the commit lsn. If we
877cf3473914ae4 Dave Chinner 2021-06-04 1037 * use the commit record lsn then we can move the tail beyond the grant
877cf3473914ae4 Dave Chinner 2021-06-04 1038 * write head.
877cf3473914ae4 Dave Chinner 2021-06-04 1039 */
fc3370002b56bcb Dave Chinner 2021-06-17 1040 error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
a47518453bf9581 Dave Chinner 2021-06-08 1041 NULL, num_bytes);
a47518453bf9581 Dave Chinner 2021-06-08 1042
a47518453bf9581 Dave Chinner 2021-06-08 1043 /*
a47518453bf9581 Dave Chinner 2021-06-08 1044 * Take the lvhdr back off the lv_chain as it should not be passed
a47518453bf9581 Dave Chinner 2021-06-08 1045 * to log IO completion.
a47518453bf9581 Dave Chinner 2021-06-08 1046 */
a47518453bf9581 Dave Chinner 2021-06-08 1047 list_del(&lvhdr.lv_list);
71e330b593905e4 Dave Chinner 2010-05-21 1048 if (error)
7db37c5e6575b22 Dave Chinner 2011-01-27 1049 goto out_abort_free_ticket;
71e330b593905e4 Dave Chinner 2010-05-21 1050
71e330b593905e4 Dave Chinner 2010-05-21 1051 /*
71e330b593905e4 Dave Chinner 2010-05-21 1052 * now that we've written the checkpoint into the log, strictly
71e330b593905e4 Dave Chinner 2010-05-21 1053 * order the commit records so replay will get them in the right order.
71e330b593905e4 Dave Chinner 2010-05-21 1054 */
71e330b593905e4 Dave Chinner 2010-05-21 1055 restart:
4bb928cdb900d06 Dave Chinner 2013-08-12 1056 spin_lock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1057 list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
ac983517ec5941d Dave Chinner 2014-05-07 1058 /*
ac983517ec5941d Dave Chinner 2014-05-07 1059 * Avoid getting stuck in this loop because we were woken by the
ac983517ec5941d Dave Chinner 2014-05-07 1060 * shutdown, but then went back to sleep once already in the
ac983517ec5941d Dave Chinner 2014-05-07 1061 * shutdown state.
ac983517ec5941d Dave Chinner 2014-05-07 1062 */
ac983517ec5941d Dave Chinner 2014-05-07 1063 if (XLOG_FORCED_SHUTDOWN(log)) {
ac983517ec5941d Dave Chinner 2014-05-07 1064 spin_unlock(&cil->xc_push_lock);
ac983517ec5941d Dave Chinner 2014-05-07 1065 goto out_abort_free_ticket;
ac983517ec5941d Dave Chinner 2014-05-07 1066 }
ac983517ec5941d Dave Chinner 2014-05-07 1067
71e330b593905e4 Dave Chinner 2010-05-21 1068 /*
71e330b593905e4 Dave Chinner 2010-05-21 1069 * Higher sequences will wait for this one so skip them.
ac983517ec5941d Dave Chinner 2014-05-07 1070 * Don't wait for our own sequence, either.
71e330b593905e4 Dave Chinner 2010-05-21 1071 */
71e330b593905e4 Dave Chinner 2010-05-21 1072 if (new_ctx->sequence >= ctx->sequence)
71e330b593905e4 Dave Chinner 2010-05-21 1073 continue;
71e330b593905e4 Dave Chinner 2010-05-21 1074 if (!new_ctx->commit_lsn) {
71e330b593905e4 Dave Chinner 2010-05-21 1075 /*
71e330b593905e4 Dave Chinner 2010-05-21 1076 * It is still being pushed! Wait for the push to
71e330b593905e4 Dave Chinner 2010-05-21 1077 * complete, then start again from the beginning.
71e330b593905e4 Dave Chinner 2010-05-21 1078 */
4bb928cdb900d06 Dave Chinner 2013-08-12 1079 xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1080 goto restart;
71e330b593905e4 Dave Chinner 2010-05-21 1081 }
71e330b593905e4 Dave Chinner 2010-05-21 1082 }
4bb928cdb900d06 Dave Chinner 2013-08-12 1083 spin_unlock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1084
fc3370002b56bcb Dave Chinner 2021-06-17 1085 error = xlog_cil_write_commit_record(ctx, &commit_iclog);
dd401770b0ff68f Dave Chinner 2020-03-25 1086 if (error)
dd401770b0ff68f Dave Chinner 2020-03-25 1087 goto out_abort_free_ticket;
dd401770b0ff68f Dave Chinner 2020-03-25 1088
89ae379d564c5d8 Christoph Hellwig 2019-06-28 1089 spin_lock(&commit_iclog->ic_callback_lock);
1858bb0bec612df Christoph Hellwig 2019-10-14 1090 if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
89ae379d564c5d8 Christoph Hellwig 2019-06-28 1091 spin_unlock(&commit_iclog->ic_callback_lock);
e469cbe84f4ade9 Dave Chinner 2021-06-08 1092 goto out_abort_free_ticket;
89ae379d564c5d8 Christoph Hellwig 2019-06-28 1093 }
89ae379d564c5d8 Christoph Hellwig 2019-06-28 1094 ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
89ae379d564c5d8 Christoph Hellwig 2019-06-28 1095 commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
89ae379d564c5d8 Christoph Hellwig 2019-06-28 1096 list_add_tail(&ctx->iclog_entry, &commit_iclog->ic_callbacks);
89ae379d564c5d8 Christoph Hellwig 2019-06-28 1097 spin_unlock(&commit_iclog->ic_callback_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1098
71e330b593905e4 Dave Chinner 2010-05-21 1099 /*
71e330b593905e4 Dave Chinner 2010-05-21 1100 * now the checkpoint commit is complete and we've attached the
71e330b593905e4 Dave Chinner 2010-05-21 1101 * callbacks to the iclog we can assign the commit LSN to the context
71e330b593905e4 Dave Chinner 2010-05-21 1102 * and wake up anyone who is waiting for the commit to complete.
71e330b593905e4 Dave Chinner 2010-05-21 1103 */
4bb928cdb900d06 Dave Chinner 2013-08-12 1104 spin_lock(&cil->xc_push_lock);
eb40a87500ac2f6 Dave Chinner 2010-12-21 1105 wake_up_all(&cil->xc_commit_wait);
4bb928cdb900d06 Dave Chinner 2013-08-12 1106 spin_unlock(&cil->xc_push_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1107
e469cbe84f4ade9 Dave Chinner 2021-06-08 1108 /*
e469cbe84f4ade9 Dave Chinner 2021-06-08 1109 * Pull the ticket off the ctx so we can ungrant it after releasing the
e469cbe84f4ade9 Dave Chinner 2021-06-08 1110 * commit_iclog. The ctx may be freed by the time we return from
e469cbe84f4ade9 Dave Chinner 2021-06-08 1111 * releasing the commit_iclog (i.e. checkpoint has been completed and
e469cbe84f4ade9 Dave Chinner 2021-06-08 1112 * callback run) so we can't reference the ctx after the call to
e469cbe84f4ade9 Dave Chinner 2021-06-08 1113 * xlog_state_release_iclog().
e469cbe84f4ade9 Dave Chinner 2021-06-08 1114 */
e469cbe84f4ade9 Dave Chinner 2021-06-08 1115 ticket = ctx->ticket;
e469cbe84f4ade9 Dave Chinner 2021-06-08 1116
5fd9256ce156ef7 Dave Chinner 2021-06-03 1117 /*
815753dc16bbca2 Dave Chinner 2021-06-17 1118 * If the checkpoint spans multiple iclogs, wait for all previous iclogs
815753dc16bbca2 Dave Chinner 2021-06-17 1119 * to complete before we submit the commit_iclog. We can't use state
815753dc16bbca2 Dave Chinner 2021-06-17 1120 * checks for this - ACTIVE can be either a past completed iclog or a
815753dc16bbca2 Dave Chinner 2021-06-17 1121 * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a
815753dc16bbca2 Dave Chinner 2021-06-17 1122 * past or future iclog awaiting IO or ordered IO completion to be run.
815753dc16bbca2 Dave Chinner 2021-06-17 1123 * In the latter case, if it's a future iclog and we wait on it, then we
815753dc16bbca2 Dave Chinner 2021-06-17 1124 * will hang because it won't get processed through to ic_force_wait
815753dc16bbca2 Dave Chinner 2021-06-17 1125 * wakeup until this commit_iclog is written to disk. Hence we use the
815753dc16bbca2 Dave Chinner 2021-06-17 1126 * iclog header lsn and compare it to the commit lsn to determine if we
815753dc16bbca2 Dave Chinner 2021-06-17 1127 * need to wait on iclogs or not.
5fd9256ce156ef7 Dave Chinner 2021-06-03 1128 */
5fd9256ce156ef7 Dave Chinner 2021-06-03 1129 spin_lock(&log->l_icloglock);
cb1acb3f3246368 Dave Chinner 2021-06-04 @1130 if (ctx->start_lsn != commit_lsn) {
815753dc16bbca2 Dave Chinner 2021-06-17 1131 struct xlog_in_core *iclog;
815753dc16bbca2 Dave Chinner 2021-06-17 1132
815753dc16bbca2 Dave Chinner 2021-06-17 1133 for (iclog = commit_iclog->ic_prev;
815753dc16bbca2 Dave Chinner 2021-06-17 1134 iclog != commit_iclog;
815753dc16bbca2 Dave Chinner 2021-06-17 1135 iclog = iclog->ic_prev) {
815753dc16bbca2 Dave Chinner 2021-06-17 1136 xfs_lsn_t hlsn;
815753dc16bbca2 Dave Chinner 2021-06-17 1137
815753dc16bbca2 Dave Chinner 2021-06-17 1138 /*
815753dc16bbca2 Dave Chinner 2021-06-17 1139 * If the LSN of the iclog is zero or in the future it
815753dc16bbca2 Dave Chinner 2021-06-17 1140 * means it has passed through IO completion and
815753dc16bbca2 Dave Chinner 2021-06-17 1141 * activation and hence all previous iclogs have also
815753dc16bbca2 Dave Chinner 2021-06-17 1142 * done so. We do not need to wait at all in this case.
815753dc16bbca2 Dave Chinner 2021-06-17 1143 */
815753dc16bbca2 Dave Chinner 2021-06-17 1144 hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
815753dc16bbca2 Dave Chinner 2021-06-17 1145 if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
815753dc16bbca2 Dave Chinner 2021-06-17 1146 break;
815753dc16bbca2 Dave Chinner 2021-06-17 1147
815753dc16bbca2 Dave Chinner 2021-06-17 1148 /*
815753dc16bbca2 Dave Chinner 2021-06-17 1149 * If the LSN of the iclog is older than the commit lsn,
815753dc16bbca2 Dave Chinner 2021-06-17 1150 * we have to wait on it. Waiting on this via the
815753dc16bbca2 Dave Chinner 2021-06-17 1151 * ic_force_wait should also order the completion of all
815753dc16bbca2 Dave Chinner 2021-06-17 1152 * older iclogs, too, but we leave checking that to the
815753dc16bbca2 Dave Chinner 2021-06-17 1153 * next loop iteration.
815753dc16bbca2 Dave Chinner 2021-06-17 1154 */
815753dc16bbca2 Dave Chinner 2021-06-17 1155 ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
815753dc16bbca2 Dave Chinner 2021-06-17 1156 xlog_wait_on_iclog(iclog);
cb1acb3f3246368 Dave Chinner 2021-06-04 1157 spin_lock(&log->l_icloglock);
815753dc16bbca2 Dave Chinner 2021-06-17 1158 }
815753dc16bbca2 Dave Chinner 2021-06-17 1159
815753dc16bbca2 Dave Chinner 2021-06-17 1160 /*
815753dc16bbca2 Dave Chinner 2021-06-17 1161 * Regardless of whether we need to wait or not, the
815753dc16bbca2 Dave Chinner 2021-06-17 1162 * commit_iclog write needs to issue a pre-flush so that the
815753dc16bbca2 Dave Chinner 2021-06-17 1163 * ordering for this checkpoint is correctly preserved down to
815753dc16bbca2 Dave Chinner 2021-06-17 1164 * stable storage.
815753dc16bbca2 Dave Chinner 2021-06-17 1165 */
cb1acb3f3246368 Dave Chinner 2021-06-04 1166 commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
5fd9256ce156ef7 Dave Chinner 2021-06-03 1167 }
5fd9256ce156ef7 Dave Chinner 2021-06-03 1168
cb1acb3f3246368 Dave Chinner 2021-06-04 1169 /*
cb1acb3f3246368 Dave Chinner 2021-06-04 1170 * The commit iclog must be written to stable storage to guarantee
cb1acb3f3246368 Dave Chinner 2021-06-04 1171 * journal IO vs metadata writeback IO is correctly ordered on stable
cb1acb3f3246368 Dave Chinner 2021-06-04 1172 * storage.
e12213ba5d909a3 Dave Chinner 2021-06-04 1173 *
e12213ba5d909a3 Dave Chinner 2021-06-04 1174 * If the push caller needs the commit to be immediately stable and the
e12213ba5d909a3 Dave Chinner 2021-06-04 1175 * commit_iclog is not yet marked as XLOG_STATE_WANT_SYNC to indicate it
e12213ba5d909a3 Dave Chinner 2021-06-04 1176 * will be written when released, switch its state to WANT_SYNC right
e12213ba5d909a3 Dave Chinner 2021-06-04 1177 * now.
cb1acb3f3246368 Dave Chinner 2021-06-04 1178 */
cb1acb3f3246368 Dave Chinner 2021-06-04 1179 commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
e12213ba5d909a3 Dave Chinner 2021-06-04 1180 if (push_commit_stable && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
e12213ba5d909a3 Dave Chinner 2021-06-04 1181 xlog_state_switch_iclogs(log, commit_iclog, 0);
e469cbe84f4ade9 Dave Chinner 2021-06-08 1182 xlog_state_release_iclog(log, commit_iclog, ticket);
cb1acb3f3246368 Dave Chinner 2021-06-04 1183 spin_unlock(&log->l_icloglock);
e469cbe84f4ade9 Dave Chinner 2021-06-08 1184
e469cbe84f4ade9 Dave Chinner 2021-06-08 1185 xfs_log_ticket_ungrant(log, ticket);
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 1186 return;
71e330b593905e4 Dave Chinner 2010-05-21 1187
71e330b593905e4 Dave Chinner 2010-05-21 1188 out_skip:
71e330b593905e4 Dave Chinner 2010-05-21 1189 up_write(&cil->xc_ctx_lock);
71e330b593905e4 Dave Chinner 2010-05-21 1190 xfs_log_ticket_put(new_ctx->ticket);
71e330b593905e4 Dave Chinner 2010-05-21 1191 kmem_free(new_ctx);
c7cc296ddd1f6d1 Christoph Hellwig 2020-03-20 1192 return;
71e330b593905e4 Dave Chinner 2010-05-21 1193
7db37c5e6575b22 Dave Chinner 2011-01-27 1194 out_abort_free_ticket:
877cf3473914ae4 Dave Chinner 2021-06-04 1195 xfs_log_ticket_ungrant(log, ctx->ticket);
12e6a0f449d585b Christoph Hellwig 2020-03-20 1196 ASSERT(XLOG_FORCED_SHUTDOWN(log));
12e6a0f449d585b Christoph Hellwig 2020-03-20 1197 xlog_cil_committed(ctx);
4c2d542f2e78653 Dave Chinner 2012-04-23 1198 }
4c2d542f2e78653 Dave Chinner 2012-04-23 1199
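For readers following the listing above, the commit record ordering check in the restart loop can be approximated in plain userspace C. This is only a sketch under stated assumptions: struct sketch_ctx and sketch_must_wait() are hypothetical names that do not exist in the kernel, the committing list is flattened into an array, and the locking, the shutdown check and the xlog_wait() sleep are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the CIL checkpoint context. */
struct sketch_ctx {
	uint64_t sequence;	/* checkpoint sequence number */
	int64_t commit_lsn;	/* zero until the commit record is written */
};

/*
 * Return true if the checkpoint with sequence my_seq still has to wait,
 * i.e. some lower sequence on the committing list has not yet written
 * its commit record. Equal and higher sequences are skipped, as in the
 * loop in the listing.
 */
static bool sketch_must_wait(const struct sketch_ctx *committing, size_t n,
			     uint64_t my_seq)
{
	assert(committing != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (committing[i].sequence >= my_seq)
			continue;
		if (!committing[i].commit_lsn)
			return true;
	}
	return false;
}
```

In this model a pusher would re-run the check after every wakeup and write its own commit record only once the check returns false.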
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
* Re: [PATCH 1/8] xfs: add iclog state trace events
2021-06-17 8:26 ` [PATCH 1/8] xfs: add iclog state trace events Dave Chinner
@ 2021-06-17 16:45 ` Darrick J. Wong
2021-06-18 14:09 ` Christoph Hellwig
1 sibling, 0 replies; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 16:45 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 06:26:10PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> For the DEBUGS!
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Still looks fine to me.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_log.c | 18 +++++++++++++
> fs/xfs/xfs_log_priv.h | 10 ++++++++
> fs/xfs/xfs_trace.h | 60 +++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 88 insertions(+)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index e921b554b683..54fd6a695bb5 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -524,6 +524,7 @@ __xlog_state_release_iclog(
> iclog->ic_header.h_tail_lsn = cpu_to_be64(tail_lsn);
> xlog_verify_tail_lsn(log, iclog, tail_lsn);
> /* cycle incremented when incrementing curr_block */
> + trace_xlog_iclog_syncing(iclog, _RET_IP_);
> return true;
> }
>
> @@ -543,6 +544,7 @@ xlog_state_release_iclog(
> {
> lockdep_assert_held(&log->l_icloglock);
>
> + trace_xlog_iclog_release(iclog, _RET_IP_);
> if (iclog->ic_state == XLOG_STATE_IOERROR)
> return -EIO;
>
> @@ -804,6 +806,7 @@ xlog_wait_on_iclog(
> {
> struct xlog *log = iclog->ic_log;
>
> + trace_xlog_iclog_wait_on(iclog, _RET_IP_);
> if (!XLOG_FORCED_SHUTDOWN(log) &&
> iclog->ic_state != XLOG_STATE_ACTIVE &&
> iclog->ic_state != XLOG_STATE_DIRTY) {
> @@ -1804,6 +1807,7 @@ xlog_write_iclog(
> unsigned int count)
> {
> ASSERT(bno < log->l_logBBsize);
> + trace_xlog_iclog_write(iclog, _RET_IP_);
>
> /*
> * We lock the iclogbufs here so that we can serialise against I/O
> @@ -1950,6 +1954,7 @@ xlog_sync(
> unsigned int size;
>
> ASSERT(atomic_read(&iclog->ic_refcnt) == 0);
> + trace_xlog_iclog_sync(iclog, _RET_IP_);
>
> count = xlog_calc_iclog_size(log, iclog, &roundoff);
>
> @@ -2488,6 +2493,7 @@ xlog_state_activate_iclog(
> int *iclogs_changed)
> {
> ASSERT(list_empty_careful(&iclog->ic_callbacks));
> + trace_xlog_iclog_activate(iclog, _RET_IP_);
>
> /*
> * If the number of ops in this iclog indicate it just contains the
> @@ -2577,6 +2583,8 @@ xlog_state_clean_iclog(
> {
> int iclogs_changed = 0;
>
> + trace_xlog_iclog_clean(dirty_iclog, _RET_IP_);
> +
> dirty_iclog->ic_state = XLOG_STATE_DIRTY;
>
> xlog_state_activate_iclogs(log, &iclogs_changed);
> @@ -2636,6 +2644,7 @@ xlog_state_set_callback(
> struct xlog_in_core *iclog,
> xfs_lsn_t header_lsn)
> {
> + trace_xlog_iclog_callback(iclog, _RET_IP_);
> iclog->ic_state = XLOG_STATE_CALLBACK;
>
> ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
> @@ -2717,6 +2726,7 @@ xlog_state_do_iclog_callbacks(
> __releases(&log->l_icloglock)
> __acquires(&log->l_icloglock)
> {
> + trace_xlog_iclog_callbacks_start(iclog, _RET_IP_);
> spin_unlock(&log->l_icloglock);
> spin_lock(&iclog->ic_callback_lock);
> while (!list_empty(&iclog->ic_callbacks)) {
> @@ -2736,6 +2746,7 @@ xlog_state_do_iclog_callbacks(
> */
> spin_lock(&log->l_icloglock);
> spin_unlock(&iclog->ic_callback_lock);
> + trace_xlog_iclog_callbacks_done(iclog, _RET_IP_);
> }
>
> STATIC void
> @@ -2827,6 +2838,7 @@ xlog_state_done_syncing(
>
> spin_lock(&log->l_icloglock);
> ASSERT(atomic_read(&iclog->ic_refcnt) == 0);
> + trace_xlog_iclog_sync_done(iclog, _RET_IP_);
>
> /*
> * If we got an error, either on the first buffer, or in the case of
> @@ -2899,6 +2911,8 @@ xlog_state_get_iclog_space(
> atomic_inc(&iclog->ic_refcnt); /* prevents sync */
> log_offset = iclog->ic_offset;
>
> + trace_xlog_iclog_get_space(iclog, _RET_IP_);
> +
> /* On the 1st write to an iclog, figure out lsn. This works
> * if iclogs marked XLOG_STATE_WANT_SYNC always write out what they are
> * committing to. If the offset is set, that's how many blocks
> @@ -3056,6 +3070,7 @@ xlog_state_switch_iclogs(
> {
> ASSERT(iclog->ic_state == XLOG_STATE_ACTIVE);
> assert_spin_locked(&log->l_icloglock);
> + trace_xlog_iclog_switch(iclog, _RET_IP_);
>
> if (!eventual_size)
> eventual_size = iclog->ic_offset;
> @@ -3138,6 +3153,8 @@ xfs_log_force(
> if (iclog->ic_state == XLOG_STATE_IOERROR)
> goto out_error;
>
> + trace_xlog_iclog_force(iclog, _RET_IP_);
> +
> if (iclog->ic_state == XLOG_STATE_DIRTY ||
> (iclog->ic_state == XLOG_STATE_ACTIVE &&
> atomic_read(&iclog->ic_refcnt) == 0 && iclog->ic_offset == 0)) {
> @@ -3225,6 +3242,7 @@ xlog_force_lsn(
> goto out_error;
>
> while (be64_to_cpu(iclog->ic_header.h_lsn) != lsn) {
> + trace_xlog_iclog_force_lsn(iclog, _RET_IP_);
> iclog = iclog->ic_next;
> if (iclog == log->l_iclog)
> goto out_unlock;
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index e4e421a70335..330befd9f6be 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -50,6 +50,16 @@ enum xlog_iclog_state {
> XLOG_STATE_IOERROR, /* IO error happened in sync'ing log */
> };
>
> +#define XLOG_STATE_STRINGS \
> + { XLOG_STATE_ACTIVE, "XLOG_STATE_ACTIVE" }, \
> + { XLOG_STATE_WANT_SYNC, "XLOG_STATE_WANT_SYNC" }, \
> + { XLOG_STATE_SYNCING, "XLOG_STATE_SYNCING" }, \
> + { XLOG_STATE_DONE_SYNC, "XLOG_STATE_DONE_SYNC" }, \
> + { XLOG_STATE_CALLBACK, "XLOG_STATE_CALLBACK" }, \
> + { XLOG_STATE_DIRTY, "XLOG_STATE_DIRTY" }, \
> + { XLOG_STATE_IOERROR, "XLOG_STATE_IOERROR" }
> +
> +
> /*
> * Log ticket flags
> */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 71dca776c110..28d570742000 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -24,6 +24,7 @@ struct xlog_ticket;
> struct xlog_recover;
> struct xlog_recover_item;
> struct xlog_rec_header;
> +struct xlog_in_core;
> struct xfs_buf_log_format;
> struct xfs_inode_log_format;
> struct xfs_bmbt_irec;
> @@ -3927,6 +3928,65 @@ DEFINE_EVENT(xfs_icwalk_class, name, \
> DEFINE_ICWALK_EVENT(xfs_ioc_free_eofblocks);
> DEFINE_ICWALK_EVENT(xfs_blockgc_free_space);
>
> +TRACE_DEFINE_ENUM(XLOG_STATE_ACTIVE);
> +TRACE_DEFINE_ENUM(XLOG_STATE_WANT_SYNC);
> +TRACE_DEFINE_ENUM(XLOG_STATE_SYNCING);
> +TRACE_DEFINE_ENUM(XLOG_STATE_DONE_SYNC);
> +TRACE_DEFINE_ENUM(XLOG_STATE_CALLBACK);
> +TRACE_DEFINE_ENUM(XLOG_STATE_DIRTY);
> +TRACE_DEFINE_ENUM(XLOG_STATE_IOERROR);
> +
> +DECLARE_EVENT_CLASS(xlog_iclog_class,
> + TP_PROTO(struct xlog_in_core *iclog, unsigned long caller_ip),
> + TP_ARGS(iclog, caller_ip),
> + TP_STRUCT__entry(
> + __field(dev_t, dev)
> + __field(uint32_t, state)
> + __field(int32_t, refcount)
> + __field(uint32_t, offset)
> + __field(unsigned long long, lsn)
> + __field(unsigned long, caller_ip)
> + ),
> + TP_fast_assign(
> + __entry->dev = iclog->ic_log->l_mp->m_super->s_dev;
> + __entry->state = iclog->ic_state;
> + __entry->refcount = atomic_read(&iclog->ic_refcnt);
> + __entry->offset = iclog->ic_offset;
> + __entry->lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> + __entry->caller_ip = caller_ip;
> + ),
> + TP_printk("dev %d:%d state %s refcnt %d offset %u lsn 0x%llx caller %pS",
> + MAJOR(__entry->dev), MINOR(__entry->dev),
> + __print_symbolic(__entry->state, XLOG_STATE_STRINGS),
> + __entry->refcount,
> + __entry->offset,
> + __entry->lsn,
> + (char *)__entry->caller_ip)
> +
> +);
> +
> +#define DEFINE_ICLOG_EVENT(name) \
> +DEFINE_EVENT(xlog_iclog_class, name, \
> + TP_PROTO(struct xlog_in_core *iclog, unsigned long caller_ip), \
> + TP_ARGS(iclog, caller_ip))
> +
> +DEFINE_ICLOG_EVENT(xlog_iclog_activate);
> +DEFINE_ICLOG_EVENT(xlog_iclog_clean);
> +DEFINE_ICLOG_EVENT(xlog_iclog_callback);
> +DEFINE_ICLOG_EVENT(xlog_iclog_callbacks_start);
> +DEFINE_ICLOG_EVENT(xlog_iclog_callbacks_done);
> +DEFINE_ICLOG_EVENT(xlog_iclog_force);
> +DEFINE_ICLOG_EVENT(xlog_iclog_force_lsn);
> +DEFINE_ICLOG_EVENT(xlog_iclog_get_space);
> +DEFINE_ICLOG_EVENT(xlog_iclog_release);
> +DEFINE_ICLOG_EVENT(xlog_iclog_switch);
> +DEFINE_ICLOG_EVENT(xlog_iclog_sync);
> +DEFINE_ICLOG_EVENT(xlog_iclog_syncing);
> +DEFINE_ICLOG_EVENT(xlog_iclog_sync_done);
> +DEFINE_ICLOG_EVENT(xlog_iclog_want_sync);
> +DEFINE_ICLOG_EVENT(xlog_iclog_wait_on);
> +DEFINE_ICLOG_EVENT(xlog_iclog_write);
> +
> #endif /* _TRACE_XFS_H */
>
> #undef TRACE_INCLUDE_PATH
> --
> 2.31.1
>
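As a rough illustration of what the XLOG_STATE_STRINGS table buys in the trace output, here is a userspace sketch of a value-to-name lookup. It is not the tracing implementation: the sketch_* names and the "UNKNOWN" fallback are assumptions made for this example, and the enum below only mirrors the order of the strings in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the iclog states, in the order listed in the patch. */
enum sketch_iclog_state {
	SKETCH_STATE_ACTIVE,
	SKETCH_STATE_WANT_SYNC,
	SKETCH_STATE_SYNCING,
	SKETCH_STATE_DONE_SYNC,
	SKETCH_STATE_CALLBACK,
	SKETCH_STATE_DIRTY,
	SKETCH_STATE_IOERROR,
};

struct sketch_sym {
	unsigned long value;
	const char *name;
};

/* Same shape as XLOG_STATE_STRINGS: a list of { value, "name" } pairs. */
#define SKETCH_STATE_STRINGS \
	{ SKETCH_STATE_ACTIVE,		"XLOG_STATE_ACTIVE" }, \
	{ SKETCH_STATE_WANT_SYNC,	"XLOG_STATE_WANT_SYNC" }, \
	{ SKETCH_STATE_SYNCING,		"XLOG_STATE_SYNCING" }, \
	{ SKETCH_STATE_DONE_SYNC,	"XLOG_STATE_DONE_SYNC" }, \
	{ SKETCH_STATE_CALLBACK,	"XLOG_STATE_CALLBACK" }, \
	{ SKETCH_STATE_DIRTY,		"XLOG_STATE_DIRTY" }, \
	{ SKETCH_STATE_IOERROR,		"XLOG_STATE_IOERROR" }

/* Linear search of the table; the fallback string is this sketch's choice. */
static const char *sketch_state_name(unsigned long state)
{
	static const struct sketch_sym syms[] = { SKETCH_STATE_STRINGS };

	for (size_t i = 0; i < sizeof(syms) / sizeof(syms[0]); i++) {
		if (syms[i].value == state)
			return syms[i].name;
	}
	return "UNKNOWN";
}
```

With a table like this, a trace line can show a readable state name instead of a raw number.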
* Re: [PATCH 2/8] xfs: don't wait on future iclogs when pushing the CIL
2021-06-17 8:26 ` [PATCH 2/8] xfs: don't wait on future iclogs when pushing the CIL Dave Chinner
@ 2021-06-17 17:49 ` Darrick J. Wong
2021-06-17 21:55 ` Dave Chinner
0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 17:49 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 06:26:11PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> The iclogbuf ring attached to the struct xlog is circular, hence the
> first and last iclogs in the ring can only be determined by
> comparing them against the log->l_iclog pointer.
>
> In xfs_cil_push_work(), we want to wait on previous iclogs that were
> issued so that we can flush them to stable storage with the commit
> record write, and it simply waits on the previous iclog in the ring.
> This, however, leads to CIL push hangs in generic/019 like so:
>
> task:kworker/u33:0 state:D stack:12680 pid: 7 ppid: 2 flags:0x00004000
> Workqueue: xfs-cil/pmem1 xlog_cil_push_work
> Call Trace:
> __schedule+0x30b/0x9f0
> schedule+0x68/0xe0
> xlog_wait_on_iclog+0x121/0x190
> ? wake_up_q+0xa0/0xa0
> xlog_cil_push_work+0x994/0xa10
> ? _raw_spin_lock+0x15/0x20
> ? xfs_swap_extents+0x920/0x920
> process_one_work+0x1ab/0x390
> worker_thread+0x56/0x3d0
> ? rescuer_thread+0x3c0/0x3c0
> kthread+0x14d/0x170
> ? __kthread_bind_mask+0x70/0x70
> ret_from_fork+0x1f/0x30
>
> With other threads blocking in either xlog_state_get_iclog_space()
> waiting for iclog space or xlog_grant_head_wait() waiting for log
> reservation space.
>
> The problem here is that the previous iclog on the ring might
> actually be a future iclog. That is, if log->l_iclog points at
> commit_iclog, commit_iclog is the first (oldest) iclog in the ring
> and there are no previous iclogs pending as they have all completed
> their IO and been activated again. IOWs, commit_iclog->ic_prev
> points to an iclog that will be written in the future, not one that
> has been written in the past.
>
> Hence, in this case, waiting on the ->ic_prev iclog is incorrect
> behaviour, and depending on the state of the future iclog, we can
> end up with a circular ABA wait cycle and we hang.
>
> The fix is made more complex by the fact that many iclog states
> cannot be used to determine if the iclog is a past or future iclog.
> Hence we have to determine past iclogs by checking the LSN of the
> iclog rather than their state. A past ACTIVE iclog will have a LSN
> of zero, while a future ACTIVE iclog will have a LSN greater than
> the current iclog. We don't wait on either of these cases.
>
> Similarly, a future iclog that hasn't completed IO will have an LSN
> greater than the current iclog and so we don't wait on them. A past
> iclog that is still undergoing IO completion will have a LSN less
> than the current iclog and those are the only iclogs that we need to
> wait on.
>
> Hence we can use the iclog LSN to determine what iclogs we need to
> wait on here.
>
> Fixes: 5fd9256ce156 ("xfs: separate CIL commit record IO")
> Reported-by: Brian Foster <bfoster@redhat.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_log_cil.c | 51 ++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 45 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 705619e9dab4..2fb0ab02dda3 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -1075,15 +1075,54 @@ xlog_cil_push_work(
> ticket = ctx->ticket;
>
> /*
> - * If the checkpoint spans multiple iclogs, wait for all previous
> - * iclogs to complete before we submit the commit_iclog. In this case,
> - * the commit_iclog write needs to issue a pre-flush so that the
> - * ordering is correctly preserved down to stable storage.
> + * If the checkpoint spans multiple iclogs, wait for all previous iclogs
> + * to complete before we submit the commit_iclog. We can't use state
> + * checks for this - ACTIVE can be either a past completed iclog or a
> + * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a
> + * past or future iclog awaiting IO or ordered IO completion to be run.
> + * In the latter case, if it's a future iclog and we wait on it, then we
> + * will hang because it won't get processed through to ic_force_wait
> + * wakeup until this commit_iclog is written to disk. Hence we use the
> + * iclog header lsn and compare it to the commit lsn to determine if we
> + * need to wait on iclogs or not.
> */
> spin_lock(&log->l_icloglock);
> if (ctx->start_lsn != commit_lsn) {
> - xlog_wait_on_iclog(commit_iclog->ic_prev);
> - spin_lock(&log->l_icloglock);
> + struct xlog_in_core *iclog;
> +
> + for (iclog = commit_iclog->ic_prev;
> + iclog != commit_iclog;
> + iclog = iclog->ic_prev) {
> + xfs_lsn_t hlsn;
> +
> + /*
> + * If the LSN of the iclog is zero or in the future it
> + * means it has passed through IO completion and
> + * activation and hence all previous iclogs have also
> + * done so. We do not need to wait at all in this case.
> + */
> + hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
> + if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
> + break;
> +
> + /*
> + * If the LSN of the iclog is older than the commit lsn,
> + * we have to wait on it. Waiting on this via the
> + * ic_force_wait should also order the completion of all
> + * older iclogs, too, but we leave checking that to the
> + * next loop iteration.
> + */
> + ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
> + xlog_wait_on_iclog(iclog);
> + spin_lock(&log->l_icloglock);
The presence of a loop here confuses me a bit -- we really only need to
check and wait on commit->ic_prev since xlog_wait_on_iclog waits for
both the iclog that it is given as well as all previous iclogs, right?
Does "we leave checking that to the next loop iteration" mean that once
we've waited on commit->ic_prev, the next iclog iterated (i.e.
commit->ic_prev->ic_prev) should break out of the loop?
--D
> + }
> +
> + /*
> + * Regardless of whether we need to wait or not, the
> + * commit_iclog write needs to issue a pre-flush so that the
> + * ordering for this checkpoint is correctly preserved down to
> + * stable storage.
> + */
> commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
> }
>
> --
> 2.31.1
>
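For readers following the loop question above, the LSN-based walk in this patch can be modelled in userspace. The sketch below is not kernel code: struct iclog, make_lsn(), lsn_cmp(), link_ring() and wait_on_past_iclogs() are hypothetical stand-ins, it assumes an LSN packs a cycle number in the upper 32 bits and a block number in the lower 32, and it replaces xlog_wait_on_iclog() with a flag rather than sleeping and retaking l_icloglock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t lsn_t;

/* Hypothetical stand-in for struct xlog_in_core: only what the walk needs. */
struct iclog {
	struct iclog	*prev;		/* models ->ic_prev */
	lsn_t		header_lsn;	/* models ic_header.h_lsn, 0 == cleaned */
	int		waited;		/* set where the kernel would sleep */
};

/* Assumed layout: cycle in the upper 32 bits, block in the lower 32. */
static lsn_t make_lsn(uint32_t cycle, uint32_t block)
{
	return (lsn_t)(((uint64_t)cycle << 32) | block);
}

/* Order by cycle first, then by block; returns <0, 0 or >0. */
static int lsn_cmp(lsn_t a, lsn_t b)
{
	uint32_t ca = (uint32_t)((uint64_t)a >> 32);
	uint32_t cb = (uint32_t)((uint64_t)b >> 32);
	uint32_t ba = (uint32_t)a;
	uint32_t bb = (uint32_t)b;

	if (ca != cb)
		return ca < cb ? -1 : 1;
	if (ba != bb)
		return ba < bb ? -1 : 1;
	return 0;
}

/* Link n iclogs into a circular ring; index i - 1 is the "previous" iclog. */
static void link_ring(struct iclog *ring, size_t n, const lsn_t *lsns)
{
	for (size_t i = 0; i < n; i++) {
		ring[i].prev = &ring[(i + n - 1) % n];
		ring[i].header_lsn = lsns[i];
		ring[i].waited = 0;
	}
}

/*
 * Walk backwards from the commit iclog. Stop at the first iclog whose
 * header LSN is zero or newer than commit_lsn; mark every older one as
 * waited on. Returns the number of iclogs that would have been waited on.
 */
static int wait_on_past_iclogs(struct iclog *commit, lsn_t commit_lsn)
{
	int nwaits = 0;

	for (struct iclog *it = commit->prev; it != commit; it = it->prev) {
		lsn_t hlsn = it->header_lsn;

		if (!hlsn || lsn_cmp(hlsn, commit_lsn) > 0)
			break;		/* cleaned or future iclog: never wait */

		assert(lsn_cmp(hlsn, commit_lsn) < 0);
		it->waited = 1;		/* kernel: xlog_wait_on_iclog() */
		nwaits++;
	}
	return nwaits;
}
```

With header LSNs {5/100, 5/200, 0, 4/900} and the commit iclog at index 1, the walk marks indexes 0 and 3 and stops at the zero LSN. When the commit iclog is the oldest in the ring, its previous iclog carries a larger LSN and nothing is waited on. Whether a single wait also completes all older iclogs, as asked above, is not something this model answers.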
* Re: [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c
2021-06-17 8:26 ` [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c Dave Chinner
2021-06-17 12:57 ` kernel test robot
@ 2021-06-17 17:50 ` Darrick J. Wong
2021-06-17 21:56 ` Dave Chinner
2021-06-18 14:16 ` Christoph Hellwig
2 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 17:50 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 06:26:12PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> It is only used by the CIL checkpoints, and is the counterpart to
> start record formatting and writing that is already local to
> xfs_log_cil.c.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_log.c | 41 ---------------------------------------
> fs/xfs/xfs_log_cil.c | 45 ++++++++++++++++++++++++++++++++++++++++++-
> fs/xfs/xfs_log_priv.h | 2 --
> 3 files changed, 44 insertions(+), 44 deletions(-)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 54fd6a695bb5..cf661c155786 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1563,47 +1563,6 @@ xlog_alloc_log(
> return ERR_PTR(error);
> } /* xlog_alloc_log */
>
> -/*
> - * Write out the commit record of a transaction associated with the given
> - * ticket to close off a running log write. Return the lsn of the commit record.
> - */
> -int
> -xlog_commit_record(
> - struct xlog *log,
> - struct xlog_ticket *ticket,
> - struct xlog_in_core **iclog,
> - xfs_lsn_t *lsn)
> -{
> - struct xlog_op_header ophdr = {
> - .oh_clientid = XFS_TRANSACTION,
> - .oh_tid = cpu_to_be32(ticket->t_tid),
> - .oh_flags = XLOG_COMMIT_TRANS,
> - };
> - struct xfs_log_iovec reg = {
> - .i_addr = &ophdr,
> - .i_len = sizeof(struct xlog_op_header),
> - .i_type = XLOG_REG_TYPE_COMMIT,
> - };
> - struct xfs_log_vec vec = {
> - .lv_niovecs = 1,
> - .lv_iovecp = &reg,
> - };
> - int error;
> - LIST_HEAD(lv_chain);
> - INIT_LIST_HEAD(&vec.lv_list);
> - list_add(&vec.lv_list, &lv_chain);
> -
> - if (XLOG_FORCED_SHUTDOWN(log))
> - return -EIO;
> -
> - /* account for space used by record data */
> - ticket->t_curr_res -= reg.i_len;
> - error = xlog_write(log, &lv_chain, ticket, lsn, iclog, reg.i_len);
> - if (error)
> - xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
> - return error;
> -}
> -
> /*
> * Compute the LSN that we'd need to push the log tail towards in order to have
> * (a) enough on-disk log space to log the number of bytes specified, (b) at
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 2fb0ab02dda3..2c8b25888c53 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -783,6 +783,48 @@ xlog_cil_build_trans_hdr(
> tic->t_curr_res -= lvhdr->lv_bytes;
> }
>
> +/*
> + * Write out the commit record of a checkpoint transaction associated with the
> + * given ticket to close off a running log write. Return the lsn of the commit
> + * record.
> + */
> +int
static int, like the robot suggests?
With that fixed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> +xlog_cil_write_commit_record(
> + struct xlog *log,
> + struct xlog_ticket *ticket,
> + struct xlog_in_core **iclog,
> + xfs_lsn_t *lsn)
> +{
> + struct xlog_op_header ophdr = {
> + .oh_clientid = XFS_TRANSACTION,
> + .oh_tid = cpu_to_be32(ticket->t_tid),
> + .oh_flags = XLOG_COMMIT_TRANS,
> + };
> + struct xfs_log_iovec reg = {
> + .i_addr = &ophdr,
> + .i_len = sizeof(struct xlog_op_header),
> + .i_type = XLOG_REG_TYPE_COMMIT,
> + };
> + struct xfs_log_vec vec = {
> + .lv_niovecs = 1,
> + .lv_iovecp = &reg,
> + };
> + int error;
> + LIST_HEAD(lv_chain);
> + INIT_LIST_HEAD(&vec.lv_list);
> + list_add(&vec.lv_list, &lv_chain);
> +
> + if (XLOG_FORCED_SHUTDOWN(log))
> + return -EIO;
> +
> + /* account for space used by record data */
> + ticket->t_curr_res -= reg.i_len;
> + error = xlog_write(log, &lv_chain, ticket, lsn, iclog, reg.i_len);
> + if (error)
> + xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
> + return error;
> +}
> +
> /*
> * CIL item reordering compare function. We want to order in ascending ID order,
> * but we want to leave items with the same ID in the order they were added to
> @@ -1041,7 +1083,8 @@ xlog_cil_push_work(
> }
> spin_unlock(&cil->xc_push_lock);
>
> - error = xlog_commit_record(log, ctx->ticket, &commit_iclog, &commit_lsn);
> + error = xlog_cil_write_commit_record(log, ctx->ticket, &commit_iclog,
> + &commit_lsn);
> if (error)
> goto out_abort_free_ticket;
>
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 330befd9f6be..26f26769d1c6 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -490,8 +490,6 @@ void xlog_print_trans(struct xfs_trans *);
> int xlog_write(struct xlog *log, struct list_head *lv_chain,
> struct xlog_ticket *tic, xfs_lsn_t *start_lsn,
> struct xlog_in_core **commit_iclog, uint32_t len);
> -int xlog_commit_record(struct xlog *log, struct xlog_ticket *ticket,
> - struct xlog_in_core **iclog, xfs_lsn_t *lsn);
>
> void xfs_log_ticket_ungrant(struct xlog *log, struct xlog_ticket *ticket);
> void xfs_log_ticket_regrant(struct xlog *log, struct xlog_ticket *ticket);
> --
> 2.31.1
>
* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
2021-06-17 8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
` (7 preceding siblings ...)
2021-06-17 8:26 ` [PATCH 8/8] xfs: order CIL checkpoint start records Dave Chinner
@ 2021-06-17 18:32 ` Brian Foster
2021-06-17 19:05 ` Darrick J. Wong
2021-06-18 22:48 ` Dave Chinner
9 siblings, 1 reply; 50+ messages in thread
From: Brian Foster @ 2021-06-17 18:32 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> Hi folks,
>
> This is followup from the first set of log fixes for for-next that
> were posted here:
>
> https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
>
> The first two patches of this series are updates for those patches,
> change log below. The rest is the fix for the bigger issue we
> uncovered in investigating the generic/019 failures, being that
> we're triggering a zero-day bug in the way log recovery assigns LSNs
> to checkpoints.
>
> The "simple" fix of using the same ordering code as the commit
> record for the start records in the CIL push turned into a lot of
> patches once I started cleaning it up, separating out all the
> different bits and finally realising all the things I needed to
> change to avoid unintentional logic/behavioural changes. Hence
> there's some code movement, some factoring, API changes to
> xlog_write(), changing where we attach callbacks to commit iclogs so
> they remain correctly ordered if there are multiple commit records
> in the one iclog and then, finally, strictly ordering the start
> records....
>
> The original "simple fix" I tested last night ran almost a thousand
> cycles of generic/019 without a log hang or recovery failure of any
> kind. The refactored patchset has run a couple hundred cycles of
> g/019 and g/475 over the last few hours without a failure, so I'm
> posting this so we can get a review iteration done while I sleep so
> we can - hopefully - get this sorted out before the end of the week.
>
My first spin of this included generic/019 and generic/475, ran for 18
or so iterations and 475 exploded with a stream of asserts followed by a
NULL pointer crash:
# grep -e Assertion -e BUG dmesg.out
...
[ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
[ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
[ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
[ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
[ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
[ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
I don't know if this is a regression, but I've not seen it before. I've
attempted to spin generic/475 since then to see if it reproduces again,
but so far I'm only running into some of the preexisting issues
associated with that test. I'll let it go a while more and probably
switch it back to running both sometime before the end of the day for an
overnight test.
A full copy of the assert and NULL pointer BUG splat is included below
for reference. It looks like the fault BUG splat ended up interspersed
or otherwise mangled, but I suspect that one is just fallout from the
immediately previous crash.
Brian
--- 8< ---
[ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
[ 7953.037737] ------------[ cut here ]------------
[ 7953.042358] WARNING: CPU: 0 PID: 131627 at fs/xfs/xfs_message.c:112 assfail+0x25/0x28 [xfs]
[ 7953.050782] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
[ 7953.129548] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G W I 5.13.0-rc4+ #70
[ 7953.138243] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
[ 7953.145818] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
[ 7953.151554] RIP: 0010:assfail+0x25/0x28 [xfs]
[ 7953.155991] Code: ff ff 0f 0b c3 0f 1f 44 00 00 41 89 c8 48 89 d1 48 89 f2 48 c7 c6 d8 eb c3 c0 e8 cf fa ff ff 80 3d f1 d4 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24 18
[ 7953.174745] RSP: 0018:ffffa57ccf99fa50 EFLAGS: 00010246
[ 7953.179982] RAX: 00000000ffffffea RBX: 0000000500003977 RCX: 0000000000000000
[ 7953.187121] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
[ 7953.194264] RBP: ffff91f685725040 R08: 0000000000000000 R09: 000000000000000a
[ 7953.201405] R10: 000000000000000a R11: f000000000000000 R12: ffff91f685725040
[ 7953.208546] R13: 0000000000000000 R14: ffff91f66abed140 R15: ffff91c76dfccb40
[ 7953.215686] FS: 0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
[ 7953.223781] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7953.229533] CR2: 00000000020e2108 CR3: 0000003d02826003 CR4: 00000000007706f0
[ 7953.236667] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7953.243809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7953.250949] PKRU: 55555554
[ 7953.253669] Call Trace:
[ 7953.256123] xfs_bui_release+0x4b/0x50 [xfs]
[ 7953.260466] xfs_trans_committed_bulk+0x158/0x2c0 [xfs]
[ 7953.265762] ? lock_release+0x1cd/0x2a0
[ 7953.269610] ? _raw_spin_unlock+0x1f/0x30
[ 7953.273630] ? xlog_write+0x1e2/0x630 [xfs]
[ 7953.277886] ? lock_acquire+0x15d/0x380
[ 7953.281732] ? lock_acquire+0x15d/0x380
[ 7953.285582] ? lock_release+0x1cd/0x2a0
[ 7953.289428] ? trace_hardirqs_on+0x1b/0xd0
[ 7953.293536] ? _raw_spin_unlock_irqrestore+0x37/0x40
[ 7953.298511] ? __wake_up_common_lock+0x7a/0x90
[ 7953.302966] ? lock_release+0x1cd/0x2a0
[ 7953.306813] xlog_cil_committed+0x34f/0x390 [xfs]
[ 7953.311593] ? xlog_cil_push_work+0x715/0x8d0 [xfs]
[ 7953.316547] xlog_cil_push_work+0x740/0x8d0 [xfs]
[ 7953.321321] ? _raw_spin_unlock_irq+0x24/0x40
[ 7953.325689] ? finish_task_switch.isra.0+0xa0/0x2c0
[ 7953.330580] ? kmem_cache_free+0x247/0x5c0
[ 7953.334685] ? fsnotify_final_mark_destroy+0x1c/0x30
[ 7953.339658] ? lock_acquire+0x15d/0x380
[ 7953.343505] ? lock_acquire+0x15d/0x380
[ 7953.347353] ? lock_release+0x1cd/0x2a0
[ 7953.351203] process_one_work+0x26e/0x560
[ 7953.355225] worker_thread+0x52/0x3b0
[ 7953.358898] ? process_one_work+0x560/0x560
[ 7953.363094] kthread+0x12c/0x150
[ 7953.366335] ? __kthread_bind_mask+0x60/0x60
[ 7953.370617] ret_from_fork+0x22/0x30
[ 7953.374206] irq event stamp: 0
[ 7953.377268] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 7953.383544] hardirqs last disabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
[ 7953.391724] softirqs last enabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
[ 7953.399907] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 7953.406179] ---[ end trace f04c960f66265f3a ]---
[ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
[ 7953.417760] #PF: supervisor read access in kernel mode
[ 7953.422900] #PF: error_code(0x0000) - not-present page
[ 7953.428038] PGD 0 P4D 0
[ 7953.430579] Oops: 0000 [#1] SMP PTI
[ 7953.434070] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G W I 5.13.0-rc4+ #70
[ 7953.442764] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
[ 7953.450330] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
[ 7953.456058] RIP: 0010:xfs_trans_committed_bulk+0xcc/0x2c0 [xfs]
[ 7953.462036] Code: 41 83 c5 01 48 89 54 c4 50 41 83 fd 1f 0f 8f 11 01 00 00 4d 8b 36 4c 3b 34 24 74 28 4d 8b 66 20 40 84 ed 75 54 49 8b 44 24 60 <f6> 00 01 74 91 48 8b 40 38 4c 89 e7 e8 63 6b 42 f5 4d 8b 36 4c 3b
[ 7953.480783] RSP: 0018:ffffa57ccf99fa68 EFLAGS: 00010202
[ 7953.486009] RAX: 000000000000031f RBX: 0000000500003977 RCX: 0000000000000000
[ 7953.493141] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
[ 7953.500274] RBP: 0000000000000001 R08: 0000000000000000 R09: 000000000000000a
[ 7953.507404] R10: 000000000000000a R11: f000000000000000 R12: ffff91c759fedb20
[ 7953.514536] R13: 0000000000000000 R14: ffff91c759fedb00 R15: ffff91c76dfccb40
[ 7953.521671] FS: 0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
[ 7953.529757] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7953.535501] CR2: 000000000000031f CR3: 0000003d02826003 CR4: 00000000007706f0
[ 7953.542633] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7953.549768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7953.556899] PKRU: 55555554
[ 7953.559612] Call Trace:
[ 7953.562064] ? lock_release+0x1cd/0x2a0
[ 7953.565902] ? _raw_spin_unlock+0x1f/0x30
[ 7953.569917] ? xlog_write+0x1e2/0x630 [xfs]
[ 7953.574162] ? lock_acquire+0x15d/0x380
[ 7953.578000] ? lock_acquire+0x15d/0x380
[ 7953.581841] ? lock_release+0x1cd/0x2a0
[ 7953.585680] ? trace_hardirqs_on+0x1b/0xd0
[ 7953.589780] ? _raw_spin_unlock_irqrestore+0x37/0x40
[ 7953.594744] ? __wake_up_common_lock+0x7a/0x90
[ 7953.599192] ? lock_release+0x1cd/0x2a0
[ 7953.603031] xlog_cil_committed+0x34f/0x390 [xfs]
[ 7953.607798] ? xlog_cil_push_work+0x715/0x8d0 [xfs]
[ 7953.612738] xlog_cil_push_work+0x740/0x8d0 [xfs]
[ 7953.617504] ? _raw_spin_unlock_irq+0x24/0x40
[ 7953.621862] ? finish_task_switch.isra.0+0xa0/0x2c0
[ 7953.626745] ? kmem_cache_free+0x247/0x5c0
[ 7953.630839] ? fsnotify_final_mark_destroy+0x1c/0x30
[ 7953.635806] ? lock_acquire+0x15d/0x380
[ 7953.639646] ? lock_acquire+0x15d/0x380
[ 7953.643484] ? lock_release+0x1cd/0x2a0
[ 7953.647323] process_one_work+0x26e/0x560
[ 7953.651337] worker_thread+0x52/0x3b0
[ 7953.655003] ? process_one_work+0x560/0x560
[ 7953.659188] kthread+0x12c/0x150
[ 7953.662421] ? __kthread_bind_mask+0x60/0x60
[ 7953.666694] ret_from_fork+0x22/0x30
[ 7953.670273] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
[ 7953.749025] CR2: 000000000000031f
[ 7953.752345] ---[ end trace f04c960f66265f3b ]---
* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
2021-06-17 18:32 ` [PATCH 0/8 V2] xfs: log fixes for for-next Brian Foster
@ 2021-06-17 19:05 ` Darrick J. Wong
2021-06-17 20:06 ` Brian Foster
2021-06-17 23:43 ` Dave Chinner
0 siblings, 2 replies; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 19:05 UTC (permalink / raw)
To: Brian Foster; +Cc: Dave Chinner, linux-xfs
On Thu, Jun 17, 2021 at 02:32:30PM -0400, Brian Foster wrote:
> On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > Hi folks,
> >
> > This is followup from the first set of log fixes for for-next that
> > were posted here:
> >
> > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> >
> > The first two patches of this series are updates for those patches,
> > change log below. The rest is the fix for the bigger issue we
> > uncovered in investigating the generic/019 failures, being that
> > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > to checkpoints.
> >
> > The "simple" fix of using the same ordering code as the commit
> > record for the start records in the CIL push turned into a lot of
> > patches once I started cleaning it up, separating out all the
> > different bits and finally realising all the things I needed to
> > change to avoid unintentional logic/behavioural changes. Hence
> > there's some code movement, some factoring, API changes to
> > xlog_write(), changing where we attach callbacks to commit iclogs so
> > they remain correctly ordered if there are multiple commit records
> > in the one iclog and then, finally, strictly ordering the start
> > records....
> >
> > The original "simple fix" I tested last night ran almost a thousand
> > cycles of generic/019 without a log hang or recovery failure of any
> > kind. The refactored patchset has run a couple hundred cycles of
> > g/019 and g/475 over the last few hours without a failure, so I'm
> > posting this so we can get a review iteration done while I sleep so
> > we can - hopefully - get this sorted out before the end of the week.
> >
>
> My first spin of this included generic/019 and generic/475, ran for 18
> or so iterations and 475 exploded with a stream of asserts followed by a
> NULL pointer crash:
>
> # grep -e Assertion -e BUG dmesg.out
> ...
> [ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> [ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> [ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> [ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
>
> I don't know if this is a regression, but I've not seen it before. I've
> attempted to spin generic/475 since then to see if it reproduces again,
> but so far I'm only running into some of the preexisting issues
> associated with that test.
By any chance, do the two log recovery fixes I sent yesterday make those
problems go away?
> I'll let it go a while more and probably
> switch it back to running both sometime before the end of the day for an
> overnight test.
Also, do the CIL livelocks go away if you apply only patches 1-2?
> A full copy of the assert and NULL pointer BUG splat is included below
> for reference. It looks like the fault BUG splat ended up interspersed
> or otherwise mangled, but I suspect that one is just fallout from the
> immediately previous crash.
I have a question about the composition of this 8-patch series --
which patches fix the new CIL code, and which ones fix the out of order
recovery problems? I suspect that patches 1-2 are for the new CIL code,
and 3-8 are to fix the recovery problems.
Thinking with my distro kernel not-maintainer hat on, I'm considering
how to backport whatever fixes emerge for the recovery ordering issue
into existing kernels. The way I see things right now, the CIL changes
(+ fixes) and the ordering bug fixes are separate issues. The log
ordering problems should get fixed as soon as we have a practical
solution; the CIL changes could get deferred if need be since it's a
medium-high risk; and the real question is how to sequence all this?
(Or to put it another way: I'm still stuck going "oh wowwww this is a
lot more change" while trying to understand patch 4)
--D
>
> Brian
>
> --- 8< ---
>
> [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> [ 7953.037737] ------------[ cut here ]------------
> [ 7953.042358] WARNING: CPU: 0 PID: 131627 at fs/xfs/xfs_message.c:112 assfail+0x25/0x28 [xfs]
> [ 7953.050782] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
> [ 7953.129548] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G W I 5.13.0-rc4+ #70
> [ 7953.138243] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> [ 7953.145818] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
> [ 7953.151554] RIP: 0010:assfail+0x25/0x28 [xfs]
> [ 7953.155991] Code: ff ff 0f 0b c3 0f 1f 44 00 00 41 89 c8 48 89 d1 48 89 f2 48 c7 c6 d8 eb c3 c0 e8 cf fa ff ff 80 3d f1 d4 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24 18
> [ 7953.174745] RSP: 0018:ffffa57ccf99fa50 EFLAGS: 00010246
> [ 7953.179982] RAX: 00000000ffffffea RBX: 0000000500003977 RCX: 0000000000000000
> [ 7953.187121] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
> [ 7953.194264] RBP: ffff91f685725040 R08: 0000000000000000 R09: 000000000000000a
> [ 7953.201405] R10: 000000000000000a R11: f000000000000000 R12: ffff91f685725040
> [ 7953.208546] R13: 0000000000000000 R14: ffff91f66abed140 R15: ffff91c76dfccb40
> [ 7953.215686] FS: 0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
> [ 7953.223781] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7953.229533] CR2: 00000000020e2108 CR3: 0000003d02826003 CR4: 00000000007706f0
> [ 7953.236667] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 7953.243809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 7953.250949] PKRU: 55555554
> [ 7953.253669] Call Trace:
> [ 7953.256123] xfs_bui_release+0x4b/0x50 [xfs]
> [ 7953.260466] xfs_trans_committed_bulk+0x158/0x2c0 [xfs]
> [ 7953.265762] ? lock_release+0x1cd/0x2a0
> [ 7953.269610] ? _raw_spin_unlock+0x1f/0x30
> [ 7953.273630] ? xlog_write+0x1e2/0x630 [xfs]
> [ 7953.277886] ? lock_acquire+0x15d/0x380
> [ 7953.281732] ? lock_acquire+0x15d/0x380
> [ 7953.285582] ? lock_release+0x1cd/0x2a0
> [ 7953.289428] ? trace_hardirqs_on+0x1b/0xd0
> [ 7953.293536] ? _raw_spin_unlock_irqrestore+0x37/0x40
> [ 7953.298511] ? __wake_up_common_lock+0x7a/0x90
> [ 7953.302966] ? lock_release+0x1cd/0x2a0
> [ 7953.306813] xlog_cil_committed+0x34f/0x390 [xfs]
> [ 7953.311593] ? xlog_cil_push_work+0x715/0x8d0 [xfs]
> [ 7953.316547] xlog_cil_push_work+0x740/0x8d0 [xfs]
> [ 7953.321321] ? _raw_spin_unlock_irq+0x24/0x40
> [ 7953.325689] ? finish_task_switch.isra.0+0xa0/0x2c0
> [ 7953.330580] ? kmem_cache_free+0x247/0x5c0
> [ 7953.334685] ? fsnotify_final_mark_destroy+0x1c/0x30
> [ 7953.339658] ? lock_acquire+0x15d/0x380
> [ 7953.343505] ? lock_acquire+0x15d/0x380
> [ 7953.347353] ? lock_release+0x1cd/0x2a0
> [ 7953.351203] process_one_work+0x26e/0x560
> [ 7953.355225] worker_thread+0x52/0x3b0
> [ 7953.358898] ? process_one_work+0x560/0x560
> [ 7953.363094] kthread+0x12c/0x150
> [ 7953.366335] ? __kthread_bind_mask+0x60/0x60
> [ 7953.370617] ret_from_fork+0x22/0x30
> [ 7953.374206] irq event stamp: 0
> [ 7953.377268] hardirqs last enabled at (0): [<0000000000000000>] 0x0
> [ 7953.383544] hardirqs last disabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
> [ 7953.391724] softirqs last enabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
> [ 7953.399907] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [ 7953.406179] ---[ end trace f04c960f66265f3a ]---
> [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> [ 7953.417760] #PF: supervisor read access in kernel mode
> [ 7953.422900] #PF: error_code(0x0000) - not-present page
> [ 7953.428038] PGD 0 P4D 0
> [ 7953.430579] Oops: 0000 [#1] SMP PTI
> [ 7953.434070] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G W I 5.13.0-rc4+ #70
> [ 7953.442764] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> [ 7953.450330] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
> [ 7953.456058] RIP: 0010:xfs_trans_committed_bulk+0xcc/0x2c0 [xfs]
> [ 7953.462036] Code: 41 83 c5 01 48 89 54 c4 50 41 83 fd 1f 0f 8f 11 01 00 00 4d 8b 36 4c 3b 34 24 74 28 4d 8b 66 20 40 84 ed 75 54 49 8b 44 24 60 <f6> 00 01 74 91 48 8b 40 38 4c 89 e7 e8 63 6b 42 f5 4d 8b 36 4c 3b
> [ 7953.480783] RSP: 0018:ffffa57ccf99fa68 EFLAGS: 00010202
> [ 7953.486009] RAX: 000000000000031f RBX: 0000000500003977 RCX: 0000000000000000
> [ 7953.493141] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
> [ 7953.500274] RBP: 0000000000000001 R08: 0000000000000000 R09: 000000000000000a
> [ 7953.507404] R10: 000000000000000a R11: f000000000000000 R12: ffff91c759fedb20
> [ 7953.514536] R13: 0000000000000000 R14: ffff91c759fedb00 R15: ffff91c76dfccb40
> [ 7953.521671] FS: 0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
> [ 7953.529757] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7953.535501] CR2: 000000000000031f CR3: 0000003d02826003 CR4: 00000000007706f0
> [ 7953.542633] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 7953.549768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 7953.556899] PKRU: 55555554
> [ 7953.559612] Call Trace:
> [ 7953.562064] ? lock_release+0x1cd/0x2a0
> [ 7953.565902] ? _raw_spin_unlock+0x1f/0x30
> [ 7953.569917] ? xlog_write+0x1e2/0x630 [xfs]
> [ 7953.574162] ? lock_acquire+0x15d/0x380
> [ 7953.578000] ? lock_acquire+0x15d/0x380
> [ 7953.581841] ? lock_release+0x1cd/0x2a0
> [ 7953.585680] ? trace_hardirqs_on+0x1b/0xd0
> [ 7953.589780] ? _raw_spin_unlock_irqrestore+0x37/0x40
> [ 7953.594744] ? __wake_up_common_lock+0x7a/0x90
> [ 7953.599192] ? lock_release+0x1cd/0x2a0
> [ 7953.603031] xlog_cil_committed+0x34f/0x390 [xfs]
> [ 7953.607798] ? xlog_cil_push_work+0x715/0x8d0 [xfs]
> [ 7953.612738] xlog_cil_push_work+0x740/0x8d0 [xfs]
> [ 7953.617504] ? _raw_spin_unlock_irq+0x24/0x40
> [ 7953.621862] ? finish_task_switch.isra.0+0xa0/0x2c0
> [ 7953.626745] ? kmem_cache_free+0x247/0x5c0
> [ 7953.630839] ? fsnotify_final_mark_destroy+0x1c/0x30
> [ 7953.635806] ? lock_acquire+0x15d/0x380
> [ 7953.639646] ? lock_acquire+0x15d/0x380
> [ 7953.643484] ? lock_release+0x1cd/0x2a0
> [ 7953.647323] process_one_work+0x26e/0x560
> [ 7953.651337] worker_thread+0x52/0x3b0
> [ 7953.655003] ? process_one_work+0x560/0x560
> [ 7953.659188] kthread+0x12c/0x150
> [ 7953.662421] ? __kthread_bind_mask+0x60/0x60
> [ 7953.666694] ret_from_fork+0x22/0x30
> [ 7953.670273] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
> [ 7953.749025] CR2: 000000000000031f
> [ 7953.752345] ---[ end trace f04c960f66265f3b ]---
>
* Re: [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work()
2021-06-17 8:26 ` [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work() Dave Chinner
@ 2021-06-17 19:59 ` Darrick J. Wong
2021-06-18 14:27 ` Christoph Hellwig
0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 19:59 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 06:26:14PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> So we can use it for start record ordering as well as commit record
> ordering in future.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
This tricked me for a second until I realized that xlog_cil_order_write
is the chunk of code just prior to the xlog_cil_write_commit_record
call.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
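
For anyone else reading along, here is a minimal userspace sketch of the wait loop that the factored-out helper performs. It is only a model, not the kernel code: the `model_*` names, the pthread mutex/condvar standing in for xc_push_lock and xc_commit_wait, and the hand-rolled singly linked list are all assumptions made for illustration. One deliberate difference: pthread_cond_wait() returns with the mutex held, so the restart label sits after the lock here, whereas in the patch it sits before spin_lock(), which suggests xlog_wait() returns with the lock dropped.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_cil_ctx. */
struct model_ctx {
	uint64_t		sequence;	/* checkpoint sequence number */
	uint64_t		commit_lsn;	/* zero until the commit record is written */
	struct model_ctx	*next;		/* next entry on the committing list */
};

/* Hypothetical stand-in for struct xfs_cil. */
struct model_cil {
	pthread_mutex_t		push_lock;	/* models xc_push_lock */
	pthread_cond_t		commit_wait;	/* models xc_commit_wait */
	struct model_ctx	*committing;	/* models xc_committing */
	bool			shutdown;	/* models XLOG_FORCED_SHUTDOWN() */
};

/*
 * Return 0 once every context with a lower sequence has published a
 * commit LSN, or -EIO if the model has been shut down.
 */
static int model_order_write(struct model_cil *cil, uint64_t sequence)
{
	struct model_ctx *ctx;

	pthread_mutex_lock(&cil->push_lock);
restart:
	for (ctx = cil->committing; ctx; ctx = ctx->next) {
		/* Do not go back to sleep once shutdown has been flagged. */
		if (cil->shutdown) {
			pthread_mutex_unlock(&cil->push_lock);
			return -EIO;
		}
		/* Higher sequences wait for us; skip them and ourselves. */
		if (ctx->sequence >= sequence)
			continue;
		if (!ctx->commit_lsn) {
			/* Still being pushed: sleep, then rescan from the head. */
			pthread_cond_wait(&cil->commit_wait, &cil->push_lock);
			goto restart;
		}
	}
	pthread_mutex_unlock(&cil->push_lock);
	return 0;
}

/* Publish a commit LSN and wake every waiter so it rescans the list. */
static void model_publish_commit(struct model_cil *cil, struct model_ctx *ctx,
				 uint64_t lsn)
{
	assert(lsn != 0);	/* zero means "not written yet" in this model */
	pthread_mutex_lock(&cil->push_lock);
	ctx->commit_lsn = lsn;
	pthread_cond_broadcast(&cil->commit_wait);
	pthread_mutex_unlock(&cil->push_lock);
}
```

A caller would invoke model_order_write(cil, ctx->sequence) just before writing its own commit record and model_publish_commit() just after, so commit records come out in ascending sequence order.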
> ---
> fs/xfs/xfs_log_cil.c | 89 ++++++++++++++++++++++++++------------------
> 1 file changed, 52 insertions(+), 37 deletions(-)
>
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 35fc3e57d870..f993ec69fc97 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -784,9 +784,54 @@ xlog_cil_build_trans_hdr(
> }
>
> /*
> - * Write out the commit record of a checkpoint transaction associated with the
> - * given ticket to close off a running log write. Return the lsn of the commit
> - * record.
> + * Ensure that the order of log writes follows checkpoint sequence order. This
> + * relies on the context LSN being zero until the log write has guaranteed the
> + * LSN that the log write will start at via xlog_state_get_iclog_space().
> + */
> +static int
> +xlog_cil_order_write(
> + struct xfs_cil *cil,
> + xfs_csn_t sequence)
> +{
> + struct xfs_cil_ctx *ctx;
> +
> +restart:
> + spin_lock(&cil->xc_push_lock);
> + list_for_each_entry(ctx, &cil->xc_committing, committing) {
> + /*
> + * Avoid getting stuck in this loop because we were woken by the
> + * shutdown, but then went back to sleep once already in the
> + * shutdown state.
> + */
> + if (XLOG_FORCED_SHUTDOWN(cil->xc_log)) {
> + spin_unlock(&cil->xc_push_lock);
> + return -EIO;
> + }
> +
> + /*
> + * Higher sequences will wait for this one so skip them.
> + * Don't wait for our own sequence, either.
> + */
> + if (ctx->sequence >= sequence)
> + continue;
> + if (!ctx->commit_lsn) {
> + /*
> + * It is still being pushed! Wait for the push to
> + * complete, then start again from the beginning.
> + */
> + xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
> + goto restart;
> + }
> + }
> + spin_unlock(&cil->xc_push_lock);
> + return 0;
> +}
> +
> +/*
> + * Write out the commit record of a checkpoint transaction to close off a
> + * running log write. These commit records are strictly ordered in ascending CIL
> + * sequence order so that log recovery will always replay the checkpoints in the
> + * correct order.
> */
> int
> xlog_cil_write_commit_record(
> @@ -816,6 +861,10 @@ xlog_cil_write_commit_record(
> if (XLOG_FORCED_SHUTDOWN(log))
> return -EIO;
>
> + error = xlog_cil_order_write(ctx->cil, ctx->sequence);
> + if (error)
> + return error;
> +
> /* account for space used by record data */
> ctx->ticket->t_curr_res -= reg.i_len;
> error = xlog_write(log, ctx, &lv_chain, ctx->ticket, iclog, reg.i_len);
> @@ -1048,40 +1097,6 @@ xlog_cil_push_work(
> if (error)
> goto out_abort_free_ticket;
>
> - /*
> - * now that we've written the checkpoint into the log, strictly
> - * order the commit records so replay will get them in the right order.
> - */
> -restart:
> - spin_lock(&cil->xc_push_lock);
> - list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
> - /*
> - * Avoid getting stuck in this loop because we were woken by the
> - * shutdown, but then went back to sleep once already in the
> - * shutdown state.
> - */
> - if (XLOG_FORCED_SHUTDOWN(log)) {
> - spin_unlock(&cil->xc_push_lock);
> - goto out_abort_free_ticket;
> - }
> -
> - /*
> - * Higher sequences will wait for this one so skip them.
> - * Don't wait for our own sequence, either.
> - */
> - if (new_ctx->sequence >= ctx->sequence)
> - continue;
> - if (!new_ctx->commit_lsn) {
> - /*
> - * It is still being pushed! Wait for the push to
> - * complete, then start again from the beginning.
> - */
> - xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
> - goto restart;
> - }
> - }
> - spin_unlock(&cil->xc_push_lock);
> -
> error = xlog_cil_write_commit_record(ctx, &commit_iclog);
> if (error)
> goto out_abort_free_ticket;
> --
> 2.31.1
>
* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
2021-06-17 19:05 ` Darrick J. Wong
@ 2021-06-17 20:06 ` Brian Foster
2021-06-17 20:26 ` Darrick J. Wong
2021-06-17 23:43 ` Dave Chinner
1 sibling, 1 reply; 50+ messages in thread
From: Brian Foster @ 2021-06-17 20:06 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs
On Thu, Jun 17, 2021 at 12:05:19PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 02:32:30PM -0400, Brian Foster wrote:
> > On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > > Hi folks,
> > >
> > > This is followup from the first set of log fixes for for-next that
> > > were posted here:
> > >
> > > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > >
> > > The first two patches of this series are updates for those patches,
> > > change log below. The rest is the fix for the bigger issue we
> > > uncovered in investigating the generic/019 failures, being that
> > > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > > to checkpoints.
> > >
> > > The "simple" fix of using the same ordering code as the commit
> > > record for the start records in the CIL push turned into a lot of
> > > patches once I started cleaning it up, separating out all the
> > > different bits and finally realising all the things I needed to
> > > change to avoid unintentional logic/behavioural changes. Hence
> > > there's some code movement, some factoring, API changes to
> > > xlog_write(), changing where we attach callbacks to commit iclogs so
> > > they remain correctly ordered if there are multiple commit records
> > > in the one iclog and then, finally, strictly ordering the start
> > > records....
> > >
> > > The original "simple fix" I tested last night ran almost a thousand
> > > cycles of generic/019 without a log hang or recovery failure of any
> > > kind. The refactored patchset has run a couple hundred cycles of
> > > g/019 and g/475 over the last few hours without a failure, so I'm
> > > posting this so we can get a review iteration done while I sleep so
> > > we can - hopefully - get this sorted out before the end of the week.
> > >
> >
> > My first spin of this included generic/019 and generic/475, ran for 18
> > or so iterations and 475 exploded with a stream of asserts followed by a
> > NULL pointer crash:
> >
> > # grep -e Assertion -e BUG dmesg.out
> > ...
> > [ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > [ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
> >
> > I don't know if this is a regression, but I've not seen it before. I've
> > attempted to spin generic/475 since then to see if it reproduces again,
> > but so far I'm only running into some of the preexisting issues
> > associated with that test.
>
> By any chance, do the two log recovery fixes I sent yesterday make those
> problems go away?
>
Hadn't got to those ones yet...
> > I'll let it go a while more and probably
> > switch it back to running both sometime before the end of the day for an
> > overnight test.
>
> Also, do the CIL livelocks go away if you apply only patches 1-2?
>
It's kind of hard to discern the effect of individual fixes when
multiple corruptions are at play. :/ I suppose I could switch up my
planned overnight test to include the aforementioned 2 recovery fixes
and 1-2 from this series, if that is preferable..? I suspect that would
leave around the originally reported generic/019 corruption presumably
caused by the start LSN ordering issue, but we could see if the deadlock
is addressed and whether 475 survives any longer.
Brian
> > A full copy of the assert and NULL pointer BUG splat is included below
> > for reference. It looks like the fault BUG splat ended up interspersed
> > or otherwise mangled, but I suspect that one is just fallout from the
> > immediately previous crash.
>
> I have a question about the composition of this 8-patch series --
> which patches fix the new cil code, and which ones fix the out of order
> recovery problems? I suspect that patches 1-2 are for the new CIL code,
> and 3-8 are to fix the recovery problems.
>
> Thinking with my distro kernel not-maintainer hat on, I'm considering
> how to backport whatever fixes emerge for the recovery ordering issue
> into existing kernels. The way I see things right now, the CIL changes
> (+ fixes) and the ordering bug fixes are separate issues. The log
> ordering problems should get fixed as soon as we have a practical
> solution; the CIL changes could get deferred if need be since it's a
> medium-high risk; and the real question is how to sequence all this?
>
> (Or to put it another way: I'm still stuck going "oh wowwww this is a
> lot more change" while trying to understand patch 4)
>
> --D
>
> >
> > Brian
> >
> > --- 8< ---
> >
> > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7953.037737] ------------[ cut here ]------------
> > [ 7953.042358] WARNING: CPU: 0 PID: 131627 at fs/xfs/xfs_message.c:112 assfail+0x25/0x28 [xfs]
> > [ 7953.050782] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
> > [ 7953.129548] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G W I 5.13.0-rc4+ #70
> > [ 7953.138243] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> > [ 7953.145818] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
> > [ 7953.151554] RIP: 0010:assfail+0x25/0x28 [xfs]
> > [ 7953.155991] Code: ff ff 0f 0b c3 0f 1f 44 00 00 41 89 c8 48 89 d1 48 89 f2 48 c7 c6 d8 eb c3 c0 e8 cf fa ff ff 80 3d f1 d4 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24 18
> > [ 7953.174745] RSP: 0018:ffffa57ccf99fa50 EFLAGS: 00010246
> > [ 7953.179982] RAX: 00000000ffffffea RBX: 0000000500003977 RCX: 0000000000000000
> > [ 7953.187121] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
> > [ 7953.194264] RBP: ffff91f685725040 R08: 0000000000000000 R09: 000000000000000a
> > [ 7953.201405] R10: 000000000000000a R11: f000000000000000 R12: ffff91f685725040
> > [ 7953.208546] R13: 0000000000000000 R14: ffff91f66abed140 R15: ffff91c76dfccb40
> > [ 7953.215686] FS: 0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
> > [ 7953.223781] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 7953.229533] CR2: 00000000020e2108 CR3: 0000003d02826003 CR4: 00000000007706f0
> > [ 7953.236667] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 7953.243809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 7953.250949] PKRU: 55555554
> > [ 7953.253669] Call Trace:
> > [ 7953.256123] xfs_bui_release+0x4b/0x50 [xfs]
> > [ 7953.260466] xfs_trans_committed_bulk+0x158/0x2c0 [xfs]
> > [ 7953.265762] ? lock_release+0x1cd/0x2a0
> > [ 7953.269610] ? _raw_spin_unlock+0x1f/0x30
> > [ 7953.273630] ? xlog_write+0x1e2/0x630 [xfs]
> > [ 7953.277886] ? lock_acquire+0x15d/0x380
> > [ 7953.281732] ? lock_acquire+0x15d/0x380
> > [ 7953.285582] ? lock_release+0x1cd/0x2a0
> > [ 7953.289428] ? trace_hardirqs_on+0x1b/0xd0
> > [ 7953.293536] ? _raw_spin_unlock_irqrestore+0x37/0x40
> > [ 7953.298511] ? __wake_up_common_lock+0x7a/0x90
> > [ 7953.302966] ? lock_release+0x1cd/0x2a0
> > [ 7953.306813] xlog_cil_committed+0x34f/0x390 [xfs]
> > [ 7953.311593] ? xlog_cil_push_work+0x715/0x8d0 [xfs]
> > [ 7953.316547] xlog_cil_push_work+0x740/0x8d0 [xfs]
> > [ 7953.321321] ? _raw_spin_unlock_irq+0x24/0x40
> > [ 7953.325689] ? finish_task_switch.isra.0+0xa0/0x2c0
> > [ 7953.330580] ? kmem_cache_free+0x247/0x5c0
> > [ 7953.334685] ? fsnotify_final_mark_destroy+0x1c/0x30
> > [ 7953.339658] ? lock_acquire+0x15d/0x380
> > [ 7953.343505] ? lock_acquire+0x15d/0x380
> > [ 7953.347353] ? lock_release+0x1cd/0x2a0
> > [ 7953.351203] process_one_work+0x26e/0x560
> > [ 7953.355225] worker_thread+0x52/0x3b0
> > [ 7953.358898] ? process_one_work+0x560/0x560
> > [ 7953.363094] kthread+0x12c/0x150
> > [ 7953.366335] ? __kthread_bind_mask+0x60/0x60
> > [ 7953.370617] ret_from_fork+0x22/0x30
> > [ 7953.374206] irq event stamp: 0
> > [ 7953.377268] hardirqs last enabled at (0): [<0000000000000000>] 0x0
> > [ 7953.383544] hardirqs last disabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
> > [ 7953.391724] softirqs last enabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
> > [ 7953.399907] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > [ 7953.406179] ---[ end trace f04c960f66265f3a ]---
> > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > [ 7953.417760] #PF: supervisor read access in kernel mode
> > [ 7953.422900] #PF: error_code(0x0000) - not-present page
> > [ 7953.428038] PGD 0 P4D 0
> > [ 7953.430579] Oops: 0000 [#1] SMP PTI
> > [ 7953.434070] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G W I 5.13.0-rc4+ #70
> > [ 7953.442764] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> > [ 7953.450330] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
> > [ 7953.456058] RIP: 0010:xfs_trans_committed_bulk+0xcc/0x2c0 [xfs]
> > [ 7953.462036] Code: 41 83 c5 01 48 89 54 c4 50 41 83 fd 1f 0f 8f 11 01 00 00 4d 8b 36 4c 3b 34 24 74 28 4d 8b 66 20 40 84 ed 75 54 49 8b 44 24 60 <f6> 00 01 74 91 48 8b 40 38 4c 89 e7 e8 63 6b 42 f5 4d 8b 36 4c 3b
> > [ 7953.480783] RSP: 0018:ffffa57ccf99fa68 EFLAGS: 00010202
> > [ 7953.486009] RAX: 000000000000031f RBX: 0000000500003977 RCX: 0000000000000000
> > [ 7953.493141] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
> > [ 7953.500274] RBP: 0000000000000001 R08: 0000000000000000 R09: 000000000000000a
> > [ 7953.507404] R10: 000000000000000a R11: f000000000000000 R12: ffff91c759fedb20
> > [ 7953.514536] R13: 0000000000000000 R14: ffff91c759fedb00 R15: ffff91c76dfccb40
> > [ 7953.521671] FS: 0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
> > [ 7953.529757] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 7953.535501] CR2: 000000000000031f CR3: 0000003d02826003 CR4: 00000000007706f0
> > [ 7953.542633] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 7953.549768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 7953.556899] PKRU: 55555554
> > [ 7953.559612] Call Trace:
> > [ 7953.562064] ? lock_release+0x1cd/0x2a0
> > [ 7953.565902] ? _raw_spin_unlock+0x1f/0x30
> > [ 7953.569917] ? xlog_write+0x1e2/0x630 [xfs]
> > [ 7953.574162] ? lock_acquire+0x15d/0x380
> > [ 7953.578000] ? lock_acquire+0x15d/0x380
> > [ 7953.581841] ? lock_release+0x1cd/0x2a0
> > [ 7953.585680] ? trace_hardirqs_on+0x1b/0xd0
> > [ 7953.589780] ? _raw_spin_unlock_irqrestore+0x37/0x40
> > [ 7953.594744] ? __wake_up_common_lock+0x7a/0x90
> > [ 7953.599192] ? lock_release+0x1cd/0x2a0
> > [ 7953.603031] xlog_cil_committed+0x34f/0x390 [xfs]
> > [ 7953.607798] ? xlog_cil_push_work+0x715/0x8d0 [xfs]
> > [ 7953.612738] xlog_cil_push_work+0x740/0x8d0 [xfs]
> > [ 7953.617504] ? _raw_spin_unlock_irq+0x24/0x40
> > [ 7953.621862] ? finish_task_switch.isra.0+0xa0/0x2c0
> > [ 7953.626745] ? kmem_cache_free+0x247/0x5c0
> > [ 7953.630839] ? fsnotify_final_mark_destroy+0x1c/0x30
> > [ 7953.635806] ? lock_acquire+0x15d/0x380
> > [ 7953.639646] ? lock_acquire+0x15d/0x380
> > [ 7953.643484] ? lock_release+0x1cd/0x2a0
> > [ 7953.647323] process_one_work+0x26e/0x560
> > [ 7953.651337] worker_thread+0x52/0x3b0
> > [ 7953.655003] ? process_one_work+0x560/0x560
> > [ 7953.659188] kthread+0x12c/0x150
> > [ 7953.662421] ? __kthread_bind_mask+0x60/0x60
> > [ 7953.666694] ret_from_fork+0x22/0x30
> > [ 7953.670273] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
> > [ 7953.749025] CR2: 000000000000031f
> > [ 7953.752345] ---[ end trace f04c960f66265f3b ]---
> >
>
* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
2021-06-17 8:26 ` [PATCH 4/8] xfs: pass a CIL context to xlog_write() Dave Chinner
2021-06-17 14:46 ` kernel test robot
@ 2021-06-17 20:24 ` Darrick J. Wong
2021-06-17 22:03 ` Dave Chinner
2021-06-18 14:23 ` Christoph Hellwig
2 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 20:24 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 06:26:13PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Pass the CIL context to xlog_write() rather than a pointer to a LSN
> variable. Only the CIL checkpoint calls to xlog_write() need to know
> about the start LSN of the writes, so rework xlog_write to directly
> write the LSNs into the CIL context structure.
>
> This removes the commit_lsn variable from xlog_cil_push_work(), so
> now we only have to issue the commit record ordering wakeup from
> there.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_log.c | 22 +++++++++++++++++-----
> fs/xfs/xfs_log_cil.c | 19 ++++++++-----------
> fs/xfs/xfs_log_priv.h | 4 ++--
> 3 files changed, 27 insertions(+), 18 deletions(-)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index cf661c155786..fc0e43c57683 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -871,7 +871,7 @@ xlog_write_unmount_record(
> */
> if (log->l_targ != log->l_mp->m_ddev_targp)
> blkdev_issue_flush(log->l_targ->bt_bdev);
> - return xlog_write(log, &lv_chain, ticket, NULL, NULL, reg.i_len);
> + return xlog_write(log, NULL, &lv_chain, ticket, NULL, reg.i_len);
> }
>
> /*
> @@ -2383,9 +2383,9 @@ xlog_write_partial(
> int
> xlog_write(
> struct xlog *log,
> + struct xfs_cil_ctx *ctx,
> struct list_head *lv_chain,
> struct xlog_ticket *ticket,
> - xfs_lsn_t *start_lsn,
> struct xlog_in_core **commit_iclog,
> uint32_t len)
> {
> @@ -2408,9 +2408,21 @@ xlog_write(
> if (error)
> return error;
>
> - /* start_lsn is the LSN of the first iclog written to. */
> - if (start_lsn)
> - *start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> + /*
> + * If we have a CIL context, record the LSN of the iclog we were just
> + * granted space to start writing into. If the context doesn't have
> + * a start_lsn recorded, then this iclog will contain the start record
> + * for the checkpoint. Otherwise this write contains the commit record
> + * for the checkpoint.
> + */
> + if (ctx) {
> + spin_lock(&ctx->cil->xc_push_lock);
> + if (!ctx->start_lsn)
> + ctx->start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> + else
> + ctx->commit_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> + spin_unlock(&ctx->cil->xc_push_lock);
This cycling of the push lock when setting start_lsn is new. What are
we protecting against here by taking the lock?
Also, just to check my assumptions: why do we take the push lock when
setting commit_lsn? Is that to synchronize with the xc_committing loop
that looks for contexts that need pushing?
--D
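
To make sure I am reading the hunk above correctly, here is a small userspace sketch of what it appears to do: the first write for a context records the start LSN, and any later write records the commit LSN. This is a model under assumptions, not the kernel code: `lsn_ctx` and `record_write_lsn` are hypothetical names, and the mutex lives in the context itself for brevity, whereas the patch takes ctx->cil->xc_push_lock. It does not answer the question of what the lock protects against; it only restates the branch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the LSN fields of struct xfs_cil_ctx. */
struct lsn_ctx {
	pthread_mutex_t	lock;		/* models xc_push_lock (simplified) */
	uint64_t	start_lsn;	/* zero until the first write is granted space */
	uint64_t	commit_lsn;	/* zero until the commit record write */
};

/*
 * Record the LSN of the iclog a write was granted space in. A NULL
 * context models callers such as the unmount record write, which have
 * no checkpoint to track.
 */
static void record_write_lsn(struct lsn_ctx *ctx, uint64_t iclog_lsn)
{
	if (!ctx)
		return;

	/* A zero LSN would be indistinguishable from "not recorded yet". */
	assert(iclog_lsn != 0);

	pthread_mutex_lock(&ctx->lock);
	if (!ctx->start_lsn)
		ctx->start_lsn = iclog_lsn;	/* first write: start record */
	else
		ctx->commit_lsn = iclog_lsn;	/* later write: commit record */
	pthread_mutex_unlock(&ctx->lock);
}
```

Note that the model relies on exactly two writes per context; a third call would overwrite commit_lsn, which is one reason the real call sites matter.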
> + }
>
> lv = list_first_entry_or_null(lv_chain, struct xfs_log_vec, lv_list);
> while (lv) {
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 2c8b25888c53..35fc3e57d870 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -790,14 +790,13 @@ xlog_cil_build_trans_hdr(
> */
> int
> xlog_cil_write_commit_record(
> - struct xlog *log,
> - struct xlog_ticket *ticket,
> - struct xlog_in_core **iclog,
> - xfs_lsn_t *lsn)
> + struct xfs_cil_ctx *ctx,
> + struct xlog_in_core **iclog)
> {
> + struct xlog *log = ctx->cil->xc_log;
> struct xlog_op_header ophdr = {
> .oh_clientid = XFS_TRANSACTION,
> - .oh_tid = cpu_to_be32(ticket->t_tid),
> + .oh_tid = cpu_to_be32(ctx->ticket->t_tid),
> .oh_flags = XLOG_COMMIT_TRANS,
> };
> struct xfs_log_iovec reg = {
> @@ -818,8 +817,8 @@ xlog_cil_write_commit_record(
> return -EIO;
>
> /* account for space used by record data */
> - ticket->t_curr_res -= reg.i_len;
> - error = xlog_write(log, &lv_chain, ticket, lsn, iclog, reg.i_len);
> + ctx->ticket->t_curr_res -= reg.i_len;
> + error = xlog_write(log, ctx, &lv_chain, ctx->ticket, iclog, reg.i_len);
> if (error)
> xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
> return error;
> @@ -1038,7 +1037,7 @@ xlog_cil_push_work(
> * use the commit record lsn then we can move the tail beyond the grant
> * write head.
> */
> - error = xlog_write(log, &ctx->lv_chain, ctx->ticket, &ctx->start_lsn,
> + error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
> NULL, num_bytes);
>
> /*
> @@ -1083,8 +1082,7 @@ xlog_cil_push_work(
> }
> spin_unlock(&cil->xc_push_lock);
>
> - error = xlog_cil_write_commit_record(log, ctx->ticket, &commit_iclog,
> - &commit_lsn);
> + error = xlog_cil_write_commit_record(ctx, &commit_iclog);
> if (error)
> goto out_abort_free_ticket;
>
> @@ -1104,7 +1102,6 @@ xlog_cil_push_work(
> * and wake up anyone who is waiting for the commit to complete.
> */
> spin_lock(&cil->xc_push_lock);
> - ctx->commit_lsn = commit_lsn;
> wake_up_all(&cil->xc_commit_wait);
> spin_unlock(&cil->xc_push_lock);
>
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 26f26769d1c6..af8a9dfa8068 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -487,8 +487,8 @@ xlog_write_adv_cnt(void **ptr, int *len, int *off, size_t bytes)
>
> void xlog_print_tic_res(struct xfs_mount *mp, struct xlog_ticket *ticket);
> void xlog_print_trans(struct xfs_trans *);
> -int xlog_write(struct xlog *log, struct list_head *lv_chain,
> - struct xlog_ticket *tic, xfs_lsn_t *start_lsn,
> +int xlog_write(struct xlog *log, struct xfs_cil_ctx *ctx,
> + struct list_head *lv_chain, struct xlog_ticket *tic,
> struct xlog_in_core **commit_iclog, uint32_t len);
>
> void xfs_log_ticket_ungrant(struct xlog *log, struct xlog_ticket *ticket);
> --
> 2.31.1
>
* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
2021-06-17 20:06 ` Brian Foster
@ 2021-06-17 20:26 ` Darrick J. Wong
2021-06-17 23:31 ` Brian Foster
0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 20:26 UTC (permalink / raw)
To: Brian Foster; +Cc: Dave Chinner, linux-xfs
On Thu, Jun 17, 2021 at 04:06:24PM -0400, Brian Foster wrote:
> On Thu, Jun 17, 2021 at 12:05:19PM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 17, 2021 at 02:32:30PM -0400, Brian Foster wrote:
> > > On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > > > Hi folks,
> > > >
> > > > This is followup from the first set of log fixes for for-next that
> > > > were posted here:
> > > >
> > > > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > > >
> > > > The first two patches of this series are updates for those patches,
> > > > change log below. The rest is the fix for the bigger issue we
> > > > uncovered in investigating the generic/019 failures, being that
> > > > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > > > to checkpoints.
> > > >
> > > > The "simple" fix of using the same ordering code as the commit
> > > > record for the start records in the CIL push turned into a lot of
> > > > patches once I started cleaning it up, separating out all the
> > > > different bits and finally realising all the things I needed to
> > > > change to avoid unintentional logic/behavioural changes. Hence
> > > > there's some code movement, some factoring, API changes to
> > > > xlog_write(), changing where we attach callbacks to commit iclogs so
> > > > they remain correctly ordered if there are multiple commit records
> > > > in the one iclog and then, finally, strictly ordering the start
> > > > records....
> > > >
> > > > The original "simple fix" I tested last night ran almost a thousand
> > > > cycles of generic/019 without a log hang or recovery failure of any
> > > > kind. The refactored patchset has run a couple hundred cycles of
> > > > g/019 and g/475 over the last few hours without a failure, so I'm
> > > > posting this so we can get a review iteration done while I sleep so
> > > > we can - hopefully - get this sorted out before the end of the week.
> > > >
> > >
> > > My first spin of this included generic/019 and generic/475, ran for 18
> > > or so iterations and 475 exploded with a stream of asserts followed by a
> > > NULL pointer crash:
> > >
> > > # grep -e Assertion -e BUG dmesg.out
> > > ...
> > > [ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > > [ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
> > >
> > > I don't know if this is a regression, but I've not seen it before. I've
> > > attempted to spin generic/475 since then to see if it reproduces again,
> > > but so far I'm only running into some of the preexisting issues
> > > associated with that test.
> >
> > By any chance, do the two log recovery fixes I sent yesterday make those
> > problems go away?
> >
>
> Hadn't got to those ones yet...
<nod>
> > > I'll let it go a while more and probably
> > > switch it back to running both sometime before the end of the day for an
> > > overnight test.
> >
> > Also, do the CIL livelocks go away if you apply only patches 1-2?
> >
>
> It's kind of hard to discern the effect of individual fixes when
> multiple corruptions are at play. :/ I suppose I could switch up my
> planned overnight test to include the aforementioned 2 recovery fixes
> and 1-2 from this series, if that is preferable..?
I dunno about overnight, but at least ~20 or so iterations?
> I suspect that would
> leave around the originally reported generic/019 corruption presumably
> caused by the start LSN ordering issue, but we could see if the deadlock
> is addressed and whether 475 survives any longer.
Might be a useful data point to figure out if these pieces are separate
or if they really do belong in an 8 patch series, since I think ~20 or
so iterations shouldn't take too long (though I guess it is nearly 16:30
your time, isn't it...) Well, do whatever you think is the best use of
machine time.
--D
>
> Brian
>
> > > A full copy of the assert and NULL pointer BUG splat is included below
> > > for reference. It looks like the fault BUG splat ended up interspersed
> > > or otherwise mangled, but I suspect that one is just fallout from the
> > > immediately previous crash.
> >
> > I have a question about the composition of this 8-patch series --
> > which patches fix the new cil code, and which ones fix the out of order
> > recovery problems? I suspect that patches 1-2 are for the new CIL code,
> > and 3-8 are to fix the recovery problems.
> >
> > Thinking with my distro kernel not-maintainer hat on, I'm considering
> > how to backport whatever fixes emerge for the recovery ordering issue
> > into existing kernels. The way I see things right now, the CIL changes
> > (+ fixes) and the ordering bug fixes are separate issues. The log
> > ordering problems should get fixed as soon as we have a practical
> > solution; the CIL changes could get deferred if need be since it's a
> > medium-high risk; and the real question is how to sequence all this?
> >
> > (Or to put it another way: I'm still stuck going "oh wowwww this is a
> > lot more change" while trying to understand patch 4)
> >
> > --D
> >
> > >
> > > Brian
> > >
> > > --- 8< ---
> > >
> > > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7953.037737] ------------[ cut here ]------------
> > > [ 7953.042358] WARNING: CPU: 0 PID: 131627 at fs/xfs/xfs_message.c:112 assfail+0x25/0x28 [xfs]
> > > [ 7953.050782] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
> > > [ 7953.129548] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G W I 5.13.0-rc4+ #70
> > > [ 7953.138243] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> > > [ 7953.145818] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
> > > [ 7953.151554] RIP: 0010:assfail+0x25/0x28 [xfs]
> > > [ 7953.155991] Code: ff ff 0f 0b c3 0f 1f 44 00 00 41 89 c8 48 89 d1 48 89 f2 48 c7 c6 d8 eb c3 c0 e8 cf fa ff ff 80 3d f1 d4 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24 18
> > > [ 7953.174745] RSP: 0018:ffffa57ccf99fa50 EFLAGS: 00010246
> > > [ 7953.179982] RAX: 00000000ffffffea RBX: 0000000500003977 RCX: 0000000000000000
> > > [ 7953.187121] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
> > > [ 7953.194264] RBP: ffff91f685725040 R08: 0000000000000000 R09: 000000000000000a
> > > [ 7953.201405] R10: 000000000000000a R11: f000000000000000 R12: ffff91f685725040
> > > [ 7953.208546] R13: 0000000000000000 R14: ffff91f66abed140 R15: ffff91c76dfccb40
> > > [ 7953.215686] FS: 0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
> > > [ 7953.223781] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 7953.229533] CR2: 00000000020e2108 CR3: 0000003d02826003 CR4: 00000000007706f0
> > > [ 7953.236667] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 7953.243809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 7953.250949] PKRU: 55555554
> > > [ 7953.253669] Call Trace:
> > > [ 7953.256123] xfs_bui_release+0x4b/0x50 [xfs]
> > > [ 7953.260466] xfs_trans_committed_bulk+0x158/0x2c0 [xfs]
> > > [ 7953.265762] ? lock_release+0x1cd/0x2a0
> > > [ 7953.269610] ? _raw_spin_unlock+0x1f/0x30
> > > [ 7953.273630] ? xlog_write+0x1e2/0x630 [xfs]
> > > [ 7953.277886] ? lock_acquire+0x15d/0x380
> > > [ 7953.281732] ? lock_acquire+0x15d/0x380
> > > [ 7953.285582] ? lock_release+0x1cd/0x2a0
> > > [ 7953.289428] ? trace_hardirqs_on+0x1b/0xd0
> > > [ 7953.293536] ? _raw_spin_unlock_irqrestore+0x37/0x40
> > > [ 7953.298511] ? __wake_up_common_lock+0x7a/0x90
> > > [ 7953.302966] ? lock_release+0x1cd/0x2a0
> > > [ 7953.306813] xlog_cil_committed+0x34f/0x390 [xfs]
> > > [ 7953.311593] ? xlog_cil_push_work+0x715/0x8d0 [xfs]
> > > [ 7953.316547] xlog_cil_push_work+0x740/0x8d0 [xfs]
> > > [ 7953.321321] ? _raw_spin_unlock_irq+0x24/0x40
> > > [ 7953.325689] ? finish_task_switch.isra.0+0xa0/0x2c0
> > > [ 7953.330580] ? kmem_cache_free+0x247/0x5c0
> > > [ 7953.334685] ? fsnotify_final_mark_destroy+0x1c/0x30
> > > [ 7953.339658] ? lock_acquire+0x15d/0x380
> > > [ 7953.343505] ? lock_acquire+0x15d/0x380
> > > [ 7953.347353] ? lock_release+0x1cd/0x2a0
> > > [ 7953.351203] process_one_work+0x26e/0x560
> > > [ 7953.355225] worker_thread+0x52/0x3b0
> > > [ 7953.358898] ? process_one_work+0x560/0x560
> > > [ 7953.363094] kthread+0x12c/0x150
> > > [ 7953.366335] ? __kthread_bind_mask+0x60/0x60
> > > [ 7953.370617] ret_from_fork+0x22/0x30
> > > [ 7953.374206] irq event stamp: 0
> > > [ 7953.377268] hardirqs last enabled at (0): [<0000000000000000>] 0x0
> > > [ 7953.383544] hardirqs last disabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
> > > [ 7953.391724] softirqs last enabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
> > > [ 7953.399907] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > > [ 7953.406179] ---[ end trace f04c960f66265f3a ]---
> > > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > > [ 7953.417760] #PF: supervisor read access in kernel mode
> > > [ 7953.422900] #PF: error_code(0x0000) - not-present page
> > > [ 7953.428038] PGD 0 P4D 0
> > > [ 7953.430579] Oops: 0000 [#1] SMP PTI
> > > [ 7953.434070] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G W I 5.13.0-rc4+ #70
> > > [ 7953.442764] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> > > [ 7953.450330] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
> > > [ 7953.456058] RIP: 0010:xfs_trans_committed_bulk+0xcc/0x2c0 [xfs]
> > > [ 7953.462036] Code: 41 83 c5 01 48 89 54 c4 50 41 83 fd 1f 0f 8f 11 01 00 00 4d 8b 36 4c 3b 34 24 74 28 4d 8b 66 20 40 84 ed 75 54 49 8b 44 24 60 <f6> 00 01 74 91 48 8b 40 38 4c 89 e7 e8 63 6b 42 f5 4d 8b 36 4c 3b
> > > [ 7953.480783] RSP: 0018:ffffa57ccf99fa68 EFLAGS: 00010202
> > > [ 7953.486009] RAX: 000000000000031f RBX: 0000000500003977 RCX: 0000000000000000
> > > [ 7953.493141] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
> > > [ 7953.500274] RBP: 0000000000000001 R08: 0000000000000000 R09: 000000000000000a
> > > [ 7953.507404] R10: 000000000000000a R11: f000000000000000 R12: ffff91c759fedb20
> > > [ 7953.514536] R13: 0000000000000000 R14: ffff91c759fedb00 R15: ffff91c76dfccb40
> > > [ 7953.521671] FS: 0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
> > > [ 7953.529757] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 7953.535501] CR2: 000000000000031f CR3: 0000003d02826003 CR4: 00000000007706f0
> > > [ 7953.542633] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 7953.549768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 7953.556899] PKRU: 55555554
> > > [ 7953.559612] Call Trace:
> > > [ 7953.562064] ? lock_release+0x1cd/0x2a0
> > > [ 7953.565902] ? _raw_spin_unlock+0x1f/0x30
> > > [ 7953.569917] ? xlog_write+0x1e2/0x630 [xfs]
> > > [ 7953.574162] ? lock_acquire+0x15d/0x380
> > > [ 7953.578000] ? lock_acquire+0x15d/0x380
> > > [ 7953.581841] ? lock_release+0x1cd/0x2a0
> > > [ 7953.585680] ? trace_hardirqs_on+0x1b/0xd0
> > > [ 7953.589780] ? _raw_spin_unlock_irqrestore+0x37/0x40
> > > [ 7953.594744] ? __wake_up_common_lock+0x7a/0x90
> > > [ 7953.599192] ? lock_release+0x1cd/0x2a0
> > > [ 7953.603031] xlog_cil_committed+0x34f/0x390 [xfs]
> > > [ 7953.607798] ? xlog_cil_push_work+0x715/0x8d0 [xfs]
> > > [ 7953.612738] xlog_cil_push_work+0x740/0x8d0 [xfs]
> > > [ 7953.617504] ? _raw_spin_unlock_irq+0x24/0x40
> > > [ 7953.621862] ? finish_task_switch.isra.0+0xa0/0x2c0
> > > [ 7953.626745] ? kmem_cache_free+0x247/0x5c0
> > > [ 7953.630839] ? fsnotify_final_mark_destroy+0x1c/0x30
> > > [ 7953.635806] ? lock_acquire+0x15d/0x380
> > > [ 7953.639646] ? lock_acquire+0x15d/0x380
> > > [ 7953.643484] ? lock_release+0x1cd/0x2a0
> > > [ 7953.647323] process_one_work+0x26e/0x560
> > > [ 7953.651337] worker_thread+0x52/0x3b0
> > > [ 7953.655003] ? process_one_work+0x560/0x560
> > > [ 7953.659188] kthread+0x12c/0x150
> > > [ 7953.662421] ? __kthread_bind_mask+0x60/0x60
> > > [ 7953.666694] ret_from_fork+0x22/0x30
> > > [ 7953.670273] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
> > > [ 7953.749025] CR2: 000000000000031f
> > > [ 7953.752345] ---[ end trace f04c960f66265f3b ]---
> > >
> >
>
* Re: [PATCH 6/8] xfs: separate out setting CIL context LSNs from xlog_write
2021-06-17 8:26 ` [PATCH 6/8] xfs: separate out setting CIL context LSNs from xlog_write Dave Chinner
@ 2021-06-17 20:28 ` Darrick J. Wong
2021-06-17 22:10 ` Dave Chinner
0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 20:28 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 06:26:15PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> In preparation for moving more CIL context specific functionality
> into these operations.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Looks fine as a hoist, though I wonder why you didn't do this in patch
4?
--D
> ---
> fs/xfs/xfs_log.c | 17 ++---------------
> fs/xfs/xfs_log_cil.c | 23 +++++++++++++++++++++++
> fs/xfs/xfs_log_priv.h | 2 ++
> 3 files changed, 27 insertions(+), 15 deletions(-)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index fc0e43c57683..1c214b395223 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -2408,21 +2408,8 @@ xlog_write(
> if (error)
> return error;
>
> - /*
> - * If we have a CIL context, record the LSN of the iclog we were just
> - * granted space to start writing into. If the context doesn't have
> - * a start_lsn recorded, then this iclog will contain the start record
> - * for the checkpoint. Otherwise this write contains the commit record
> - * for the checkpoint.
> - */
> - if (ctx) {
> - spin_lock(&ctx->cil->xc_push_lock);
> - if (!ctx->start_lsn)
> - ctx->start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> - else
> - ctx->commit_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> - spin_unlock(&ctx->cil->xc_push_lock);
> - }
> + if (ctx)
> + xlog_cil_set_ctx_write_state(ctx, iclog);
>
> lv = list_first_entry_or_null(lv_chain, struct xfs_log_vec, lv_list);
> while (lv) {
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index f993ec69fc97..2d8d904ffb78 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -783,6 +783,29 @@ xlog_cil_build_trans_hdr(
> tic->t_curr_res -= lvhdr->lv_bytes;
> }
>
> +/*
> + * Record the LSN of the iclog we were just granted space to start writing into.
> + * If the context doesn't have a start_lsn recorded, then this iclog will
> + * contain the start record for the checkpoint. Otherwise this write contains
> + * the commit record for the checkpoint.
> + */
> +void
> +xlog_cil_set_ctx_write_state(
> + struct xfs_cil_ctx *ctx,
> + struct xlog_in_core *iclog)
> +{
> + struct xfs_cil *cil = ctx->cil;
> + xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> +
> + ASSERT(!ctx->commit_lsn);
> + spin_lock(&cil->xc_push_lock);
> + if (!ctx->start_lsn)
> + ctx->start_lsn = lsn;
> + else
> + ctx->commit_lsn = lsn;
> + spin_unlock(&cil->xc_push_lock);
> +}
> +
> /*
> * Ensure that the order of log writes follows checkpoint sequence order. This
> * relies on the context LSN being zero until the log write has guaranteed the
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index af8a9dfa8068..849ba2eb3483 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -563,6 +563,8 @@ void xlog_cil_destroy(struct xlog *log);
> bool xlog_cil_empty(struct xlog *log);
> void xlog_cil_commit(struct xlog *log, struct xfs_trans *tp,
> xfs_csn_t *commit_seq, bool regrant);
> +void xlog_cil_set_ctx_write_state(struct xfs_cil_ctx *ctx,
> + struct xlog_in_core *iclog);
>
> /*
> * CIL force routines
> --
> 2.31.1
>
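For readers following the hoist, here is a minimal userspace sketch of the state transition that the new helper performs. It is not kernel code: `struct ctx_sketch` and `ctx_set_write_state()` are hypothetical stand-ins, and the xc_push_lock locking from the patch is left out on purpose.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the LSN fields of struct xfs_cil_ctx. */
struct ctx_sketch {
	uint64_t start_lsn;	/* 0 until the start record's iclog is known */
	uint64_t commit_lsn;	/* 0 until the commit record's iclog is known */
};

/*
 * Mirrors the branch in xlog_cil_set_ctx_write_state(): the first write for
 * a context records the start LSN, the second records the commit LSN.
 * Locking is deliberately omitted from this sketch.
 */
static void ctx_set_write_state(struct ctx_sketch *ctx, uint64_t iclog_lsn)
{
	assert(ctx->commit_lsn == 0);	/* same invariant as the ASSERT in the patch */
	if (!ctx->start_lsn)
		ctx->start_lsn = iclog_lsn;
	else
		ctx->commit_lsn = iclog_lsn;
}
```

The sketch relies on a zero LSN meaning "not yet assigned", the same convention the patch uses.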
* Re: [PATCH 7/8] xfs: attached iclog callbacks in xlog_cil_set_ctx_write_state()
2021-06-17 8:26 ` [PATCH 7/8] xfs: attached iclog callbacks in xlog_cil_set_ctx_write_state() Dave Chinner
@ 2021-06-17 20:55 ` Darrick J. Wong
2021-06-17 22:20 ` Dave Chinner
0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 20:55 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 06:26:16PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> We currently attach iclog callbacks for the CIL when the commit
> iclog is returned from xlog_write. Because
> xlog_state_get_iclog_space() always guarantees that the commit
> record will fit in the iclog it returns, we can move this IO
> callback setting to xlog_cil_set_ctx_write_state(), record the
> commit iclog in the context and remove the need for the commit iclog
> to be returned by xlog_write() altogether.
>
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_log.c | 8 ++----
> fs/xfs/xfs_log_cil.c | 65 +++++++++++++++++++++++++------------------
> fs/xfs/xfs_log_priv.h | 3 +-
> 3 files changed, 42 insertions(+), 34 deletions(-)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 1c214b395223..359246d54db7 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -871,7 +871,7 @@ xlog_write_unmount_record(
> */
> if (log->l_targ != log->l_mp->m_ddev_targp)
> blkdev_issue_flush(log->l_targ->bt_bdev);
> - return xlog_write(log, NULL, &lv_chain, ticket, NULL, reg.i_len);
> + return xlog_write(log, NULL, &lv_chain, ticket, reg.i_len);
> }
>
> /*
> @@ -2386,7 +2386,6 @@ xlog_write(
> struct xfs_cil_ctx *ctx,
> struct list_head *lv_chain,
> struct xlog_ticket *ticket,
> - struct xlog_in_core **commit_iclog,
> uint32_t len)
> {
> struct xlog_in_core *iclog = NULL;
> @@ -2436,10 +2435,7 @@ xlog_write(
> */
> spin_lock(&log->l_icloglock);
> xlog_state_finish_copy(log, iclog, record_cnt, 0);
> - if (commit_iclog)
> - *commit_iclog = iclog;
> - else
> - error = xlog_state_release_iclog(log, iclog, ticket);
> + error = xlog_state_release_iclog(log, iclog, ticket);
> spin_unlock(&log->l_icloglock);
>
> return error;
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 2d8d904ffb78..87e30917ce2e 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -799,11 +799,34 @@ xlog_cil_set_ctx_write_state(
>
> ASSERT(!ctx->commit_lsn);
> spin_lock(&cil->xc_push_lock);
> - if (!ctx->start_lsn)
> + if (!ctx->start_lsn) {
> ctx->start_lsn = lsn;
> - else
> - ctx->commit_lsn = lsn;
> + spin_unlock(&cil->xc_push_lock);
> + return;
> + }
> +
> + /*
> + * Take a reference to the iclog for the context so that we still hold
> + * it when xlog_write is done and has released it. This means the
> + * context controls when the iclog is released for IO.
> + */
> + atomic_inc(&iclog->ic_refcnt);
Where do we drop this refcount? Is this the accounting adjustment that
we have to make because xlog_write always decrements the iclog refcount
now?
> + ctx->commit_iclog = iclog;
> + ctx->commit_lsn = lsn;
> spin_unlock(&cil->xc_push_lock);
I've noticed how the setting of ctx->commit_lsn has moved to before the
point where we splice callback lists, only to move them back below in
the next patch. That has made it harder for me to understand this
series.
I /think/ the goal of this patch is not really a functional change so
much as a refactoring to make the cil context track the commit iclog
directly and then smooth out some of the refcounting code, but the
shuffling around of these variables makes me wonder if I'm missing some
other subtlety.
--D
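On the refcount question, the diff itself suggests an answer: xlog_write() now always calls xlog_state_release_iclog(), and xlog_cil_push_work() later calls xlog_state_release_iclog() on ctx->commit_iclog, which would be where the extra reference is dropped. A minimal userspace sketch of that pattern, assuming this reading is right (`struct iclog_sketch` and both helpers are hypothetical, not kernel API):

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct xlog_in_core; only the refcount is modelled. */
struct iclog_sketch {
	atomic_int refcnt;
	int submitted;	/* set when the last reference is dropped */
};

/* Drop one reference; "submit" the iclog when the count reaches zero. */
static void iclog_release(struct iclog_sketch *iclog)
{
	if (atomic_fetch_sub(&iclog->refcnt, 1) == 1)
		iclog->submitted = 1;
}

/* The context pins the commit iclog so the writer's release cannot submit it. */
static void ctx_pin_commit_iclog(struct iclog_sketch *iclog)
{
	atomic_fetch_add(&iclog->refcnt, 1);
}
```

With one reference held by the writer and one by the context, the writer's release leaves the iclog unsubmitted; only the context's later release lets it go to IO.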
> +
> + /*
> + * xlog_state_get_iclog_space() guarantees there is enough space in the
> + * iclog for an entire commit record, so attach the context callbacks to
> + * the iclog at this time if we are not already in a shutdown state.
> + */
> + spin_lock(&iclog->ic_callback_lock);
> + if (iclog->ic_state == XLOG_STATE_IOERROR) {
> + spin_unlock(&iclog->ic_callback_lock);
> + return;
> + }
> + list_add_tail(&ctx->iclog_entry, &iclog->ic_callbacks);
> + spin_unlock(&iclog->ic_callback_lock);
> }
>
> /*
> @@ -858,8 +881,7 @@ xlog_cil_order_write(
> */
> int
> xlog_cil_write_commit_record(
> - struct xfs_cil_ctx *ctx,
> - struct xlog_in_core **iclog)
> + struct xfs_cil_ctx *ctx)
> {
> struct xlog *log = ctx->cil->xc_log;
> struct xlog_op_header ophdr = {
> @@ -890,7 +912,7 @@ xlog_cil_write_commit_record(
>
> /* account for space used by record data */
> ctx->ticket->t_curr_res -= reg.i_len;
> - error = xlog_write(log, ctx, &lv_chain, ctx->ticket, iclog, reg.i_len);
> + error = xlog_write(log, ctx, &lv_chain, ctx->ticket, reg.i_len);
> if (error)
> xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
> return error;
> @@ -940,7 +962,6 @@ xlog_cil_push_work(
> struct xlog *log = cil->xc_log;
> struct xfs_log_vec *lv;
> struct xfs_cil_ctx *new_ctx;
> - struct xlog_in_core *commit_iclog;
> int num_iovecs = 0;
> int num_bytes = 0;
> int error = 0;
> @@ -1109,8 +1130,7 @@ xlog_cil_push_work(
> * use the commit record lsn then we can move the tail beyond the grant
> * write head.
> */
> - error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
> - NULL, num_bytes);
> + error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket, num_bytes);
>
> /*
> * Take the lvhdr back off the lv_chain as it should not be passed
> @@ -1120,20 +1140,10 @@ xlog_cil_push_work(
> if (error)
> goto out_abort_free_ticket;
>
> - error = xlog_cil_write_commit_record(ctx, &commit_iclog);
> + error = xlog_cil_write_commit_record(ctx);
> if (error)
> goto out_abort_free_ticket;
>
> - spin_lock(&commit_iclog->ic_callback_lock);
> - if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
> - spin_unlock(&commit_iclog->ic_callback_lock);
> - goto out_abort_free_ticket;
> - }
> - ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
> - commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
> - list_add_tail(&ctx->iclog_entry, &commit_iclog->ic_callbacks);
> - spin_unlock(&commit_iclog->ic_callback_lock);
> -
> /*
> * now the checkpoint commit is complete and we've attached the
> * callbacks to the iclog we can assign the commit LSN to the context
> @@ -1168,8 +1178,8 @@ xlog_cil_push_work(
> if (ctx->start_lsn != commit_lsn) {
> struct xlog_in_core *iclog;
>
> - for (iclog = commit_iclog->ic_prev;
> - iclog != commit_iclog;
> + for (iclog = ctx->commit_iclog->ic_prev;
> + iclog != ctx->commit_iclog;
> iclog = iclog->ic_prev) {
> xfs_lsn_t hlsn;
>
> @@ -1201,7 +1211,7 @@ xlog_cil_push_work(
> * ordering for this checkpoint is correctly preserved down to
> * stable storage.
> */
> - commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
> + ctx->commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
> }
>
> /*
> @@ -1214,10 +1224,11 @@ xlog_cil_push_work(
> * will be written when released, switch it's state to WANT_SYNC right
> * now.
> */
> - commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
> - if (push_commit_stable && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
> - xlog_state_switch_iclogs(log, commit_iclog, 0);
> - xlog_state_release_iclog(log, commit_iclog, ticket);
> + ctx->commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
> + if (push_commit_stable &&
> + ctx->commit_iclog->ic_state == XLOG_STATE_ACTIVE)
> + xlog_state_switch_iclogs(log, ctx->commit_iclog, 0);
> + xlog_state_release_iclog(log, ctx->commit_iclog, ticket);
> spin_unlock(&log->l_icloglock);
>
> xfs_log_ticket_ungrant(log, ticket);
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 849ba2eb3483..72dfa3b89513 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -237,6 +237,7 @@ struct xfs_cil_ctx {
> struct work_struct discard_endio_work;
> struct work_struct push_work;
> atomic_t order_id;
> + struct xlog_in_core *commit_iclog;
> };
>
> /*
> @@ -489,7 +490,7 @@ void xlog_print_tic_res(struct xfs_mount *mp, struct xlog_ticket *ticket);
> void xlog_print_trans(struct xfs_trans *);
> int xlog_write(struct xlog *log, struct xfs_cil_ctx *ctx,
> struct list_head *lv_chain, struct xlog_ticket *tic,
> - struct xlog_in_core **commit_iclog, uint32_t len);
> + uint32_t len);
>
> void xfs_log_ticket_ungrant(struct xlog *log, struct xlog_ticket *ticket);
> void xfs_log_ticket_regrant(struct xlog *log, struct xlog_ticket *ticket);
> --
> 2.31.1
>
* Re: [PATCH 8/8] xfs: order CIL checkpoint start records
2021-06-17 8:26 ` [PATCH 8/8] xfs: order CIL checkpoint start records Dave Chinner
@ 2021-06-17 21:31 ` Darrick J. Wong
2021-06-17 22:49 ` Dave Chinner
0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 21:31 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 06:26:17PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Because log recovery depends on strictly ordered start records as
> well as strictly ordered commit records.
>
> This is a zero day bug in the way XFS writes pipelined transactions
> to the journal which is exposed by commit facd77e4e38b ("xfs: CIL
> work is serialised, not pipelined") which re-introduces explicit
> concurrent commits back into the on-disk journal.
>
> The XFS journal commit code has never ordered start records and we
> have relied on strict commit record ordering for correct recovery
> ordering of concurrently written transactions. Unfortunately, root
> cause analysis uncovered the fact that log recovery uses the LSN of
> the start record for transaction commit processing. Hence the
> commits are processed in strict orderi by recovery, but the LSNs
s/orderi/order/ ?
> associated with the commits can be out of order and so recovery may
> stamp incorrect LSNs into objects and/or misorder intents in the AIL
> for later processing. This can result in log recovery failures
> and/or on disk corruption, sometimes silent.
>
> Because this is a long standing log recovery issue, we can't just
> fix log recovery and call it good.
Could there be production filesystems out there that have this
mismatched ordering of start lsn and commit lsn? This still leaves the
mystery of crashed customer filesystems containing btree blocks where
128 bytes in the middle clearly contain contents that don't match or
duplicate the rest of the block, as though someone forgot to replay a
buffer vector or something.
What would a fix to log recovery entail? Not skipping recovered items
if the start/commit sequencing is not the same? Or am I not
understanding the problem correctly?
> This still leaves older kernels
> susceptible to recovery failures and corruption when replaying a log
> from a kernel that pipelines checkpoints.
> There is also the issue
> that in-memory ordering for AIL pushing and data integrity
> operations are based on checkpoint start LSNs, and if the start LSN
> is incorrect in the journal, it is also incorrect in memory.
>
> Hence there's really only one choice for fixing this zero-day bug:
> we need to strictly order checkpoint start records in ascending
> sequence order in the log, the same way we already strictly order
> commit records.
>
> Fixes: facd77e4e38b ("xfs: CIL work is serialised, not pipelined")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_log.c | 1 +
> fs/xfs/xfs_log_cil.c | 101 +++++++++++++++++++++++++++++-------------
> fs/xfs/xfs_log_priv.h | 1 +
> 3 files changed, 71 insertions(+), 32 deletions(-)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 359246d54db7..94b6bccb9de9 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -3743,6 +3743,7 @@ xfs_log_force_umount(
> * avoid races.
> */
> spin_lock(&log->l_cilp->xc_push_lock);
> + wake_up_all(&log->l_cilp->xc_start_wait);
> wake_up_all(&log->l_cilp->xc_commit_wait);
> spin_unlock(&log->l_cilp->xc_push_lock);
> xlog_state_do_callback(log);
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 87e30917ce2e..722c21f21b81 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -684,6 +684,7 @@ xlog_cil_committed(
> */
> if (abort) {
> spin_lock(&ctx->cil->xc_push_lock);
> + wake_up_all(&ctx->cil->xc_start_wait);
> wake_up_all(&ctx->cil->xc_commit_wait);
> spin_unlock(&ctx->cil->xc_push_lock);
> }
> @@ -788,6 +789,10 @@ xlog_cil_build_trans_hdr(
> * If the context doesn't have a start_lsn recorded, then this iclog will
> * contain the start record for the checkpoint. Otherwise this write contains
> * the commit record for the checkpoint.
> + *
> + * Once we've set the LSN for the given operation, wake up any ordered write
> + * waiters that can make progress now that we have a stable LSN for write
> + * ordering purposes.
> */
> void
> xlog_cil_set_ctx_write_state(
> @@ -798,9 +803,16 @@ xlog_cil_set_ctx_write_state(
> xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header.h_lsn);
>
> ASSERT(!ctx->commit_lsn);
> - spin_lock(&cil->xc_push_lock);
> if (!ctx->start_lsn) {
> + spin_lock(&cil->xc_push_lock);
> + /*
> + * The LSN we need to pass to the log items on transaction
> + * commit is the LSN reported by the first log vector write, not
> + * the commit lsn. If we use the commit record lsn then we can
> + * move the tail beyond the grant write head.
> + */
> ctx->start_lsn = lsn;
> + wake_up_all(&cil->xc_start_wait);
> spin_unlock(&cil->xc_push_lock);
> return;
> }
> @@ -811,9 +823,6 @@ xlog_cil_set_ctx_write_state(
> * context controls when the iclog is released for IO.
> */
> atomic_inc(&iclog->ic_refcnt);
> - ctx->commit_iclog = iclog;
> - ctx->commit_lsn = lsn;
> - spin_unlock(&cil->xc_push_lock);
>
> /*
> * xlog_state_get_iclog_space() guarantees there is enough space in the
> @@ -827,6 +836,12 @@ xlog_cil_set_ctx_write_state(
> }
> list_add_tail(&ctx->iclog_entry, &iclog->ic_callbacks);
> spin_unlock(&iclog->ic_callback_lock);
> +
> + spin_lock(&cil->xc_push_lock);
> + ctx->commit_iclog = iclog;
> + ctx->commit_lsn = lsn;
> + wake_up_all(&cil->xc_commit_wait);
> + spin_unlock(&cil->xc_push_lock);
> }
>
> /*
> @@ -834,10 +849,16 @@ xlog_cil_set_ctx_write_state(
> * relies on the context LSN being zero until the log write has guaranteed the
> * LSN that the log write will start at via xlog_state_get_iclog_space().
> */
> +enum {
> + _START_RECORD,
> + _COMMIT_RECORD,
> +};
Stupid nit: If this enum had a name you could skip the default clause
below because the compiler would typecheck the usage for you.
I think I grok how the code changes introduce a new ordering
requirement, at least.
--D
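To illustrate the enum nit, a minimal sketch assuming a hypothetical name for the enum (`enum cil_record_type`, with renamed enumerators). With a named enum as the parameter type and no default clause, GCC and Clang warn under -Wswitch when an enumerator is not handled. The struct and helper are made up for the example.

```c
#include <assert.h>

/* Hypothetical name for the anonymous enum in the patch. */
enum cil_record_type {
	CIL_START_RECORD,
	CIL_COMMIT_RECORD,
};

/* Hypothetical stand-in for the LSN fields of the CIL context. */
struct ctx_lsns {
	unsigned long long start_lsn;
	unsigned long long commit_lsn;
};

/*
 * Return nonzero if the LSN for the given record type has been recorded.
 * There is no default case: with -Wswitch, adding a new enumerator without
 * handling it here produces a compiler warning.
 */
static int record_lsn_is_set(const struct ctx_lsns *ctx,
			     enum cil_record_type record)
{
	switch (record) {
	case CIL_START_RECORD:
		return ctx->start_lsn != 0;
	case CIL_COMMIT_RECORD:
		return ctx->commit_lsn != 0;
	}
	return 0;	/* not reached for valid enumerators */
}
```

C still converts plain integers to enum types silently, so this only catches unhandled enumerators in the switch, not arbitrary bad values passed by a caller.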
> +
> static int
> xlog_cil_order_write(
> struct xfs_cil *cil,
> - xfs_csn_t sequence)
> + xfs_csn_t sequence,
> + int record)
> {
> struct xfs_cil_ctx *ctx;
>
> @@ -860,19 +881,50 @@ xlog_cil_order_write(
> */
> if (ctx->sequence >= sequence)
> continue;
> - if (!ctx->commit_lsn) {
> - /*
> - * It is still being pushed! Wait for the push to
> - * complete, then start again from the beginning.
> - */
> - xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
> - goto restart;
> +
> + /* Wait until the LSN for the record has been recorded. */
> + switch (record) {
> + case _START_RECORD:
> + if (!ctx->start_lsn) {
> + xlog_wait(&cil->xc_start_wait, &cil->xc_push_lock);
> + goto restart;
> + }
> + break;
> + case _COMMIT_RECORD:
> + if (!ctx->commit_lsn) {
> + xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
> + goto restart;
> + }
> + break;
> + default:
> + ASSERT(0);
> + break;
> }
> }
> spin_unlock(&cil->xc_push_lock);
> return 0;
> }
>
> +/*
> + * Write out the log vector change now attached to the CIL context. This will
> + * write a start record that needs to be strictly ordered in ascending CIL
> + * sequence order so that log recovery will always use in-order start LSNs when
> + * replaying checkpoints.
> + */
> +static int
> +xlog_cil_write_chain(
> + struct xfs_cil_ctx *ctx,
> + uint32_t num_bytes)
> +{
> + struct xlog *log = ctx->cil->xc_log;
> + int error;
> +
> + error = xlog_cil_order_write(ctx->cil, ctx->sequence, _START_RECORD);
> + if (error)
> + return error;
> + return xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket, num_bytes);
> +}
> +
> /*
> * Write out the commit record of a checkpoint transaction to close off a
> * running log write. These commit records are strictly ordered in ascending CIL
> @@ -906,7 +958,7 @@ xlog_cil_write_commit_record(
> if (XLOG_FORCED_SHUTDOWN(log))
> return -EIO;
>
> - error = xlog_cil_order_write(ctx->cil, ctx->sequence);
> + error = xlog_cil_order_write(ctx->cil, ctx->sequence, _COMMIT_RECORD);
> if (error)
> return error;
>
> @@ -1125,17 +1177,10 @@ xlog_cil_push_work(
> wait_for_completion(&bdev_flush);
>
> /*
> - * The LSN we need to pass to the log items on transaction commit is the
> - * LSN reported by the first log vector write, not the commit lsn. If we
> - * use the commit record lsn then we can move the tail beyond the grant
> - * write head.
> - */
> - error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket, num_bytes);
> -
> - /*
> - * Take the lvhdr back off the lv_chain as it should not be passed
> - * to log IO completion.
> + * Once we write the log vector chain, take the lvhdr back off it as it
> + * must not be passed to log IO completion.
> */
> + error = xlog_cil_write_chain(ctx, num_bytes);
> list_del(&lvhdr.lv_list);
> if (error)
> goto out_abort_free_ticket;
> @@ -1144,15 +1189,6 @@ xlog_cil_push_work(
> if (error)
> goto out_abort_free_ticket;
>
> - /*
> - * now the checkpoint commit is complete and we've attached the
> - * callbacks to the iclog we can assign the commit LSN to the context
> - * and wake up anyone who is waiting for the commit to complete.
> - */
> - spin_lock(&cil->xc_push_lock);
> - wake_up_all(&cil->xc_commit_wait);
> - spin_unlock(&cil->xc_push_lock);
> -
> /*
> * Pull the ticket off the ctx so we can ungrant it after releasing the
> * commit_iclog. The ctx may be freed by the time we return from
> @@ -1728,6 +1764,7 @@ xlog_cil_init(
> init_waitqueue_head(&cil->xc_push_wait);
> init_rwsem(&cil->xc_ctx_lock);
> init_waitqueue_head(&cil->xc_commit_wait);
> + init_waitqueue_head(&cil->xc_start_wait);
> log->l_cilp = cil;
>
> ctx = xlog_cil_ctx_alloc();
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 72dfa3b89513..b807a179b916 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -279,6 +279,7 @@ struct xfs_cil {
> bool xc_push_commit_stable;
> struct list_head xc_committing;
> wait_queue_head_t xc_commit_wait;
> + wait_queue_head_t xc_start_wait;
> xfs_csn_t xc_current_sequence;
> wait_queue_head_t xc_push_wait; /* background push throttle */
>
> --
> 2.31.1
>
* Re: [PATCH 2/8] xfs: don't wait on future iclogs when pushing the CIL
2021-06-17 17:49 ` Darrick J. Wong
@ 2021-06-17 21:55 ` Dave Chinner
0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 21:55 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 10:49:10AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 06:26:11PM +1000, Dave Chinner wrote:
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index 705619e9dab4..2fb0ab02dda3 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -1075,15 +1075,54 @@ xlog_cil_push_work(
> > ticket = ctx->ticket;
> >
> > /*
> > - * If the checkpoint spans multiple iclogs, wait for all previous
> > - * iclogs to complete before we submit the commit_iclog. In this case,
> > - * the commit_iclog write needs to issue a pre-flush so that the
> > - * ordering is correctly preserved down to stable storage.
> > + * If the checkpoint spans multiple iclogs, wait for all previous iclogs
> > + * to complete before we submit the commit_iclog. We can't use state
> > + * checks for this - ACTIVE can be either a past completed iclog or a
> > + * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a
> > + * past or future iclog awaiting IO or ordered IO completion to be run.
> > + * In the latter case, if it's a future iclog and we wait on it, the we
> > + * will hang because it won't get processed through to ic_force_wait
> > + * wakeup until this commit_iclog is written to disk. Hence we use the
> > + * iclog header lsn and compare it to the commit lsn to determine if we
> > + * need to wait on iclogs or not.
> > */
> > spin_lock(&log->l_icloglock);
> > if (ctx->start_lsn != commit_lsn) {
> > - xlog_wait_on_iclog(commit_iclog->ic_prev);
> > - spin_lock(&log->l_icloglock);
> > + struct xlog_in_core *iclog;
> > +
> > + for (iclog = commit_iclog->ic_prev;
> > + iclog != commit_iclog;
> > + iclog = iclog->ic_prev) {
> > + xfs_lsn_t hlsn;
> > +
> > + /*
> > + * If the LSN of the iclog is zero or in the future it
> > + * means it has passed through IO completion and
> > + * activation and hence all previous iclogs have also
> > + * done so. We do not need to wait at all in this case.
> > + */
> > + hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > + if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
> > + break;
> > +
> > + /*
> > + * If the LSN of the iclog is older than the commit lsn,
> > + * we have to wait on it. Waiting on this via the
> > + * ic_force_wait should also order the completion of all
> > + * older iclogs, too, but we leave checking that to the
> > + * next loop iteration.
> > + */
> > + ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
> > + xlog_wait_on_iclog(iclog);
> > + spin_lock(&log->l_icloglock);
>
> The presence of a loop here confuses me a bit -- we really only need to
> check and wait on commit->ic_prev since xlog_wait_on_iclog waits for
> both the iclog that it is given as well as all previous iclogs, right?
I originally wrote this thinking about using the ic_write_wait queue
which would require checking all iclogs in the ring because the
completion signalled at the DONE_SYNC state is not ordered against
other iclogs. Hence I had planned to walk all the iclogs. Then I
realised that checking the LSN could tell us past/future and so we
only needed to wait on the first iclog with a LSN less than the
commit iclog.
And so I left the loop in place to ensure that, even if my assertion
about the ring aging order was incorrect, this code would Do The
Right Thing.
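
A minimal userspace sketch of the past/future test described above, under stated assumptions: every toy_* name is hypothetical, the LSN is modelled as cycle << 32 | block, and "waiting" is simulated by marking the waited-on buffer and everything older as reactivated (header LSN zero). It is not the kernel code, only an illustration of why the loop stops after at most one wait if the ring ages in order.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t toy_lsn_t;	/* hypothetical stand-in for xfs_lsn_t */

/* Toy model of one in-core log buffer in a circular ring. */
struct toy_iclog {
	struct toy_iclog	*prev;		/* previous buffer in the ring */
	toy_lsn_t		header_lsn;	/* 0: reactivated, not yet reused */
};

/* Pack a cycle number and block number into one comparable value. */
static toy_lsn_t toy_lsn(uint32_t cycle, uint32_t block)
{
	return ((toy_lsn_t)cycle << 32) | block;
}

/* Three-way compare: negative, zero or positive like memcmp(). */
static int toy_lsn_cmp(toy_lsn_t a, toy_lsn_t b)
{
	return (a > b) - (a < b);
}

/*
 * Simulated wait: assume completion is ordered, so once this buffer is
 * done, it and every buffer with an older LSN have been reactivated.
 */
static void toy_wait_on_iclog(struct toy_iclog *iclog)
{
	toy_lsn_t		done = iclog->header_lsn;
	struct toy_iclog	*p = iclog;

	do {
		if (p->header_lsn && toy_lsn_cmp(p->header_lsn, done) <= 0)
			p->header_lsn = 0;
		p = p->prev;
	} while (p != iclog);
}

/*
 * Walk backwards from the commit buffer. Stop at the first buffer that is
 * reactivated (LSN zero) or in the future; wait on any buffer in the past.
 * Returns how many waits were needed.
 */
static int toy_wait_for_past_iclogs(struct toy_iclog *commit,
				    toy_lsn_t commit_lsn)
{
	struct toy_iclog	*iclog;
	int			waits = 0;

	for (iclog = commit->prev; iclog != commit; iclog = iclog->prev) {
		toy_lsn_t	hlsn = iclog->header_lsn;

		if (!hlsn || toy_lsn_cmp(hlsn, commit_lsn) > 0)
			break;

		assert(toy_lsn_cmp(hlsn, commit_lsn) < 0);
		toy_wait_on_iclog(iclog);
		waits++;
	}
	return waits;
}
```

In this model the function never returns more than one wait, which is the behaviour the defensive loop was left in place to confirm.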
> we've waited on commit->ic_prev, the next iclog iterated (i.e.
> commit->ic_prev->ic_prev) should break out of the loop?
Yes, that is what it does.
I can strip this all out - it was really just being defensive
because I wanted to make sure things were working as I expected them
to be working...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c
2021-06-17 17:50 ` Darrick J. Wong
@ 2021-06-17 21:56 ` Dave Chinner
0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 21:56 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 10:50:39AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 06:26:12PM +1000, Dave Chinner wrote:
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index 2fb0ab02dda3..2c8b25888c53 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -783,6 +783,48 @@ xlog_cil_build_trans_hdr(
> > tic->t_curr_res -= lvhdr->lv_bytes;
> > }
> >
> > +/*
> > + * Write out the commit record of a checkpoint transaction associated with the
> > + * given ticket to close off a running log write. Return the lsn of the commit
> > + * record.
> > + */
> > +int
>
> static int, like the robot suggests?
Huh. How did that get dropped? I definitely made this static in the
original patch....
> With that fixed,
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Ta.
-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
2021-06-17 20:24 ` Darrick J. Wong
@ 2021-06-17 22:03 ` Dave Chinner
2021-06-17 22:18 ` Darrick J. Wong
0 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 22:03 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 01:24:02PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 06:26:13PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Pass the CIL context to xlog_write() rather than a pointer to a LSN
> > variable. Only the CIL checkpoint calls to xlog_write() need to know
> > about the start LSN of the writes, so rework xlog_write to directly
> > write the LSNs into the CIL context structure.
> >
> > This removes the commit_lsn variable from xlog_cil_push_work(), so
> > now we only have to issue the commit record ordering wakeup from
> > there.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> > fs/xfs/xfs_log.c | 22 +++++++++++++++++-----
> > fs/xfs/xfs_log_cil.c | 19 ++++++++-----------
> > fs/xfs/xfs_log_priv.h | 4 ++--
> > 3 files changed, 27 insertions(+), 18 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index cf661c155786..fc0e43c57683 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -871,7 +871,7 @@ xlog_write_unmount_record(
> > */
> > if (log->l_targ != log->l_mp->m_ddev_targp)
> > blkdev_issue_flush(log->l_targ->bt_bdev);
> > - return xlog_write(log, &lv_chain, ticket, NULL, NULL, reg.i_len);
> > + return xlog_write(log, NULL, &lv_chain, ticket, NULL, reg.i_len);
> > }
> >
> > /*
> > @@ -2383,9 +2383,9 @@ xlog_write_partial(
> > int
> > xlog_write(
> > struct xlog *log,
> > + struct xfs_cil_ctx *ctx,
> > struct list_head *lv_chain,
> > struct xlog_ticket *ticket,
> > - xfs_lsn_t *start_lsn,
> > struct xlog_in_core **commit_iclog,
> > uint32_t len)
> > {
> > @@ -2408,9 +2408,21 @@ xlog_write(
> > if (error)
> > return error;
> >
> > - /* start_lsn is the LSN of the first iclog written to. */
> > - if (start_lsn)
> > - *start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > + /*
> > + * If we have a CIL context, record the LSN of the iclog we were just
> > + * granted space to start writing into. If the context doesn't have
> > + * a start_lsn recorded, then this iclog will contain the start record
> > + * for the checkpoint. Otherwise this write contains the commit record
> > + * for the checkpoint.
> > + */
> > + if (ctx) {
> > + spin_lock(&ctx->cil->xc_push_lock);
> > + if (!ctx->start_lsn)
> > + ctx->start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > + else
> > + ctx->commit_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > + spin_unlock(&ctx->cil->xc_push_lock);
>
> This cycling of the push lock when setting start_lsn is new. What are
> we protecting against here by taking the lock?
Later in the series it will be the ordering wakeup when we set the
start_lsn. The ordering ends with both start_lsn and commit_lsn
being treated the same way w.r.t. wakeups, so I just started it off
the same way here.
> Also, just to check my assumptions: why do we take the push lock when
> setting commit_lsn? Is that to synchronize with the xc_committing loop
> that looks for contexts that need pushing?
Yes - the spinlock provides the memory barriers for access to the
variable. I could use WRITE_ONCE/READ_ONCE here for this specific patch,
but the lock is necessary for compound operations in upcoming
patches so it didn't make any sense to use _ONCE macros here only to
remove them again later.
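
A rough userspace sketch of the point about compound operations, assuming a toy spinlock built on C11 atomic_flag and a hypothetical wakeups counter standing in for the wait queue wakeup. A single store could be published with a _ONCE-style access, but setting the LSN and issuing the wakeup as one unit needs a critical section. None of the toy_* names exist in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum toy_record {
	TOY_START_RECORD,
	TOY_COMMIT_RECORD,
};

/* Toy checkpoint context; the lock covers every field below it. */
struct toy_ctx {
	atomic_flag	lock;
	int64_t		start_lsn;
	int64_t		commit_lsn;
	int		wakeups;	/* stands in for wake_up_all() */
};

/* Acquire ordering on lock, release ordering on unlock. */
static void toy_spin_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void toy_spin_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Record the LSN for whichever record this write carries and issue the
 * matching wakeup inside the same critical section, so a waiter never
 * sees the wakeup without the LSN.
 */
static enum toy_record toy_set_write_state(struct toy_ctx *ctx, int64_t lsn)
{
	enum toy_record	which;

	toy_spin_lock(&ctx->lock);
	if (!ctx->start_lsn) {
		ctx->start_lsn = lsn;
		which = TOY_START_RECORD;
	} else {
		ctx->commit_lsn = lsn;
		which = TOY_COMMIT_RECORD;
	}
	ctx->wakeups++;
	toy_spin_unlock(&ctx->lock);
	return which;
}
```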
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 6/8] xfs: separate out setting CIL context LSNs from xlog_write
2021-06-17 20:28 ` Darrick J. Wong
@ 2021-06-17 22:10 ` Dave Chinner
0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 22:10 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 01:28:24PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 06:26:15PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > In preparation for moving more CIL context specific functionality
> > into these operations.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
>
> Looks fine as a hoist, though I wonder why you didn't do this in patch
> 4?
Because I wanted to keep the xlog_write() api change separate to
relocating the lsn code out of xlog_write().
There are enough review comments of "don't move and modify in the
one patch" that I won't even bother trying to do even simple "move
and modify" operations in a single patch anymore.
I can combine them if you want, but then someone is bound to pop up
in another review cycle and say "please separate....". :/
-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
2021-06-17 22:03 ` Dave Chinner
@ 2021-06-17 22:18 ` Darrick J. Wong
0 siblings, 0 replies; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-17 22:18 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Fri, Jun 18, 2021 at 08:03:37AM +1000, Dave Chinner wrote:
> On Thu, Jun 17, 2021 at 01:24:02PM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 17, 2021 at 06:26:13PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > Pass the CIL context to xlog_write() rather than a pointer to a LSN
> > > variable. Only the CIL checkpoint calls to xlog_write() need to know
> > > about the start LSN of the writes, so rework xlog_write to directly
> > > write the LSNs into the CIL context structure.
> > >
> > > This removes the commit_lsn variable from xlog_cil_push_work(), so
> > > now we only have to issue the commit record ordering wakeup from
> > > there.
> > >
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > > fs/xfs/xfs_log.c | 22 +++++++++++++++++-----
> > > fs/xfs/xfs_log_cil.c | 19 ++++++++-----------
> > > fs/xfs/xfs_log_priv.h | 4 ++--
> > > 3 files changed, 27 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > index cf661c155786..fc0e43c57683 100644
> > > --- a/fs/xfs/xfs_log.c
> > > +++ b/fs/xfs/xfs_log.c
> > > @@ -871,7 +871,7 @@ xlog_write_unmount_record(
> > > */
> > > if (log->l_targ != log->l_mp->m_ddev_targp)
> > > blkdev_issue_flush(log->l_targ->bt_bdev);
> > > - return xlog_write(log, &lv_chain, ticket, NULL, NULL, reg.i_len);
> > > + return xlog_write(log, NULL, &lv_chain, ticket, NULL, reg.i_len);
> > > }
> > >
> > > /*
> > > @@ -2383,9 +2383,9 @@ xlog_write_partial(
> > > int
> > > xlog_write(
> > > struct xlog *log,
> > > + struct xfs_cil_ctx *ctx,
> > > struct list_head *lv_chain,
> > > struct xlog_ticket *ticket,
> > > - xfs_lsn_t *start_lsn,
> > > struct xlog_in_core **commit_iclog,
> > > uint32_t len)
> > > {
> > > @@ -2408,9 +2408,21 @@ xlog_write(
> > > if (error)
> > > return error;
> > >
> > > - /* start_lsn is the LSN of the first iclog written to. */
> > > - if (start_lsn)
> > > - *start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > > + /*
> > > + * If we have a CIL context, record the LSN of the iclog we were just
> > > + * granted space to start writing into. If the context doesn't have
> > > + * a start_lsn recorded, then this iclog will contain the start record
> > > + * for the checkpoint. Otherwise this write contains the commit record
> > > + * for the checkpoint.
> > > + */
> > > + if (ctx) {
> > > + spin_lock(&ctx->cil->xc_push_lock);
> > > + if (!ctx->start_lsn)
> > > + ctx->start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > > + else
> > > + ctx->commit_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> > > + spin_unlock(&ctx->cil->xc_push_lock);
> >
> > This cycling of the push lock when setting start_lsn is new. What are
> > we protecting against here by taking the lock?
>
> Later in the series it will be the ordering wakeup when we set the
> start_lsn. The ordering ends with both start_lsn and commit_lsn
> being treated the same way w.r.t. wakeups, so I just started it off
> the same way here.
Ah, right, I see that now that I've gotten to patch 8.
> > Also, just to check my assumptions: why do we take the push lock when
> > setting commit_lsn? Is that to synchronize with the xc_committing loop
> > that looks for contexts that need pushing?
>
> Yes - the spinlock provides the memory barriers for access to the
> variable. I could use WRITE_ONCE/READ_ONCE here for this specific patch,
> but the lock is necessary for compound operations in upcoming
> patches so it didn't make any sense to use _ONCE macros here only to
> remove them again later.
Nah, I'd leave it, especially since it's already a little strange that
the place where we set ctx->commit_lsn bounces around relative to the
callback list splicing...
--D
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
* Re: [PATCH 7/8] xfs: attached iclog callbacks in xlog_cil_set_ctx_write_state()
2021-06-17 20:55 ` Darrick J. Wong
@ 2021-06-17 22:20 ` Dave Chinner
0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 22:20 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 01:55:52PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 06:26:16PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > We currently attach iclog callbacks for the CIL when the commit
> > iclog is returned from xlog_write. Because
> > xlog_state_get_iclog_space() always guarantees that the commit
> > record will fit in the iclog it returns, we can move this IO
> > callback setting to xlog_cil_set_ctx_write_state(), record the
> > commit iclog in the context and remove the need for the commit iclog
> > to be returned by xlog_write() altogether.
> >
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> > fs/xfs/xfs_log.c | 8 ++----
> > fs/xfs/xfs_log_cil.c | 65 +++++++++++++++++++++++++------------------
> > fs/xfs/xfs_log_priv.h | 3 +-
> > 3 files changed, 42 insertions(+), 34 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index 1c214b395223..359246d54db7 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -871,7 +871,7 @@ xlog_write_unmount_record(
> > */
> > if (log->l_targ != log->l_mp->m_ddev_targp)
> > blkdev_issue_flush(log->l_targ->bt_bdev);
> > - return xlog_write(log, NULL, &lv_chain, ticket, NULL, reg.i_len);
> > + return xlog_write(log, NULL, &lv_chain, ticket, reg.i_len);
> > }
> >
> > /*
> > @@ -2386,7 +2386,6 @@ xlog_write(
> > struct xfs_cil_ctx *ctx,
> > struct list_head *lv_chain,
> > struct xlog_ticket *ticket,
> > - struct xlog_in_core **commit_iclog,
> > uint32_t len)
> > {
> > struct xlog_in_core *iclog = NULL;
> > @@ -2436,10 +2435,7 @@ xlog_write(
> > */
> > spin_lock(&log->l_icloglock);
> > xlog_state_finish_copy(log, iclog, record_cnt, 0);
> > - if (commit_iclog)
> > - *commit_iclog = iclog;
> > - else
> > - error = xlog_state_release_iclog(log, iclog, ticket);
> > + error = xlog_state_release_iclog(log, iclog, ticket);
> > spin_unlock(&log->l_icloglock);
> >
> > return error;
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index 2d8d904ffb78..87e30917ce2e 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -799,11 +799,34 @@ xlog_cil_set_ctx_write_state(
> >
> > ASSERT(!ctx->commit_lsn);
> > spin_lock(&cil->xc_push_lock);
> > - if (!ctx->start_lsn)
> > + if (!ctx->start_lsn) {
> > ctx->start_lsn = lsn;
> > - else
> > - ctx->commit_lsn = lsn;
> > + spin_unlock(&cil->xc_push_lock);
> > + return;
> > + }
> > +
> > + /*
> > + * Take a reference to the iclog for the context so that we still hold
> > + * it when xlog_write is done and has released it. This means the
> > + * context controls when the iclog is released for IO.
> > + */
> > + atomic_inc(&iclog->ic_refcnt);
>
> Where do we drop this refcount?
In xlog_cil_push_work() where we call xlog_state_release_iclog().
> Is this the accounting adjustment that
> we have to make because xlog_write always decrements the iclog refcount
> now?
Yes.
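
The reference handoff can be sketched in userspace as follows. This is a toy model under assumptions: a plain int refcount, a submitted flag standing in for the iclog going to IO, and hypothetical toy_* helpers. The writer always drops its own reference, so the context takes an extra one to keep the commit iclog pinned until the push work releases it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_iclog {
	int	refcnt;
	bool	submitted;	/* set when the last reference is dropped */
};

struct toy_ctx {
	struct toy_iclog	*commit_iclog;
};

static void toy_iclog_get(struct toy_iclog *iclog)
{
	iclog->refcnt++;
}

/* Dropping the final reference releases the buffer for IO. */
static void toy_iclog_put(struct toy_iclog *iclog)
{
	assert(iclog->refcnt > 0);
	if (--iclog->refcnt == 0)
		iclog->submitted = true;
}

/*
 * Model of the write path: the writer holds a reference while copying,
 * the context takes its own reference for the commit record, and the
 * writer then unconditionally drops the one it took.
 */
static void toy_write_commit_record(struct toy_ctx *ctx,
				    struct toy_iclog *iclog)
{
	toy_iclog_get(iclog);		/* writer's reference */
	toy_iclog_get(iclog);		/* context's reference */
	ctx->commit_iclog = iclog;
	toy_iclog_put(iclog);		/* writer is done */
}

/* Model of the push work: the context decides when IO may start. */
static void toy_push_finish(struct toy_ctx *ctx)
{
	toy_iclog_put(ctx->commit_iclog);
	ctx->commit_iclog = NULL;
}
```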
> > + ctx->commit_iclog = iclog;
> > + ctx->commit_lsn = lsn;
> > spin_unlock(&cil->xc_push_lock);
>
> I've noticed how the setting of ctx->commit_lsn has moved to before the
> point where we splice callback lists, only to move them back below in
> the next patch. That has made it harder for me to understand this
> series.
>
> I /think/ the goal of this patch is not really a functional change so
> much as a refactoring to make the cil context track the commit iclog
> directly and then smooth out some of the refcounting code, but the
> shuffling around of these variables makes me wonder if I'm missing some
> other subtlety.
The subtlety is that we can't issue the wakeup on the commit_lsn
until after the callbacks are attached to the commit iclog. When we
set ctx->commit_lsn doesn't really matter - I'm trying to keep the
order of "callbacks attached before we issue the wakeup" so that
when the waiter is woken and then adds its callbacks to the same
iclog they will be appended to the list after the first commit
record's callbacks and hence they get processed in the correct order
when journal IO completion runs the callbacks on that iclog.
This patch doesn't move the wakeup from after the xlog_write() call
completes, so the ordering of setting
ctx->commit_lsn and attaching the callbacks inside xlog_write()
doesn't really matter. In the next patch, the wakeups move inside
xlog_write()->xlog_cil_set_ctx_write_state(), and so now it has to
ensure that the ordering is correct.
I'll rework the patches so that this one sets up the order the next
patch requires rather than minimal change in this patch and reorder
in the next patch...
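
To make the ordering constraint concrete, here is a single-threaded toy model, assuming two checkpoints whose commit records land in the same iclog and a callback list that is processed in append order. The attach_before_wakeup flag is a hypothetical knob that picks between the intended ordering and the worst-case interleaving where the woken waiter attaches first.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_CALLBACKS	8

/* Callbacks run in the order they were appended to the iclog. */
struct toy_iclog {
	int	cb_seq[TOY_MAX_CALLBACKS];
	int	nr_cbs;
};

static void toy_attach_callback(struct toy_iclog *iclog, int seq)
{
	assert(iclog->nr_cbs < TOY_MAX_CALLBACKS);
	iclog->cb_seq[iclog->nr_cbs++] = seq;
}

/*
 * Checkpoint 'seq' writes its commit record and wakes checkpoint
 * 'seq + 1', which then attaches to the same iclog. If the wakeup is
 * issued before the first checkpoint attaches, the waiter can get in
 * first.
 */
static void toy_commit_two(struct toy_iclog *iclog, int seq,
			   bool attach_before_wakeup)
{
	if (attach_before_wakeup) {
		toy_attach_callback(iclog, seq);
		toy_attach_callback(iclog, seq + 1);	/* woken waiter */
	} else {
		toy_attach_callback(iclog, seq + 1);	/* waiter ran first */
		toy_attach_callback(iclog, seq);
	}
}

/* True if IO completion would run callbacks in checkpoint order. */
static bool toy_callbacks_ordered(const struct toy_iclog *iclog)
{
	for (int i = 1; i < iclog->nr_cbs; i++) {
		if (iclog->cb_seq[i - 1] > iclog->cb_seq[i])
			return false;
	}
	return true;
}
```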
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 8/8] xfs: order CIL checkpoint start records
2021-06-17 21:31 ` Darrick J. Wong
@ 2021-06-17 22:49 ` Dave Chinner
0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 22:49 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 02:31:43PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 06:26:17PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Because log recovery depends on strictly ordered start records as
> > well as strictly ordered commit records.
> >
> > This is a zero day bug in the way XFS writes pipelined transactions
> > to the journal which is exposed by commit facd77e4e38b ("xfs: CIL
> > work is serialised, not pipelined") which re-introduces explicit
> > concurrent commits back into the on-disk journal.
> >
> > The XFS journal commit code has never ordered start records and we
> > have relied on strict commit record ordering for correct recovery
> > ordering of concurrently written transactions. Unfortunately, root
> > cause analysis uncovered the fact that log recovery uses the LSN of
> > the start record for transaction commit processing. Hence the
> > commits are processed in strict orderi by recovery, but the LSNs
>
> s/orderi/order/ ?
>
> > associated with the commits can be out of order and so recovery may
> > stamp incorrect LSNs into objects and/or misorder intents in the AIL
> > for later processing. This can result in log recovery failures
> > and/or on disk corruption, sometimes silent.
> >
> > Because this is a long standing log recovery issue, we can't just
> > fix log recovery and call it good.
>
> Could there be production filesystems out there that have this
> mismatched ordering of start lsn and commit lsn? This still leaves the
> mystery of crashed customer filesystems containing btree blocks where
> 128 bytes in the middle clearly contain contents that are don't match or
> duplicate the rest of the block, as though someone forgot to replay a
> buffer vector or something.
Modulo bugs in delayed logging, I doubt there's any delayed logging
filesystems out there that have the problem. Older, non-delayed
logging filesystems are almost certain to see it, but they have much
smaller transactions and only EFIs to deal with so the corruption
risk is much, much, much lower.
> What would a fix to log recovery entail? Not skipping recovered items
> if the start/commit sequencing is not the same? Or am I not
> understanding the problem correctly?
I've been going back and forth on this trying to come up with a sane
solution, but I haven't come up with anything practical.
We could use the commit record LSN for recovery, but we write start
record LSNs into on-disk metadata when we flush it to disk and that
forces checkpoints that need recovery to use the same LSN in the
metadata it recovers and writes back as we use for runtime
writeback. Hence we then get problems with recovered filesystems not
having the same on-disk state as they would if the metadata was
written back from in-memory. i.e. two pieces of metadata in the same
atomic transaction could have different LSNs stamped in them
depending on whether they were written back at runtime or recovered
by log recovery at mount time...
And then my head explodes trying to work out what happens when we
have overlapping checkpoints and partial metadata writeback and
different LSN values for recovery vs writeback and recovery retries
after a failed recovery and <BOOM>
However, given that there are runtime integrity issues with out of
order start LSNs (log head can overwrite the log tail - I can give
more detail if you want), the only way out of this I can see is to
ensure that the start records are properly ordered at runtime to
avoid all the potential runtime issues that exist. This also has
the nice "side effect" of avoiding the log recovery LSN ordering
problem.
IOWs, I'm not looking at this as a log recovery bug that needs fixing.
Yes, there is a log recovery issue there (and has been forever), but
the more I think on this, the more I'm concerned about the potential
runtime impacts on data integrity correctness and potential
head-tail journal overwrite corruption.
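
The runtime ordering rule can be sketched as a predicate over the committing contexts. This is a simplified model, not the patch itself: the toy_* names are hypothetical, and the real code sleeps on a wait queue and retries rather than returning false. A checkpoint may write a given record only once every older checkpoint has recorded the LSN of its own record of that type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_record_type {
	TOY_START_RECORD,
	TOY_COMMIT_RECORD,
};

/* Toy checkpoint context: an LSN of zero means "not written yet". */
struct toy_ctx {
	uint64_t	sequence;
	int64_t		start_lsn;
	int64_t		commit_lsn;
};

/*
 * Return true if checkpoint 'sequence' may write 'record' now, i.e. all
 * older checkpoints in the committing set have already written theirs.
 */
static bool toy_may_write_record(const struct toy_ctx *committing, size_t nr,
				 uint64_t sequence,
				 enum toy_record_type record)
{
	for (size_t i = 0; i < nr; i++) {
		const struct toy_ctx	*ctx = &committing[i];
		int64_t			lsn;

		/* Only older checkpoints constrain us. */
		if (ctx->sequence >= sequence)
			continue;

		lsn = (record == TOY_START_RECORD) ? ctx->start_lsn
						   : ctx->commit_lsn;
		if (!lsn)
			return false;	/* the real code would wait here */
	}
	return true;
}
```

Applying the same rule to both record types is what keeps start LSNs and commit LSNs in the same relative order across checkpoints.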
> > + ctx->commit_lsn = lsn;
> > + wake_up_all(&cil->xc_commit_wait);
> > + spin_unlock(&cil->xc_push_lock);
> > }
> >
> > /*
> > @@ -834,10 +849,16 @@ xlog_cil_set_ctx_write_state(
> > * relies on the context LSN being zero until the log write has guaranteed the
> > * LSN that the log write will start at via xlog_state_get_iclog_space().
> > */
> > +enum {
> > + _START_RECORD,
> > + _COMMIT_RECORD,
> > +};
>
> Stupid nit: If this enum had a name you could skip the default clause
> below because the compiler would typecheck the usage for you.
OK.
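
For reference, a small sketch of the named-enum suggestion, with hypothetical names. When the switch operand has an enum type and there is no default label, GCC and Clang can warn via -Wswitch (part of -Wall) if an enumerator is missing a case, so a later addition to the enum cannot be silently ignored.

```c
#include <assert.h>

/* Hypothetical named version of the record-type enum. */
enum toy_cil_record {
	TOY_REC_START,
	TOY_REC_COMMIT,
};

/*
 * No default label: if a third enumerator is added and not handled
 * here, -Wswitch flags the switch at compile time.
 */
static const char *toy_record_name(enum toy_cil_record record)
{
	switch (record) {
	case TOY_REC_START:
		return "start";
	case TOY_REC_COMMIT:
		return "commit";
	}

	/* Only reachable if the caller passes a value outside the enum. */
	assert(0);
	return "unknown";
}
```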
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
2021-06-17 20:26 ` Darrick J. Wong
@ 2021-06-17 23:31 ` Brian Foster
0 siblings, 0 replies; 50+ messages in thread
From: Brian Foster @ 2021-06-17 23:31 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs
On Thu, Jun 17, 2021 at 01:26:42PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 04:06:24PM -0400, Brian Foster wrote:
> > On Thu, Jun 17, 2021 at 12:05:19PM -0700, Darrick J. Wong wrote:
> > > On Thu, Jun 17, 2021 at 02:32:30PM -0400, Brian Foster wrote:
> > > > On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > > > > Hi folks,
> > > > >
> > > > > This is followup from the first set of log fixes for for-next that
> > > > > were posted here:
> > > > >
> > > > > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > > > >
> > > > > The first two patches of this series are updates for those patches,
> > > > > change log below. The rest is the fix for the bigger issue we
> > > > > uncovered in investigating the generic/019 failures, being that
> > > > > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > > > > to checkpoints.
> > > > >
> > > > > The "simple" fix of using the same ordering code as the commit
> > > > > record for the start records in the CIL push turned into a lot of
> > > > > patches once I started cleaning it up, separating out all the
> > > > > different bits and finally realising all the things I needed to
> > > > > change to avoid unintentional logic/behavioural changes. Hence
> > > > > there's some code movement, some factoring, API changes to
> > > > > xlog_write(), changing where we attach callbacks to commit iclogs so
> > > > > they remain correctly ordered if there are multiple commit records
> > > > > in the one iclog and then, finally, strictly ordering the start
> > > > > records....
> > > > >
> > > > > The original "simple fix" I tested last night ran almost a thousand
> > > > > cycles of generic/019 without a log hang or recovery failure of any
> > > > > kind. The refactored patchset has run a couple hundred cycles of
> > > > > g/019 and g/475 over the last few hours without a failure, so I'm
> > > > > posting this so we can get a review iteration done while I sleep so
> > > > > we can - hopefully - get this sorted out before the end of the week.
> > > > >
> > > >
> > > > My first spin of this included generic/019 and generic/475, ran for 18
> > > > or so iterations and 475 exploded with a stream of asserts followed by a
> > > > NULL pointer crash:
> > > >
> > > > # grep -e Assertion -e BUG dmesg.out
> > > > ...
> > > > [ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > > > [ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
> > > >
> > > > I don't know if this is a regression, but I've not seen it before. I've
> > > > attempted to spin generic/475 since then to see if it reproduces again,
> > > > but so far I'm only running into some of the preexisting issues
> > > > associated with that test.
> > >
> > > By any chance, do the two log recovery fixes I sent yesterday make those
> > > problems go away?
> > >
> >
> > Hadn't got to those ones yet...
>
> <nod>
>
> > > > I'll let it go a while more and probably
> > > > switch it back to running both sometime before the end of the day for an
> > > > overnight test.
> > >
> > > Also, do the CIL livelocks go away if you apply only patches 1-2?
> > >
> >
> > It's kind of hard to discern the effect of individual fixes when
> > multiple corruptions are at play. :/ I suppose I could switch up my
> > planned overnight test to include the aforementioned 2 recovery fixes
> > and 1-2 from this series, if that is preferable..?
>
> I dunno about overnight, but at least ~20 or so iterations?
>
> > I suspect that would
> > leave around the originally reported generic/019 corruption presumably
> > caused by the start LSN ordering issue, but we could see if the deadlock
> > is addressed and whether 475 survives any longer.
>
> Might be a useful data point to figure out if these pieces are separate
> or if they really do belong in an 8 patch series, since I think ~20 or
> so iterations shouldn't take too long (though I guess it is nearly 16:30
> your time, isn't it...) Well, do whatever you think is best use of
> machine time.
>
With the above combination of the first two patches in this series and
your two separate patches, I see no occurrence of a hang in ~50 iters of
generic/019 and do hit the preexisting generic/475 corruption in ~20
iters.
Brian
> --D
>
> >
> > Brian
> >
> > > > A full copy of the assert and NULL pointer BUG splat is included below
> > > > for reference. It looks like the fault BUG splat ended up interspersed
> > > > or otherwise mangled, but I suspect that one is just fallout from the
> > > > immediately previous crash.
> > >
> > > I have a question about the composition of this 8-patch series --
> > > which patches fix the new cil code, and which ones fix the out of order
> > > recovery problems? I suspect that patches 1-2 are for the new CIL code,
> > > and 3-8 are to fix the recovery problems.
> > >
> > > Thinking with my distro kernel not-maintainer hat on, I'm considering
> > > how to backport whatever fixes emerge for the recovery ordering issue
> > > into existing kernels. The way I see things right now, the CIL changes
> > > (+ fixes) and the ordering bug fixes are separate issues. The log
> > > ordering problems should get fixed as soon as we have a practical
> > > solution; the CIL changes could get deferred if need be since it's a
> > > medium-high risk; and the real question is how to sequence all this?
> > >
> > > (Or to put it another way: I'm still stuck going "oh wowwww this is a
> > > lot more change" while trying to understand patch 4)
> > >
> > > --D
> > >
> > > >
> > > > Brian
> > > >
> > > > --- 8< ---
> > > >
> > > > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7953.037737] ------------[ cut here ]------------
> > > > [ 7953.042358] WARNING: CPU: 0 PID: 131627 at fs/xfs/xfs_message.c:112 assfail+0x25/0x28 [xfs]
> > > > [ 7953.050782] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
> > > > [ 7953.129548] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G W I 5.13.0-rc4+ #70
> > > > [ 7953.138243] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> > > > [ 7953.145818] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
> > > > [ 7953.151554] RIP: 0010:assfail+0x25/0x28 [xfs]
> > > > [ 7953.155991] Code: ff ff 0f 0b c3 0f 1f 44 00 00 41 89 c8 48 89 d1 48 89 f2 48 c7 c6 d8 eb c3 c0 e8 cf fa ff ff 80 3d f1 d4 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24 18
> > > > [ 7953.174745] RSP: 0018:ffffa57ccf99fa50 EFLAGS: 00010246
> > > > [ 7953.179982] RAX: 00000000ffffffea RBX: 0000000500003977 RCX: 0000000000000000
> > > > [ 7953.187121] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
> > > > [ 7953.194264] RBP: ffff91f685725040 R08: 0000000000000000 R09: 000000000000000a
> > > > [ 7953.201405] R10: 000000000000000a R11: f000000000000000 R12: ffff91f685725040
> > > > [ 7953.208546] R13: 0000000000000000 R14: ffff91f66abed140 R15: ffff91c76dfccb40
> > > > [ 7953.215686] FS: 0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
> > > > [ 7953.223781] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 7953.229533] CR2: 00000000020e2108 CR3: 0000003d02826003 CR4: 00000000007706f0
> > > > [ 7953.236667] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [ 7953.243809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > [ 7953.250949] PKRU: 55555554
> > > > [ 7953.253669] Call Trace:
> > > > [ 7953.256123] xfs_bui_release+0x4b/0x50 [xfs]
> > > > [ 7953.260466] xfs_trans_committed_bulk+0x158/0x2c0 [xfs]
> > > > [ 7953.265762] ? lock_release+0x1cd/0x2a0
> > > > [ 7953.269610] ? _raw_spin_unlock+0x1f/0x30
> > > > [ 7953.273630] ? xlog_write+0x1e2/0x630 [xfs]
> > > > [ 7953.277886] ? lock_acquire+0x15d/0x380
> > > > [ 7953.281732] ? lock_acquire+0x15d/0x380
> > > > [ 7953.285582] ? lock_release+0x1cd/0x2a0
> > > > [ 7953.289428] ? trace_hardirqs_on+0x1b/0xd0
> > > > [ 7953.293536] ? _raw_spin_unlock_irqrestore+0x37/0x40
> > > > [ 7953.298511] ? __wake_up_common_lock+0x7a/0x90
> > > > [ 7953.302966] ? lock_release+0x1cd/0x2a0
> > > > [ 7953.306813] xlog_cil_committed+0x34f/0x390 [xfs]
> > > > [ 7953.311593] ? xlog_cil_push_work+0x715/0x8d0 [xfs]
> > > > [ 7953.316547] xlog_cil_push_work+0x740/0x8d0 [xfs]
> > > > [ 7953.321321] ? _raw_spin_unlock_irq+0x24/0x40
> > > > [ 7953.325689] ? finish_task_switch.isra.0+0xa0/0x2c0
> > > > [ 7953.330580] ? kmem_cache_free+0x247/0x5c0
> > > > [ 7953.334685] ? fsnotify_final_mark_destroy+0x1c/0x30
> > > > [ 7953.339658] ? lock_acquire+0x15d/0x380
> > > > [ 7953.343505] ? lock_acquire+0x15d/0x380
> > > > [ 7953.347353] ? lock_release+0x1cd/0x2a0
> > > > [ 7953.351203] process_one_work+0x26e/0x560
> > > > [ 7953.355225] worker_thread+0x52/0x3b0
> > > > [ 7953.358898] ? process_one_work+0x560/0x560
> > > > [ 7953.363094] kthread+0x12c/0x150
> > > > [ 7953.366335] ? __kthread_bind_mask+0x60/0x60
> > > > [ 7953.370617] ret_from_fork+0x22/0x30
> > > > [ 7953.374206] irq event stamp: 0
> > > > [ 7953.377268] hardirqs last enabled at (0): [<0000000000000000>] 0x0
> > > > [ 7953.383544] hardirqs last disabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
> > > > [ 7953.391724] softirqs last enabled at (0): [<ffffffffb50da3f4>] copy_process+0x754/0x1d00
> > > > [ 7953.399907] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > > > [ 7953.406179] ---[ end trace f04c960f66265f3a ]---
> > > > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > > > [ 7953.417760] #PF: supervisor read access in kernel mode
> > > > [ 7953.422900] #PF: error_code(0x0000) - not-present page
> > > > [ 7953.428038] PGD 0 P4D 0
> > > > [ 7953.430579] Oops: 0000 [#1] SMP PTI
> > > > [ 7953.434070] CPU: 0 PID: 131627 Comm: kworker/u161:5 Tainted: G W I 5.13.0-rc4+ #70
> > > > [ 7953.442764] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> > > > [ 7953.450330] Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
> > > > [ 7953.456058] RIP: 0010:xfs_trans_committed_bulk+0xcc/0x2c0 [xfs]
> > > > [ 7953.462036] Code: 41 83 c5 01 48 89 54 c4 50 41 83 fd 1f 0f 8f 11 01 00 00 4d 8b 36 4c 3b 34 24 74 28 4d 8b 66 20 40 84 ed 75 54 49 8b 44 24 60 <f6> 00 01 74 91 48 8b 40 38 4c 89 e7 e8 63 6b 42 f5 4d 8b 36 4c 3b
> > > > [ 7953.480783] RSP: 0018:ffffa57ccf99fa68 EFLAGS: 00010202
> > > > [ 7953.486009] RAX: 000000000000031f RBX: 0000000500003977 RCX: 0000000000000000
> > > > [ 7953.493141] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc0c300e2
> > > > [ 7953.500274] RBP: 0000000000000001 R08: 0000000000000000 R09: 000000000000000a
> > > > [ 7953.507404] R10: 000000000000000a R11: f000000000000000 R12: ffff91c759fedb20
> > > > [ 7953.514536] R13: 0000000000000000 R14: ffff91c759fedb00 R15: ffff91c76dfccb40
> > > > [ 7953.521671] FS: 0000000000000000(0000) GS:ffff91f580800000(0000) knlGS:0000000000000000
> > > > [ 7953.529757] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 7953.535501] CR2: 000000000000031f CR3: 0000003d02826003 CR4: 00000000007706f0
> > > > [ 7953.542633] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [ 7953.549768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > [ 7953.556899] PKRU: 55555554
> > > > [ 7953.559612] Call Trace:
> > > > [ 7953.562064] ? lock_release+0x1cd/0x2a0
> > > > [ 7953.565902] ? _raw_spin_unlock+0x1f/0x30
> > > > [ 7953.569917] ? xlog_write+0x1e2/0x630 [xfs]
> > > > [ 7953.574162] ? lock_acquire+0x15d/0x380
> > > > [ 7953.578000] ? lock_acquire+0x15d/0x380
> > > > [ 7953.581841] ? lock_release+0x1cd/0x2a0
> > > > [ 7953.585680] ? trace_hardirqs_on+0x1b/0xd0
> > > > [ 7953.589780] ? _raw_spin_unlock_irqrestore+0x37/0x40
> > > > [ 7953.594744] ? __wake_up_common_lock+0x7a/0x90
> > > > [ 7953.599192] ? lock_release+0x1cd/0x2a0
> > > > [ 7953.603031] xlog_cil_committed+0x34f/0x390 [xfs]
> > > > [ 7953.607798] ? xlog_cil_push_work+0x715/0x8d0 [xfs]
> > > > [ 7953.612738] xlog_cil_push_work+0x740/0x8d0 [xfs]
> > > > [ 7953.617504] ? _raw_spin_unlock_irq+0x24/0x40
> > > > [ 7953.621862] ? finish_task_switch.isra.0+0xa0/0x2c0
> > > > [ 7953.626745] ? kmem_cache_free+0x247/0x5c0
> > > > [ 7953.630839] ? fsnotify_final_mark_destroy+0x1c/0x30
> > > > [ 7953.635806] ? lock_acquire+0x15d/0x380
> > > > [ 7953.639646] ? lock_acquire+0x15d/0x380
> > > > [ 7953.643484] ? lock_release+0x1cd/0x2a0
> > > > [ 7953.647323] process_one_work+0x26e/0x560
> > > > [ 7953.651337] worker_thread+0x52/0x3b0
> > > > [ 7953.655003] ? process_one_work+0x560/0x560
> > > > [ 7953.659188] kthread+0x12c/0x150
> > > > [ 7953.662421] ? __kthread_bind_mask+0x60/0x60
> > > > [ 7953.666694] ret_from_fork+0x22/0x30
> > > > [ 7953.670273] Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib intel_rapl_msr intel_rapl_common isst_if_common ib_uverbs ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_core kvm ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass iTCO_vendor_support rapl acpi_ipmi intel_cstate psample intel_uncore mei_me wmi_bmof ipmi_si pcspkr mlxfw i2c_i801 tg3 pci_hyperv_intf mei lpc_ich intel_pch_thermal ipmi_devintf i2c_smbus ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec nvme_fc crct10dif_pclmul crc32_pclmul nvme_fabrics drm crc32c_intel nvme_core ghash_clmulni_intel scsi_transport_fc megaraid_sas i2c_algo_bit wmi
> > > > [ 7953.749025] CR2: 000000000000031f
> > > > [ 7953.752345] ---[ end trace f04c960f66265f3b ]---
> > > >
> > >
> >
>
* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
2021-06-17 19:05 ` Darrick J. Wong
2021-06-17 20:06 ` Brian Foster
@ 2021-06-17 23:43 ` Dave Chinner
2021-06-18 13:08 ` Brian Foster
1 sibling, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-17 23:43 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Brian Foster, linux-xfs
On Thu, Jun 17, 2021 at 12:05:19PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 02:32:30PM -0400, Brian Foster wrote:
> > On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > > Hi folks,
> > >
> > > This is followup from the first set of log fixes for for-next that
> > > were posted here:
> > >
> > > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > >
> > > The first two patches of this series are updates for those patches,
> > > change log below. The rest is the fix for the bigger issue we
> > > uncovered in investigating the generic/019 failures, being that
> > > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > > to checkpoints.
> > >
> > > The "simple" fix of using the same ordering code as the commit
> > > record for the start records in the CIL push turned into a lot of
> > > patches once I started cleaning it up, separating out all the
> > > different bits and finally realising all the things I needed to
> > > change to avoid unintentional logic/behavioural changes. Hence
> > > there's some code movement, some factoring, API changes to
> > > xlog_write(), changing where we attach callbacks to commit iclogs so
> > > they remain correctly ordered if there are multiple commit records
> > > in the one iclog and then, finally, strictly ordering the start
> > > records....
> > >
> > > The original "simple fix" I tested last night ran almost a thousand
> > > cycles of generic/019 without a log hang or recovery failure of any
> > > kind. The refactored patchset has run a couple hundred cycles of
> > > g/019 and g/475 over the last few hours without a failure, so I'm
> > > posting this so we can get a review iteration done while I sleep so
> > > we can - hopefully - get this sorted out before the end of the week.
> > >
> >
> > My first spin of this included generic/019 and generic/475, ran for 18
> > or so iterations and 475 exploded with a stream of asserts followed by a
> > NULL pointer crash:
> >
> > # grep -e Assertion -e BUG dmesg.out
> > ...
> > [ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > [ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
> >
> > I don't know if this is a regression, but I've not seen it before. I've
> > attempted to spin generic/475 since then to see if it reproduces again,
> > but so far I'm only running into some of the preexisting issues
> > associated with that test.
I've not seen anything like that. I can't see how the changes in the
patchset would affect BUI reference counting in any way. That seems
more like an underlying intent item shutdown reference count issue
to me (and we've had a *lot* of them in the past)....
> By any chance, do the two log recovery fixes I sent yesterday make those
> problems go away?
>
> > I'll let it go a while more and probably
> > switch it back to running both sometime before the end of the day for an
> > overnight test.
>
> Also, do the CIL livelocks go away if you apply only patches 1-2?
>
> > A full copy of the assert and NULL pointer BUG splat is included below
> > for reference. It looks like the fault BUG splat ended up interspersed
> > or otherwise mangled, but I suspect that one is just fallout from the
> > immediately previous crash.
>
> I have a question about the composition of this 8-patch series --
> which patches fix the new cil code, and which ones fix the out of order
> recovery problems? I suspect that patches 1-2 are for the new CIL code,
> and 3-8 are to fix the recovery problems.
Yes. But don't think of 3-8 as fixing recovery problems - they are
fixing potential runtime data integrity issues (log force lsns for
fsync are based on start LSNs) and journal head->tail overwrite
issues (because AIL ordering is start LSN based).
So, basically, we get the recovery fixes for free when we fix the
runtime start LSN ordering issues...
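To make the ordering property concrete, here is a minimal userspace sketch, not kernel code. It assumes an LSN packs the log cycle into the high 32 bits and the block number into the low 32 bits, and it uses a hypothetical `struct ckpt` as a stand-in for a checkpoint. The check passes only if start LSNs never go backwards as the checkpoint sequence increases, which is the invariant the start-record ordering is meant to provide. Equal LSNs are accepted on the assumption that two start records can land in the same iclog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CIL checkpoint; not the kernel's structure. */
struct ckpt {
	uint64_t seq;        /* checkpoint sequence number */
	uint64_t start_lsn;  /* LSN of the iclog holding the start record */
};

/* Pack cycle and block into a 64-bit LSN: cycle in the high 32 bits. */
static uint64_t make_lsn(uint32_t cycle, uint32_t block)
{
	return ((uint64_t)cycle << 32) | block;
}

/*
 * Return true if start LSNs never go backwards as the sequence increases.
 * The array must already be sorted by ascending seq.
 */
static bool start_lsns_ordered(const struct ckpt *c, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		assert(c[i - 1].seq < c[i].seq);
		if (c[i].start_lsn < c[i - 1].start_lsn)
			return false;
	}
	return true;
}
```

If a later checkpoint reported an earlier start LSN, anything keyed on start LSNs (the fsync log force target and AIL ordering mentioned above) would see the two checkpoints in the wrong order.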
> Thinking with my distro kernel not-maintainer hat on, I'm considering
> how to backport whatever fixes emerge for the recovery ordering issue
> into existing kernels. The way I see things right now, the CIL changes
> (+ fixes) and the ordering bug fixes are separate issues. The log
> ordering problems should get fixed as soon as we have a practical
> solution; the CIL changes could get deferred if need be since it's a
> medium-high risk; and the real question is how to sequence all this?
The CIL changes in patches 1-2 are low risk - that's just a hang
because of a logic error and we fix that sort of thing all the time.
> (Or to put it another way: I'm still stuck going "oh wowwww this is a
> lot more change" while trying to understand patch 4)
It's not unreasonable given the amount of change that was made in
the first place. Really, though, once you take the tracing and code
movement out of it, the actual logic change is much, much smaller...
/me wonders if anyone remembers that I said up front that I
considered the changes to the log code completely unreviewable and
that there would be bugs that slip through both my testing and
review?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
2021-06-17 23:43 ` Dave Chinner
@ 2021-06-18 13:08 ` Brian Foster
2021-06-18 13:55 ` Christoph Hellwig
2021-06-18 22:15 ` Dave Chinner
0 siblings, 2 replies; 50+ messages in thread
From: Brian Foster @ 2021-06-18 13:08 UTC (permalink / raw)
To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs
On Fri, Jun 18, 2021 at 09:43:08AM +1000, Dave Chinner wrote:
> On Thu, Jun 17, 2021 at 12:05:19PM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 17, 2021 at 02:32:30PM -0400, Brian Foster wrote:
> > > On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > > > Hi folks,
> > > >
> > > > This is followup from the first set of log fixes for for-next that
> > > > were posted here:
> > > >
> > > > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > > >
> > > > The first two patches of this series are updates for those patches,
> > > > change log below. The rest is the fix for the bigger issue we
> > > > uncovered in investigating the generic/019 failures, being that
> > > > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > > > to checkpoints.
> > > >
> > > > The "simple" fix of using the same ordering code as the commit
> > > > record for the start records in the CIL push turned into a lot of
> > > > patches once I started cleaning it up, separating out all the
> > > > different bits and finally realising all the things I needed to
> > > > change to avoid unintentional logic/behavioural changes. Hence
> > > > there's some code movement, some factoring, API changes to
> > > > xlog_write(), changing where we attach callbacks to commit iclogs so
> > > > they remain correctly ordered if there are multiple commit records
> > > > in the one iclog and then, finally, strictly ordering the start
> > > > records....
> > > >
> > > > The original "simple fix" I tested last night ran almost a thousand
> > > > cycles of generic/019 without a log hang or recovery failure of any
> > > > kind. The refactored patchset has run a couple hundred cycles of
> > > > g/019 and g/475 over the last few hours without a failure, so I'm
> > > > posting this so we can get a review iteration done while I sleep so
> > > > we can - hopefully - get this sorted out before the end of the week.
> > > >
> > >
> > > My first spin of this included generic/019 and generic/475, ran for 18
> > > or so iterations and 475 exploded with a stream of asserts followed by a
> > > NULL pointer crash:
> > >
> > > # grep -e Assertion -e BUG dmesg.out
> > > ...
> > > [ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > > [ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
> > >
> > > I don't know if this is a regression, but I've not seen it before. I've
> > > attempted to spin generic/475 since then to see if it reproduces again,
> > > but so far I'm only running into some of the preexisting issues
> > > associated with that test.
>
> I've not seen anything like that. I can't see how the changes in the
> patchset would affect BUI reference counting in any way. That seems
> more like an underlying intent item shutdown reference count issue
> to me (and we've had a *lot* of them in the past)....
>
I've not made sense of it either, but at the same time, I've not seen it
in all my testing thus far up until targeting this series, and now I've
seen it twice in as many test runs as my overnight run fell into some
kind of similar haywire state. Unfortunately it seemed to be
spinning/streaming assert output so I lost any record of the initial
crash signature. It wouldn't surprise me if the fundamental problem is
some older bug in another area of code, but it's hard to believe it's
not at least related to this series somehow.
Also FYI, earlier iterations of generic/475 triggered a couple instances
of the following assert failure before things broke down more severely:
XFS: Assertion failed: *log_offset + *len <= iclog->ic_size || iclog->ic_state == XLOG_STATE_WANT_SYNC, file: fs/xfs/xfs_log.c, line: 2115
...
------------[ cut here ]------------
WARNING: CPU: 45 PID: 951355 at fs/xfs/xfs_message.c:112 assfail+0x25/0x28 [xfs]
Modules linked in: rfkill dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_umad ib_ipoib rdma_cm iw_cm ib_cm intel_rapl_msr mlx5_ib intel_rapl_common ib_uverbs isst_if_common ib_core skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp mlx5_core kvm_intel kvm ipmi_ssif irqbypass iTCO_wdt intel_pmc_bxt rapl psample intel_cstate iTCO_vendor_support acpi_ipmi mlxfw intel_uncore pci_hyperv_intf pcspkr wmi_bmof tg3 mei_me ipmi_si i2c_i801 mei ipmi_devintf i2c_smbus lpc_ich intel_pch_thermal ipmi_msghandler acpi_power_meter fuse zram ip_tables xfs lpfc mgag200 drm_kms_helper nvmet_fc nvmet cec crct10dif_pclmul nvme_fc crc32_pclmul drm nvme_fabrics crc32c_intel nvme_core ghash_clmulni_intel megaraid_sas scsi_transport_fc i2c_algo_bit wmi
CPU: 45 PID: 951355 Comm: kworker/u162:5 Tainted: G I 5.13.0-rc4+ #70
Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
Workqueue: xfs-cil/dm-7 xlog_cil_push_work [xfs]
RIP: 0010:assfail+0x25/0x28 [xfs]
Code: ff ff 0f 0b c3 0f 1f 44 00 00 41 89 c8 48 89 d1 48 89 f2 48 c7 c6 d8 db 36 c0 e8 cf fa ff ff 80 3d f1 d4 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24 18
RSP: 0018:ffffa59c80ce3bb0 EFLAGS: 00010246
RAX: 00000000ffffffea RBX: ffff8b2671dddc00 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffc035f0e2
RBP: 0000000000015d60 R08: 0000000000000000 R09: 000000000000000a
R10: 000000000000000a R11: f000000000000000 R12: ffff8b241716e6c0
R13: 000000000000003c R14: ffff8b241716e6c0 R15: ffff8b24d9d17000
FS: 0000000000000000(0000) GS:ffff8b52ff980000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0d0e270910 CR3: 00000031a2826002 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
xlog_write+0x567/0x630 [xfs]
xlog_cil_push_work+0x5bd/0x8d0 [xfs]
? load_balance+0x179/0xd60
? lock_acquire+0x15d/0x380
? lock_release+0x1cd/0x2a0
? lock_acquire+0x15d/0x380
? lock_release+0x1cd/0x2a0
? finish_task_switch.isra.0+0xa0/0x2c0
process_one_work+0x26e/0x560
worker_thread+0x52/0x3b0
? process_one_work+0x560/0x560
kthread+0x12c/0x150
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x22/0x30
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffffffffa10da3f4>] copy_process+0x754/0x1d00
softirqs last enabled at (0): [<ffffffffa10da3f4>] copy_process+0x754/0x1d00
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace 275cd74c3f62be17 ]---
Brian
* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
2021-06-18 13:08 ` Brian Foster
@ 2021-06-18 13:55 ` Christoph Hellwig
2021-06-18 14:02 ` Christoph Hellwig
2021-06-18 22:28 ` Dave Chinner
2021-06-18 22:15 ` Dave Chinner
1 sibling, 2 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-06-18 13:55 UTC (permalink / raw)
To: Brian Foster; +Cc: Dave Chinner, Darrick J. Wong, linux-xfs
On Fri, Jun 18, 2021 at 09:08:15AM -0400, Brian Foster wrote:
> Also FYI, earlier iterations of generic/475 triggered a couple instances
> of the following assert failure before things broke down more severely:
>
> XFS: Assertion failed: *log_offset + *len <= iclog->ic_size || iclog->ic_state == XLOG_STATE_WANT_SYNC, file: fs/xfs/xfs_log.c, line: 2115
As you mentioned the placement of this exact assert in my cleanups
series: after looking for the right place to move it, I'm really not sure
this assert makes much sense in this form.
xlog_write_single is always entered first by xlog_write, so we also
get here for something that later gets handled by xlog_write_partial.
Which means it could be way bigger than the current iclog, and I see no
reason why that iclog would have to be XLOG_STATE_WANT_SYNC.
* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
2021-06-18 13:55 ` Christoph Hellwig
@ 2021-06-18 14:02 ` Christoph Hellwig
2021-06-18 22:28 ` Dave Chinner
1 sibling, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-06-18 14:02 UTC (permalink / raw)
To: Brian Foster; +Cc: Dave Chinner, Darrick J. Wong, linux-xfs
On Fri, Jun 18, 2021 at 02:55:03PM +0100, Christoph Hellwig wrote:
> xlog_write_single is always entered first by xlog_write, so we also
> get here for something that later gets handled by xlog_write_partial.
> Which means it could be way bigger than the current iclog, and I see no
> reason why that iclog would have to be XLOG_STATE_WANT_SYNC.
Actually I'll take that back. There is a second call to
xlog_state_switch_iclogs which we should hit and thus have moved to
XLOG_STATE_WANT_SYNC.
* Re: [PATCH 1/8] xfs: add iclog state trace events
2021-06-17 8:26 ` [PATCH 1/8] xfs: add iclog state trace events Dave Chinner
2021-06-17 16:45 ` Darrick J. Wong
@ 2021-06-18 14:09 ` Christoph Hellwig
1 sibling, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-06-18 14:09 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 06:26:10PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> For the DEBUGS!
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
(although I wouldn't mind a more useful commit message)
* Re: [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c
2021-06-17 8:26 ` [PATCH 3/8] xfs: move xlog_commit_record to xfs_log_cil.c Dave Chinner
2021-06-17 12:57 ` kernel test robot
2021-06-17 17:50 ` Darrick J. Wong
@ 2021-06-18 14:16 ` Christoph Hellwig
2 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-06-18 14:16 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Thu, Jun 17, 2021 at 06:26:12PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> It is only used by the CIL checkpoints, and is the counterpart to
> start record formatting and writing that is already local to
> xfs_log_cil.c.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
2021-06-17 8:26 ` [PATCH 4/8] xfs: pass a CIL context to xlog_write() Dave Chinner
2021-06-17 14:46 ` kernel test robot
2021-06-17 20:24 ` Darrick J. Wong
@ 2021-06-18 14:23 ` Christoph Hellwig
2 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-06-18 14:23 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
> + /*
> + * If we have a CIL context, record the LSN of the iclog we were just
> + * granted space to start writing into. If the context doesn't have
> + * a start_lsn recorded, then this iclog will contain the start record
> + * for the checkpoint. Otherwise this write contains the commit record
> + * for the checkpoint.
> + */
> + if (ctx) {
> + spin_lock(&ctx->cil->xc_push_lock);
> + if (!ctx->start_lsn)
> + ctx->start_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> + else
> + ctx->commit_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
> + spin_unlock(&ctx->cil->xc_push_lock);
> + }
I have to say that having this cil_ctx specific logic that somehow
reverse engineers what the caller is doing here seems pretty awkward.
To me the logical interface would be to pass a function pointer and
private data except for the performance penalty of indirect calls.
But to make this somewhat bearable I think you should start with the
above block as a helper implemented in xfs_log_cil.c.
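As a rough illustration of that suggestion, the quoted block could be pulled into a small helper along these lines. This is a userspace sketch under stated assumptions: `struct cil_ctx`, `lsn_t` and `cil_ctx_record_lsn()` are hypothetical stand-ins, the byte-swap of the iclog header LSN is assumed to happen in the caller, and the `xc_push_lock` locking from the quoted hunk is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t lsn_t;  /* stand-in for xfs_lsn_t */

/* Hypothetical, cut-down stand-in for the CIL context. */
struct cil_ctx {
	lsn_t start_lsn;   /* 0 until the start record has been written */
	lsn_t commit_lsn;  /* 0 until the commit record has been written */
};

/*
 * Record the LSN of the iclog a checkpoint write was granted space in.
 * The first call for a context records the start record's LSN; any later
 * call records the commit record's LSN. The quoted kernel hunk does this
 * under ctx->cil->xc_push_lock; locking is omitted in this sketch.
 */
static void cil_ctx_record_lsn(struct cil_ctx *ctx, lsn_t iclog_lsn)
{
	assert(ctx != NULL);
	if (!ctx->start_lsn)
		ctx->start_lsn = iclog_lsn;
	else
		ctx->commit_lsn = iclog_lsn;
}
```

With a helper like this, the log write path would only need a single call guarded by `if (ctx)`, and the start-versus-commit decision would live next to the rest of the CIL code.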
* Re: [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work()
2021-06-17 19:59 ` Darrick J. Wong
@ 2021-06-18 14:27 ` Christoph Hellwig
2021-06-18 22:34 ` Dave Chinner
0 siblings, 1 reply; 50+ messages in thread
From: Christoph Hellwig @ 2021-06-18 14:27 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs
On Thu, Jun 17, 2021 at 12:59:04PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 17, 2021 at 06:26:14PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > So we can use it for start record ordering as well as commit record
> > ordering in future.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
>
> This tricked me for a second until I realized that xlog_cil_order_write
> is the chunk of code just prior to the xlog_cil_write_commit_record
> call.
Yeah, moving the caller at the same time as the factoring is a trick
test for every reader. I think this needs to be documented in the
commit log. Or even better moved to a separate patch, but it seems you
get shot for that kind of suggestion on the xfs list these days..
* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
2021-06-18 13:08 ` Brian Foster
2021-06-18 13:55 ` Christoph Hellwig
@ 2021-06-18 22:15 ` Dave Chinner
1 sibling, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-18 22:15 UTC (permalink / raw)
To: Brian Foster; +Cc: Darrick J. Wong, linux-xfs
On Fri, Jun 18, 2021 at 09:08:15AM -0400, Brian Foster wrote:
> On Fri, Jun 18, 2021 at 09:43:08AM +1000, Dave Chinner wrote:
> > On Thu, Jun 17, 2021 at 12:05:19PM -0700, Darrick J. Wong wrote:
> > > On Thu, Jun 17, 2021 at 02:32:30PM -0400, Brian Foster wrote:
> > > > On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > > > > Hi folks,
> > > > >
> > > > > This is followup from the first set of log fixes for for-next that
> > > > > were posted here:
> > > > >
> > > > > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > > > >
> > > > > The first two patches of this series are updates for those patches,
> > > > > change log below. The rest is the fix for the bigger issue we
> > > > > uncovered in investigating the generic/019 failures, being that
> > > > > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > > > > to checkpoints.
> > > > >
> > > > > The "simple" fix of using the same ordering code as the commit
> > > > > record for the start records in the CIL push turned into a lot of
> > > > > patches once I started cleaning it up, separating out all the
> > > > > different bits and finally realising all the things I needed to
> > > > > change to avoid unintentional logic/behavioural changes. Hence
> > > > > there's some code movement, some factoring, API changes to
> > > > > xlog_write(), changing where we attach callbacks to commit iclogs so
> > > > > they remain correctly ordered if there are multiple commit records
> > > > > in the one iclog and then, finally, strictly ordering the start
> > > > > records....
> > > > >
> > > > > The original "simple fix" I tested last night ran almost a thousand
> > > > > cycles of generic/019 without a log hang or recovery failure of any
> > > > > kind. The refactored patchset has run a couple hundred cycles of
> > > > > g/019 and g/475 over the last few hours without a failure, so I'm
> > > > > posting this so we can get a review iteration done while I sleep so
> > > > > we can - hopefully - get this sorted out before the end of the week.
> > > > >
> > > >
> > > > My first spin of this included generic/019 and generic/475, ran for 18
> > > > or so iterations and 475 exploded with a stream of asserts followed by a
> > > > NULL pointer crash:
> > > >
> > > > # grep -e Assertion -e BUG dmesg.out
> > > > ...
> > > > [ 7951.878058] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7952.261251] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7952.644444] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7953.027626] XFS: Assertion failed: atomic_read(&buip->bui_refcount) > 0, file: fs/xfs/xfs_bmap_item.c, line: 57
> > > > [ 7953.410804] BUG: kernel NULL pointer dereference, address: 000000000000031f
> > > > [ 7954.118973] BUG: unable to handle page fault for address: ffffa57ccf99fa98
> > > >
> > > > I don't know if this is a regression, but I've not seen it before. I've
> > > > attempted to spin generic/475 since then to see if it reproduces again,
> > > > but so far I'm only running into some of the preexisting issues
> > > > associated with that test.
> >
> > I've not seen anything like that. I can't see how the changes in the
> > patchset would affect BUI reference counting in any way. That seems
> > more like an underlying intent item shutdown reference count issue
> > to me (and we've had a *lot* of them in the past)....
> >
>
> I've not made sense of it either, but at the same time, I've not seen it
> in all my testing thus far up until targeting this series, and now I've
> seen it twice in as many test runs as my overnight run fell into some
> kind of similar haywire state. Unfortunately it seemed to be
> spinning/streaming assert output so I lost any record of the initial
> crash signature. It wouldn't surprise me if the fundamental problem is
> some older bug in another area of code, but it's hard to believe it's
> not at least related to this series somehow.
>
> Also FYI, earlier iterations of generic/475 triggered a couple instances
> of the following assert failure before things broke down more severely:
>
> XFS: Assertion failed: *log_offset + *len <= iclog->ic_size || iclog->ic_state == XLOG_STATE_WANT_SYNC, file: fs/xfs/xfs_log.c, line: 2115
Yup, that's a bogus state check in the assert. I already have a
patch to fix that - the async shutdown can change the iclog state
to XLOG_STATE_IOERROR at any time, so any iclog state assert outside of
the log->l_icloglock needs also to allow for XLOG_STATE_IOERROR as
a valid state.
This is one of the problems I was alluding to on #xfs when I said:
[18/6/21 14:42] <dchinner> I'm really not liking getting repeatedly
caught out by racing, unreferenced iclog state changes during
shutdown and having to handle them everywhere.
Patch, FYI, below.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
xfs: fix incorrect assert in xlog_write_single
From: Dave Chinner <dchinner@redhat.com>
generic/475 failed with this assert after a log shutdown:
[ 3953.166235] XFS: Assertion failed: *log_offset + *len <= iclog->ic_size || iclog->ic_state == XLOG_STATE_WANT_SYNC, file: fs/xfs/xfs_log.c, line: 2115
The problem is that after the log has shut down, the iclog state is
XLOG_STATE_IOERROR. The shutdown can change the iclog state at any
time while we are writing to it, so we need to add IOERROR to the
valid states here.
Note that we already have similar IOERROR state checks in asserts
in the xlog_write() code for this reason (e.g. in
xlog_write_get_more_iclog_space()) so this is just a case where the
IOERROR state check was missed. The IOERROR state will be processed
when we release the iclog, so just add the state into the assert and
let the iclog release code handle the error.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 94b6bccb9de9..221c080df305 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2113,7 +2113,8 @@ xlog_write_single(
int index;
ASSERT(*log_offset + *len <= iclog->ic_size ||
- iclog->ic_state == XLOG_STATE_WANT_SYNC);
+ iclog->ic_state == XLOG_STATE_WANT_SYNC ||
+ iclog->ic_state == XLOG_STATE_IOERROR);
ptr = iclog->ic_datap + *log_offset;
for (lv = log_vector;
^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
2021-06-18 13:55 ` Christoph Hellwig
2021-06-18 14:02 ` Christoph Hellwig
@ 2021-06-18 22:28 ` Dave Chinner
1 sibling, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-18 22:28 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Brian Foster, Darrick J. Wong, linux-xfs
On Fri, Jun 18, 2021 at 02:55:03PM +0100, Christoph Hellwig wrote:
> On Fri, Jun 18, 2021 at 09:08:15AM -0400, Brian Foster wrote:
> > Also FYI, earlier iterations of generic/475 triggered a couple instances
> > of the following assert failure before things broke down more severely:
> >
> > XFS: Assertion failed: *log_offset + *len <= iclog->ic_size || iclog->ic_state == XLOG_STATE_WANT_SYNC, file: fs/xfs/xfs_log.c, line: 2115
>
> As you mentioned the placement of this exact assert in my cleanups
> series: after looking at a right place to move it, I'm really not sure
> this assert makes much sense in this form.
It actually makes perfect sense when you look at the iclog state
transitions in xlog_state_get_iclog_space() w.r.t. the length that
is passed to it.
> xlog_write_single is always entered first by xlog_write, so we also
> get here for something that later gets handled by xlog_write_partial.
> Which means it could be way bigger than the current iclog, and I see no
> reason why that iclog would have to be XLOG_STATE_WANT_SYNC.
Yup, completely intentional and if len is larger than can fit in the
iclog we are writing into, the iclog *must* be in
XLOG_STATE_WANT_SYNC.
That is, if the length requested in xlog_state_get_iclog_space()
fits entirely in the iclog that is returned, _get_space() will
increment the offset of the iclog to exclusively reserve that amount
of space for the write we are going to do. It then leaves the state
as ACTIVE so another process can then also reserve some/all of the
remaining unused space in the iclog. Hence here in
xlog_write_single() we will have *log_offset + *len <=
iclog->ic_size and ic_state = ACTIVE as true for a write that fits
entirely in the iclog.
If _get_space() finds that the len is larger than will fit in the
iclog, it will reserve the entire remaining space in the iclog for
the current caller by switching out the iclog and moving the state
to XLOG_STATE_WANT_SYNC. This means no other caller to _get_space()
will be able to reserve space in this iclog because the state is no
longer ACTIVE.
IOWs, if *log_offset + *len > iclog->ic_size, then _get_space()
*must* have set the state of the iclog to _WANT_SYNC so that the
owner of the iclog has exclusive use of the space in the iclog from
*log_offset all the way to the end of the iclog. The overlap beyond
the end of this iclog will be handled by xlog_write_partial(),
and it will release this iclog and get a new one to continue the
write.
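To make the two cases above concrete, here is a toy userspace model of
the reservation rule and of the invariant the assert checks. It is only
a sketch: the toy_* names, the enum and the struct are made up for
illustration and are not the kernel definitions. It also ignores locking
and the fact that the real xlog_state_get_iclog_space() would move on to
another iclog rather than fail when the current one is not ACTIVE.

```c
#include <assert.h>

/* Toy stand-ins for the iclog states; not the kernel definitions. */
enum toy_state { TOY_ACTIVE, TOY_WANT_SYNC, TOY_IOERROR };

struct toy_iclog {
	int		size;	/* usable bytes, plays the role of ic_size */
	int		offset;	/* next unreserved byte */
	enum toy_state	state;
};

/*
 * Reserve space for a write of len bytes. Returns the offset the caller
 * owns from, or -1 if this iclog no longer accepts reservations.
 */
int toy_get_space(struct toy_iclog *ic, int len)
{
	int log_offset;

	if (ic->state != TOY_ACTIVE)
		return -1;

	log_offset = ic->offset;
	if (log_offset + len <= ic->size) {
		/* Fits: reserve exactly len, stay ACTIVE for other writers. */
		ic->offset += len;
	} else {
		/* Does not fit: caller owns the remainder, switch it out. */
		ic->offset = ic->size;
		ic->state = TOY_WANT_SYNC;
	}
	return log_offset;
}

/* The invariant from the assert, with the shutdown state allowed. */
int toy_write_single_ok(const struct toy_iclog *ic, int log_offset, int len)
{
	return log_offset + len <= ic->size ||
	       ic->state == TOY_WANT_SYNC ||
	       ic->state == TOY_IOERROR;
}

/* Walk through both cases; returns 0 if every check holds. */
int toy_reservation_demo(void)
{
	struct toy_iclog ic = { .size = 100, .offset = 0, .state = TOY_ACTIVE };
	int off;

	/* First write fits entirely: iclog stays ACTIVE. */
	off = toy_get_space(&ic, 60);
	assert(off == 0 && ic.state == TOY_ACTIVE);
	assert(toy_write_single_ok(&ic, off, 60));

	/* Second write overruns: writer gets the tail, state is WANT_SYNC. */
	off = toy_get_space(&ic, 60);
	assert(off == 60 && ic.state == TOY_WANT_SYNC);
	assert(toy_write_single_ok(&ic, off, 60));

	/* Nobody else can reserve space in this iclog now. */
	assert(toy_get_space(&ic, 10) == -1);

	/* A shutdown flipping the state must not trip the check either. */
	ic.state = TOY_IOERROR;
	assert(toy_write_single_ok(&ic, off, 60));
	return 0;
}
```

In this model the only way to hold an offset with log_offset + len
beyond the size is to have gone through the branch that sets WANT_SYNC,
which is the reasoning behind the assert. The IOERROR case is the extra
state an asynchronous shutdown can introduce afterwards.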
Long story short, the assert is valid, but asynchronous shutdown
changing ic_state without having references to the iclogs or caring
about how they are being used is turning out to be a massive Charlie
Foxtrot right now...
Cheers,
Dave.
>
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 5/8] xfs: factor out log write ordering from xlog_cil_push_work()
2021-06-18 14:27 ` Christoph Hellwig
@ 2021-06-18 22:34 ` Dave Chinner
0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-18 22:34 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Darrick J. Wong, linux-xfs
On Fri, Jun 18, 2021 at 03:27:49PM +0100, Christoph Hellwig wrote:
> On Thu, Jun 17, 2021 at 12:59:04PM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 17, 2021 at 06:26:14PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > So we can use it for start record ordering as well as commit record
> > > ordering in future.
> > >
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> >
> > This tricked me for a second until I realized that xlog_cil_order_write
> > is the chunk of code just prior to the xlog_cil_write_commit_record
> > call.
>
> Yeah, moving the caller at the same time as the factoring is a trick
> test for every reader. I think this needs to be documented in the
> commit log. Or even better moved to a separate log, but it seems you
> get shot for that kind of suggestion on the xfs list these days..
Sorry, what? This should be a straight factoring - the place we do
the ordering check must not change because that'll break shit.
Ngggh.
Yeah, thanks git. When I rebased the patch, it merged the hunk
into the wrong place. It gets fixed up later when I move the ordering
inside the xlog_cil_write_commit_record() function, but this patch
by itself was silently broken by the tooling.
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
2021-06-17 8:26 [PATCH 0/8 V2] xfs: log fixes for for-next Dave Chinner
` (8 preceding siblings ...)
2021-06-17 18:32 ` [PATCH 0/8 V2] xfs: log fixes for for-next Brian Foster
@ 2021-06-18 22:48 ` Dave Chinner
2021-06-19 20:22 ` Darrick J. Wong
9 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2021-06-18 22:48 UTC (permalink / raw)
To: linux-xfs
On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> Hi folks,
>
> This is followup from the first set of log fixes for for-next that
> were posted here:
>
> https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
>
> The first two patches of this series are updates for those patches,
> change log below. The rest is the fix for the bigger issue we
> uncovered in investigating the generic/019 failures, being that
> we're triggering a zero-day bug in the way log recovery assigns LSNs
> to checkpoints.
>
> The "simple" fix of using the same ordering code as the commit
> record for the start records in the CIL push turned into a lot of
> patches once I started cleaning it up, separating out all the
> different bits and finally realising all the things I needed to
> change to avoid unintentional logic/behavioural changes. Hence
> there's some code movement, some factoring, API changes to
> xlog_write(), changing where we attach callbacks to commit iclogs so
> they remain correctly ordered if there are multiple commit records
> in the one iclog and then, finally, strictly ordering the start
> records....
>
> The original "simple fix" I tested last night ran almost a thousand
> cycles of generic/019 without a log hang or recovery failure of any
> kind. The refactored patchset has run a couple hundred cycles of
> g/019 and g/475 over the last few hours without a failure, so I'm
> posting this so we can get a review iteration done while I sleep so
> we can - hopefully - get this sorted out before the end of the week.
Update on this so people know what's happening.
Yesterday I found another zero-day bug in the CIL code that triggers
when a shutdown occurs.
The shutdown processing runs asynchronously and without caring about
the current state or users of the iclogs. So when it runs
xlog_state_do_callbacks() after changing the state of all iclogs to
XLOG_STATE_IOERROR, it runs the callbacks on all the iclogs and
frees everything associated with them.
That includes the CIL context structure that xlog_cil_push_now() is
still working on because it has a referenced iclog that it hasn't
yet released.
Hence the initial CIL commit that stamps the CIL context with the
commit lsn -after- it has attached the context to the commit_iclog
callback list can race with shutdown. This results in a UAF
situation and an 8 byte memory corruption when we stamp the LSN into
the context.
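The window can be pictured with a small single-threaded model. This is
a sketch only: the toy_* names are invented, the "freed" flag stands in
for the context actually being freed so the example never touches freed
memory, and the shutdown is injected at a fixed point instead of racing
from another thread. It shows the hazard, not the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_ctx {
	uint64_t	commit_lsn;
	bool		freed;		/* stands in for freeing the context */
};

struct toy_iclog {
	struct toy_ctx	*callback_ctx;	/* one-entry "callback list" */
	bool		ioerror;
};

/* Shutdown: mark the iclog failed and run its callbacks straight away. */
void toy_shutdown(struct toy_iclog *ic)
{
	ic->ioerror = true;
	if (ic->callback_ctx) {
		ic->callback_ctx->freed = true;
		ic->callback_ctx = NULL;
	}
}

/*
 * Attach the context to the iclog, optionally let a shutdown run in the
 * window, then stamp the LSN. Returns true if the stamp would have
 * landed in a context that the callbacks had already freed.
 */
bool toy_attach_then_stamp(struct toy_iclog *ic, struct toy_ctx *ctx,
			   uint64_t lsn, bool shutdown_in_window)
{
	ic->callback_ctx = ctx;
	if (shutdown_in_window)
		toy_shutdown(ic);
	if (ctx->freed)
		return true;	/* real code would write 8 bytes here */
	ctx->commit_lsn = lsn;
	return false;
}

/* Returns 0 if the model behaves as described. */
int toy_uaf_demo(void)
{
	struct toy_ctx a = { 0 }, b = { 0 };
	struct toy_iclog ic1 = { 0 }, ic2 = { 0 };

	/* No shutdown in the window: the stamp is fine. */
	assert(!toy_attach_then_stamp(&ic1, &a, 42, false));
	assert(a.commit_lsn == 42);

	/* Shutdown in the window: the stamp targets a freed context. */
	assert(toy_attach_then_stamp(&ic2, &b, 43, true));
	assert(b.commit_lsn == 0);
	return 0;
}
```

Anything the push does with the context between attaching it and
releasing the iclog sits inside that window, which is why more work
after the attach makes the problem worse.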
The current for-next tree does *much more* with the context after
the callbacks are attached, which opens up this UAF to both reads
and writes of free memory. The fix in patch 2, which adds a sleep on
the previous iclog after attaching the callbacks to the commit iclog,
opens this window even further.
And then the start record ordering patch set moves the commit iclog
into the CIL context structure, which we dereference after waiting on
the previous iclog. That means we are dereferencing pointers to freed
memory.
So, basically, before any of these fixes can go forwards, I first
need to fix the pre-existing CIL push/shutdown race.
And then, after I've rebased all these fixes on that fix and we're
back to square one and before we do anything else in the log, we
need to fix the mess that is caused by unco-ordinated shutdown
changing iclog state and running completions while we still have
active references to the iclogs and are preparing the iclog for IO.
XLOG_STATE_IOERROR must be considered harmful at this point in time.
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
2021-06-18 22:48 ` Dave Chinner
@ 2021-06-19 20:22 ` Darrick J. Wong
2021-06-20 22:18 ` Dave Chinner
0 siblings, 1 reply; 50+ messages in thread
From: Darrick J. Wong @ 2021-06-19 20:22 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Sat, Jun 19, 2021 at 08:48:30AM +1000, Dave Chinner wrote:
> On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > Hi folks,
> >
> > This is followup from the first set of log fixes for for-next that
> > were posted here:
> >
> > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> >
> > The first two patches of this series are updates for those patches,
> > change log below. The rest is the fix for the bigger issue we
> > uncovered in investigating the generic/019 failures, being that
> > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > to checkpoints.
> >
> > The "simple" fix of using the same ordering code as the commit
> > record for the start records in the CIL push turned into a lot of
> > patches once I started cleaning it up, separating out all the
> > different bits and finally realising all the things I needed to
> > change to avoid unintentional logic/behavioural changes. Hence
> > there's some code movement, some factoring, API changes to
> > xlog_write(), changing where we attach callbacks to commit iclogs so
> > they remain correctly ordered if there are multiple commit records
> > in the one iclog and then, finally, strictly ordering the start
> > records....
> >
> > The original "simple fix" I tested last night ran almost a thousand
> > cycles of generic/019 without a log hang or recovery failure of any
> > kind. The refactored patchset has run a couple hundred cycles of
> > g/019 and g/475 over the last few hours without a failure, so I'm
> > posting this so we can get a review iteration done while I sleep so
> > we can - hopefully - get this sorted out before the end of the week.
>
> Update on this so people know what's happening.
>
> Yesterday I found another zero-day bug in the CIL code that triggers
> when a shutdown occurs.
>
> The shutdown processing runs asynchronously and without caring about
> the current state or users of the iclogs. So when it runs
> xlog_state_do_callbacks() after changing the state of all iclogs to
> XLOG_STATE_IOERROR, it runs the callbacks on all the iclogs and
> frees everything associated with them.
>
> That includes the CIL context structure that xlog_cil_push_now() is
> still working on because it has a referenced iclog that it hasn't
> yet released.
>
> Hence the initial CIL commit that stamps the CIL context with the
> commit lsn -after- it has attached the context to the commit_iclog
> callback list can race with shutdown. This results in a UAF
> situation and an 8 byte memory corruption when we stamp the LSN into
> the context.
>
> The current for-next tree does *much more* with the context after
> the callbacks are attached, which opens up this UAF to both reads
> and writes of free memory. The fix in patch 2, which adds a sleep on
> the previous iclog after attaching the callbacks to the commit iclog,
> opens this window even further.
>
> And then the start record ordering patch set moves the commit iclog
> into the CIL context structure, which we dereference after waiting on
> the previous iclog. That means we are dereferencing pointers to freed
> memory.
>
> So, basically, before any of these fixes can go forwards, I first
> need to fix the pre-existing CIL push/shutdown race.
>
> And then, after I've rebased all these fixes on that fix and we're
> back to square one and before we do anything else in the log, we
> need to fix the mess that is caused by unco-ordinated shutdown
> changing iclog state and running completions while we still have
> active references to the iclogs and are preparing the iclog for IO.
> XLOG_STATE_IOERROR must be considered harmful at this point in time.
This puts me in a difficult spot. We're past -rc6, which means that
Linus could tag 5.13.0 tomorrow, and if he does that, whatever's in
for-next needs to have had at least a few days to soak before Linus will
want to pull it upstream.
Or this could be yet another one of those crazy kernels that goes all
the way to -rc8, in which case there's still time to make small
adjustments. But who knows, I have no schedule visibility.
However, this doesn't sound like small adjustments. I think it's best
that I withdraw the CIL changes from for-next until we have more time to
fix these issues and make sure that there aren't any bugs that are
easily found by developers. I feel confident enough about everything
between "xfs: log stripe roundoff is a property of the log" and
"xfs: xfs_log_force_lsn isn't passed a LSN" to keep them in for-next.
I'll also throw in the random fixes that got reviewed this week.
--D
>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 0/8 V2] xfs: log fixes for for-next
2021-06-19 20:22 ` Darrick J. Wong
@ 2021-06-20 22:18 ` Dave Chinner
0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2021-06-20 22:18 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Sat, Jun 19, 2021 at 01:22:49PM -0700, Darrick J. Wong wrote:
> On Sat, Jun 19, 2021 at 08:48:30AM +1000, Dave Chinner wrote:
> > On Thu, Jun 17, 2021 at 06:26:09PM +1000, Dave Chinner wrote:
> > > Hi folks,
> > >
> > > This is followup from the first set of log fixes for for-next that
> > > were posted here:
> > >
> > > https://lore.kernel.org/linux-xfs/20210615175719.GD158209@locust/T/#mde2cf0bb7d2ac369815a7e9371f0303efc89f51b
> > >
> > > The first two patches of this series are updates for those patches,
> > > change log below. The rest is the fix for the bigger issue we
> > > uncovered in investigating the generic/019 failures, being that
> > > we're triggering a zero-day bug in the way log recovery assigns LSNs
> > > to checkpoints.
> > >
> > > The "simple" fix of using the same ordering code as the commit
> > > record for the start records in the CIL push turned into a lot of
> > > patches once I started cleaning it up, separating out all the
> > > different bits and finally realising all the things I needed to
> > > change to avoid unintentional logic/behavioural changes. Hence
> > > there's some code movement, some factoring, API changes to
> > > xlog_write(), changing where we attach callbacks to commit iclogs so
> > > they remain correctly ordered if there are multiple commit records
> > > in the one iclog and then, finally, strictly ordering the start
> > > records....
> > >
> > > The original "simple fix" I tested last night ran almost a thousand
> > > cycles of generic/019 without a log hang or recovery failure of any
> > > kind. The refactored patchset has run a couple hundred cycles of
> > > g/019 and g/475 over the last few hours without a failure, so I'm
> > > posting this so we can get a review iteration done while I sleep so
> > > we can - hopefully - get this sorted out before the end of the week.
> >
> > Update on this so people know what's happening.
> >
> > Yesterday I found another zero-day bug in the CIL code that triggers
> > when a shutdown occurs.
> >
> > The shutdown processing runs asynchronously and without caring about
> > the current state or users of the iclogs. So when it runs
> > xlog_state_do_callbacks() after changing the state of all iclogs to
> > XLOG_STATE_IOERROR, it runs the callbacks on all the iclogs and
> > frees everything associated with them.
> >
> > That includes the CIL context structure that xlog_cil_push_now() is
> > still working on because it has a referenced iclog that it hasn't
> > yet released.
> >
> > Hence the initial CIL commit that stamps the CIL context with the
> > commit lsn -after- it has attached the context to the commit_iclog
> > callback list can race with shutdown. This results in a UAF
> > situation and an 8 byte memory corruption when we stamp the LSN into
> > the context.
> >
> > The current for-next tree does *much more* with the context after
> > the callbacks are attached, which opens up this UAF to both reads
> > and writes of free memory. The fix in patch 2, which adds a sleep on
> > the previous iclog after attaching the callbacks to the commit iclog,
> > opens this window even further.
> >
> > And then the start record ordering patch set moves the commit iclog
> > into the CIL context structure, which we dereference after waiting on
> > the previous iclog. That means we are dereferencing pointers to freed
> > memory.
> >
> > So, basically, before any of these fixes can go forwards, I first
> > need to fix the pre-existing CIL push/shutdown race.
> >
> > And then, after I've rebased all these fixes on that fix and we're
> > back to square one and before we do anything else in the log, we
> > need to fix the mess that is caused by unco-ordinated shutdown
> > changing iclog state and running completions while we still have
> > active references to the iclogs and are preparing the iclog for IO.
> > XLOG_STATE_IOERROR must be considered harmful at this point in time.
>
> This puts me in a difficult spot. We're past -rc6, which means that
> Linus could tag 5.13.0 tomorrow, and if he does that, whatever's in
> for-next needs to have had at least a few days to soak before Linus will
> want to pull it upstream.
>
> Or this could be yet another one of those crazy kernels that goes all
> the way to -rc8, in which case there's still time to make small
> adjustments. But who knows, I have no schedule visibility.
>
> However, this doesn't sound like small adjustments. I think it's best
> that I withdraw the CIL changes from for-next until we have more time to
> fix these issues and make sure that there aren't any bugs that are
> easily found by developers. I feel confident enough about everything
> between "xfs: log stripe roundoff is a property of the log" and
> "xfs: xfs_log_force_lsn isn't passed a LSN" to keep them in for-next.
Yup, that's a fair call. I was going to ask you to do this anyway
this morning (Monday) because I haven't been able to come up with a
magic bullet that fixes everything and makes it all better over the
weekend.
I'll start a new branch that fixes the UAF bug and the start record
ordering, and then rebase the CIL/log scalability patchset on top of
that. I'll also pull Christoph's cleanups for the new xlog_write()
code on top of that, too.
Oh, well, good thing I hadn't deleted the merged branches yet....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
2021-06-17 8:26 ` [PATCH 4/8] xfs: pass a CIL context to xlog_write() Dave Chinner
2021-06-17 14:46 ` kernel test robot
@ 2021-06-28 8:58 ` Dan Carpenter
2021-06-18 14:23 ` Christoph Hellwig
2 siblings, 0 replies; 50+ messages in thread
From: kernel test robot @ 2021-06-26 23:10 UTC (permalink / raw)
To: kbuild
[-- Attachment #1: Type: text/plain, Size: 32323 bytes --]
CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210617082617.971602-5-david@fromorbit.com>
References: <20210617082617.971602-5-david@fromorbit.com>
TO: Dave Chinner <david@fromorbit.com>
TO: linux-xfs(a)vger.kernel.org
Hi Dave,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on xfs-linux/for-next]
[cannot apply to v5.13-rc7 next-20210625]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
:::::: branch date: 10 days ago
:::::: commit date: 10 days ago
config: h8300-randconfig-m031-20210625 (attached as .config)
compiler: h8300-linux-gcc (GCC) 9.3.0
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
New smatch warnings:
fs/xfs/xfs_log_cil.c:1130 xlog_cil_push_work() error: uninitialized symbol 'commit_lsn'.
Old smatch warnings:
fs/xfs/xfs_log_cil.c:644 xlog_discard_busy_extents() warn: should '(busyp->length) << mp->m_blkbb_log' be a 64 bit type?
vim +/commit_lsn +1130 fs/xfs/xfs_log_cil.c
be05dd0e68ac99 Dave Chinner 2021-06-08 846
71e330b593905e Dave Chinner 2010-05-21 847 /*
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 848 * Push the Committed Item List to the log.
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 849 *
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 850 * If the current sequence is the same as xc_push_seq we need to do a flush. If
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 851 * xc_push_seq is less than the current sequence, then it has already been
a44f13edf0ebb4 Dave Chinner 2010-08-24 852 * flushed and we don't need to do anything - the caller will wait for it to
a44f13edf0ebb4 Dave Chinner 2010-08-24 853 * complete if necessary.
a44f13edf0ebb4 Dave Chinner 2010-08-24 854 *
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 855 * xc_push_seq is checked unlocked against the sequence number for a match.
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 856 * Hence we can allow log forces to run racily and not issue pushes for the
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 857 * same sequence twice. If we get a race between multiple pushes for the same
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 858 * sequence they will block on the first one and then abort, hence avoiding
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 859 * needless pushes.
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 860 */
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 861 static void
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 862 xlog_cil_push_work(
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 863 struct work_struct *work)
71e330b593905e Dave Chinner 2010-05-21 864 {
facd77e4e38b8f Dave Chinner 2021-06-04 865 struct xfs_cil_ctx *ctx =
facd77e4e38b8f Dave Chinner 2021-06-04 866 container_of(work, struct xfs_cil_ctx, push_work);
facd77e4e38b8f Dave Chinner 2021-06-04 867 struct xfs_cil *cil = ctx->cil;
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 868 struct xlog *log = cil->xc_log;
71e330b593905e Dave Chinner 2010-05-21 869 struct xfs_log_vec *lv;
71e330b593905e Dave Chinner 2010-05-21 870 struct xfs_cil_ctx *new_ctx;
71e330b593905e Dave Chinner 2010-05-21 871 struct xlog_in_core *commit_iclog;
66fc9ffa8638be Dave Chinner 2021-06-04 872 int num_iovecs = 0;
66fc9ffa8638be Dave Chinner 2021-06-04 873 int num_bytes = 0;
71e330b593905e Dave Chinner 2010-05-21 874 int error = 0;
877cf3473914ae Dave Chinner 2021-06-04 875 struct xlog_cil_trans_hdr thdr;
a47518453bf958 Dave Chinner 2021-06-08 876 struct xfs_log_vec lvhdr = {};
71e330b593905e Dave Chinner 2010-05-21 877 xfs_lsn_t commit_lsn;
4c2d542f2e7865 Dave Chinner 2012-04-23 878 xfs_lsn_t push_seq;
0279bbbbc03f2c Dave Chinner 2021-06-03 879 struct bio bio;
0279bbbbc03f2c Dave Chinner 2021-06-03 880 DECLARE_COMPLETION_ONSTACK(bdev_flush);
e12213ba5d909a Dave Chinner 2021-06-04 881 bool push_commit_stable;
e469cbe84f4ade Dave Chinner 2021-06-08 882 struct xlog_ticket *ticket;
71e330b593905e Dave Chinner 2010-05-21 883
facd77e4e38b8f Dave Chinner 2021-06-04 884 new_ctx = xlog_cil_ctx_alloc();
71e330b593905e Dave Chinner 2010-05-21 885 new_ctx->ticket = xlog_cil_ticket_alloc(log);
71e330b593905e Dave Chinner 2010-05-21 886
71e330b593905e Dave Chinner 2010-05-21 887 down_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner 2010-05-21 888
4bb928cdb900d0 Dave Chinner 2013-08-12 889 spin_lock(&cil->xc_push_lock);
4c2d542f2e7865 Dave Chinner 2012-04-23 890 push_seq = cil->xc_push_seq;
4c2d542f2e7865 Dave Chinner 2012-04-23 891 ASSERT(push_seq <= ctx->sequence);
e12213ba5d909a Dave Chinner 2021-06-04 892 push_commit_stable = cil->xc_push_commit_stable;
e12213ba5d909a Dave Chinner 2021-06-04 893 cil->xc_push_commit_stable = false;
71e330b593905e Dave Chinner 2010-05-21 894
0e7ab7efe77451 Dave Chinner 2020-03-24 895 /*
3682277520d6f4 Dave Chinner 2021-06-04 896 * As we are about to switch to a new, empty CIL context, we no longer
3682277520d6f4 Dave Chinner 2021-06-04 897 * need to throttle tasks on CIL space overruns. Wake any waiters that
3682277520d6f4 Dave Chinner 2021-06-04 898 * the hard push throttle may have caught so they can start committing
3682277520d6f4 Dave Chinner 2021-06-04 899 * to the new context. The ctx->xc_push_lock provides the serialisation
3682277520d6f4 Dave Chinner 2021-06-04 900 * necessary for safely using the lockless waitqueue_active() check in
3682277520d6f4 Dave Chinner 2021-06-04 901 * this context.
3682277520d6f4 Dave Chinner 2021-06-04 902 */
3682277520d6f4 Dave Chinner 2021-06-04 903 if (waitqueue_active(&cil->xc_push_wait))
c7f87f3984cfa1 Dave Chinner 2020-06-16 904 wake_up_all(&cil->xc_push_wait);
0e7ab7efe77451 Dave Chinner 2020-03-24 905
4c2d542f2e7865 Dave Chinner 2012-04-23 906 /*
4c2d542f2e7865 Dave Chinner 2012-04-23 907 * Check if we've anything to push. If there is nothing, then we don't
4c2d542f2e7865 Dave Chinner 2012-04-23 908 * move on to a new sequence number and so we have to be able to push
4c2d542f2e7865 Dave Chinner 2012-04-23 909 * this sequence again later.
4c2d542f2e7865 Dave Chinner 2012-04-23 910 */
0d11bae4bcf4aa Dave Chinner 2021-06-04 911 if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) {
4c2d542f2e7865 Dave Chinner 2012-04-23 912 cil->xc_push_seq = 0;
4bb928cdb900d0 Dave Chinner 2013-08-12 913 spin_unlock(&cil->xc_push_lock);
a44f13edf0ebb4 Dave Chinner 2010-08-24 914 goto out_skip;
4c2d542f2e7865 Dave Chinner 2012-04-23 915 }
4c2d542f2e7865 Dave Chinner 2012-04-23 916
a44f13edf0ebb4 Dave Chinner 2010-08-24 917
cf085a1b5d2214 Joe Perches 2019-11-07 918 /* check for a previously pushed sequence */
facd77e4e38b8f Dave Chinner 2021-06-04 919 if (push_seq < ctx->sequence) {
8af3dcd3c89aef Dave Chinner 2014-09-23 920 spin_unlock(&cil->xc_push_lock);
df806158b0f6eb Dave Chinner 2010-05-17 921 goto out_skip;
8af3dcd3c89aef Dave Chinner 2014-09-23 922 }
8af3dcd3c89aef Dave Chinner 2014-09-23 923
8af3dcd3c89aef Dave Chinner 2014-09-23 924 /*
8af3dcd3c89aef Dave Chinner 2014-09-23 925 * We are now going to push this context, so add it to the committing
8af3dcd3c89aef Dave Chinner 2014-09-23 926 * list before we do anything else. This ensures that anyone waiting on
8af3dcd3c89aef Dave Chinner 2014-09-23 927 * this push can easily detect the difference between a "push in
8af3dcd3c89aef Dave Chinner 2014-09-23 928 * progress" and "CIL is empty, nothing to do".
8af3dcd3c89aef Dave Chinner 2014-09-23 929 *
8af3dcd3c89aef Dave Chinner 2014-09-23 930 * IOWs, a wait loop can now check for:
8af3dcd3c89aef Dave Chinner 2014-09-23 931 * the current sequence not being found on the committing list;
8af3dcd3c89aef Dave Chinner 2014-09-23 932 * an empty CIL; and
8af3dcd3c89aef Dave Chinner 2014-09-23 933 * an unchanged sequence number
8af3dcd3c89aef Dave Chinner 2014-09-23 934 * to detect a push that had nothing to do and therefore does not need
8af3dcd3c89aef Dave Chinner 2014-09-23 935 * waiting on. If the CIL is not empty, we get put on the committing
8af3dcd3c89aef Dave Chinner 2014-09-23 936 * list before emptying the CIL and bumping the sequence number. Hence
8af3dcd3c89aef Dave Chinner 2014-09-23 937 * an empty CIL and an unchanged sequence number means we jumped out
8af3dcd3c89aef Dave Chinner 2014-09-23 938 * above after doing nothing.
8af3dcd3c89aef Dave Chinner 2014-09-23 939 *
8af3dcd3c89aef Dave Chinner 2014-09-23 940 * Hence the waiter will either find the commit sequence on the
8af3dcd3c89aef Dave Chinner 2014-09-23 941 * committing list or the sequence number will be unchanged and the CIL
8af3dcd3c89aef Dave Chinner 2014-09-23 942 * still dirty. In that latter case, the push has not yet started, and
8af3dcd3c89aef Dave Chinner 2014-09-23 943 * so the waiter will have to continue trying to check the CIL
8af3dcd3c89aef Dave Chinner 2014-09-23 944 * committing list until it is found. In extreme cases of delay, the
8af3dcd3c89aef Dave Chinner 2014-09-23 945 * sequence may fully commit between the attempts the wait makes to wait
8af3dcd3c89aef Dave Chinner 2014-09-23 946 * on the commit sequence.
8af3dcd3c89aef Dave Chinner 2014-09-23 947 */
8af3dcd3c89aef Dave Chinner 2014-09-23 948 list_add(&ctx->committing, &cil->xc_committing);
8af3dcd3c89aef Dave Chinner 2014-09-23 949 spin_unlock(&cil->xc_push_lock);
df806158b0f6eb Dave Chinner 2010-05-17 950
71e330b593905e Dave Chinner 2010-05-21 951 /*
0279bbbbc03f2c Dave Chinner 2021-06-03 952 * The CIL is stable at this point - nothing new will be added to it
0279bbbbc03f2c Dave Chinner 2021-06-03 953 * because we hold the flush lock exclusively. Hence we can now issue
0279bbbbc03f2c Dave Chinner 2021-06-03 954 * a cache flush to ensure all the completed metadata in the journal we
0279bbbbc03f2c Dave Chinner 2021-06-03 955 * are about to overwrite is on stable storage.
0279bbbbc03f2c Dave Chinner 2021-06-03 956 */
0279bbbbc03f2c Dave Chinner 2021-06-03 957 xfs_flush_bdev_async(&bio, log->l_mp->m_ddev_targp->bt_bdev,
0279bbbbc03f2c Dave Chinner 2021-06-03 958 &bdev_flush);
0279bbbbc03f2c Dave Chinner 2021-06-03 959
a8613836d99e62 Dave Chinner 2021-06-08 960 xlog_cil_pcp_aggregate(cil, ctx);
a8613836d99e62 Dave Chinner 2021-06-08 961
1f18c0c4b78cfb Dave Chinner 2021-06-08 962 while (!list_empty(&ctx->log_items)) {
71e330b593905e Dave Chinner 2010-05-21 963 struct xfs_log_item *item;
71e330b593905e Dave Chinner 2010-05-21 964
1f18c0c4b78cfb Dave Chinner 2021-06-08 965 item = list_first_entry(&ctx->log_items,
71e330b593905e Dave Chinner 2010-05-21 966 struct xfs_log_item, li_cil);
a47518453bf958 Dave Chinner 2021-06-08 967 lv = item->li_lv;
a1785f597c8b06 Dave Chinner 2021-06-08 968 lv->lv_order_id = item->li_order_id;
a47518453bf958 Dave Chinner 2021-06-08 969 num_iovecs += lv->lv_niovecs;
66fc9ffa8638be Dave Chinner 2021-06-04 970 /* we don't write ordered log vectors */
66fc9ffa8638be Dave Chinner 2021-06-04 971 if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
66fc9ffa8638be Dave Chinner 2021-06-04 972 num_bytes += lv->lv_bytes;
a47518453bf958 Dave Chinner 2021-06-08 973
a47518453bf958 Dave Chinner 2021-06-08 974 list_add_tail(&lv->lv_list, &ctx->lv_chain);
a1785f597c8b06 Dave Chinner 2021-06-08 975 list_del_init(&item->li_cil);
a1785f597c8b06 Dave Chinner 2021-06-08 976 item->li_order_id = 0;
a1785f597c8b06 Dave Chinner 2021-06-08 977 item->li_lv = NULL;
71e330b593905e Dave Chinner 2010-05-21 978 }
71e330b593905e Dave Chinner 2010-05-21 979
71e330b593905e Dave Chinner 2010-05-21 980 /*
facd77e4e38b8f Dave Chinner 2021-06-04 981 * Switch the contexts so we can drop the context lock and move out
71e330b593905e Dave Chinner 2010-05-21 982 * of a shared context. We can't just go straight to the commit record,
71e330b593905e Dave Chinner 2010-05-21 983 * though - we need to synchronise with previous and future commits so
71e330b593905e Dave Chinner 2010-05-21 984 * that the commit records are correctly ordered in the log to ensure
71e330b593905e Dave Chinner 2010-05-21 985 * that we process items during log IO completion in the correct order.
71e330b593905e Dave Chinner 2010-05-21 986 *
71e330b593905e Dave Chinner 2010-05-21 987 * For example, if we get an EFI in one checkpoint and the EFD in the
71e330b593905e Dave Chinner 2010-05-21 988 * next (e.g. due to log forces), we do not want the checkpoint with
71e330b593905e Dave Chinner 2010-05-21 989 * the EFD to be committed before the checkpoint with the EFI. Hence
71e330b593905e Dave Chinner 2010-05-21 990 * we must strictly order the commit records of the checkpoints so
71e330b593905e Dave Chinner 2010-05-21 991 * that: a) the checkpoint callbacks are attached to the iclogs in the
71e330b593905e Dave Chinner 2010-05-21 992 * correct order; and b) the checkpoints are replayed in correct order
71e330b593905e Dave Chinner 2010-05-21 993 * in log recovery.
71e330b593905e Dave Chinner 2010-05-21 994 *
71e330b593905e Dave Chinner 2010-05-21 995 * Hence we need to add this context to the committing context list so
71e330b593905e Dave Chinner 2010-05-21 996 * that higher sequences will wait for us to write out a commit record
71e330b593905e Dave Chinner 2010-05-21 997 * before they do.
f876e44603ad09 Dave Chinner 2014-02-27 998 *
f39ae5297c5ce2 Dave Chinner 2021-06-04 999 * xfs_log_force_seq requires us to mirror the new sequence into the cil
f876e44603ad09 Dave Chinner 2014-02-27 1000 * structure atomically with the addition of this sequence to the
f876e44603ad09 Dave Chinner 2014-02-27 1001 * committing list. This also ensures that we can do unlocked checks
f876e44603ad09 Dave Chinner 2014-02-27 1002 * against the current sequence in log forces without risking
f876e44603ad09 Dave Chinner 2014-02-27 1003 * dereferencing a freed context pointer.
71e330b593905e Dave Chinner 2010-05-21 1004 */
4bb928cdb900d0 Dave Chinner 2013-08-12 1005 spin_lock(&cil->xc_push_lock);
facd77e4e38b8f Dave Chinner 2021-06-04 1006 xlog_cil_ctx_switch(cil, new_ctx);
4bb928cdb900d0 Dave Chinner 2013-08-12 1007 spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1008 up_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner 2010-05-21 1009
a1785f597c8b06 Dave Chinner 2021-06-08 1010 /*
a1785f597c8b06 Dave Chinner 2021-06-08 1011 * Sort the log vector chain before we add the transaction headers.
a1785f597c8b06 Dave Chinner 2021-06-08 1012 * This ensures we always have the transaction headers at the start
a1785f597c8b06 Dave Chinner 2021-06-08 1013 * of the chain.
a1785f597c8b06 Dave Chinner 2021-06-08 1014 */
a1785f597c8b06 Dave Chinner 2021-06-08 1015 list_sort(NULL, &ctx->lv_chain, xlog_cil_order_cmp);
a1785f597c8b06 Dave Chinner 2021-06-08 1016
71e330b593905e Dave Chinner 2010-05-21 1017 /*
71e330b593905e Dave Chinner 2010-05-21 1018 * Build a checkpoint transaction header and write it to the log to
71e330b593905e Dave Chinner 2010-05-21 1019 * begin the transaction. We need to account for the space used by the
71e330b593905e Dave Chinner 2010-05-21 1020 * transaction header here as it is not accounted for in xlog_write().
a47518453bf958 Dave Chinner 2021-06-08 1021 * Add the lvhdr to the head of the lv chain we pass to xlog_write() so
a47518453bf958 Dave Chinner 2021-06-08 1022 * it gets written into the iclog first.
71e330b593905e Dave Chinner 2010-05-21 1023 */
877cf3473914ae Dave Chinner 2021-06-04 1024 xlog_cil_build_trans_hdr(ctx, &thdr, &lvhdr, num_iovecs);
66fc9ffa8638be Dave Chinner 2021-06-04 1025 num_bytes += lvhdr.lv_bytes;
a47518453bf958 Dave Chinner 2021-06-08 1026 list_add(&lvhdr.lv_list, &ctx->lv_chain);
71e330b593905e Dave Chinner 2010-05-21 1027
0279bbbbc03f2c Dave Chinner 2021-06-03 1028 /*
0279bbbbc03f2c Dave Chinner 2021-06-03 1029 * Before we format and submit the first iclog, we have to ensure that
0279bbbbc03f2c Dave Chinner 2021-06-03 1030 * the metadata writeback ordering cache flush is complete.
0279bbbbc03f2c Dave Chinner 2021-06-03 1031 */
0279bbbbc03f2c Dave Chinner 2021-06-03 1032 wait_for_completion(&bdev_flush);
0279bbbbc03f2c Dave Chinner 2021-06-03 1033
877cf3473914ae Dave Chinner 2021-06-04 1034 /*
877cf3473914ae Dave Chinner 2021-06-04 1035 * The LSN we need to pass to the log items on transaction commit is the
877cf3473914ae Dave Chinner 2021-06-04 1036 * LSN reported by the first log vector write, not the commit lsn. If we
877cf3473914ae Dave Chinner 2021-06-04 1037 * use the commit record lsn then we can move the tail beyond the grant
877cf3473914ae Dave Chinner 2021-06-04 1038 * write head.
877cf3473914ae Dave Chinner 2021-06-04 1039 */
fc3370002b56bc Dave Chinner 2021-06-17 1040 error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
a47518453bf958 Dave Chinner 2021-06-08 1041 NULL, num_bytes);
a47518453bf958 Dave Chinner 2021-06-08 1042
a47518453bf958 Dave Chinner 2021-06-08 1043 /*
a47518453bf958 Dave Chinner 2021-06-08 1044 * Take the lvhdr back off the lv_chain as it should not be passed
a47518453bf958 Dave Chinner 2021-06-08 1045 * to log IO completion.
a47518453bf958 Dave Chinner 2021-06-08 1046 */
a47518453bf958 Dave Chinner 2021-06-08 1047 list_del(&lvhdr.lv_list);
71e330b593905e Dave Chinner 2010-05-21 1048 if (error)
7db37c5e6575b2 Dave Chinner 2011-01-27 1049 goto out_abort_free_ticket;
71e330b593905e Dave Chinner 2010-05-21 1050
71e330b593905e Dave Chinner 2010-05-21 1051 /*
71e330b593905e Dave Chinner 2010-05-21 1052 * now that we've written the checkpoint into the log, strictly
71e330b593905e Dave Chinner 2010-05-21 1053 * order the commit records so replay will get them in the right order.
71e330b593905e Dave Chinner 2010-05-21 1054 */
71e330b593905e Dave Chinner 2010-05-21 1055 restart:
4bb928cdb900d0 Dave Chinner 2013-08-12 1056 spin_lock(&cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1057 list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
ac983517ec5941 Dave Chinner 2014-05-07 1058 /*
ac983517ec5941 Dave Chinner 2014-05-07 1059 * Avoid getting stuck in this loop because we were woken by the
ac983517ec5941 Dave Chinner 2014-05-07 1060 * shutdown, but then went back to sleep once already in the
ac983517ec5941 Dave Chinner 2014-05-07 1061 * shutdown state.
ac983517ec5941 Dave Chinner 2014-05-07 1062 */
ac983517ec5941 Dave Chinner 2014-05-07 1063 if (XLOG_FORCED_SHUTDOWN(log)) {
ac983517ec5941 Dave Chinner 2014-05-07 1064 spin_unlock(&cil->xc_push_lock);
ac983517ec5941 Dave Chinner 2014-05-07 1065 goto out_abort_free_ticket;
ac983517ec5941 Dave Chinner 2014-05-07 1066 }
ac983517ec5941 Dave Chinner 2014-05-07 1067
71e330b593905e Dave Chinner 2010-05-21 1068 /*
71e330b593905e Dave Chinner 2010-05-21 1069 * Higher sequences will wait for this one so skip them.
ac983517ec5941 Dave Chinner 2014-05-07 1070 * Don't wait for our own sequence, either.
71e330b593905e Dave Chinner 2010-05-21 1071 */
71e330b593905e Dave Chinner 2010-05-21 1072 if (new_ctx->sequence >= ctx->sequence)
71e330b593905e Dave Chinner 2010-05-21 1073 continue;
71e330b593905e Dave Chinner 2010-05-21 1074 if (!new_ctx->commit_lsn) {
71e330b593905e Dave Chinner 2010-05-21 1075 /*
71e330b593905e Dave Chinner 2010-05-21 1076 * It is still being pushed! Wait for the push to
71e330b593905e Dave Chinner 2010-05-21 1077 * complete, then start again from the beginning.
71e330b593905e Dave Chinner 2010-05-21 1078 */
4bb928cdb900d0 Dave Chinner 2013-08-12 1079 xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1080 goto restart;
71e330b593905e Dave Chinner 2010-05-21 1081 }
71e330b593905e Dave Chinner 2010-05-21 1082 }
4bb928cdb900d0 Dave Chinner 2013-08-12 1083 spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1084
fc3370002b56bc Dave Chinner 2021-06-17 1085 error = xlog_cil_write_commit_record(ctx, &commit_iclog);
dd401770b0ff68 Dave Chinner 2020-03-25 1086 if (error)
dd401770b0ff68 Dave Chinner 2020-03-25 1087 goto out_abort_free_ticket;
dd401770b0ff68 Dave Chinner 2020-03-25 1088
89ae379d564c5d Christoph Hellwig 2019-06-28 1089 spin_lock(&commit_iclog->ic_callback_lock);
1858bb0bec612d Christoph Hellwig 2019-10-14 1090 if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
89ae379d564c5d Christoph Hellwig 2019-06-28 1091 spin_unlock(&commit_iclog->ic_callback_lock);
e469cbe84f4ade Dave Chinner 2021-06-08 1092 goto out_abort_free_ticket;
89ae379d564c5d Christoph Hellwig 2019-06-28 1093 }
89ae379d564c5d Christoph Hellwig 2019-06-28 1094 ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
89ae379d564c5d Christoph Hellwig 2019-06-28 1095 commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
89ae379d564c5d Christoph Hellwig 2019-06-28 1096 list_add_tail(&ctx->iclog_entry, &commit_iclog->ic_callbacks);
89ae379d564c5d Christoph Hellwig 2019-06-28 1097 spin_unlock(&commit_iclog->ic_callback_lock);
71e330b593905e Dave Chinner 2010-05-21 1098
71e330b593905e Dave Chinner 2010-05-21 1099 /*
71e330b593905e Dave Chinner 2010-05-21 1100 * now the checkpoint commit is complete and we've attached the
71e330b593905e Dave Chinner 2010-05-21 1101 * callbacks to the iclog we can assign the commit LSN to the context
71e330b593905e Dave Chinner 2010-05-21 1102 * and wake up anyone who is waiting for the commit to complete.
71e330b593905e Dave Chinner 2010-05-21 1103 */
4bb928cdb900d0 Dave Chinner 2013-08-12 1104 spin_lock(&cil->xc_push_lock);
eb40a87500ac2f Dave Chinner 2010-12-21 1105 wake_up_all(&cil->xc_commit_wait);
4bb928cdb900d0 Dave Chinner 2013-08-12 1106 spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1107
e469cbe84f4ade Dave Chinner 2021-06-08 1108 /*
e469cbe84f4ade Dave Chinner 2021-06-08 1109 * Pull the ticket off the ctx so we can ungrant it after releasing the
e469cbe84f4ade Dave Chinner 2021-06-08 1110 * commit_iclog. The ctx may be freed by the time we return from
e469cbe84f4ade Dave Chinner 2021-06-08 1111 * releasing the commit_iclog (i.e. checkpoint has been completed and
e469cbe84f4ade Dave Chinner 2021-06-08 1112 * callback run) so we can't reference the ctx after the call to
e469cbe84f4ade Dave Chinner 2021-06-08 1113 * xlog_state_release_iclog().
e469cbe84f4ade Dave Chinner 2021-06-08 1114 */
e469cbe84f4ade Dave Chinner 2021-06-08 1115 ticket = ctx->ticket;
e469cbe84f4ade Dave Chinner 2021-06-08 1116
5fd9256ce156ef Dave Chinner 2021-06-03 1117 /*
815753dc16bbca Dave Chinner 2021-06-17 1118 * If the checkpoint spans multiple iclogs, wait for all previous iclogs
815753dc16bbca Dave Chinner 2021-06-17 1119 * to complete before we submit the commit_iclog. We can't use state
815753dc16bbca Dave Chinner 2021-06-17 1120 * checks for this - ACTIVE can be either a past completed iclog or a
815753dc16bbca Dave Chinner 2021-06-17 1121 * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a
815753dc16bbca Dave Chinner 2021-06-17 1122 * past or future iclog awaiting IO or ordered IO completion to be run.
815753dc16bbca Dave Chinner 2021-06-17 1123 * In the latter case, if it's a future iclog and we wait on it, then we
815753dc16bbca Dave Chinner 2021-06-17 1124 * will hang because it won't get processed through to ic_force_wait
815753dc16bbca Dave Chinner 2021-06-17 1125 * wakeup until this commit_iclog is written to disk. Hence we use the
815753dc16bbca Dave Chinner 2021-06-17 1126 * iclog header lsn and compare it to the commit lsn to determine if we
815753dc16bbca Dave Chinner 2021-06-17 1127 * need to wait on iclogs or not.
5fd9256ce156ef Dave Chinner 2021-06-03 1128 */
5fd9256ce156ef Dave Chinner 2021-06-03 1129 spin_lock(&log->l_icloglock);
cb1acb3f324636 Dave Chinner 2021-06-04 @1130 if (ctx->start_lsn != commit_lsn) {
815753dc16bbca Dave Chinner 2021-06-17 1131 struct xlog_in_core *iclog;
815753dc16bbca Dave Chinner 2021-06-17 1132
815753dc16bbca Dave Chinner 2021-06-17 1133 for (iclog = commit_iclog->ic_prev;
815753dc16bbca Dave Chinner 2021-06-17 1134 iclog != commit_iclog;
815753dc16bbca Dave Chinner 2021-06-17 1135 iclog = iclog->ic_prev) {
815753dc16bbca Dave Chinner 2021-06-17 1136 xfs_lsn_t hlsn;
815753dc16bbca Dave Chinner 2021-06-17 1137
815753dc16bbca Dave Chinner 2021-06-17 1138 /*
815753dc16bbca Dave Chinner 2021-06-17 1139 * If the LSN of the iclog is zero or in the future it
815753dc16bbca Dave Chinner 2021-06-17 1140 * means it has passed through IO completion and
815753dc16bbca Dave Chinner 2021-06-17 1141 * activation and hence all previous iclogs have also
815753dc16bbca Dave Chinner 2021-06-17 1142 * done so. We do not need to wait at all in this case.
815753dc16bbca Dave Chinner 2021-06-17 1143 */
815753dc16bbca Dave Chinner 2021-06-17 1144 hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
815753dc16bbca Dave Chinner 2021-06-17 1145 if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
815753dc16bbca Dave Chinner 2021-06-17 1146 break;
815753dc16bbca Dave Chinner 2021-06-17 1147
815753dc16bbca Dave Chinner 2021-06-17 1148 /*
815753dc16bbca Dave Chinner 2021-06-17 1149 * If the LSN of the iclog is older than the commit lsn,
815753dc16bbca Dave Chinner 2021-06-17 1150 * we have to wait on it. Waiting on this via the
815753dc16bbca Dave Chinner 2021-06-17 1151 * ic_force_wait should also order the completion of all
815753dc16bbca Dave Chinner 2021-06-17 1152 * older iclogs, too, but we leave checking that to the
815753dc16bbca Dave Chinner 2021-06-17 1153 * next loop iteration.
815753dc16bbca Dave Chinner 2021-06-17 1154 */
815753dc16bbca Dave Chinner 2021-06-17 1155 ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
815753dc16bbca Dave Chinner 2021-06-17 1156 xlog_wait_on_iclog(iclog);
cb1acb3f324636 Dave Chinner 2021-06-04 1157 spin_lock(&log->l_icloglock);
815753dc16bbca Dave Chinner 2021-06-17 1158 }
815753dc16bbca Dave Chinner 2021-06-17 1159
815753dc16bbca Dave Chinner 2021-06-17 1160 /*
815753dc16bbca Dave Chinner 2021-06-17 1161 * Regardless of whether we need to wait or not, the
815753dc16bbca Dave Chinner 2021-06-17 1162 * commit_iclog write needs to issue a pre-flush so that the
815753dc16bbca Dave Chinner 2021-06-17 1163 * ordering for this checkpoint is correctly preserved down to
815753dc16bbca Dave Chinner 2021-06-17 1164 * stable storage.
815753dc16bbca Dave Chinner 2021-06-17 1165 */
cb1acb3f324636 Dave Chinner 2021-06-04 1166 commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
5fd9256ce156ef Dave Chinner 2021-06-03 1167 }
5fd9256ce156ef Dave Chinner 2021-06-03 1168
cb1acb3f324636 Dave Chinner 2021-06-04 1169 /*
cb1acb3f324636 Dave Chinner 2021-06-04 1170 * The commit iclog must be written to stable storage to guarantee
cb1acb3f324636 Dave Chinner 2021-06-04 1171 * journal IO vs metadata writeback IO is correctly ordered on stable
cb1acb3f324636 Dave Chinner 2021-06-04 1172 * storage.
e12213ba5d909a Dave Chinner 2021-06-04 1173 *
e12213ba5d909a Dave Chinner 2021-06-04 1174 * If the push caller needs the commit to be immediately stable and the
e12213ba5d909a Dave Chinner 2021-06-04 1175 * commit_iclog is not yet marked as XLOG_STATE_WANT_SYNC to indicate it
e12213ba5d909a Dave Chinner 2021-06-04 1176 * will be written when released, switch its state to WANT_SYNC right
e12213ba5d909a Dave Chinner 2021-06-04 1177 * now.
cb1acb3f324636 Dave Chinner 2021-06-04 1178 */
cb1acb3f324636 Dave Chinner 2021-06-04 1179 commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
e12213ba5d909a Dave Chinner 2021-06-04 1180 if (push_commit_stable && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
e12213ba5d909a Dave Chinner 2021-06-04 1181 xlog_state_switch_iclogs(log, commit_iclog, 0);
e469cbe84f4ade Dave Chinner 2021-06-08 1182 xlog_state_release_iclog(log, commit_iclog, ticket);
cb1acb3f324636 Dave Chinner 2021-06-04 1183 spin_unlock(&log->l_icloglock);
e469cbe84f4ade Dave Chinner 2021-06-08 1184
e469cbe84f4ade Dave Chinner 2021-06-08 1185 xfs_log_ticket_ungrant(log, ticket);
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 1186 return;
71e330b593905e Dave Chinner 2010-05-21 1187
71e330b593905e Dave Chinner 2010-05-21 1188 out_skip:
71e330b593905e Dave Chinner 2010-05-21 1189 up_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner 2010-05-21 1190 xfs_log_ticket_put(new_ctx->ticket);
71e330b593905e Dave Chinner 2010-05-21 1191 kmem_free(new_ctx);
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 1192 return;
71e330b593905e Dave Chinner 2010-05-21 1193
7db37c5e6575b2 Dave Chinner 2011-01-27 1194 out_abort_free_ticket:
877cf3473914ae Dave Chinner 2021-06-04 1195 xfs_log_ticket_ungrant(log, ctx->ticket);
12e6a0f449d585 Christoph Hellwig 2020-03-20 1196 ASSERT(XLOG_FORCED_SHUTDOWN(log));
12e6a0f449d585 Christoph Hellwig 2020-03-20 1197 xlog_cil_committed(ctx);
4c2d542f2e7865 Dave Chinner 2012-04-23 1198 }
4c2d542f2e7865 Dave Chinner 2012-04-23 1199
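The past/future iclog test in the loop at lines 1133-1158 of the listing can be modelled in isolation. What follows is only a minimal userspace sketch, not the kernel code: `lsn_t`, `make_lsn()`, `lsn_cmp()` and `must_wait_on_iclog()` are hypothetical stand-ins for `xfs_lsn_t`, the LSN packing, `XFS_LSN_CMP()` and the loop body. It assumes an LSN packs the cycle number in the upper 32 bits and the block number in the lower 32 bits, and that comparison orders by cycle first, then block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t lsn_t;	/* hypothetical stand-in for xfs_lsn_t */

/* Pack a cycle and a block number into one LSN (cycle in the high 32 bits). */
static lsn_t make_lsn(uint32_t cycle, uint32_t block)
{
	return (lsn_t)(((uint64_t)cycle << 32) | block);
}

/* Compare two LSNs: cycle first, then block. Returns -1, 0 or 1. */
static int lsn_cmp(lsn_t a, lsn_t b)
{
	uint32_t ca = (uint32_t)((uint64_t)a >> 32);
	uint32_t cb = (uint32_t)((uint64_t)b >> 32);
	uint32_t ba = (uint32_t)a;
	uint32_t bb = (uint32_t)b;

	if (ca != cb)
		return ca < cb ? -1 : 1;
	if (ba != bb)
		return ba < bb ? -1 : 1;
	return 0;
}

/*
 * Model of the per-iclog decision: a header LSN of zero, or one newer than
 * the commit LSN, means the iclog has already been through IO completion
 * and reactivation, so no wait is needed. An older header LSN means the
 * iclog still carries part of the past and must be waited on. An equal LSN
 * is not expected for any iclog other than the commit iclog itself, which
 * the loop never visits, hence the assert.
 */
static bool must_wait_on_iclog(lsn_t hlsn, lsn_t commit_lsn)
{
	if (!hlsn || lsn_cmp(hlsn, commit_lsn) > 0)
		return false;
	assert(lsn_cmp(hlsn, commit_lsn) < 0);
	return true;
}
```

With a commit LSN of cycle 1, block 20, an iclog stamped cycle 1, block 10 has to be waited on, while one stamped cycle 1, block 30 (or not stamped at all) does not. The decision depends only on the header LSN, which is the point of the comment above the loop: the iclog state alone cannot distinguish past from future.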
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
@ 2021-06-28 8:58 ` Dan Carpenter
0 siblings, 0 replies; 50+ messages in thread
From: Dan Carpenter @ 2021-06-28 8:58 UTC (permalink / raw)
To: kbuild, Dave Chinner, linux-xfs; +Cc: lkp, kbuild-all
Hi Dave,
url: https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: h8300-randconfig-m031-20210625 (attached as .config)
compiler: h8300-linux-gcc (GCC) 9.3.0
If you fix the issue, kindly add the following tags as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
New smatch warnings:
fs/xfs/xfs_log_cil.c:1130 xlog_cil_push_work() error: uninitialized symbol 'commit_lsn'.
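The pattern behind this warning can be reduced to a few lines. This is a minimal userspace sketch under stated assumptions, not a proposed patch: `struct ctx_model`, `write_commit_record()` and `checkpoint_spans_iclogs()` are hypothetical names, and it assumes the commit record write stores its LSN in the context (the listing tests `new_ctx->commit_lsn` at line 1074, which suggests such a field exists) rather than in the local declared at line 877.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t lsn_t;	/* hypothetical stand-in for xfs_lsn_t */

/* Reduced model of the CIL context: only the two LSNs that matter here. */
struct ctx_model {
	lsn_t start_lsn;	/* LSN of the first iclog of the checkpoint */
	lsn_t commit_lsn;	/* LSN of the iclog holding the commit record */
};

/*
 * Hypothetical stand-in for the commit record write. The assumption is
 * that the LSN is recorded in the context only; nothing is passed back
 * through a caller-owned local variable.
 */
static int write_commit_record(struct ctx_model *ctx, lsn_t lsn)
{
	ctx->commit_lsn = lsn;
	return 0;
}

/* Returns true if the checkpoint spans more than one iclog. */
static bool checkpoint_spans_iclogs(struct ctx_model *ctx, lsn_t lsn)
{
	/*
	 * The flagged code keeps a local "lsn_t commit_lsn;" that is
	 * declared but never assigned on this path, then compares
	 * ctx->start_lsn against it. Reading the value from the context
	 * avoids the uninitialised read.
	 */
	if (write_commit_record(ctx, lsn))
		return false;
	assert(ctx->commit_lsn != 0);	/* must be set before it is compared */
	return ctx->start_lsn != ctx->commit_lsn;
}
```

In this model the comparison is well defined because it only reads a field that the commit record write has just set. Whether the same holds in the patch depends on where the commit record write actually stores the LSN.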
vim +/commit_lsn +1130 fs/xfs/xfs_log_cil.c
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 861 static void
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 862 xlog_cil_push_work(
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 863 struct work_struct *work)
71e330b593905e Dave Chinner 2010-05-21 864 {
facd77e4e38b8f Dave Chinner 2021-06-04 865 struct xfs_cil_ctx *ctx =
facd77e4e38b8f Dave Chinner 2021-06-04 866 container_of(work, struct xfs_cil_ctx, push_work);
facd77e4e38b8f Dave Chinner 2021-06-04 867 struct xfs_cil *cil = ctx->cil;
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 868 struct xlog *log = cil->xc_log;
71e330b593905e Dave Chinner 2010-05-21 869 struct xfs_log_vec *lv;
71e330b593905e Dave Chinner 2010-05-21 870 struct xfs_cil_ctx *new_ctx;
71e330b593905e Dave Chinner 2010-05-21 871 struct xlog_in_core *commit_iclog;
66fc9ffa8638be Dave Chinner 2021-06-04 872 int num_iovecs = 0;
66fc9ffa8638be Dave Chinner 2021-06-04 873 int num_bytes = 0;
71e330b593905e Dave Chinner 2010-05-21 874 int error = 0;
877cf3473914ae Dave Chinner 2021-06-04 875 struct xlog_cil_trans_hdr thdr;
a47518453bf958 Dave Chinner 2021-06-08 876 struct xfs_log_vec lvhdr = {};
71e330b593905e Dave Chinner 2010-05-21 877 xfs_lsn_t commit_lsn;
^^^^^^^^^^
4c2d542f2e7865 Dave Chinner 2012-04-23 878 xfs_lsn_t push_seq;
0279bbbbc03f2c Dave Chinner 2021-06-03 879 struct bio bio;
0279bbbbc03f2c Dave Chinner 2021-06-03 880 DECLARE_COMPLETION_ONSTACK(bdev_flush);
e12213ba5d909a Dave Chinner 2021-06-04 881 bool push_commit_stable;
e469cbe84f4ade Dave Chinner 2021-06-08 882 struct xlog_ticket *ticket;
71e330b593905e Dave Chinner 2010-05-21 883
facd77e4e38b8f Dave Chinner 2021-06-04 884 new_ctx = xlog_cil_ctx_alloc();
71e330b593905e Dave Chinner 2010-05-21 885 new_ctx->ticket = xlog_cil_ticket_alloc(log);
71e330b593905e Dave Chinner 2010-05-21 886
71e330b593905e Dave Chinner 2010-05-21 887 down_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner 2010-05-21 888
4bb928cdb900d0 Dave Chinner 2013-08-12 889 spin_lock(&cil->xc_push_lock);
4c2d542f2e7865 Dave Chinner 2012-04-23 890 push_seq = cil->xc_push_seq;
4c2d542f2e7865 Dave Chinner 2012-04-23 891 ASSERT(push_seq <= ctx->sequence);
e12213ba5d909a Dave Chinner 2021-06-04 892 push_commit_stable = cil->xc_push_commit_stable;
e12213ba5d909a Dave Chinner 2021-06-04 893 cil->xc_push_commit_stable = false;
71e330b593905e Dave Chinner 2010-05-21 894
0e7ab7efe77451 Dave Chinner 2020-03-24 895 /*
3682277520d6f4 Dave Chinner 2021-06-04 896 * As we are about to switch to a new, empty CIL context, we no longer
3682277520d6f4 Dave Chinner 2021-06-04 897 * need to throttle tasks on CIL space overruns. Wake any waiters that
3682277520d6f4 Dave Chinner 2021-06-04 898 * the hard push throttle may have caught so they can start committing
3682277520d6f4 Dave Chinner 2021-06-04 899 * to the new context. The ctx->xc_push_lock provides the serialisation
3682277520d6f4 Dave Chinner 2021-06-04 900 * necessary for safely using the lockless waitqueue_active() check in
3682277520d6f4 Dave Chinner 2021-06-04 901 * this context.
3682277520d6f4 Dave Chinner 2021-06-04 902 */
3682277520d6f4 Dave Chinner 2021-06-04 903 if (waitqueue_active(&cil->xc_push_wait))
c7f87f3984cfa1 Dave Chinner 2020-06-16 904 wake_up_all(&cil->xc_push_wait);
0e7ab7efe77451 Dave Chinner 2020-03-24 905
4c2d542f2e7865 Dave Chinner 2012-04-23 906 /*
4c2d542f2e7865 Dave Chinner 2012-04-23 907 * Check if we've anything to push. If there is nothing, then we don't
4c2d542f2e7865 Dave Chinner 2012-04-23 908 * move on to a new sequence number and so we have to be able to push
4c2d542f2e7865 Dave Chinner 2012-04-23 909 * this sequence again later.
4c2d542f2e7865 Dave Chinner 2012-04-23 910 */
0d11bae4bcf4aa Dave Chinner 2021-06-04 911 if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) {
4c2d542f2e7865 Dave Chinner 2012-04-23 912 cil->xc_push_seq = 0;
4bb928cdb900d0 Dave Chinner 2013-08-12 913 spin_unlock(&cil->xc_push_lock);
a44f13edf0ebb4 Dave Chinner 2010-08-24 914 goto out_skip;
4c2d542f2e7865 Dave Chinner 2012-04-23 915 }
4c2d542f2e7865 Dave Chinner 2012-04-23 916
a44f13edf0ebb4 Dave Chinner 2010-08-24 917
cf085a1b5d2214 Joe Perches 2019-11-07 918 /* check for a previously pushed sequence */
facd77e4e38b8f Dave Chinner 2021-06-04 919 if (push_seq < ctx->sequence) {
8af3dcd3c89aef Dave Chinner 2014-09-23 920 spin_unlock(&cil->xc_push_lock);
df806158b0f6eb Dave Chinner 2010-05-17 921 goto out_skip;
8af3dcd3c89aef Dave Chinner 2014-09-23 922 }
8af3dcd3c89aef Dave Chinner 2014-09-23 923
8af3dcd3c89aef Dave Chinner 2014-09-23 924 /*
8af3dcd3c89aef Dave Chinner 2014-09-23 925 * We are now going to push this context, so add it to the committing
8af3dcd3c89aef Dave Chinner 2014-09-23 926 * list before we do anything else. This ensures that anyone waiting on
8af3dcd3c89aef Dave Chinner 2014-09-23 927 * this push can easily detect the difference between a "push in
8af3dcd3c89aef Dave Chinner 2014-09-23 928 * progress" and "CIL is empty, nothing to do".
8af3dcd3c89aef Dave Chinner 2014-09-23 929 *
8af3dcd3c89aef Dave Chinner 2014-09-23 930 * IOWs, a wait loop can now check for:
8af3dcd3c89aef Dave Chinner 2014-09-23 931 * the current sequence not being found on the committing list;
8af3dcd3c89aef Dave Chinner 2014-09-23 932 * an empty CIL; and
8af3dcd3c89aef Dave Chinner 2014-09-23 933 * an unchanged sequence number
8af3dcd3c89aef Dave Chinner 2014-09-23 934 * to detect a push that had nothing to do and therefore does not need
8af3dcd3c89aef Dave Chinner 2014-09-23 935 * waiting on. If the CIL is not empty, we get put on the committing
8af3dcd3c89aef Dave Chinner 2014-09-23 936 * list before emptying the CIL and bumping the sequence number. Hence
8af3dcd3c89aef Dave Chinner 2014-09-23 937 * an empty CIL and an unchanged sequence number means we jumped out
8af3dcd3c89aef Dave Chinner 2014-09-23 938 * above after doing nothing.
8af3dcd3c89aef Dave Chinner 2014-09-23 939 *
8af3dcd3c89aef Dave Chinner 2014-09-23 940 * Hence the waiter will either find the commit sequence on the
8af3dcd3c89aef Dave Chinner 2014-09-23 941 * committing list or the sequence number will be unchanged and the CIL
8af3dcd3c89aef Dave Chinner 2014-09-23 942 * still dirty. In that latter case, the push has not yet started, and
8af3dcd3c89aef Dave Chinner 2014-09-23 943 * so the waiter will have to continue trying to check the CIL
8af3dcd3c89aef Dave Chinner 2014-09-23 944 * committing list until it is found. In extreme cases of delay, the
8af3dcd3c89aef Dave Chinner 2014-09-23 945 * sequence may fully commit between the attempts the wait makes to wait
8af3dcd3c89aef Dave Chinner 2014-09-23 946 * on the commit sequence.
8af3dcd3c89aef Dave Chinner 2014-09-23 947 */
8af3dcd3c89aef Dave Chinner 2014-09-23 948 list_add(&ctx->committing, &cil->xc_committing);
8af3dcd3c89aef Dave Chinner 2014-09-23 949 spin_unlock(&cil->xc_push_lock);
df806158b0f6eb Dave Chinner 2010-05-17 950
71e330b593905e Dave Chinner 2010-05-21 951 /*
0279bbbbc03f2c Dave Chinner 2021-06-03 952 * The CIL is stable at this point - nothing new will be added to it
0279bbbbc03f2c Dave Chinner 2021-06-03 953 * because we hold the flush lock exclusively. Hence we can now issue
0279bbbbc03f2c Dave Chinner 2021-06-03 954 * a cache flush to ensure all the completed metadata in the journal we
0279bbbbc03f2c Dave Chinner 2021-06-03 955 * are about to overwrite is on stable storage.
0279bbbbc03f2c Dave Chinner 2021-06-03 956 */
0279bbbbc03f2c Dave Chinner 2021-06-03 957 xfs_flush_bdev_async(&bio, log->l_mp->m_ddev_targp->bt_bdev,
0279bbbbc03f2c Dave Chinner 2021-06-03 958 &bdev_flush);
0279bbbbc03f2c Dave Chinner 2021-06-03 959
a8613836d99e62 Dave Chinner 2021-06-08 960 xlog_cil_pcp_aggregate(cil, ctx);
a8613836d99e62 Dave Chinner 2021-06-08 961
1f18c0c4b78cfb Dave Chinner 2021-06-08 962 while (!list_empty(&ctx->log_items)) {
71e330b593905e Dave Chinner 2010-05-21 963 struct xfs_log_item *item;
71e330b593905e Dave Chinner 2010-05-21 964
1f18c0c4b78cfb Dave Chinner 2021-06-08 965 item = list_first_entry(&ctx->log_items,
71e330b593905e Dave Chinner 2010-05-21 966 struct xfs_log_item, li_cil);
a47518453bf958 Dave Chinner 2021-06-08 967 lv = item->li_lv;
a1785f597c8b06 Dave Chinner 2021-06-08 968 lv->lv_order_id = item->li_order_id;
a47518453bf958 Dave Chinner 2021-06-08 969 num_iovecs += lv->lv_niovecs;
66fc9ffa8638be Dave Chinner 2021-06-04 970 /* we don't write ordered log vectors */
66fc9ffa8638be Dave Chinner 2021-06-04 971 if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
66fc9ffa8638be Dave Chinner 2021-06-04 972 num_bytes += lv->lv_bytes;
a47518453bf958 Dave Chinner 2021-06-08 973
a47518453bf958 Dave Chinner 2021-06-08 974 list_add_tail(&lv->lv_list, &ctx->lv_chain);
a1785f597c8b06 Dave Chinner 2021-06-08 975 list_del_init(&item->li_cil);
a1785f597c8b06 Dave Chinner 2021-06-08 976 item->li_order_id = 0;
a1785f597c8b06 Dave Chinner 2021-06-08 977 item->li_lv = NULL;
71e330b593905e Dave Chinner 2010-05-21 978 }
71e330b593905e Dave Chinner 2010-05-21 979
71e330b593905e Dave Chinner 2010-05-21 980 /*
facd77e4e38b8f Dave Chinner 2021-06-04 981 * Switch the contexts so we can drop the context lock and move out
71e330b593905e Dave Chinner 2010-05-21 982 * of a shared context. We can't just go straight to the commit record,
71e330b593905e Dave Chinner 2010-05-21 983 * though - we need to synchronise with previous and future commits so
71e330b593905e Dave Chinner 2010-05-21 984 * that the commit records are correctly ordered in the log to ensure
71e330b593905e Dave Chinner 2010-05-21 985 * that we process items during log IO completion in the correct order.
71e330b593905e Dave Chinner 2010-05-21 986 *
71e330b593905e Dave Chinner 2010-05-21 987 * For example, if we get an EFI in one checkpoint and the EFD in the
71e330b593905e Dave Chinner 2010-05-21 988 * next (e.g. due to log forces), we do not want the checkpoint with
71e330b593905e Dave Chinner 2010-05-21 989 * the EFD to be committed before the checkpoint with the EFI. Hence
71e330b593905e Dave Chinner 2010-05-21 990 * we must strictly order the commit records of the checkpoints so
71e330b593905e Dave Chinner 2010-05-21 991 * that: a) the checkpoint callbacks are attached to the iclogs in the
71e330b593905e Dave Chinner 2010-05-21 992 * correct order; and b) the checkpoints are replayed in correct order
71e330b593905e Dave Chinner 2010-05-21 993 * in log recovery.
71e330b593905e Dave Chinner 2010-05-21 994 *
71e330b593905e Dave Chinner 2010-05-21 995 * Hence we need to add this context to the committing context list so
71e330b593905e Dave Chinner 2010-05-21 996 * that higher sequences will wait for us to write out a commit record
71e330b593905e Dave Chinner 2010-05-21 997 * before they do.
f876e44603ad09 Dave Chinner 2014-02-27 998 *
f39ae5297c5ce2 Dave Chinner 2021-06-04 999 * xfs_log_force_seq requires us to mirror the new sequence into the cil
f876e44603ad09 Dave Chinner 2014-02-27 1000 * structure atomically with the addition of this sequence to the
f876e44603ad09 Dave Chinner 2014-02-27 1001 * committing list. This also ensures that we can do unlocked checks
f876e44603ad09 Dave Chinner 2014-02-27 1002 * against the current sequence in log forces without risking
f876e44603ad09 Dave Chinner 2014-02-27 1003 * dereferencing a freed context pointer.
71e330b593905e Dave Chinner 2010-05-21 1004 */
4bb928cdb900d0 Dave Chinner 2013-08-12 1005 spin_lock(&cil->xc_push_lock);
facd77e4e38b8f Dave Chinner 2021-06-04 1006 xlog_cil_ctx_switch(cil, new_ctx);
4bb928cdb900d0 Dave Chinner 2013-08-12 1007 spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1008 up_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner 2010-05-21 1009
a1785f597c8b06 Dave Chinner 2021-06-08 1010 /*
a1785f597c8b06 Dave Chinner 2021-06-08 1011 * Sort the log vector chain before we add the transaction headers.
a1785f597c8b06 Dave Chinner 2021-06-08 1012 * This ensures we always have the transaction headers at the start
a1785f597c8b06 Dave Chinner 2021-06-08 1013 * of the chain.
a1785f597c8b06 Dave Chinner 2021-06-08 1014 */
a1785f597c8b06 Dave Chinner 2021-06-08 1015 list_sort(NULL, &ctx->lv_chain, xlog_cil_order_cmp);
a1785f597c8b06 Dave Chinner 2021-06-08 1016
71e330b593905e Dave Chinner 2010-05-21 1017 /*
71e330b593905e Dave Chinner 2010-05-21 1018 * Build a checkpoint transaction header and write it to the log to
71e330b593905e Dave Chinner 2010-05-21 1019 * begin the transaction. We need to account for the space used by the
71e330b593905e Dave Chinner 2010-05-21 1020 * transaction header here as it is not accounted for in xlog_write().
a47518453bf958 Dave Chinner 2021-06-08 1021 * Add the lvhdr to the head of the lv chain we pass to xlog_write() so
a47518453bf958 Dave Chinner 2021-06-08 1022 * it gets written into the iclog first.
71e330b593905e Dave Chinner 2010-05-21 1023 */
877cf3473914ae Dave Chinner 2021-06-04 1024 xlog_cil_build_trans_hdr(ctx, &thdr, &lvhdr, num_iovecs);
66fc9ffa8638be Dave Chinner 2021-06-04 1025 num_bytes += lvhdr.lv_bytes;
a47518453bf958 Dave Chinner 2021-06-08 1026 list_add(&lvhdr.lv_list, &ctx->lv_chain);
71e330b593905e Dave Chinner 2010-05-21 1027
0279bbbbc03f2c Dave Chinner 2021-06-03 1028 /*
0279bbbbc03f2c Dave Chinner 2021-06-03 1029 * Before we format and submit the first iclog, we have to ensure that
0279bbbbc03f2c Dave Chinner 2021-06-03 1030 * the metadata writeback ordering cache flush is complete.
0279bbbbc03f2c Dave Chinner 2021-06-03 1031 */
0279bbbbc03f2c Dave Chinner 2021-06-03 1032 wait_for_completion(&bdev_flush);
0279bbbbc03f2c Dave Chinner 2021-06-03 1033
877cf3473914ae Dave Chinner 2021-06-04 1034 /*
877cf3473914ae Dave Chinner 2021-06-04 1035 * The LSN we need to pass to the log items on transaction commit is the
877cf3473914ae Dave Chinner 2021-06-04 1036 * LSN reported by the first log vector write, not the commit lsn. If we
877cf3473914ae Dave Chinner 2021-06-04 1037 * use the commit record lsn then we can move the tail beyond the grant
877cf3473914ae Dave Chinner 2021-06-04 1038 * write head.
877cf3473914ae Dave Chinner 2021-06-04 1039 */
fc3370002b56bc Dave Chinner 2021-06-17 1040 error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
a47518453bf958 Dave Chinner 2021-06-08 1041 NULL, num_bytes);
a47518453bf958 Dave Chinner 2021-06-08 1042
a47518453bf958 Dave Chinner 2021-06-08 1043 /*
a47518453bf958 Dave Chinner 2021-06-08 1044 * Take the lvhdr back off the lv_chain as it should not be passed
a47518453bf958 Dave Chinner 2021-06-08 1045 * to log IO completion.
a47518453bf958 Dave Chinner 2021-06-08 1046 */
a47518453bf958 Dave Chinner 2021-06-08 1047 list_del(&lvhdr.lv_list);
71e330b593905e Dave Chinner 2010-05-21 1048 if (error)
7db37c5e6575b2 Dave Chinner 2011-01-27 1049 goto out_abort_free_ticket;
71e330b593905e Dave Chinner 2010-05-21 1050
71e330b593905e Dave Chinner 2010-05-21 1051 /*
71e330b593905e Dave Chinner 2010-05-21 1052 * now that we've written the checkpoint into the log, strictly
71e330b593905e Dave Chinner 2010-05-21 1053 * order the commit records so replay will get them in the right order.
71e330b593905e Dave Chinner 2010-05-21 1054 */
71e330b593905e Dave Chinner 2010-05-21 1055 restart:
4bb928cdb900d0 Dave Chinner 2013-08-12 1056 spin_lock(&cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1057 list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
ac983517ec5941 Dave Chinner 2014-05-07 1058 /*
ac983517ec5941 Dave Chinner 2014-05-07 1059 * Avoid getting stuck in this loop because we were woken by the
ac983517ec5941 Dave Chinner 2014-05-07 1060 * shutdown, but then went back to sleep once already in the
ac983517ec5941 Dave Chinner 2014-05-07 1061 * shutdown state.
ac983517ec5941 Dave Chinner 2014-05-07 1062 */
ac983517ec5941 Dave Chinner 2014-05-07 1063 if (XLOG_FORCED_SHUTDOWN(log)) {
ac983517ec5941 Dave Chinner 2014-05-07 1064 spin_unlock(&cil->xc_push_lock);
ac983517ec5941 Dave Chinner 2014-05-07 1065 goto out_abort_free_ticket;
ac983517ec5941 Dave Chinner 2014-05-07 1066 }
ac983517ec5941 Dave Chinner 2014-05-07 1067
71e330b593905e Dave Chinner 2010-05-21 1068 /*
71e330b593905e Dave Chinner 2010-05-21 1069 * Higher sequences will wait for this one so skip them.
ac983517ec5941 Dave Chinner 2014-05-07 1070 * Don't wait for our own sequence, either.
71e330b593905e Dave Chinner 2010-05-21 1071 */
71e330b593905e Dave Chinner 2010-05-21 1072 if (new_ctx->sequence >= ctx->sequence)
71e330b593905e Dave Chinner 2010-05-21 1073 continue;
71e330b593905e Dave Chinner 2010-05-21 1074 if (!new_ctx->commit_lsn) {
71e330b593905e Dave Chinner 2010-05-21 1075 /*
71e330b593905e Dave Chinner 2010-05-21 1076 * It is still being pushed! Wait for the push to
71e330b593905e Dave Chinner 2010-05-21 1077 * complete, then start again from the beginning.
71e330b593905e Dave Chinner 2010-05-21 1078 */
4bb928cdb900d0 Dave Chinner 2013-08-12 1079 xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1080 goto restart;
71e330b593905e Dave Chinner 2010-05-21 1081 }
71e330b593905e Dave Chinner 2010-05-21 1082 }
4bb928cdb900d0 Dave Chinner 2013-08-12 1083 spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1084
fc3370002b56bc Dave Chinner 2021-06-17 1085 error = xlog_cil_write_commit_record(ctx, &commit_iclog);
dd401770b0ff68 Dave Chinner 2020-03-25 1086 if (error)
dd401770b0ff68 Dave Chinner 2020-03-25 1087 goto out_abort_free_ticket;
dd401770b0ff68 Dave Chinner 2020-03-25 1088
89ae379d564c5d Christoph Hellwig 2019-06-28 1089 spin_lock(&commit_iclog->ic_callback_lock);
1858bb0bec612d Christoph Hellwig 2019-10-14 1090 if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
89ae379d564c5d Christoph Hellwig 2019-06-28 1091 spin_unlock(&commit_iclog->ic_callback_lock);
e469cbe84f4ade Dave Chinner 2021-06-08 1092 goto out_abort_free_ticket;
89ae379d564c5d Christoph Hellwig 2019-06-28 1093 }
89ae379d564c5d Christoph Hellwig 2019-06-28 1094 ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
89ae379d564c5d Christoph Hellwig 2019-06-28 1095 commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
89ae379d564c5d Christoph Hellwig 2019-06-28 1096 list_add_tail(&ctx->iclog_entry, &commit_iclog->ic_callbacks);
89ae379d564c5d Christoph Hellwig 2019-06-28 1097 spin_unlock(&commit_iclog->ic_callback_lock);
71e330b593905e Dave Chinner 2010-05-21 1098
71e330b593905e Dave Chinner 2010-05-21 1099 /*
71e330b593905e Dave Chinner 2010-05-21 1100 * now the checkpoint commit is complete and we've attached the
71e330b593905e Dave Chinner 2010-05-21 1101 * callbacks to the iclog we can assign the commit LSN to the context
71e330b593905e Dave Chinner 2010-05-21 1102 * and wake up anyone who is waiting for the commit to complete.
71e330b593905e Dave Chinner 2010-05-21 1103 */
4bb928cdb900d0 Dave Chinner 2013-08-12 1104 spin_lock(&cil->xc_push_lock);
eb40a87500ac2f Dave Chinner 2010-12-21 1105 wake_up_all(&cil->xc_commit_wait);
4bb928cdb900d0 Dave Chinner 2013-08-12 1106 spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1107
e469cbe84f4ade Dave Chinner 2021-06-08 1108 /*
e469cbe84f4ade Dave Chinner 2021-06-08 1109 * Pull the ticket off the ctx so we can ungrant it after releasing the
e469cbe84f4ade Dave Chinner 2021-06-08 1110 * commit_iclog. The ctx may be freed by the time we return from
e469cbe84f4ade Dave Chinner 2021-06-08 1111 * releasing the commit_iclog (i.e. checkpoint has been completed and
e469cbe84f4ade Dave Chinner 2021-06-08 1112 * callback run) so we can't reference the ctx after the call to
e469cbe84f4ade Dave Chinner 2021-06-08 1113 * xlog_state_release_iclog().
e469cbe84f4ade Dave Chinner 2021-06-08 1114 */
e469cbe84f4ade Dave Chinner 2021-06-08 1115 ticket = ctx->ticket;
e469cbe84f4ade Dave Chinner 2021-06-08 1116
5fd9256ce156ef Dave Chinner 2021-06-03 1117 /*
815753dc16bbca Dave Chinner 2021-06-17 1118 * If the checkpoint spans multiple iclogs, wait for all previous iclogs
815753dc16bbca Dave Chinner 2021-06-17 1119 * to complete before we submit the commit_iclog. We can't use state
815753dc16bbca Dave Chinner 2021-06-17 1120 * checks for this - ACTIVE can be either a past completed iclog or a
815753dc16bbca Dave Chinner 2021-06-17 1121 * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a
815753dc16bbca Dave Chinner 2021-06-17 1122 * past or future iclog awaiting IO or ordered IO completion to be run.
815753dc16bbca Dave Chinner 2021-06-17 1123 * In the latter case, if it's a future iclog and we wait on it, then we
815753dc16bbca Dave Chinner 2021-06-17 1124 * will hang because it won't get processed through to ic_force_wait
815753dc16bbca Dave Chinner 2021-06-17 1125 * wakeup until this commit_iclog is written to disk. Hence we use the
815753dc16bbca Dave Chinner 2021-06-17 1126 * iclog header lsn and compare it to the commit lsn to determine if we
815753dc16bbca Dave Chinner 2021-06-17 1127 * need to wait on iclogs or not.
5fd9256ce156ef Dave Chinner 2021-06-03 1128 */
5fd9256ce156ef Dave Chinner 2021-06-03 1129 spin_lock(&log->l_icloglock);
cb1acb3f324636 Dave Chinner 2021-06-04 @1130 if (ctx->start_lsn != commit_lsn) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Never initialized.
815753dc16bbca Dave Chinner 2021-06-17 1131 struct xlog_in_core *iclog;
815753dc16bbca Dave Chinner 2021-06-17 1132
815753dc16bbca Dave Chinner 2021-06-17 1133 for (iclog = commit_iclog->ic_prev;
815753dc16bbca Dave Chinner 2021-06-17 1134 iclog != commit_iclog;
815753dc16bbca Dave Chinner 2021-06-17 1135 iclog = iclog->ic_prev) {
815753dc16bbca Dave Chinner 2021-06-17 1136 xfs_lsn_t hlsn;
815753dc16bbca Dave Chinner 2021-06-17 1137
815753dc16bbca Dave Chinner 2021-06-17 1138 /*
815753dc16bbca Dave Chinner 2021-06-17 1139 * If the LSN of the iclog is zero or in the future it
815753dc16bbca Dave Chinner 2021-06-17 1140 * means it has passed through IO completion and
815753dc16bbca Dave Chinner 2021-06-17 1141 * activation and hence all previous iclogs have also
815753dc16bbca Dave Chinner 2021-06-17 1142 * done so. We do not need to wait at all in this case.
815753dc16bbca Dave Chinner 2021-06-17 1143 */
815753dc16bbca Dave Chinner 2021-06-17 1144 hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
815753dc16bbca Dave Chinner 2021-06-17 1145 if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
815753dc16bbca Dave Chinner 2021-06-17 1146 break;
815753dc16bbca Dave Chinner 2021-06-17 1147
815753dc16bbca Dave Chinner 2021-06-17 1148 /*
815753dc16bbca Dave Chinner 2021-06-17 1149 * If the LSN of the iclog is older than the commit lsn,
815753dc16bbca Dave Chinner 2021-06-17 1150 * we have to wait on it. Waiting on this via the
815753dc16bbca Dave Chinner 2021-06-17 1151 * ic_force_wait should also order the completion of all
815753dc16bbca Dave Chinner 2021-06-17 1152 * older iclogs, too, but we leave checking that to the
815753dc16bbca Dave Chinner 2021-06-17 1153 * next loop iteration.
815753dc16bbca Dave Chinner 2021-06-17 1154 */
815753dc16bbca Dave Chinner 2021-06-17 1155 ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
815753dc16bbca Dave Chinner 2021-06-17 1156 xlog_wait_on_iclog(iclog);
cb1acb3f324636 Dave Chinner 2021-06-04 1157 spin_lock(&log->l_icloglock);
815753dc16bbca Dave Chinner 2021-06-17 1158 }
815753dc16bbca Dave Chinner 2021-06-17 1159
815753dc16bbca Dave Chinner 2021-06-17 1160 /*
815753dc16bbca Dave Chinner 2021-06-17 1161 * Regardless of whether we need to wait or not, the
815753dc16bbca Dave Chinner 2021-06-17 1162 * commit_iclog write needs to issue a pre-flush so that the
815753dc16bbca Dave Chinner 2021-06-17 1163 * ordering for this checkpoint is correctly preserved down to
815753dc16bbca Dave Chinner 2021-06-17 1164 * stable storage.
815753dc16bbca Dave Chinner 2021-06-17 1165 */
cb1acb3f324636 Dave Chinner 2021-06-04 1166 commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
5fd9256ce156ef Dave Chinner 2021-06-03 1167 }
5fd9256ce156ef Dave Chinner 2021-06-03 1168
cb1acb3f324636 Dave Chinner 2021-06-04 1169 /*
cb1acb3f324636 Dave Chinner 2021-06-04 1170 * The commit iclog must be written to stable storage to guarantee
cb1acb3f324636 Dave Chinner 2021-06-04 1171 * journal IO vs metadata writeback IO is correctly ordered on stable
cb1acb3f324636 Dave Chinner 2021-06-04 1172 * storage.
e12213ba5d909a Dave Chinner 2021-06-04 1173 *
e12213ba5d909a Dave Chinner 2021-06-04 1174 * If the push caller needs the commit to be immediately stable and the
e12213ba5d909a Dave Chinner 2021-06-04 1175 * commit_iclog is not yet marked as XLOG_STATE_WANT_SYNC to indicate it
e12213ba5d909a Dave Chinner 2021-06-04 1176 * will be written when released, switch its state to WANT_SYNC right
e12213ba5d909a Dave Chinner 2021-06-04 1177 * now.
cb1acb3f324636 Dave Chinner 2021-06-04 1178 */
cb1acb3f324636 Dave Chinner 2021-06-04 1179 commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
e12213ba5d909a Dave Chinner 2021-06-04 1180 if (push_commit_stable && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
e12213ba5d909a Dave Chinner 2021-06-04 1181 xlog_state_switch_iclogs(log, commit_iclog, 0);
e469cbe84f4ade Dave Chinner 2021-06-08 1182 xlog_state_release_iclog(log, commit_iclog, ticket);
cb1acb3f324636 Dave Chinner 2021-06-04 1183 spin_unlock(&log->l_icloglock);
e469cbe84f4ade Dave Chinner 2021-06-08 1184
e469cbe84f4ade Dave Chinner 2021-06-08 1185 xfs_log_ticket_ungrant(log, ticket);
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 1186 return;
71e330b593905e Dave Chinner 2010-05-21 1187
71e330b593905e Dave Chinner 2010-05-21 1188 out_skip:
71e330b593905e Dave Chinner 2010-05-21 1189 up_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner 2010-05-21 1190 xfs_log_ticket_put(new_ctx->ticket);
71e330b593905e Dave Chinner 2010-05-21 1191 kmem_free(new_ctx);
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 1192 return;
71e330b593905e Dave Chinner 2010-05-21 1193
7db37c5e6575b2 Dave Chinner 2011-01-27 1194 out_abort_free_ticket:
877cf3473914ae Dave Chinner 2021-06-04 1195 xfs_log_ticket_ungrant(log, ctx->ticket);
12e6a0f449d585 Christoph Hellwig 2020-03-20 1196 ASSERT(XLOG_FORCED_SHUTDOWN(log));
12e6a0f449d585 Christoph Hellwig 2020-03-20 1197 xlog_cil_committed(ctx);
4c2d542f2e7865 Dave Chinner 2012-04-23 1198 }
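
A minimal userspace sketch of the pattern flagged at line 1130, offered as an illustration rather than as the actual fix. The listing compares ctx->start_lsn against a local commit_lsn that is never assigned, while the ordering loop already reads new_ctx->commit_lsn from the context. The sketch assumes the commit record LSN is stored in the context by the time the comparison runs, which this listing does not confirm. All names here (lsn_t, struct ckpt_ctx, lsn_cycle, lsn_block, lsn_cmp, ckpt_spans_iclogs) are hypothetical stand-ins, and the cycle/block split reflects my understanding of the XFS LSN layout (cycle in the high 32 bits, block in the low 32 bits).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t lsn_t;	/* hypothetical stand-in for xfs_lsn_t */

/* Hypothetical, cut-down checkpoint context; only the two LSNs matter here. */
struct ckpt_ctx {
	lsn_t	start_lsn;	/* LSN of the iclog holding the start record */
	lsn_t	commit_lsn;	/* LSN of the iclog holding the commit record */
};

/* Split an LSN into its cycle (high 32 bits) and block (low 32 bits). */
static inline uint32_t lsn_cycle(lsn_t lsn)
{
	return (uint32_t)((uint64_t)lsn >> 32);
}

static inline uint32_t lsn_block(lsn_t lsn)
{
	return (uint32_t)lsn;
}

/* Three-way compare: the cycle decides first, then the block. */
static inline int lsn_cmp(lsn_t a, lsn_t b)
{
	if (lsn_cycle(a) != lsn_cycle(b))
		return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
	if (lsn_block(a) != lsn_block(b))
		return lsn_block(a) < lsn_block(b) ? -1 : 1;
	return 0;
}

/*
 * The checkpoint spans more than one iclog when its start and commit
 * records carry different LSNs. Both values are read from the context,
 * so there is no uninitialised local left to compare against.
 */
static inline bool ckpt_spans_iclogs(const struct ckpt_ctx *ctx)
{
	/* A commit record can never be older than its start record. */
	assert(lsn_cmp(ctx->start_lsn, ctx->commit_lsn) <= 0);
	return ctx->start_lsn != ctx->commit_lsn;
}
```

In the listing above, the equivalent change would be to compare against ctx->commit_lsn in both the outer test and the XFS_LSN_CMP() calls inside the loop, or to assign the local commit_lsn from the context before taking l_icloglock. Which of the two is preferable depends on where the commit LSN is actually recorded in this version of the series.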
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/8] xfs: pass a CIL context to xlog_write()
@ 2021-06-28 8:58 ` Dan Carpenter
0 siblings, 0 replies; 50+ messages in thread
From: Dan Carpenter @ 2021-06-28 8:58 UTC (permalink / raw)
To: kbuild-all
Hi Dave,
url: https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-log-fixes-for-for-next/20210617-162640
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: h8300-randconfig-m031-20210625 (attached as .config)
compiler: h8300-linux-gcc (GCC) 9.3.0
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
New smatch warnings:
fs/xfs/xfs_log_cil.c:1130 xlog_cil_push_work() error: uninitialized symbol 'commit_lsn'.
vim +/commit_lsn +1130 fs/xfs/xfs_log_cil.c
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 861 static void
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 862 xlog_cil_push_work(
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 863 struct work_struct *work)
71e330b593905e Dave Chinner 2010-05-21 864 {
facd77e4e38b8f Dave Chinner 2021-06-04 865 struct xfs_cil_ctx *ctx =
facd77e4e38b8f Dave Chinner 2021-06-04 866 container_of(work, struct xfs_cil_ctx, push_work);
facd77e4e38b8f Dave Chinner 2021-06-04 867 struct xfs_cil *cil = ctx->cil;
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 868 struct xlog *log = cil->xc_log;
71e330b593905e Dave Chinner 2010-05-21 869 struct xfs_log_vec *lv;
71e330b593905e Dave Chinner 2010-05-21 870 struct xfs_cil_ctx *new_ctx;
71e330b593905e Dave Chinner 2010-05-21 871 struct xlog_in_core *commit_iclog;
66fc9ffa8638be Dave Chinner 2021-06-04 872 int num_iovecs = 0;
66fc9ffa8638be Dave Chinner 2021-06-04 873 int num_bytes = 0;
71e330b593905e Dave Chinner 2010-05-21 874 int error = 0;
877cf3473914ae Dave Chinner 2021-06-04 875 struct xlog_cil_trans_hdr thdr;
a47518453bf958 Dave Chinner 2021-06-08 876 struct xfs_log_vec lvhdr = {};
71e330b593905e Dave Chinner 2010-05-21 877 xfs_lsn_t commit_lsn;
^^^^^^^^^^
4c2d542f2e7865 Dave Chinner 2012-04-23 878 xfs_lsn_t push_seq;
0279bbbbc03f2c Dave Chinner 2021-06-03 879 struct bio bio;
0279bbbbc03f2c Dave Chinner 2021-06-03 880 DECLARE_COMPLETION_ONSTACK(bdev_flush);
e12213ba5d909a Dave Chinner 2021-06-04 881 bool push_commit_stable;
e469cbe84f4ade Dave Chinner 2021-06-08 882 struct xlog_ticket *ticket;
71e330b593905e Dave Chinner 2010-05-21 883
facd77e4e38b8f Dave Chinner 2021-06-04 884 new_ctx = xlog_cil_ctx_alloc();
71e330b593905e Dave Chinner 2010-05-21 885 new_ctx->ticket = xlog_cil_ticket_alloc(log);
71e330b593905e Dave Chinner 2010-05-21 886
71e330b593905e Dave Chinner 2010-05-21 887 down_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner 2010-05-21 888
4bb928cdb900d0 Dave Chinner 2013-08-12 889 spin_lock(&cil->xc_push_lock);
4c2d542f2e7865 Dave Chinner 2012-04-23 890 push_seq = cil->xc_push_seq;
4c2d542f2e7865 Dave Chinner 2012-04-23 891 ASSERT(push_seq <= ctx->sequence);
e12213ba5d909a Dave Chinner 2021-06-04 892 push_commit_stable = cil->xc_push_commit_stable;
e12213ba5d909a Dave Chinner 2021-06-04 893 cil->xc_push_commit_stable = false;
71e330b593905e Dave Chinner 2010-05-21 894
0e7ab7efe77451 Dave Chinner 2020-03-24 895 /*
3682277520d6f4 Dave Chinner 2021-06-04 896 * As we are about to switch to a new, empty CIL context, we no longer
3682277520d6f4 Dave Chinner 2021-06-04 897 * need to throttle tasks on CIL space overruns. Wake any waiters that
3682277520d6f4 Dave Chinner 2021-06-04 898 * the hard push throttle may have caught so they can start committing
3682277520d6f4 Dave Chinner 2021-06-04 899 * to the new context. The cil->xc_push_lock provides the serialisation
3682277520d6f4 Dave Chinner 2021-06-04 900 * necessary for safely using the lockless waitqueue_active() check in
3682277520d6f4 Dave Chinner 2021-06-04 901 * this context.
3682277520d6f4 Dave Chinner 2021-06-04 902 */
3682277520d6f4 Dave Chinner 2021-06-04 903 if (waitqueue_active(&cil->xc_push_wait))
c7f87f3984cfa1 Dave Chinner 2020-06-16 904 wake_up_all(&cil->xc_push_wait);
0e7ab7efe77451 Dave Chinner 2020-03-24 905
4c2d542f2e7865 Dave Chinner 2012-04-23 906 /*
4c2d542f2e7865 Dave Chinner 2012-04-23 907 * Check if we've anything to push. If there is nothing, then we don't
4c2d542f2e7865 Dave Chinner 2012-04-23 908 * move on to a new sequence number and so we have to be able to push
4c2d542f2e7865 Dave Chinner 2012-04-23 909 * this sequence again later.
4c2d542f2e7865 Dave Chinner 2012-04-23 910 */
0d11bae4bcf4aa Dave Chinner 2021-06-04 911 if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) {
4c2d542f2e7865 Dave Chinner 2012-04-23 912 cil->xc_push_seq = 0;
4bb928cdb900d0 Dave Chinner 2013-08-12 913 spin_unlock(&cil->xc_push_lock);
a44f13edf0ebb4 Dave Chinner 2010-08-24 914 goto out_skip;
4c2d542f2e7865 Dave Chinner 2012-04-23 915 }
4c2d542f2e7865 Dave Chinner 2012-04-23 916
a44f13edf0ebb4 Dave Chinner 2010-08-24 917
cf085a1b5d2214 Joe Perches 2019-11-07 918 /* check for a previously pushed sequence */
facd77e4e38b8f Dave Chinner 2021-06-04 919 if (push_seq < ctx->sequence) {
8af3dcd3c89aef Dave Chinner 2014-09-23 920 spin_unlock(&cil->xc_push_lock);
df806158b0f6eb Dave Chinner 2010-05-17 921 goto out_skip;
8af3dcd3c89aef Dave Chinner 2014-09-23 922 }
8af3dcd3c89aef Dave Chinner 2014-09-23 923
8af3dcd3c89aef Dave Chinner 2014-09-23 924 /*
8af3dcd3c89aef Dave Chinner 2014-09-23 925 * We are now going to push this context, so add it to the committing
8af3dcd3c89aef Dave Chinner 2014-09-23 926 * list before we do anything else. This ensures that anyone waiting on
8af3dcd3c89aef Dave Chinner 2014-09-23 927 * this push can easily detect the difference between a "push in
8af3dcd3c89aef Dave Chinner 2014-09-23 928 * progress" and "CIL is empty, nothing to do".
8af3dcd3c89aef Dave Chinner 2014-09-23 929 *
8af3dcd3c89aef Dave Chinner 2014-09-23 930 * IOWs, a wait loop can now check for:
8af3dcd3c89aef Dave Chinner 2014-09-23 931 * the current sequence not being found on the committing list;
8af3dcd3c89aef Dave Chinner 2014-09-23 932 * an empty CIL; and
8af3dcd3c89aef Dave Chinner 2014-09-23 933 * an unchanged sequence number
8af3dcd3c89aef Dave Chinner 2014-09-23 934 * to detect a push that had nothing to do and therefore does not need
8af3dcd3c89aef Dave Chinner 2014-09-23 935 * waiting on. If the CIL is not empty, we get put on the committing
8af3dcd3c89aef Dave Chinner 2014-09-23 936 * list before emptying the CIL and bumping the sequence number. Hence
8af3dcd3c89aef Dave Chinner 2014-09-23 937 * an empty CIL and an unchanged sequence number means we jumped out
8af3dcd3c89aef Dave Chinner 2014-09-23 938 * above after doing nothing.
8af3dcd3c89aef Dave Chinner 2014-09-23 939 *
8af3dcd3c89aef Dave Chinner 2014-09-23 940 * Hence the waiter will either find the commit sequence on the
8af3dcd3c89aef Dave Chinner 2014-09-23 941 * committing list or the sequence number will be unchanged and the CIL
8af3dcd3c89aef Dave Chinner 2014-09-23 942 * still dirty. In that latter case, the push has not yet started, and
8af3dcd3c89aef Dave Chinner 2014-09-23 943 * so the waiter will have to continue trying to check the CIL
8af3dcd3c89aef Dave Chinner 2014-09-23 944 * committing list until it is found. In extreme cases of delay, the
8af3dcd3c89aef Dave Chinner 2014-09-23 945 * sequence may fully commit between the attempts the wait makes to wait
8af3dcd3c89aef Dave Chinner 2014-09-23 946 * on the commit sequence.
8af3dcd3c89aef Dave Chinner 2014-09-23 947 */
8af3dcd3c89aef Dave Chinner 2014-09-23 948 list_add(&ctx->committing, &cil->xc_committing);
8af3dcd3c89aef Dave Chinner 2014-09-23 949 spin_unlock(&cil->xc_push_lock);
df806158b0f6eb Dave Chinner 2010-05-17 950
71e330b593905e Dave Chinner 2010-05-21 951 /*
0279bbbbc03f2c Dave Chinner 2021-06-03 952 * The CIL is stable at this point - nothing new will be added to it
0279bbbbc03f2c Dave Chinner 2021-06-03 953 * because we hold the flush lock exclusively. Hence we can now issue
0279bbbbc03f2c Dave Chinner 2021-06-03 954 * a cache flush to ensure all the completed metadata in the journal we
0279bbbbc03f2c Dave Chinner 2021-06-03 955 * are about to overwrite is on stable storage.
0279bbbbc03f2c Dave Chinner 2021-06-03 956 */
0279bbbbc03f2c Dave Chinner 2021-06-03 957 xfs_flush_bdev_async(&bio, log->l_mp->m_ddev_targp->bt_bdev,
0279bbbbc03f2c Dave Chinner 2021-06-03 958 &bdev_flush);
0279bbbbc03f2c Dave Chinner 2021-06-03 959
a8613836d99e62 Dave Chinner 2021-06-08 960 xlog_cil_pcp_aggregate(cil, ctx);
a8613836d99e62 Dave Chinner 2021-06-08 961
1f18c0c4b78cfb Dave Chinner 2021-06-08 962 while (!list_empty(&ctx->log_items)) {
71e330b593905e Dave Chinner 2010-05-21 963 struct xfs_log_item *item;
71e330b593905e Dave Chinner 2010-05-21 964
1f18c0c4b78cfb Dave Chinner 2021-06-08 965 item = list_first_entry(&ctx->log_items,
71e330b593905e Dave Chinner 2010-05-21 966 struct xfs_log_item, li_cil);
a47518453bf958 Dave Chinner 2021-06-08 967 lv = item->li_lv;
a1785f597c8b06 Dave Chinner 2021-06-08 968 lv->lv_order_id = item->li_order_id;
a47518453bf958 Dave Chinner 2021-06-08 969 num_iovecs += lv->lv_niovecs;
66fc9ffa8638be Dave Chinner 2021-06-04 970 /* we don't write ordered log vectors */
66fc9ffa8638be Dave Chinner 2021-06-04 971 if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
66fc9ffa8638be Dave Chinner 2021-06-04 972 num_bytes += lv->lv_bytes;
a47518453bf958 Dave Chinner 2021-06-08 973
a47518453bf958 Dave Chinner 2021-06-08 974 list_add_tail(&lv->lv_list, &ctx->lv_chain);
a1785f597c8b06 Dave Chinner 2021-06-08 975 list_del_init(&item->li_cil);
a1785f597c8b06 Dave Chinner 2021-06-08 976 item->li_order_id = 0;
a1785f597c8b06 Dave Chinner 2021-06-08 977 item->li_lv = NULL;
71e330b593905e Dave Chinner 2010-05-21 978 }
71e330b593905e Dave Chinner 2010-05-21 979
71e330b593905e Dave Chinner 2010-05-21 980 /*
facd77e4e38b8f Dave Chinner 2021-06-04 981 * Switch the contexts so we can drop the context lock and move out
71e330b593905e Dave Chinner 2010-05-21 982 * of a shared context. We can't just go straight to the commit record,
71e330b593905e Dave Chinner 2010-05-21 983 * though - we need to synchronise with previous and future commits so
71e330b593905e Dave Chinner 2010-05-21 984 * that the commit records are correctly ordered in the log to ensure
71e330b593905e Dave Chinner 2010-05-21 985 * that we process items during log IO completion in the correct order.
71e330b593905e Dave Chinner 2010-05-21 986 *
71e330b593905e Dave Chinner 2010-05-21 987 * For example, if we get an EFI in one checkpoint and the EFD in the
71e330b593905e Dave Chinner 2010-05-21 988 * next (e.g. due to log forces), we do not want the checkpoint with
71e330b593905e Dave Chinner 2010-05-21 989 * the EFD to be committed before the checkpoint with the EFI. Hence
71e330b593905e Dave Chinner 2010-05-21 990 * we must strictly order the commit records of the checkpoints so
71e330b593905e Dave Chinner 2010-05-21 991 * that: a) the checkpoint callbacks are attached to the iclogs in the
71e330b593905e Dave Chinner 2010-05-21 992 * correct order; and b) the checkpoints are replayed in correct order
71e330b593905e Dave Chinner 2010-05-21 993 * in log recovery.
71e330b593905e Dave Chinner 2010-05-21 994 *
71e330b593905e Dave Chinner 2010-05-21 995 * Hence we need to add this context to the committing context list so
71e330b593905e Dave Chinner 2010-05-21 996 * that higher sequences will wait for us to write out a commit record
71e330b593905e Dave Chinner 2010-05-21 997 * before they do.
f876e44603ad09 Dave Chinner 2014-02-27 998 *
f39ae5297c5ce2 Dave Chinner 2021-06-04 999 * xfs_log_force_seq requires us to mirror the new sequence into the cil
f876e44603ad09 Dave Chinner 2014-02-27 1000 * structure atomically with the addition of this sequence to the
f876e44603ad09 Dave Chinner 2014-02-27 1001 * committing list. This also ensures that we can do unlocked checks
f876e44603ad09 Dave Chinner 2014-02-27 1002 * against the current sequence in log forces without risking
f876e44603ad09 Dave Chinner 2014-02-27 1003 * dereferencing a freed context pointer.
71e330b593905e Dave Chinner 2010-05-21 1004 */
4bb928cdb900d0 Dave Chinner 2013-08-12 1005 spin_lock(&cil->xc_push_lock);
facd77e4e38b8f Dave Chinner 2021-06-04 1006 xlog_cil_ctx_switch(cil, new_ctx);
4bb928cdb900d0 Dave Chinner 2013-08-12 1007 spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1008 up_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner 2010-05-21 1009
a1785f597c8b06 Dave Chinner 2021-06-08 1010 /*
a1785f597c8b06 Dave Chinner 2021-06-08 1011 * Sort the log vector chain before we add the transaction headers.
a1785f597c8b06 Dave Chinner 2021-06-08 1012 * This ensures we always have the transaction headers at the start
a1785f597c8b06 Dave Chinner 2021-06-08 1013 * of the chain.
a1785f597c8b06 Dave Chinner 2021-06-08 1014 */
a1785f597c8b06 Dave Chinner 2021-06-08 1015 list_sort(NULL, &ctx->lv_chain, xlog_cil_order_cmp);
a1785f597c8b06 Dave Chinner 2021-06-08 1016
71e330b593905e Dave Chinner 2010-05-21 1017 /*
71e330b593905e Dave Chinner 2010-05-21 1018 * Build a checkpoint transaction header and write it to the log to
71e330b593905e Dave Chinner 2010-05-21 1019 * begin the transaction. We need to account for the space used by the
71e330b593905e Dave Chinner 2010-05-21 1020 * transaction header here as it is not accounted for in xlog_write().
a47518453bf958 Dave Chinner 2021-06-08 1021 * Add the lvhdr to the head of the lv chain we pass to xlog_write() so
a47518453bf958 Dave Chinner 2021-06-08 1022 * it gets written into the iclog first.
71e330b593905e Dave Chinner 2010-05-21 1023 */
877cf3473914ae Dave Chinner 2021-06-04 1024 xlog_cil_build_trans_hdr(ctx, &thdr, &lvhdr, num_iovecs);
66fc9ffa8638be Dave Chinner 2021-06-04 1025 num_bytes += lvhdr.lv_bytes;
a47518453bf958 Dave Chinner 2021-06-08 1026 list_add(&lvhdr.lv_list, &ctx->lv_chain);
71e330b593905e Dave Chinner 2010-05-21 1027
0279bbbbc03f2c Dave Chinner 2021-06-03 1028 /*
0279bbbbc03f2c Dave Chinner 2021-06-03 1029 * Before we format and submit the first iclog, we have to ensure that
0279bbbbc03f2c Dave Chinner 2021-06-03 1030 * the metadata writeback ordering cache flush is complete.
0279bbbbc03f2c Dave Chinner 2021-06-03 1031 */
0279bbbbc03f2c Dave Chinner 2021-06-03 1032 wait_for_completion(&bdev_flush);
0279bbbbc03f2c Dave Chinner 2021-06-03 1033
877cf3473914ae Dave Chinner 2021-06-04 1034 /*
877cf3473914ae Dave Chinner 2021-06-04 1035 * The LSN we need to pass to the log items on transaction commit is the
877cf3473914ae Dave Chinner 2021-06-04 1036 * LSN reported by the first log vector write, not the commit lsn. If we
877cf3473914ae Dave Chinner 2021-06-04 1037 * use the commit record lsn then we can move the tail beyond the grant
877cf3473914ae Dave Chinner 2021-06-04 1038 * write head.
877cf3473914ae Dave Chinner 2021-06-04 1039 */
fc3370002b56bc Dave Chinner 2021-06-17 1040 error = xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket,
a47518453bf958 Dave Chinner 2021-06-08 1041 NULL, num_bytes);
a47518453bf958 Dave Chinner 2021-06-08 1042
a47518453bf958 Dave Chinner 2021-06-08 1043 /*
a47518453bf958 Dave Chinner 2021-06-08 1044 * Take the lvhdr back off the lv_chain as it should not be passed
a47518453bf958 Dave Chinner 2021-06-08 1045 * to log IO completion.
a47518453bf958 Dave Chinner 2021-06-08 1046 */
a47518453bf958 Dave Chinner 2021-06-08 1047 list_del(&lvhdr.lv_list);
71e330b593905e Dave Chinner 2010-05-21 1048 if (error)
7db37c5e6575b2 Dave Chinner 2011-01-27 1049 goto out_abort_free_ticket;
71e330b593905e Dave Chinner 2010-05-21 1050
71e330b593905e Dave Chinner 2010-05-21 1051 /*
71e330b593905e Dave Chinner 2010-05-21 1052 * now that we've written the checkpoint into the log, strictly
71e330b593905e Dave Chinner 2010-05-21 1053 * order the commit records so replay will get them in the right order.
71e330b593905e Dave Chinner 2010-05-21 1054 */
71e330b593905e Dave Chinner 2010-05-21 1055 restart:
4bb928cdb900d0 Dave Chinner 2013-08-12 1056 spin_lock(&cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1057 list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
ac983517ec5941 Dave Chinner 2014-05-07 1058 /*
ac983517ec5941 Dave Chinner 2014-05-07 1059 * Avoid getting stuck in this loop because we were woken by the
ac983517ec5941 Dave Chinner 2014-05-07 1060 * shutdown, but then went back to sleep once already in the
ac983517ec5941 Dave Chinner 2014-05-07 1061 * shutdown state.
ac983517ec5941 Dave Chinner 2014-05-07 1062 */
ac983517ec5941 Dave Chinner 2014-05-07 1063 if (XLOG_FORCED_SHUTDOWN(log)) {
ac983517ec5941 Dave Chinner 2014-05-07 1064 spin_unlock(&cil->xc_push_lock);
ac983517ec5941 Dave Chinner 2014-05-07 1065 goto out_abort_free_ticket;
ac983517ec5941 Dave Chinner 2014-05-07 1066 }
ac983517ec5941 Dave Chinner 2014-05-07 1067
71e330b593905e Dave Chinner 2010-05-21 1068 /*
71e330b593905e Dave Chinner 2010-05-21 1069 * Higher sequences will wait for this one so skip them.
ac983517ec5941 Dave Chinner 2014-05-07 1070 * Don't wait for our own sequence, either.
71e330b593905e Dave Chinner 2010-05-21 1071 */
71e330b593905e Dave Chinner 2010-05-21 1072 if (new_ctx->sequence >= ctx->sequence)
71e330b593905e Dave Chinner 2010-05-21 1073 continue;
71e330b593905e Dave Chinner 2010-05-21 1074 if (!new_ctx->commit_lsn) {
71e330b593905e Dave Chinner 2010-05-21 1075 /*
71e330b593905e Dave Chinner 2010-05-21 1076 * It is still being pushed! Wait for the push to
71e330b593905e Dave Chinner 2010-05-21 1077 * complete, then start again from the beginning.
71e330b593905e Dave Chinner 2010-05-21 1078 */
4bb928cdb900d0 Dave Chinner 2013-08-12 1079 xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1080 goto restart;
71e330b593905e Dave Chinner 2010-05-21 1081 }
71e330b593905e Dave Chinner 2010-05-21 1082 }
4bb928cdb900d0 Dave Chinner 2013-08-12 1083 spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1084
fc3370002b56bc Dave Chinner 2021-06-17 1085 error = xlog_cil_write_commit_record(ctx, &commit_iclog);
dd401770b0ff68 Dave Chinner 2020-03-25 1086 if (error)
dd401770b0ff68 Dave Chinner 2020-03-25 1087 goto out_abort_free_ticket;
dd401770b0ff68 Dave Chinner 2020-03-25 1088
89ae379d564c5d Christoph Hellwig 2019-06-28 1089 spin_lock(&commit_iclog->ic_callback_lock);
1858bb0bec612d Christoph Hellwig 2019-10-14 1090 if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
89ae379d564c5d Christoph Hellwig 2019-06-28 1091 spin_unlock(&commit_iclog->ic_callback_lock);
e469cbe84f4ade Dave Chinner 2021-06-08 1092 goto out_abort_free_ticket;
89ae379d564c5d Christoph Hellwig 2019-06-28 1093 }
89ae379d564c5d Christoph Hellwig 2019-06-28 1094 ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
89ae379d564c5d Christoph Hellwig 2019-06-28 1095 commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
89ae379d564c5d Christoph Hellwig 2019-06-28 1096 list_add_tail(&ctx->iclog_entry, &commit_iclog->ic_callbacks);
89ae379d564c5d Christoph Hellwig 2019-06-28 1097 spin_unlock(&commit_iclog->ic_callback_lock);
71e330b593905e Dave Chinner 2010-05-21 1098
71e330b593905e Dave Chinner 2010-05-21 1099 /*
71e330b593905e Dave Chinner 2010-05-21 1100 * now the checkpoint commit is complete and we've attached the
71e330b593905e Dave Chinner 2010-05-21 1101 * callbacks to the iclog we can assign the commit LSN to the context
71e330b593905e Dave Chinner 2010-05-21 1102 * and wake up anyone who is waiting for the commit to complete.
71e330b593905e Dave Chinner 2010-05-21 1103 */
4bb928cdb900d0 Dave Chinner 2013-08-12 1104 spin_lock(&cil->xc_push_lock);
eb40a87500ac2f Dave Chinner 2010-12-21 1105 wake_up_all(&cil->xc_commit_wait);
4bb928cdb900d0 Dave Chinner 2013-08-12 1106 spin_unlock(&cil->xc_push_lock);
71e330b593905e Dave Chinner 2010-05-21 1107
e469cbe84f4ade Dave Chinner 2021-06-08 1108 /*
e469cbe84f4ade Dave Chinner 2021-06-08 1109 * Pull the ticket off the ctx so we can ungrant it after releasing the
e469cbe84f4ade Dave Chinner 2021-06-08 1110 * commit_iclog. The ctx may be freed by the time we return from
e469cbe84f4ade Dave Chinner 2021-06-08 1111 * releasing the commit_iclog (i.e. checkpoint has been completed and
e469cbe84f4ade Dave Chinner 2021-06-08 1112 * callback run) so we can't reference the ctx after the call to
e469cbe84f4ade Dave Chinner 2021-06-08 1113 * xlog_state_release_iclog().
e469cbe84f4ade Dave Chinner 2021-06-08 1114 */
e469cbe84f4ade Dave Chinner 2021-06-08 1115 ticket = ctx->ticket;
e469cbe84f4ade Dave Chinner 2021-06-08 1116
5fd9256ce156ef Dave Chinner 2021-06-03 1117 /*
815753dc16bbca Dave Chinner 2021-06-17 1118 * If the checkpoint spans multiple iclogs, wait for all previous iclogs
815753dc16bbca Dave Chinner 2021-06-17 1119 * to complete before we submit the commit_iclog. We can't use state
815753dc16bbca Dave Chinner 2021-06-17 1120 * checks for this - ACTIVE can be either a past completed iclog or a
815753dc16bbca Dave Chinner 2021-06-17 1121 * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a
815753dc16bbca Dave Chinner 2021-06-17 1122 * past or future iclog awaiting IO or ordered IO completion to be run.
815753dc16bbca Dave Chinner 2021-06-17 1123 * In the latter case, if it's a future iclog and we wait on it, then we
815753dc16bbca Dave Chinner 2021-06-17 1124 * will hang because it won't get processed through to ic_force_wait
815753dc16bbca Dave Chinner 2021-06-17 1125 * wakeup until this commit_iclog is written to disk. Hence we use the
815753dc16bbca Dave Chinner 2021-06-17 1126 * iclog header lsn and compare it to the commit lsn to determine if we
815753dc16bbca Dave Chinner 2021-06-17 1127 * need to wait on iclogs or not.
5fd9256ce156ef Dave Chinner 2021-06-03 1128 */
5fd9256ce156ef Dave Chinner 2021-06-03 1129 spin_lock(&log->l_icloglock);
cb1acb3f324636 Dave Chinner 2021-06-04 @1130 if (ctx->start_lsn != commit_lsn) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
commit_lsn is never initialized.
815753dc16bbca Dave Chinner 2021-06-17 1131 struct xlog_in_core *iclog;
815753dc16bbca Dave Chinner 2021-06-17 1132
815753dc16bbca Dave Chinner 2021-06-17 1133 for (iclog = commit_iclog->ic_prev;
815753dc16bbca Dave Chinner 2021-06-17 1134 iclog != commit_iclog;
815753dc16bbca Dave Chinner 2021-06-17 1135 iclog = iclog->ic_prev) {
815753dc16bbca Dave Chinner 2021-06-17 1136 xfs_lsn_t hlsn;
815753dc16bbca Dave Chinner 2021-06-17 1137
815753dc16bbca Dave Chinner 2021-06-17 1138 /*
815753dc16bbca Dave Chinner 2021-06-17 1139 * If the LSN of the iclog is zero or in the future it
815753dc16bbca Dave Chinner 2021-06-17 1140 * means it has passed through IO completion and
815753dc16bbca Dave Chinner 2021-06-17 1141 * activation and hence all previous iclogs have also
815753dc16bbca Dave Chinner 2021-06-17 1142 * done so. We do not need to wait at all in this case.
815753dc16bbca Dave Chinner 2021-06-17 1143 */
815753dc16bbca Dave Chinner 2021-06-17 1144 hlsn = be64_to_cpu(iclog->ic_header.h_lsn);
815753dc16bbca Dave Chinner 2021-06-17 1145 if (!hlsn || XFS_LSN_CMP(hlsn, commit_lsn) > 0)
815753dc16bbca Dave Chinner 2021-06-17 1146 break;
815753dc16bbca Dave Chinner 2021-06-17 1147
815753dc16bbca Dave Chinner 2021-06-17 1148 /*
815753dc16bbca Dave Chinner 2021-06-17 1149 * If the LSN of the iclog is older than the commit lsn,
815753dc16bbca Dave Chinner 2021-06-17 1150 * we have to wait on it. Waiting on this via the
815753dc16bbca Dave Chinner 2021-06-17 1151 * ic_force_wait should also order the completion of all
815753dc16bbca Dave Chinner 2021-06-17 1152 * older iclogs, too, but we leave checking that to the
815753dc16bbca Dave Chinner 2021-06-17 1153 * next loop iteration.
815753dc16bbca Dave Chinner 2021-06-17 1154 */
815753dc16bbca Dave Chinner 2021-06-17 1155 ASSERT(XFS_LSN_CMP(hlsn, commit_lsn) < 0);
815753dc16bbca Dave Chinner 2021-06-17 1156 xlog_wait_on_iclog(iclog);
cb1acb3f324636 Dave Chinner 2021-06-04 1157 spin_lock(&log->l_icloglock);
815753dc16bbca Dave Chinner 2021-06-17 1158 }
815753dc16bbca Dave Chinner 2021-06-17 1159
815753dc16bbca Dave Chinner 2021-06-17 1160 /*
815753dc16bbca Dave Chinner 2021-06-17 1161 * Regardless of whether we need to wait or not, the
815753dc16bbca Dave Chinner 2021-06-17 1162 * commit_iclog write needs to issue a pre-flush so that the
815753dc16bbca Dave Chinner 2021-06-17 1163 * ordering for this checkpoint is correctly preserved down to
815753dc16bbca Dave Chinner 2021-06-17 1164 * stable storage.
815753dc16bbca Dave Chinner 2021-06-17 1165 */
cb1acb3f324636 Dave Chinner 2021-06-04 1166 commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
5fd9256ce156ef Dave Chinner 2021-06-03 1167 }
5fd9256ce156ef Dave Chinner 2021-06-03 1168
cb1acb3f324636 Dave Chinner 2021-06-04 1169 /*
cb1acb3f324636 Dave Chinner 2021-06-04 1170 * The commit iclog must be written to stable storage to guarantee
cb1acb3f324636 Dave Chinner 2021-06-04 1171 * journal IO vs metadata writeback IO is correctly ordered on stable
cb1acb3f324636 Dave Chinner 2021-06-04 1172 * storage.
e12213ba5d909a Dave Chinner 2021-06-04 1173 *
e12213ba5d909a Dave Chinner 2021-06-04 1174 * If the push caller needs the commit to be immediately stable and the
e12213ba5d909a Dave Chinner 2021-06-04 1175 * commit_iclog is not yet marked as XLOG_STATE_WANT_SYNC to indicate it
e12213ba5d909a Dave Chinner 2021-06-04 1176 * will be written when released, switch its state to WANT_SYNC right
e12213ba5d909a Dave Chinner 2021-06-04 1177 * now.
cb1acb3f324636 Dave Chinner 2021-06-04 1178 */
cb1acb3f324636 Dave Chinner 2021-06-04 1179 commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA;
e12213ba5d909a Dave Chinner 2021-06-04 1180 if (push_commit_stable && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
e12213ba5d909a Dave Chinner 2021-06-04 1181 xlog_state_switch_iclogs(log, commit_iclog, 0);
e469cbe84f4ade Dave Chinner 2021-06-08 1182 xlog_state_release_iclog(log, commit_iclog, ticket);
cb1acb3f324636 Dave Chinner 2021-06-04 1183 spin_unlock(&log->l_icloglock);
e469cbe84f4ade Dave Chinner 2021-06-08 1184
e469cbe84f4ade Dave Chinner 2021-06-08 1185 xfs_log_ticket_ungrant(log, ticket);
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 1186 return;
71e330b593905e Dave Chinner 2010-05-21 1187
71e330b593905e Dave Chinner 2010-05-21 1188 out_skip:
71e330b593905e Dave Chinner 2010-05-21 1189 up_write(&cil->xc_ctx_lock);
71e330b593905e Dave Chinner 2010-05-21 1190 xfs_log_ticket_put(new_ctx->ticket);
71e330b593905e Dave Chinner 2010-05-21 1191 kmem_free(new_ctx);
c7cc296ddd1f6d Christoph Hellwig 2020-03-20 1192 return;
71e330b593905e Dave Chinner 2010-05-21 1193
7db37c5e6575b2 Dave Chinner 2011-01-27 1194 out_abort_free_ticket:
877cf3473914ae Dave Chinner 2021-06-04 1195 xfs_log_ticket_ungrant(log, ctx->ticket);
12e6a0f449d585 Christoph Hellwig 2020-03-20 1196 ASSERT(XLOG_FORCED_SHUTDOWN(log));
12e6a0f449d585 Christoph Hellwig 2020-03-20 1197 xlog_cil_committed(ctx);
4c2d542f2e7865 Dave Chinner 2012-04-23 1198 }
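
The wait-or-skip decision in the loop at lines 1133-1158 depends entirely on comparing each iclog header LSN against the commit LSN, which is why an uninitialized commit_lsn matters there. Below is a minimal userspace sketch of that decision, not the kernel code. It assumes an LSN packs the cycle number in the high 32 bits and the block number in the low 32 bits; the names model_lsn_t, model_lsn_pack(), model_lsn_cmp() and model_must_wait_on_iclog() are hypothetical stand-ins for xfs_lsn_t, XFS_LSN_CMP() and the loop body.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for xfs_lsn_t. Assumption: cycle number in the
 * high 32 bits, block number in the low 32 bits.
 */
typedef int64_t model_lsn_t;

/* Build an LSN from a cycle and a block number. */
static inline model_lsn_t model_lsn_pack(uint32_t cycle, uint32_t block)
{
	return (model_lsn_t)(((uint64_t)cycle << 32) | block);
}

/*
 * Three-way compare in the spirit of XFS_LSN_CMP(): order by cycle first,
 * then by block. Returns -1, 0 or 1.
 */
static inline int model_lsn_cmp(model_lsn_t a, model_lsn_t b)
{
	uint32_t cycle_a = (uint32_t)((uint64_t)a >> 32);
	uint32_t cycle_b = (uint32_t)((uint64_t)b >> 32);
	uint32_t block_a = (uint32_t)a;
	uint32_t block_b = (uint32_t)b;

	if (cycle_a != cycle_b)
		return cycle_a < cycle_b ? -1 : 1;
	if (block_a != block_b)
		return block_a < block_b ? -1 : 1;
	return 0;
}

/*
 * Mirrors the loop's break condition: a zero header LSN, or one newer than
 * the commit LSN, means the iclog has already been through IO completion
 * and reactivation, so there is nothing to wait for. Anything else is
 * treated as an older iclog that must be waited on. (The quoted code
 * additionally asserts that the header LSN is strictly older.)
 */
static inline bool model_must_wait_on_iclog(model_lsn_t hlsn,
					    model_lsn_t commit_lsn)
{
	if (!hlsn || model_lsn_cmp(hlsn, commit_lsn) > 0)
		return false;
	return true;
}
```

With a garbage commit_lsn, model_must_wait_on_iclog() can return either answer for the same iclog, which is the model's equivalent of either skipping a required wait or waiting on a future iclog.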
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org