Linux-XFS Archive on lore.kernel.org
 help / color / Atom feed
* [RFC v2 PATCH 0/3] xfs: automatic relogging experiment
@ 2019-11-22 18:19 Brian Foster
  2019-11-22 18:19 ` [RFC v2 PATCH 1/3] xfs: set t_task at wait time instead of alloc time Brian Foster
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Brian Foster @ 2019-11-22 18:19 UTC (permalink / raw)
  To: linux-xfs

Hi all,

Here's a second pass at the automatic relogging experiment. The first
pass tagged the log item for relog and stole reservation from unrelated
transactions to opportunistically relog items, all within the log
subsystem. The primary feedback to that approach was to consider a
transaction function callback and concerns about reservation stealing.
This version somewhat accommodates those concerns, but still leaves some
open questions around broader usage.

In short, this version introduces a separate workqueue context for
committing and rolling transactions for relogged items and a hook to
trigger a relog based on the item being committed to the AIL. The relog
state (including log reservation for the relog) is tracked via the log
item. Only the log ticket from the caller transaction is regranted and
tracked to avoid processing open transactions through separate contexts.
This is essentially an open-coded transaction roll without the need for
a duplicate transaction.

While this seems to work reasonably well for the simple case of
quotaoff, there are still some open issues to resolve around formalizing
such a mechanism for broader use. Firstly, the quotaoff patch just ties
into the existing ->iop_committed() callback since that is currently
unused, but the queue relog call could just as easily be made directly
from the AIL commit code. IOW, the whole callback thing seems kind of
like a solution looking for a problem in this context. With external
tracking of relogged items, all we really need here fundamentally is a
notification that the CIL context has committed.

Another caveat is that this approach seems much more cumbersome and
inefficient with respect to batching relogs of unrelated items. Rather
than collect all unrelated items in a single transaction, we'd have a
separate tracking structure, log ticket (which regrants the caller's
entire reservation) and independent roll/regrant for each potential set
of items. I can see value in the simplicity and flexibility of use in
terms of being able to potentially register an already constructed
transaction for automatic relogging, but I question whether that suits
our use case(s).

All in all, I think this is still pretty raw and needs some more thought
on the design (or perhaps requirements) front. I could see this going
anywhere from something more low level like the previous version where
we could have a relogged items list (RIL) associated with the CIL and
require initial transactions donate real relog reservation to something
more like this approach where we have some separate context
queueing/rolling relog transactions based on log subsystem events. I
suspect there are capability considerations for either end of that
spectrum. For example, it might be harder to auto relog anything but
intents with the RIL approach where there is no transaction to own item
locks, etc., but do we currently have a use case to relog anything
besides QUOTAOFF and EFIs? If not, I lean more toward the more simple,
low level approach (at least for now), but I could be convinced
otherwise.

Thoughts on any of this appreciated.

Brian

rfcv2:
- Different approach based on workqueue and transaction rolling.
rfc: https://lore.kernel.org/linux-xfs/20191024172850.7698-1-bfoster@redhat.com/

Brian Foster (3):
  xfs: set t_task at wait time instead of alloc time
  xfs: prototype automatic intent relogging mechanism
  xfs: automatically relog quotaoff start intent

 fs/xfs/Makefile                |   1 +
 fs/xfs/libxfs/xfs_trans_resv.c |   3 +-
 fs/xfs/xfs_dquot_item.c        |  13 ++++
 fs/xfs/xfs_log.c               |  11 ++-
 fs/xfs/xfs_log_priv.h          |   1 +
 fs/xfs/xfs_qm_syscalls.c       |   9 ++-
 fs/xfs/xfs_trans.c             |   2 +-
 fs/xfs/xfs_trans.h             |  13 ++++
 fs/xfs/xfs_trans_relog.c       | 130 +++++++++++++++++++++++++++++++++
 9 files changed, 179 insertions(+), 4 deletions(-)
 create mode 100644 fs/xfs/xfs_trans_relog.c

-- 
2.20.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC v2 PATCH 1/3] xfs: set t_task at wait time instead of alloc time
  2019-11-22 18:19 [RFC v2 PATCH 0/3] xfs: automatic relogging experiment Brian Foster
@ 2019-11-22 18:19 ` Brian Foster
  2019-11-22 18:19 ` [RFC v2 PATCH 2/3] xfs: prototype automatic intent relogging mechanism Brian Foster
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Brian Foster @ 2019-11-22 18:19 UTC (permalink / raw)
  To: linux-xfs

The xlog_ticket structure contains a task reference to support task
scheduling associated with log reservation acquisition. This
reference is assigned at ticket allocation time, but otherwise there
is no reason log space cannot be reserved for a ticket from a
context different from the allocating context. Move the task
assignment to the log reservation blocking code where it is used.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 6a147c63a8a6..0c0c035c5be0 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -262,6 +262,7 @@ xlog_grant_head_wait(
 	int			need_bytes) __releases(&head->lock)
 					    __acquires(&head->lock)
 {
+	tic->t_task = current;
 	list_add_tail(&tic->t_queue, &head->waiters);
 
 	do {
@@ -3599,7 +3600,6 @@ xlog_ticket_alloc(
 	unit_res = xfs_log_calc_unit_res(log->l_mp, unit_bytes);
 
 	atomic_set(&tic->t_ref, 1);
-	tic->t_task		= current;
 	INIT_LIST_HEAD(&tic->t_queue);
 	tic->t_unit_res		= unit_res;
 	tic->t_curr_res		= unit_res;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC v2 PATCH 2/3] xfs: prototype automatic intent relogging mechanism
  2019-11-22 18:19 [RFC v2 PATCH 0/3] xfs: automatic relogging experiment Brian Foster
  2019-11-22 18:19 ` [RFC v2 PATCH 1/3] xfs: set t_task at wait time instead of alloc time Brian Foster
@ 2019-11-22 18:19 ` Brian Foster
  2019-11-22 18:19 ` [RFC v2 PATCH 3/3] xfs: automatically relog quotaoff start intent Brian Foster
  2019-11-25 18:55 ` [RFC v3 PATCH] xfs: automatic relogging experiment Brian Foster
  3 siblings, 0 replies; 6+ messages in thread
From: Brian Foster @ 2019-11-22 18:19 UTC (permalink / raw)
  To: linux-xfs

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/Makefile          |   1 +
 fs/xfs/xfs_log.c         |   9 +++
 fs/xfs/xfs_log_priv.h    |   1 +
 fs/xfs/xfs_trans.c       |   2 +-
 fs/xfs/xfs_trans.h       |  13 ++++
 fs/xfs/xfs_trans_relog.c | 130 +++++++++++++++++++++++++++++++++++++++
 6 files changed, 155 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/xfs_trans_relog.c

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index aceca2f9a3db..c4664a972e50 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -92,6 +92,7 @@ xfs-y				+= xfs_aops.o \
 				   xfs_symlink.o \
 				   xfs_sysfs.o \
 				   xfs_trans.o \
+				   xfs_trans_relog.o \
 				   xfs_xattr.o \
 				   kmem.o
 
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 0c0c035c5be0..4f4c6b38621a 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1531,12 +1531,20 @@ xlog_alloc_log(
 	if (!log->l_ioend_workqueue)
 		goto out_free_iclog;
 
+	log->l_relog_workqueue = alloc_workqueue("xfs-relog/%s",
+			WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_HIGHPRI, 0,
+			mp->m_super->s_id);
+	if (!log->l_relog_workqueue)
+		goto out_destroy_workqueue;
+
 	error = xlog_cil_init(log);
 	if (error)
 		goto out_destroy_workqueue;
 	return log;
 
 out_destroy_workqueue:
+	if (log->l_relog_workqueue)
+		destroy_workqueue(log->l_relog_workqueue);
 	destroy_workqueue(log->l_ioend_workqueue);
 out_free_iclog:
 	for (iclog = log->l_iclog; iclog; iclog = prev_iclog) {
@@ -2008,6 +2016,7 @@ xlog_dealloc_log(
 	}
 
 	log->l_mp->m_log = NULL;
+	destroy_workqueue(log->l_relog_workqueue);
 	destroy_workqueue(log->l_ioend_workqueue);
 	kmem_free(log);
 }	/* xlog_dealloc_log */
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index b192c5a9f9fd..4d557a82eb73 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -349,6 +349,7 @@ struct xlog {
 	struct xfs_cil		*l_cilp;	/* CIL log is working with */
 	struct xfs_buftarg	*l_targ;        /* buftarg of log */
 	struct workqueue_struct	*l_ioend_workqueue; /* for I/O completions */
+	struct workqueue_struct	*l_relog_workqueue; /* for auto relog */
 	struct delayed_work	l_work;		/* background flush work */
 	uint			l_flags;
 	uint			l_quotaoffs_flag; /* XFS_DQ_*, for QUOTAOFFs */
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 3b208f9a865c..37011fc803fc 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -923,7 +923,7 @@ xfs_trans_committed_bulk(
  * have already been unlocked as if the commit had succeeded.
  * Do not reference the transaction structure after this call.
  */
-static int
+int
 __xfs_trans_commit(
 	struct xfs_trans	*tp,
 	bool			regrant)
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 64d7f171ebd3..066d4aca8416 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -48,6 +48,7 @@ struct xfs_log_item {
 	struct xfs_log_vec		*li_lv;		/* active log vector */
 	struct xfs_log_vec		*li_lv_shadow;	/* standby vector */
 	xfs_lsn_t			li_seq;		/* CIL commit seq */
+	void				*li_priv;
 };
 
 /*
@@ -143,6 +144,14 @@ typedef struct xfs_trans {
 	unsigned long		t_pflags;	/* saved process flags state */
 } xfs_trans_t;
 
+struct xfs_trans_relog {
+	struct work_struct	tr_work;
+	unsigned int		tr_log_res;
+	unsigned int		tr_log_count;
+	struct xlog_ticket	*tr_tic;
+	struct xfs_log_item	*tr_lip;
+};
+
 /*
  * XFS transaction mechanism exported interfaces that are
  * actually macros.
@@ -231,7 +240,11 @@ bool		xfs_trans_buf_is_dirty(struct xfs_buf *bp);
 void		xfs_trans_log_inode(xfs_trans_t *, struct xfs_inode *, uint);
 
 int		xfs_trans_commit(struct xfs_trans *);
+int		__xfs_trans_commit(struct xfs_trans *, bool);
 int		xfs_trans_roll(struct xfs_trans **);
+int		xfs_trans_relog(struct xfs_trans *, struct xfs_log_item *);
+void		xfs_trans_relog_cancel(struct xfs_log_item *);
+void		xfs_trans_relog_queue(struct xfs_log_item *);
 int		xfs_trans_roll_inode(struct xfs_trans **, struct xfs_inode *);
 void		xfs_trans_cancel(xfs_trans_t *);
 int		xfs_trans_ail_init(struct xfs_mount *);
diff --git a/fs/xfs/xfs_trans_relog.c b/fs/xfs/xfs_trans_relog.c
new file mode 100644
index 000000000000..af139aa501f6
--- /dev/null
+++ b/fs/xfs/xfs_trans_relog.c
@@ -0,0 +1,130 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Red Hat, Inc.
+ * All Rights Reserved.
+ */
+#include "xfs.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trans_priv.h"
+#include "xfs_trans_resv.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_mount.h"
+#include "xfs_log_priv.h"
+#include "xfs_log.h"
+
+/*
+ * Helper to commit and regrant the log ticket as if the transaction is being
+ * rolled. The ticket is expected to be held by the caller and can be reused in
+ * a subsequent transaction.
+ */
+static int
+xfs_trans_commit_regrant(
+	struct xfs_trans	*tp)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xlog_ticket	*tic = tp->t_ticket;
+	int			error;
+
+	ASSERT(atomic_read(&tic->t_ref) > 1);
+	error = __xfs_trans_commit(tp, true);
+	if (!error)
+		error = xfs_log_regrant(mp, tic);
+	return error;
+}
+
+static void
+xfs_relog_worker(
+	struct work_struct	*work)
+{
+	struct xfs_trans_relog	*trp = container_of(work,
+					struct xfs_trans_relog, tr_work);
+	struct xfs_log_item	*lip = trp->tr_lip;
+	struct xfs_mount	*mp = lip->li_mountp;
+	struct xfs_trans_res	resv = {};
+	struct xfs_trans	*tp;
+	int			error;
+
+	error = xfs_trans_alloc(mp, &resv, 0, 0, 0, &tp);
+	if (error)
+		return;
+
+	/* associate the caller ticket with our empty transaction */
+	tp->t_log_res = trp->tr_log_res;
+	tp->t_log_count = trp->tr_log_count;
+	tp->t_flags |= XFS_TRANS_PERM_LOG_RES;
+	tp->t_ticket = xfs_log_ticket_get(trp->tr_tic);
+
+	xfs_trans_add_item(tp, lip);
+	tp->t_flags |= XFS_TRANS_DIRTY;
+	set_bit(XFS_LI_DIRTY, &lip->li_flags);
+
+	/* commit the transaction and regrant the ticket for the next time */
+	error = xfs_trans_commit_regrant(tp);
+	ASSERT(!error);
+}
+
+void
+xfs_trans_relog_queue(
+	struct xfs_log_item	*lip)
+{
+	struct xfs_mount	*mp = lip->li_mountp;
+	struct xfs_trans_relog	*trp = lip->li_priv;
+
+	if (!trp)
+		return;
+
+	queue_work(mp->m_log->l_relog_workqueue, &trp->tr_work);
+}
+
+/*
+ * Commit the caller transaction, regrant the ticket and use it for automatic
+ * relogging of the provided log item.
+ */
+int
+xfs_trans_relog(
+	struct xfs_trans	*tp,
+	struct xfs_log_item	*lip)
+{
+	struct xfs_trans_relog	*trp;
+	int			error;
+
+	trp = kmem_zalloc(sizeof(struct xfs_trans_relog), KM_MAYFAIL);
+	if (!trp) {
+		xfs_trans_cancel(tp);
+		return -ENOMEM;
+	}
+
+	INIT_WORK(&trp->tr_work, xfs_relog_worker);
+	trp->tr_lip = lip;
+	trp->tr_tic = xfs_log_ticket_get(tp->t_ticket);
+	trp->tr_log_res = tp->t_log_res;
+	trp->tr_log_count = tp->t_log_count;
+	lip->li_priv = trp;
+
+	error = xfs_trans_commit_regrant(tp);
+	if (error) {
+		xfs_log_ticket_put(trp->tr_tic);
+		kmem_free(trp);
+	}
+
+	return error;
+}
+
+void
+xfs_trans_relog_cancel(
+	struct xfs_log_item	*lip)
+{
+	struct xfs_trans_relog	*trp = lip->li_priv;
+	struct xfs_mount	*mp = lip->li_mountp;
+
+	/* cancel queued relog task */
+	cancel_work_sync(&trp->tr_work);
+	lip->li_priv = NULL;
+
+	/* release log reservation and free ticket */
+	ASSERT(atomic_read(&trp->tr_tic->t_ref) == 1);
+	xfs_log_done(mp, trp->tr_tic, NULL, false);
+	kmem_free(trp);
+}
-- 
2.20.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC v2 PATCH 3/3] xfs: automatically relog quotaoff start intent
  2019-11-22 18:19 [RFC v2 PATCH 0/3] xfs: automatic relogging experiment Brian Foster
  2019-11-22 18:19 ` [RFC v2 PATCH 1/3] xfs: set t_task at wait time instead of alloc time Brian Foster
  2019-11-22 18:19 ` [RFC v2 PATCH 2/3] xfs: prototype automatic intent relogging mechanism Brian Foster
@ 2019-11-22 18:19 ` Brian Foster
  2019-11-25 18:55 ` [RFC v3 PATCH] xfs: automatic relogging experiment Brian Foster
  3 siblings, 0 replies; 6+ messages in thread
From: Brian Foster @ 2019-11-22 18:19 UTC (permalink / raw)
  To: linux-xfs

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_trans_resv.c |  3 ++-
 fs/xfs/xfs_dquot_item.c        | 13 +++++++++++++
 fs/xfs/xfs_qm_syscalls.c       |  9 ++++++++-
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index c55cd9a3dec9..88e222d0947a 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -866,7 +866,8 @@ xfs_trans_resv_calc(
 	resp->tr_qm_setqlim.tr_logcount = XFS_DEFAULT_LOG_COUNT;
 
 	resp->tr_qm_quotaoff.tr_logres = xfs_calc_qm_quotaoff_reservation(mp);
-	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_LOG_COUNT;
+	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT;
+	resp->tr_qm_quotaoff.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
 	resp->tr_qm_equotaoff.tr_logres =
 		xfs_calc_qm_quotaoff_end_reservation();
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index d60647d7197b..57b7c9b731b1 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -288,6 +288,18 @@ xfs_qm_qoff_logitem_format(
 	xlog_finish_iovec(lv, vecp, sizeof(struct xfs_qoff_logitem));
 }
 
+/*
+ * XXX: simulate relog callback with hardcoded ->iop_committed() callback
+ */
+static xfs_lsn_t
+xfs_qm_qoff_logitem_committed(
+	struct xfs_log_item	*lip,
+	xfs_lsn_t		lsn)
+{
+	xfs_trans_relog_queue(lip);
+	return lsn;
+}
+
 /*
  * There isn't much you can do to push a quotaoff item.  It is simply
  * stuck waiting for the log to be flushed to disk.
@@ -333,6 +345,7 @@ static const struct xfs_item_ops xfs_qm_qoffend_logitem_ops = {
 static const struct xfs_item_ops xfs_qm_qoff_logitem_ops = {
 	.iop_size	= xfs_qm_qoff_logitem_size,
 	.iop_format	= xfs_qm_qoff_logitem_format,
+	.iop_committed	= xfs_qm_qoff_logitem_committed,
 	.iop_push	= xfs_qm_qoff_logitem_push,
 };
 
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index 1ea82764bf89..3f15f8576027 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -18,6 +18,7 @@
 #include "xfs_quota.h"
 #include "xfs_qm.h"
 #include "xfs_icache.h"
+#include "xfs_log.h"
 
 STATIC int
 xfs_qm_log_quotaoff(
@@ -50,7 +51,7 @@ xfs_qm_log_quotaoff(
 	 * We don't care about quotoff's performance.
 	 */
 	xfs_trans_set_sync(tp);
-	error = xfs_trans_commit(tp);
+	error = xfs_trans_relog(tp, &qoffi->qql_item);
 	if (error)
 		goto out;
 
@@ -227,7 +228,13 @@ xfs_qm_scall_quotaoff(
 	 *
 	 * So, we have QUOTAOFF start and end logitems; the start
 	 * logitem won't get overwritten until the end logitem appears...
+	 *
+	 * XXX: qoffend expects qoffstart in the AIL but not in the CIL. Force
+	 * the log after cancelling relogging to prevent item reinsert...
 	 */
+	//ssleep(3);
+	xfs_trans_relog_cancel(&qoffstart->qql_item);
+	xfs_log_force(mp, XFS_LOG_SYNC);
 	error = xfs_qm_log_quotaoff_end(mp, qoffstart, flags);
 	if (error) {
 		/* We're screwed now. Shutdown is the only option. */
-- 
2.20.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC v3 PATCH] xfs: automatic relogging experiment
  2019-11-22 18:19 [RFC v2 PATCH 0/3] xfs: automatic relogging experiment Brian Foster
                   ` (2 preceding siblings ...)
  2019-11-22 18:19 ` [RFC v2 PATCH 3/3] xfs: automatically relog quotaoff start intent Brian Foster
@ 2019-11-25 18:55 ` Brian Foster
  2019-11-27 15:12   ` Brian Foster
  3 siblings, 1 reply; 6+ messages in thread
From: Brian Foster @ 2019-11-25 18:55 UTC (permalink / raw)
  To: linux-xfs

POC to automatically relog the quotaoff start intent. This approach
involves no reservation stealing nor transaction rolling, so
deadlock avoidance is not guaranteed. The tradeoff is simplicity and
an approach that might be effective enough in practice.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

Here's a quickly hacked up version of what I was rambling about in the
cover letter. I wanted to post this for comparison. As noted above, this
doesn't necessarily guarantee deadlock avoidance with transaction
rolling, etc., but might be good enough in practice for the current use
cases (particularly with CIL context size fixes). Even if not, there's a
clear enough path to tracking relog reservation with a ticket in the CIL
context in a manner that is more conducive to batching. We also may be
able to union ->li_cb() with a ->li_relog() variant to relog intent
items as dfops currently does for things like EFIs that don't currently
support direct relogging of the same object.

Thoughts about using something like this as an intermediate solution,
provided it holds up against some stress testing?

Brian

 fs/xfs/xfs_log.c         |  1 +
 fs/xfs/xfs_log_cil.c     | 50 +++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_log_priv.h    |  2 ++
 fs/xfs/xfs_qm_syscalls.c |  6 +++++
 fs/xfs/xfs_trans.h       |  5 +++-
 5 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 6a147c63a8a6..4fb3c3156ea2 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1086,6 +1086,7 @@ xfs_log_item_init(
 	INIT_LIST_HEAD(&item->li_cil);
 	INIT_LIST_HEAD(&item->li_bio_list);
 	INIT_LIST_HEAD(&item->li_trans);
+	INIT_LIST_HEAD(&item->li_ril);
 }
 
 /*
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 48435cf2aa16..c16ebc448a40 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -19,6 +19,44 @@
 
 struct workqueue_struct *xfs_discard_wq;
 
+static void
+xfs_relog_worker(
+	struct work_struct	*work)
+{
+	struct xfs_cil_ctx	*ctx = container_of(work, struct xfs_cil_ctx, relog_work);
+	struct xfs_mount	*mp = ctx->cil->xc_log->l_mp;
+	struct xfs_trans	*tp;
+	struct xfs_log_item	*lip, *lipp;
+	int			error;
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
+	ASSERT(!error);
+
+	list_for_each_entry_safe(lip, lipp, &ctx->relog_list, li_ril) {
+		list_del_init(&lip->li_ril);
+
+		if (!test_bit(XFS_LI_RELOG, &lip->li_flags))
+			continue;
+
+		xfs_trans_add_item(tp, lip);
+		set_bit(XFS_LI_DIRTY, &lip->li_flags);
+		tp->t_flags |= XFS_TRANS_DIRTY;
+	}
+
+	error = xfs_trans_commit(tp);
+	ASSERT(!error);
+
+	/* XXX */
+	kmem_free(ctx);
+}
+
+static void
+xfs_relog_queue(
+	struct xfs_cil_ctx	*ctx)
+{
+	queue_work(xfs_discard_wq, &ctx->relog_work);
+}
+
 /*
  * Allocate a new ticket. Failing to get a new ticket makes it really hard to
  * recover, so we don't allow failure here. Also, we allocate in a context that
@@ -476,6 +514,9 @@ xlog_cil_insert_items(
 		 */
 		if (!list_is_last(&lip->li_cil, &cil->xc_cil))
 			list_move_tail(&lip->li_cil, &cil->xc_cil);
+
+		if (test_bit(XFS_LI_RELOG, &lip->li_flags))
+			list_move_tail(&lip->li_ril, &ctx->relog_list);
 	}
 
 	spin_unlock(&cil->xc_cil_lock);
@@ -605,7 +646,10 @@ xlog_cil_committed(
 
 	xlog_cil_free_logvec(ctx->lv_chain);
 
-	if (!list_empty(&ctx->busy_extents))
+	/* XXX: mutually exclusive w/ discard for POC to handle ctx freeing */
+	if (!list_empty(&ctx->relog_list))
+		xfs_relog_queue(ctx);
+	else if (!list_empty(&ctx->busy_extents))
 		xlog_discard_busy_extents(mp, ctx);
 	else
 		kmem_free(ctx);
@@ -746,8 +790,10 @@ xlog_cil_push(
 	 */
 	INIT_LIST_HEAD(&new_ctx->committing);
 	INIT_LIST_HEAD(&new_ctx->busy_extents);
+	INIT_LIST_HEAD(&new_ctx->relog_list);
 	new_ctx->sequence = ctx->sequence + 1;
 	new_ctx->cil = cil;
+	INIT_WORK(&new_ctx->relog_work, xfs_relog_worker);
 	cil->xc_ctx = new_ctx;
 
 	/*
@@ -1199,6 +1245,8 @@ xlog_cil_init(
 
 	INIT_LIST_HEAD(&ctx->committing);
 	INIT_LIST_HEAD(&ctx->busy_extents);
+	INIT_LIST_HEAD(&ctx->relog_list);
+	INIT_WORK(&ctx->relog_work, xfs_relog_worker);
 	ctx->sequence = 1;
 	ctx->cil = cil;
 	cil->xc_ctx = ctx;
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index b192c5a9f9fd..6fd7b7297bd3 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -243,6 +243,8 @@ struct xfs_cil_ctx {
 	struct list_head	iclog_entry;
 	struct list_head	committing;	/* ctx committing list */
 	struct work_struct	discard_endio_work;
+	struct list_head	relog_list;
+	struct work_struct	relog_work;
 };
 
 /*
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index 1ea82764bf89..08b6180cb5a3 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -18,6 +18,7 @@
 #include "xfs_quota.h"
 #include "xfs_qm.h"
 #include "xfs_icache.h"
+#include "xfs_log.h"
 
 STATIC int
 xfs_qm_log_quotaoff(
@@ -37,6 +38,7 @@ xfs_qm_log_quotaoff(
 
 	qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT);
 	xfs_trans_log_quotaoff_item(tp, qoffi);
+	set_bit(XFS_LI_RELOG, &qoffi->qql_item.li_flags);
 
 	spin_lock(&mp->m_sb_lock);
 	mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL;
@@ -69,6 +71,10 @@ xfs_qm_log_quotaoff_end(
 	int			error;
 	struct xfs_qoff_logitem	*qoffi;
 
+	clear_bit(XFS_LI_RELOG, &startqoff->qql_item.li_flags);
+	xfs_log_force(mp, XFS_LOG_SYNC);
+	flush_workqueue(xfs_discard_wq);
+
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
 	if (error)
 		return error;
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 64d7f171ebd3..e04033c29f0d 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -48,6 +48,7 @@ struct xfs_log_item {
 	struct xfs_log_vec		*li_lv;		/* active log vector */
 	struct xfs_log_vec		*li_lv_shadow;	/* standby vector */
 	xfs_lsn_t			li_seq;		/* CIL commit seq */
+	struct list_head		li_ril;
 };
 
 /*
@@ -59,12 +60,14 @@ struct xfs_log_item {
 #define	XFS_LI_ABORTED	1
 #define	XFS_LI_FAILED	2
 #define	XFS_LI_DIRTY	3	/* log item dirty in transaction */
+#define	XFS_LI_RELOG	4	/* automatic relogging */
 
 #define XFS_LI_FLAGS \
 	{ (1 << XFS_LI_IN_AIL),		"IN_AIL" }, \
 	{ (1 << XFS_LI_ABORTED),	"ABORTED" }, \
 	{ (1 << XFS_LI_FAILED),		"FAILED" }, \
-	{ (1 << XFS_LI_DIRTY),		"DIRTY" }
+	{ (1 << XFS_LI_DIRTY),		"DIRTY" }, \
+	{ (1 << XFS_LI_RELOG),		"RELOG" }
 
 struct xfs_item_ops {
 	unsigned flags;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC v3 PATCH] xfs: automatic relogging experiment
  2019-11-25 18:55 ` [RFC v3 PATCH] xfs: automatic relogging experiment Brian Foster
@ 2019-11-27 15:12   ` Brian Foster
  0 siblings, 0 replies; 6+ messages in thread
From: Brian Foster @ 2019-11-27 15:12 UTC (permalink / raw)
  To: linux-xfs

On Mon, Nov 25, 2019 at 01:55:23PM -0500, Brian Foster wrote:
> POC to automatically relog the quotaoff start intent. This approach
> involves no reservation stealing nor transaction rolling, so
> deadlock avoidance is not guaranteed. The tradeoff is simplicity and
> an approach that might be effective enough in practice.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> Here's a quickly hacked up version of what I was rambling about in the
> cover letter. I wanted to post this for comparison. As noted above, this
> doesn't necessarily guarantee deadlock avoidance with transaction
> rolling, etc., but might be good enough in practice for the current use
> cases (particularly with CIL context size fixes). Even if not, there's a
> clear enough path to tracking relog reservation with a ticket in the CIL
> context in a manner that is more conducive to batching. We also may be
> able to union ->li_cb() with a ->li_relog() variant to relog intent
> items as dfops currently does for things like EFIs that don't currently
> support direct relogging of the same object.
> 
> Thoughts about using something like this as an intermediate solution,
> provided it holds up against some stress testing?
> 

In thinking about this a bit more, it occurs to me that there might be
an elegant way to provide simplicity and flexibility by incorporating
automatic relogging into xfsaild rather than tieing it into the CIL or
having it completely independent (as the past couple of RFCs have done).
From the simplicity standpoint, xfsaild already tracks logged+committed
items for us, so that eliminates the need for independent "RIL"
tracking. xfsaild already issues log item callbacks that could translate
the new log item LI_RELOG state bit to a corresponding XFS_ITEM_RELOG
return code that triggers a relog of the item. We also know the lsn of
the item at this point and can further compare to the tail lsn to only
relog such items when they sit at the log tail. This is more efficient
than the current proposal to automatically relog on every CIL
checkpoint.

From the flexibility standpoint, xfsaild already knows how to safely
access all types of log items via the log ops vector. IOW, it knows how
to exclusively access a log item for writeback, so it would just need
generic/mechanical bits to relog a particular item instead. The caveat
to this is that the task that requests auto relog must relenquish locks
for relogging to actually take place. For example, the sequence for a
traditional (non-intent) log item would be something like::

	- lock object
	- dirty in transaction
	- set lip relog bit
	- commit transaction
	- unlock object (background relogging now enabled)

At this point the log item is essentially pinned in-core without pinning
the tail of the log. It is free to be modified by any unrelated
transaction without conflict to either task, but xfsaild will not write
it back. Sometime later the original task would have to lock the item
and clear the relog state to cancel the sequence. The task could simply
release the item to allow writeback or join it to a final transaction
with the relog bit cleared.

There's still room for a custom ->iop_relog() callback or some such for
any items that require special relog handling. If releasing locks is
ever a concern for a particular operation, that callback could overload
the generic relog mechanism and serve as a notification to the lock
holder to roll its own transaction without ever unlocking the item. TBH
I'm still not sure there's a use case for this kind of non-intent
relogging, but ISTM this design model accommodates it reasonably enough
with minimal complexity.

There are still some open implementation questions around managing the
relog transaction(s), determining how much reservation is needed, how to
acquire it, etc. We most likely do not want to block the xfsaild task on
acquiring reservation, but it might be able to regrant a permanent
ticket or simply kick off the task of replenishing relog reservation to
a workqueue.

Thoughts?

Brian

> Brian
> 
>  fs/xfs/xfs_log.c         |  1 +
>  fs/xfs/xfs_log_cil.c     | 50 +++++++++++++++++++++++++++++++++++++++-
>  fs/xfs/xfs_log_priv.h    |  2 ++
>  fs/xfs/xfs_qm_syscalls.c |  6 +++++
>  fs/xfs/xfs_trans.h       |  5 +++-
>  5 files changed, 62 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 6a147c63a8a6..4fb3c3156ea2 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1086,6 +1086,7 @@ xfs_log_item_init(
>  	INIT_LIST_HEAD(&item->li_cil);
>  	INIT_LIST_HEAD(&item->li_bio_list);
>  	INIT_LIST_HEAD(&item->li_trans);
> +	INIT_LIST_HEAD(&item->li_ril);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 48435cf2aa16..c16ebc448a40 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -19,6 +19,44 @@
>  
>  struct workqueue_struct *xfs_discard_wq;
>  
> +static void
> +xfs_relog_worker(
> +	struct work_struct	*work)
> +{
> +	struct xfs_cil_ctx	*ctx = container_of(work, struct xfs_cil_ctx, relog_work);
> +	struct xfs_mount	*mp = ctx->cil->xc_log->l_mp;
> +	struct xfs_trans	*tp;
> +	struct xfs_log_item	*lip, *lipp;
> +	int			error;
> +
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
> +	ASSERT(!error);
> +
> +	list_for_each_entry_safe(lip, lipp, &ctx->relog_list, li_ril) {
> +		list_del_init(&lip->li_ril);
> +
> +		if (!test_bit(XFS_LI_RELOG, &lip->li_flags))
> +			continue;
> +
> +		xfs_trans_add_item(tp, lip);
> +		set_bit(XFS_LI_DIRTY, &lip->li_flags);
> +		tp->t_flags |= XFS_TRANS_DIRTY;
> +	}
> +
> +	error = xfs_trans_commit(tp);
> +	ASSERT(!error);
> +
> +	/* XXX */
> +	kmem_free(ctx);
> +}
> +
> +static void
> +xfs_relog_queue(
> +	struct xfs_cil_ctx	*ctx)
> +{
> +	queue_work(xfs_discard_wq, &ctx->relog_work);
> +}
> +
>  /*
>   * Allocate a new ticket. Failing to get a new ticket makes it really hard to
>   * recover, so we don't allow failure here. Also, we allocate in a context that
> @@ -476,6 +514,9 @@ xlog_cil_insert_items(
>  		 */
>  		if (!list_is_last(&lip->li_cil, &cil->xc_cil))
>  			list_move_tail(&lip->li_cil, &cil->xc_cil);
> +
> +		if (test_bit(XFS_LI_RELOG, &lip->li_flags))
> +			list_move_tail(&lip->li_ril, &ctx->relog_list);
>  	}
>  
>  	spin_unlock(&cil->xc_cil_lock);
> @@ -605,7 +646,10 @@ xlog_cil_committed(
>  
>  	xlog_cil_free_logvec(ctx->lv_chain);
>  
> -	if (!list_empty(&ctx->busy_extents))
> +	/* XXX: mutually exclusive w/ discard for POC to handle ctx freeing */
> +	if (!list_empty(&ctx->relog_list))
> +		xfs_relog_queue(ctx);
> +	else if (!list_empty(&ctx->busy_extents))
>  		xlog_discard_busy_extents(mp, ctx);
>  	else
>  		kmem_free(ctx);
> @@ -746,8 +790,10 @@ xlog_cil_push(
>  	 */
>  	INIT_LIST_HEAD(&new_ctx->committing);
>  	INIT_LIST_HEAD(&new_ctx->busy_extents);
> +	INIT_LIST_HEAD(&new_ctx->relog_list);
>  	new_ctx->sequence = ctx->sequence + 1;
>  	new_ctx->cil = cil;
> +	INIT_WORK(&new_ctx->relog_work, xfs_relog_worker);
>  	cil->xc_ctx = new_ctx;
>  
>  	/*
> @@ -1199,6 +1245,8 @@ xlog_cil_init(
>  
>  	INIT_LIST_HEAD(&ctx->committing);
>  	INIT_LIST_HEAD(&ctx->busy_extents);
> +	INIT_LIST_HEAD(&ctx->relog_list);
> +	INIT_WORK(&ctx->relog_work, xfs_relog_worker);
>  	ctx->sequence = 1;
>  	ctx->cil = cil;
>  	cil->xc_ctx = ctx;
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index b192c5a9f9fd..6fd7b7297bd3 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -243,6 +243,8 @@ struct xfs_cil_ctx {
>  	struct list_head	iclog_entry;
>  	struct list_head	committing;	/* ctx committing list */
>  	struct work_struct	discard_endio_work;
> +	struct list_head	relog_list;
> +	struct work_struct	relog_work;
>  };
>  
>  /*
> diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> index 1ea82764bf89..08b6180cb5a3 100644
> --- a/fs/xfs/xfs_qm_syscalls.c
> +++ b/fs/xfs/xfs_qm_syscalls.c
> @@ -18,6 +18,7 @@
>  #include "xfs_quota.h"
>  #include "xfs_qm.h"
>  #include "xfs_icache.h"
> +#include "xfs_log.h"
>  
>  STATIC int
>  xfs_qm_log_quotaoff(
> @@ -37,6 +38,7 @@ xfs_qm_log_quotaoff(
>  
>  	qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT);
>  	xfs_trans_log_quotaoff_item(tp, qoffi);
> +	set_bit(XFS_LI_RELOG, &qoffi->qql_item.li_flags);
>  
>  	spin_lock(&mp->m_sb_lock);
>  	mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL;
> @@ -69,6 +71,10 @@ xfs_qm_log_quotaoff_end(
>  	int			error;
>  	struct xfs_qoff_logitem	*qoffi;
>  
> +	clear_bit(XFS_LI_RELOG, &startqoff->qql_item.li_flags);
> +	xfs_log_force(mp, XFS_LOG_SYNC);
> +	flush_workqueue(xfs_discard_wq);
> +
>  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
>  	if (error)
>  		return error;
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index 64d7f171ebd3..e04033c29f0d 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -48,6 +48,7 @@ struct xfs_log_item {
>  	struct xfs_log_vec		*li_lv;		/* active log vector */
>  	struct xfs_log_vec		*li_lv_shadow;	/* standby vector */
>  	xfs_lsn_t			li_seq;		/* CIL commit seq */
> +	struct list_head		li_ril;
>  };
>  
>  /*
> @@ -59,12 +60,14 @@ struct xfs_log_item {
>  #define	XFS_LI_ABORTED	1
>  #define	XFS_LI_FAILED	2
>  #define	XFS_LI_DIRTY	3	/* log item dirty in transaction */
> +#define	XFS_LI_RELOG	4	/* automatic relogging */
>  
>  #define XFS_LI_FLAGS \
>  	{ (1 << XFS_LI_IN_AIL),		"IN_AIL" }, \
>  	{ (1 << XFS_LI_ABORTED),	"ABORTED" }, \
>  	{ (1 << XFS_LI_FAILED),		"FAILED" }, \
> -	{ (1 << XFS_LI_DIRTY),		"DIRTY" }
> +	{ (1 << XFS_LI_DIRTY),		"DIRTY" }, \
> +	{ (1 << XFS_LI_RELOG),		"RELOG" }
>  
>  struct xfs_item_ops {
>  	unsigned flags;
> -- 
> 2.20.1
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, back to index

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-22 18:19 [RFC v2 PATCH 0/3] xfs: automatic relogging experiment Brian Foster
2019-11-22 18:19 ` [RFC v2 PATCH 1/3] xfs: set t_task at wait time instead of alloc time Brian Foster
2019-11-22 18:19 ` [RFC v2 PATCH 2/3] xfs: prototype automatic intent relogging mechanism Brian Foster
2019-11-22 18:19 ` [RFC v2 PATCH 3/3] xfs: automatically relog quotaoff start intent Brian Foster
2019-11-25 18:55 ` [RFC v3 PATCH] xfs: automatic relogging experiment Brian Foster
2019-11-27 15:12   ` Brian Foster

Linux-XFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-xfs/0 linux-xfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-xfs linux-xfs/ https://lore.kernel.org/linux-xfs \
		linux-xfs@vger.kernel.org
	public-inbox-index linux-xfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-xfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git