linux-xfs.vger.kernel.org archive mirror
* [RFC v6 PATCH 00/10] xfs: automatic relogging experiment
@ 2020-04-06 12:36 Brian Foster
  2020-04-06 12:36 ` [RFC v6 PATCH 01/10] xfs: automatic relogging item management Brian Foster
                   ` (9 more replies)
  0 siblings, 10 replies; 15+ messages in thread
From: Brian Foster @ 2020-04-06 12:36 UTC (permalink / raw)
  To: linux-xfs

Hi all,

Here's a v6 of the automatic relogging RFC. The primary difference in
this version is a rework of the relog reservation model based on design
discussion on v5. Rather than roll existing transactions and initialize
a fixed size relog transaction, this version acquires worst case relog
reservation directly from the transactions that enable relog of log
items. This facilitates construction of arbitrary combinations of items
in the relog transaction without risk of deadlock due to log reservation
and/or lock ordering. See the associated commit log for further details
on the approach.

Beyond that, there are various other fixes and tweaks from v5. For
example, the log item relog callback is used unconditionally and
abstracts more of the reservation management code, buffer relogging is a
bit more restrictive and reliable, various helpers are refactored,
freeze is partly addressed, etc.

With regard to testing, this version survives a 1+ hour 80xcpu fsstress
workload with random buffer relogging enabled without any notable issues
and without observable reservation leaks. It also survives an fstests
auto run without regression.

Patches 1-5 are preparatory patches and core mechanism. Patch 6 uses
relogging to address the longstanding quotaoff deadlock problem. Patches
7-10 provide buffer relogging support and test code for DEBUG mode
kernels to stress the relog mechanism via random buffer relogging.

Thoughts, reviews, flames appreciated.

Brian

rfcv6:
- Rework relog reservation model.
- Drop unnecessary log ticket t_task fix.
- Use ->iop_relog() callback unconditionally.
- Rudimentary freeze handling for random buffer relogging.
- Various other fixes, tweaks and cleanups.
rfcv5: https://lore.kernel.org/linux-xfs/20200227134321.7238-1-bfoster@redhat.com/
- More fleshed out design to prevent log reservation deadlock and
  locking problems.
- Split out core patches between pre-reservation management, relog item
  state management and relog mechanism.
- Added experimental buffer relogging capability.
rfcv4: https://lore.kernel.org/linux-xfs/20191205175037.52529-1-bfoster@redhat.com/
- AIL based approach.
rfcv3: https://lore.kernel.org/linux-xfs/20191125185523.47556-1-bfoster@redhat.com/
- CIL based approach.
rfcv2: https://lore.kernel.org/linux-xfs/20191122181927.32870-1-bfoster@redhat.com/
- Different approach based on workqueue and transaction rolling.
rfc: https://lore.kernel.org/linux-xfs/20191024172850.7698-1-bfoster@redhat.com/

Brian Foster (10):
  xfs: automatic relogging item management
  xfs: create helper for ticket-less log res ungrant
  xfs: extra runtime reservation overhead for relog transactions
  xfs: relog log reservation stealing and accounting
  xfs: automatic log item relog mechanism
  xfs: automatically relog the quotaoff start intent
  xfs: prevent fs freeze with outstanding relog items
  xfs: buffer relogging support prototype
  xfs: create an error tag for random relog reservation
  xfs: relog random buffers based on errortag

 fs/xfs/libxfs/xfs_errortag.h |   4 +-
 fs/xfs/libxfs/xfs_shared.h   |   1 +
 fs/xfs/xfs_buf.c             |   4 +
 fs/xfs/xfs_buf_item.c        |  52 ++++++++++++-
 fs/xfs/xfs_dquot_item.c      |  26 +++++++
 fs/xfs/xfs_error.c           |   3 +
 fs/xfs/xfs_log.c             |  35 +++++++--
 fs/xfs/xfs_log.h             |   4 +-
 fs/xfs/xfs_log_cil.c         |   2 +-
 fs/xfs/xfs_log_priv.h        |   1 +
 fs/xfs/xfs_qm_syscalls.c     |  12 ++-
 fs/xfs/xfs_super.c           |   4 +
 fs/xfs/xfs_trace.h           |   4 +
 fs/xfs/xfs_trans.c           |  75 ++++++++++++++++--
 fs/xfs/xfs_trans.h           |  36 ++++++++-
 fs/xfs/xfs_trans_ail.c       | 146 ++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_trans_buf.c       |  73 ++++++++++++++++++
 fs/xfs/xfs_trans_priv.h      |  20 +++++
 18 files changed, 477 insertions(+), 25 deletions(-)

-- 
2.21.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC v6 PATCH 01/10] xfs: automatic relogging item management
  2020-04-06 12:36 [RFC v6 PATCH 00/10] xfs: automatic relogging experiment Brian Foster
@ 2020-04-06 12:36 ` Brian Foster
  2020-04-06 12:36 ` [RFC v6 PATCH 02/10] xfs: create helper for ticket-less log res ungrant Brian Foster
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Brian Foster @ 2020-04-06 12:36 UTC (permalink / raw)
  To: linux-xfs

Add a log item flag to track relog state and a couple helpers to set
and clear the flag. The flag will be set on any log item that is to
be automatically relogged by log tail pressure.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
---
 fs/xfs/xfs_trace.h      |  2 ++
 fs/xfs/xfs_trans.c      | 20 ++++++++++++++++++++
 fs/xfs/xfs_trans.h      |  4 +++-
 fs/xfs/xfs_trans_priv.h |  2 ++
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index a4323a63438d..b683c94556a3 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1068,6 +1068,8 @@ DEFINE_LOG_ITEM_EVENT(xfs_ail_push);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_pinned);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_locked);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_flushing);
+DEFINE_LOG_ITEM_EVENT(xfs_relog_item);
+DEFINE_LOG_ITEM_EVENT(xfs_relog_item_cancel);
 
 DECLARE_EVENT_CLASS(xfs_ail_class,
 	TP_PROTO(struct xfs_log_item *lip, xfs_lsn_t old_lsn, xfs_lsn_t new_lsn),
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 28b983ff8b11..4fbe11485bbb 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -770,6 +770,26 @@ xfs_trans_del_item(
 	list_del_init(&lip->li_trans);
 }
 
+void
+xfs_trans_relog_item(
+	struct xfs_trans	*tp,
+	struct xfs_log_item	*lip)
+{
+	if (test_and_set_bit(XFS_LI_RELOG, &lip->li_flags))
+		return;
+	trace_xfs_relog_item(lip);
+}
+
+void
+xfs_trans_relog_item_cancel(
+	struct xfs_trans	*tp,
+	struct xfs_log_item	*lip)
+{
+	if (!test_and_clear_bit(XFS_LI_RELOG, &lip->li_flags))
+		return;
+	trace_xfs_relog_item_cancel(lip);
+}
+
 /* Detach and unlock all of the items in a transaction */
 static void
 xfs_trans_free_items(
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 752c7fef9de7..9ea0f415e5ca 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -59,12 +59,14 @@ struct xfs_log_item {
 #define	XFS_LI_ABORTED	1
 #define	XFS_LI_FAILED	2
 #define	XFS_LI_DIRTY	3	/* log item dirty in transaction */
+#define	XFS_LI_RELOG	4	/* automatically relog item */
 
 #define XFS_LI_FLAGS \
 	{ (1 << XFS_LI_IN_AIL),		"IN_AIL" }, \
 	{ (1 << XFS_LI_ABORTED),	"ABORTED" }, \
 	{ (1 << XFS_LI_FAILED),		"FAILED" }, \
-	{ (1 << XFS_LI_DIRTY),		"DIRTY" }
+	{ (1 << XFS_LI_DIRTY),		"DIRTY" }, \
+	{ (1 << XFS_LI_RELOG),		"RELOG" }
 
 struct xfs_item_ops {
 	unsigned flags;
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 35655eac01a6..c890794314a6 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -16,6 +16,8 @@ struct xfs_log_vec;
 void	xfs_trans_init(struct xfs_mount *);
 void	xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *);
 void	xfs_trans_del_item(struct xfs_log_item *);
+void	xfs_trans_relog_item(struct xfs_trans *, struct xfs_log_item *);
+void	xfs_trans_relog_item_cancel(struct xfs_trans *, struct xfs_log_item *);
 void	xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp);
 
 void	xfs_trans_committed_bulk(struct xfs_ail *ailp, struct xfs_log_vec *lv,
-- 
2.21.1



* [RFC v6 PATCH 02/10] xfs: create helper for ticket-less log res ungrant
  2020-04-06 12:36 [RFC v6 PATCH 00/10] xfs: automatic relogging experiment Brian Foster
  2020-04-06 12:36 ` [RFC v6 PATCH 01/10] xfs: automatic relogging item management Brian Foster
@ 2020-04-06 12:36 ` Brian Foster
  2020-04-06 23:52   ` Allison Collins
  2020-04-06 12:36 ` [RFC v6 PATCH 03/10] xfs: extra runtime reservation overhead for relog transactions Brian Foster
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Brian Foster @ 2020-04-06 12:36 UTC (permalink / raw)
  To: linux-xfs

Log reservation is currently acquired and released via log tickets.
The relog mechanism introduces behavior where relog reservation is
transferred between transaction log tickets and an external pool of
relog reservation for active relog items. Certain contexts will be
able to release outstanding relog reservation without the need for a
log ticket. Factor out a helper to allow byte granularity log
reservation ungrant.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log.c | 20 ++++++++++++++++----
 fs/xfs/xfs_log.h |  1 +
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 00fda2e8e738..d6b63490a78b 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2980,6 +2980,21 @@ xfs_log_ticket_regrant(
 	xfs_log_ticket_put(ticket);
 }
 
+/*
+ * Restore log reservation directly to the grant heads.
+ */
+void
+xfs_log_ungrant_bytes(
+	struct xfs_mount	*mp,
+	int			bytes)
+{
+	struct xlog		*log = mp->m_log;
+
+	xlog_grant_sub_space(log, &log->l_reserve_head.grant, bytes);
+	xlog_grant_sub_space(log, &log->l_write_head.grant, bytes);
+	xfs_log_space_wake(mp);
+}
+
 /*
  * Give back the space left from a reservation.
  *
@@ -3018,12 +3033,9 @@ xfs_log_ticket_ungrant(
 		bytes += ticket->t_unit_res*ticket->t_cnt;
 	}
 
-	xlog_grant_sub_space(log, &log->l_reserve_head.grant, bytes);
-	xlog_grant_sub_space(log, &log->l_write_head.grant, bytes);
-
+	xfs_log_ungrant_bytes(log->l_mp, bytes);
 	trace_xfs_log_ticket_ungrant_exit(log, ticket);
 
-	xfs_log_space_wake(log->l_mp);
 	xfs_log_ticket_put(ticket);
 }
 
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 1412d6993f1e..6d2f30f42245 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -125,6 +125,7 @@ int	  xfs_log_reserve(struct xfs_mount *mp,
 			  uint8_t		   clientid,
 			  bool		   permanent);
 int	  xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic);
+void	  xfs_log_ungrant_bytes(struct xfs_mount *mp, int bytes);
 void      xfs_log_unmount(struct xfs_mount *mp);
 int	  xfs_log_force_umount(struct xfs_mount *mp, int logerror);
 
-- 
2.21.1



* [RFC v6 PATCH 03/10] xfs: extra runtime reservation overhead for relog transactions
  2020-04-06 12:36 [RFC v6 PATCH 00/10] xfs: automatic relogging experiment Brian Foster
  2020-04-06 12:36 ` [RFC v6 PATCH 01/10] xfs: automatic relogging item management Brian Foster
  2020-04-06 12:36 ` [RFC v6 PATCH 02/10] xfs: create helper for ticket-less log res ungrant Brian Foster
@ 2020-04-06 12:36 ` Brian Foster
  2020-04-07 23:04   ` Allison Collins
  2020-04-06 12:36 ` [RFC v6 PATCH 04/10] xfs: relog log reservation stealing and accounting Brian Foster
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Brian Foster @ 2020-04-06 12:36 UTC (permalink / raw)
  To: linux-xfs

Every transaction reservation includes runtime overhead on top of
the reservation calculated in the struct xfs_trans_res. This
overhead is required for things like the CIL context ticket, log
headers, etc., that are stolen from individual transactions. Since
reservation for the relog transaction is entirely contributed by
regular transactions, this runtime reservation overhead must be
contributed as well. This means that a transaction that relogs one
or more items must include overhead for the current transaction as
well as for the relog transaction.

Define a new transaction flag to indicate that a transaction is
relog enabled. Plumb this state down to the log ticket allocation
and use it to bump the worst case overhead included in the
transaction. The overhead will eventually be transferred to the
relog system as needed for individual log items.
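
As a rough illustration of the arithmetic described above (a sketch
only; the 512-byte overhead is a hypothetical stand-in, and the real
xfs_log_calc_unit_res() accounts for log record headers, op headers,
roundoff, etc.):

```c
#include <assert.h>

/*
 * Stand-in for xfs_log_calc_unit_res(): the requested bytes plus a
 * hypothetical fixed runtime overhead.
 */
static int calc_unit_res(int unit_bytes)
{
	return unit_bytes + 512;	/* illustrative overhead value */
}

static int ticket_unit_res(int unit_bytes, int relog)
{
	int unit_res = calc_unit_res(unit_bytes);

	/* double the overhead for the relog transaction */
	if (relog)
		unit_res += unit_res - unit_bytes;
	return unit_res;
}
```

With the hypothetical 512-byte overhead, a 4096-byte reservation comes
out to 4608 bytes normally and 5120 bytes with relog enabled: one copy
of the overhead for the current transaction and one for the relog
transaction.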

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_shared.h |  1 +
 fs/xfs/xfs_log.c           | 12 +++++++++---
 fs/xfs/xfs_log.h           |  3 ++-
 fs/xfs/xfs_log_cil.c       |  2 +-
 fs/xfs/xfs_log_priv.h      |  1 +
 fs/xfs/xfs_trans.c         |  3 ++-
 6 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index c45acbd3add9..1ede1e720a5c 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -65,6 +65,7 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
 #define XFS_TRANS_DQ_DIRTY	0x10	/* at least one dquot in trx dirty */
 #define XFS_TRANS_RESERVE	0x20    /* OK to use reserved data blocks */
 #define XFS_TRANS_NO_WRITECOUNT 0x40	/* do not elevate SB writecount */
+#define XFS_TRANS_RELOG		0x80	/* requires extra relog overhead */
 /*
  * LOWMODE is used by the allocator to activate the lowspace algorithm - when
  * free space is running low the extent allocator may choose to allocate an
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index d6b63490a78b..b55abde6c142 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -418,7 +418,8 @@ xfs_log_reserve(
 	int		 	cnt,
 	struct xlog_ticket	**ticp,
 	uint8_t		 	client,
-	bool			permanent)
+	bool			permanent,
+	bool			relog)
 {
 	struct xlog		*log = mp->m_log;
 	struct xlog_ticket	*tic;
@@ -433,7 +434,8 @@ xfs_log_reserve(
 	XFS_STATS_INC(mp, xs_try_logspace);
 
 	ASSERT(*ticp == NULL);
-	tic = xlog_ticket_alloc(log, unit_bytes, cnt, client, permanent, 0);
+	tic = xlog_ticket_alloc(log, unit_bytes, cnt, client, permanent, relog,
+				0);
 	*ticp = tic;
 
 	xlog_grant_push_ail(log, tic->t_cnt ? tic->t_unit_res * tic->t_cnt
@@ -831,7 +833,7 @@ xlog_unmount_write(
 	uint			flags = XLOG_UNMOUNT_TRANS;
 	int			error;
 
-	error = xfs_log_reserve(mp, 600, 1, &tic, XFS_LOG, 0);
+	error = xfs_log_reserve(mp, 600, 1, &tic, XFS_LOG, false, false);
 	if (error)
 		goto out_err;
 
@@ -3421,6 +3423,7 @@ xlog_ticket_alloc(
 	int			cnt,
 	char			client,
 	bool			permanent,
+	bool			relog,
 	xfs_km_flags_t		alloc_flags)
 {
 	struct xlog_ticket	*tic;
@@ -3431,6 +3434,9 @@ xlog_ticket_alloc(
 		return NULL;
 
 	unit_res = xfs_log_calc_unit_res(log->l_mp, unit_bytes);
+	/* double the overhead for the relog transaction */
+	if (relog)
+		unit_res += (unit_res - unit_bytes);
 
 	atomic_set(&tic->t_ref, 1);
 	tic->t_task		= current;
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 6d2f30f42245..f1089a4b299c 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -123,7 +123,8 @@ int	  xfs_log_reserve(struct xfs_mount *mp,
 			  int		   count,
 			  struct xlog_ticket **ticket,
 			  uint8_t		   clientid,
-			  bool		   permanent);
+			  bool		   permanent,
+			  bool		   relog);
 int	  xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic);
 void	  xfs_log_ungrant_bytes(struct xfs_mount *mp, int bytes);
 void      xfs_log_unmount(struct xfs_mount *mp);
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index b43f0e8f43f2..1c48e95402aa 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -37,7 +37,7 @@ xlog_cil_ticket_alloc(
 {
 	struct xlog_ticket *tic;
 
-	tic = xlog_ticket_alloc(log, 0, 1, XFS_TRANSACTION, 0,
+	tic = xlog_ticket_alloc(log, 0, 1, XFS_TRANSACTION, false, false,
 				KM_NOFS);
 
 	/*
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index ec22c7a3867f..08d8ff9bce1a 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -465,6 +465,7 @@ xlog_ticket_alloc(
 	int		count,
 	char		client,
 	bool		permanent,
+	bool		relog,
 	xfs_km_flags_t	alloc_flags);
 
 
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 4fbe11485bbb..1b25980315bd 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -177,6 +177,7 @@ xfs_trans_reserve(
 	 */
 	if (resp->tr_logres > 0) {
 		bool	permanent = false;
+		bool	relog	  = (tp->t_flags & XFS_TRANS_RELOG);
 
 		ASSERT(tp->t_log_res == 0 ||
 		       tp->t_log_res == resp->tr_logres);
@@ -199,7 +200,7 @@ xfs_trans_reserve(
 						resp->tr_logres,
 						resp->tr_logcount,
 						&tp->t_ticket, XFS_TRANSACTION,
-						permanent);
+						permanent, relog);
 		}
 
 		if (error)
-- 
2.21.1



* [RFC v6 PATCH 04/10] xfs: relog log reservation stealing and accounting
  2020-04-06 12:36 [RFC v6 PATCH 00/10] xfs: automatic relogging experiment Brian Foster
                   ` (2 preceding siblings ...)
  2020-04-06 12:36 ` [RFC v6 PATCH 03/10] xfs: extra runtime reservation overhead for relog transactions Brian Foster
@ 2020-04-06 12:36 ` Brian Foster
  2020-04-06 12:36 ` [RFC v6 PATCH 05/10] xfs: automatic log item relog mechanism Brian Foster
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Brian Foster @ 2020-04-06 12:36 UTC (permalink / raw)
  To: linux-xfs

The transaction that eventually commits relog enabled log items
requires log reservation like any other transaction. It is not safe
to acquire reservation on-demand because relogged items typically
aren't processed until they approach the tail of the log and must be
moved to free up log space. As such, a relog
transaction that blocks on log reservation is a likely deadlock
vector.

To address this problem, implement a model where relog reservation
is contributed by the transaction that enables relogging on a
particular item. Update the relog helper to transfer reservation
from the transaction to the relog pool. The relog pool holds
outstanding reservation such that it can be used to commit the item
in an otherwise empty transaction. The upcoming relog mechanism is
responsible for replenishing the relog reservation as items are
relogged. When relog is cancelled on a log item, transfer the
outstanding relog reservation to the current transaction (if
provided) for eventual release or otherwise release it directly to
the grant heads.

Note that this approach has several caveats:

- Log reservation calculations for transactions that relog items
  must be increased accordingly.
- Each current transaction reservation includes extra space for
  various overheads, such as for the CIL ticket, etc. Since items
  can be relogged in arbitrary combinations, this reservation
  overhead must be included for each relogged item.
- Relog reservation must be based on the worst case log requirement
  for the lifetime of an item. This is not a concern for fixed size
  log items, such as most intents. More granular log items like
  buffers can have additional ranges dirtied after relogging has
  been enabled for the item and the relog subsystem must have
  enough reservation to accommodate.
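
The transfer model above can be sketched as a toy in plain C. All
names and structures here are illustrative stand-ins, not the kernel
API:

```c
#include <assert.h>

/*
 * Toy model of the reservation transfer: enabling relog moves
 * worst-case reservation from the enabling transaction's ticket into
 * the relog pool; cancelling moves it back to a ticket or, with no
 * transaction at hand, it would be ungranted directly.
 */
struct toy_ticket { int curr_res; };
static int relog_pool;

static void toy_relog_enable(struct toy_ticket *tic, int nbytes)
{
	tic->curr_res -= nbytes;
	relog_pool += nbytes;
}

static void toy_relog_cancel(struct toy_ticket *tic, int nbytes)
{
	relog_pool -= nbytes;
	if (tic)
		tic->curr_res += nbytes;
	/* else: release directly to the log grant heads */
}
```

Reservation is conserved across the transfer: whatever the enabling
transaction gives up is held on behalf of the item until relog is
cancelled.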

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log.c        |  3 +++
 fs/xfs/xfs_trans.c      | 20 ++++++++++++++++++++
 fs/xfs/xfs_trans.h      | 17 +++++++++++++++++
 fs/xfs/xfs_trans_ail.c  |  2 ++
 fs/xfs/xfs_trans_priv.h | 14 ++++++++++++++
 5 files changed, 56 insertions(+)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index b55abde6c142..940e5bb9786c 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -983,6 +983,9 @@ xfs_log_item_init(
 	item->li_type = type;
 	item->li_ops = ops;
 	item->li_lv = NULL;
+#ifdef DEBUG
+	atomic64_set(&item->li_relog_res, 0);
+#endif
 
 	INIT_LIST_HEAD(&item->li_ail);
 	INIT_LIST_HEAD(&item->li_cil);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 1b25980315bd..6fc81c3b8500 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -20,6 +20,7 @@
 #include "xfs_trace.h"
 #include "xfs_error.h"
 #include "xfs_defer.h"
+#include "xfs_log_priv.h"
 
 kmem_zone_t	*xfs_trans_zone;
 
@@ -776,9 +777,19 @@ xfs_trans_relog_item(
 	struct xfs_trans	*tp,
 	struct xfs_log_item	*lip)
 {
+	int			nbytes;
+
+	ASSERT(tp->t_flags & XFS_TRANS_RELOG);
+
 	if (test_and_set_bit(XFS_LI_RELOG, &lip->li_flags))
 		return;
 	trace_xfs_relog_item(lip);
+
+	nbytes = xfs_relog_calc_res(lip);
+
+	tp->t_ticket->t_curr_res -= nbytes;
+	xfs_relog_res_account(lip, nbytes);
+	tp->t_flags |= XFS_TRANS_DIRTY;
 }
 
 void
@@ -786,9 +797,18 @@ xfs_trans_relog_item_cancel(
 	struct xfs_trans	*tp,
 	struct xfs_log_item	*lip)
 {
+	int			res;
+
 	if (!test_and_clear_bit(XFS_LI_RELOG, &lip->li_flags))
 		return;
 	trace_xfs_relog_item_cancel(lip);
+
+	res = xfs_relog_calc_res(lip);
+	if (tp)
+		tp->t_ticket->t_curr_res += res;
+	else
+		xfs_log_ungrant_bytes(lip->li_mountp, res);
+	xfs_relog_res_account(lip, -res);
 }
 
 /* Detach and unlock all of the items in a transaction */
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 9ea0f415e5ca..b4bb2a6c5251 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -48,6 +48,9 @@ struct xfs_log_item {
 	struct xfs_log_vec		*li_lv;		/* active log vector */
 	struct xfs_log_vec		*li_lv_shadow;	/* standby vector */
 	xfs_lsn_t			li_seq;		/* CIL commit seq */
+#ifdef DEBUG
+	atomic64_t			li_relog_res;	/* automatic relog log res */
+#endif
 };
 
 /*
@@ -212,6 +215,20 @@ xfs_trans_read_buf(
 				      flags, bpp, ops);
 }
 
+static inline int
+xfs_relog_calc_res(
+	struct xfs_log_item	*lip)
+{
+	int			niovecs = 0;
+	int			nbytes = 0;
+
+	lip->li_ops->iop_size(lip, &niovecs, &nbytes);
+	ASSERT(niovecs == 1);
+	nbytes = xfs_log_calc_unit_res(lip->li_mountp, nbytes);
+
+	return nbytes;
+}
+
 struct xfs_buf	*xfs_trans_getsb(xfs_trans_t *, struct xfs_mount *);
 
 void		xfs_trans_brelse(xfs_trans_t *, struct xfs_buf *);
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 564253550b75..0b4b1c3cc4de 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -861,6 +861,7 @@ xfs_trans_ail_init(
 	spin_lock_init(&ailp->ail_lock);
 	INIT_LIST_HEAD(&ailp->ail_buf_list);
 	init_waitqueue_head(&ailp->ail_empty);
+	atomic64_set(&ailp->ail_relog_res, 0);
 
 	ailp->ail_task = kthread_run(xfsaild, ailp, "xfsaild/%s",
 			ailp->ail_mount->m_super->s_id);
@@ -881,6 +882,7 @@ xfs_trans_ail_destroy(
 {
 	struct xfs_ail	*ailp = mp->m_ail;
 
+	ASSERT(atomic64_read(&ailp->ail_relog_res) == 0);
 	kthread_stop(ailp->ail_task);
 	kmem_free(ailp);
 }
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index c890794314a6..926bafdf6cd7 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -63,6 +63,7 @@ struct xfs_ail {
 	int			ail_log_flush;
 	struct list_head	ail_buf_list;
 	wait_queue_head_t	ail_empty;
+	atomic64_t		ail_relog_res;
 };
 
 /*
@@ -182,4 +183,17 @@ xfs_set_li_failed(
 	}
 }
 
+static inline int64_t
+xfs_relog_res_account(
+	struct xfs_log_item	*lip,
+	int64_t			bytes)
+{
+#ifdef DEBUG
+	int64_t			res;
+
+	res = atomic64_add_return(bytes, &lip->li_relog_res);
+	ASSERT(res == bytes || (bytes < 0 && res == 0));
+#endif
+	return atomic64_add_return(bytes, &lip->li_ailp->ail_relog_res);
+}
 #endif	/* __XFS_TRANS_PRIV_H__ */
-- 
2.21.1



* [RFC v6 PATCH 05/10] xfs: automatic log item relog mechanism
  2020-04-06 12:36 [RFC v6 PATCH 00/10] xfs: automatic relogging experiment Brian Foster
                   ` (3 preceding siblings ...)
  2020-04-06 12:36 ` [RFC v6 PATCH 04/10] xfs: relog log reservation stealing and accounting Brian Foster
@ 2020-04-06 12:36 ` Brian Foster
  2020-04-06 12:36 ` [RFC v6 PATCH 06/10] xfs: automatically relog the quotaoff start intent Brian Foster
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Brian Foster @ 2020-04-06 12:36 UTC (permalink / raw)
  To: linux-xfs

Now that relog reservation is available and relog state tracking is
in place, all that remains to automatically relog items is the relog
mechanism itself. An item with relogging enabled is basically pinned
from writeback until relog is disabled. Instead of being written
back, the item must instead be periodically committed in a new
transaction to move it forward in the physical log. The purpose of
moving the item is to avoid long term tail pinning and thus avoid
log deadlocks for long running operations.

The ideal time to relog an item is in response to tail pushing
pressure. This accommodates the current workload at any given time
as opposed to a fixed time interval or log reservation heuristic,
which risks performance regression. This is essentially the same
heuristic that drives metadata writeback. XFS already implements
various log tail pushing heuristics that attempt to keep the log
progressing on an active filesystem under various workloads.

The act of relogging an item simply requires adding it to a
transaction and committing it. This pushes the already dirty item into a
subsequent log checkpoint and frees up its previous location in the
on-disk log. Joining an item to a transaction of course requires
locking the item first, which means we have to be aware of
type-specific locks and lock ordering wherever the relog takes
place.

Fundamentally, this points to xfsaild as the ideal location to
process relog enabled items. xfsaild already processes log resident
items, is driven by log tail pushing pressure, processes arbitrary
log item types through callbacks, and is sensitive to type-specific
locking rules by design. The fact that automatic relogging
essentially diverts items between writeback or relog also suggests
xfsaild as an ideal location to process items one way or the other.

Of course, we don't want xfsaild to process transactions as it is a
critical component of the log subsystem for driving metadata
writeback and freeing up log space. Therefore, similar to how
xfsaild builds up a writeback queue of dirty items and queues writes
asynchronously, make xfsaild responsible only for directing pending
relog items into an appropriate queue and create an async
(workqueue) context for processing the queue. The workqueue context
utilizes the pre-reserved log reservation to drain the queue by
rolling a permanent transaction.

Update the AIL pushing infrastructure to support a new RELOG item
state. If a log item push returns the relog state, queue the item
for relog instead of writeback. On completion of a push cycle,
schedule the relog task at the same point metadata buffer I/O is
submitted. This allows items to be relogged automatically under the
same locking rules and pressure heuristics that govern metadata
writeback.
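
A minimal sketch of the push dispatch described above, with simplified
state names and counters standing in for the real queues (illustrative
only; the actual code sets the RELOG_QUEUED bit under ail_lock and
schedules the relog workqueue alongside delwri buffer submission):

```c
#include <assert.h>

/* Simplified push states; the real set lives in xfs_trans.h. */
enum push_state {
	ITEM_SUCCESS, ITEM_PINNED, ITEM_LOCKED, ITEM_FLUSHING, ITEM_RELOG
};

static int relog_queued, writeback_queued;

/* Route a push result: RELOG diverts to the relog queue instead of
 * the writeback path. */
static void toy_dispatch(enum push_state state)
{
	switch (state) {
	case ITEM_RELOG:
		relog_queued++;	/* would add to ail_relog_list */
		break;
	case ITEM_SUCCESS:
		writeback_queued++;	/* would add to ail_buf_list */
		break;
	default:
		break;	/* pinned/locked/flushing: skip for now */
	}
}
```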

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_trace.h      |   2 +
 fs/xfs/xfs_trans.c      |  13 ++++-
 fs/xfs/xfs_trans.h      |   8 ++-
 fs/xfs/xfs_trans_ail.c  | 111 +++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_trans_priv.h |   6 ++-
 5 files changed, 134 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index b683c94556a3..6edb40e28a9a 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1068,6 +1068,8 @@ DEFINE_LOG_ITEM_EVENT(xfs_ail_push);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_pinned);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_locked);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_flushing);
+DEFINE_LOG_ITEM_EVENT(xfs_ail_relog);
+DEFINE_LOG_ITEM_EVENT(xfs_ail_relog_queue);
 DEFINE_LOG_ITEM_EVENT(xfs_relog_item);
 DEFINE_LOG_ITEM_EVENT(xfs_relog_item_cancel);
 
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 6fc81c3b8500..31ef5f671341 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -795,7 +795,8 @@ xfs_trans_relog_item(
 void
 xfs_trans_relog_item_cancel(
 	struct xfs_trans	*tp,
-	struct xfs_log_item	*lip)
+	struct xfs_log_item	*lip,
+	bool			wait)
 {
 	int			res;
 
@@ -803,6 +804,15 @@ xfs_trans_relog_item_cancel(
 		return;
 	trace_xfs_relog_item_cancel(lip);
 
+	/*
+	 * Must wait on active relog to complete before reclaiming reservation.
+	 * Currently a big hammer because the QUEUED state isn't cleared until
+	 * AIL (re)insertion. A separate state might be warranted.
+	 */
+	while (wait && wait_on_bit_timeout(&lip->li_flags, XFS_LI_RELOG_QUEUED,
+					   TASK_UNINTERRUPTIBLE, HZ))
+		xfs_log_force(lip->li_mountp, XFS_LOG_SYNC);
+
 	res = xfs_relog_calc_res(lip);
 	if (tp)
 		tp->t_ticket->t_curr_res += res;
@@ -896,6 +906,7 @@ xfs_trans_committed_bulk(
 
 		if (aborted)
 			set_bit(XFS_LI_ABORTED, &lip->li_flags);
+		clear_and_wake_up_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags);
 
 		if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
 			lip->li_ops->iop_release(lip);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index b4bb2a6c5251..51f7c92a4ffb 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -63,13 +63,15 @@ struct xfs_log_item {
 #define	XFS_LI_FAILED	2
 #define	XFS_LI_DIRTY	3	/* log item dirty in transaction */
 #define	XFS_LI_RELOG	4	/* automatically relog item */
+#define XFS_LI_RELOG_QUEUED 5	/* queued for relog */
 
 #define XFS_LI_FLAGS \
 	{ (1 << XFS_LI_IN_AIL),		"IN_AIL" }, \
 	{ (1 << XFS_LI_ABORTED),	"ABORTED" }, \
 	{ (1 << XFS_LI_FAILED),		"FAILED" }, \
 	{ (1 << XFS_LI_DIRTY),		"DIRTY" }, \
-	{ (1 << XFS_LI_RELOG),		"RELOG" }
+	{ (1 << XFS_LI_RELOG),		"RELOG" }, \
+	{ (1 << XFS_LI_RELOG_QUEUED),	"RELOG_QUEUED" }
 
 struct xfs_item_ops {
 	unsigned flags;
@@ -81,7 +83,8 @@ struct xfs_item_ops {
 	void (*iop_committing)(struct xfs_log_item *, xfs_lsn_t commit_lsn);
 	void (*iop_release)(struct xfs_log_item *);
 	xfs_lsn_t (*iop_committed)(struct xfs_log_item *, xfs_lsn_t);
-	void (*iop_error)(struct xfs_log_item *, xfs_buf_t *);
+	void (*iop_error)(struct xfs_log_item *, struct xfs_buf *);
+	void (*iop_relog)(struct xfs_log_item *, struct xfs_trans *);
 };
 
 /*
@@ -100,6 +103,7 @@ void	xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item,
 #define XFS_ITEM_PINNED		1
 #define XFS_ITEM_LOCKED		2
 #define XFS_ITEM_FLUSHING	3
+#define XFS_ITEM_RELOG		4
 
 /*
  * Deferred operation item relogging limits.
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 0b4b1c3cc4de..b78d026d6564 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -17,6 +17,7 @@
 #include "xfs_errortag.h"
 #include "xfs_error.h"
 #include "xfs_log.h"
+#include "xfs_log_priv.h"
 
 #ifdef DEBUG
 /*
@@ -152,6 +153,88 @@ xfs_ail_max_lsn(
 	return lsn;
 }
 
+/*
+ * Relog log items on the AIL relog queue.
+ *
+ * Note that relog is incompatible with filesystem freeze due to the
+ * multi-transaction nature of its users. The freeze sequence blocks all
+ * transactions and expects to drain the AIL. Allowing the relog transaction to
+ * proceed while freeze is in progress is not sufficient because it is not
+ * responsible for cancellation of relog state. The higher level operations must
+ * be guaranteed to progress to completion before the AIL can be drained of
+ * relog enabled items. This is currently accomplished by holding
+ * ->s_umount (quotaoff) or superblock write references (scrub) across the high
+ * level operations that depend on relog.
+ */
+static void
+xfs_ail_relog(
+	struct work_struct	*work)
+{
+	struct xfs_ail		*ailp = container_of(work, struct xfs_ail,
+						     ail_relog_work);
+	struct xfs_mount	*mp = ailp->ail_mount;
+	struct xfs_trans_res	tres = {};
+	struct xfs_trans	*tp;
+	struct xfs_log_item	*lip, *lipp;
+	int			error;
+	LIST_HEAD(relog_list);
+
+	/*
+	 * Open code allocation of an empty transaction and log ticket. The
+	 * ticket requires no initial reservation because all outstanding
+	 * relog reservation is attached to log items.
+	 */
+	error = xfs_trans_alloc(mp, &tres, 0, 0, 0, &tp);
+	if (error)
+		goto out;
+	ASSERT(tp && !tp->t_ticket);
+	tp->t_flags |= XFS_TRANS_PERM_LOG_RES;
+	tp->t_log_count = 1;
+	tp->t_ticket = xlog_ticket_alloc(mp->m_log, 0, 1, XFS_TRANSACTION,
+					 true, false, 0);
+	/* reset to zero to undo res overhead calculation on ticket alloc */
+	tp->t_ticket->t_curr_res = 0;
+	tp->t_ticket->t_unit_res = 0;
+	tp->t_log_res = 0;
+
+	spin_lock(&ailp->ail_lock);
+	while (!list_empty(&ailp->ail_relog_list)) {
+		list_splice_init(&ailp->ail_relog_list, &relog_list);
+		spin_unlock(&ailp->ail_lock);
+
+		list_for_each_entry_safe(lip, lipp, &relog_list, li_trans) {
+			list_del_init(&lip->li_trans);
+
+			trace_xfs_ail_relog(lip);
+			ASSERT(lip->li_ops->iop_relog);
+			if (lip->li_ops->iop_relog)
+				lip->li_ops->iop_relog(lip, tp);
+		}
+
+		error = xfs_trans_roll(&tp);
+		if (error) {
+			xfs_trans_cancel(tp);
+			goto out;
+		}
+
+		/*
+		 * Now that the transaction has rolled, reset the ticket to
+		 * zero to reflect that the log reservation held by the
+		 * attached items has been replenished.
+		 */
+		tp->t_ticket->t_curr_res = 0;
+		tp->t_ticket->t_unit_res = 0;
+		tp->t_log_res = 0;
+
+		spin_lock(&ailp->ail_lock);
+	}
+	spin_unlock(&ailp->ail_lock);
+	xfs_trans_cancel(tp);
+
+out:
+	ASSERT(!error || XFS_FORCED_SHUTDOWN(mp));
+}
+
 /*
  * The cursor keeps track of where our current traversal is up to by tracking
  * the next item in the list for us. However, for this to be safe, removing an
@@ -372,7 +455,7 @@ static long
 xfsaild_push(
 	struct xfs_ail		*ailp)
 {
-	xfs_mount_t		*mp = ailp->ail_mount;
+	struct xfs_mount	*mp = ailp->ail_mount;
 	struct xfs_ail_cursor	cur;
 	struct xfs_log_item	*lip;
 	xfs_lsn_t		lsn;
@@ -434,6 +517,17 @@ xfsaild_push(
 			ailp->ail_last_pushed_lsn = lsn;
 			break;
 
+		case XFS_ITEM_RELOG:
+			/*
+			 * The item requires a relog. Add to the relog queue
+			 * and set a bit to prevent further relog requests.
+			 */
+			trace_xfs_ail_relog_queue(lip);
+			ASSERT(list_empty(&lip->li_trans));
+			set_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags);
+			list_add_tail(&lip->li_trans, &ailp->ail_relog_list);
+			break;
+
 		case XFS_ITEM_FLUSHING:
 			/*
 			 * The item or its backing buffer is already being
@@ -500,6 +594,9 @@ xfsaild_push(
 	if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list))
 		ailp->ail_log_flush++;
 
+	if (!list_empty(&ailp->ail_relog_list))
+		queue_work(ailp->ail_relog_wq, &ailp->ail_relog_work);
+
 	if (!count || XFS_LSN_CMP(lsn, target) >= 0) {
 out_done:
 		/*
@@ -861,16 +958,25 @@ xfs_trans_ail_init(
 	spin_lock_init(&ailp->ail_lock);
 	INIT_LIST_HEAD(&ailp->ail_buf_list);
 	init_waitqueue_head(&ailp->ail_empty);
+	INIT_LIST_HEAD(&ailp->ail_relog_list);
+	INIT_WORK(&ailp->ail_relog_work, xfs_ail_relog);
 	atomic64_set(&ailp->ail_relog_res, 0);
 
+	ailp->ail_relog_wq = alloc_workqueue("xfs-relog/%s", WQ_FREEZABLE, 0,
+					     mp->m_super->s_id);
+	if (!ailp->ail_relog_wq)
+		goto out_free_ailp;
+
 	ailp->ail_task = kthread_run(xfsaild, ailp, "xfsaild/%s",
 			ailp->ail_mount->m_super->s_id);
 	if (IS_ERR(ailp->ail_task))
-		goto out_free_ailp;
+		goto out_destroy_wq;
 
 	mp->m_ail = ailp;
 	return 0;
 
+out_destroy_wq:
+	destroy_workqueue(ailp->ail_relog_wq);
 out_free_ailp:
 	kmem_free(ailp);
 	return -ENOMEM;
@@ -884,5 +990,6 @@ xfs_trans_ail_destroy(
 
 	ASSERT(atomic64_read(&ailp->ail_relog_res) == 0);
 	kthread_stop(ailp->ail_task);
+	destroy_workqueue(ailp->ail_relog_wq);
 	kmem_free(ailp);
 }
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 926bafdf6cd7..9301ea62d0ab 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -17,7 +17,8 @@ void	xfs_trans_init(struct xfs_mount *);
 void	xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *);
 void	xfs_trans_del_item(struct xfs_log_item *);
 void	xfs_trans_relog_item(struct xfs_trans *, struct xfs_log_item *);
-void	xfs_trans_relog_item_cancel(struct xfs_trans *, struct xfs_log_item *);
+void	xfs_trans_relog_item_cancel(struct xfs_trans *, struct xfs_log_item *,
+				    bool wait);
 void	xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp);
 
 void	xfs_trans_committed_bulk(struct xfs_ail *ailp, struct xfs_log_vec *lv,
@@ -64,6 +65,9 @@ struct xfs_ail {
 	struct list_head	ail_buf_list;
 	wait_queue_head_t	ail_empty;
 	atomic64_t		ail_relog_res;
+	struct work_struct	ail_relog_work;
+	struct list_head	ail_relog_list;
+	struct workqueue_struct	*ail_relog_wq;
 };
 
 /*
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC v6 PATCH 06/10] xfs: automatically relog the quotaoff start intent
  2020-04-06 12:36 [RFC v6 PATCH 00/10] xfs: automatic relogging experiment Brian Foster
                   ` (4 preceding siblings ...)
  2020-04-06 12:36 ` [RFC v6 PATCH 05/10] xfs: automatic log item relog mechanism Brian Foster
@ 2020-04-06 12:36 ` Brian Foster
  2020-04-06 12:36 ` [RFC v6 PATCH 07/10] xfs: prevent fs freeze with outstanding relog items Brian Foster
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Brian Foster @ 2020-04-06 12:36 UTC (permalink / raw)
  To: linux-xfs

The quotaoff operation has a rare but longstanding deadlock vector
in terms of how the operation is logged. A quotaoff start intent is
logged (synchronously) at the onset to ensure recovery can handle
the operation if interrupted before in-core changes are made. This
quotaoff intent pins the log tail while the quotaoff sequence scans
and purges dquots from all in-core inodes. While this operation
generally doesn't generate much log traffic on its own, it can be
time consuming. If unrelated, concurrent filesystem activity
consumes remaining log space before quotaoff is able to acquire log
reservation for the quotaoff end intent, the filesystem locks up
indefinitely.

quotaoff cannot allocate the end intent before the scan because the
latter can result in transaction allocation itself in certain
indirect cases (releasing an inode, for example). Further, rolling
the original transaction is difficult because the scanning work
occurs multiple layers down where caller context is lost and not
much information is available to determine how often to roll the
transaction.

To address this problem, enable automatic relogging of the quotaoff
start intent. This automatically relogs the intent whenever AIL
pushing finds the item at the tail of the log. When quotaoff
completes, wait for relogging to complete as the end intent expects
to be able to permanently remove the start intent from the log
subsystem. This ensures that the log tail is kept moving during a
particularly long quotaoff operation and avoids the log reservation
deadlock.

Note that the quotaoff reservation calculation does not need to be
updated for relog as it already (incorrectly) accounts for two
quotaoff intents.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_dquot_item.c  | 26 ++++++++++++++++++++++++++
 fs/xfs/xfs_qm_syscalls.c | 12 +++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index baad1748d0d1..22f8b0750afc 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -17,6 +17,7 @@
 #include "xfs_trans_priv.h"
 #include "xfs_qm.h"
 #include "xfs_log.h"
+#include "xfs_log_priv.h"
 
 static inline struct xfs_dq_logitem *DQUOT_ITEM(struct xfs_log_item *lip)
 {
@@ -298,6 +299,13 @@ xfs_qm_qoff_logitem_push(
 	struct xfs_log_item	*lip,
 	struct list_head	*buffer_list)
 {
+	struct xfs_log_item	*mlip = xfs_ail_min(lip->li_ailp);
+
+	if (test_bit(XFS_LI_RELOG, &lip->li_flags) &&
+	    !test_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags) &&
+	    !XFS_LSN_CMP(lip->li_lsn, mlip->li_lsn))
+		return XFS_ITEM_RELOG;
+
 	return XFS_ITEM_LOCKED;
 }
 
@@ -329,6 +337,23 @@ xfs_qm_qoff_logitem_release(
 	}
 }
 
+STATIC void
+xfs_qm_qoff_logitem_relog(
+	struct xfs_log_item	*lip,
+	struct xfs_trans	*tp)
+{
+	int			res;
+
+	res = xfs_relog_calc_res(lip);
+
+	xfs_trans_add_item(tp, lip);
+	tp->t_ticket->t_curr_res += res;
+	tp->t_ticket->t_unit_res += res;
+	tp->t_log_res += res;
+	tp->t_flags |= XFS_TRANS_DIRTY;
+	set_bit(XFS_LI_DIRTY, &lip->li_flags);
+}
+
 static const struct xfs_item_ops xfs_qm_qoffend_logitem_ops = {
 	.iop_size	= xfs_qm_qoff_logitem_size,
 	.iop_format	= xfs_qm_qoff_logitem_format,
@@ -342,6 +367,7 @@ static const struct xfs_item_ops xfs_qm_qoff_logitem_ops = {
 	.iop_format	= xfs_qm_qoff_logitem_format,
 	.iop_push	= xfs_qm_qoff_logitem_push,
 	.iop_release	= xfs_qm_qoff_logitem_release,
+	.iop_relog	= xfs_qm_qoff_logitem_relog,
 };
 
 /*
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index 5d5ac65aa1cc..dc154051ec7b 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -18,6 +18,7 @@
 #include "xfs_quota.h"
 #include "xfs_qm.h"
 #include "xfs_icache.h"
+#include "xfs_trans_priv.h"
 
 STATIC int
 xfs_qm_log_quotaoff(
@@ -29,12 +30,14 @@ xfs_qm_log_quotaoff(
 	int			error;
 	struct xfs_qoff_logitem	*qoffi;
 
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0,
+				XFS_TRANS_RELOG, &tp);
 	if (error)
 		goto out;
 
 	qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT);
 	xfs_trans_log_quotaoff_item(tp, qoffi);
+	xfs_trans_relog_item(tp, &qoffi->qql_item);
 
 	spin_lock(&mp->m_sb_lock);
 	mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL;
@@ -71,6 +74,13 @@ xfs_qm_log_quotaoff_end(
 	if (error)
 		return error;
 
+	/*
+	 * startqoff must be in the AIL and not the CIL when the end intent
+	 * commits to ensure it is not readded to the AIL out of order. Wait on
+	 * relog activity to drain to isolate startqoff to the AIL.
+	 */
+	xfs_trans_relog_item_cancel(tp, &(*startqoff)->qql_item, true);
+
 	qoffi = xfs_trans_get_qoff_item(tp, *startqoff,
 					flags & XFS_ALL_QUOTA_ACCT);
 	xfs_trans_log_quotaoff_item(tp, qoffi);
-- 
2.21.1



* [RFC v6 PATCH 07/10] xfs: prevent fs freeze with outstanding relog items
  2020-04-06 12:36 [RFC v6 PATCH 00/10] xfs: automatic relogging experiment Brian Foster
                   ` (5 preceding siblings ...)
  2020-04-06 12:36 ` [RFC v6 PATCH 06/10] xfs: automatically relog the quotaoff start intent Brian Foster
@ 2020-04-06 12:36 ` Brian Foster
  2020-04-09  0:05   ` Allison Collins
  2020-04-06 12:36 ` [RFC v6 PATCH 08/10] xfs: buffer relogging support prototype Brian Foster
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Brian Foster @ 2020-04-06 12:36 UTC (permalink / raw)
  To: linux-xfs

The automatic relog mechanism is currently incompatible with
filesystem freeze in a generic sense. Freeze protection is currently
implemented via locks that cannot be held on return to userspace,
which means we can't hold a superblock write reference for as long
as relogging is enabled on an item. It's too late to block
when the freeze sequence calls into the filesystem because the
transaction subsystem has already begun to be frozen. Not only can
this block the relog transaction, but blocking any unrelated
transaction essentially prevents a particular operation from
progressing to the point where it can disable relogging on an item.
Therefore marking the relog transaction as "nowrite" does not solve
the problem.

This is not a problem in practice because the two primary use cases
already exclude freeze via other means. quotaoff holds ->s_umount
read locked across the operation and scrub explicitly takes a
superblock write reference, both of which block freeze of the
transaction subsystem for the duration of relog enabled items.

As a fallback for future use cases and the upcoming random buffer
relogging test code, fail fs freeze attempts when the global relog
reservation counter is elevated to prevent deadlock. This is a
partial punt of the problem as compared to teaching freeze to wait
on relogged items because the only current dependency is test code.
In other words, this patch prevents deadlock if a user happens to
issue a freeze in conjunction with random buffer relog injection.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_super.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index abf06bf9c3f3..0efa9dc70d71 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -35,6 +35,7 @@
 #include "xfs_refcount_item.h"
 #include "xfs_bmap_item.h"
 #include "xfs_reflink.h"
+#include "xfs_trans_priv.h"
 
 #include <linux/magic.h>
 #include <linux/fs_context.h>
@@ -870,6 +871,9 @@ xfs_fs_freeze(
 {
 	struct xfs_mount	*mp = XFS_M(sb);
 
+	if (WARN_ON_ONCE(atomic64_read(&mp->m_ail->ail_relog_res)))
+		return -EAGAIN;
+
 	xfs_stop_block_reaping(mp);
 	xfs_save_resvblks(mp);
 	xfs_quiesce_attr(mp);
-- 
2.21.1



* [RFC v6 PATCH 08/10] xfs: buffer relogging support prototype
  2020-04-06 12:36 [RFC v6 PATCH 00/10] xfs: automatic relogging experiment Brian Foster
                   ` (6 preceding siblings ...)
  2020-04-06 12:36 ` [RFC v6 PATCH 07/10] xfs: prevent fs freeze with outstanding relog items Brian Foster
@ 2020-04-06 12:36 ` Brian Foster
  2020-04-06 12:36 ` [RFC v6 PATCH 09/10] xfs: create an error tag for random relog reservation Brian Foster
  2020-04-06 12:36 ` [RFC v6 PATCH 10/10] xfs: relog random buffers based on errortag Brian Foster
  9 siblings, 0 replies; 15+ messages in thread
From: Brian Foster @ 2020-04-06 12:36 UTC (permalink / raw)
  To: linux-xfs

Add a quick and dirty implementation of buffer relogging support.
There is currently no use case for buffer relogging. This is for
experimental use only and serves as an example to demonstrate the
ability to relog arbitrary items in the future, if necessary.

Add helpers to manage relogged buffers, update the buffer log item
push handler to support relogged BLIs and add a log item relog
callback to properly join buffers to the relog transaction. Note
that buffers associated with higher level log items (i.e., inodes
and dquots) are skipped.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf.c       |  4 +++
 fs/xfs/xfs_buf_item.c  | 51 +++++++++++++++++++++++++++++++++---
 fs/xfs/xfs_trans.h     |  9 ++++++-
 fs/xfs/xfs_trans_buf.c | 59 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 118 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 9ec3eaf1c618..2d16c7d6dee0 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -16,6 +16,8 @@
 #include "xfs_log.h"
 #include "xfs_errortag.h"
 #include "xfs_error.h"
+#include "xfs_trans.h"
+#include "xfs_buf_item.h"
 
 static kmem_zone_t *xfs_buf_zone;
 
@@ -1477,6 +1479,8 @@ __xfs_buf_submit(
 	trace_xfs_buf_submit(bp, _RET_IP_);
 
 	ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));
+	ASSERT(!bp->b_log_item ||
+	       !test_bit(XFS_LI_RELOG, &bp->b_log_item->bli_item.li_flags));
 
 	/* on shutdown we stale and complete the buffer immediately */
 	if (XFS_FORCED_SHUTDOWN(bp->b_mount)) {
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 1545657c3ca0..762359e6ab65 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -16,7 +16,7 @@
 #include "xfs_trans_priv.h"
 #include "xfs_trace.h"
 #include "xfs_log.h"
-
+#include "xfs_log_priv.h"
 
 kmem_zone_t	*xfs_buf_item_zone;
 
@@ -157,7 +157,7 @@ xfs_buf_item_size(
 		return;
 	}
 
-	ASSERT(bip->bli_flags & XFS_BLI_LOGGED);
+	ASSERT(bip->bli_flags & XFS_BLI_DIRTY);
 
 	if (bip->bli_flags & XFS_BLI_ORDERED) {
 		/*
@@ -463,6 +463,13 @@ xfs_buf_item_unpin(
 			list_del_init(&bp->b_li_list);
 			bp->b_iodone = NULL;
 		} else {
+			/* racy */
+			ASSERT(!test_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags));
+			if (test_bit(XFS_LI_RELOG, &lip->li_flags)) {
+				atomic_dec(&bp->b_pin_count);
+				xfs_trans_relog_item_cancel(NULL, lip, true);
+			}
+
 			spin_lock(&ailp->ail_lock);
 			xfs_trans_ail_delete(ailp, lip, SHUTDOWN_LOG_IO_ERROR);
 			xfs_buf_item_relse(bp);
@@ -513,8 +520,6 @@ xfs_buf_item_push(
 	struct xfs_buf		*bp = bip->bli_buf;
 	uint			rval = XFS_ITEM_SUCCESS;
 
-	if (xfs_buf_ispinned(bp))
-		return XFS_ITEM_PINNED;
 	if (!xfs_buf_trylock(bp)) {
 		/*
 		 * If we have just raced with a buffer being pinned and it has
@@ -528,6 +533,16 @@ xfs_buf_item_push(
 		return XFS_ITEM_LOCKED;
 	}
 
+	/* relog bufs are pinned so check relog state first */
+	if (test_bit(XFS_LI_RELOG, &lip->li_flags) &&
+	    !test_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags))
+		return XFS_ITEM_RELOG;
+
+	if (xfs_buf_ispinned(bp)) {
+		xfs_buf_unlock(bp);
+		return XFS_ITEM_PINNED;
+	}
+
 	ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
 
 	trace_xfs_buf_item_push(bip);
@@ -694,6 +709,28 @@ xfs_buf_item_committed(
 	return lsn;
 }
 
+STATIC void
+xfs_buf_item_relog(
+	struct xfs_log_item	*lip,
+	struct xfs_trans	*tp)
+{
+	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
+	int			res;
+
+	/*
+	 * Grab a reference to the buffer for the transaction before we join
+	 * and dirty it.
+	 */
+	xfs_buf_hold(bip->bli_buf);
+	xfs_trans_bjoin(tp, bip->bli_buf);
+	xfs_trans_dirty_buf(tp, bip->bli_buf);
+
+	res = xfs_relog_calc_res(lip);
+	tp->t_ticket->t_curr_res += res;
+	tp->t_ticket->t_unit_res += res;
+	tp->t_log_res += res;
+}
+
 static const struct xfs_item_ops xfs_buf_item_ops = {
 	.iop_size	= xfs_buf_item_size,
 	.iop_format	= xfs_buf_item_format,
@@ -703,6 +740,7 @@ static const struct xfs_item_ops xfs_buf_item_ops = {
 	.iop_committing	= xfs_buf_item_committing,
 	.iop_committed	= xfs_buf_item_committed,
 	.iop_push	= xfs_buf_item_push,
+	.iop_relog	= xfs_buf_item_relog,
 };
 
 STATIC void
@@ -956,6 +994,11 @@ STATIC void
 xfs_buf_item_free(
 	struct xfs_buf_log_item	*bip)
 {
+	ASSERT(!test_bit(XFS_LI_RELOG, &bip->bli_item.li_flags));
+#ifdef DEBUG
+	ASSERT(!atomic64_read(&bip->bli_item.li_relog_res));
+#endif
+
 	xfs_buf_item_free_format(bip);
 	kmem_free(bip->bli_item.li_lv_shadow);
 	kmem_cache_free(xfs_buf_item_zone, bip);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 51f7c92a4ffb..a7a70430b8b2 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -227,7 +227,11 @@ xfs_relog_calc_res(
 	int			nbytes = 0;
 
 	lip->li_ops->iop_size(lip, &niovecs, &nbytes);
-	ASSERT(niovecs == 1);
+	ASSERT(niovecs == 1 || lip->li_type == XFS_LI_BUF);
+	/* ideally this is somewhere buffer specific.. */
+	if (lip->li_type == XFS_LI_BUF)
+		nbytes += round_up(sizeof(struct xlog_op_header) +
+				   sizeof(struct xfs_buf_log_format), 128);
 	nbytes = xfs_log_calc_unit_res(lip->li_mountp, nbytes);
 
 	return nbytes;
@@ -244,6 +248,9 @@ void		xfs_trans_inode_buf(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_stale_inode_buf(xfs_trans_t *, struct xfs_buf *);
 bool		xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_dquot_buf(xfs_trans_t *, struct xfs_buf *, uint);
+bool		xfs_trans_relog_buf(struct xfs_trans *, struct xfs_buf *);
+void		xfs_trans_relog_buf_cancel(struct xfs_trans *,
+					   struct xfs_buf *);
 void		xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int);
 void		xfs_trans_ijoin(struct xfs_trans *, struct xfs_inode *, uint);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 08174ffa2118..2bed5d615541 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -588,6 +588,8 @@ xfs_trans_binval(
 		return;
 	}
 
+	/* return relog res before we reset dirty state */
+	xfs_trans_relog_buf_cancel(tp, bp);
 	xfs_buf_stale(bp);
 
 	bip->bli_flags |= XFS_BLI_STALE;
@@ -787,3 +789,60 @@ xfs_trans_dquot_buf(
 
 	xfs_trans_buf_set_type(tp, bp, type);
 }
+
+/*
+ * Enable automatic relogging on a buffer. This essentially pins a dirty buffer
+ * in-core until relogging is disabled.
+ */
+bool
+xfs_trans_relog_buf(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*bp)
+{
+	struct xfs_buf_log_item	*bip = bp->b_log_item;
+	enum xfs_blft		blft;
+
+	ASSERT(xfs_buf_islocked(bp));
+
+	if (bip->bli_flags & (XFS_BLI_ORDERED|XFS_BLI_STALE))
+		return false;
+
+	/*
+	 * Skip buffers with higher level log items. Those items must be
+	 * relogged directly to move in the log.
+	 */
+	blft = xfs_blft_from_flags(&bip->__bli_format);
+	switch (blft) {
+	case XFS_BLFT_DINO_BUF:
+	case XFS_BLFT_UDQUOT_BUF:
+	case XFS_BLFT_PDQUOT_BUF:
+	case XFS_BLFT_GDQUOT_BUF:
+		return false;
+	default:
+		break;
+	}
+
+	/*
+	 * Relog expects a worst case reservation from ->iop_size. Hack that in
+	 * here by logging the entire buffer in this transaction. Also grab a
+	 * buffer pin to prevent it from being written out.
+	 */
+	xfs_buf_item_log(bip, 0, BBTOB(bp->b_length) - 1);
+	atomic_inc(&bp->b_pin_count);
+	xfs_trans_relog_item(tp, &bip->bli_item);
+	return true;
+}
+
+void
+xfs_trans_relog_buf_cancel(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*bp)
+{
+	struct xfs_buf_log_item	*bip = bp->b_log_item;
+
+	if (!test_bit(XFS_LI_RELOG, &bip->bli_item.li_flags))
+		return;
+
+	atomic_dec(&bp->b_pin_count);
+	xfs_trans_relog_item_cancel(tp, &bip->bli_item, false);
+}
-- 
2.21.1



* [RFC v6 PATCH 09/10] xfs: create an error tag for random relog reservation
  2020-04-06 12:36 [RFC v6 PATCH 00/10] xfs: automatic relogging experiment Brian Foster
                   ` (7 preceding siblings ...)
  2020-04-06 12:36 ` [RFC v6 PATCH 08/10] xfs: buffer relogging support prototype Brian Foster
@ 2020-04-06 12:36 ` Brian Foster
  2020-04-06 12:36 ` [RFC v6 PATCH 10/10] xfs: relog random buffers based on errortag Brian Foster
  9 siblings, 0 replies; 15+ messages in thread
From: Brian Foster @ 2020-04-06 12:36 UTC (permalink / raw)
  To: linux-xfs

Create an errortag to enable relogging on random transactions. Since
relogging requires extra transaction reservation, artificially bump
the reservation on selected transactions and tag them with the relog
flag such that the requisite reservation overhead is added by the
ticket allocation code. This allows subsequent random buffer relog
events to target transactions where reservation is included. This is
necessary to avoid transaction reservation overruns on non-relog
transactions.

Note that this does not yet enable relogging of any particular
items. The tag will be reused in a subsequent patch to enable random
buffer relogging.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_errortag.h |  4 +++-
 fs/xfs/xfs_error.c           |  3 +++
 fs/xfs/xfs_trans.c           | 21 ++++++++++++++++-----
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
index 79e6c4fb1d8a..ca7bcadb9455 100644
--- a/fs/xfs/libxfs/xfs_errortag.h
+++ b/fs/xfs/libxfs/xfs_errortag.h
@@ -55,7 +55,8 @@
 #define XFS_ERRTAG_FORCE_SCRUB_REPAIR			32
 #define XFS_ERRTAG_FORCE_SUMMARY_RECALC			33
 #define XFS_ERRTAG_IUNLINK_FALLBACK			34
-#define XFS_ERRTAG_MAX					35
+#define XFS_ERRTAG_RELOG				35
+#define XFS_ERRTAG_MAX					36
 
 /*
  * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
@@ -95,5 +96,6 @@
 #define XFS_RANDOM_FORCE_SCRUB_REPAIR			1
 #define XFS_RANDOM_FORCE_SUMMARY_RECALC			1
 #define XFS_RANDOM_IUNLINK_FALLBACK			(XFS_RANDOM_DEFAULT/10)
+#define XFS_RANDOM_RELOG				XFS_RANDOM_DEFAULT
 
 #endif /* __XFS_ERRORTAG_H_ */
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index a21e9cc6516a..9d1f9b4c50ea 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -53,6 +53,7 @@ static unsigned int xfs_errortag_random_default[] = {
 	XFS_RANDOM_FORCE_SCRUB_REPAIR,
 	XFS_RANDOM_FORCE_SUMMARY_RECALC,
 	XFS_RANDOM_IUNLINK_FALLBACK,
+	XFS_RANDOM_RELOG,
 };
 
 struct xfs_errortag_attr {
@@ -162,6 +163,7 @@ XFS_ERRORTAG_ATTR_RW(buf_lru_ref,	XFS_ERRTAG_BUF_LRU_REF);
 XFS_ERRORTAG_ATTR_RW(force_repair,	XFS_ERRTAG_FORCE_SCRUB_REPAIR);
 XFS_ERRORTAG_ATTR_RW(bad_summary,	XFS_ERRTAG_FORCE_SUMMARY_RECALC);
 XFS_ERRORTAG_ATTR_RW(iunlink_fallback,	XFS_ERRTAG_IUNLINK_FALLBACK);
+XFS_ERRORTAG_ATTR_RW(relog,		XFS_ERRTAG_RELOG);
 
 static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(noerror),
@@ -199,6 +201,7 @@ static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(force_repair),
 	XFS_ERRORTAG_ATTR_LIST(bad_summary),
 	XFS_ERRORTAG_ATTR_LIST(iunlink_fallback),
+	XFS_ERRORTAG_ATTR_LIST(relog),
 	NULL,
 };
 
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 31ef5f671341..d7ca70a35d33 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -21,6 +21,7 @@
 #include "xfs_error.h"
 #include "xfs_defer.h"
 #include "xfs_log_priv.h"
+#include "xfs_errortag.h"
 
 kmem_zone_t	*xfs_trans_zone;
 
@@ -179,9 +180,10 @@ xfs_trans_reserve(
 	if (resp->tr_logres > 0) {
 		bool	permanent = false;
 		bool	relog	  = (tp->t_flags & XFS_TRANS_RELOG);
+		int	logres = resp->tr_logres;
 
 		ASSERT(tp->t_log_res == 0 ||
-		       tp->t_log_res == resp->tr_logres);
+		       tp->t_log_res == logres);
 		ASSERT(tp->t_log_count == 0 ||
 		       tp->t_log_count == resp->tr_logcount);
 
@@ -197,9 +199,18 @@ xfs_trans_reserve(
 			ASSERT(resp->tr_logflags & XFS_TRANS_PERM_LOG_RES);
 			error = xfs_log_regrant(mp, tp->t_ticket);
 		} else {
-			error = xfs_log_reserve(mp,
-						resp->tr_logres,
-						resp->tr_logcount,
+			/*
+			 * Enable relog overhead on random transactions to support
+			 * random item relogging.
+			 */
+			if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_RELOG) &&
+			    !relog) {
+				tp->t_flags |= XFS_TRANS_RELOG;
+				relog = true;
+				logres <<= 1;
+			}
+
+			error = xfs_log_reserve(mp, logres, resp->tr_logcount,
 						&tp->t_ticket, XFS_TRANSACTION,
 						permanent, relog);
 		}
@@ -207,7 +218,7 @@ xfs_trans_reserve(
 		if (error)
 			goto undo_blocks;
 
-		tp->t_log_res = resp->tr_logres;
+		tp->t_log_res = logres;
 		tp->t_log_count = resp->tr_logcount;
 	}
 
-- 
2.21.1



* [RFC v6 PATCH 10/10] xfs: relog random buffers based on errortag
  2020-04-06 12:36 [RFC v6 PATCH 00/10] xfs: automatic relogging experiment Brian Foster
                   ` (8 preceding siblings ...)
  2020-04-06 12:36 ` [RFC v6 PATCH 09/10] xfs: create an error tag for random relog reservation Brian Foster
@ 2020-04-06 12:36 ` Brian Foster
  9 siblings, 0 replies; 15+ messages in thread
From: Brian Foster @ 2020-04-06 12:36 UTC (permalink / raw)
  To: linux-xfs

Since there is currently no specific use case for buffer relogging,
add some hacky and experimental code to relog random buffers when
the associated errortag is enabled. Use fixed termination logic
regardless of the user-specified error rate to help ensure that the
relog queue doesn't grow indefinitely.

Note that this patch was useful in causing log reservation deadlocks
on an fsstress workload if the relog mechanism code is modified to
acquire its own log reservation rather than rely on the
pre-reservation mechanism. In other words, this helps prove that the
relog reservation management code effectively avoids log reservation
deadlocks.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf_item.c  |  1 +
 fs/xfs/xfs_trans.h     |  4 +++-
 fs/xfs/xfs_trans_ail.c | 33 +++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trans_buf.c | 14 ++++++++++++++
 4 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 762359e6ab65..ed91de33278b 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -467,6 +467,7 @@ xfs_buf_item_unpin(
 			ASSERT(!test_bit(XFS_LI_RELOG_QUEUED, &lip->li_flags));
 			if (test_bit(XFS_LI_RELOG, &lip->li_flags)) {
 				atomic_dec(&bp->b_pin_count);
+				clear_bit(XFS_LI_RELOG_RAND, &bip->bli_item.li_flags);
 				xfs_trans_relog_item_cancel(NULL, lip, true);
 			}
 
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index a7a70430b8b2..0f40ce784367 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -64,6 +64,7 @@ struct xfs_log_item {
 #define	XFS_LI_DIRTY	3	/* log item dirty in transaction */
 #define	XFS_LI_RELOG	4	/* automatically relog item */
 #define XFS_LI_RELOG_QUEUED 5	/* queued for relog */
+#define XFS_LI_RELOG_RAND   6	/* randomly relogged (errortag) */
 
 #define XFS_LI_FLAGS \
 	{ (1 << XFS_LI_IN_AIL),		"IN_AIL" }, \
@@ -71,7 +72,8 @@ struct xfs_log_item {
 	{ (1 << XFS_LI_FAILED),		"FAILED" }, \
 	{ (1 << XFS_LI_DIRTY),		"DIRTY" }, \
 	{ (1 << XFS_LI_RELOG),		"RELOG" }, \
-	{ (1 << XFS_LI_RELOG_QUEUED),	"RELOG_QUEUED" }
+	{ (1 << XFS_LI_RELOG_QUEUED),	"RELOG_QUEUED" }, \
+	{ (1 << XFS_LI_RELOG_RAND),	"RELOG_RAND" }
 
 struct xfs_item_ops {
 	unsigned flags;
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index b78d026d6564..a34a5e73427e 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -18,6 +18,7 @@
 #include "xfs_error.h"
 #include "xfs_log.h"
 #include "xfs_log_priv.h"
+#include "xfs_buf_item.h"
 
 #ifdef DEBUG
 /*
@@ -176,6 +177,7 @@ xfs_ail_relog(
 	struct xfs_trans_res	tres = {};
 	struct xfs_trans	*tp;
 	struct xfs_log_item	*lip, *lipp;
+	int			cancelres;
 	int			error;
 	LIST_HEAD(relog_list);
 
@@ -209,6 +211,37 @@ xfs_ail_relog(
 			ASSERT(lip->li_ops->iop_relog);
 			if (lip->li_ops->iop_relog)
 				lip->li_ops->iop_relog(lip, tp);
+
+			/*
+			 * Cancel random buffer relogs at a fixed rate to
+			 * prevent too much buildup.
+			 */
+			if (test_bit(XFS_LI_RELOG_RAND, &lip->li_flags) &&
+			    ((prandom_u32() & 1) ||
+			     (mp->m_flags & XFS_MOUNT_UNMOUNTING))) {
+				struct xfs_buf_log_item	*bli;
+				bli = container_of(lip, struct xfs_buf_log_item,
+						   bli_item);
+				xfs_trans_relog_buf_cancel(tp, bli->bli_buf);
+			}
+		}
+
+		/*
+		 * Cancelling relog reservation in the same transaction as
+		 * consuming it means the current transaction over releases
+		 * reservation on commit and the next transaction reservation
+		 * restores the grant heads to even. To avoid this behavior,
+		 * remove surplus reservation (->t_curr_res) from the committing
+		 * transaction and replace it with a reduction in the
+		 * reservation requirement (->t_unit_res) for the next. This has
+		 * no net effect on reservation accounting, but ensures we don't
+		 * cause problems elsewhere with odd reservation behavior.
+		 */
+		cancelres = tp->t_ticket->t_curr_res - tp->t_ticket->t_unit_res;
+		if (cancelres) {
+			tp->t_ticket->t_curr_res -= cancelres;
+			tp->t_ticket->t_unit_res -= cancelres;
+			tp->t_log_res -= cancelres;
 		}
 
 		error = xfs_trans_roll(&tp);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 2bed5d615541..9dadf39a40bb 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -14,6 +14,8 @@
 #include "xfs_buf_item.h"
 #include "xfs_trans_priv.h"
 #include "xfs_trace.h"
+#include "xfs_error.h"
+#include "xfs_errortag.h"
 
 /*
  * Check to see if a buffer matching the given parameters is already
@@ -527,6 +529,17 @@ xfs_trans_log_buf(
 
 	trace_xfs_trans_log_buf(bip);
 	xfs_buf_item_log(bip, first, last);
+
+	/*
+	 * Relog random buffers so long as the transaction is relog enabled and
+	 * the buffer wasn't already relogged explicitly.
+	 */
+	if (XFS_TEST_ERROR(false, tp->t_mountp, XFS_ERRTAG_RELOG) &&
+	    (tp->t_flags & XFS_TRANS_RELOG) &&
+	    !test_bit(XFS_LI_RELOG, &bip->bli_item.li_flags)) {
+		if (xfs_trans_relog_buf(tp, bp))
+			set_bit(XFS_LI_RELOG_RAND, &bip->bli_item.li_flags);
+	}
 }
 
 
@@ -845,4 +858,5 @@ xfs_trans_relog_buf_cancel(
 
 	atomic_dec(&bp->b_pin_count);
 	xfs_trans_relog_item_cancel(tp, &bip->bli_item, false);
+	clear_bit(XFS_LI_RELOG_RAND, &bip->bli_item.li_flags);
 }
-- 
2.21.1



* Re: [RFC v6 PATCH 02/10] xfs: create helper for ticket-less log res ungrant
  2020-04-06 12:36 ` [RFC v6 PATCH 02/10] xfs: create helper for ticket-less log res ungrant Brian Foster
@ 2020-04-06 23:52   ` Allison Collins
  0 siblings, 0 replies; 15+ messages in thread
From: Allison Collins @ 2020-04-06 23:52 UTC (permalink / raw)
  To: Brian Foster, linux-xfs



On 4/6/20 5:36 AM, Brian Foster wrote:
> Log reservation is currently acquired and released via log tickets.
> The relog mechanism introduces behavior where relog reservation is
> transferred between transaction log tickets and an external pool of
> relog reservation for active relog items. Certain contexts will be
> able to release outstanding relog reservation without the need for a
> log ticket. Factor out a helper to allow byte granularity log
> reservation ungrant.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Looks straight forward:
Reviewed-by: Allison Collins <allison.henderson@oracle.com>

> ---
>   fs/xfs/xfs_log.c | 20 ++++++++++++++++----
>   fs/xfs/xfs_log.h |  1 +
>   2 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 00fda2e8e738..d6b63490a78b 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -2980,6 +2980,21 @@ xfs_log_ticket_regrant(
>   	xfs_log_ticket_put(ticket);
>   }
>   
> +/*
> + * Restore log reservation directly to the grant heads.
> + */
> +void
> +xfs_log_ungrant_bytes(
> +	struct xfs_mount	*mp,
> +	int			bytes)
> +{
> +	struct xlog		*log = mp->m_log;
> +
> +	xlog_grant_sub_space(log, &log->l_reserve_head.grant, bytes);
> +	xlog_grant_sub_space(log, &log->l_write_head.grant, bytes);
> +	xfs_log_space_wake(mp);
> +}
> +
>   /*
>    * Give back the space left from a reservation.
>    *
> @@ -3018,12 +3033,9 @@ xfs_log_ticket_ungrant(
>   		bytes += ticket->t_unit_res*ticket->t_cnt;
>   	}
>   
> -	xlog_grant_sub_space(log, &log->l_reserve_head.grant, bytes);
> -	xlog_grant_sub_space(log, &log->l_write_head.grant, bytes);
> -
> +	xfs_log_ungrant_bytes(log->l_mp, bytes);
>   	trace_xfs_log_ticket_ungrant_exit(log, ticket);
>   
> -	xfs_log_space_wake(log->l_mp);
>   	xfs_log_ticket_put(ticket);
>   }
>   
> diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
> index 1412d6993f1e..6d2f30f42245 100644
> --- a/fs/xfs/xfs_log.h
> +++ b/fs/xfs/xfs_log.h
> @@ -125,6 +125,7 @@ int	  xfs_log_reserve(struct xfs_mount *mp,
>   			  uint8_t		   clientid,
>   			  bool		   permanent);
>   int	  xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic);
> +void	  xfs_log_ungrant_bytes(struct xfs_mount *mp, int bytes);
>   void      xfs_log_unmount(struct xfs_mount *mp);
>   int	  xfs_log_force_umount(struct xfs_mount *mp, int logerror);
>   
> 


* Re: [RFC v6 PATCH 03/10] xfs: extra runtime reservation overhead for relog transactions
  2020-04-06 12:36 ` [RFC v6 PATCH 03/10] xfs: extra runtime reservation overhead for relog transactions Brian Foster
@ 2020-04-07 23:04   ` Allison Collins
  2020-04-08 11:43     ` Brian Foster
  0 siblings, 1 reply; 15+ messages in thread
From: Allison Collins @ 2020-04-07 23:04 UTC (permalink / raw)
  To: Brian Foster, linux-xfs



On 4/6/20 5:36 AM, Brian Foster wrote:
> Every transaction reservation includes runtime overhead on top of
> the reservation calculated in the struct xfs_trans_res. This
> overhead is required for things like the CIL context ticket, log
> headers, etc., that are stolen from individual transactions. Since
> reservation for the relog transaction is entirely contributed by
> regular transactions, this runtime reservation overhead must be
> contributed as well. This means that a transaction that relogs one
> or more items must include overhead for the current transaction as
> well as for the relog transaction.
> 
> Define a new transaction flag to indicate that a transaction is
> relog enabled. Plumb this state down to the log ticket allocation
> and use it to bump the worst case overhead included in the
> transaction. The overhead will eventually be transferred to the
> relog system as needed for individual log items.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>   fs/xfs/libxfs/xfs_shared.h |  1 +
>   fs/xfs/xfs_log.c           | 12 +++++++++---
>   fs/xfs/xfs_log.h           |  3 ++-
>   fs/xfs/xfs_log_cil.c       |  2 +-
>   fs/xfs/xfs_log_priv.h      |  1 +
>   fs/xfs/xfs_trans.c         |  3 ++-
>   6 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
> index c45acbd3add9..1ede1e720a5c 100644
> --- a/fs/xfs/libxfs/xfs_shared.h
> +++ b/fs/xfs/libxfs/xfs_shared.h
> @@ -65,6 +65,7 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
>   #define XFS_TRANS_DQ_DIRTY	0x10	/* at least one dquot in trx dirty */
>   #define XFS_TRANS_RESERVE	0x20    /* OK to use reserved data blocks */
>   #define XFS_TRANS_NO_WRITECOUNT 0x40	/* do not elevate SB writecount */
> +#define XFS_TRANS_RELOG		0x80	/* requires extra relog overhead */
>   /*
>    * LOWMODE is used by the allocator to activate the lowspace algorithm - when
>    * free space is running low the extent allocator may choose to allocate an
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index d6b63490a78b..b55abde6c142 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -418,7 +418,8 @@ xfs_log_reserve(
>   	int		 	cnt,
>   	struct xlog_ticket	**ticp,
>   	uint8_t		 	client,
> -	bool			permanent)
> +	bool			permanent,
> +	bool			relog)
>   {
>   	struct xlog		*log = mp->m_log;
>   	struct xlog_ticket	*tic;
> @@ -433,7 +434,8 @@ xfs_log_reserve(
>   	XFS_STATS_INC(mp, xs_try_logspace);
>   
>   	ASSERT(*ticp == NULL);
> -	tic = xlog_ticket_alloc(log, unit_bytes, cnt, client, permanent, 0);
> +	tic = xlog_ticket_alloc(log, unit_bytes, cnt, client, permanent, relog,
> +				0);
>   	*ticp = tic;
>   
>   	xlog_grant_push_ail(log, tic->t_cnt ? tic->t_unit_res * tic->t_cnt
> @@ -831,7 +833,7 @@ xlog_unmount_write(
>   	uint			flags = XLOG_UNMOUNT_TRANS;
>   	int			error;
>   
> -	error = xfs_log_reserve(mp, 600, 1, &tic, XFS_LOG, 0);
> +	error = xfs_log_reserve(mp, 600, 1, &tic, XFS_LOG, false, false);
>   	if (error)
>   		goto out_err;
>   
> @@ -3421,6 +3423,7 @@ xlog_ticket_alloc(
>   	int			cnt,
>   	char			client,
>   	bool			permanent,
> +	bool			relog,
>   	xfs_km_flags_t		alloc_flags)
I notice this routine already has a flag param.  Wondering if it would 
make sense to have a KM_RELOG flag instead of an extra bool param?  It 
would just be one less thing to pass around.  Other than that, it seems 
straight forward.

Reviewed-by: Allison Collins <allison.henderson@oracle.com>

>   {
>   	struct xlog_ticket	*tic;
> @@ -3431,6 +3434,9 @@ xlog_ticket_alloc(
>   		return NULL;
>   
>   	unit_res = xfs_log_calc_unit_res(log->l_mp, unit_bytes);
> +	/* double the overhead for the relog transaction */
> +	if (relog)
> +		unit_res += (unit_res - unit_bytes);
>   
>   	atomic_set(&tic->t_ref, 1);
>   	tic->t_task		= current;
> diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
> index 6d2f30f42245..f1089a4b299c 100644
> --- a/fs/xfs/xfs_log.h
> +++ b/fs/xfs/xfs_log.h
> @@ -123,7 +123,8 @@ int	  xfs_log_reserve(struct xfs_mount *mp,
>   			  int		   count,
>   			  struct xlog_ticket **ticket,
>   			  uint8_t		   clientid,
> -			  bool		   permanent);
> +			  bool		   permanent,
> +			  bool		   relog);
>   int	  xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic);
>   void	  xfs_log_ungrant_bytes(struct xfs_mount *mp, int bytes);
>   void      xfs_log_unmount(struct xfs_mount *mp);
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index b43f0e8f43f2..1c48e95402aa 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -37,7 +37,7 @@ xlog_cil_ticket_alloc(
>   {
>   	struct xlog_ticket *tic;
>   
> -	tic = xlog_ticket_alloc(log, 0, 1, XFS_TRANSACTION, 0,
> +	tic = xlog_ticket_alloc(log, 0, 1, XFS_TRANSACTION, false, false,
>   				KM_NOFS);
>   
>   	/*
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index ec22c7a3867f..08d8ff9bce1a 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -465,6 +465,7 @@ xlog_ticket_alloc(
>   	int		count,
>   	char		client,
>   	bool		permanent,
> +	bool		relog,
>   	xfs_km_flags_t	alloc_flags);
>   
>   
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 4fbe11485bbb..1b25980315bd 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -177,6 +177,7 @@ xfs_trans_reserve(
>   	 */
>   	if (resp->tr_logres > 0) {
>   		bool	permanent = false;
> +		bool	relog	  = (tp->t_flags & XFS_TRANS_RELOG);
>   
>   		ASSERT(tp->t_log_res == 0 ||
>   		       tp->t_log_res == resp->tr_logres);
> @@ -199,7 +200,7 @@ xfs_trans_reserve(
>   						resp->tr_logres,
>   						resp->tr_logcount,
>   						&tp->t_ticket, XFS_TRANSACTION,
> -						permanent);
> +						permanent, relog);
>   		}
>   
>   		if (error)
> 


* Re: [RFC v6 PATCH 03/10] xfs: extra runtime reservation overhead for relog transactions
  2020-04-07 23:04   ` Allison Collins
@ 2020-04-08 11:43     ` Brian Foster
  0 siblings, 0 replies; 15+ messages in thread
From: Brian Foster @ 2020-04-08 11:43 UTC (permalink / raw)
  To: Allison Collins; +Cc: linux-xfs

On Tue, Apr 07, 2020 at 04:04:43PM -0700, Allison Collins wrote:
> 
> 
> On 4/6/20 5:36 AM, Brian Foster wrote:
> > Every transaction reservation includes runtime overhead on top of
> > the reservation calculated in the struct xfs_trans_res. This
> > overhead is required for things like the CIL context ticket, log
> > headers, etc., that are stolen from individual transactions. Since
> > reservation for the relog transaction is entirely contributed by
> > regular transactions, this runtime reservation overhead must be
> > contributed as well. This means that a transaction that relogs one
> > or more items must include overhead for the current transaction as
> > well as for the relog transaction.
> > 
> > Define a new transaction flag to indicate that a transaction is
> > relog enabled. Plumb this state down to the log ticket allocation
> > and use it to bump the worst case overhead included in the
> > transaction. The overhead will eventually be transferred to the
> > relog system as needed for individual log items.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >   fs/xfs/libxfs/xfs_shared.h |  1 +
> >   fs/xfs/xfs_log.c           | 12 +++++++++---
> >   fs/xfs/xfs_log.h           |  3 ++-
> >   fs/xfs/xfs_log_cil.c       |  2 +-
> >   fs/xfs/xfs_log_priv.h      |  1 +
> >   fs/xfs/xfs_trans.c         |  3 ++-
> >   6 files changed, 16 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
> > index c45acbd3add9..1ede1e720a5c 100644
> > --- a/fs/xfs/libxfs/xfs_shared.h
> > +++ b/fs/xfs/libxfs/xfs_shared.h
> > @@ -65,6 +65,7 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
> >   #define XFS_TRANS_DQ_DIRTY	0x10	/* at least one dquot in trx dirty */
> >   #define XFS_TRANS_RESERVE	0x20    /* OK to use reserved data blocks */
> >   #define XFS_TRANS_NO_WRITECOUNT 0x40	/* do not elevate SB writecount */
> > +#define XFS_TRANS_RELOG		0x80	/* requires extra relog overhead */
> >   /*
> >    * LOWMODE is used by the allocator to activate the lowspace algorithm - when
> >    * free space is running low the extent allocator may choose to allocate an
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index d6b63490a78b..b55abde6c142 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -418,7 +418,8 @@ xfs_log_reserve(
> >   	int		 	cnt,
> >   	struct xlog_ticket	**ticp,
> >   	uint8_t		 	client,
> > -	bool			permanent)
> > +	bool			permanent,
> > +	bool			relog)
> >   {
> >   	struct xlog		*log = mp->m_log;
> >   	struct xlog_ticket	*tic;
> > @@ -433,7 +434,8 @@ xfs_log_reserve(
> >   	XFS_STATS_INC(mp, xs_try_logspace);
> >   	ASSERT(*ticp == NULL);
> > -	tic = xlog_ticket_alloc(log, unit_bytes, cnt, client, permanent, 0);
> > +	tic = xlog_ticket_alloc(log, unit_bytes, cnt, client, permanent, relog,
> > +				0);
> >   	*ticp = tic;
> >   	xlog_grant_push_ail(log, tic->t_cnt ? tic->t_unit_res * tic->t_cnt
> > @@ -831,7 +833,7 @@ xlog_unmount_write(
> >   	uint			flags = XLOG_UNMOUNT_TRANS;
> >   	int			error;
> > -	error = xfs_log_reserve(mp, 600, 1, &tic, XFS_LOG, 0);
> > +	error = xfs_log_reserve(mp, 600, 1, &tic, XFS_LOG, false, false);
> >   	if (error)
> >   		goto out_err;
> > @@ -3421,6 +3423,7 @@ xlog_ticket_alloc(
> >   	int			cnt,
> >   	char			client,
> >   	bool			permanent,
> > +	bool			relog,
> >   	xfs_km_flags_t		alloc_flags)
> I notice this routine already has a flag param.  Wondering if it would make
> sense to have a KM_RELOG flag instead of an extra bool param?  It would just
> be one less thing to pass around.  Other than that, it seems straight
> forward.
> 

In the xlog_ticket_alloc() case, those are actually memory allocation
flags (as opposed to transaction flags used in xfs_trans_alloc()). The
former includes things like KM_NOFS, etc., that translate to GFP_*
allocation flags. So I don't think it's appropriate to stash a
transaction or ticket oriented flag there.

I did halfway expect some feedback wrt XFS_TRANS_RELOG because it
technically could be buried in the reservation structure itself, similar
to how permanent state is handled (though we'd still need to pass the
boolean around a bit). I opted to introduce XFS_TRANS_RELOG here instead
because I need to use that for random buffer relogging later in the
series, so it accomplished both at least for the time being. For a
non-RFC series I could look into moving that flag into the reservation,
relegating TRANS_RELOG to debug mode and letting the test code use it
from there.

> Reviewed-by: Allison Collins <allison.henderson@oracle.com>
> 

Thanks!

Brian

> >   {
> >   	struct xlog_ticket	*tic;
> > @@ -3431,6 +3434,9 @@ xlog_ticket_alloc(
> >   		return NULL;
> >   	unit_res = xfs_log_calc_unit_res(log->l_mp, unit_bytes);
> > +	/* double the overhead for the relog transaction */
> > +	if (relog)
> > +		unit_res += (unit_res - unit_bytes);
> >   	atomic_set(&tic->t_ref, 1);
> >   	tic->t_task		= current;
> > diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
> > index 6d2f30f42245..f1089a4b299c 100644
> > --- a/fs/xfs/xfs_log.h
> > +++ b/fs/xfs/xfs_log.h
> > @@ -123,7 +123,8 @@ int	  xfs_log_reserve(struct xfs_mount *mp,
> >   			  int		   count,
> >   			  struct xlog_ticket **ticket,
> >   			  uint8_t		   clientid,
> > -			  bool		   permanent);
> > +			  bool		   permanent,
> > +			  bool		   relog);
> >   int	  xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic);
> >   void	  xfs_log_ungrant_bytes(struct xfs_mount *mp, int bytes);
> >   void      xfs_log_unmount(struct xfs_mount *mp);
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index b43f0e8f43f2..1c48e95402aa 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -37,7 +37,7 @@ xlog_cil_ticket_alloc(
> >   {
> >   	struct xlog_ticket *tic;
> > -	tic = xlog_ticket_alloc(log, 0, 1, XFS_TRANSACTION, 0,
> > +	tic = xlog_ticket_alloc(log, 0, 1, XFS_TRANSACTION, false, false,
> >   				KM_NOFS);
> >   	/*
> > diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> > index ec22c7a3867f..08d8ff9bce1a 100644
> > --- a/fs/xfs/xfs_log_priv.h
> > +++ b/fs/xfs/xfs_log_priv.h
> > @@ -465,6 +465,7 @@ xlog_ticket_alloc(
> >   	int		count,
> >   	char		client,
> >   	bool		permanent,
> > +	bool		relog,
> >   	xfs_km_flags_t	alloc_flags);
> > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > index 4fbe11485bbb..1b25980315bd 100644
> > --- a/fs/xfs/xfs_trans.c
> > +++ b/fs/xfs/xfs_trans.c
> > @@ -177,6 +177,7 @@ xfs_trans_reserve(
> >   	 */
> >   	if (resp->tr_logres > 0) {
> >   		bool	permanent = false;
> > +		bool	relog	  = (tp->t_flags & XFS_TRANS_RELOG);
> >   		ASSERT(tp->t_log_res == 0 ||
> >   		       tp->t_log_res == resp->tr_logres);
> > @@ -199,7 +200,7 @@ xfs_trans_reserve(
> >   						resp->tr_logres,
> >   						resp->tr_logcount,
> >   						&tp->t_ticket, XFS_TRANSACTION,
> > -						permanent);
> > +						permanent, relog);
> >   		}
> >   		if (error)
> > 
> 



* Re: [RFC v6 PATCH 07/10] xfs: prevent fs freeze with outstanding relog items
  2020-04-06 12:36 ` [RFC v6 PATCH 07/10] xfs: prevent fs freeze with outstanding relog items Brian Foster
@ 2020-04-09  0:05   ` Allison Collins
  0 siblings, 0 replies; 15+ messages in thread
From: Allison Collins @ 2020-04-09  0:05 UTC (permalink / raw)
  To: Brian Foster, linux-xfs



On 4/6/20 5:36 AM, Brian Foster wrote:
> The automatic relog mechanism is currently incompatible with
> filesystem freeze in a generic sense. Freeze protection is currently
> implemented via locks that cannot be held on return to userspace,
> which means we can't hold a superblock write reference for the
> duration relogging is enabled on an item. It's too late to block
> when the freeze sequence calls into the filesystem because the
> transaction subsystem has already begun to be frozen. Not only can
> this block the relog transaction, but blocking any unrelated
> transaction essentially prevents a particular operation from
> progressing to the point where it can disable relogging on an item.
> Therefore marking the relog transaction as "nowrite" does not solve
> the problem.
> 
> This is not a problem in practice because the two primary use cases
> already exclude freeze via other means. quotaoff holds ->s_umount
> read locked across the operation and scrub explicitly takes a
> superblock write reference, both of which block freeze of the
> transaction subsystem for the lifetime of relog-enabled items.
> 
> As a fallback for future use cases and the upcoming random buffer
> relogging test code, fail fs freeze attempts when the global relog
> reservation counter is elevated to prevent deadlock. This is a
> partial punt of the problem as compared to teaching freeze to wait
> on relogged items because the only current dependency is test code.
> In other words, this patch prevents deadlock if a user happens to
> issue a freeze in conjunction with random buffer relog injection.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Ok, long explanation, but makes sense :-)

Reviewed-by: Allison Collins <allison.henderson@oracle.com>
> ---
>   fs/xfs/xfs_super.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index abf06bf9c3f3..0efa9dc70d71 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -35,6 +35,7 @@
>   #include "xfs_refcount_item.h"
>   #include "xfs_bmap_item.h"
>   #include "xfs_reflink.h"
> +#include "xfs_trans_priv.h"
>   
>   #include <linux/magic.h>
>   #include <linux/fs_context.h>
> @@ -870,6 +871,9 @@ xfs_fs_freeze(
>   {
>   	struct xfs_mount	*mp = XFS_M(sb);
>   
> +	if (WARN_ON_ONCE(atomic64_read(&mp->m_ail->ail_relog_res)))
> +		return -EAGAIN;
> +
>   	xfs_stop_block_reaping(mp);
>   	xfs_save_resvblks(mp);
>   	xfs_quiesce_attr(mp);
> 


Thread overview: 15+ messages
2020-04-06 12:36 [RFC v6 PATCH 00/10] xfs: automatic relogging experiment Brian Foster
2020-04-06 12:36 ` [RFC v6 PATCH 01/10] xfs: automatic relogging item management Brian Foster
2020-04-06 12:36 ` [RFC v6 PATCH 02/10] xfs: create helper for ticket-less log res ungrant Brian Foster
2020-04-06 23:52   ` Allison Collins
2020-04-06 12:36 ` [RFC v6 PATCH 03/10] xfs: extra runtime reservation overhead for relog transactions Brian Foster
2020-04-07 23:04   ` Allison Collins
2020-04-08 11:43     ` Brian Foster
2020-04-06 12:36 ` [RFC v6 PATCH 04/10] xfs: relog log reservation stealing and accounting Brian Foster
2020-04-06 12:36 ` [RFC v6 PATCH 05/10] xfs: automatic log item relog mechanism Brian Foster
2020-04-06 12:36 ` [RFC v6 PATCH 06/10] xfs: automatically relog the quotaoff start intent Brian Foster
2020-04-06 12:36 ` [RFC v6 PATCH 07/10] xfs: prevent fs freeze with outstanding relog items Brian Foster
2020-04-09  0:05   ` Allison Collins
2020-04-06 12:36 ` [RFC v6 PATCH 08/10] xfs: buffer relogging support prototype Brian Foster
2020-04-06 12:36 ` [RFC v6 PATCH 09/10] xfs: create an error tag for random relog reservation Brian Foster
2020-04-06 12:36 ` [RFC v6 PATCH 10/10] xfs: relog random buffers based on errortag Brian Foster
