* [RFC v2 PATCH 0/3] xfs: automatic relogging experiment @ 2019-11-22 18:19 Brian Foster 2019-11-22 18:19 ` [RFC v2 PATCH 1/3] xfs: set t_task at wait time instead of alloc time Brian Foster ` (3 more replies) 0 siblings, 4 replies; 6+ messages in thread From: Brian Foster @ 2019-11-22 18:19 UTC (permalink / raw) To: linux-xfs Hi all, Here's a second pass at the automatic relogging experiment. The first pass tagged the log item for relog and stole reservation from unrelated transactions to opportunistically relog items, all within the log subsystem. The primary feedback to that approach was to consider a transaction function callback and concerns about reservation stealing. This version somewhat accommodates those concerns, but still leaves some open questions around broader usage. In short, this version introduces a separate workqueue context for committing and rolling transactions for relogged items and a hook to trigger a relog based on the item being committed to the AIL. The relog state (including log reservation for the relog) is tracked via the log item. Only the log ticket from the caller transaction is regranted and tracked to avoid processing open transactions through separate contexts. This is essentially an open-coded transaction roll without the need for a duplicate transaction. While this seems to work reasonably well for the simple case of quotaoff, there are still some open issues to resolve around formalizing such a mechanism for broader use. Firstly, the quotaoff patch just ties into the existing ->iop_committed() callback since that is currently unused, but the queue relog call could just as easily be made directly from the AIL commit code. IOW, the whole callback thing seems kind of like a solution looking for a problem in this context. With external tracking of relogged items, all we really need here fundamentally is a notification that the CIL context has committed. Another caveat is that this approach seems much more cumbersome and inefficient with respect to batching relogs of unrelated items. Rather than collect all unrelated items in a single transaction, we'd have a separate tracking structure, log ticket (which regrants the caller's entire reservation) and independent roll/regrant for each potential set of items. I can see value in the simplicity and flexibility of use in terms of being able to potentially register an already constructed transaction for automatic relogging, but I question whether that suits our use case(s). All in all, I think this is still pretty raw and needs some more thought on the design (or perhaps requirements) front. I could see this going anywhere from something more low level like the previous version where we could have a relogged items list (RIL) associated with the CIL and require initial transactions donate real relog reservation to something more like this approach where we have some separate context queueing/rolling relog transactions based on log subsystem events. I suspect there are capability considerations for either end of that spectrum. For example, it might be harder to auto relog anything but intents with the RIL approach where there is no transaction to own item locks, etc., but do we currently have a use case to relog anything besides QUOTAOFF and EFIs? If not, I lean more toward the more simple, low level approach (at least for now), but I could be convinced otherwise. Thoughts on any of this appreciated. Brian rfcv2: - Different approach based on workqueue and transaction rolling. rfc: https://lore.kernel.org/linux-xfs/20191024172850.7698-1-bfoster@redhat.com/ Brian Foster (3): xfs: set t_task at wait time instead of alloc time xfs: prototype automatic intent relogging mechanism xfs: automatically relog quotaoff start intent fs/xfs/Makefile | 1 + fs/xfs/libxfs/xfs_trans_resv.c | 3 +- fs/xfs/xfs_dquot_item.c | 13 ++++ fs/xfs/xfs_log.c | 11 ++- fs/xfs/xfs_log_priv.h | 1 + fs/xfs/xfs_qm_syscalls.c | 9 ++- fs/xfs/xfs_trans.c | 2 +- fs/xfs/xfs_trans.h | 13 ++++ fs/xfs/xfs_trans_relog.c | 130 +++++++++++++++++++++++++++++++++ 9 files changed, 179 insertions(+), 4 deletions(-) create mode 100644 fs/xfs/xfs_trans_relog.c -- 2.20.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC v2 PATCH 1/3] xfs: set t_task at wait time instead of alloc time 2019-11-22 18:19 [RFC v2 PATCH 0/3] xfs: automatic relogging experiment Brian Foster @ 2019-11-22 18:19 ` Brian Foster 2019-11-22 18:19 ` [RFC v2 PATCH 2/3] xfs: prototype automatic intent relogging mechanism Brian Foster ` (2 subsequent siblings) 3 siblings, 0 replies; 6+ messages in thread From: Brian Foster @ 2019-11-22 18:19 UTC (permalink / raw) To: linux-xfs The xlog_ticket structure contains a task reference to support task scheduling associated with log reservation acquisition. This reference is assigned at ticket allocation time, but otherwise there is no reason log space cannot be reserved for a ticket from a context different from the allocating context. Move the task assignment to the log reservation blocking code where it is used. Signed-off-by: Brian Foster <bfoster@redhat.com> --- fs/xfs/xfs_log.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 6a147c63a8a6..0c0c035c5be0 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -262,6 +262,7 @@ xlog_grant_head_wait( int need_bytes) __releases(&head->lock) __acquires(&head->lock) { + tic->t_task = current; list_add_tail(&tic->t_queue, &head->waiters); do { @@ -3599,7 +3600,6 @@ xlog_ticket_alloc( unit_res = xfs_log_calc_unit_res(log->l_mp, unit_bytes); atomic_set(&tic->t_ref, 1); - tic->t_task = current; INIT_LIST_HEAD(&tic->t_queue); tic->t_unit_res = unit_res; tic->t_curr_res = unit_res; -- 2.20.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [RFC v2 PATCH 2/3] xfs: prototype automatic intent relogging mechanism 2019-11-22 18:19 [RFC v2 PATCH 0/3] xfs: automatic relogging experiment Brian Foster 2019-11-22 18:19 ` [RFC v2 PATCH 1/3] xfs: set t_task at wait time instead of alloc time Brian Foster @ 2019-11-22 18:19 ` Brian Foster 2019-11-22 18:19 ` [RFC v2 PATCH 3/3] xfs: automatically relog quotaoff start intent Brian Foster 2019-11-25 18:55 ` [RFC v3 PATCH] xfs: automatic relogging experiment Brian Foster 3 siblings, 0 replies; 6+ messages in thread From: Brian Foster @ 2019-11-22 18:19 UTC (permalink / raw) To: linux-xfs Signed-off-by: Brian Foster <bfoster@redhat.com> --- fs/xfs/Makefile | 1 + fs/xfs/xfs_log.c | 9 +++ fs/xfs/xfs_log_priv.h | 1 + fs/xfs/xfs_trans.c | 2 +- fs/xfs/xfs_trans.h | 13 ++++ fs/xfs/xfs_trans_relog.c | 130 +++++++++++++++++++++++++++++++++++++++ 6 files changed, 155 insertions(+), 1 deletion(-) create mode 100644 fs/xfs/xfs_trans_relog.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index aceca2f9a3db..c4664a972e50 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -92,6 +92,7 @@ xfs-y += xfs_aops.o \ xfs_symlink.o \ xfs_sysfs.o \ xfs_trans.o \ + xfs_trans_relog.o \ xfs_xattr.o \ kmem.o diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 0c0c035c5be0..4f4c6b38621a 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -1531,12 +1531,20 @@ xlog_alloc_log( if (!log->l_ioend_workqueue) goto out_free_iclog; + log->l_relog_workqueue = alloc_workqueue("xfs-relog/%s", + WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_HIGHPRI, 0, + mp->m_super->s_id); + if (!log->l_relog_workqueue) + goto out_destroy_workqueue; + error = xlog_cil_init(log); if (error) goto out_destroy_workqueue; return log; out_destroy_workqueue: + if (log->l_relog_workqueue) + destroy_workqueue(log->l_relog_workqueue); destroy_workqueue(log->l_ioend_workqueue); out_free_iclog: for (iclog = log->l_iclog; iclog; iclog = prev_iclog) { @@ -2008,6 +2016,7 @@ xlog_dealloc_log( } log->l_mp->m_log = NULL; + destroy_workqueue(log->l_relog_workqueue); destroy_workqueue(log->l_ioend_workqueue); kmem_free(log); } /* xlog_dealloc_log */ diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index b192c5a9f9fd..4d557a82eb73 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -349,6 +349,7 @@ struct xlog { struct xfs_cil *l_cilp; /* CIL log is working with */ struct xfs_buftarg *l_targ; /* buftarg of log */ struct workqueue_struct *l_ioend_workqueue; /* for I/O completions */ + struct workqueue_struct *l_relog_workqueue; /* for auto relog */ struct delayed_work l_work; /* background flush work */ uint l_flags; uint l_quotaoffs_flag; /* XFS_DQ_*, for QUOTAOFFs */ diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 3b208f9a865c..37011fc803fc 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -923,7 +923,7 @@ xfs_trans_committed_bulk( * have already been unlocked as if the commit had succeeded. * Do not reference the transaction structure after this call. */ -static int +int __xfs_trans_commit( struct xfs_trans *tp, bool regrant) diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 64d7f171ebd3..066d4aca8416 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -48,6 +48,7 @@ struct xfs_log_item { struct xfs_log_vec *li_lv; /* active log vector */ struct xfs_log_vec *li_lv_shadow; /* standby vector */ xfs_lsn_t li_seq; /* CIL commit seq */ + void *li_priv; }; /* @@ -143,6 +144,14 @@ typedef struct xfs_trans { unsigned long t_pflags; /* saved process flags state */ } xfs_trans_t; +struct xfs_trans_relog { + struct work_struct tr_work; + unsigned int tr_log_res; + unsigned int tr_log_count; + struct xlog_ticket *tr_tic; + struct xfs_log_item *tr_lip; +}; + /* * XFS transaction mechanism exported interfaces that are * actually macros. @@ -231,7 +240,11 @@ bool xfs_trans_buf_is_dirty(struct xfs_buf *bp); void xfs_trans_log_inode(xfs_trans_t *, struct xfs_inode *, uint); int xfs_trans_commit(struct xfs_trans *); +int __xfs_trans_commit(struct xfs_trans *, bool); int xfs_trans_roll(struct xfs_trans **); +int xfs_trans_relog(struct xfs_trans *, struct xfs_log_item *); +void xfs_trans_relog_cancel(struct xfs_log_item *); +void xfs_trans_relog_queue(struct xfs_log_item *); int xfs_trans_roll_inode(struct xfs_trans **, struct xfs_inode *); void xfs_trans_cancel(xfs_trans_t *); int xfs_trans_ail_init(struct xfs_mount *); diff --git a/fs/xfs/xfs_trans_relog.c b/fs/xfs/xfs_trans_relog.c new file mode 100644 index 000000000000..af139aa501f6 --- /dev/null +++ b/fs/xfs/xfs_trans_relog.c @@ -0,0 +1,130 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 Red Hat, Inc. + * All Rights Reserved. + */ +#include "xfs.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_trans_priv.h" +#include "xfs_trans_resv.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_mount.h" +#include "xfs_log_priv.h" +#include "xfs_log.h" + +/* + * Helper to commit and regrant the log ticket as if the transaction is being + * rolled. The ticket is expected to be held by the caller and can be reused in + * a subsequent transaction. + */ +static int +xfs_trans_commit_regrant( + struct xfs_trans *tp) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xlog_ticket *tic = tp->t_ticket; + int error; + + ASSERT(atomic_read(&tic->t_ref) > 1); + error = __xfs_trans_commit(tp, true); + if (!error) + error = xfs_log_regrant(mp, tic); + return error; +} + +static void +xfs_relog_worker( + struct work_struct *work) +{ + struct xfs_trans_relog *trp = container_of(work, + struct xfs_trans_relog, tr_work); + struct xfs_log_item *lip = trp->tr_lip; + struct xfs_mount *mp = lip->li_mountp; + struct xfs_trans_res resv = {}; + struct xfs_trans *tp; + int error; + + error = xfs_trans_alloc(mp, &resv, 0, 0, 0, &tp); + if (error) + return; + + /* associate the caller ticket with our empty transaction */ + tp->t_log_res = trp->tr_log_res; + tp->t_log_count = trp->tr_log_count; + tp->t_flags |= XFS_TRANS_PERM_LOG_RES; + tp->t_ticket = xfs_log_ticket_get(trp->tr_tic); + + xfs_trans_add_item(tp, lip); + tp->t_flags |= XFS_TRANS_DIRTY; + set_bit(XFS_LI_DIRTY, &lip->li_flags); + + /* commit the transaction and regrant the ticket for the next time */ + error = xfs_trans_commit_regrant(tp); + ASSERT(!error); +} + +void +xfs_trans_relog_queue( + struct xfs_log_item *lip) +{ + struct xfs_mount *mp = lip->li_mountp; + struct xfs_trans_relog *trp = lip->li_priv; + + if (!trp) + return; + + queue_work(mp->m_log->l_relog_workqueue, &trp->tr_work); +} + +/* + * Commit the caller transaction, regrant the ticket and use it for automatic + * relogging of the provided log item. + */ +int +xfs_trans_relog( + struct xfs_trans *tp, + struct xfs_log_item *lip) +{ + struct xfs_trans_relog *trp; + int error; + + trp = kmem_zalloc(sizeof(struct xfs_trans_relog), KM_MAYFAIL); + if (!trp) { + xfs_trans_cancel(tp); + return -ENOMEM; + } + + INIT_WORK(&trp->tr_work, xfs_relog_worker); + trp->tr_lip = lip; + trp->tr_tic = xfs_log_ticket_get(tp->t_ticket); + trp->tr_log_res = tp->t_log_res; + trp->tr_log_count = tp->t_log_count; + lip->li_priv = trp; + + error = xfs_trans_commit_regrant(tp); + if (error) { + xfs_log_ticket_put(trp->tr_tic); + kmem_free(trp); + } + + return error; +} + +void +xfs_trans_relog_cancel( + struct xfs_log_item *lip) +{ + struct xfs_trans_relog *trp = lip->li_priv; + struct xfs_mount *mp = lip->li_mountp; + + /* cancel queued relog task */ + cancel_work_sync(&trp->tr_work); + lip->li_priv = NULL; + + /* release log reservation and free ticket */ + ASSERT(atomic_read(&trp->tr_tic->t_ref) == 1); + xfs_log_done(mp, trp->tr_tic, NULL, false); + kmem_free(trp); +} -- 2.20.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [RFC v2 PATCH 3/3] xfs: automatically relog quotaoff start intent 2019-11-22 18:19 [RFC v2 PATCH 0/3] xfs: automatic relogging experiment Brian Foster 2019-11-22 18:19 ` [RFC v2 PATCH 1/3] xfs: set t_task at wait time instead of alloc time Brian Foster 2019-11-22 18:19 ` [RFC v2 PATCH 2/3] xfs: prototype automatic intent relogging mechanism Brian Foster @ 2019-11-22 18:19 ` Brian Foster 2019-11-25 18:55 ` [RFC v3 PATCH] xfs: automatic relogging experiment Brian Foster 3 siblings, 0 replies; 6+ messages in thread From: Brian Foster @ 2019-11-22 18:19 UTC (permalink / raw) To: linux-xfs Signed-off-by: Brian Foster <bfoster@redhat.com> --- fs/xfs/libxfs/xfs_trans_resv.c | 3 ++- fs/xfs/xfs_dquot_item.c | 13 +++++++++++++ fs/xfs/xfs_qm_syscalls.c | 9 ++++++++- 3 files changed, 23 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c index c55cd9a3dec9..88e222d0947a 100644 --- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -866,7 +866,8 @@ xfs_trans_resv_calc( resp->tr_qm_setqlim.tr_logcount = XFS_DEFAULT_LOG_COUNT; resp->tr_qm_quotaoff.tr_logres = xfs_calc_qm_quotaoff_reservation(mp); - resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_LOG_COUNT; + resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT; + resp->tr_qm_quotaoff.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_qm_equotaoff.tr_logres = xfs_calc_qm_quotaoff_end_reservation(); diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c index d60647d7197b..57b7c9b731b1 100644 --- a/fs/xfs/xfs_dquot_item.c +++ b/fs/xfs/xfs_dquot_item.c @@ -288,6 +288,18 @@ xfs_qm_qoff_logitem_format( xlog_finish_iovec(lv, vecp, sizeof(struct xfs_qoff_logitem)); } +/* + * XXX: simulate relog callback with hardcoded ->iop_committed() callback + */ +static xfs_lsn_t +xfs_qm_qoff_logitem_committed( + struct xfs_log_item *lip, + xfs_lsn_t lsn) +{ + xfs_trans_relog_queue(lip); + return lsn; +} + /* * There isn't much you can do to push a quotaoff item. It is simply * stuck waiting for the log to be flushed to disk. @@ -333,6 +345,7 @@ static const struct xfs_item_ops xfs_qm_qoffend_logitem_ops = { static const struct xfs_item_ops xfs_qm_qoff_logitem_ops = { .iop_size = xfs_qm_qoff_logitem_size, .iop_format = xfs_qm_qoff_logitem_format, + .iop_committed = xfs_qm_qoff_logitem_committed, .iop_push = xfs_qm_qoff_logitem_push, }; diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c index 1ea82764bf89..3f15f8576027 100644 --- a/fs/xfs/xfs_qm_syscalls.c +++ b/fs/xfs/xfs_qm_syscalls.c @@ -18,6 +18,7 @@ #include "xfs_quota.h" #include "xfs_qm.h" #include "xfs_icache.h" +#include "xfs_log.h" STATIC int xfs_qm_log_quotaoff( @@ -50,7 +51,7 @@ xfs_qm_log_quotaoff( * We don't care about quotoff's performance. */ xfs_trans_set_sync(tp); - error = xfs_trans_commit(tp); + error = xfs_trans_relog(tp, &qoffi->qql_item); if (error) goto out; @@ -227,7 +228,13 @@ xfs_qm_scall_quotaoff( * * So, we have QUOTAOFF start and end logitems; the start * logitem won't get overwritten until the end logitem appears... + * + * XXX: qoffend expects qoffstart in the AIL but not in the CIL. Force + * the log after cancelling relogging to prevent item reinsert... */ + //ssleep(3); + xfs_trans_relog_cancel(&qoffstart->qql_item); + xfs_log_force(mp, XFS_LOG_SYNC); error = xfs_qm_log_quotaoff_end(mp, qoffstart, flags); if (error) { /* We're screwed now. Shutdown is the only option. */ -- 2.20.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [RFC v3 PATCH] xfs: automatic relogging experiment 2019-11-22 18:19 [RFC v2 PATCH 0/3] xfs: automatic relogging experiment Brian Foster ` (2 preceding siblings ...) 2019-11-22 18:19 ` [RFC v2 PATCH 3/3] xfs: automatically relog quotaoff start intent Brian Foster @ 2019-11-25 18:55 ` Brian Foster 2019-11-27 15:12 ` Brian Foster 3 siblings, 1 reply; 6+ messages in thread From: Brian Foster @ 2019-11-25 18:55 UTC (permalink / raw) To: linux-xfs POC to automatically relog the quotaoff start intent. This approach involves no reservation stealing nor transaction rolling, so deadlock avoidance is not guaranteed. The tradeoff is simplicity and an approach that might be effective enough in practice. Signed-off-by: Brian Foster <bfoster@redhat.com> --- Here's a quickly hacked up version of what I was rambling about in the cover letter. I wanted to post this for comparison. As noted above, this doesn't necessarily guarantee deadlock avoidance with transaction rolling, etc., but might be good enough in practice for the current use cases (particularly with CIL context size fixes). Even if not, there's a clear enough path to tracking relog reservation with a ticket in the CIL context in a manner that is more conducive to batching. We also may be able to union ->li_cb() with a ->li_relog() variant to relog intent items as dfops currently does for things like EFIs that don't currently support direct relogging of the same object. Thoughts about using something like this as an intermediate solution, provided it holds up against some stress testing? Brian fs/xfs/xfs_log.c | 1 + fs/xfs/xfs_log_cil.c | 50 +++++++++++++++++++++++++++++++++++++++- fs/xfs/xfs_log_priv.h | 2 ++ fs/xfs/xfs_qm_syscalls.c | 6 +++++ fs/xfs/xfs_trans.h | 5 +++- 5 files changed, 62 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 6a147c63a8a6..4fb3c3156ea2 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -1086,6 +1086,7 @@ xfs_log_item_init( INIT_LIST_HEAD(&item->li_cil); INIT_LIST_HEAD(&item->li_bio_list); INIT_LIST_HEAD(&item->li_trans); + INIT_LIST_HEAD(&item->li_ril); } /* diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index 48435cf2aa16..c16ebc448a40 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -19,6 +19,44 @@ struct workqueue_struct *xfs_discard_wq; +static void +xfs_relog_worker( + struct work_struct *work) +{ + struct xfs_cil_ctx *ctx = container_of(work, struct xfs_cil_ctx, relog_work); + struct xfs_mount *mp = ctx->cil->xc_log->l_mp; + struct xfs_trans *tp; + struct xfs_log_item *lip, *lipp; + int error; + + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp); + ASSERT(!error); + + list_for_each_entry_safe(lip, lipp, &ctx->relog_list, li_ril) { + list_del_init(&lip->li_ril); + + if (!test_bit(XFS_LI_RELOG, &lip->li_flags)) + continue; + + xfs_trans_add_item(tp, lip); + set_bit(XFS_LI_DIRTY, &lip->li_flags); + tp->t_flags |= XFS_TRANS_DIRTY; + } + + error = xfs_trans_commit(tp); + ASSERT(!error); + + /* XXX */ + kmem_free(ctx); +} + +static void +xfs_relog_queue( + struct xfs_cil_ctx *ctx) +{ + queue_work(xfs_discard_wq, &ctx->relog_work); +} + /* * Allocate a new ticket. Failing to get a new ticket makes it really hard to * recover, so we don't allow failure here. Also, we allocate in a context that @@ -476,6 +514,9 @@ xlog_cil_insert_items( */ if (!list_is_last(&lip->li_cil, &cil->xc_cil)) list_move_tail(&lip->li_cil, &cil->xc_cil); + + if (test_bit(XFS_LI_RELOG, &lip->li_flags)) + list_move_tail(&lip->li_ril, &ctx->relog_list); } spin_unlock(&cil->xc_cil_lock); @@ -605,7 +646,10 @@ xlog_cil_committed( xlog_cil_free_logvec(ctx->lv_chain); - if (!list_empty(&ctx->busy_extents)) + /* XXX: mutually exclusive w/ discard for POC to handle ctx freeing */ + if (!list_empty(&ctx->relog_list)) + xfs_relog_queue(ctx); + else if (!list_empty(&ctx->busy_extents)) xlog_discard_busy_extents(mp, ctx); else kmem_free(ctx); @@ -746,8 +790,10 @@ xlog_cil_push( */ INIT_LIST_HEAD(&new_ctx->committing); INIT_LIST_HEAD(&new_ctx->busy_extents); + INIT_LIST_HEAD(&new_ctx->relog_list); new_ctx->sequence = ctx->sequence + 1; new_ctx->cil = cil; + INIT_WORK(&new_ctx->relog_work, xfs_relog_worker); cil->xc_ctx = new_ctx; /* @@ -1199,6 +1245,8 @@ xlog_cil_init( INIT_LIST_HEAD(&ctx->committing); INIT_LIST_HEAD(&ctx->busy_extents); + INIT_LIST_HEAD(&ctx->relog_list); + INIT_WORK(&ctx->relog_work, xfs_relog_worker); ctx->sequence = 1; ctx->cil = cil; cil->xc_ctx = ctx; diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index b192c5a9f9fd..6fd7b7297bd3 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -243,6 +243,8 @@ struct xfs_cil_ctx { struct list_head iclog_entry; struct list_head committing; /* ctx committing list */ struct work_struct discard_endio_work; + struct list_head relog_list; + struct work_struct relog_work; }; /* diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c index 1ea82764bf89..08b6180cb5a3 100644 --- a/fs/xfs/xfs_qm_syscalls.c +++ b/fs/xfs/xfs_qm_syscalls.c @@ -18,6 +18,7 @@ #include "xfs_quota.h" #include "xfs_qm.h" #include "xfs_icache.h" +#include "xfs_log.h" STATIC int xfs_qm_log_quotaoff( @@ -37,6 +38,7 @@ xfs_qm_log_quotaoff( qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT); xfs_trans_log_quotaoff_item(tp, qoffi); + set_bit(XFS_LI_RELOG, &qoffi->qql_item.li_flags); spin_lock(&mp->m_sb_lock); mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL; @@ -69,6 +71,10 @@ xfs_qm_log_quotaoff_end( int error; struct xfs_qoff_logitem *qoffi; + clear_bit(XFS_LI_RELOG, &startqoff->qql_item.li_flags); + xfs_log_force(mp, XFS_LOG_SYNC); + flush_workqueue(xfs_discard_wq); + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp); if (error) return error; diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 64d7f171ebd3..e04033c29f0d 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -48,6 +48,7 @@ struct xfs_log_item { struct xfs_log_vec *li_lv; /* active log vector */ struct xfs_log_vec *li_lv_shadow; /* standby vector */ xfs_lsn_t li_seq; /* CIL commit seq */ + struct list_head li_ril; }; /* @@ -59,12 +60,14 @@ struct xfs_log_item { #define XFS_LI_ABORTED 1 #define XFS_LI_FAILED 2 #define XFS_LI_DIRTY 3 /* log item dirty in transaction */ +#define XFS_LI_RELOG 4 /* automatic relogging */ #define XFS_LI_FLAGS \ { (1 << XFS_LI_IN_AIL), "IN_AIL" }, \ { (1 << XFS_LI_ABORTED), "ABORTED" }, \ { (1 << XFS_LI_FAILED), "FAILED" }, \ - { (1 << XFS_LI_DIRTY), "DIRTY" } + { (1 << XFS_LI_DIRTY), "DIRTY" }, \ + { (1 << XFS_LI_RELOG), "RELOG" } struct xfs_item_ops { unsigned flags; -- 2.20.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC v3 PATCH] xfs: automatic relogging experiment 2019-11-25 18:55 ` [RFC v3 PATCH] xfs: automatic relogging experiment Brian Foster @ 2019-11-27 15:12 ` Brian Foster 0 siblings, 0 replies; 6+ messages in thread From: Brian Foster @ 2019-11-27 15:12 UTC (permalink / raw) To: linux-xfs On Mon, Nov 25, 2019 at 01:55:23PM -0500, Brian Foster wrote: > POC to automatically relog the quotaoff start intent. This approach > involves no reservation stealing nor transaction rolling, so > deadlock avoidance is not guaranteed. The tradeoff is simplicity and > an approach that might be effective enough in practice. > > Signed-off-by: Brian Foster <bfoster@redhat.com> > --- > > Here's a quickly hacked up version of what I was rambling about in the > cover letter. I wanted to post this for comparison. As noted above, this > doesn't necessarily guarantee deadlock avoidance with transaction > rolling, etc., but might be good enough in practice for the current use > cases (particularly with CIL context size fixes). Even if not, there's a > clear enough path to tracking relog reservation with a ticket in the CIL > context in a manner that is more conducive to batching. We also may be > able to union ->li_cb() with a ->li_relog() variant to relog intent > items as dfops currently does for things like EFIs that don't currently > support direct relogging of the same object. > > Thoughts about using something like this as an intermediate solution, > provided it holds up against some stress testing? > In thinking about this a bit more, it occurs to me that there might be an elegant way to provide simplicity and flexibility by incorporating automatic relogging into xfsaild rather than tieing it into the CIL or having it completely independent (as the past couple of RFCs have done). From the simplicity standpoint, xfsaild already tracks logged+committed items for us, so that eliminates the need for independent "RIL" tracking. xfsaild already issues log item callbacks that could translate the new log item LI_RELOG state bit to a corresponding XFS_ITEM_RELOG return code that triggers a relog of the item. We also know the lsn of the item at this point and can further compare to the tail lsn to only relog such items when they sit at the log tail. This is more efficient than the current proposal to automatically relog on every CIL checkpoint. From the flexibility standpoint, xfsaild already knows how to safely access all types of log items via the log ops vector. IOW, it knows how to exclusively access a log item for writeback, so it would just need generic/mechanical bits to relog a particular item instead. The caveat to this is that the task that requests auto relog must relenquish locks for relogging to actually take place. For example, the sequence for a traditional (non-intent) log item would be something like:: - lock object - dirty in transaction - set lip relog bit - commit transaction - unlock object (background relogging now enabled) At this point the log item is essentially pinned in-core without pinning the tail of the log. It is free to be modified by any unrelated transaction without conflict to either task, but xfsaild will not write it back. Sometime later the original task would have to lock the item and clear the relog state to cancel the sequence. The task could simply release the item to allow writeback or join it to a final transaction with the relog bit cleared. There's still room for a custom ->iop_relog() callback or some such for any items that require special relog handling. If releasing locks is ever a concern for a particular operation, that callback could overload the generic relog mechanism and serve as a notification to the lock holder to roll its own transaction without ever unlocking the item. TBH I'm still not sure there's a use case for this kind of non-intent relogging, but ISTM this design model accommodates it reasonably enough with minimal complexity. There are still some open implementation questions around managing the relog transaction(s), determining how much reservation is needed, how to acquire it, etc. We most likely do not want to block the xfsaild task on acquiring reservation, but it might be able to regrant a permanent ticket or simply kick off the task of replenishing relog reservation to a workqueue. Thoughts? Brian > Brian > > fs/xfs/xfs_log.c | 1 + > fs/xfs/xfs_log_cil.c | 50 +++++++++++++++++++++++++++++++++++++++- > fs/xfs/xfs_log_priv.h | 2 ++ > fs/xfs/xfs_qm_syscalls.c | 6 +++++ > fs/xfs/xfs_trans.h | 5 +++- > 5 files changed, 62 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > index 6a147c63a8a6..4fb3c3156ea2 100644 > --- a/fs/xfs/xfs_log.c > +++ b/fs/xfs/xfs_log.c > @@ -1086,6 +1086,7 @@ xfs_log_item_init( > INIT_LIST_HEAD(&item->li_cil); > INIT_LIST_HEAD(&item->li_bio_list); > INIT_LIST_HEAD(&item->li_trans); > + INIT_LIST_HEAD(&item->li_ril); > } > > /* > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c > index 48435cf2aa16..c16ebc448a40 100644 > --- a/fs/xfs/xfs_log_cil.c > +++ b/fs/xfs/xfs_log_cil.c > @@ -19,6 +19,44 @@ > > struct workqueue_struct *xfs_discard_wq; > > +static void > +xfs_relog_worker( > + struct work_struct *work) > +{ > + struct xfs_cil_ctx *ctx = container_of(work, struct xfs_cil_ctx, relog_work); > + struct xfs_mount *mp = ctx->cil->xc_log->l_mp; > + struct xfs_trans *tp; > + struct xfs_log_item *lip, *lipp; > + int error; > + > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp); > + ASSERT(!error); > + > + list_for_each_entry_safe(lip, lipp, &ctx->relog_list, li_ril) { > + list_del_init(&lip->li_ril); > + > + if (!test_bit(XFS_LI_RELOG, &lip->li_flags)) > + continue; > + > + xfs_trans_add_item(tp, lip); > + set_bit(XFS_LI_DIRTY, &lip->li_flags); > + tp->t_flags |= XFS_TRANS_DIRTY; > + } > + > + error = xfs_trans_commit(tp); > + ASSERT(!error); > + > + /* XXX */ > + kmem_free(ctx); > +} > + > +static void > +xfs_relog_queue( > + struct xfs_cil_ctx *ctx) > +{ > + queue_work(xfs_discard_wq, &ctx->relog_work); > +} > + > /* > * Allocate a new ticket. Failing to get a new ticket makes it really hard to > * recover, so we don't allow failure here. Also, we allocate in a context that > @@ -476,6 +514,9 @@ xlog_cil_insert_items( > */ > if (!list_is_last(&lip->li_cil, &cil->xc_cil)) > list_move_tail(&lip->li_cil, &cil->xc_cil); > + > + if (test_bit(XFS_LI_RELOG, &lip->li_flags)) > + list_move_tail(&lip->li_ril, &ctx->relog_list); > } > > spin_unlock(&cil->xc_cil_lock); > @@ -605,7 +646,10 @@ xlog_cil_committed( > > xlog_cil_free_logvec(ctx->lv_chain); > > - if (!list_empty(&ctx->busy_extents)) > + /* XXX: mutually exclusive w/ discard for POC to handle ctx freeing */ > + if (!list_empty(&ctx->relog_list)) > + xfs_relog_queue(ctx); > + else if (!list_empty(&ctx->busy_extents)) > xlog_discard_busy_extents(mp, ctx); > else > kmem_free(ctx); > @@ -746,8 +790,10 @@ xlog_cil_push( > */ > INIT_LIST_HEAD(&new_ctx->committing); > INIT_LIST_HEAD(&new_ctx->busy_extents); > + INIT_LIST_HEAD(&new_ctx->relog_list); > new_ctx->sequence = ctx->sequence + 1; > new_ctx->cil = cil; > + INIT_WORK(&new_ctx->relog_work, xfs_relog_worker); > cil->xc_ctx = new_ctx; > > /* > @@ -1199,6 +1245,8 @@ xlog_cil_init( > > INIT_LIST_HEAD(&ctx->committing); > INIT_LIST_HEAD(&ctx->busy_extents); > + INIT_LIST_HEAD(&ctx->relog_list); > + INIT_WORK(&ctx->relog_work, xfs_relog_worker); > ctx->sequence = 1; > ctx->cil = cil; > cil->xc_ctx = ctx; > diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h > index b192c5a9f9fd..6fd7b7297bd3 100644 > --- a/fs/xfs/xfs_log_priv.h > +++ b/fs/xfs/xfs_log_priv.h > @@ -243,6 +243,8 @@ struct xfs_cil_ctx { > struct list_head iclog_entry; > struct list_head committing; /* ctx committing list */ > struct work_struct discard_endio_work; > + struct list_head relog_list; > + struct work_struct relog_work; > }; > > /* > diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c > index 1ea82764bf89..08b6180cb5a3 100644 > --- a/fs/xfs/xfs_qm_syscalls.c > +++ b/fs/xfs/xfs_qm_syscalls.c > @@ -18,6 +18,7 @@ > #include "xfs_quota.h" > #include "xfs_qm.h" > #include "xfs_icache.h" > +#include "xfs_log.h" > > STATIC int > xfs_qm_log_quotaoff( > @@ -37,6 +38,7 @@ xfs_qm_log_quotaoff( > > qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT); > xfs_trans_log_quotaoff_item(tp, qoffi); > + set_bit(XFS_LI_RELOG, &qoffi->qql_item.li_flags); > > spin_lock(&mp->m_sb_lock); > mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL; > @@ -69,6 +71,10 @@ xfs_qm_log_quotaoff_end( > int error; > struct xfs_qoff_logitem *qoffi; > > + clear_bit(XFS_LI_RELOG, &startqoff->qql_item.li_flags); > + xfs_log_force(mp, XFS_LOG_SYNC); > + flush_workqueue(xfs_discard_wq); > + > error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp); > if (error) > return error; > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h > index 64d7f171ebd3..e04033c29f0d 100644 > --- a/fs/xfs/xfs_trans.h > +++ b/fs/xfs/xfs_trans.h > @@ -48,6 +48,7 @@ struct xfs_log_item { > struct xfs_log_vec *li_lv; /* active log vector */ > struct xfs_log_vec *li_lv_shadow; /* standby vector */ > xfs_lsn_t li_seq; /* CIL commit seq */ > + struct list_head li_ril; > }; > > /* > @@ -59,12 +60,14 @@ struct xfs_log_item { > #define XFS_LI_ABORTED 1 > #define XFS_LI_FAILED 2 > #define XFS_LI_DIRTY 3 /* log item dirty in transaction */ > +#define XFS_LI_RELOG 4 /* automatic relogging */ > > #define XFS_LI_FLAGS \ > { (1 << XFS_LI_IN_AIL), "IN_AIL" }, \ > { (1 << XFS_LI_ABORTED), "ABORTED" }, \ > { (1 << XFS_LI_FAILED), "FAILED" }, \ > - { (1 << XFS_LI_DIRTY), "DIRTY" } > + { (1 << XFS_LI_DIRTY), "DIRTY" }, \ > + { (1 << XFS_LI_RELOG), "RELOG" } > > struct xfs_item_ops { > unsigned flags; > -- > 2.20.1 > ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-11-27 15:12 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-11-22 18:19 [RFC v2 PATCH 0/3] xfs: automatic relogging experiment Brian Foster 2019-11-22 18:19 ` [RFC v2 PATCH 1/3] xfs: set t_task at wait time instead of alloc time Brian Foster 2019-11-22 18:19 ` [RFC v2 PATCH 2/3] xfs: prototype automatic intent relogging mechanism Brian Foster 2019-11-22 18:19 ` [RFC v2 PATCH 3/3] xfs: automatically relog quotaoff start intent Brian Foster 2019-11-25 18:55 ` [RFC v3 PATCH] xfs: automatic relogging experiment Brian Foster 2019-11-27 15:12 ` Brian Foster
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).