From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 02/26] xfs: Throttle commits on delayed background CIL push
Date: Fri, 11 Oct 2019 08:38:37 -0400
Message-ID: <20191011123837.GA61257@bfoster>
In-Reply-To: <20191009032124.10541-3-david@fromorbit.com>

On Wed, Oct 09, 2019 at 02:21:00PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> In certain situations the background CIL push can be indefinitely
> delayed. While we have workarounds for the obvious cases now, they
> don't solve the underlying issue: there is no upper limit on the
> CIL at which we will either force or wait for a background push to
> start, hence allowing the CIL to grow without bound until it
> consumes all log space.
>
> To fix this, add a new wait queue to the CIL which allows background
> pushes to wait for the CIL context to be switched out. This happens
> when the push starts, so it allows us to block incoming transaction
> commit completion until the push has started. This will only affect
> processes that are running modifications, and only when the CIL
> threshold has been significantly overrun.
>
> This has no apparent impact on performance, and didn't even trigger
> until over 45 million inodes had been created in a 16-way fsmark
> test on a 2GB log. That was limiting at 64MB of log space used, so
> the active CIL size is only about 3% of the total log in that case.
> The concurrent removal of those files did not trigger the background
> sleep at all.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

This looks the same as the previous version.
Brian

>  fs/xfs/xfs_log_cil.c  | 37 +++++++++++++++++++++++++++++++++----
>  fs/xfs/xfs_log_priv.h | 24 ++++++++++++++++++++++++
>  fs/xfs/xfs_trace.h    |  1 +
>  3 files changed, 58 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index ef652abd112c..4a09d50e1368 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -670,6 +670,11 @@ xlog_cil_push(
>  	push_seq = cil->xc_push_seq;
>  	ASSERT(push_seq <= ctx->sequence);
>
> +	/*
> +	 * Wake up any background push waiters now that this context is being
> +	 * pushed.
> +	 */
> +	wake_up_all(&ctx->push_wait);
> +
>  	/*
>  	 * Check if we've anything to push. If there is nothing, then we don't
>  	 * move on to a new sequence number and so we have to be able to push
> @@ -746,6 +751,7 @@ xlog_cil_push(
>  	 */
>  	INIT_LIST_HEAD(&new_ctx->committing);
>  	INIT_LIST_HEAD(&new_ctx->busy_extents);
> +	init_waitqueue_head(&new_ctx->push_wait);
>  	new_ctx->sequence = ctx->sequence + 1;
>  	new_ctx->cil = cil;
>  	cil->xc_ctx = new_ctx;
> @@ -900,7 +906,7 @@ xlog_cil_push_work(
>   */
>  static void
>  xlog_cil_push_background(
> -	struct xlog	*log)
> +	struct xlog	*log) __releases(cil->xc_ctx_lock)
>  {
>  	struct xfs_cil	*cil = log->l_cilp;
>
> @@ -914,14 +920,36 @@ xlog_cil_push_background(
>  	 * don't do a background push if we haven't used up all the
>  	 * space available yet.
>  	 */
> -	if (cil->xc_ctx->space_used < XLOG_CIL_SPACE_LIMIT(log))
> +	if (cil->xc_ctx->space_used < XLOG_CIL_SPACE_LIMIT(log)) {
> +		up_read(&cil->xc_ctx_lock);
>  		return;
> +	}
>
>  	spin_lock(&cil->xc_push_lock);
>  	if (cil->xc_push_seq < cil->xc_current_sequence) {
>  		cil->xc_push_seq = cil->xc_current_sequence;
>  		queue_work(log->l_mp->m_cil_workqueue, &cil->xc_push_work);
>  	}
> +
> +	/*
> +	 * Drop the context lock now, we can't hold that if we need to sleep
> +	 * because we are over the blocking threshold. The push_lock is still
> +	 * held, so blocking threshold sleep/wakeup is still correctly
> +	 * serialised here.
> +	 */
> +	up_read(&cil->xc_ctx_lock);
> +
> +	/*
> +	 * If we are well over the space limit, throttle the work that is being
> +	 * done until the push work on this context has begun.
> +	 */
> +	if (cil->xc_ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
> +		trace_xfs_log_cil_wait(log, cil->xc_ctx->ticket);
> +		ASSERT(cil->xc_ctx->space_used < log->l_logsize);
> +		xlog_wait(&cil->xc_ctx->push_wait, &cil->xc_push_lock);
> +		return;
> +	}
> +
>  	spin_unlock(&cil->xc_push_lock);
>
>  }
> @@ -1038,9 +1066,9 @@ xfs_log_commit_cil(
>  		if (lip->li_ops->iop_committing)
>  			lip->li_ops->iop_committing(lip, xc_commit_lsn);
>  	}
> -	xlog_cil_push_background(log);
>
> -	up_read(&cil->xc_ctx_lock);
> +	/* xlog_cil_push_background() releases cil->xc_ctx_lock */
> +	xlog_cil_push_background(log);
>  }
>
>  /*
> @@ -1199,6 +1227,7 @@ xlog_cil_init(
>
>  	INIT_LIST_HEAD(&ctx->committing);
>  	INIT_LIST_HEAD(&ctx->busy_extents);
> +	init_waitqueue_head(&ctx->push_wait);
>  	ctx->sequence = 1;
>  	ctx->cil = cil;
>  	cil->xc_ctx = ctx;
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index a3cc8a9a16d9..f231b7dfaeab 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -247,6 +247,7 @@ struct xfs_cil_ctx {
>  	struct xfs_log_vec	*lv_chain;	/* logvecs being pushed */
>  	struct list_head	iclog_entry;
>  	struct list_head	committing;	/* ctx committing list */
> +	wait_queue_head_t	push_wait;	/* background push throttle */
>  	struct work_struct	discard_endio_work;
>  };
>
> @@ -344,10 +345,33 @@ struct xfs_cil {
>   * buffer window (32MB) as measurements have shown this to be roughly the
>   * point of diminishing performance increases under highly concurrent
>   * modification workloads.
> + *
> + * To prevent the CIL from overflowing upper commit size bounds, we introduce a
> + * new threshold at which we block committing transactions until the background
> + * CIL commit commences and switches to a new context. While this is not a hard
> + * limit, it forces the process committing a transaction to the CIL to block and
> + * yield the CPU, giving the CIL push work a chance to be scheduled and start
> + * work. This prevents a process running lots of transactions from overfilling
> + * the CIL because it is not yielding the CPU. We set the blocking limit at
> + * twice the background push space threshold so we keep in line with the AIL
> + * push thresholds.
> + *
> + * Note: this is not a -hard- limit as blocking is applied after the transaction
> + * is inserted into the CIL and the push has been triggered. It is largely a
> + * throttling mechanism that allows the CIL push to be scheduled and run. A hard
> + * limit would be difficult to implement without introducing global
> + * serialisation in the CIL commit fast path, and it's not at all clear that we
> + * actually need such hard limits given the ~7 years we've run without a hard
> + * limit before finding the first situation where a checkpoint size overflow
> + * actually occurred. Hence the simple throttle, and an ASSERT check to tell us
> + * that we've overrun the max size.
>   */
>  #define XLOG_CIL_SPACE_LIMIT(log)	\
>  	min_t(int, (log)->l_logsize >> 3, BBTOB(XLOG_TOTAL_REC_SHIFT(log)) << 4)
>
> +#define XLOG_CIL_BLOCKING_SPACE_LIMIT(log)	\
> +	(XLOG_CIL_SPACE_LIMIT(log) * 2)
> +
>  /*
>   * ticket grant locks, queues and accounting have their own cache lines
>   * as these are quite hot and can be operated on concurrently.
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index eaae275ed430..e7087ede2662 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1011,6 +1011,7 @@ DEFINE_LOGGRANT_EVENT(xfs_log_regrant_reserve_sub);
>  DEFINE_LOGGRANT_EVENT(xfs_log_ungrant_enter);
>  DEFINE_LOGGRANT_EVENT(xfs_log_ungrant_exit);
>  DEFINE_LOGGRANT_EVENT(xfs_log_ungrant_sub);
> +DEFINE_LOGGRANT_EVENT(xfs_log_cil_wait);
>
>  DECLARE_EVENT_CLASS(xfs_log_item_class,
>  	TP_PROTO(struct xfs_log_item *lip),
> --
> 2.23.0.rc1
>