From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: Throttle commits on delayed background CIL push
Date: Wed, 2 Oct 2019 08:41:39 -0400
Message-ID: <20191002124139.GB2403@bfoster>
In-Reply-To: <20191001231433.GU16973@dread.disaster.area>

On Wed, Oct 02, 2019 at 09:14:33AM +1000, Dave Chinner wrote:
> On Tue, Oct 01, 2019 at 09:13:36AM -0400, Brian Foster wrote:
> > On Tue, Oct 01, 2019 at 01:42:07PM +1000, Dave Chinner wrote:
> > > On Tue, Oct 01, 2019 at 07:53:36AM +1000, Dave Chinner wrote:
> > > > On Mon, Sep 30, 2019 at 01:03:58PM -0400, Brian Foster wrote:
> > > > > Have you done similar testing for small/minimum sized logs?
> > > > 
> > > > Yes. I've had the tracepoint active during xfstests runs on test
> > > > filesystems using default log sizes on 5-15GB filesystems. The only
> > > > test in all of xfstests that has triggered it is generic/017, and it
> > > > only triggered once.
> > > > 
> > > > e.g.
> > > > 
> > > > # trace-cmd start -e xfs_log_cil_wait
> > > > <run xfstests>
> > > > # trace-cmd show
> > > > # tracer: nop
> > > > #
> > > > # entries-in-buffer/entries-written: 1/1   #P:4
> > > > #
> > > > #                              _-----=> irqs-off
> > > > #                             / _----=> need-resched
> > > > #                            | / _---=> hardirq/softirq
> > > > #                            || / _--=> preempt-depth
> > > > #                            ||| /     delay
> > > > #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> > > > #              | |       |   ||||       |         |
> > > >           xfs_io-2158  [001] ...1   309.285959: xfs_log_cil_wait: dev 8:96 t_ocnt 1 t_cnt 1 t_curr_res 67956 t_unit_res 67956 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 75 grant_reserve_bytes 12878480 grant_write_cycle 75 grant_write_bytes 12878480 curr_cycle 75 curr_block 10448 tail_cycle 75 tail_block 3560
> > > > #
> > > > 
> > > > And the timestamp matched the time that generic/017 was running.
> > > 
> > > So I've run this on my typical 16-way fsmark workload with different
> > > size logs. It barely triggers on log sizes larger than 64MB; on 32MB
> > > logs I can see it capturing all 16 fsmark processes while waiting
> > > for the CIL context to switch. This will give you an idea of the
> > > log cycles the capture is occurring on, and the count of processes
> > > being captured:
> > > 
> > > $ sudo trace-cmd show | awk -e '/^ / {print $23}' | sort -n |uniq -c
> > >      16 251
> [snip]
> > >      16 2892
> > > $
> > 
> > Thanks. I assume I'm looking at cycle numbers and event counts here?
> 
> Yes.
> 
> > > So typically groups of captures are hundreds of log cycles apart
> > > (100 cycles x 32MB = ~3GB of log writes), then there will be a
> > > stutter where the CIL dispatch is delayed, and then everything
> > > continues on. These all show the log is always around 75% full
> > > (AIL tail pushing threshold) but the reservation grant wait lists are
> > > always empty so we're not running out of reservation space here.
> > > 
> > 
> > It's somewhat interesting that we manage to block every thread most of
> > the time before the CIL push task starts. I wonder a bit if that pattern
> > would hold for a system/workload with more CPUs (and if so, if there are
> > any odd side effects of stalling and waking hundreds of tasks at the
> > same time vs. our traditional queuing behavior).
> 
> If I increase the concurrency (e.g. 16->32 threads for fsmark on a
> 64MB log), we hammer the spinlock on the grant head -hard-. i.e. CPU
> usage goes up by 40%, performance goes down by 50%, and all that CPU
> time is spent spinning on the reserve grant head lock. Basically,
> the log reservation space runs out, and we end up queuing on the
> reservation grant head and then we get reminded of just how bad
> having a serialisation point in the reservation fast path actually
> is for scalability...
> 

The small log case is not really what I'm wondering about. Does this
behavior translate to a similar test with a maximum sized log?

...
> 
> Larger logs block more threads on the CIL throttle, but the 32MB CIL
> window can soak up hundreds of max sized transaction reservations
> before overflowing so even running several hundred concurrent
> modification threads I haven't been able to drive enough concurrency
> through the CIL to see any sort of adverse behaviour.  And the
> workloads are running pretty consistently at less than 5,000 context
> switches/sec so there's no evidence of repeated thundering herd
> wakeup problems, either.
> 

That speaks to the rarity of the throttle, which is good. But I'm
wondering, for example, what might happen on systems where we could have
hundreds of physical CPUs committing to the CIL, we block them all on
the throttle and then wake them all at once. IOW, can we potentially
create the contention conditions you reproduce above in scenarios where
they might not have existed before?

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
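
Editorial sketch, not part of the original message: the awk pipeline quoted above picks the grant_reserve_cycle value by field position ($23) and counts how many xfs_log_cil_wait events landed on each log cycle. A rough Python equivalent is shown below. It matches the field by name instead of position, which is a substitution and not what the thread used. The function names (captures_per_cycle, log_bytes_between) are hypothetical, and the input is assumed to be the text printed by `trace-cmd show` in the layout quoted in the thread.

```python
import re
from collections import Counter

# Assumption: each event line carries "grant_reserve_cycle <number>",
# as in the xfs_log_cil_wait trace line quoted in the thread.
CYCLE_RE = re.compile(r"\bgrant_reserve_cycle (\d+)")


def captures_per_cycle(trace_text):
    """Count xfs_log_cil_wait events per grant reserve cycle.

    Returns a dict mapping cycle number -> event count, ordered by cycle,
    which mirrors the output of `sort -n | uniq -c` in the awk pipeline.
    """
    counts = Counter()
    for line in trace_text.splitlines():
        # Skip the "#" header lines and any unrelated events.
        if "xfs_log_cil_wait:" not in line:
            continue
        match = CYCLE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return dict(sorted(counts.items()))


def log_bytes_between(cycle_a, cycle_b, log_size_bytes):
    """Approximate bytes written to the log between two cycle numbers.

    One cycle is one full pass over the log, so the distance is
    (cycle difference) x (log size). This ignores the block offset
    within a cycle, so it is only a coarse estimate.
    """
    return abs(cycle_b - cycle_a) * log_size_bytes
```

For example, two capture groups 100 cycles apart on a 32MB log give `log_bytes_between(251, 351, 32 << 20)`, which is 3355443200 bytes (about 3.1GiB), in line with the "100 cycles x 32MB = ~3GB" estimate in the quoted text.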
