From: Joshi <joshiiitr@gmail.com>
To: david@fromorbit.com
Cc: linux-xfs@vger.kernel.org
Subject: Re: Strange behavior with log IO fake-completions
Date: Tue, 11 Sep 2018 09:40:59 +0530
Message-ID: <CA+1E3r+Lggw9JJymdhoyzkqQfqGETZE=R=WG09s7FOhY3Snb5g@mail.gmail.com>
In-Reply-To: <20180910235827.GD5631@dastard>

> > Test: iozone, single-thread, 1GiB file, 4K record, sync for each 4K (
> > '-eo' option).
> > Disk: 800GB NVMe disk. XFS based on 4.15, default options except log size = 184M
> > Machine: Intel Xeon E5-2690 @2.6 GHz, 2 NUMA nodes, 24 cpus each
> >
> > And the results are:
> > ------------------------------------------------
> > baseline                log fake-completion
> > 109,845                 45,538
> > ------------------------------------------------
> > I wondered why fake-completion turned out to be ~50% slower!
> > May I know if anyone has encountered this before, or knows why this can happen?
>
> You made all log IO submission/completion synchronous.
>
> https://marc.info/?l=linux-xfs&m=153532933529663&w=2
>
> > For fake-completion, I just tag all log IO buffer pointers (in
> > xlog_sync). And later, in xfs_buf_submit, I just complete those
> > tagged log IOs without any real bio formation (commenting out the
> > call to _xfs_buf_ioapply). I hope this is correct/enough to do nothing!
>
> It'll work, but it's a pretty silly thing to do. See above.

Thank you very much, Dave. I feel things are somewhat different here
than in the other email thread you pointed to.
Only the log IO was fake-completed; the entire XFS volume was not on a
ramdisk. The underlying disk was NVMe, and I checked that no
merging/batching happened for log IO submissions in the base case.
The completion counts were the same (as many as submitted) too.
The call that I disabled in the base code (in xfs_buf_submit, for log
IOs) is _xfs_buf_ioapply. So the only thing that happened differently
for the log IO submitter thread is that it executed the bp
completion-handling code (xfs_buf_ioend_async), and that anyway pushes
the processing to a worker. It still remains mostly async, I suppose.
In its original form, it would have executed extra code to form/send
the bio (with the possibility of submission/completion merging, which
did not happen for this workload), and the completion would have
arrived after some time, say T.
I wondered about the impact on XFS if this time T can be made very
low by the underlying storage for certain IOs.
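
For concreteness, here is roughly what the hack looks like
(XBF_FAKE_IO is a flag name I made up for this experiment; the real
xfs_buf_submit also does buffer hold and b_io_remaining accounting
that this sketch omits):

    /* xlog_sync(): tag log buffers for fake completion */
    bp->b_flags |= XBF_FAKE_IO;     /* made-up flag, experiment only */

    /* xfs_buf_submit(): complete tagged log IOs without any bio */
    if (bp->b_flags & XBF_FAKE_IO) {
            /*
             * No bio is ever formed or submitted; go straight to the
             * async completion path, which queues xfs_buf_ioend() to
             * a worker thread.
             */
            xfs_buf_ioend_async(bp);
            return;
    }
    _xfs_buf_ioapply(bp);           /* normal path: form and submit bios */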
If the underlying device/layers provide some sort of differentiated
I/O service enabling ultra-low-latency completion for certain IOs
(flagged as urgent), and one chooses log IO to take that low-latency
path, won't we see the same problem as shown by fake-completion?
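
Purely as a sketch of what I mean by choosing such a path (the
_XBF_LOGIO flag is made up, and REQ_PRIO is just one existing
block-layer hint that a capable device/driver could key off):

    /* _xfs_buf_ioapply(): hypothetically mark log IO as high priority
     * so the device can route it to an ultra-low-latency path. */
    if (bp->b_flags & _XBF_LOGIO)   /* made-up flag marking log buffers */
            op_flags |= REQ_PRIO;   /* existing priority hint in the block layer */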

Thanks,

On Tue, Sep 11, 2018 at 5:28 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Sep 10, 2018 at 11:37:45PM +0530, Joshi wrote:
> > Hi folks,
> > I wanted to check the impact of log IO speed during an fsync-heavy
> > workload. To obtain theoretical-maximum performance data, I
> > fake-completed all log IOs (i.e. the log IO cost is made 0).
> >
> > Test: iozone, single-thread, 1GiB file, 4K record, sync for each 4K (
> > '-eo' option).
> > Disk: 800GB NVMe disk. XFS based on 4.15, default options except log size = 184M
> > Machine: Intel Xeon E5-2690 @2.6 GHz, 2 NUMA nodes, 24 cpus each
> >
> > And the results are:
> > ------------------------------------------------
> > baseline                log fake-completion
> > 109,845                 45,538
> > ------------------------------------------------
> > I wondered why fake-completion turned out to be ~50% slower!
> > May I know if anyone has encountered this before, or knows why this can happen?
>
> You made all log IO submission/completion synchronous.
>
> https://marc.info/?l=linux-xfs&m=153532933529663&w=2
>
> > For fake-completion, I just tag all log IO buffer pointers (in
> > xlog_sync). And later, in xfs_buf_submit, I just complete those
> > tagged log IOs without any real bio formation (commenting out the
> > call to _xfs_buf_ioapply). I hope this is correct/enough to do nothing!
>
> It'll work, but it's a pretty silly thing to do. See above.
>
> > It seems to me that CPU count/frequency is playing a role here.
>
> Unlikely.
>
> > The above data was obtained with the CPU frequency set to higher
> > values. In order to keep the running CPU at a nearly constant high
> > frequency, I tried things such as the performance governor,
> > BIOS-based performance settings, explicitly setting the CPU scaling
> > max frequency, etc. However, the results did not differ much, and
> > the frequency did not remain constant/high.
> >
> > But when I used the "affine/bind" option of iozone (the -P option),
> > iozone runs on a single CPU all the time, and I get the expected
> > result:
> > -------------------------------------------------------------
> > baseline (affine)         log fake-completion(affine)
> > 125,253                     163,367
> > -------------------------------------------------------------
>
> Yup, because now it forces the work that gets handed off to another
> workqueue (the CIL push workqueue) to also run on the same CPU
> rather than asynchronously on another CPU. The result is that you
> essentially force everything to run in a tight loop on a hot CPU
> cache. Hitting a hot cache can make code run much, much faster in
> microbenchmark situations like this, leading to optimisations that
> don't actually work in the real world where those same code paths
> never run confined to a single pre-primed, hot CPU cache.
>
> When you combine that with the fact that IOZone has a very well
> known susceptibility to CPU cache residency effects, it means the
> results are largely useless for comparison between different kernel
> builds. This is because small code changes can result in
> sufficiently large changes in kernel CPU cache footprint that it
> perturbs IOZone behaviour. We typically see variations of over
> +/-10% from IOZone just by running two kernels that have slightly
> different config parameters.
>
> IOWs, don't use IOZone for anything related to performance testing.
>
> > Also, during the above episode, I felt the need to find the best
> > way to eliminate CPU frequency variations from benchmarking. I'd
> > be thankful to learn about it.
>
> I've never bothered with tuning for affinity or CPU frequency
> scaling when perf testing. If you have to rely on such things to get
> optimal performance from your filesystem algorithms, you are doing
> it wrong.
>
> That is: a CPU running at near full utilisation will always be run
> at maximum frequency, hence if you have to tune CPU frequency to get
> decent performance your algorithm is limited by something that
> prevents full CPU utilisation, not CPU frequency.
>
> Similarly, if you have to use affinity to get decent performance,
> you're optimising for limited system utilisation rather than
> being able to use all the resources in the machine effectively. The
> first goal of filesystem optimisation is to utilise every resource as
> efficiently as possible. Then people can constrain their workloads
> with affinity, containers, etc however they want without having to
> care about performance - it will never be worse than the performance
> at full resource utilisation.....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com



-- 
Joshi
