linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Gregory Szorc <gregory.szorc@gmail.com>
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Observability of filesystem syncing and device flushing
Date: Tue, 15 Jun 2021 21:59:49 -0700	[thread overview]
Message-ID: <CAKQoGa=6OZnqRNDsGpyY47S4SgrFCnGRTXAMgZ61axM58AWrEA@mail.gmail.com> (raw)

One thing I've learned from optimizing I/O performance of userland
applications is that a surprising number of applications are
effectively "fsync() benchmarks." What I mean by that is many
applications call fsync() - and similar functionality incurring a
flush to non-volatile storage - surprisingly often and that overall
I/O performance is effectively bottlenecked by how quickly the
underlying storage device can process repeated flushes.

You can easily confirm the validity of this statement by running
sync/flush heavy software (like pretty much any database software)
with eatmydata [1], an LD_PRELOAD library / utility that effectively
short-circuits libc functions like fsync(), fdatasync(), and msync()
so they no-op instead of triggering a sync/flush [via a syscall].

The kernel exposes various statistics about block devices,
filesystems, the page cache, etc via procfs. However, as far as I can
tell there are no direct or very limited statistics on sync/flush
operations. I've often wanted to capture/view system/kernel-level
activity for these sync/flush operations because they are often strong
contributors to I/O bottlenecks and it can be extremely useful to
correlate these operations against other metrics.

I've long considered sending kernel patches to add procfs
observability for sync/flushing activity so common system monitoring
tools can easily consume those metrics. I'm emailing linux-block@ and
linux-fsdevel@ to a) try to assess whether these patches would be
well-received b) gain feedback on the appropriate activity to track
(I'm not a kernel expert).

I think a reasonable response is "you can already observe this
activity via syscall tracing and using tools like systemtap, so no
additional procfs counters are needed." I'd counter by saying, yes,
you can get some valuable information this way. But I believe there
are some large holes, such as limited visibility into block layer
activity outside of debugfs and when kernel subsystems (such as
background flusher threads) incur work independently of an observable
syscall.

Do others feel there is an observability gap in the kernel around
filesystem syncing and storage flushing? If so, how would you
recommend improving matters?

Gregory

[1] https://www.flamingspork.com/projects/libeatmydata/

                 reply	other threads:[~2021-06-16  5:00 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKQoGa=6OZnqRNDsGpyY47S4SgrFCnGRTXAMgZ61axM58AWrEA@mail.gmail.com' \
    --to=gregory.szorc@gmail.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --subject='Re: Observability of filesystem syncing and device flushing' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).