From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 0/5] xfs: various log stuff...
Date: Thu, 4 Feb 2021 08:20:13 +1100
Message-ID: <20210203212013.GV4662@dread.disaster.area>
In-Reply-To: <20210201123943.GA3281245@infradead.org>

On Mon, Feb 01, 2021 at 12:39:43PM +0000, Christoph Hellwig wrote:
> On Thu, Jan 28, 2021 at 03:41:49PM +1100, Dave Chinner wrote:
> > Hi folks,
> > 
> > Quick patch dump for y'all. A couple of minor cleanups to the
> > log behaviour, a fix for the CIL throttle hang and a couple of
> > patches to rework the cache flushing that journal IO does to reduce
> > the number of cache flushes by a couple of orders of magnitude.
> > 
> > All passes fstests with no regressions, no performance regressions
> > from fsmark, dbench and various fio workloads, some big gains even
> > on fast storage.
> 
> Can you elaborate on the big gains?

See the commit messages. dbench simulates fileserver behaviour with
extremely frequent fsync/->commit_metadata flush points, and that
shows gains at high client counts when logbsize=32k. fsmark is a
highly concurrent metadata modification workload designed to push
the journal to its performance and scalability limits, and that
shows 25% gains at logbsize=32k, bringing it up to the same
performance as logbsize=256k on the test machine.
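
To be concrete about what those flush points look like: the toy loop
below (a sketch of the pattern, not dbench or fsmark themselves, with
made-up file names and counts) is the kind of workload that matters
here, because every fsync() forces the log and, before this series,
every log write carried its own cache flush and FUA write.

/*
 * Toy sketch of an fsync-heavy metadata workload. Each iteration
 * creates a file, writes a byte and fsyncs it, turning every file
 * into a journal flush point.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	char name[64];

	for (int i = 0; i < 1000; i++) {
		snprintf(name, sizeof(name), "file.%d", i);

		int fd = open(name, O_CREAT | O_WRONLY | O_TRUNC, 0644);
		if (fd < 0) {
			perror("open");
			exit(1);
		}
		if (write(fd, "x", 1) != 1) {
			perror("write");
			exit(1);
		}
		/* Each fsync forces the log to stable storage. */
		if (fsync(fd) < 0) {
			perror("fsync");
			exit(1);
		}
		close(fd);
	}
	return 0;
}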

> Workloads for one, but also
> what kind of storage.  For less FUA/flush to matter the device needs
> to have a write cache, which none of the really fast SSDs even has.

The gains are occurring on devices that have volatile caches.
That doesn't mean devices with volatile caches are slow, just that
they can be faster with a better cache flushing strategy.

And yes, as you would expect, I don't see any change in behaviour on
data center SSDs that have no volatile caches because the block
layer elides cache flushes for them anyway.
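
For the archives, a rough idea of what "elides" means here. This is a
standalone sketch of the decision, not the kernel's blk-flush.c, and
all the names in it are made up: if the device reports no volatile
write cache, the preflush/FUA parts of the request simply decompose
to nothing.

#include <stdbool.h>
#include <stdio.h>

#define REQ_PREFLUSH	(1u << 0)
#define REQ_FUA		(1u << 1)

struct queue_limits {
	bool has_write_cache;	/* volatile write cache present */
	bool supports_fua;	/* device honours FUA natively */
};

/* Return the extra flush steps actually required for this request. */
static unsigned int flush_policy(const struct queue_limits *q,
				 unsigned int op_flags)
{
	unsigned int steps = 0;

	if (!q->has_write_cache)
		return 0;	/* nothing is cached, nothing to flush */

	if (op_flags & REQ_PREFLUSH)
		steps |= REQ_PREFLUSH;
	if ((op_flags & REQ_FUA) && !q->supports_fua)
		steps |= REQ_FUA;	/* emulate FUA with a post-flush */

	return steps;
}

int main(void)
{
	struct queue_limits no_cache = { .has_write_cache = false };
	struct queue_limits wcache   = { .has_write_cache = true };

	printf("no cache:   steps=%u\n",
	       flush_policy(&no_cache, REQ_PREFLUSH | REQ_FUA));
	printf("with cache: steps=%u\n",
	       flush_policy(&wcache, REQ_PREFLUSH | REQ_FUA));
	return 0;
}

With no volatile cache there is nothing for the flush machinery to do,
so the journal write goes straight to the device and this series
changes nothing there, exactly as observed.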

But, really, device performance improvements aren't the motivation
for this. The real motivation is removing orders of magnitude of
flush points from the software layers below the filesystem: stuff
like software RAID, thin provisioning and other functionality that
must obey the flush/fua IOs they receive regardless of whether the
underlying hardware needs them or not.

Avoiding flush/fua for the journal IO means that RAID5/6 can cache
partial stripe writes from the XFS journal rather than having to
flush the partial stripe update for every journal IO. dm-thin
doesn't need to commit open transactions and flush all the dirty
data over newly allocated regions on every journal IO to a device
pool (i.e. cache flushes from one thinp device in a pool cause all
other thinp devices in the pool to stall new allocations until the
flush/fua is done). And so on.
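
To illustrate the stacking problem (entirely hypothetical structures
here, not the actual dm-thin or MD code): a stacking target has to
treat every flush it receives as real, committing its own state and
fanning the flush out to every leg before it can complete the journal
write, even if none of the legs has a volatile cache.

#include <stdio.h>

#define REQ_PREFLUSH	(1u << 0)

struct leg { const char *name; };

struct stacked_dev {
	struct leg legs[3];
	int nr_legs;
	int open_transaction;	/* e.g. uncommitted thinp metadata */
};

static void commit_metadata(struct stacked_dev *d)
{
	if (d->open_transaction) {
		printf("  commit open metadata transaction\n");
		d->open_transaction = 0;
	}
}

static void submit_write(struct stacked_dev *d, unsigned int flags)
{
	if (flags & REQ_PREFLUSH) {
		/* The target can't judge whether the flush is needed;
		 * it has to obey it. */
		commit_metadata(d);
		for (int i = 0; i < d->nr_legs; i++)
			printf("  flush %s\n", d->legs[i].name);
	}
	printf("  write journal block\n");
}

int main(void)
{
	struct stacked_dev d = {
		.legs = { {"leg0"}, {"leg1"}, {"leg2"} },
		.nr_legs = 3,
		.open_transaction = 1,
	};

	printf("journal IO with REQ_PREFLUSH:\n");
	submit_write(&d, REQ_PREFLUSH);

	printf("journal IO without REQ_PREFLUSH:\n");
	submit_write(&d, 0);
	return 0;
}

Drop the flush from most journal IO and that whole commit-and-fan-out
sequence goes away for every layer in the stack.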

There's no question at all that reducing the number of flush/fua
triggers is a good thing to be doing, regardless of the storage or
workloads that I've done validation testing on. The fact that a
decent performance SSD (120k randr IOPS, 60k randw IOPS) shows a
25% performance increase for a journal IO bound workload indicates
just how much default configurations can be bound by journal cache
flushes...

> So I'd only really expect gains from that on consumer grade SSDs and
> hard drives.

Sure, but those are exactly the devices we have always optimised
cache flushing for....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 29+ messages
2021-01-28  4:41 [PATCH 0/5] xfs: various log stuff Dave Chinner
2021-01-28  4:41 ` [PATCH 1/5] xfs: log stripe roundoff is a property of the log Dave Chinner
2021-01-28 14:57   ` Brian Foster
2021-01-28 20:59     ` Dave Chinner
2021-01-28 21:25   ` Darrick J. Wong
2021-01-28 22:00     ` Dave Chinner
2021-01-28  4:41 ` [PATCH 2/5] xfs: separate CIL commit record IO Dave Chinner
2021-01-28 15:07   ` Brian Foster
2021-01-28 21:22     ` Dave Chinner
     [not found]       ` <20210129145851.GB2660974@bfoster>
2021-01-29 22:25         ` Dave Chinner
2021-02-01 16:07           ` Brian Foster
2021-01-30  9:13   ` Chandan Babu R
2021-02-01 12:59   ` Christoph Hellwig
2021-01-28  4:41 ` [PATCH 3/5] xfs: journal IO cache flush reductions Dave Chinner
2021-01-28 15:12   ` Brian Foster
2021-01-28 21:46     ` Dave Chinner
2021-01-28 21:26   ` Dave Chinner
2021-01-30 12:56   ` Chandan Babu R
2021-01-28  4:41 ` [PATCH 4/5] xfs: Fix CIL throttle hang when CIL space used going backwards Dave Chinner
2021-01-28 16:53   ` Brian Foster
2021-02-02  5:52   ` Chandan Babu R
2021-02-17 11:33   ` Paul Menzel
2021-02-17 21:06   ` Donald Buczek
2021-01-28  4:41 ` [PATCH 5/5] xfs: reduce buffer log item shadow allocations Dave Chinner
2021-01-28 16:54   ` Brian Foster
2021-01-28 21:58     ` Dave Chinner
2021-02-02 12:01   ` Chandan Babu R
2021-02-01 12:39 ` [PATCH 0/5] xfs: various log stuff Christoph Hellwig
2021-02-03 21:20   ` Dave Chinner [this message]
