All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: sandeen@sandeen.net, Christoph Hellwig <hch@lst.de>,
	Dave Chinner <dchinner@redhat.com>, Theodore Ts'o <tytso@mit.edu>,
	linux-xfs@vger.kernel.org, allison.henderson@oracle.com
Subject: Re: [PATCH 19/17] mkfs: increase default log size for new (aka bigtime) filesystems
Date: Sun, 27 Feb 2022 08:37:20 +1100	[thread overview]
Message-ID: <20220226213720.GQ59715@dread.disaster.area> (raw)
In-Reply-To: <20220226025450.GY8313@magnolia>

On Fri, Feb 25, 2022 at 06:54:50PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Recently, the upstream kernel maintainer has been taking a lot of heat on
> account of writer threads encountering high latency when asking for log
> grant space when the log is small.  The reported use case is a heavily
> threaded indexing product logging trace information to a filesystem
> ranging in size between 20 and 250GB.  The meetings that result from the
> complaints about latency and stall warnings in dmesg both from this use
> case and also a large well known cloud product are now consuming 25% of
> the maintainer's weekly time and have been for months.

Is the transaction reservation space exhaustion caused by, as I
pointed out in another thread yesterday, the unbound concurrency in
IO completion? i.e. we have hundreds of active concurrent
transactions that then block on common objects between them (e.g.
inode locks) and serialise? Hence only handful of completions can
actually run concurrently, depsite every completion holding a full
reservation of log space to allow them to run concurrently?

> For small filesystems, the log is small by default because we have
> defaulted to a ratio of 1:2048 (or even less).  For grown filesystems,
> this is even worse, because big filesystems generate big metadata.
> However, the log size is still insufficient even if it is formatted at
> the larger size.
> 
> Therefore, if we're writing a new filesystem format (aka bigtime), bump
> the ratio unconditionally from 1:2048 to 1:256.  On a 220GB filesystem,
> the 99.95% latencies observed with a 200-writer file synchronous append
> workload running on a 44-AG filesystem (with 44 CPUs) spread across 4
> hard disks showed:
> 
> Log Size (MB)	Latency (ms)	Throughput (MB/s)
> 10		520		243
> 20		220		308
> 40		140		360
> 80		92		363
> 160		86		364
> 
> For 4 NVME, the results were:
> 
> 10		201		409
> 20		177		488
> 40		122		550
> 80		120		549
> 160		121		545
> 
> Hence we increase the ratio by 16x because there doesn't seem to be much
> improvement beyond that, and we don't want the log to grow /too/ large.

1:2048 -> 1:256 is an 8x bump, yes?  Which means we'll get a 2GB log
on a 512GB filesystem, and the 220GB log you tested is getting a
~1GB log?

I also wonder if the right thing to do here is just set a minimum
log size of 32MB? The worst of the long tail latencies are mitigated
by this point, and so even small filesystems grown out to 200GB will
have a log size that results in decent performance for this sort of
workload.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2022-02-26 21:37 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-20  0:21 [PATCHSET 00/17] xfsprogs: various 5.15 fixes Darrick J. Wong
2022-01-20  0:21 ` [PATCH 01/17] libxcmd: use emacs mode for command history editing Darrick J. Wong
2022-01-20  0:21 ` [PATCH 02/17] libxfs: shut down filesystem if we xfs_trans_cancel with deferred work items Darrick J. Wong
2022-02-04 21:36   ` Eric Sandeen
2022-02-04 21:47     ` Darrick J. Wong
2022-01-20  0:21 ` [PATCH 03/17] libxfs: don't leave dangling perag references from xfs_buf Darrick J. Wong
2022-02-04 22:05   ` Eric Sandeen
2022-01-20  0:21 ` [PATCH 04/17] libfrog: move the GETFSMAP definitions into libfrog Darrick J. Wong
2022-02-04 23:18   ` Eric Sandeen
2022-02-05  0:36     ` Darrick J. Wong
2022-02-07  1:05       ` Dave Chinner
2022-02-07 17:09         ` Darrick J. Wong
2022-02-07 21:32           ` Eric Sandeen
2022-02-10  3:33             ` Dave Chinner
2022-02-08 16:46   ` [PATCH v1.1 04/17] libfrog: always use the kernel GETFSMAP definitions Darrick J. Wong
2022-02-25 22:35     ` Eric Sandeen
2022-01-20  0:22 ` [PATCH 05/17] misc: add a crc32c self test to mkfs and repair Darrick J. Wong
2022-02-04 23:23   ` Eric Sandeen
2022-01-20  0:22 ` [PATCH 06/17] libxfs-apply: support filterdiff >= 0.4.2 only Darrick J. Wong
2022-01-20  0:22 ` [PATCH 07/17] xfs_db: fix nbits parameter in fa_ino[48] functions Darrick J. Wong
2022-02-25 21:45   ` Eric Sandeen
2022-01-20  0:22 ` [PATCH 08/17] xfs_repair: explicitly cast resource usage counts in do_warn Darrick J. Wong
2022-02-25 21:46   ` Eric Sandeen
2022-01-20  0:22 ` [PATCH 09/17] xfs_repair: explicitly cast directory inode numbers " Darrick J. Wong
2022-02-25 21:48   ` Eric Sandeen
2022-01-20  0:22 ` [PATCH 10/17] xfs_repair: fix indentation problems in upgrade_filesystem Darrick J. Wong
2022-02-25 21:53   ` Eric Sandeen
2022-01-20  0:22 ` [PATCH 11/17] xfs_repair: update secondary superblocks after changing features Darrick J. Wong
2022-02-25 21:57   ` Eric Sandeen
2022-01-20  0:22 ` [PATCH 12/17] xfs_scrub: report optional features in version string Darrick J. Wong
2022-01-20  1:16   ` Theodore Ts'o
2022-01-20  1:28     ` Darrick J. Wong
2022-01-20  1:32   ` [PATCH v2 " Darrick J. Wong
2022-02-25 22:14     ` Eric Sandeen
2022-02-26  0:04       ` Darrick J. Wong
2022-02-26  2:48         ` Darrick J. Wong
2022-02-26  2:53   ` [PATCH v3 " Darrick J. Wong
2022-02-28 21:38     ` Eric Sandeen
2022-01-20  0:22 ` [PATCH 13/17] mkfs: prevent corruption of passed-in suboption string values Darrick J. Wong
2022-01-20  0:22 ` [PATCH 14/17] mkfs: add configuration files for the last few LTS kernels Darrick J. Wong
2022-01-20  0:22 ` [PATCH 15/17] mkfs: document sample configuration file location Darrick J. Wong
2022-01-20  0:23 ` [PATCH 16/17] mkfs: add a config file for x86_64 pmem filesystems Darrick J. Wong
2022-02-25 22:21   ` Eric Sandeen
2022-02-26  2:38     ` Darrick J. Wong
2022-02-26  2:52   ` [PATCH v2 " Darrick J. Wong
2022-02-28 21:37     ` Eric Sandeen
2022-01-20  0:23 ` [PATCH 17/17] mkfs: enable inobtcount and bigtime by default Darrick J. Wong
2022-02-25 22:22   ` Eric Sandeen
2022-01-28 22:44 ` [PATCH 18/17] xfs_scrub: fix reporting if we can't open raw block devices Darrick J. Wong
2022-01-31 12:28   ` Christoph Hellwig
2022-02-26  2:54 ` [PATCH 19/17] mkfs: increase default log size for new (aka bigtime) filesystems Darrick J. Wong
2022-02-26 21:37   ` Dave Chinner [this message]
2022-02-28 23:22     ` Darrick J. Wong
2022-03-01  0:42       ` Dave Chinner
2022-03-01  2:38         ` Darrick J. Wong
2022-03-01 15:55           ` Brian Foster
2022-03-01  3:10         ` Dave Chinner
2022-02-28 21:44   ` Eric Sandeen
2022-03-01  2:21     ` Darrick J. Wong
2022-03-01  2:44       ` Eric Sandeen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220226213720.GQ59715@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=allison.henderson@oracle.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sandeen@sandeen.net \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.