Linux-BTRFS Archive on lore.kernel.org
From: Josef Bacik <josef@toxicpanda.com>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v3 00/12] Improve preemptive ENOSPC flushing
Date: Fri,  9 Oct 2020 09:28:17 -0400
Message-ID: <cover.1602249928.git.josef@toxicpanda.com>

v2->v3:
- Added a cleanup to make sure we pass the right enum around for flush_space().
- Fixed a problem reported by a clang build where I used the wrong enum in the
  preemptive flushing for flush_space().
- Fixed up some nits pointed out by Nikolay.

v1->v2:
- Added a FORCE_COMMIT_TRANS flush operation so we can keep the flush_space
  stuff consistent and get all the normal tracepoints.
- Renamed fs_info->dio_bytes to ->ordered_bytes and changed it to count all
  ordered extents that were pending, not just DIO ordered extents that were
  pending.
- Reworked the clamping to not apply if we're not doing a lot of delalloc
  reservations.
- Reworked the preempt flushing loop to be more straightforward.
- Fixed the need_preemptive_flushing() helper to take into account DIO heavy
  workloads.

--- Original email ---

A while ago Nikolay started digging into a problem where they were seeing a
roughly 20% regression on random writes, and he bisected it down to

  btrfs: don't end the transaction for delayed refs in throttle

However, this wasn't actually the cause of the problem.

This patch removed the code that would preemptively end the transactions if we
were low on space.  Because we had just introduced the ticketing code, this was
no longer necessary and was causing a lot of transaction commits.

Nikolay's testing validated this: we would see roughly 100x more transaction
commits without that patch than with it.  Even so, the write regression clearly
appeared when this patch was applied.

The root cause of this is that the transaction commits were essentially
happening so quickly that we didn't end up needing to wait on space in the
ENOSPC ticketing code as much, and thus were able to write pretty quickly.  With
this gone, we now were getting a sawtoothy sort of behavior where we'd run up,
stop while we flushed metadata space, run some more, stop again etc.

When I implemented the ticketing infrastructure, I was trying to get us out of
excessively flushing space because we would sometimes over-create block groups,
so I short-circuited flushing whenever we no longer had tickets.  This had the
side effect of breaking the preemptive flushing code, where we attempted to
flush space in the background before we were forced to wait for space.

Enter this patchset.  We still have some of this preemption logic sprinkled
everywhere, so I've separated it out of the normal ticketed flushing code, and
made preemptive flushing its own thing.

The preemptive flushing logic is more specialized than the standard flushing
logic.  It attempts to flush in whichever pool has the highest usage.  This
means that if most of our space is tied up in pinned extents, we'll commit the
transaction.  If most of the space is tied up in delalloc, we'll flush delalloc,
etc.
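
As a rough illustration of the "flush whichever pool has the highest usage"
idea, here is a minimal userspace sketch.  It is not the code from the
patches: the enum, the struct and the function name are all hypothetical, and
the delayed item and delayed ref pools are an assumption standing in for the
"etc." above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flush actions; these are not the kernel's enum values. */
enum sketch_flush_op {
	SKETCH_FLUSH_DELALLOC,
	SKETCH_COMMIT_TRANS,
	SKETCH_FLUSH_DELAYED_ITEMS,
	SKETCH_FLUSH_DELAYED_REFS,
};

/* Hypothetical snapshot of how many bytes each pool is holding. */
struct sketch_usage {
	uint64_t delalloc_bytes;
	uint64_t pinned_bytes;
	uint64_t delayed_item_bytes;	/* assumed pool, not named in the text */
	uint64_t delayed_ref_bytes;	/* assumed pool, not named in the text */
};

/* Pick the action that targets the pool holding the most space. */
static enum sketch_flush_op sketch_pick_flush(const struct sketch_usage *u)
{
	enum sketch_flush_op op = SKETCH_FLUSH_DELALLOC;
	uint64_t largest;

	assert(u != NULL);
	largest = u->delalloc_bytes;

	/* Pinned extents only become free once the transaction commits. */
	if (u->pinned_bytes > largest) {
		largest = u->pinned_bytes;
		op = SKETCH_COMMIT_TRANS;
	}
	if (u->delayed_item_bytes > largest) {
		largest = u->delayed_item_bytes;
		op = SKETCH_FLUSH_DELAYED_ITEMS;
	}
	if (u->delayed_ref_bytes > largest) {
		largest = u->delayed_ref_bytes;
		op = SKETCH_FLUSH_DELAYED_REFS;
	}
	return op;
}
```

On a tie the sketch keeps the earlier pool, so delalloc wins by default; that
ordering is an arbitrary choice made for the example.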

To test this out I used the fio job that Nikolay used.  It needs to be adjusted
so the overall IO size is at least 2x the RAM size of the box you are testing:

fio --direct=0 --ioengine=sync --thread --directory=/mnt/test --invalidate=1 \
        --group_reporting=1 --runtime=300 --fallocate=none --ramp_time=10 \
        --name=RandomWrites-async-64512-4k-4 --new_group --rw=randwrite \
        --size=2g --numjobs=4 --bs=4k --fsync_on_close=0 --end_fsync=0 \
        --filename_format=FioWorkloads.\$jobnum
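
For picking --size on a different box, the 2x-RAM rule works out as follows.
This is only a sketch: the helper name is hypothetical, and it assumes that
--size applies per job, so the overall IO size is --size times --numjobs.  The
command above uses 2g across 4 jobs, so 8g overall.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest per-job size in bytes such that
 * size * numjobs is at least twice the RAM size.
 */
static uint64_t sketch_fio_size_per_job(uint64_t ram_bytes, unsigned int numjobs)
{
	uint64_t total;

	assert(numjobs > 0);
	total = 2 * ram_bytes;

	/* Round up so the combined size never falls below 2x RAM. */
	return (total + numjobs - 1) / numjobs;
}
```

With 4 GiB of RAM and 4 jobs this gives 2 GiB per job, which matches the
--size=2g in the command.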

I got the following results

misc-next:      bw=13.4MiB/s (14.0MB/s), 13.4MiB/s-13.4MiB/s (14.0MB/s-14.0MB/s), io=4015MiB (4210MB), run=300323-300323msec
pre-throttling: bw=16.9MiB/s (17.7MB/s), 16.9MiB/s-16.9MiB/s (17.7MB/s-17.7MB/s), io=5068MiB (5314MB), run=300069-300069msec
my patches:     bw=18.0MiB/s (18.9MB/s), 18.0MiB/s-18.0MiB/s (18.9MB/s-18.9MB/s), io=5403MiB (5666MB), run=300001-300001msec

Thanks,

Josef

Josef Bacik (12):
  btrfs: make flush_space take a enum btrfs_flush_state instead of int
  btrfs: add a trace point for reserve tickets
  btrfs: track ordered bytes instead of just dio ordered bytes
  btrfs: introduce a FORCE_COMMIT_TRANS flush operation
  btrfs: improve preemptive background space flushing
  btrfs: rename need_do_async_reclaim
  btrfs: check reclaim_size in need_preemptive_reclaim
  btrfs: rework btrfs_calc_reclaim_metadata_size
  btrfs: simplify the logic in need_preemptive_flushing
  btrfs: implement space clamping for preemptive flushing
  btrfs: adjust the flush trace point to include the source
  btrfs: add a trace class for dumping the current ENOSPC state

 fs/btrfs/ctree.h             |   4 +-
 fs/btrfs/disk-io.c           |   9 +-
 fs/btrfs/ordered-data.c      |  13 +-
 fs/btrfs/space-info.c        | 274 ++++++++++++++++++++++++++++-------
 fs/btrfs/space-info.h        |   3 +
 include/trace/events/btrfs.h | 104 ++++++++++++-
 6 files changed, 340 insertions(+), 67 deletions(-)

-- 
2.26.2



Thread overview: 27+ messages
2020-10-09 13:28 Josef Bacik [this message]
2020-10-09 13:28 ` [PATCH v3 01/12] btrfs: make flush_space take a enum btrfs_flush_state instead of int Josef Bacik
2020-10-12 13:49   ` Nikolay Borisov
2021-01-26 18:36   ` David Sterba
2021-01-26 20:32     ` Josef Bacik
2021-01-27 15:27       ` David Sterba
2020-10-09 13:28 ` [PATCH v3 02/12] btrfs: add a trace point for reserve tickets Josef Bacik
2021-01-26 19:41   ` David Sterba
2020-10-09 13:28 ` [PATCH v3 03/12] btrfs: track ordered bytes instead of just dio ordered bytes Josef Bacik
2020-10-12 13:50   ` Nikolay Borisov
2020-10-09 13:28 ` [PATCH v3 04/12] btrfs: introduce a FORCE_COMMIT_TRANS flush operation Josef Bacik
2020-10-12 13:50   ` Nikolay Borisov
2020-10-29 17:03   ` David Sterba
2021-01-26 18:41     ` David Sterba
2020-10-09 13:28 ` [PATCH v3 05/12] btrfs: improve preemptive background space flushing Josef Bacik
2020-10-13 11:29   ` Nikolay Borisov
2020-10-09 13:28 ` [PATCH v3 06/12] btrfs: rename need_do_async_reclaim Josef Bacik
2021-01-26 18:51   ` David Sterba
2020-10-09 13:28 ` [PATCH v3 07/12] btrfs: check reclaim_size in need_preemptive_reclaim Josef Bacik
2020-10-09 13:28 ` [PATCH v3 08/12] btrfs: rework btrfs_calc_reclaim_metadata_size Josef Bacik
2020-10-09 13:28 ` [PATCH v3 09/12] btrfs: simplify the logic in need_preemptive_flushing Josef Bacik
2020-10-13 12:18   ` Nikolay Borisov
2020-10-09 13:28 ` [PATCH v3 10/12] btrfs: implement space clamping for preemptive flushing Josef Bacik
2020-10-29 17:48   ` David Sterba
2020-10-09 13:28 ` [PATCH v3 11/12] btrfs: adjust the flush trace point to include the source Josef Bacik
2021-01-26 19:13   ` David Sterba
2020-10-09 13:28 ` [PATCH v3 12/12] btrfs: add a trace class for dumping the current ENOSPC state Josef Bacik
