All of lore.kernel.org
 help / color / mirror / Atom feed
From: Su Yue <l@damenly.su>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Cc: jd.girard@sysnux.pf
Cc: ben.r.xiao@gmail.com
Subject: Re: [PATCH 0/4] btrfs: make autodefrag to defrag and only defrag small write ranges
Date: Tue, 22 Feb 2022 09:10:45 +0800	[thread overview]
Message-ID: <fsobsnaw.fsf@damenly.su> (raw)
In-Reply-To: <cover.1644737297.git.wqu@suse.com>


On Sun 13 Feb 2022 at 15:42, Qu Wenruo <wqu@suse.com> wrote:

> When a small write reaches disk, btrfs will mark the inode for
> autodefrag, and record the transid of the inode for autodefrag.
>
> Then autodefrag uses the transid to only scan newer file 
> extents.
>
> However this transid based scanning scheme has a hole, the small 
> write
> size threshold triggering autodefrag is not the same extent size
> threshold for autodefrag.
>
> For the following write sequence on an non-compressed inode:
>
>  pwrite 0 4k
>  sync
>  pwrite 4k 128k
>  sync
>  pwrite 132k 128k
>  sync.
>
> The first 4K is indeed a small write (<64K), but the later two 
> 128K ones
> are definite not (>64K).
>
> Hoever autodefrag will try to defrag all three writes, as the
> extent_threshold used for autodefrag is fixed 256K.
>
> This extra scanning on extents which didn't trigger autodefrag 
> can cause
> extra IO.
>
> This patchset will try to address the problem by:
>
> - Remove the inode_defrag re-queue behavior
>   Now we just scan one file til its end (while keep the
>   max_sectors_to_defrag limit, and frequently check the root 
>   refs, and
>   remount situation to exit).
>
>   This also saves several bytes from inode_defrag structure.
>
>   This is done in the 3rd patch.
>
> - Save @small_write value into inode_defrag and use it as 
> autodefrag
>   extent threshold
>   Now there is no gap for autodefrag and small_write.
>
>   This is done in the 4th patch.
>
> The remaining patches are:
>
> - Removing one dead parameter
>
> - Add extra trace events for autodefrag
>   So end users will no longer need to re-compile kernel modules, 
>   and
>   use trace events to provide debug info on the 
>   autodefrag/defrag ioctl.
>
> Unfortunately I don't have a good benchmark setup for the 
> patchset yet,
> but unlike previous RFC version, this one brings very little 
> extra
> resource usage, and is just changing the extent_threshold for
> autodefrag.
>
> Changelog:
> RFC->v1:
> - Add ftrace events for defrag
>
> - Add a new patch to change how we run defrag inodes
>   Instead of saving previous location and re-queue, just run it 
>   in one
>   run.
>   Previously btrfs_run_defrag_inodse() will always exhaust the 
>   existing
>   inode_defrag anyway, the change should not bring much 
>   difference.
>
> - Change autodefrag extent_thresh to close the gap, other than 
> using
>   another extent io tree
>   Now it uses less resource, keep the critical section small, 
>   while
>   can almost reach the same objective.
>
> Qu Wenruo (4):
>   btrfs: remove unused parameter for btrfs_add_inode_defrag()
>   btrfs: add trace events for defrag
>   btrfs: autodefrag: only scan one inode once
>   btrfs: close the gap between inode_should_defrag() and 
>   autodefrag
>     extent size threshold
>

Cc the reporters.

It was about ~20 days agao when I saw the report about autodefrag 
causes
IO burning. Then I turned the autodefrag option of the raid1 btrfs 
on
my NAS to do some experimental tests.

At first time, I tried to download files in ~20GB size but iotop 
showed
everything was fine. And the autodefrag was left in /etc/fstab.
When I was back from my vacation, it surprised me that my 4TB size 
disk
has been written about ~70TB and iotop showed btrfs-cleaner is 
eating
the whole disk io bandwidth.

And after switched kernel 5.16.8-arch1-1 from to Qu's 
autodefrag_fixes[1],
I'd say the btrfs on my NAS works well(no panic of course) and no 
more
IO burning occurs.

https://github.com/adam900710/linux/tree/autodefrag_fixes
--
Su
>  fs/btrfs/ctree.h             |   3 +-
>  fs/btrfs/file.c              | 165 
>  +++++++++++++++--------------------
>  fs/btrfs/inode.c             |   4 +-
>  fs/btrfs/ioctl.c             |   4 +
>  include/trace/events/btrfs.h | 127 +++++++++++++++++++++++++++
>  5 files changed, 206 insertions(+), 97 deletions(-)

      parent reply	other threads:[~2022-02-22  1:27 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-13  7:42 [PATCH 0/4] btrfs: make autodefrag to defrag and only defrag small write ranges Qu Wenruo
2022-02-13  7:42 ` [PATCH 1/4] btrfs: remove unused parameter for btrfs_add_inode_defrag() Qu Wenruo
2022-02-13  7:42 ` [PATCH 2/4] btrfs: add trace events for defrag Qu Wenruo
2022-02-13  7:42 ` [PATCH 3/4] btrfs: autodefrag: only scan one inode once Qu Wenruo
2022-02-22 17:32   ` David Sterba
2022-02-22 23:42     ` Qu Wenruo
2022-02-23 15:53       ` David Sterba
2022-02-24  6:59         ` Qu Wenruo
2022-02-24  9:45           ` Qu Wenruo
2022-02-24 12:18             ` Qu Wenruo
2022-02-24 19:44               ` David Sterba
2022-02-24 19:41           ` David Sterba
2022-02-13  7:42 ` [PATCH 4/4] btrfs: close the gap between inode_should_defrag() and autodefrag extent size threshold Qu Wenruo
2022-02-15  6:55 ` [PATCH 0/4] btrfs: make autodefrag to defrag and only defrag small write ranges Qu Wenruo
2022-02-22  1:10 ` Su Yue [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fsobsnaw.fsf@damenly.su \
    --to=l@damenly.su \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.