From: "Theodore Ts'o" <tytso@mit.edu>
To: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH v6 0/7] Block Allocator Improvements
Date: Fri, 9 Apr 2021 12:36:03 -0400
Message-ID: <YHCCc5UKANOd2VbQ@mit.edu>
In-Reply-To: <20210401172129.189766-1-harshadshirwadkar@gmail.com>

Thanks, I've applied this patch series into the ext4 git tree.

	     	     	  	- Ted

On Thu, Apr 01, 2021 at 10:21:22AM -0700, Harshad Shirwadkar wrote:
> This patch series improves cr 0 and cr 1 passes of the allocator
> significantly. Currently, at cr 0 and 1, we perform linear lookups to
> find the matching groups. That's very inefficient for large file
> systems where there are millions of block groups. At cr 0, we only
> care about the groups that have the largest free order >= the
> request's order and at cr 1 we only care about groups where average
> fragment size > the request size. So, this patchset introduces new
> data structures that allow us to perform cr 0 lookup in constant time
> and cr 1 lookup in log (number of groups) time instead of linear.
> 
> For cr 0, we add a list for each order, and each group is enqueued
> to the appropriate list based on the largest free order in its buddy
> bitmap. This allows us to look up a match at cr 0 in constant time.
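
The per-order lists described above can be sketched in userspace C as
follows. This is only an illustration under stated assumptions, not the
code from the series: every name here (sketch_group, order_lists,
sketch_enqueue, sketch_cr0_lookup) is hypothetical, SKETCH_NUM_ORDERS is
an arbitrary stand-in for MB_NUM_ORDERS, and locking and group removal
are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MB_NUM_ORDERS; the value is arbitrary. */
#define SKETCH_NUM_ORDERS 14

/* Hypothetical, simplified view of a block group. */
struct sketch_group {
	unsigned int group_no;
	int largest_free_order;		/* -1 if the group has no free blocks */
	struct sketch_group *next;	/* link in the per-order list */
};

/* One singly linked list head per order. */
static struct sketch_group *order_lists[SKETCH_NUM_ORDERS];

/* Enqueue a group on the list that matches its largest free order. */
static void sketch_enqueue(struct sketch_group *grp)
{
	assert(grp->largest_free_order < SKETCH_NUM_ORDERS);
	if (grp->largest_free_order < 0)
		return;		/* nothing free, so not a cr 0 candidate */
	grp->next = order_lists[grp->largest_free_order];
	order_lists[grp->largest_free_order] = grp;
}

/*
 * Return some group whose largest free order is >= req_order, or NULL.
 * The loop visits at most SKETCH_NUM_ORDERS list heads, so the cost does
 * not grow with the number of groups.
 */
static struct sketch_group *sketch_cr0_lookup(int req_order)
{
	assert(req_order >= 0);
	for (int order = req_order; order < SKETCH_NUM_ORDERS; order++) {
		if (order_lists[order])
			return order_lists[order];
	}
	return NULL;
}
```

The lookup is "constant time" in the sense that its bound depends only
on the number of orders, which is fixed, and not on how many groups the
file system has.
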
> 
> For cr 1, we add a new rb tree of groups sorted by largest fragment
> size. This allows us to look up a match for cr 1 in log (num groups)
> time.
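
To show why keeping groups ordered by fragment size gives a logarithmic
lookup, here is a minimal sketch that swaps the rb tree for a sorted
array with binary search. It is not the implementation from the series:
the struct and function names are hypothetical, the array stands in for
the tree only because the search cost is the same order, and treating a
fragment size equal to the request as a match is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry: one block group keyed by its fragment size. */
struct sketch_frag_entry {
	unsigned int group_no;
	unsigned int frag_size;		/* sort key, in blocks */
};

/*
 * Return the index of the first entry whose frag_size is >= req_len,
 * or n if no entry qualifies. 'sorted' must be in ascending frag_size
 * order. Each step halves the search range, so the loop runs about
 * log2(n) times.
 */
static size_t sketch_cr1_lookup(const struct sketch_frag_entry *sorted,
				size_t n, unsigned int req_len)
{
	size_t lo = 0, hi = n;

	assert(sorted != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (sorted[mid].frag_size < req_len)
			lo = mid + 1;	/* too small, search the upper half */
		else
			hi = mid;	/* candidate, keep looking leftwards */
	}
	return lo;
}
```

An rb tree gives the same search bound while also allowing a group to be
reinserted cheaply when its fragment size changes, which a sorted array
does not.
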
> 
> These optimizations can be enabled by passing the "mb_optimize_scan"
> mount option.
> 
> These changes may result in allocations being spread across the block
> device. While that would not matter for some block devices (such as
> flash), it may be a cause of concern for other block devices that
> benefit from storing related content together, such as disks. However,
> it can be argued that in a high fragmentation scenario, especially for
> large disks, it's still worth optimizing the scanning since in such
> cases, we get cpu bound on group scanning instead of getting IO bound.
> Perhaps, in the future, we could dynamically turn this new optimization
> on based on fragmentation levels for such devices.
> 
> Verified that there are no regressions in smoke tests (-g quick -c 4k).
> 
> Also, to demonstrate the effectiveness of the patch series, the
> following experiment was performed:
> 
> Created a highly fragmented disk of size 65TB. The disk had no
> contiguous 2M regions. The following command was run 3 times
> consecutively:
> 
> time dd if=/dev/urandom of=file bs=2M count=10
> 
> Here are the results with and without cr 0/1 optimizations:
> 
> |---------+------------------------------+---------------------------|
> |         | Without CR 0/1 Optimizations | With CR 0/1 Optimizations |
> |---------+------------------------------+---------------------------|
> | 1st run | 5m1.871s                     | 2m47.642s                 |
> | 2nd run | 2m28.390s                    | 0m0.611s                  |
> | 3rd run | 2m26.530s                    | 0m1.255s                  |
> |---------+------------------------------+---------------------------|
> 
> The patch [3/7] "ext4: add mballoc stats proc file" is a modified
> version of the patch originally written by Artem Blagodarenko
> (artem.blagodarenko@gmail.com). With that patch, I ran the following
> command with and without optimizations.
> 
> dd if=/dev/zero of=/mnt/file bs=2M count=2 conv=fsync
> 
> Without optimizations:
> 
> useless_c0_loops: 3
> useless_c1_loops: 39
> useless_c2_loops: 0
> useless_c3_loops: 0
> 
> With optimizations:
> 
> useless_c0_loops: 0
> useless_c1_loops: 0
> useless_c2_loops: 0
> useless_c3_loops: 0
> 
> This shows that the CR0 and CR1 optimizations get rid of useless CR0
> and CR1 loops altogether, thereby significantly reducing the number of
> groups that get considered.
> 
> Changes from V5:
> ----------------
> - Turned block bitmap prefetching on by default
> - Fixed a bug where for cr >= 2, we were skipping first group without
>   searching in it
> - Renamed mb_linear_limit to mb_max_linear_groups
> 
> Harshad Shirwadkar (7):
>   ext4: drop s_mb_bal_lock and convert protected fields to atomic
>   ext4: add ability to return parsed options from parse_options
>   ext4: add mballoc stats proc file
>   ext4: add MB_NUM_ORDERS macro
>   ext4: improve cr 0 / cr 1 group scanning
>   ext4: add proc files to monitor new structures
>   ext4: make prefetch_block_bitmaps default
> 
>  fs/ext4/ext4.h    |  34 ++-
>  fs/ext4/mballoc.c | 590 +++++++++++++++++++++++++++++++++++++++++++---
>  fs/ext4/mballoc.h |  22 +-
>  fs/ext4/super.c   |  92 +++++---
>  fs/ext4/sysfs.c   |   6 +
>  5 files changed, 680 insertions(+), 64 deletions(-)
> 
> -- 
> 2.31.0.291.g576ba9dcdaf-goog
> 
