linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/22] btrfs: async discard support
@ 2019-11-25 19:46 Dennis Zhou
  2019-11-25 19:46 ` [PATCH 01/22] bitmap: genericize percpu bitmap region iterators Dennis Zhou
                   ` (22 more replies)
  0 siblings, 23 replies; 30+ messages in thread
From: Dennis Zhou @ 2019-11-25 19:46 UTC (permalink / raw)
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Hello,

v3 [1] had 2 minor issues which are fixed:
 - I was generically dividing u64 which made 32 bit arches unhappy. [2]
 - Uninitialized use of trim_state local variable [3]

I've gone through and fixed the apparent u64 divisions which passes the
make.cross build provided on the end of the series.

This is based on btrfs-devel#misc-next e88411ecf657.

[1] https://lore.kernel.org/linux-btrfs/cover.1574282259.git.dennis@kernel.org/
[2] https://lore.kernel.org/linux-btrfs/201911230623.j5g9qeNZ%25lkp@intel.com/
[3] https://lore.kernel.org/linux-btrfs/CAKwvOdn5j37AYzmoOsaSqyYdBkjqevbTrSyGQypB+G_NgxX0fQ@mail.gmail.com/

diffstats below:

Dennis Zhou (22):
  bitmap: genericize percpu bitmap region iterators
  btrfs: rename DISCARD opt to DISCARD_SYNC
  btrfs: keep track of which extents have been discarded
  btrfs: keep track of cleanliness of the bitmap
  btrfs: add the beginning of async discard, discard workqueue
  btrfs: handle empty block_group removal
  btrfs: discard one region at a time in async discard
  btrfs: add removal calls for sysfs debug/
  btrfs: make UUID/debug have its own kobject
  btrfs: add discard sysfs directory
  btrfs: track discardable extents for async discard
  btrfs: keep track of discardable_bytes
  btrfs: calculate discard delay based on number of extents
  btrfs: add bps discard rate limit
  btrfs: limit max discard size for async discard
  btrfs: make max async discard size tunable
  btrfs: have multiple discard lists
  btrfs: only keep track of data extents for async discard
  btrfs: keep track of discard reuse stats
  btrfs: add async discard header
  btrfs: increase the metadata allowance for the free_space_cache
  btrfs: make smaller extents more likely to go into bitmaps

 fs/btrfs/Makefile           |   2 +-
 fs/btrfs/block-group.c      |  56 ++-
 fs/btrfs/block-group.h      |  30 ++
 fs/btrfs/ctree.h            |  52 ++-
 fs/btrfs/discard.c          | 680 ++++++++++++++++++++++++++++++++++++
 fs/btrfs/discard.h          |  46 +++
 fs/btrfs/disk-io.c          |  15 +-
 fs/btrfs/extent-tree.c      |  23 +-
 fs/btrfs/free-space-cache.c | 587 ++++++++++++++++++++++++++-----
 fs/btrfs/free-space-cache.h |  39 ++-
 fs/btrfs/inode-map.c        |  13 +-
 fs/btrfs/scrub.c            |   7 +-
 fs/btrfs/super.c            |  39 ++-
 fs/btrfs/sysfs.c            | 205 ++++++++++-
 include/linux/bitmap.h      |  35 ++
 mm/percpu.c                 |  61 +---
 16 files changed, 1737 insertions(+), 153 deletions(-)
 create mode 100644 fs/btrfs/discard.c
 create mode 100644 fs/btrfs/discard.h

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 30+ messages in thread
* [PATCH v6 00/22] btrfs: async discard support
@ 2019-12-14  0:22 Dennis Zhou
  2019-12-14  0:22 ` [PATCH 17/22] btrfs: have multiple discard lists Dennis Zhou
  0 siblings, 1 reply; 30+ messages in thread
From: Dennis Zhou @ 2019-12-14  0:22 UTC (permalink / raw)
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Hello,

Dave reported a lockdep issue [1]. I'm a bit surprised as I can't repro
it, but it obviously is right. I believe I fixed the issue by moving the
fully trimmed check outside of the block_group lock.  I mistakingly
thought the btrfs_block_group lock subsumed btrfs_free_space_ctl
tree_lock. This clearly isn't the case.

Changes in v6:
 - Move the fully trimmed check outside of the block_group lock.

v5 is available here: [2].

This series is on top of btrfs-devel#misc-next 7ee98bb808e2 + [3] and
[4].

[1] https://lore.kernel.org/linux-btrfs/20191210140438.GU2734@twin.jikos.cz/
[2] https://lore.kernel.org/linux-btrfs/cover.1575919745.git.dennis@kernel.org/
[3] https://lore.kernel.org/linux-btrfs/d934383ea528d920a95b6107daad6023b516f0f4.1576109087.git.dennis@kernel.org/
[4] https://lore.kernel.org/linux-btrfs/20191209193846.18162-1-dennis@kernel.org/

Dennis Zhou (22):
  bitmap: genericize percpu bitmap region iterators
  btrfs: rename DISCARD opt to DISCARD_SYNC
  btrfs: keep track of which extents have been discarded
  btrfs: keep track of cleanliness of the bitmap
  btrfs: add the beginning of async discard, discard workqueue
  btrfs: handle empty block_group removal
  btrfs: discard one region at a time in async discard
  btrfs: add removal calls for sysfs debug/
  btrfs: make UUID/debug have its own kobject
  btrfs: add discard sysfs directory
  btrfs: track discardable extents for async discard
  btrfs: keep track of discardable_bytes
  btrfs: calculate discard delay based on number of extents
  btrfs: add bps discard rate limit
  btrfs: limit max discard size for async discard
  btrfs: make max async discard size tunable
  btrfs: have multiple discard lists
  btrfs: only keep track of data extents for async discard
  btrfs: keep track of discard reuse stats
  btrfs: add async discard header
  btrfs: increase the metadata allowance for the free_space_cache
  btrfs: make smaller extents more likely to go into bitmaps

 fs/btrfs/Makefile           |   2 +-
 fs/btrfs/block-group.c      |  87 ++++-
 fs/btrfs/block-group.h      |  30 ++
 fs/btrfs/ctree.h            |  52 ++-
 fs/btrfs/discard.c          | 684 ++++++++++++++++++++++++++++++++++++
 fs/btrfs/discard.h          |  42 +++
 fs/btrfs/disk-io.c          |  15 +-
 fs/btrfs/extent-tree.c      |   8 +-
 fs/btrfs/free-space-cache.c | 611 +++++++++++++++++++++++++++-----
 fs/btrfs/free-space-cache.h |  41 ++-
 fs/btrfs/inode-map.c        |  13 +-
 fs/btrfs/inode.c            |   2 +-
 fs/btrfs/scrub.c            |   7 +-
 fs/btrfs/super.c            |  39 +-
 fs/btrfs/sysfs.c            | 205 ++++++++++-
 fs/btrfs/volumes.c          |   7 +
 include/linux/bitmap.h      |  35 ++
 mm/percpu.c                 |  61 +---
 18 files changed, 1789 insertions(+), 152 deletions(-)
 create mode 100644 fs/btrfs/discard.c
 create mode 100644 fs/btrfs/discard.h

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 30+ messages in thread
* [PATCH v5 00/22] btrfs: async discard support
@ 2019-12-09 19:45 Dennis Zhou
  2019-12-09 19:46 ` [PATCH 17/22] btrfs: have multiple discard lists Dennis Zhou
  0 siblings, 1 reply; 30+ messages in thread
From: Dennis Zhou @ 2019-12-09 19:45 UTC (permalink / raw)
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Hello,

Dave reported that with async discard enabled, relocation fails [1].
This could be caused by two things. First, if we unpin extents, that
means we haven't fully discarded the block group and need to let async
discard revisit it. Second, relocation removes block_groups outside of
the normal. I fixed both issues and now it successfully passes xfstests
btrfs/003.

Changes in v5:
 - Changed the rules so free space is always added as the right type
   based on discard settings (see btrfs_add_free_space()), this removes
   the need to pass around trim_state in unpin_extent_range().
 - Handled relocation block group deletion (xfstests btrfs/003)
 - When adding to the discard lists, make sure the work queue is active.
   (made all additions go through either btrfs_discard_queue_work() or
   btrfs_discard_check_filter()).
 - Added 10 sec reuse timeout for fully empty block groups.

v4 is available here: [2].

This series is on top of btrfs-devel#misc-next fbc0468e42d2 + [3].

[1] https://lore.kernel.org/linux-btrfs/20191126215204.GP2734@twin.jikos.cz/
[2] https://lore.kernel.org/linux-btrfs/cover.1574709825.git.dennis@kernel.org/
[3] https://lore.kernel.org/linux-btrfs/20191209193846.18162-1-dennis@kernel.org/

Dennis Zhou (22):
  bitmap: genericize percpu bitmap region iterators
  btrfs: rename DISCARD opt to DISCARD_SYNC
  btrfs: keep track of which extents have been discarded
  btrfs: keep track of cleanliness of the bitmap
  btrfs: add the beginning of async discard, discard workqueue
  btrfs: handle empty block_group removal
  btrfs: discard one region at a time in async discard
  btrfs: add removal calls for sysfs debug/
  btrfs: make UUID/debug have its own kobject
  btrfs: add discard sysfs directory
  btrfs: track discardable extents for async discard
  btrfs: keep track of discardable_bytes
  btrfs: calculate discard delay based on number of extents
  btrfs: add bps discard rate limit
  btrfs: limit max discard size for async discard
  btrfs: make max async discard size tunable
  btrfs: have multiple discard lists
  btrfs: only keep track of data extents for async discard
  btrfs: keep track of discard reuse stats
  btrfs: add async discard header
  btrfs: increase the metadata allowance for the free_space_cache
  btrfs: make smaller extents more likely to go into bitmaps

 fs/btrfs/Makefile           |   2 +-
 fs/btrfs/block-group.c      |  88 ++++-
 fs/btrfs/block-group.h      |  30 ++
 fs/btrfs/ctree.h            |  52 ++-
 fs/btrfs/discard.c          | 684 ++++++++++++++++++++++++++++++++++++
 fs/btrfs/discard.h          |  42 +++
 fs/btrfs/disk-io.c          |  15 +-
 fs/btrfs/extent-tree.c      |   8 +-
 fs/btrfs/free-space-cache.c | 611 +++++++++++++++++++++++++++-----
 fs/btrfs/free-space-cache.h |  41 ++-
 fs/btrfs/inode-map.c        |  13 +-
 fs/btrfs/inode.c            |   2 +-
 fs/btrfs/scrub.c            |   7 +-
 fs/btrfs/super.c            |  39 +-
 fs/btrfs/sysfs.c            | 205 ++++++++++-
 fs/btrfs/volumes.c          |   7 +
 include/linux/bitmap.h      |  35 ++
 mm/percpu.c                 |  61 +---
 18 files changed, 1789 insertions(+), 153 deletions(-)
 create mode 100644 fs/btrfs/discard.c
 create mode 100644 fs/btrfs/discard.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 30+ messages in thread
* [PATCH v3 00/22] btrfs: async discard support
@ 2019-11-20 21:50 Dennis Zhou
  2019-11-20 21:51 ` [PATCH 17/22] btrfs: have multiple discard lists Dennis Zhou
  0 siblings, 1 reply; 30+ messages in thread
From: Dennis Zhou @ 2019-11-20 21:50 UTC (permalink / raw)
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Hello,

There were only a few minor things that changed from v2 to v3 in
addition to the rebase on top of misc-next.

The notable changes:
 - Non-async discarding continues to retrim regions because we load the
   block groups as fully trimmed. This gives us a way to reconcile the
   filesystem state on device after reboot with fstrim if deisred.
 - Moved BTRFS_TRIM_STATE_UNTRIMMED to be 0 in the enum.
 - Kept discard sysfs under debug/.
 - Rebased on top of btrfs-devel#misc-next fa17ed069c61. Renamed
   instances of cache -> block_group to adjust to the rename of
   btrfs_block_group_cache -> btrfs_block_group.

---

The RFC [1] went a lot better than expected, so let's call this v2.

Changes:
 - I don't believe there are any functional changes to the code, but
   there are plenty of changes for readability and a lot of comments
   added.
 - Split out the sysfs parts into their own patches as it was a bit of
   work to put discard/ under debug/.
 - Switched over any flags to be enums. All the variables are now of
   appropriate type rather than being all atomics.
 - Dropped 0015 [2] from v1, load block groups on mount. This is in
   favor of reading everything in as clean and running fstrim as needed.
 - Misc. bug fixes.
 - I changed the block_group discard delay from 300s to 120s. It is just
   to give the allocators a chance to reuse the LBA before letting async
   discard take a crack at it.

Data:
On a number of webservers, I collected data every minute accounting the
time we spent in btrfs_finish_extent_commit() (col. 1) and in
btrfs_commit_transaction() (col. 2). btrfs_finish_extent_commit() is
where we discard extents synchronously before returning them to the
free space cache.

discard=sync:
                 p99 total per minute       p99 total per minute
      Drive   |   extent_commit() (ms)  |    commit_trans() (ms)
    ---------------------------------------------------------------
     Drive A  |           434           |          1170 
     Drive B  |           880           |          2330
     Drive C  |          2943           |          3920
     Drive D  |          4763           |          5701

discard=async:
                 p99 total per minute       p99 total per minute
      Drive   |   extent_commit() (ms)  |    commit_trans() (ms)
    --------------------------------------------------------------
     Drive A  |           134           |           956
     Drive B  |            64           |          1972
     Drive C  |            59           |          1032
     Drive D  |            62           |          1200

While it's not great that the stats are cumulative over 1m, all of these
servers are running the same workload and and the delta between the two
are substantial. We are spending significantly less time in
btrfs_finish_extent_commit() which is responsible for discarding.

From v1:
--------
Discard is an operation that allows for the filesystem to communicate
with underlying ssds that a lba region is no longer needed. This gives
the drive the more information as it tries to manage the available free
space to minimize write amplification. However, discard hasn't been
given the most tlc. Discard is a problematic command because a drive's
physical block management is more or less a black box to us and the
effects of any particular discard aren't necessarily limited the
lifetime of a command.

Currently, btrfs handles discarding synchronously during transaction
commit. This problematically can delay transaction commit based on the
amount of space that needs to be trimmed and the efficacy of the discard
operation for a particular drive.

This series introduces async discarding, which removes discard from the
transaction commit path. While every SSD has the choice of implementing
trim support different, we strive here to do the right thing. The idea
hinges on recognizing that write amplification really only kicks in once
we're really low on free space.  As long as we trim enough to keep a
large enough pool of free space, in theory this should minimize the cost
of issuing discards on a workload and have limited cost overhead in
write amplification.

With async discard, we try to emphasize discarding larger regions
and reusing the lba (implicit discard). The first is done by using the
free space cache to maintain discard state and thus allows us to get
coalescing for fairly cheap. A background workqueue is used to scan over
an LRU kept list of the block groups. It then uses filters to determine
what to discard next hence giving priority to larger discards. While
reusing an lba isn't explicitly attempted, it happens implicitly via
find_free_extent() which if it happens to find a dirty extent, will
grant us reuse of the lba. Additionally, async discarding skips metadata
block groups as these should see a fairly high turnover as btrfs is a
self-packing filesystem being stingy with allocating new block groups
until necessary.

Preliminary results seem promising as when a lot of freeing is going on,
the discarding is delayed allowing for reuse which translates to less
discarding (in addition to the slower discarding). This has shown a
reduction in p90 and p99 read latencies on a test on our webservers.

I am currently working on tuning the rate at which it discards in the
background. I am doing this by evaluating other workloads and drives.
The iops and bps rate limits are fairly aggressive right now as my
basic survey of a few drives noted that the trim command itself is a
significant part of the overhead. So optimizing for larger trims is the
right thing to do.

This series contains the following 22 patches and is on top of
btrfs-devel#misc-next fa17ed069c61:
  0001-bitmap-genericize-percpu-bitmap-region-iterators.patch
  0002-btrfs-rename-DISCARD-opt-to-DISCARD_SYNC.patch
  0003-btrfs-keep-track-of-which-extents-have-been-discarde.patch
  0004-btrfs-keep-track-of-cleanliness-of-the-bitmap.patch
  0005-btrfs-add-the-beginning-of-async-discard-discard-wor.patch
  0006-btrfs-handle-empty-block_group-removal.patch
  0007-btrfs-discard-one-region-at-a-time-in-async-discard.patch
  0008-btrfs-add-removal-calls-for-sysfs-debug.patch
  0009-btrfs-make-UUID-debug-have-its-own-kobject.patch
  0010-btrfs-add-discard-sysfs-directory.patch
  0011-btrfs-track-discardable-extents-for-async-discard.patch
  0012-btrfs-keep-track-of-discardable_bytes.patch
  0013-btrfs-calculate-discard-delay-based-on-number-of-ext.patch
  0014-btrfs-add-bps-discard-rate-limit.patch
  0015-btrfs-limit-max-discard-size-for-async-discard.patch
  0016-btrfs-make-max-async-discard-size-tunable.patch
  0017-btrfs-have-multiple-discard-lists.patch
  0018-btrfs-only-keep-track-of-data-extents-for-async-disc.patch
  0019-btrfs-keep-track-of-discard-reuse-stats.patch
  0020-btrfs-add-async-discard-header.patch
  0021-btrfs-increase-the-metadata-allowance-for-the-free_s.patch
  0022-btrfs-make-smaller-extents-more-likely-to-go-into-bi.patch

0001 exports percpu's bitmap iterators for eventual use in 0011. 0002
renames DISCARD to DISCARD_SYNC. 0003 and 0004 adds discard tracking to
the free space cache. 0005-0007 adds the core of async discard support.
0008-0010 modify debug sysfs to allow for adding discard/ under debug/.
0011-0016 fiddle with stats and operation limits. 0018 makes async
discarding only track data block groups. 0019 adds reuse stats. 0020
adds an explanation header to discard.c. 0021 and 0022 modify the free
space cache metadata allowance, add a bitmap -> extent path and makes us
more likely to put smaller extents into the bitmaps.

diffstats below:

Dennis Zhou (22):
  bitmap: genericize percpu bitmap region iterators
  btrfs: rename DISCARD opt to DISCARD_SYNC
  btrfs: keep track of which extents have been discarded
  btrfs: keep track of cleanliness of the bitmap
  btrfs: add the beginning of async discard, discard workqueue
  btrfs: handle empty block_group removal
  btrfs: discard one region at a time in async discard
  btrfs: add removal calls for sysfs debug/
  btrfs: make UUID/debug have its own kobject
  btrfs: add discard sysfs directory
  btrfs: track discardable extents for async discard
  btrfs: keep track of discardable_bytes
  btrfs: calculate discard delay based on number of extents
  btrfs: add bps discard rate limit
  btrfs: limit max discard size for async discard
  btrfs: make max async discard size tunable
  btrfs: have multiple discard lists
  btrfs: only keep track of data extents for async discard
  btrfs: keep track of discard reuse stats
  btrfs: add async discard header
  btrfs: increase the metadata allowance for the free_space_cache
  btrfs: make smaller extents more likely to go into bitmaps

 fs/btrfs/Makefile           |   2 +-
 fs/btrfs/block-group.c      |  56 ++-
 fs/btrfs/block-group.h      |  30 ++
 fs/btrfs/ctree.h            |  52 ++-
 fs/btrfs/discard.c          | 679 ++++++++++++++++++++++++++++++++++++
 fs/btrfs/discard.h          |  46 +++
 fs/btrfs/disk-io.c          |  15 +-
 fs/btrfs/extent-tree.c      |  23 +-
 fs/btrfs/free-space-cache.c | 585 ++++++++++++++++++++++++++-----
 fs/btrfs/free-space-cache.h |  39 ++-
 fs/btrfs/inode-map.c        |  13 +-
 fs/btrfs/scrub.c            |   7 +-
 fs/btrfs/super.c            |  39 ++-
 fs/btrfs/sysfs.c            | 205 ++++++++++-
 include/linux/bitmap.h      |  35 ++
 mm/percpu.c                 |  61 +---
 16 files changed, 1734 insertions(+), 153 deletions(-)
 create mode 100644 fs/btrfs/discard.c
 create mode 100644 fs/btrfs/discard.h

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 30+ messages in thread
* [PATCH v2 00/22] btrfs: async discard support
@ 2019-10-23 22:52 Dennis Zhou
  2019-10-23 22:53 ` [PATCH 17/22] btrfs: have multiple discard lists Dennis Zhou
  0 siblings, 1 reply; 30+ messages in thread
From: Dennis Zhou @ 2019-10-23 22:52 UTC (permalink / raw)
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Hello,

The RFC [1] went a lot better than expected, so let's call this v2.

Changes:
 - I don't believe there are any functional changes to the code, but
   there are plenty of changes for readability and a lot of comments
   added.
 - Split out the sysfs parts into their own patches as it was a bit of
   work to put discard/ under debug/.
 - Switched over any flags to be enums. All the variables are now of
   appropriate type rather than being all atomics.
 - Dropped 0015 [2] from v1, load block groups on mount. This is in
   favor of reading everything in as clean and running fstrim as needed.
 - Misc. bug fixes.
 - I changed the block_group discard delay from 300s to 120s. It is just
   to give the allocators a chance to reuse the LBA before letting async
   discard take a crack at it.

Data:
On a number of webservers, I collected data every minute accounting the
time we spent in btrfs_finish_extent_commit() (col. 1) and in
btrfs_commit_transaction() (col. 2). btrfs_finish_extent_commit() is
where we discard extents synchronously before returning them to the
free space cache.

discard=sync:
                 p99 total per minute       p99 total per minute
      Drive   |   extent_commit() (ms)  |    commit_trans() (ms)
    ---------------------------------------------------------------
     Drive A  |           434           |          1170 
     Drive B  |           880           |          2330
     Drive C  |          2943           |          3920
     Drive D  |          4763           |          5701

discard=async:
                 p99 total per minute       p99 total per minute
      Drive   |   extent_commit() (ms)  |    commit_trans() (ms)
    --------------------------------------------------------------
     Drive A  |           134           |           956
     Drive B  |            64           |          1972
     Drive C  |            59           |          1032
     Drive D  |            62           |          1200

While it's not great that the stats are cumulative over 1m, all of these
servers are running the same workload and and the delta between the two
are substantial. We are spending significantly less time in
btrfs_finish_extent_commit() which is responsible for discarding.

From v1:
--------
Discard is an operation that allows for the filesystem to communicate
with underlying ssds that a lba region is no longer needed. This gives
the drive the more information as it tries to manage the available free
space to minimize write amplification. However, discard hasn't been
given the most tlc. Discard is a problematic command because a drive's
physical block management is more or less a black box to us and the
effects of any particular discard aren't necessarily limited the
lifetime of a command.

Currently, btrfs handles discarding synchronously during transaction
commit. This problematically can delay transaction commit based on the
amount of space that needs to be trimmed and the efficacy of the discard
operation for a particular drive.

This series introduces async discarding, which removes discard from the
transaction commit path. While every SSD has the choice of implementing
trim support different, we strive here to do the right thing. The idea
hinges on recognizing that write amplification really only kicks in once
we're really low on free space.  As long as we trim enough to keep a
large enough pool of free space, in theory this should minimize the cost
of issuing discards on a workload and have limited cost overhead in
write amplification.

With async discard, we try to emphasize discarding larger regions
and reusing the lba (implicit discard). The first is done by using the
free space cache to maintain discard state and thus allows us to get
coalescing for fairly cheap. A background workqueue is used to scan over
an LRU kept list of the block groups. It then uses filters to determine
what to discard next hence giving priority to larger discards. While
reusing an lba isn't explicitly attempted, it happens implicitly via
find_free_extent() which if it happens to find a dirty extent, will
grant us reuse of the lba. Additionally, async discarding skips metadata
block groups as these should see a fairly high turnover as btrfs is a
self-packing filesystem being stingy with allocating new block groups
until necessary.

Preliminary results seem promising as when a lot of freeing is going on,
the discarding is delayed allowing for reuse which translates to less
discarding (in addition to the slower discarding). This has shown a
reduction in p90 and p99 read latencies on a test on our webservers.

I am currently working on tuning the rate at which it discards in the
background. I am doing this by evaluating other workloads and drives.
The iops and bps rate limits are fairly aggressive right now as my
basic survey of a few drives noted that the trim command itself is a
significant part of the overhead. So optimizing for larger trims is the
right thing to do.

This series contains the following 22 patches and is on top of
btrfs-devel#master 3b7c59a1950c:
  0001-bitmap-genericize-percpu-bitmap-region-iterators.patch
  0002-btrfs-rename-DISCARD-opt-to-DISCARD_SYNC.patch
  0003-btrfs-keep-track-of-which-extents-have-been-discarde.patch
  0004-btrfs-keep-track-of-cleanliness-of-the-bitmap.patch
  0005-btrfs-add-the-beginning-of-async-discard-discard-wor.patch
  0006-btrfs-handle-empty-block_group-removal.patch
  0007-btrfs-discard-one-region-at-a-time-in-async-discard.patch
  0008-btrfs-add-removal-calls-for-sysfs-debug.patch
  0009-btrfs-make-UUID-debug-have-its-own-kobject.patch
  0010-btrfs-add-discard-sysfs-directory.patch
  0011-btrfs-track-discardable-extents-for-async-discard.patch
  0012-btrfs-keep-track-of-discardable_bytes.patch
  0013-btrfs-calculate-discard-delay-based-on-number-of-ext.patch
  0014-btrfs-add-bps-discard-rate-limit.patch
  0015-btrfs-limit-max-discard-size-for-async-discard.patch
  0016-btrfs-make-max-async-discard-size-tunable.patch
  0017-btrfs-have-multiple-discard-lists.patch
  0018-btrfs-only-keep-track-of-data-extents-for-async-disc.patch
  0019-btrfs-keep-track-of-discard-reuse-stats.patch
  0020-btrfs-add-async-discard-header.patch
  0021-btrfs-increase-the-metadata-allowance-for-the-free_s.patch
  0022-btrfs-make-smaller-extents-more-likely-to-go-into-bi.patch

0001 exports percpu's bitmap iterators for eventual use in 0011. 0002
renames DISCARD to DISCARD_SYNC. 0003 and 0004 adds discard tracking to
the free space cache. 0005-0007 adds the core of async discard support.
0008-0010 modify debug sysfs to allow for adding discard/ under debug/.
0011-0016 fiddle with stats and operation limits. 0018 makes async
discarding only track data block groups. 0019 adds reuse stats. 0020
adds an explanation header to discard.c. 0021 and 0022 modify the free
space cache metadata allowance, add a bitmap -> extent path and makes us
more likely to put smaller extents into the bitmaps.

diffstats below:

Dennis Zhou (22):
  bitmap: genericize percpu bitmap region iterators
  btrfs: rename DISCARD opt to DISCARD_SYNC
  btrfs: keep track of which extents have been discarded
  btrfs: keep track of cleanliness of the bitmap
  btrfs: add the beginning of async discard, discard workqueue
  btrfs: handle empty block_group removal
  btrfs: discard one region at a time in async discard
  btrfs: add removal calls for sysfs debug/
  btrfs: make UUID/debug have its own kobject
  btrfs: add discard sysfs directory
  btrfs: track discardable extents for async discard
  btrfs: keep track of discardable_bytes
  btrfs: calculate discard delay based on number of extents
  btrfs: add bps discard rate limit
  btrfs: limit max discard size for async discard
  btrfs: make max async discard size tunable
  btrfs: have multiple discard lists
  btrfs: only keep track of data extents for async discard
  btrfs: keep track of discard reuse stats
  btrfs: add async discard header
  btrfs: increase the metadata allowance for the free_space_cache
  btrfs: make smaller extents more likely to go into bitmaps

 fs/btrfs/Makefile           |   2 +-
 fs/btrfs/block-group.c      |  56 ++-
 fs/btrfs/block-group.h      |  30 ++
 fs/btrfs/ctree.h            |  52 ++-
 fs/btrfs/discard.c          | 666 ++++++++++++++++++++++++++++++++++++
 fs/btrfs/discard.h          |  48 +++
 fs/btrfs/disk-io.c          |  15 +-
 fs/btrfs/extent-tree.c      |  23 +-
 fs/btrfs/free-space-cache.c | 587 ++++++++++++++++++++++++++-----
 fs/btrfs/free-space-cache.h |  39 ++-
 fs/btrfs/inode-map.c        |  13 +-
 fs/btrfs/scrub.c            |   7 +-
 fs/btrfs/super.c            |  39 ++-
 fs/btrfs/sysfs.c            | 203 ++++++++++-
 include/linux/bitmap.h      |  35 ++
 mm/percpu.c                 |  61 +---
 16 files changed, 1723 insertions(+), 153 deletions(-)
 create mode 100644 fs/btrfs/discard.c
 create mode 100644 fs/btrfs/discard.h

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-12-14  0:23 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-25 19:46 [PATCH v4 00/22] btrfs: async discard support Dennis Zhou
2019-11-25 19:46 ` [PATCH 01/22] bitmap: genericize percpu bitmap region iterators Dennis Zhou
2019-11-25 19:46 ` [PATCH 02/22] btrfs: rename DISCARD opt to DISCARD_SYNC Dennis Zhou
2019-11-25 19:46 ` [PATCH 03/22] btrfs: keep track of which extents have been discarded Dennis Zhou
2019-11-25 19:46 ` [PATCH 04/22] btrfs: keep track of cleanliness of the bitmap Dennis Zhou
2019-11-25 19:46 ` [PATCH 05/22] btrfs: add the beginning of async discard, discard workqueue Dennis Zhou
2019-11-25 19:46 ` [PATCH 06/22] btrfs: handle empty block_group removal Dennis Zhou
2019-11-25 19:46 ` [PATCH 07/22] btrfs: discard one region at a time in async discard Dennis Zhou
2019-11-25 19:46 ` [PATCH 08/22] btrfs: add removal calls for sysfs debug/ Dennis Zhou
2019-11-25 19:46 ` [PATCH 09/22] btrfs: make UUID/debug have its own kobject Dennis Zhou
2019-11-25 19:46 ` [PATCH 10/22] btrfs: add discard sysfs directory Dennis Zhou
2019-11-25 19:46 ` [PATCH 11/22] btrfs: track discardable extents for async discard Dennis Zhou
2019-11-25 19:46 ` [PATCH 12/22] btrfs: keep track of discardable_bytes Dennis Zhou
2019-11-25 19:46 ` [PATCH 13/22] btrfs: calculate discard delay based on number of extents Dennis Zhou
2019-11-25 19:46 ` [PATCH 14/22] btrfs: add bps discard rate limit Dennis Zhou
2019-11-25 19:46 ` [PATCH 15/22] btrfs: limit max discard size for async discard Dennis Zhou
2019-11-25 19:46 ` [PATCH 16/22] btrfs: make max async discard size tunable Dennis Zhou
2019-11-25 19:46 ` [PATCH 17/22] btrfs: have multiple discard lists Dennis Zhou
2019-11-25 19:46 ` [PATCH 18/22] btrfs: only keep track of data extents for async discard Dennis Zhou
2019-11-25 19:46 ` [PATCH 19/22] btrfs: keep track of discard reuse stats Dennis Zhou
2019-11-25 19:47 ` [PATCH 20/22] btrfs: add async discard header Dennis Zhou
2019-11-25 19:47 ` [PATCH 21/22] btrfs: increase the metadata allowance for the free_space_cache Dennis Zhou
2019-11-25 19:47 ` [PATCH 22/22] btrfs: make smaller extents more likely to go into bitmaps Dennis Zhou
2019-11-26 21:52 ` [PATCH v4 00/22] btrfs: async discard support David Sterba
2019-11-27 18:26   ` Dennis Zhou
  -- strict thread matches above, loose matches on Subject: below --
2019-12-14  0:22 [PATCH v6 " Dennis Zhou
2019-12-14  0:22 ` [PATCH 17/22] btrfs: have multiple discard lists Dennis Zhou
2019-12-09 19:45 [PATCH v5 00/22] btrfs: async discard support Dennis Zhou
2019-12-09 19:46 ` [PATCH 17/22] btrfs: have multiple discard lists Dennis Zhou
2019-11-20 21:50 [PATCH v3 00/22] btrfs: async discard support Dennis Zhou
2019-11-20 21:51 ` [PATCH 17/22] btrfs: have multiple discard lists Dennis Zhou
2019-10-23 22:52 [PATCH v2 00/22] btrfs: async discard support Dennis Zhou
2019-10-23 22:53 ` [PATCH 17/22] btrfs: have multiple discard lists Dennis Zhou
2019-11-08 19:44   ` Josef Bacik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).