* [PATCH 00/12] btrfs: async discard follow up
From: Dennis Zhou @ 2020-01-02 21:26 UTC
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Hello,

Dave applied 1-12 from v6 [1]. This is a follow-up cleaning up the
remaining 10 patches and adding 2 more to deal with a rare -1 [2] that
I haven't quite figured out how to reproduce. This is also available
at [3].

This series is on top of btrfs-devel#misc-next-with-discard-v6 0c7be920bd7d.

[1] https://lore.kernel.org/linux-btrfs/cover.1576195673.git.dennis@kernel.org/
[2] https://lore.kernel.org/linux-btrfs/20191217145541.GE3929@suse.cz/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/dennis/misc.git/log/?h=async-discard

Dennis Zhou (12):
  btrfs: calculate discard delay based on number of extents
  btrfs: add bps discard rate limit for async discard
  btrfs: limit max discard size for async discard
  btrfs: make max async discard size tunable
  btrfs: have multiple discard lists
  btrfs: only keep track of data extents for async discard
  btrfs: keep track of discard reuse stats
  btrfs: add async discard header
  btrfs: increase the metadata allowance for the free_space_cache
  btrfs: make smaller extents more likely to go into bitmaps
  btrfs: ensure removal of discardable_* in free_bitmap()
  btrfs: add correction to handle -1 edge case in async discard

 fs/btrfs/block-group.h      |   7 +
 fs/btrfs/ctree.h            |  10 +-
 fs/btrfs/discard.c          | 258 +++++++++++++++++++++++++++++++++---
 fs/btrfs/discard.h          |  12 ++
 fs/btrfs/extent-tree.c      |   4 +-
 fs/btrfs/free-space-cache.c | 154 +++++++++++++++------
 fs/btrfs/free-space-cache.h |   2 +-
 fs/btrfs/sysfs.c            | 129 ++++++++++++++++++
 8 files changed, 519 insertions(+), 57 deletions(-)

Thanks,
Dennis


* [PATCH 01/12] btrfs: calculate discard delay based on number of extents
From: Dennis Zhou @ 2020-01-02 21:26 UTC
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

An earlier patch keeps track of discardable_extents. These are
undiscarded extents managed by the free space cache. Here, we will use
this to dynamically calculate the discard delay interval.

There are 3 rates to consider. The first is the target convergence
rate, the rate to discard all discardable_extents over the
BTRFS_DISCARD_TARGET_MSEC time frame. This is clamped by the lower
limit, the iops limit or BTRFS_DISCARD_MIN_DELAY_MSEC (1ms), and the
upper limit, BTRFS_DISCARD_MAX_DELAY_MSEC (1s). We reevaluate this
delay every transaction commit.
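
For example, with the constants introduced here and integer math as in
the code (extent count picked for illustration):

  BTRFS_DISCARD_TARGET_MSEC = 6 * 60 * 60 * 1000 = 21,600,000 ms
  discardable_extents = 1,000,000  ->  21,600,000 / 1,000,000 = 21 ms
  iops_limit = 10 (default)        ->  lower limit = max(1, 1000 / 10) = 100 ms
  clamp(21, 100, 1000)             ->  delay = 100 ms, i.e. ~10 discards/sec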

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h       |  2 ++
 fs/btrfs/discard.c     | 55 +++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/discard.h     |  1 +
 fs/btrfs/extent-tree.c |  4 ++-
 fs/btrfs/sysfs.c       | 31 ++++++++++++++++++++++++
 5 files changed, 88 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7c1c236d13ae..c73bbc7e4491 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -468,6 +468,8 @@ struct btrfs_discard_ctl {
 	struct list_head discard_list[BTRFS_NR_DISCARD_LISTS];
 	atomic_t discardable_extents;
 	atomic64_t discardable_bytes;
+	unsigned long delay;
+	unsigned iops_limit;
 };
 
 /* delayed seq elem */
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 173770bf8a2d..abcc3b2189d1 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -15,6 +15,12 @@
 #define BTRFS_DISCARD_DELAY		(120ULL * NSEC_PER_SEC)
 #define BTRFS_DISCARD_UNUSED_DELAY	(10ULL * NSEC_PER_SEC)
 
+/* Target completion latency of discarding all discardable extents */
+#define BTRFS_DISCARD_TARGET_MSEC	(6 * 60 * 60UL * MSEC_PER_SEC)
+#define BTRFS_DISCARD_MIN_DELAY_MSEC	(1UL)
+#define BTRFS_DISCARD_MAX_DELAY_MSEC	(1000UL)
+#define BTRFS_DISCARD_MAX_IOPS		(10U)
+
 static struct list_head *get_discard_list(struct btrfs_discard_ctl *discard_ctl,
 					  struct btrfs_block_group *block_group)
 {
@@ -235,11 +241,18 @@ void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl,
 
 	block_group = find_next_block_group(discard_ctl, now);
 	if (block_group) {
-		u64 delay = 0;
+		unsigned long delay = discard_ctl->delay;
+
+		/*
+		 * This timeout is to hopefully prevent immediate discarding
+		 * in a recently allocated block group.
+		 */
+		if (now < block_group->discard_eligible_time) {
+			u64 bg_timeout = (block_group->discard_eligible_time -
+					  now);
 
-		if (now < block_group->discard_eligible_time)
-			delay = nsecs_to_jiffies(
-				block_group->discard_eligible_time - now);
+			delay = max(delay, nsecs_to_jiffies(bg_timeout));
+		}
 
 		mod_delayed_work(discard_ctl->discard_workers,
 				 &discard_ctl->work, delay);
@@ -342,6 +355,38 @@ bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl)
 		test_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags));
 }
 
+/**
+ * btrfs_discard_calc_delay - recalculate the base delay
+ * @discard_ctl: discard control
+ *
+ * Recalculate the base delay, which is based on the total number of
+ * discardable_extents.  Clamp this between the lower_limit (iops_limit or 1ms)
+ * and the upper_limit (BTRFS_DISCARD_MAX_DELAY_MSEC).
+ */
+void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl)
+{
+	s32 discardable_extents =
+		atomic_read(&discard_ctl->discardable_extents);
+	unsigned iops_limit;
+	unsigned long delay, lower_limit = BTRFS_DISCARD_MIN_DELAY_MSEC;
+
+	if (!discardable_extents)
+		return;
+
+	spin_lock(&discard_ctl->lock);
+
+	iops_limit = READ_ONCE(discard_ctl->iops_limit);
+	if (iops_limit)
+		lower_limit = max_t(unsigned long, lower_limit,
+				    MSEC_PER_SEC / iops_limit);
+
+	delay = BTRFS_DISCARD_TARGET_MSEC / discardable_extents;
+	delay = clamp(delay, lower_limit, BTRFS_DISCARD_MAX_DELAY_MSEC);
+	discard_ctl->delay = msecs_to_jiffies(delay);
+
+	spin_unlock(&discard_ctl->lock);
+}
+
 /**
  * btrfs_discard_update_discardable - propagate discard counters
  * @block_group: block_group of interest
@@ -464,6 +509,8 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info)
 
 	atomic_set(&discard_ctl->discardable_extents, 0);
 	atomic64_set(&discard_ctl->discardable_bytes, 0);
+	discard_ctl->delay = BTRFS_DISCARD_MAX_DELAY_MSEC;
+	discard_ctl->iops_limit = BTRFS_DISCARD_MAX_IOPS;
 }
 
 void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h
index 0f2f89b1b0b9..5250fe178e49 100644
--- a/fs/btrfs/discard.h
+++ b/fs/btrfs/discard.h
@@ -17,6 +17,7 @@ void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl,
 bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl);
 
 /* Update operations */
+void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl);
 void btrfs_discard_update_discardable(struct btrfs_block_group *block_group,
 				      struct btrfs_free_space_ctl *ctl);
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2c12366cfde5..0163fdd59f8f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2935,8 +2935,10 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 		cond_resched();
 	}
 
-	if (btrfs_test_opt(fs_info, DISCARD_ASYNC))
+	if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) {
+		btrfs_discard_calc_delay(&fs_info->discard_ctl);
 		btrfs_discard_schedule_work(&fs_info->discard_ctl, true);
+	}
 
 	/*
 	 * Transaction is finished.  We don't need the lock anymore.  We
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index e9dbdbbbebeb..e175aaf7a1e6 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -344,6 +344,36 @@ static const struct attribute_group btrfs_static_feature_attr_group = {
  */
 #define discard_to_fs_info(_kobj)	to_fs_info((_kobj)->parent->parent)
 
+static ssize_t btrfs_discard_iops_limit_show(struct kobject *kobj,
+					     struct kobj_attribute *a,
+					     char *buf)
+{
+	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
+
+	return snprintf(buf, PAGE_SIZE, "%u\n",
+			READ_ONCE(fs_info->discard_ctl.iops_limit));
+}
+
+static ssize_t btrfs_discard_iops_limit_store(struct kobject *kobj,
+					      struct kobj_attribute *a,
+					      const char *buf, size_t len)
+{
+	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
+	struct btrfs_discard_ctl *discard_ctl = &fs_info->discard_ctl;
+	unsigned iops_limit;
+	int ret;
+
+	ret = kstrtouint(buf, 10, &iops_limit);
+	if (ret)
+		return -EINVAL;
+
+	WRITE_ONCE(discard_ctl->iops_limit, iops_limit);
+
+	return len;
+}
+BTRFS_ATTR_RW(discard, iops_limit, btrfs_discard_iops_limit_show,
+	      btrfs_discard_iops_limit_store);
+
 static ssize_t btrfs_discardable_extents_show(struct kobject *kobj,
 					      struct kobj_attribute *a,
 					      char *buf)
@@ -367,6 +397,7 @@ static ssize_t btrfs_discardable_bytes_show(struct kobject *kobj,
 BTRFS_ATTR(discard, discardable_bytes, btrfs_discardable_bytes_show);
 
 static const struct attribute *discard_debug_attrs[] = {
+	BTRFS_ATTR_PTR(discard, iops_limit),
 	BTRFS_ATTR_PTR(discard, discardable_extents),
 	BTRFS_ATTR_PTR(discard, discardable_bytes),
 	NULL,
-- 
2.17.1



* [PATCH 02/12] btrfs: add bps discard rate limit for async discard
From: Dennis Zhou @ 2020-01-02 21:26 UTC
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Provide the ability to rate limit based on kbps in addition to iops as
additional guides for the target discard rate. The delay used ends up
being max(kbps_delay, iops_delay).
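
For example, with illustrative numbers and bps_limit = kbps_limit * 1024
as in the code:

  prev_discard = 64 MiB = 67,108,864 bytes
  kbps_limit = 100,000  ->  bps_limit = 102,400,000 B/s
  kbps_delay = 67,108,864 * 1000 / 102,400,000 = 655 ms
  delay = max(base/iops delay, 655 ms)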

Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 fs/btrfs/ctree.h   |  2 ++
 fs/btrfs/discard.c | 23 +++++++++++++++++++++--
 fs/btrfs/sysfs.c   | 31 +++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c73bbc7e4491..2485cf94b628 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -466,10 +466,12 @@ struct btrfs_discard_ctl {
 	spinlock_t lock;
 	struct btrfs_block_group *block_group;
 	struct list_head discard_list[BTRFS_NR_DISCARD_LISTS];
+	u64 prev_discard;
 	atomic_t discardable_extents;
 	atomic64_t discardable_bytes;
 	unsigned long delay;
 	unsigned iops_limit;
+	u32 kbps_limit;
 };
 
 /* delayed seq elem */
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index abcc3b2189d1..eb148ca9a508 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -4,6 +4,7 @@
 #include <linux/kernel.h>
 #include <linux/ktime.h>
 #include <linux/list.h>
+#include <linux/math64.h>
 #include <linux/sizes.h>
 #include <linux/workqueue.h>
 #include "ctree.h"
@@ -222,8 +223,8 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
  * @override: override the current timer
  *
  * Discards are issued by a delayed workqueue item.  @override is used to
- * update the current delay as the baseline delay interview is reevaluated
- * on transaction commit.  This is also maxed with any other rate limit.
+ * update the current delay as the baseline delay interval is reevaluated on
+ * transaction commit.  This is also maxed with any other rate limit.
  */
 void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl,
 				 bool override)
@@ -242,6 +243,20 @@ void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl,
 	block_group = find_next_block_group(discard_ctl, now);
 	if (block_group) {
 		unsigned long delay = discard_ctl->delay;
+		u32 kbps_limit = READ_ONCE(discard_ctl->kbps_limit);
+
+		/*
+		 * A single delayed workqueue item is responsible for
+		 * discarding, so we can manage the bytes rate limit by keeping
+		 * track of the previous discard.
+		 */
+		if (kbps_limit && discard_ctl->prev_discard) {
+			u64 bps_limit = ((u64)kbps_limit) * SZ_1K;
+			u64 bps_delay = div64_u64(discard_ctl->prev_discard *
+						  MSEC_PER_SEC, bps_limit);
+
+			delay = max(delay, msecs_to_jiffies(bps_delay));
+		}
 
 		/*
 		 * This timeout is to hopefully prevent immediate discarding
@@ -317,6 +332,8 @@ static void btrfs_discard_workfn(struct work_struct *work)
 				       btrfs_block_group_end(block_group),
 				       0, true);
 
+	discard_ctl->prev_discard = trimmed;
+
 	/* Determine next steps for a block_group */
 	if (block_group->discard_cursor >= btrfs_block_group_end(block_group)) {
 		if (discard_state == BTRFS_DISCARD_BITMAPS) {
@@ -507,10 +524,12 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info)
 	for (i = 0; i < BTRFS_NR_DISCARD_LISTS; i++)
 		INIT_LIST_HEAD(&discard_ctl->discard_list[i]);
 
+	discard_ctl->prev_discard = 0;
 	atomic_set(&discard_ctl->discardable_extents, 0);
 	atomic64_set(&discard_ctl->discardable_bytes, 0);
 	discard_ctl->delay = BTRFS_DISCARD_MAX_DELAY_MSEC;
 	discard_ctl->iops_limit = BTRFS_DISCARD_MAX_IOPS;
+	discard_ctl->kbps_limit = 0;
 }
 
 void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index e175aaf7a1e6..39b59f368f06 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -374,6 +374,36 @@ static ssize_t btrfs_discard_iops_limit_store(struct kobject *kobj,
 BTRFS_ATTR_RW(discard, iops_limit, btrfs_discard_iops_limit_show,
 	      btrfs_discard_iops_limit_store);
 
+static ssize_t btrfs_discard_kbps_limit_show(struct kobject *kobj,
+					     struct kobj_attribute *a,
+					     char *buf)
+{
+	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
+
+	return snprintf(buf, PAGE_SIZE, "%u\n",
+			READ_ONCE(fs_info->discard_ctl.kbps_limit));
+}
+
+static ssize_t btrfs_discard_kbps_limit_store(struct kobject *kobj,
+					      struct kobj_attribute *a,
+					      const char *buf, size_t len)
+{
+	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
+	struct btrfs_discard_ctl *discard_ctl = &fs_info->discard_ctl;
+	u32 kbps_limit;
+	int ret;
+
+	ret = kstrtou32(buf, 10, &kbps_limit);
+	if (ret)
+		return -EINVAL;
+
+	WRITE_ONCE(discard_ctl->kbps_limit, kbps_limit);
+
+	return len;
+}
+BTRFS_ATTR_RW(discard, kbps_limit, btrfs_discard_kbps_limit_show,
+	      btrfs_discard_kbps_limit_store);
+
 static ssize_t btrfs_discardable_extents_show(struct kobject *kobj,
 					      struct kobj_attribute *a,
 					      char *buf)
@@ -398,6 +428,7 @@ BTRFS_ATTR(discard, discardable_bytes, btrfs_discardable_bytes_show);
 
 static const struct attribute *discard_debug_attrs[] = {
 	BTRFS_ATTR_PTR(discard, iops_limit),
+	BTRFS_ATTR_PTR(discard, kbps_limit),
 	BTRFS_ATTR_PTR(discard, discardable_extents),
 	BTRFS_ATTR_PTR(discard, discardable_bytes),
 	NULL,
-- 
2.17.1



* [PATCH 03/12] btrfs: limit max discard size for async discard
From: Dennis Zhou @ 2020-01-02 21:26 UTC
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Throttle the maximum size of a discard so that we can provide an upper
bound for the rate of async discard. While the block layer is able to
split discards into appropriately sized discards, we want to be able to
account more accurately for the rate at which we are consuming NCQ
slots as well as limit the upper bound of work for a single discard.
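
For instance (numbers for illustration only), with the 64MB cap below, a
fully free 1GB extent is issued as 16 separate 64MB discards instead of
one giant one, so each discard represents a bounded amount of work and
the iops accounting from the previous patches stays meaningful.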

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/discard.h          |  5 +++++
 fs/btrfs/free-space-cache.c | 41 +++++++++++++++++++++++++++++--------
 2 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h
index 5250fe178e49..562c60fab77a 100644
--- a/fs/btrfs/discard.h
+++ b/fs/btrfs/discard.h
@@ -3,10 +3,15 @@
 #ifndef BTRFS_DISCARD_H
 #define BTRFS_DISCARD_H
 
+#include <linux/sizes.h>
+
 struct btrfs_fs_info;
 struct btrfs_discard_ctl;
 struct btrfs_block_group;
 
+/* Discard size limits */
+#define BTRFS_ASYNC_DISCARD_MAX_SIZE	(SZ_64M)
+
 /* Work operations */
 void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl,
 			       struct btrfs_block_group *block_group);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 40fb918a82f4..34291c777998 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3466,16 +3466,36 @@ static int trim_no_bitmap(struct btrfs_block_group *block_group,
 		extent_start = entry->offset;
 		extent_bytes = entry->bytes;
 		extent_trim_state = entry->trim_state;
-		start = max(start, extent_start);
-		bytes = min(extent_start + extent_bytes, end) - start;
-		if (bytes < minlen) {
-			spin_unlock(&ctl->tree_lock);
-			mutex_unlock(&ctl->cache_writeout_mutex);
-			goto next;
-		}
+		if (async) {
+			start = entry->offset;
+			bytes = entry->bytes;
+			if (bytes < minlen) {
+				spin_unlock(&ctl->tree_lock);
+				mutex_unlock(&ctl->cache_writeout_mutex);
+				goto next;
+			}
+			unlink_free_space(ctl, entry);
+			if (bytes > BTRFS_ASYNC_DISCARD_MAX_SIZE) {
+				bytes = extent_bytes =
+					BTRFS_ASYNC_DISCARD_MAX_SIZE;
+				entry->offset += BTRFS_ASYNC_DISCARD_MAX_SIZE;
+				entry->bytes -= BTRFS_ASYNC_DISCARD_MAX_SIZE;
+				link_free_space(ctl, entry);
+			} else {
+				kmem_cache_free(btrfs_free_space_cachep, entry);
+			}
+		} else {
+			start = max(start, extent_start);
+			bytes = min(extent_start + extent_bytes, end) - start;
+			if (bytes < minlen) {
+				spin_unlock(&ctl->tree_lock);
+				mutex_unlock(&ctl->cache_writeout_mutex);
+				goto next;
+			}
 
-		unlink_free_space(ctl, entry);
-		kmem_cache_free(btrfs_free_space_cachep, entry);
+			unlink_free_space(ctl, entry);
+			kmem_cache_free(btrfs_free_space_cachep, entry);
+		}
 
 		spin_unlock(&ctl->tree_lock);
 		trim_entry.start = extent_start;
@@ -3639,6 +3659,9 @@ static int trim_bitmaps(struct btrfs_block_group *block_group,
 			goto next;
 		}
 
+		if (async && bytes > BTRFS_ASYNC_DISCARD_MAX_SIZE)
+			bytes = BTRFS_ASYNC_DISCARD_MAX_SIZE;
+
 		bitmap_clear_bits(ctl, entry, start, bytes);
 		if (entry->bytes == 0)
 			free_bitmap(ctl, entry);
-- 
2.17.1



* [PATCH 04/12] btrfs: make max async discard size tunable
From: Dennis Zhou @ 2020-01-02 21:26 UTC
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Expose max_discard_size as a tunable via sysfs.
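
With this, writing a byte value to the max_discard_size attribute under
the per-filesystem discard debug sysfs directory added earlier in this
series (e.g. /sys/fs/btrfs/<FSID>/debug/discard/max_discard_size -- path
shown for illustration) adjusts the cap on the fly, and writing 0
disables the cap entirely per the max_discard_size checks in
trim_no_bitmap() and trim_bitmaps().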

Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 fs/btrfs/ctree.h            |  1 +
 fs/btrfs/discard.c          |  1 +
 fs/btrfs/discard.h          |  2 +-
 fs/btrfs/free-space-cache.c | 19 ++++++++++++-------
 fs/btrfs/sysfs.c            | 31 +++++++++++++++++++++++++++++++
 5 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2485cf94b628..9328a0398678 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -469,6 +469,7 @@ struct btrfs_discard_ctl {
 	u64 prev_discard;
 	atomic_t discardable_extents;
 	atomic64_t discardable_bytes;
+	u64 max_discard_size;
 	unsigned long delay;
 	unsigned iops_limit;
 	u32 kbps_limit;
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index eb148ca9a508..822b888d90e3 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -527,6 +527,7 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info)
 	discard_ctl->prev_discard = 0;
 	atomic_set(&discard_ctl->discardable_extents, 0);
 	atomic64_set(&discard_ctl->discardable_bytes, 0);
+	discard_ctl->max_discard_size = BTRFS_ASYNC_DISCARD_DFL_MAX_SIZE;
 	discard_ctl->delay = BTRFS_DISCARD_MAX_DELAY_MSEC;
 	discard_ctl->iops_limit = BTRFS_DISCARD_MAX_IOPS;
 	discard_ctl->kbps_limit = 0;
diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h
index 562c60fab77a..72816e479416 100644
--- a/fs/btrfs/discard.h
+++ b/fs/btrfs/discard.h
@@ -10,7 +10,7 @@ struct btrfs_discard_ctl;
 struct btrfs_block_group;
 
 /* Discard size limits */
-#define BTRFS_ASYNC_DISCARD_MAX_SIZE	(SZ_64M)
+#define BTRFS_ASYNC_DISCARD_DFL_MAX_SIZE	(SZ_64M)
 
 /* Work operations */
 void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl,
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 34291c777998..fb1a53f9b39c 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3428,6 +3428,8 @@ static int trim_no_bitmap(struct btrfs_block_group *block_group,
 			  u64 *total_trimmed, u64 start, u64 end, u64 minlen,
 			  bool async)
 {
+	struct btrfs_discard_ctl *discard_ctl =
+					&block_group->fs_info->discard_ctl;
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 	struct btrfs_free_space *entry;
 	struct rb_node *node;
@@ -3436,6 +3438,7 @@ static int trim_no_bitmap(struct btrfs_block_group *block_group,
 	u64 extent_bytes;
 	enum btrfs_trim_state extent_trim_state;
 	u64 bytes;
+	u64 max_discard_size = READ_ONCE(discard_ctl->max_discard_size);
 
 	while (start < end) {
 		struct btrfs_trim_range trim_entry;
@@ -3475,11 +3478,10 @@ static int trim_no_bitmap(struct btrfs_block_group *block_group,
 				goto next;
 			}
 			unlink_free_space(ctl, entry);
-			if (bytes > BTRFS_ASYNC_DISCARD_MAX_SIZE) {
-				bytes = extent_bytes =
-					BTRFS_ASYNC_DISCARD_MAX_SIZE;
-				entry->offset += BTRFS_ASYNC_DISCARD_MAX_SIZE;
-				entry->bytes -= BTRFS_ASYNC_DISCARD_MAX_SIZE;
+			if (max_discard_size && bytes > max_discard_size) {
+				bytes = extent_bytes = max_discard_size;
+				entry->offset += max_discard_size;
+				entry->bytes -= max_discard_size;
 				link_free_space(ctl, entry);
 			} else {
 				kmem_cache_free(btrfs_free_space_cachep, entry);
@@ -3584,12 +3586,15 @@ static int trim_bitmaps(struct btrfs_block_group *block_group,
 			u64 *total_trimmed, u64 start, u64 end, u64 minlen,
 			bool async)
 {
+	struct btrfs_discard_ctl *discard_ctl =
+					&block_group->fs_info->discard_ctl;
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 	struct btrfs_free_space *entry;
 	int ret = 0;
 	int ret2;
 	u64 bytes;
 	u64 offset = offset_to_bitmap(ctl, start);
+	u64 max_discard_size = READ_ONCE(discard_ctl->max_discard_size);
 
 	while (offset < end) {
 		bool next_bitmap = false;
@@ -3659,8 +3664,8 @@ static int trim_bitmaps(struct btrfs_block_group *block_group,
 			goto next;
 		}
 
-		if (async && bytes > BTRFS_ASYNC_DISCARD_MAX_SIZE)
-			bytes = BTRFS_ASYNC_DISCARD_MAX_SIZE;
+		if (async && max_discard_size && bytes > max_discard_size)
+			bytes = max_discard_size;
 
 		bitmap_clear_bits(ctl, entry, start, bytes);
 		if (entry->bytes == 0)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 39b59f368f06..8b0fd8557438 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -404,6 +404,36 @@ static ssize_t btrfs_discard_kbps_limit_store(struct kobject *kobj,
 BTRFS_ATTR_RW(discard, kbps_limit, btrfs_discard_kbps_limit_show,
 	      btrfs_discard_kbps_limit_store);
 
+static ssize_t btrfs_discard_max_discard_size_show(struct kobject *kobj,
+						   struct kobj_attribute *a,
+						   char *buf)
+{
+	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
+
+	return snprintf(buf, PAGE_SIZE, "%llu\n",
+			READ_ONCE(fs_info->discard_ctl.max_discard_size));
+}
+
+static ssize_t btrfs_discard_max_discard_size_store(struct kobject *kobj,
+						    struct kobj_attribute *a,
+						    const char *buf, size_t len)
+{
+	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
+	struct btrfs_discard_ctl *discard_ctl = &fs_info->discard_ctl;
+	u64 max_discard_size;
+	int ret;
+
+	ret = kstrtou64(buf, 10, &max_discard_size);
+	if (ret)
+		return -EINVAL;
+
+	WRITE_ONCE(discard_ctl->max_discard_size, max_discard_size);
+
+	return len;
+}
+BTRFS_ATTR_RW(discard, max_discard_size, btrfs_discard_max_discard_size_show,
+	      btrfs_discard_max_discard_size_store);
+
 static ssize_t btrfs_discardable_extents_show(struct kobject *kobj,
 					      struct kobj_attribute *a,
 					      char *buf)
@@ -429,6 +459,7 @@ BTRFS_ATTR(discard, discardable_bytes, btrfs_discardable_bytes_show);
 static const struct attribute *discard_debug_attrs[] = {
 	BTRFS_ATTR_PTR(discard, iops_limit),
 	BTRFS_ATTR_PTR(discard, kbps_limit),
+	BTRFS_ATTR_PTR(discard, max_discard_size),
 	BTRFS_ATTR_PTR(discard, discardable_extents),
 	BTRFS_ATTR_PTR(discard, discardable_bytes),
 	NULL,
-- 
2.17.1



* [PATCH 05/12] btrfs: have multiple discard lists
From: Dennis Zhou @ 2020-01-02 21:26 UTC
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Non-block group destruction discarding currently only has a single list
with no minimum discard length. This can lead to caravanning more
meaningful discards behind a heavily fragmented block group.

This adds support for multiple lists with minimum discard lengths to
prevent the caravan effect. We promote block groups back up when we
exceed the BTRFS_ASYNC_DISCARD_MAX_FILTER size; currently we support
only 2 lists, with filters of 1MB and 32KB respectively.
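
Concretely, with the discard_minlen[] values below (scenario for
illustration): a block group sitting on the 32KB-filter list that has a
2MB region freed back to it is promoted to the 1MB-filter list so the
larger region is discarded sooner, while regions under 32KB never
satisfy the last filter and are instead picked up by the bitmap retrim
done when the block group is headed for the unused_bgs list.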

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h            |   2 +-
 fs/btrfs/discard.c          | 102 ++++++++++++++++++++++++++++++++----
 fs/btrfs/discard.h          |   6 +++
 fs/btrfs/free-space-cache.c |  54 ++++++++++++++-----
 fs/btrfs/free-space-cache.h |   2 +-
 5 files changed, 142 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9328a0398678..09371e8b55a7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -456,7 +456,7 @@ struct btrfs_full_stripe_locks_tree {
  * afterwards represent monotonically decreasing discard filter sizes to
  * prioritize what should be discarded next.
  */
-#define BTRFS_NR_DISCARD_LISTS		2
+#define BTRFS_NR_DISCARD_LISTS		3
 #define BTRFS_DISCARD_INDEX_UNUSED	0
 #define BTRFS_DISCARD_INDEX_START	1
 
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 822b888d90e3..de436c0051ce 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -22,6 +22,10 @@
 #define BTRFS_DISCARD_MAX_DELAY_MSEC	(1000UL)
 #define BTRFS_DISCARD_MAX_IOPS		(10U)
 
+/* Monotonically decreasing minimum length filters after index 0. */
+static int discard_minlen[BTRFS_NR_DISCARD_LISTS] = {0,
+	BTRFS_ASYNC_DISCARD_MAX_FILTER, BTRFS_ASYNC_DISCARD_MIN_FILTER};
+
 static struct list_head *get_discard_list(struct btrfs_discard_ctl *discard_ctl,
 					  struct btrfs_block_group *block_group)
 {
@@ -139,16 +143,18 @@ static struct btrfs_block_group *find_next_block_group(
  * peek_discard_list - wrap find_next_block_group()
  * @discard_ctl: discard control
  * @discard_state: the discard_state of the block_group after state management
+ * @discard_index: the discard_index of the block_group after state management
  *
  * This wraps find_next_block_group() and sets the block_group to be in use.
  * discard_state's control flow is managed here.  Variables related to
- * discard_state are reset here as needed (eg. discard_cursor).  @discard_state
- * is remembered as it may change while we're discarding, but we want the
- * discard to execute in the context determined here.
+ * discard_state are reset here as needed (e.g. discard_cursor).  @discard_state
+ * and @discard_index are remembered as they may change while we're discarding,
+ * but we want the discard to execute in the context determined here.
  */
 static struct btrfs_block_group *peek_discard_list(
 					struct btrfs_discard_ctl *discard_ctl,
-					enum btrfs_discard_state *discard_state)
+					enum btrfs_discard_state *discard_state,
+					int *discard_index)
 {
 	struct btrfs_block_group *block_group;
 	const u64 now = ktime_get_ns();
@@ -169,6 +175,7 @@ static struct btrfs_block_group *peek_discard_list(
 		}
 		discard_ctl->block_group = block_group;
 		*discard_state = block_group->discard_state;
+		*discard_index = block_group->discard_index;
 	} else {
 		block_group = NULL;
 	}
@@ -178,6 +185,64 @@ static struct btrfs_block_group *peek_discard_list(
 	return block_group;
 }
 
+/**
+ * btrfs_discard_check_filter - updates a block groups filters
+ * @block_group: block group of interest
+ * @bytes: recently freed region size after coalescing
+ *
+ * Async discard maintains multiple lists with progressively smaller filters
+ * to prioritize discarding based on size.  Should a free space that matches
+ * a larger filter be returned to the free_space_cache, prioritize that discard
+ * by moving @block_group to the proper filter.
+ */
+void btrfs_discard_check_filter(struct btrfs_block_group *block_group,
+				u64 bytes)
+{
+	struct btrfs_discard_ctl *discard_ctl;
+
+	if (!block_group ||
+	    !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
+		return;
+
+	discard_ctl = &block_group->fs_info->discard_ctl;
+
+	if (block_group->discard_index > BTRFS_DISCARD_INDEX_START &&
+	    bytes >= discard_minlen[block_group->discard_index - 1]) {
+		int i;
+
+		remove_from_discard_list(discard_ctl, block_group);
+
+		for (i = BTRFS_DISCARD_INDEX_START; i < BTRFS_NR_DISCARD_LISTS;
+		     i++) {
+			if (bytes >= discard_minlen[i]) {
+				block_group->discard_index = i;
+				add_to_discard_list(discard_ctl, block_group);
+				break;
+			}
+		}
+	}
+}
+
+/**
+ * btrfs_update_discard_index - moves a block_group along the discard lists
+ * @discard_ctl: discard control
+ * @block_group: block_group of interest
+ *
+ * Increment @block_group's discard_index.  If it falls off the list, let it be.
+ * Otherwise add it back to the appropriate list.
+ */
+static void btrfs_update_discard_index(struct btrfs_discard_ctl *discard_ctl,
+				       struct btrfs_block_group *block_group)
+{
+	block_group->discard_index++;
+	if (block_group->discard_index == BTRFS_NR_DISCARD_LISTS) {
+		block_group->discard_index = 1;
+		return;
+	}
+
+	add_to_discard_list(discard_ctl, block_group);
+}
+
 /**
  * btrfs_discard_cancel_work - remove a block_group from the discard lists
  * @discard_ctl: discard control
@@ -296,6 +361,8 @@ static void btrfs_finish_discard_pass(struct btrfs_discard_ctl *discard_ctl,
 			btrfs_mark_bg_unused(block_group);
 		else
 			add_to_discard_unused_list(discard_ctl, block_group);
+	} else {
+		btrfs_update_discard_index(discard_ctl, block_group);
 	}
 }
 
@@ -312,25 +379,42 @@ static void btrfs_discard_workfn(struct work_struct *work)
 	struct btrfs_discard_ctl *discard_ctl;
 	struct btrfs_block_group *block_group;
 	enum btrfs_discard_state discard_state;
+	int discard_index = 0;
 	u64 trimmed = 0;
+	u64 minlen = 0;
 
 	discard_ctl = container_of(work, struct btrfs_discard_ctl, work.work);
 
-	block_group = peek_discard_list(discard_ctl, &discard_state);
+	block_group = peek_discard_list(discard_ctl, &discard_state,
+					&discard_index);
 	if (!block_group || !btrfs_run_discard_work(discard_ctl))
 		return;
 
 	/* Perform discarding */
-	if (discard_state == BTRFS_DISCARD_BITMAPS)
+	minlen = discard_minlen[discard_index];
+
+	if (discard_state == BTRFS_DISCARD_BITMAPS) {
+		u64 maxlen = 0;
+
+		/*
+		 * Use the previous levels minimum discard length as the max
+		 * length filter.  In the case something is added to make a
+		 * region go beyond the max filter, the entire bitmap is set
+		 * back to BTRFS_TRIM_STATE_UNTRIMMED.
+		 */
+		if (discard_index != BTRFS_DISCARD_INDEX_UNUSED)
+			maxlen = discard_minlen[discard_index - 1];
+
 		btrfs_trim_block_group_bitmaps(block_group, &trimmed,
 				       block_group->discard_cursor,
 				       btrfs_block_group_end(block_group),
-				       0, true);
-	else
+				       minlen, maxlen, true);
+	} else {
 		btrfs_trim_block_group_extents(block_group, &trimmed,
 				       block_group->discard_cursor,
 				       btrfs_block_group_end(block_group),
-				       0, true);
+				       minlen, true);
+	}
 
 	discard_ctl->prev_discard = trimmed;
 
diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h
index 72816e479416..f73276ae74f7 100644
--- a/fs/btrfs/discard.h
+++ b/fs/btrfs/discard.h
@@ -11,6 +11,12 @@ struct btrfs_block_group;
 
 /* Discard size limits */
 #define BTRFS_ASYNC_DISCARD_DFL_MAX_SIZE	(SZ_64M)
+#define BTRFS_ASYNC_DISCARD_MAX_FILTER		(SZ_1M)
+#define BTRFS_ASYNC_DISCARD_MIN_FILTER		(SZ_32K)
+
+/* List operations */
+void btrfs_discard_check_filter(struct btrfs_block_group *block_group,
+				u64 bytes);
 
 /* Work operations */
 void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl,
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index fb1a53f9b39c..6f0d60e86b6f 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2465,6 +2465,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 	struct btrfs_block_group *block_group = ctl->private;
 	struct btrfs_free_space *info;
 	int ret = 0;
+	u64 filter_bytes = bytes;
 
 	info = kmem_cache_zalloc(btrfs_free_space_cachep, GFP_NOFS);
 	if (!info)
@@ -2501,6 +2502,8 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 	 */
 	steal_from_bitmap(ctl, info, true);
 
+	filter_bytes = max(filter_bytes, info->bytes);
+
 	ret = link_free_space(ctl, info);
 	if (ret)
 		kmem_cache_free(btrfs_free_space_cachep, info);
@@ -2513,8 +2516,10 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info,
 		ASSERT(ret != -EEXIST);
 	}
 
-	if (trim_state != BTRFS_TRIM_STATE_TRIMMED)
+	if (trim_state != BTRFS_TRIM_STATE_TRIMMED) {
+		btrfs_discard_check_filter(block_group, filter_bytes);
 		btrfs_discard_queue_work(&fs_info->discard_ctl, block_group);
+	}
 
 	return ret;
 }
@@ -3478,7 +3483,14 @@ static int trim_no_bitmap(struct btrfs_block_group *block_group,
 				goto next;
 			}
 			unlink_free_space(ctl, entry);
-			if (max_discard_size && bytes > max_discard_size) {
+			/*
+			 * Let bytes = max_discard_size + X.
+			 * If X < BTRFS_ASYNC_DISCARD_MIN_FILTER, we won't trim
+			 * X when we come back around.  So trim it now.
+			 */
+			if (max_discard_size &&
+			    bytes >= (max_discard_size +
+				      BTRFS_ASYNC_DISCARD_MIN_FILTER)) {
 				bytes = extent_bytes = max_discard_size;
 				entry->offset += max_discard_size;
 				entry->bytes -= max_discard_size;
@@ -3584,7 +3596,7 @@ static void end_trimming_bitmap(struct btrfs_free_space_ctl *ctl,
  */
 static int trim_bitmaps(struct btrfs_block_group *block_group,
 			u64 *total_trimmed, u64 start, u64 end, u64 minlen,
-			bool async)
+			u64 maxlen, bool async)
 {
 	struct btrfs_discard_ctl *discard_ctl =
 					&block_group->fs_info->discard_ctl;
@@ -3612,7 +3624,15 @@ static int trim_bitmaps(struct btrfs_block_group *block_group,
 		}
 
 		entry = tree_search_offset(ctl, offset, 1, 0);
-		if (!entry || (async && start == offset &&
+		/*
+		 * Bitmaps are marked trimmed lossily now to prevent constant
+		 * discarding of the same bitmap (the reason why we are bound
+		 * by the filters).  So, retrim the block group bitmaps when we
+		 * are preparing to punt to the unused_bgs list.  This uses
+		 * @minlen to determine if we are in BTRFS_DISCARD_INDEX_UNUSED
+		 * which is the only discard index which sets minlen to 0.
+		 */
+		if (!entry || (async && minlen && start == offset &&
 			       btrfs_free_space_trimmed(entry))) {
 			spin_unlock(&ctl->tree_lock);
 			mutex_unlock(&ctl->cache_writeout_mutex);
@@ -3633,10 +3653,10 @@ static int trim_bitmaps(struct btrfs_block_group *block_group,
 		ret2 = search_bitmap(ctl, entry, &start, &bytes, false);
 		if (ret2 || start >= end) {
 			/*
-			 * This keeps the invariant that all bytes are trimmed
-			 * if BTRFS_TRIM_STATE_TRIMMED is set on a bitmap.
+			 * We lossily consider a bitmap trimmed if we only skip
+			 * over regions <= BTRFS_ASYNC_DISCARD_MIN_FILTER.
 			 */
-			if (ret2 && !minlen)
+			if (ret2 && minlen <= BTRFS_ASYNC_DISCARD_MIN_FILTER)
 				end_trimming_bitmap(ctl, entry);
 			else
 				entry->trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
@@ -3657,14 +3677,20 @@ static int trim_bitmaps(struct btrfs_block_group *block_group,
 		}
 
 		bytes = min(bytes, end - start);
-		if (bytes < minlen) {
-			entry->trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
+		if (bytes < minlen || (async && maxlen && bytes > maxlen)) {
 			spin_unlock(&ctl->tree_lock);
 			mutex_unlock(&ctl->cache_writeout_mutex);
 			goto next;
 		}
 
-		if (async && max_discard_size && bytes > max_discard_size)
+		/*
+		 * Let bytes = max_discard_size + X.
+		 * If X < @minlen, we won't trim X when we come back around.
+		 * So trim it now.  We differ here from trimming extents as we
+		 * don't keep individual state per bit.
+		 */
+		if (async && max_discard_size &&
+		    bytes > (max_discard_size + minlen))
 			bytes = max_discard_size;
 
 		bitmap_clear_bits(ctl, entry, start, bytes);
@@ -3772,7 +3798,7 @@ int btrfs_trim_block_group(struct btrfs_block_group *block_group,
 	if (ret)
 		goto out;
 
-	ret = trim_bitmaps(block_group, trimmed, start, end, minlen, false);
+	ret = trim_bitmaps(block_group, trimmed, start, end, minlen, 0, false);
 	div64_u64_rem(end, BITS_PER_BITMAP * ctl->unit, &rem);
 	/* If we ended in the middle of a bitmap, reset the trimming flag */
 	if (rem)
@@ -3806,7 +3832,7 @@ int btrfs_trim_block_group_extents(struct btrfs_block_group *block_group,
 
 int btrfs_trim_block_group_bitmaps(struct btrfs_block_group *block_group,
 				   u64 *trimmed, u64 start, u64 end, u64 minlen,
-				   bool async)
+				   u64 maxlen, bool async)
 {
 	int ret;
 
@@ -3820,7 +3846,9 @@ int btrfs_trim_block_group_bitmaps(struct btrfs_block_group *block_group,
 	btrfs_get_block_group_trimming(block_group);
 	spin_unlock(&block_group->lock);
 
-	ret = trim_bitmaps(block_group, trimmed, start, end, minlen, async);
+	ret = trim_bitmaps(block_group, trimmed, start, end, minlen, maxlen,
+			   async);
+
 	btrfs_put_block_group_trimming(block_group);
 
 	return ret;
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 5018190a49a3..2e0a8077aa74 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -146,7 +146,7 @@ int btrfs_trim_block_group_extents(struct btrfs_block_group *block_group,
 				   bool async);
 int btrfs_trim_block_group_bitmaps(struct btrfs_block_group *block_group,
 				   u64 *trimmed, u64 start, u64 end, u64 minlen,
-				   bool async);
+				   u64 maxlen, bool async);
 
 /* Support functions for running our sanity tests */
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
-- 
2.17.1



* [PATCH 06/12] btrfs: only keep track of data extents for async discard
From: Dennis Zhou @ 2020-01-02 21:26 UTC
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

As mentioned earlier, discarding data can be done either by issuing an
explicit discard or implicitly by reusing the LBA. Metadata block_groups
see much more frequent reuse due to, well, being metadata. So instead
of explicitly discarding metadata block_groups, just leave them be and
let the latter implicit discarding be done for them. For mixed
block_groups, which contain both metadata and data, we let them be too,
as higher fragmentation is expected.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.h |  7 +++++++
 fs/btrfs/discard.c     | 16 ++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index a8d2edcd8760..4a088e690432 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -182,6 +182,13 @@ static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group)
 	return (block_group->start + block_group->length);
 }
 
+static inline bool btrfs_is_block_group_data_only(
+					struct btrfs_block_group *block_group)
+{
+	return ((block_group->flags & BTRFS_BLOCK_GROUP_DATA) &&
+		!(block_group->flags & BTRFS_BLOCK_GROUP_METADATA));
+}
+
 #ifdef CONFIG_BTRFS_DEBUG
 static inline int btrfs_should_fragment_free_space(
 		struct btrfs_block_group *block_group)
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index de436c0051ce..7dbbf762ee8d 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -54,6 +54,13 @@ static void __add_to_discard_list(struct btrfs_discard_ctl *discard_ctl,
 static void add_to_discard_list(struct btrfs_discard_ctl *discard_ctl,
 				struct btrfs_block_group *block_group)
 {
+	/*
+	 * Async discard only operates on block_groups that are explicitly for
+	 * data.  Mixed block_groups are not supported.
+	 */
+	if (!btrfs_is_block_group_data_only(block_group))
+		return;
+
 	spin_lock(&discard_ctl->lock);
 	__add_to_discard_list(discard_ctl, block_group);
 	spin_unlock(&discard_ctl->lock);
@@ -166,7 +173,10 @@ static struct btrfs_block_group *peek_discard_list(
 	if (block_group && now > block_group->discard_eligible_time) {
 		if (block_group->discard_index == BTRFS_DISCARD_INDEX_UNUSED &&
 		    block_group->used != 0) {
-			__add_to_discard_list(discard_ctl, block_group);
+			if (btrfs_is_block_group_data_only(block_group))
+				__add_to_discard_list(discard_ctl, block_group);
+			else
+				list_del_init(&block_group->discard_list);
 			goto again;
 		}
 		if (block_group->discard_state == BTRFS_DISCARD_RESET_CURSOR) {
@@ -504,7 +514,9 @@ void btrfs_discard_update_discardable(struct btrfs_block_group *block_group,
 	s32 extents_delta;
 	s64 bytes_delta;
 
-	if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
+	if (!block_group ||
+	    !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC) ||
+	    !btrfs_is_block_group_data_only(block_group))
 		return;
 
 	discard_ctl = &block_group->fs_info->discard_ctl;
-- 
2.17.1



* [PATCH 07/12] btrfs: keep track of discard reuse stats
From: Dennis Zhou @ 2020-01-02 21:26 UTC
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Keep track of how much we are discarding and how often we are reusing
with async discard. The discard_*_bytes counters don't need any special
protection because the single work item provides single-threaded access.
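
A minimal userspace sketch for sampling one of these counters (the
debug/discard sysfs path is an assumption based on this series' layout;
the full attribute path is passed as the argument):

  /* discard_stat.c - print one btrfs discard stat from sysfs.
   * Build: gcc -o discard_stat discard_stat.c
   * Usage: discard_stat /sys/fs/btrfs/<FSID>/debug/discard/discard_extent_bytes
   */
  #include <stdio.h>

  int main(int argc, char **argv)
  {
          unsigned long long val;
          FILE *f;

          if (argc != 2) {
                  fprintf(stderr, "usage: %s <sysfs attribute path>\n",
                          argv[0]);
                  return 1;
          }
          f = fopen(argv[1], "r");
          if (!f) {
                  perror("fopen");
                  return 1;
          }
          if (fscanf(f, "%llu", &val) != 1) {
                  fprintf(stderr, "failed to parse %s\n", argv[1]);
                  fclose(f);
                  return 1;
          }
          fclose(f);
          printf("%llu bytes\n", val);
          return 0;
  }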

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h            |  3 +++
 fs/btrfs/discard.c          |  5 +++++
 fs/btrfs/free-space-cache.c | 14 ++++++++++++++
 fs/btrfs/sysfs.c            | 36 ++++++++++++++++++++++++++++++++++++
 4 files changed, 58 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 09371e8b55a7..fad310d46c78 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -473,6 +473,9 @@ struct btrfs_discard_ctl {
 	unsigned long delay;
 	unsigned iops_limit;
 	u32 kbps_limit;
+	u64 discard_extent_bytes;
+	u64 discard_bitmap_bytes;
+	atomic64_t discard_bytes_saved;
 };
 
 /* delayed seq elem */
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 7dbbf762ee8d..bc6d4344397d 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -419,11 +419,13 @@ static void btrfs_discard_workfn(struct work_struct *work)
 				       block_group->discard_cursor,
 				       btrfs_block_group_end(block_group),
 				       minlen, maxlen, true);
+		discard_ctl->discard_bitmap_bytes += trimmed;
 	} else {
 		btrfs_trim_block_group_extents(block_group, &trimmed,
 				       block_group->discard_cursor,
 				       btrfs_block_group_end(block_group),
 				       minlen, true);
+		discard_ctl->discard_extent_bytes += trimmed;
 	}
 
 	discard_ctl->prev_discard = trimmed;
@@ -627,6 +629,9 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info)
 	discard_ctl->delay = BTRFS_DISCARD_MAX_DELAY_MSEC;
 	discard_ctl->iops_limit = BTRFS_DISCARD_MAX_IOPS;
 	discard_ctl->kbps_limit = 0;
+	discard_ctl->discard_extent_bytes = 0;
+	discard_ctl->discard_bitmap_bytes = 0;
+	atomic64_set(&discard_ctl->discard_bytes_saved, 0);
 }
 
 void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 6f0d60e86b6f..8a4a3b9cd544 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2842,6 +2842,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group *block_group,
 			       u64 *max_extent_size)
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+	struct btrfs_discard_ctl *discard_ctl =
+					&block_group->fs_info->discard_ctl;
 	struct btrfs_free_space *entry = NULL;
 	u64 bytes_search = bytes + empty_size;
 	u64 ret = 0;
@@ -2858,6 +2860,10 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group *block_group,
 	ret = offset;
 	if (entry->bitmap) {
 		bitmap_clear_bits(ctl, entry, offset, bytes);
+
+		if (!btrfs_free_space_trimmed(entry))
+			atomic64_add(bytes, &discard_ctl->discard_bytes_saved);
+
 		if (!entry->bytes)
 			free_bitmap(ctl, entry);
 	} else {
@@ -2866,6 +2872,9 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group *block_group,
 		align_gap = entry->offset;
 		align_gap_trim_state = entry->trim_state;
 
+		if (!btrfs_free_space_trimmed(entry))
+			atomic64_add(bytes, &discard_ctl->discard_bytes_saved);
+
 		entry->offset = offset + bytes;
 		WARN_ON(entry->bytes < bytes + align_gap_len);
 
@@ -2969,6 +2978,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group *block_group,
 			     u64 min_start, u64 *max_extent_size)
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+	struct btrfs_discard_ctl *discard_ctl =
+					&block_group->fs_info->discard_ctl;
 	struct btrfs_free_space *entry = NULL;
 	struct rb_node *node;
 	u64 ret = 0;
@@ -3033,6 +3044,9 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group *block_group,
 
 	spin_lock(&ctl->tree_lock);
 
+	if (!btrfs_free_space_trimmed(entry))
+		atomic64_add(bytes, &discard_ctl->discard_bytes_saved);
+
 	ctl->free_space -= bytes;
 	if (!entry->bitmap && !btrfs_free_space_trimmed(entry))
 		ctl->discardable_bytes[BTRFS_STAT_CURR] -= bytes;
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 8b0fd8557438..2e973af8353f 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -456,12 +456,48 @@ static ssize_t btrfs_discardable_bytes_show(struct kobject *kobj,
 }
 BTRFS_ATTR(discard, discardable_bytes, btrfs_discardable_bytes_show);
 
+static ssize_t btrfs_discard_extent_bytes_show(struct kobject *kobj,
+					       struct kobj_attribute *a,
+					       char *buf)
+{
+	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
+
+	return snprintf(buf, PAGE_SIZE, "%lld\n",
+			fs_info->discard_ctl.discard_extent_bytes);
+}
+BTRFS_ATTR(discard, discard_extent_bytes, btrfs_discard_extent_bytes_show);
+
+static ssize_t btrfs_discard_bitmap_bytes_show(struct kobject *kobj,
+					       struct kobj_attribute *a,
+					       char *buf)
+{
+	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
+
+	return snprintf(buf, PAGE_SIZE, "%lld\n",
+			fs_info->discard_ctl.discard_bitmap_bytes);
+}
+BTRFS_ATTR(discard, discard_bitmap_bytes, btrfs_discard_bitmap_bytes_show);
+
+static ssize_t btrfs_discard_bytes_saved_show(struct kobject *kobj,
+					      struct kobj_attribute *a,
+					      char *buf)
+{
+	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
+
+	return snprintf(buf, PAGE_SIZE, "%lld\n",
+		atomic64_read(&fs_info->discard_ctl.discard_bytes_saved));
+}
+BTRFS_ATTR(discard, discard_bytes_saved, btrfs_discard_bytes_saved_show);
+
 static const struct attribute *discard_debug_attrs[] = {
 	BTRFS_ATTR_PTR(discard, iops_limit),
 	BTRFS_ATTR_PTR(discard, kbps_limit),
 	BTRFS_ATTR_PTR(discard, max_discard_size),
 	BTRFS_ATTR_PTR(discard, discardable_extents),
 	BTRFS_ATTR_PTR(discard, discardable_bytes),
+	BTRFS_ATTR_PTR(discard, discard_extent_bytes),
+	BTRFS_ATTR_PTR(discard, discard_bitmap_bytes),
+	BTRFS_ATTR_PTR(discard, discard_bytes_saved),
 	NULL,
 };
 
-- 
2.17.1



* [PATCH 08/12] btrfs: add async discard header
From: Dennis Zhou @ 2020-01-02 21:26 UTC
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Give a brief overview for how async discard is implemented.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/discard.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index bc6d4344397d..d5a89e3755ed 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -1,4 +1,42 @@
 // SPDX-License-Identifier: GPL-2.0
+/*
+ * This contains the logic to handle async discard.
+ *
+ * Async discard manages trimming of free space outside of transaction commit.
+ * Discarding is done by managing the block_groups on an LRU list based on free
+ * space recency.  Two passes are used to first prioritize discarding extents
+ * and then to give regions in the bitmaps the best opportunity to coalesce.
+ * The block_groups are maintained on multiple lists to allow for multiple
+ * passes with different discard filter requirements.  A delayed work item is
+ * used to manage discarding with timeout determined by a max of the delay
+ * incurred by the iops rate limit, the byte rate limit, and the max delay of
+ * BTRFS_DISCARD_MAX_DELAY_MSEC.
+ *
+ * Note, this only keeps track of block_groups that are explicitly for data.
+ * Mixed block_groups are not supported.
+ *
+ * The first list is special to manage discarding of fully free block groups.
+ * This is necessary because we issue a final trim for a full free block group
+ * after forgetting it.  When a block group becomes unused, instead of directly
+ * being added to the unused_bgs list, we add it to this first list.  Then
+ * from there, if it becomes fully discarded, we place it onto the unused_bgs
+ * list.
+ *
+ * The in-memory free space cache serves as the backing state for discard.
+ * Consequently this means there is no persistence.  We opt to load all the
+ * block groups in as not discarded, so the mount case degenerates to the
+ * crashing case.
+ *
+ * As the free space cache uses bitmaps, there exists a tradeoff between
+ * ease/efficiency for find_free_extent() and the accuracy of discard state.
+ * Here we opt to let untrimmed regions merge with everything while only letting
+ * trimmed regions merge with other trimmed regions.  This can cause
+ * overtrimming, but the coalescing benefit seems to be worth it.  Additionally,
+ * bitmap state is tracked as a whole.  If we're able to fully trim a bitmap,
+ * the trimmed flag is set on the bitmap.  Otherwise, if an allocation comes in,
+ * this resets the state and we will retry trimming the whole bitmap.  This is a
+ * tradeoff between discard state accuracy and the cost of accounting.
+ */
 
 #include <linux/jiffies.h>
 #include <linux/kernel.h>
-- 
2.17.1



* [PATCH 09/12] btrfs: increase the metadata allowance for the free_space_cache
From: Dennis Zhou @ 2020-01-02 21:26 UTC
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Currently, there is no way for the free space cache to recover from
being serviced by purely bitmaps because the extent threshold is set to
0 in recalculate_thresholds() when we surpass the metadata allowance.

This adds a recovery mechanism by keeping large extents out of the
bitmaps and increases the metadata upper bound to 64KB. The recovery
mechanism bypasses this upper bound, making it a soft upper bound. But,
since the bypass only applies to extents of 1MB or greater, it
shouldn't add unbounded overhead.
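
To see the effect, assume 4K pages and a 4K sectorsize, so one bitmap
covers 32768 * 4K = 128MB: a fully bitmapped 1GB block group needs 8
bitmaps, i.e. bitmap_bytes = 8 * 4K = 32K.  Under the old 32K allowance
that alone hit the cap ((8 + 1) * 4K >= 32K) and extents_thresh was
pinned to 0 with no way to recover; under the 64K allowance, half the
budget remains available for extent entries.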

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/free-space-cache.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 8a4a3b9cd544..665f6eb6c828 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -24,7 +24,8 @@
 #include "discard.h"
 
 #define BITS_PER_BITMAP		(PAGE_SIZE * 8UL)
-#define MAX_CACHE_BYTES_PER_GIG	SZ_32K
+#define MAX_CACHE_BYTES_PER_GIG	SZ_64K
+#define FORCE_EXTENT_THRESHOLD	SZ_1M
 
 struct btrfs_trim_range {
 	u64 start;
@@ -1694,26 +1695,17 @@ static void recalculate_thresholds(struct btrfs_free_space_ctl *ctl)
 	ASSERT(ctl->total_bitmaps <= max_bitmaps);
 
 	/*
-	 * The goal is to keep the total amount of memory used per 1gb of space
-	 * at or below 32k, so we need to adjust how much memory we allow to be
-	 * used by extent based free space tracking
+	 * We are trying to keep the total amount of memory used per 1gb of
+	 * space to be MAX_CACHE_BYTES_PER_GIG.  However, with a reclamation
+	 * mechanism of pulling extents >= FORCE_EXTENT_THRESHOLD out of
+	 * bitmaps, we may end up using more memory than this.
 	 */
 	if (size < SZ_1G)
 		max_bytes = MAX_CACHE_BYTES_PER_GIG;
 	else
 		max_bytes = MAX_CACHE_BYTES_PER_GIG * div_u64(size, SZ_1G);
 
-	/*
-	 * we want to account for 1 more bitmap than what we have so we can make
-	 * sure we don't go over our overall goal of MAX_CACHE_BYTES_PER_GIG as
-	 * we add more bitmaps.
-	 */
-	bitmap_bytes = (ctl->total_bitmaps + 1) * ctl->unit;
-
-	if (bitmap_bytes >= max_bytes) {
-		ctl->extents_thresh = 0;
-		return;
-	}
+	bitmap_bytes = ctl->total_bitmaps * ctl->unit;
 
 	/*
 	 * we want the extent entry threshold to always be at most 1/2 the max
@@ -2099,6 +2091,10 @@ static bool use_bitmap(struct btrfs_free_space_ctl *ctl,
 		forced = true;
 #endif
 
+	/* This is a way to reclaim large regions from the bitmaps. */
+	if (!forced && info->bytes >= FORCE_EXTENT_THRESHOLD)
+		return false;
+
 	/*
 	 * If we are below the extents threshold then we can add this as an
 	 * extent, and don't have to deal with the bitmap
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread
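
To see the soft upper bound in numbers, here is a rough userspace rendering
of the recalculation for a 1GiB block group; the lines past the quoted hunk
are reconstructed from the surrounding comments, and the 4KiB unit and
40-byte per-extent entry size are assumptions, not the kernel's exact values:

#include <stdio.h>

#define SZ_1G	(1024ULL * 1024 * 1024)
#define SZ_64K	(64ULL * 1024)

int main(void)
{
	unsigned long long size = SZ_1G;	/* block group size */
	unsigned long long unit = 4096;		/* assumed sectorsize */
	unsigned long long total_bitmaps = 4;	/* bitmaps currently in use */
	unsigned long long entry_size = 40;	/* assumed extent entry size */
	unsigned long long max_bytes, bitmap_bytes, extent_bytes;

	max_bytes = size < SZ_1G ? SZ_64K : SZ_64K * (size / SZ_1G);
	/* no more "+ 1" headroom bitmap and no hard cutoff to 0 */
	bitmap_bytes = total_bitmaps * unit;
	extent_bytes = max_bytes - bitmap_bytes;
	if (extent_bytes > max_bytes / 2)
		extent_bytes = max_bytes / 2;	/* extents get at most half */

	printf("extents_thresh = %llu\n", extent_bytes / entry_size); /* 819 */
	return 0;
}

Previously, once bitmap overhead alone reached the 32KiB budget, the
threshold was zeroed for good; now it merely shrinks, and the 1MB bypass
gives large extents a way back out of the bitmaps.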

* [PATCH 10/12] btrfs: make smaller extents more likely to go into bitmaps
  2020-01-02 21:26 [PATCH 00/12] btrfs: async discard follow up Dennis Zhou
                   ` (8 preceding siblings ...)
  2020-01-02 21:26 ` [PATCH 09/12] btrfs: increase the metadata allowance for the free_space_cache Dennis Zhou
@ 2020-01-02 21:26 ` Dennis Zhou
  2020-01-02 21:26 ` [PATCH 11/12] btrfs: ensure removal of discardable_* in free_bitmap() Dennis Zhou
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Dennis Zhou @ 2020-01-02 21:26 UTC (permalink / raw)
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

It's less than ideal for small extents to eat into our extent budget, so
force extents <= 32KB into the bitmaps save for the first handful.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/free-space-cache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 665f6eb6c828..6d74d96a1d13 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2107,8 +2107,8 @@ static bool use_bitmap(struct btrfs_free_space_ctl *ctl,
		 * of cache left then go ahead and add them, no sense in adding
 		 * the overhead of a bitmap if we don't have to.
 		 */
-		if (info->bytes <= fs_info->sectorsize * 4) {
-			if (ctl->free_extents * 2 <= ctl->extents_thresh)
+		if (info->bytes <= fs_info->sectorsize * 8) {
+			if (ctl->free_extents * 3 <= ctl->extents_thresh)
 				return false;
 		} else {
 			return false;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread
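
Collapsed into one predicate, the decision now looks roughly like the sketch
below; the forced-bitmap debug knob is omitted, the FORCE_EXTENT_THRESHOLD
check from the previous patch is folded in, and the names are illustrative.
With a 4KiB sectorsize the small-extent cutoff moves from 16KiB to 32KiB,
and such extents stay out of bitmaps only while less than a third of the
extent budget is in use, where it was previously half:

#include <stdbool.h>

/* Should a newly freed chunk go into a bitmap rather than an extent entry? */
static bool use_bitmap(unsigned long long bytes, unsigned long long sectorsize,
		       unsigned long long free_extents,
		       unsigned long long extents_thresh)
{
	if (bytes >= 1024 * 1024)	/* FORCE_EXTENT_THRESHOLD, patch 09 */
		return false;
	if (free_extents < extents_thresh) {
		if (bytes <= sectorsize * 8)		  /* was * 4 */
			return free_extents * 3 > extents_thresh; /* was * 2 */
		return false;	/* larger extents stay extent entries */
	}
	return true;
}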

* [PATCH 11/12] btrfs: ensure removal of discardable_* in free_bitmap()
  2020-01-02 21:26 [PATCH 00/12] btrfs: async discard follow up Dennis Zhou
                   ` (9 preceding siblings ...)
  2020-01-02 21:26 ` [PATCH 10/12] btrfs: make smaller extents more likely to go into bitmaps Dennis Zhou
@ 2020-01-02 21:26 ` Dennis Zhou
  2020-01-02 21:26 ` [PATCH 12/12] btrfs: add correction to handle -1 edge case in async discard Dennis Zhou
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Dennis Zhou @ 2020-01-02 21:26 UTC (permalink / raw)
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Most callers of free_bitmap() only call it if bitmap_info->bytes is 0.
However, there are certain cases where we may free the free space cache
via __btrfs_remove_free_space_cache(). This exposes a path where
free_bitmap() is called regardless. This may result in a bad accounting
situation for discardable_bytes and discardable_extents. So, remove the
stats and call btrfs_discard_update_discardable().

Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 fs/btrfs/free-space-cache.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 6d74d96a1d13..2b294c57060c 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1959,6 +1959,17 @@ static void add_new_bitmap(struct btrfs_free_space_ctl *ctl,
 static void free_bitmap(struct btrfs_free_space_ctl *ctl,
 			struct btrfs_free_space *bitmap_info)
 {
+	/*
+	 * Normally when this is called, the bitmap is fully empty.  However,
+	 * if we are blowing up the free space cache for one reason or another
+	 * via __btrfs_remove_free_space_cache(), then it may not be freed and
+	 * we may leave stats on the table.
+	 */
+	if (bitmap_info->bytes && !btrfs_free_space_trimmed(bitmap_info)) {
+		ctl->discardable_extents[BTRFS_STAT_CURR] -=
+			bitmap_info->bitmap_extents;
+		ctl->discardable_bytes[BTRFS_STAT_CURR] -= bitmap_info->bytes;
+	}
 	unlink_free_space(ctl, bitmap_info);
 	kmem_cache_free(btrfs_free_space_bitmap_cachep, bitmap_info->bitmap);
 	kmem_cache_free(btrfs_free_space_cachep, bitmap_info);
@@ -2776,6 +2788,8 @@ void __btrfs_remove_free_space_cache(struct btrfs_free_space_ctl *ctl)
 {
 	spin_lock(&ctl->tree_lock);
 	__btrfs_remove_free_space_cache_locked(ctl);
+	if (ctl->private)
+		btrfs_discard_update_discardable(ctl->private, ctl);
 	spin_unlock(&ctl->tree_lock);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread
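
The accounting rule the first hunk enforces, pulled out as a hedged sketch
with illustrative types: a non-empty, untrimmed bitmap being freed must hand
back its contribution to both counters, otherwise the stats leak exactly as
described above.

#include <stdbool.h>

struct discard_stats {
	long long extents;
	long long bytes;
};

/* A bitmap only contributes to the stats while non-empty and untrimmed. */
static void account_bitmap_free(struct discard_stats *stats,
				unsigned long long bytes,
				long long bitmap_extents, bool trimmed)
{
	if (bytes && !trimmed) {
		stats->extents -= bitmap_extents;
		stats->bytes -= bytes;
	}
}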

* [PATCH 12/12] btrfs: add correction to handle -1 edge case in async discard
  2020-01-02 21:26 [PATCH 00/12] btrfs: async discard follow up Dennis Zhou
                   ` (10 preceding siblings ...)
  2020-01-02 21:26 ` [PATCH 11/12] btrfs: ensure removal of discardable_* in free_bitmap() Dennis Zhou
@ 2020-01-02 21:26 ` Dennis Zhou
  2020-01-03 14:42   ` David Sterba
  2020-01-05 20:35   ` Nikolay Borisov
  2020-01-03 14:51 ` [PATCH 00/12] btrfs: async discard follow up David Sterba
                   ` (2 subsequent siblings)
  14 siblings, 2 replies; 28+ messages in thread
From: Dennis Zhou @ 2020-01-02 21:26 UTC (permalink / raw)
  To: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs, Dennis Zhou

From Dave's testing, it's possible to drive a file system to have -1
discardable_extents and a corresponding negative discardable_bytes. As
btrfs_discard_calc_delay() is the only user of discardable_extents, we
can correct here for any negative discardable_extents/discardable_bytes.

Reported-by: David Sterba <dsterba@suse.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 fs/btrfs/discard.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index d5a89e3755ed..d2c7851e31de 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -518,14 +518,32 @@ void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl)
 {
 	s32 discardable_extents =
 		atomic_read(&discard_ctl->discardable_extents);
+	s64 discardable_bytes = atomic64_read(&discard_ctl->discardable_bytes);
 	unsigned iops_limit;
 	unsigned long delay, lower_limit = BTRFS_DISCARD_MIN_DELAY_MSEC;
 
-	if (!discardable_extents)
-		return;
-
 	spin_lock(&discard_ctl->lock);
 
+	/*
+	 * The following is to fix a potential -1 discrepancy that I'm not
+	 * sure how to reproduce.  But given that this is the only place that
+	 * utilizes these numbers and this is only called from
+	 * btrfs_finish_extent_commit() which is synchronized, we can correct
+	 * here.
+	 */
+	if (discardable_extents < 0)
+		atomic_add(-discardable_extents,
+			   &discard_ctl->discardable_extents);
+
+	if (discardable_bytes < 0)
+		atomic64_add(-discardable_bytes,
+			     &discard_ctl->discardable_bytes);
+
+	if (discardable_extents <= 0) {
+		spin_unlock(&discard_ctl->lock);
+		return;
+	}
+
 	iops_limit = READ_ONCE(discard_ctl->iops_limit);
 	if (iops_limit)
 		lower_limit = max_t(unsigned long, lower_limit,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread
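
The correction pattern in isolation, as a standalone C11 sketch rather than
kernel code: snapshot the possibly negative counter, then add back its
negation so the shared value returns to zero. In the kernel the read and the
add cannot race with another correction because, as the comment notes,
btrfs_finish_extent_commit() is the only caller.

#include <stdatomic.h>
#include <stdio.h>

static atomic_int discardable_extents = -1;	/* the observed bad state */

int main(void)
{
	int snapshot = atomic_load(&discardable_extents);

	if (snapshot < 0)
		atomic_fetch_add(&discardable_extents, -snapshot);

	printf("%d\n", atomic_load(&discardable_extents));	/* prints 0 */
	return 0;
}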

* Re: [PATCH 01/12] btrfs: calculate discard delay based on number of extents
  2020-01-02 21:26 ` [PATCH 01/12] btrfs: calculate discard delay based on number of extents Dennis Zhou
@ 2020-01-03 14:38   ` David Sterba
  0 siblings, 0 replies; 28+ messages in thread
From: David Sterba @ 2020-01-03 14:38 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval,
	kernel-team, linux-btrfs

On Thu, Jan 02, 2020 at 04:26:35PM -0500, Dennis Zhou wrote:
> An earlier patch keeps track of discardable_extents. These are
> undiscarded extents managed by the free space cache. Here, we will use
> this to dynamically calculate the discard delay interval.
> 
> There are 3 rates to consider. The first is the target convergence rate,
> the rate to discard all discardable_extents over the
> BTRFS_DISCARD_TARGET_MSEC time frame. This is clamped by the lower
> limit, the iops limit or BTRFS_DISCARD_MIN_DELAY (1ms), and the upper
> limit, BTRFS_DISCARD_MAX_DELAY (1s). We reevaluate this delay every
> transaction commit.
> 
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/ctree.h       |  2 ++
>  fs/btrfs/discard.c     | 55 +++++++++++++++++++++++++++++++++++++++---
>  fs/btrfs/discard.h     |  1 +
>  fs/btrfs/extent-tree.c |  4 ++-
>  fs/btrfs/sysfs.c       | 31 ++++++++++++++++++++++++
>  5 files changed, 88 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 7c1c236d13ae..c73bbc7e4491 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -468,6 +468,8 @@ struct btrfs_discard_ctl {
>  	struct list_head discard_list[BTRFS_NR_DISCARD_LISTS];
>  	atomic_t discardable_extents;
>  	atomic64_t discardable_bytes;
> +	unsigned long delay;
> +	unsigned iops_limit;

As the kbps_limit uses u32, I switched that to u32 as well.

>  };
>  
>  /* delayed seq elem */
> diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> index 173770bf8a2d..abcc3b2189d1 100644
> --- a/fs/btrfs/discard.c
> +++ b/fs/btrfs/discard.c
> @@ -15,6 +15,12 @@
>  #define BTRFS_DISCARD_DELAY		(120ULL * NSEC_PER_SEC)
>  #define BTRFS_DISCARD_UNUSED_DELAY	(10ULL * NSEC_PER_SEC)
>  
> +/* Target completion latency of discarding all discardable extents */
> +#define BTRFS_DISCARD_TARGET_MSEC	(6 * 60 * 60UL * MSEC_PER_SEC)
> +#define BTRFS_DISCARD_MIN_DELAY_MSEC	(1UL)
> +#define BTRFS_DISCARD_MAX_DELAY_MSEC	(1000UL)
> +#define BTRFS_DISCARD_MAX_IOPS		(10U)
> +
>  static struct list_head *get_discard_list(struct btrfs_discard_ctl *discard_ctl,
>  					  struct btrfs_block_group *block_group)
>  {
> @@ -235,11 +241,18 @@ void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl,
>  
>  	block_group = find_next_block_group(discard_ctl, now);
>  	if (block_group) {
> -		u64 delay = 0;
> +		unsigned long delay = discard_ctl->delay;
> +
> +		/*
> +		 * This timeout is to hopefully prevent immediate discarding
> +		 * in a recently allocated block group.
> +		 */
> +		if (now < block_group->discard_eligible_time) {
> +			u64 bg_timeout = (block_group->discard_eligible_time -
> +					  now);
>  
> -		if (now < block_group->discard_eligible_time)
> -			delay = nsecs_to_jiffies(
> -				block_group->discard_eligible_time - now);
> +			delay = max(delay, nsecs_to_jiffies(bg_timeout));
> +		}
>  
>  		mod_delayed_work(discard_ctl->discard_workers,
>  				 &discard_ctl->work, delay);
> @@ -342,6 +355,38 @@ bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl)
>  		test_bit(BTRFS_FS_DISCARD_RUNNING, &fs_info->flags));
>  }
>  
> +/**
> + * btrfs_discard_calc_delay - recalculate the base delay
> + * @discard_ctl: discard control
> + *
> + * Recalculate the base delay which is based off the total number of
> + * discardable_extents.  Clamp this between the lower_limit (iops_limit or 1ms)
> + * and the upper_limit (BTRFS_DISCARD_MAX_DELAY_MSEC).
> + */
> +void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl)
> +{
> +	s32 discardable_extents =
> +		atomic_read(&discard_ctl->discardable_extents);
> +	unsigned iops_limit;
> +	unsigned long delay, lower_limit = BTRFS_DISCARD_MIN_DELAY_MSEC;
> +
> +	if (!discardable_extents)
> +		return;
> +
> +	spin_lock(&discard_ctl->lock);
> +
> +	iops_limit = READ_ONCE(discard_ctl->iops_limit);
> +	if (iops_limit)
> +		lower_limit = max_t(unsigned long, lower_limit,
> +				    MSEC_PER_SEC / iops_limit);
> +
> +	delay = BTRFS_DISCARD_TARGET_MSEC / discardable_extents;
> +	delay = clamp(delay, lower_limit, BTRFS_DISCARD_MAX_DELAY_MSEC);
> +	discard_ctl->delay = msecs_to_jiffies(delay);
> +
> +	spin_unlock(&discard_ctl->lock);
> +}
> +
>  /**
>   * btrfs_discard_update_discardable - propagate discard counters
>   * @block_group: block_group of interest
> @@ -464,6 +509,8 @@ void btrfs_discard_init(struct btrfs_fs_info *fs_info)
>  
>  	atomic_set(&discard_ctl->discardable_extents, 0);
>  	atomic64_set(&discard_ctl->discardable_bytes, 0);
> +	discard_ctl->delay = BTRFS_DISCARD_MAX_DELAY_MSEC;
> +	discard_ctl->iops_limit = BTRFS_DISCARD_MAX_IOPS;
>  }
>  
>  void btrfs_discard_cleanup(struct btrfs_fs_info *fs_info)
> diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h
> index 0f2f89b1b0b9..5250fe178e49 100644
> --- a/fs/btrfs/discard.h
> +++ b/fs/btrfs/discard.h
> @@ -17,6 +17,7 @@ void btrfs_discard_schedule_work(struct btrfs_discard_ctl *discard_ctl,
>  bool btrfs_run_discard_work(struct btrfs_discard_ctl *discard_ctl);
>  
>  /* Update operations */
> +void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl);
>  void btrfs_discard_update_discardable(struct btrfs_block_group *block_group,
>  				      struct btrfs_free_space_ctl *ctl);
>  
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 2c12366cfde5..0163fdd59f8f 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2935,8 +2935,10 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
>  		cond_resched();
>  	}
>  
> -	if (btrfs_test_opt(fs_info, DISCARD_ASYNC))
> +	if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) {
> +		btrfs_discard_calc_delay(&fs_info->discard_ctl);
>  		btrfs_discard_schedule_work(&fs_info->discard_ctl, true);
> +	}
>  
>  	/*
>  	 * Transaction is finished.  We don't need the lock anymore.  We
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index e9dbdbbbebeb..e175aaf7a1e6 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -344,6 +344,36 @@ static const struct attribute_group btrfs_static_feature_attr_group = {
>   */
>  #define discard_to_fs_info(_kobj)	to_fs_info((_kobj)->parent->parent)
>  
> +static ssize_t btrfs_discard_iops_limit_show(struct kobject *kobj,
> +					     struct kobj_attribute *a,
> +					     char *buf)
> +{
> +	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
> +
> +	return snprintf(buf, PAGE_SIZE, "%u\n",
> +			READ_ONCE(fs_info->discard_ctl.iops_limit));
> +}
> +
> +static ssize_t btrfs_discard_iops_limit_store(struct kobject *kobj,
> +					      struct kobj_attribute *a,
> +					      const char *buf, size_t len)
> +{
> +	struct btrfs_fs_info *fs_info = discard_to_fs_info(kobj);
> +	struct btrfs_discard_ctl *discard_ctl = &fs_info->discard_ctl;
> +	unsigned iops_limit;
> +	int ret;
> +
> +	ret = kstrtouint(buf, 10, &iops_limit);
> +	if (ret)
> +		return -EINVAL;
> +
> +	WRITE_ONCE(discard_ctl->iops_limit, iops_limit);
> +
> +	return len;
> +}
> +BTRFS_ATTR_RW(discard, iops_limit, btrfs_discard_iops_limit_show,
> +	      btrfs_discard_iops_limit_store);
> +
>  static ssize_t btrfs_discardable_extents_show(struct kobject *kobj,
>  					      struct kobj_attribute *a,
>  					      char *buf)
> @@ -367,6 +397,7 @@ static ssize_t btrfs_discardable_bytes_show(struct kobject *kobj,
>  BTRFS_ATTR(discard, discardable_bytes, btrfs_discardable_bytes_show);
>  
>  static const struct attribute *discard_debug_attrs[] = {
> +	BTRFS_ATTR_PTR(discard, iops_limit),
>  	BTRFS_ATTR_PTR(discard, discardable_extents),
>  	BTRFS_ATTR_PTR(discard, discardable_bytes),

I've reordered the callbacks and definitions so they're in alphabetical
order (in the base branch and in all the following patches).

^ permalink raw reply	[flat|nested] 28+ messages in thread
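
For a feel of the numbers in the quoted code: with the 6 hour target
(21,600,000 ms) and 100,000 discardable extents, the raw delay is 216ms; a
million extents would ask for 21ms but the iops_limit of 10 floors it at
100ms; a hundred extents would ask for 216s and get clamped to 1s. A
standalone sketch of the arithmetic, in milliseconds rather than jiffies:

#include <stdio.h>

#define TARGET_MSEC	(6 * 60 * 60UL * 1000UL)
#define MIN_DELAY_MSEC	1UL
#define MAX_DELAY_MSEC	1000UL

static unsigned long calc_delay_msec(unsigned long extents,
				     unsigned int iops_limit)
{
	unsigned long delay, lower = MIN_DELAY_MSEC;

	if (iops_limit && 1000UL / iops_limit > lower)
		lower = 1000UL / iops_limit;	/* one discard per 1/iops sec */

	delay = TARGET_MSEC / extents;
	if (delay < lower)
		delay = lower;
	if (delay > MAX_DELAY_MSEC)
		delay = MAX_DELAY_MSEC;
	return delay;
}

int main(void)
{
	printf("%lu\n", calc_delay_msec(100000, 10));	/* 216 */
	printf("%lu\n", calc_delay_msec(1000000, 10));	/* 100, iops floor */
	printf("%lu\n", calc_delay_msec(100, 10));	/* 1000, clamped */
	return 0;
}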

* Re: [PATCH 02/12] btrfs: add bps discard rate limit for async discard
  2020-01-02 21:26 ` [PATCH 02/12] btrfs: add bps discard rate limit for async discard Dennis Zhou
@ 2020-01-03 14:40   ` David Sterba
  0 siblings, 0 replies; 28+ messages in thread
From: David Sterba @ 2020-01-03 14:40 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval,
	kernel-team, linux-btrfs

On Thu, Jan 02, 2020 at 04:26:36PM -0500, Dennis Zhou wrote:
> +		 */
> +		if (kbps_limit && discard_ctl->prev_discard) {
> +			u64 bps_limit = ((u64)kbps_limit) * SZ_1K;
> +			u64 bps_delay = div64_u64(discard_ctl->prev_discard *
> +						  MSEC_PER_SEC, bps_limit);

I hoped we could get rid of the 64bit division once kbps is u32, but I
don't think it's necessary; bytes as the unit of the calculation are fine.

> +
> +			delay = max(delay, msecs_to_jiffies(bps_delay));
> +		}

^ permalink raw reply	[flat|nested] 28+ messages in thread
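
A worked pass over the quoted hunk: after a 64MiB discard with kbps_limit
set to 100000 (so 102,400,000 bytes per second), the rate limiter asks for a
655ms delay, which is then taken as max() against the extent-count based
delay. A minimal sketch of just the arithmetic:

#include <stdio.h>

int main(void)
{
	unsigned long long prev_discard = 64ULL << 20;	/* bytes just discarded */
	unsigned long long kbps_limit = 100000;		/* sysfs tunable */
	unsigned long long bps_limit = kbps_limit * 1024;
	unsigned long long bps_delay = prev_discard * 1000 / bps_limit;

	printf("%llu ms\n", bps_delay);	/* 655 */
	return 0;
}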

* Re: [PATCH 03/12] btrfs: limit max discard size for async discard
  2020-01-02 21:26 ` [PATCH 03/12] btrfs: limit max discard size " Dennis Zhou
@ 2020-01-03 14:41   ` David Sterba
  0 siblings, 0 replies; 28+ messages in thread
From: David Sterba @ 2020-01-03 14:41 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval,
	kernel-team, linux-btrfs

On Thu, Jan 02, 2020 at 04:26:37PM -0500, Dennis Zhou wrote:
> Throttle the maximum size of a discard so that we can provide an upper
> bound for the rate of async discard. While the block layer is able to
> split discards into the appropriate sized discards, we want to be able
> to account more accurately the rate at which we are consuming ncq slots
> as well as limit the upper bound of work for a discard.
> 
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/discard.h          |  5 +++++
>  fs/btrfs/free-space-cache.c | 41 +++++++++++++++++++++++++++++--------
>  2 files changed, 37 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h
> index 5250fe178e49..562c60fab77a 100644
> --- a/fs/btrfs/discard.h
> +++ b/fs/btrfs/discard.h
> @@ -3,10 +3,15 @@
>  #ifndef BTRFS_DISCARD_H
>  #define BTRFS_DISCARD_H
>  
> +#include <linux/sizes.h>
> +
>  struct btrfs_fs_info;
>  struct btrfs_discard_ctl;
>  struct btrfs_block_group;
>  
> +/* Discard size limits */
> +#define BTRFS_ASYNC_DISCARD_MAX_SIZE	(SZ_64M)
> +
>  /* Work operations */
>  void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl,
>  			       struct btrfs_block_group *block_group);
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index 40fb918a82f4..34291c777998 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -3466,16 +3466,36 @@ static int trim_no_bitmap(struct btrfs_block_group *block_group,
>  		extent_start = entry->offset;
>  		extent_bytes = entry->bytes;
>  		extent_trim_state = entry->trim_state;
> -		start = max(start, extent_start);
> -		bytes = min(extent_start + extent_bytes, end) - start;
> -		if (bytes < minlen) {
> -			spin_unlock(&ctl->tree_lock);
> -			mutex_unlock(&ctl->cache_writeout_mutex);
> -			goto next;
> -		}
> +		if (async) {
> +			start = entry->offset;
> +			bytes = entry->bytes;
> +			if (bytes < minlen) {
> +				spin_unlock(&ctl->tree_lock);
> +				mutex_unlock(&ctl->cache_writeout_mutex);
> +				goto next;
> +			}
> +			unlink_free_space(ctl, entry);
> +			if (bytes > BTRFS_ASYNC_DISCARD_MAX_SIZE) {
> +				bytes = extent_bytes =
> +					BTRFS_ASYNC_DISCARD_MAX_SIZE;

You still left one chain assignment here; fixed, and also in the follow-up
patch that switches to the variable limit.

^ permalink raw reply	[flat|nested] 28+ messages in thread
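
The effect of the cap, modeled in isolation: only up to the maximum discard
size of an oversized extent is discarded per pass, with the remainder left
in the free space cache for a later invocation, so a 150MiB extent goes out
as 64 + 64 + 22 MiB over three passes. A rough sketch:

#include <stdio.h>

#define MAX_DISCARD_SIZE	(64ULL << 20)

int main(void)
{
	unsigned long long bytes = 150ULL << 20;	/* a large free extent */

	while (bytes) {
		unsigned long long step = bytes > MAX_DISCARD_SIZE ?
					  MAX_DISCARD_SIZE : bytes;

		printf("discard %llu MiB\n", step >> 20);
		bytes -= step;	/* remainder waits for the next pass */
	}
	return 0;
}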

* Re: [PATCH 12/12] btrfs: add correction to handle -1 edge case in async discard
  2020-01-02 21:26 ` [PATCH 12/12] btrfs: add correction to handle -1 edge case in async discard Dennis Zhou
@ 2020-01-03 14:42   ` David Sterba
  2020-01-05 20:35   ` Nikolay Borisov
  1 sibling, 0 replies; 28+ messages in thread
From: David Sterba @ 2020-01-03 14:42 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval,
	kernel-team, linux-btrfs

On Thu, Jan 02, 2020 at 04:26:46PM -0500, Dennis Zhou wrote:
> >From Dave's testing, it's possible to drive a file system to have -1
> discardable_extents and a corresponding negative discardable_bytes. As
> btrfs_discard_calc_delay() is the only user of discardable_extents, we
> can correct here for any negative discardable_extents/discardable_bytes.

Changelog updated with the rough description of the workload.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 04/12] btrfs: make max async discard size tunable
  2020-01-02 21:26 ` [PATCH 04/12] btrfs: make max async discard size tunable Dennis Zhou
@ 2020-01-03 14:44   ` David Sterba
  0 siblings, 0 replies; 28+ messages in thread
From: David Sterba @ 2020-01-03 14:44 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval,
	kernel-team, linux-btrfs

On Thu, Jan 02, 2020 at 04:26:38PM -0500, Dennis Zhou wrote:
> -#define BTRFS_ASYNC_DISCARD_MAX_SIZE	(SZ_64M)
> +#define BTRFS_ASYNC_DISCARD_DFL_MAX_SIZE	(SZ_64M)

I'd prefer to spell out DEFAULT, so I changed it to that. The identifier
would be long in both cases, so let's pick the more descriptive version.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/12] btrfs: async discard follow up
  2020-01-02 21:26 [PATCH 00/12] btrfs: async discard follow up Dennis Zhou
                   ` (11 preceding siblings ...)
  2020-01-02 21:26 ` [PATCH 12/12] btrfs: add correction to handle -1 edge case in async discard Dennis Zhou
@ 2020-01-03 14:51 ` David Sterba
  2020-01-03 17:43   ` Dennis Zhou
  2020-01-06 15:25 ` David Sterba
  2020-01-06 16:30 ` David Sterba
  14 siblings, 1 reply; 28+ messages in thread
From: David Sterba @ 2020-01-03 14:51 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval,
	kernel-team, linux-btrfs

On Thu, Jan 02, 2020 at 04:26:34PM -0500, Dennis Zhou wrote:
> Hello,
> 
> Dave applied 1-12 from v6 [1]. This is a follow up cleaning up the
> remaining 10 patches adding 2 more to deal with a rare -1 [2] that I
> haven't quite figured out how to repro. This is also available at [3].
> 
> This series is on top of btrfs-devel#misc-next-with-discard-v6 0c7be920bd7d.
> 
> [1] https://lore.kernel.org/linux-btrfs/cover.1576195673.git.dennis@kernel.org/
> [2] https://lore.kernel.org/linux-btrfs/20191217145541.GE3929@suse.cz/
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/dennis/misc.git/log/?h=async-discard
> 
> Dennis Zhou (12):
>   btrfs: calculate discard delay based on number of extents
>   btrfs: add bps discard rate limit for async discard
>   btrfs: limit max discard size for async discard
>   btrfs: make max async discard size tunable
>   btrfs: have multiple discard lists
>   btrfs: only keep track of data extents for async discard
>   btrfs: keep track of discard reuse stats
>   btrfs: add async discard header
>   btrfs: increase the metadata allowance for the free_space_cache
>   btrfs: make smaller extents more likely to go into bitmaps
>   btrfs: ensure removal of discardable_* in free_bitmap()
>   btrfs: add correction to handle -1 edge case in async discard

Besides the changes posted to the patches, I did more style cleanups and
formatting adjustments as I went through the patches. I'll do some
testing again to be sure there are no bugs introduced by that, but
otherwise the patchset can be considered merged to misc-next. I'll push
the branch today.

It's a lot of new code but I was able to comprehend what's going on;
it's great that there's the patch adding an implementation overview.
As the feature is not on by default and requires "special" hardware, it
should be safe. Basic tests passed, so now we're left with the hard bugs
and corner cases. Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/12] btrfs: async discard follow up
  2020-01-03 14:51 ` [PATCH 00/12] btrfs: async discard follow up David Sterba
@ 2020-01-03 17:43   ` Dennis Zhou
  0 siblings, 0 replies; 28+ messages in thread
From: Dennis Zhou @ 2020-01-03 17:43 UTC (permalink / raw)
  To: David Sterba
  Cc: Dennis Zhou, David Sterba, Chris Mason, Josef Bacik,
	Omar Sandoval, kernel-team, linux-btrfs

On Fri, Jan 03, 2020 at 03:51:25PM +0100, David Sterba wrote:
> On Thu, Jan 02, 2020 at 04:26:34PM -0500, Dennis Zhou wrote:
> > Hello,
> > 
> > Dave applied 1-12 from v6 [1]. This is a follow up cleaning up the
> > remaining 10 patches adding 2 more to deal with a rare -1 [2] that I
> > haven't quite figured out how to repro. This is also available at [3].
> > 
> > This series is on top of btrfs-devel#misc-next-with-discard-v6 0c7be920bd7d.
> > 
> > [1] https://lore.kernel.org/linux-btrfs/cover.1576195673.git.dennis@kernel.org/
> > [2] https://lore.kernel.org/linux-btrfs/20191217145541.GE3929@suse.cz/
> > [3] https://git.kernel.org/pub/scm/linux/kernel/git/dennis/misc.git/log/?h=async-discard
> > 
> > Dennis Zhou (12):
> >   btrfs: calculate discard delay based on number of extents
> >   btrfs: add bps discard rate limit for async discard
> >   btrfs: limit max discard size for async discard
> >   btrfs: make max async discard size tunable
> >   btrfs: have multiple discard lists
> >   btrfs: only keep track of data extents for async discard
> >   btrfs: keep track of discard reuse stats
> >   btrfs: add async discard header
> >   btrfs: increase the metadata allowance for the free_space_cache
> >   btrfs: make smaller extents more likely to go into bitmaps
> >   btrfs: ensure removal of discardable_* in free_bitmap()
> >   btrfs: add correction to handle -1 edge case in async discard
> 
> Besides the changes posted to the patches, I did more style cleanups and
> formatting adjustments as I went through the patches. I'll do some
> testing again to be sure there are no bugs introduced by that, but
> otherwise the patchset can be considered merged to misc-next. I'll push
> the branch today.
> 
> It's a lot of new code but I was able to comprehend what's going on,
> great that there's the patch adding implementation overview.
> As the feature is not on by default and requires "special" hardware, it
> should be safe, basisc tests passed so now we're left with the hard bugs
> and corner cases. Thanks.

Ah I apologize for the few misses. Thanks for fixing them and taking
this series! It definitely wasn't an easy series, so I appreciate the
help and patience!

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 12/12] btrfs: add correction to handle -1 edge case in async discard
  2020-01-02 21:26 ` [PATCH 12/12] btrfs: add correction to handle -1 edge case in async discard Dennis Zhou
  2020-01-03 14:42   ` David Sterba
@ 2020-01-05 20:35   ` Nikolay Borisov
  2020-01-06 13:44     ` David Sterba
  1 sibling, 1 reply; 28+ messages in thread
From: Nikolay Borisov @ 2020-01-05 20:35 UTC (permalink / raw)
  To: Dennis Zhou, David Sterba, Chris Mason, Josef Bacik, Omar Sandoval
  Cc: kernel-team, linux-btrfs



On 2.01.20 at 23:26, Dennis Zhou wrote:
> From Dave's testing, it's possible to drive a file system to have -1
> discardable_extents and a corresponding negative discardable_bytes. As
> btrfs_discard_calc_delay() is the only user of discardable_extents, we
> can correct here for any negative discardable_extents/discardable_bytes.
> 
> Reported-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> ---
>  fs/btrfs/discard.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> index d5a89e3755ed..d2c7851e31de 100644
> --- a/fs/btrfs/discard.c
> +++ b/fs/btrfs/discard.c
> @@ -518,14 +518,32 @@ void btrfs_discard_calc_delay(struct btrfs_discard_ctl *discard_ctl)
>  {
>  	s32 discardable_extents =
>  		atomic_read(&discard_ctl->discardable_extents);
> +	s64 discardable_bytes = atomic64_read(&discard_ctl->discardable_bytes);
>  	unsigned iops_limit;
>  	unsigned long delay, lower_limit = BTRFS_DISCARD_MIN_DELAY_MSEC;
>  
> -	if (!discardable_extents)
> -		return;
> -
>  	spin_lock(&discard_ctl->lock);
>  
> +	/*
> +	 * The following is to fix a potential -1 discrepancy that I'm not
> +	 * sure how to reproduce.  But given that this is the only place that
> +	 * utilizes these numbers and this is only called from
> +	 * btrfs_finish_extent_commit() which is synchronized, we can correct
> +	 * here.
> +	 */
> +	if (discardable_extents < 0)
> +		atomic_add(-discardable_extents,
> +			   &discard_ctl->discardable_extents);
> +
> +	if (discardable_bytes < 0)
> +		atomic64_add(-discardable_bytes,
> +			     &discard_ctl->discardable_bytes);
> +
> +	if (discardable_extents <= 0) {
> +		spin_unlock(&discard_ctl->lock);
> +		return;
> +	}

Perhaps a WARN_ON for each of those conditions? AFAIU this is papering
over a real issue which is still not fully diagnosed, no? In that case,
if someone hits it in the wild they could come back with some stack traces.

> +
>  	iops_limit = READ_ONCE(discard_ctl->iops_limit);
>  	if (iops_limit)
>  		lower_limit = max_t(unsigned long, lower_limit,
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread
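
One way to phrase the suggestion against the quoted hunk, as a fragment
rather than a standalone program: WARN_ON_ONCE() evaluates to its condition,
so the correction itself is unchanged, and the _ONCE variant keeps a
recurring imbalance from flooding the log on every commit.

	if (WARN_ON_ONCE(discardable_extents < 0))
		atomic_add(-discardable_extents,
			   &discard_ctl->discardable_extents);

	if (WARN_ON_ONCE(discardable_bytes < 0))
		atomic64_add(-discardable_bytes,
			     &discard_ctl->discardable_bytes);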

* Re: [PATCH 12/12] btrfs: add correction to handle -1 edge case in async discard
  2020-01-05 20:35   ` Nikolay Borisov
@ 2020-01-06 13:44     ` David Sterba
  0 siblings, 0 replies; 28+ messages in thread
From: David Sterba @ 2020-01-06 13:44 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Dennis Zhou, David Sterba, Chris Mason, Josef Bacik,
	Omar Sandoval, kernel-team, linux-btrfs

On Sun, Jan 05, 2020 at 10:35:56PM +0200, Nikolay Borisov wrote:
> > +	/*
> > +	 * The following is to fix a potential -1 discrepancy that I'm not
> > +	 * sure how to reproduce.  But given that this is the only place that
> > +	 * utilizes these numbers and this is only called from
> > +	 * btrfs_finish_extent_commit() which is synchronized, we can correct
> > +	 * here.
> > +	 */
> > +	if (discardable_extents < 0)
> > +		atomic_add(-discardable_extents,
> > +			   &discard_ctl->discardable_extents);
> > +
> > +	if (discardable_bytes < 0)
> > +		atomic64_add(-discardable_bytes,
> > +			     &discard_ctl->discardable_bytes);
> > +
> > +	if (discardable_extents <= 0) {
> > +		spin_unlock(&discard_ctl->lock);
> > +		return;
> > +	}
> 
> Perhaps a WARN_ON for each of those conditions? AFAIU this is papering
> over a real issue which is still not fully diagnosed, no? In this case
> if someone hits it in the wild they could come back with some stack traces?

I don't think the stacktrace itself would help us; the call chain will
always be the same. We need a reproducer for it, and random user reports
would likely not help either, beyond confirming that the issue happens.
Some sort of developer-only warning would be desirable though.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/12] btrfs: async discard follow up
  2020-01-02 21:26 [PATCH 00/12] btrfs: async discard follow up Dennis Zhou
                   ` (12 preceding siblings ...)
  2020-01-03 14:51 ` [PATCH 00/12] btrfs: async discard follow up David Sterba
@ 2020-01-06 15:25 ` David Sterba
  2020-01-06 17:14   ` Dennis Zhou
  2020-01-06 16:30 ` David Sterba
  14 siblings, 1 reply; 28+ messages in thread
From: David Sterba @ 2020-01-06 15:25 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval,
	kernel-team, linux-btrfs

On Thu, Jan 02, 2020 at 04:26:34PM -0500, Dennis Zhou wrote:
> Dave applied 1-12 from v6 [1]. This is a follow up cleaning up the
> remaining 10 patches adding 2 more to deal with a rare -1 [2] that I
> haven't quite figured out how to repro. This is also available at [3].
> 
> This series is on top of btrfs-devel#misc-next-with-discard-v6 0c7be920bd7d.
> 
> [1] https://lore.kernel.org/linux-btrfs/cover.1576195673.git.dennis@kernel.org/
> [2] https://lore.kernel.org/linux-btrfs/20191217145541.GE3929@suse.cz/
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/dennis/misc.git/log/?h=async-discard
> 
> Dennis Zhou (12):
>   btrfs: calculate discard delay based on number of extents
>   btrfs: add bps discard rate limit for async discard
>   btrfs: limit max discard size for async discard
>   btrfs: make max async discard size tunable
>   btrfs: have multiple discard lists
>   btrfs: only keep track of data extents for async discard
>   btrfs: keep track of discard reuse stats
>   btrfs: add async discard header
>   btrfs: increase the metadata allowance for the free_space_cache
>   btrfs: make smaller extents more likely to go into bitmaps
>   btrfs: ensure removal of discardable_* in free_bitmap()
>   btrfs: add correction to handle -1 edge case in async discard

I found this lockdep warning on the machine but can't tell what was the
exact load at the time. I did a few copy/delete/balance and git checkout
rounds, similar to the first testing loads. The branch tested was
basically current misc-next:

[251925.229374] ======================================================
[251925.235828] WARNING: possible circular locking dependency detected
[251925.242275] 5.5.0-rc4-1.ge195904-vanilla+ #553 Not tainted
[251925.248012] ------------------------------------------------------
[251925.254430] git/1531 is trying to acquire lock:
[251925.259191] ffff9f93e120e690 (&fs_info->reloc_mutex){+.+.}, at: btrfs_record_root_in_trans+0x44/0x70 [btrfs]
[251925.269419] 
[251925.269419] but task is already holding lock:
[251925.275629] ffff9f93df17bb38 (&mm->mmap_sem#2){++++}, at: vm_mmap_pgoff+0x71/0x100
[251925.283505] 
[251925.283505] which lock already depends on the new lock.
[251925.283505] 
[251925.292221] 
[251925.292221] the existing dependency chain (in reverse order) is:
[251925.300077] 
[251925.300077] -> #5 (&mm->mmap_sem#2){++++}:
[251925.306035]        __lock_acquire+0x3de/0x810
[251925.310612]        lock_acquire+0x95/0x1b0
[251925.314931]        __might_fault+0x60/0x80
[251925.319245]        _copy_from_user+0x1e/0xa0
[251925.323734]        get_sg_io_hdr+0xbb/0xf0
[251925.328053]        scsi_cmd_ioctl+0x213/0x430
[251925.332635]        cdrom_ioctl+0x3c/0x1499 [cdrom]
[251925.337657]        sr_block_ioctl+0xa0/0xd0 [sr_mod]
[251925.342845]        blkdev_ioctl+0x334/0xb70
[251925.347255]        block_ioctl+0x3f/0x50
[251925.351398]        do_vfs_ioctl+0x580/0x7b0
[251925.355790]        ksys_ioctl+0x3a/0x70
[251925.359845]        __x64_sys_ioctl+0x16/0x20
[251925.364347]        do_syscall_64+0x56/0x220
[251925.368750]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[251925.374538] 
[251925.374538] -> #4 (sr_mutex){+.+.}:
[251925.379877]        __lock_acquire+0x3de/0x810
[251925.384453]        lock_acquire+0x95/0x1b0
[251925.388768]        __mutex_lock+0xa0/0xb40
[251925.393074]        sr_block_open+0x9d/0x170 [sr_mod]
[251925.398254]        __blkdev_get+0xed/0x580
[251925.402574]        do_dentry_open+0x13c/0x3b0
[251925.407152]        do_last+0x18a/0x900
[251925.411117]        path_openat+0xa5/0x2b0
[251925.417191]        do_filp_open+0x91/0x100
[251925.421511]        do_sys_open+0x184/0x220
[251925.425817]        do_syscall_64+0x56/0x220
[251925.430220]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[251925.436008] 
[251925.436008] -> #3 (&bdev->bd_mutex){+.+.}:
[251925.441975]        __lock_acquire+0x3de/0x810
[251925.446538]        lock_acquire+0x95/0x1b0
[251925.450854]        __mutex_lock+0xa0/0xb40
[251925.455162]        __blkdev_get+0x7a/0x580
[251925.459488]        blkdev_get+0x78/0x150
[251925.463630]        blkdev_get_by_path+0x46/0x80
[251925.468413]        btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
[251925.474170]        open_fs_devices+0x77/0x280 [btrfs]
[251925.479482]        btrfs_open_devices+0x92/0xa0 [btrfs]
[251925.484967]        btrfs_mount_root+0x1fa/0x580 [btrfs]
[251925.490413]        legacy_get_tree+0x30/0x60
[251925.494899]        vfs_get_tree+0x23/0xb0
[251925.499139]        fc_mount+0xe/0x40
[251925.502933]        vfs_kern_mount.part.0+0x71/0x90
[251925.507981]        btrfs_mount+0x147/0x3e0 [btrfs]
[251925.512989]        legacy_get_tree+0x30/0x60
[251925.517477]        vfs_get_tree+0x23/0xb0
[251925.521707]        do_mount+0x7a8/0xa50
[251925.525751]        __x64_sys_mount+0x8e/0xd0
[251925.530243]        do_syscall_64+0x56/0x220
[251925.534646]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[251925.540427] 
[251925.540427] -> #2 (&fs_devs->device_list_mutex){+.+.}:
[251925.547434]        __lock_acquire+0x3de/0x810
[251925.552013]        lock_acquire+0x95/0x1b0
[251925.556328]        __mutex_lock+0xa0/0xb40
[251925.560677]        btrfs_run_dev_stats+0x35/0x90 [btrfs]
[251925.566243]        commit_cowonly_roots+0xb5/0x310 [btrfs]
[251925.571974]        btrfs_commit_transaction+0x508/0xb20 [btrfs]
[251925.578123]        sync_filesystem+0x74/0x90
[251925.582617]        generic_shutdown_super+0x22/0x100
[251925.587790]        kill_anon_super+0x14/0x30
[251925.592308]        btrfs_kill_super+0x12/0xa0 [btrfs]
[251925.597576]        deactivate_locked_super+0x31/0x70
[251925.602758]        cleanup_mnt+0x100/0x160
[251925.607074]        task_work_run+0x93/0xc0
[251925.611399]        exit_to_usermode_loop+0x9f/0xb0
[251925.616397]        do_syscall_64+0x1f0/0x220
[251925.620896]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[251925.626677] 
[251925.626677] -> #1 (&fs_info->tree_log_mutex){+.+.}:
[251925.633423]        __lock_acquire+0x3de/0x810
[251925.637997]        lock_acquire+0x95/0x1b0
[251925.642303]        __mutex_lock+0xa0/0xb40
[251925.646652]        btrfs_commit_transaction+0x4b0/0xb20 [btrfs]
[251925.652784]        sync_filesystem+0x74/0x90
[251925.657259]        generic_shutdown_super+0x22/0x100
[251925.662445]        kill_anon_super+0x14/0x30
[251925.666964]        btrfs_kill_super+0x12/0xa0 [btrfs]
[251925.672231]        deactivate_locked_super+0x31/0x70
[251925.677402]        cleanup_mnt+0x100/0x160
[251925.681716]        task_work_run+0x93/0xc0
[251925.686026]        exit_to_usermode_loop+0x9f/0xb0
[251925.691037]        do_syscall_64+0x1f0/0x220
[251925.695532]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[251925.701313] 
[251925.701313] -> #0 (&fs_info->reloc_mutex){+.+.}:
[251925.707789]        check_prev_add+0xa2/0xa90
[251925.712277]        validate_chain+0x6bf/0xbb0
[251925.716844]        __lock_acquire+0x3de/0x810
[251925.721420]        lock_acquire+0x95/0x1b0
[251925.725734]        __mutex_lock+0xa0/0xb40
[251925.730087]        btrfs_record_root_in_trans+0x44/0x70 [btrfs]
[251925.736260]        start_transaction+0xd8/0x590 [btrfs]
[251925.741737]        btrfs_dirty_inode+0x44/0xd0 [btrfs]
[251925.747103]        touch_atime+0xbe/0xe0
[251925.751268]        btrfs_file_mmap+0x3f/0x60 [btrfs]
[251925.756456]        mmap_region+0x3f7/0x680
[251925.760786]        do_mmap+0x36d/0x520
[251925.764744]        vm_mmap_pgoff+0x9d/0x100
[251925.769147]        ksys_mmap_pgoff+0x1a3/0x290
[251925.773812]        do_syscall_64+0x56/0x220
[251925.778215]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[251925.784005] 
[251925.784005] other info that might help us debug this:
[251925.784005] 
[251925.792546] Chain exists of:
[251925.792546]   &fs_info->reloc_mutex --> sr_mutex --> &mm->mmap_sem#2
[251925.792546] 
[251925.803779]  Possible unsafe locking scenario:
[251925.803779] 
[251925.810075]        CPU0                    CPU1
[251925.814820]        ----                    ----
[251925.819570]   lock(&mm->mmap_sem#2);
[251925.823354]                                lock(sr_mutex);
[251925.829059]                                lock(&mm->mmap_sem#2);
[251925.835363]   lock(&fs_info->reloc_mutex);
[251925.839674] 
[251925.839674]  *** DEADLOCK ***
[251925.839674] 
[251925.846136] 3 locks held by git/1531:
[251925.850015]  #0: ffff9f93df17bb38 (&mm->mmap_sem#2){++++}, at: vm_mmap_pgoff+0x71/0x100
[251925.858298]  #1: ffff9f93d6e3c488 (sb_writers#9){.+.+}, at: touch_atime+0x64/0xe0
[251925.866070]  #2: ffff9f93d6e3c6e8 (sb_internal#2){.+.+}, at: start_transaction+0x3e4/0x590 [btrfs]
[251925.875366] 
[251925.875366] stack backtrace:
[251925.880100] CPU: 5 PID: 1531 Comm: git Not tainted 5.5.0-rc4-1.ge195904-vanilla+ #553
[251925.888219] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
[251925.894971] Call Trace:
[251925.897642]  dump_stack+0x71/0xa0
[251925.901178]  check_noncircular+0x177/0x190
[251925.905498]  check_prev_add+0xa2/0xa90
[251925.909481]  ? sched_clock_cpu+0x15/0x130
[251925.913710]  validate_chain+0x6bf/0xbb0
[251925.917772]  ? _raw_spin_unlock+0x1f/0x40
[251925.921995]  __lock_acquire+0x3de/0x810
[251925.926052]  lock_acquire+0x95/0x1b0
[251925.929890]  ? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
[251925.935685]  __mutex_lock+0xa0/0xb40
[251925.939518]  ? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
[251925.945348]  ? join_transaction+0x3f4/0x4b0 [btrfs]
[251925.950461]  ? sched_clock+0x5/0x10
[251925.954205]  ? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
[251925.960045]  ? join_transaction+0x3f4/0x4b0 [btrfs]
[251925.965196]  ? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
[251925.971015]  btrfs_record_root_in_trans+0x44/0x70 [btrfs]
[251925.976685]  start_transaction+0xd8/0x590 [btrfs]
[251925.981624]  ? ktime_get_coarse_real_ts64+0x70/0xd0
[251925.986781]  btrfs_dirty_inode+0x44/0xd0 [btrfs]
[251925.991624]  touch_atime+0xbe/0xe0
[251925.995271]  btrfs_file_mmap+0x3f/0x60 [btrfs]
[251925.999935]  mmap_region+0x3f7/0x680
[251926.003728]  do_mmap+0x36d/0x520
[251926.007180]  vm_mmap_pgoff+0x9d/0x100
[251926.011071]  ksys_mmap_pgoff+0x1a3/0x290
[251926.015216]  do_syscall_64+0x56/0x220
[251926.019095]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[251926.024369] RIP: 0033:0x7f3d80959aca
[251926.028171] Code: 00 64 c7 00 13 00 00 00 eb d6 89 c3 e9 77 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 09 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8e 63 2b 00 f7 d8 64 89 01 48
[251926.047306] RSP: 002b:00007ffcb8fa26c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000009
[251926.055162] RAX: ffffffffffffffda RBX: 0000000000819e10 RCX: 00007f3d80959aca
[251926.062590] RDX: 0000000000000001 RSI: 0000000000819e10 RDI: 0000000000000000
[251926.070006] RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
[251926.077424] R10: 0000000000000002 R11: 0000000000000202 R12: 0000000000000001
[251926.084853] R13: 0000000000000002 R14: 0000000000000003 R15: 0000000000000000

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/12] btrfs: async discard follow up
  2020-01-02 21:26 [PATCH 00/12] btrfs: async discard follow up Dennis Zhou
                   ` (13 preceding siblings ...)
  2020-01-06 15:25 ` David Sterba
@ 2020-01-06 16:30 ` David Sterba
  2020-01-06 17:28   ` Dennis Zhou
  14 siblings, 1 reply; 28+ messages in thread
From: David Sterba @ 2020-01-06 16:30 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval,
	kernel-team, linux-btrfs

Is it expected to leave the counters in a state where there are
discardable extents but they are not processed after a long period of
time? I found

discard_bitmap_bytes:316833792
discard_bytes_saved:59390722048
discard_extent_bytes:26122764288
discardable_bytes:44863488
discardable_extents:883
iops_limit:10
kbps_limit:0
max_discard_size:67108864

there was activity when the number of extents went from about 2000 to
that value (883), so this could be another instance of the -1 accounting
bug.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/12] btrfs: async discard follow up
  2020-01-06 15:25 ` David Sterba
@ 2020-01-06 17:14   ` Dennis Zhou
  2020-01-06 17:37     ` David Sterba
  0 siblings, 1 reply; 28+ messages in thread
From: Dennis Zhou @ 2020-01-06 17:14 UTC (permalink / raw)
  To: David Sterba
  Cc: Dennis Zhou, David Sterba, Chris Mason, Josef Bacik,
	Omar Sandoval, kernel-team, linux-btrfs

On Mon, Jan 06, 2020 at 04:25:42PM +0100, David Sterba wrote:
> On Thu, Jan 02, 2020 at 04:26:34PM -0500, Dennis Zhou wrote:
> > Dave applied 1-12 from v6 [1]. This is a follow up cleaning up the
> > remaining 10 patches adding 2 more to deal with a rare -1 [2] that I
> > haven't quite figured out how to repro. This is also available at [3].
> > 
> > This series is on top of btrfs-devel#misc-next-with-discard-v6 0c7be920bd7d.
> > 
> > [1] https://lore.kernel.org/linux-btrfs/cover.1576195673.git.dennis@kernel.org/
> > [2] https://lore.kernel.org/linux-btrfs/20191217145541.GE3929@suse.cz/
> > [3] https://git.kernel.org/pub/scm/linux/kernel/git/dennis/misc.git/log/?h=async-discard
> > 
> > Dennis Zhou (12):
> >   btrfs: calculate discard delay based on number of extents
> >   btrfs: add bps discard rate limit for async discard
> >   btrfs: limit max discard size for async discard
> >   btrfs: make max async discard size tunable
> >   btrfs: have multiple discard lists
> >   btrfs: only keep track of data extents for async discard
> >   btrfs: keep track of discard reuse stats
> >   btrfs: add async discard header
> >   btrfs: increase the metadata allowance for the free_space_cache
> >   btrfs: make smaller extents more likely to go into bitmaps
> >   btrfs: ensure removal of discardable_* in free_bitmap()
> >   btrfs: add correction to handle -1 edge case in async discard
> 
> I found this lockdep warning on the machine but can't tell what was the
> exact load at the time. I did a few copy/delete/balance and git checkout
> rounds, similar to the first testing loads. The branch tested was
> basically current misc-next:

I've definitely run into an mmap_sem circular lockdep warning before,
but I believe at the time I was able to repro it without my patches on
top.

Besides that, I'm not sure how my series would be the trigger for this.
I'll take a closer look today.

> [lockdep report snipped; quoted in full above]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/12] btrfs: async discard follow up
  2020-01-06 16:30 ` David Sterba
@ 2020-01-06 17:28   ` Dennis Zhou
  2020-01-06 17:49     ` David Sterba
  0 siblings, 1 reply; 28+ messages in thread
From: Dennis Zhou @ 2020-01-06 17:28 UTC (permalink / raw)
  To: David Sterba
  Cc: David Sterba, Chris Mason, Josef Bacik, Omar Sandoval,
	kernel-team, linux-btrfs

On Mon, Jan 06, 2020 at 05:30:01PM +0100, David Sterba wrote:
> Is it expected to leave the counters in a state where there are
> discardable extents but they are not processed after a long period of
> time? I found
> 
> discard_bitmap_bytes:316833792
> discard_bytes_saved:59390722048
> discard_extent_bytes:26122764288
> discardable_bytes:44863488
> discardable_extents:883
> iops_limit:10
> kbps_limit:0
> max_discard_size:67108864
> 
> there was activity when the number of extents went from about 2000 to
> that value (883), so this could be another instance of the -1
> accounting bug.

There is no guarantee each invocation of the work item will find
something to discard. This was designed to prevent any edge case from
consuming the cpu.

If free space is added back while a block_group's cursor is being moved
(unless the block_group is fully free), we will not go back and trim
those extents. So we may leave stuff untrimmed until the next time
around. This is also to prevent the pathological case of endlessly
resetting the cursor in the same block_group. Therefore, we may be in a
situation where we have discardable extents, but we aren't actively
discarding them. The premise is that some filesystem usage will
eventually occur and kick the block_group back onto the discard list.
This also works because btrfs tries to reuse block groups before
allocating another one.
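
To make that concrete, here is a tiny userspace model of the behavior
(made-up names and structures for illustration only, not the actual
btrfs code): each invocation of the work item trims at most one extent
at or past a per-block_group cursor, so free space re-added behind the
cursor stays untrimmed until the block_group is requeued and the cursor
resets.

#include <stdbool.h>
#include <stdio.h>

#define NR_EXTENTS 3

struct extent {
	unsigned long long offset;
	bool trimmed;
};

struct block_group {
	struct extent extents[NR_EXTENTS];
	unsigned long long cursor;	/* next offset to consider */
};

/* One invocation of the (modeled) discard work item. */
static void discard_workfn(struct block_group *bg)
{
	for (int i = 0; i < NR_EXTENTS; i++) {
		struct extent *ex = &bg->extents[i];

		/* Free space behind the cursor waits for the next pass. */
		if (ex->trimmed || ex->offset < bg->cursor)
			continue;
		ex->trimmed = true;
		bg->cursor = ex->offset + 1;
		printf("trimmed extent at %llu\n", ex->offset);
		return;	/* at most one discard per invocation */
	}
	/* Nothing to do, even though counters may still be nonzero. */
	printf("no work this pass, cursor=%llu\n", bg->cursor);
}

int main(void)
{
	struct block_group bg = {
		.extents = { { 10, false }, { 20, false }, { 30, false } },
		.cursor = 0,
	};

	discard_workfn(&bg);		/* trims 10, cursor -> 11 */
	bg.extents[0].trimmed = false;	/* free space re-added at 10 */
	discard_workfn(&bg);		/* skips 10, trims 20 */
	discard_workfn(&bg);		/* trims 30 */
	discard_workfn(&bg);		/* nothing: 10 sits behind the cursor */
	return 0;
}

The last call finds nothing even though one extent is discardable,
which is the "counters show work but nothing is being discarded" state
described above.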

The -1 case is special because it really has to be that we're blowing up
the block_group with something left on the table. Because of the size,
I'm guessing it's bitmap related, and I added removal of discardable_*
inside free_bitmap().
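
Along the same lines, here is an equally hypothetical sketch (made-up
names, not the actual patch) of the free_bitmap() accounting point: a
bitmap's remaining discardable_* contribution has to be subtracted when
the bitmap is torn down, otherwise the filesystem-wide counters drift
out of sync and later updates can drive them negative.

#include <stdio.h>

struct space_ctl {
	long discardable_extents;	/* filesystem-wide counter */
};

struct bitmap {
	long discardable_extents;	/* still accounted in this bitmap */
};

static void free_bitmap(struct space_ctl *ctl, struct bitmap *bm)
{
	/* Drop this bitmap's contribution before it disappears. */
	ctl->discardable_extents -= bm->discardable_extents;
	bm->discardable_extents = 0;
	/* ... the real code would also unlink and free the bitmap ... */
}

int main(void)
{
	struct space_ctl ctl = { .discardable_extents = 3 };
	struct bitmap bm = { .discardable_extents = 3 };

	free_bitmap(&ctl, &bm);
	/* Skipping the subtraction leaves a stale count behind, and a
	 * later decrement for the same space would underflow to -1. */
	printf("discardable_extents = %ld\n", ctl.discardable_extents);
	return 0;
}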

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/12] btrfs: async discard follow up
  2020-01-06 17:14   ` Dennis Zhou
@ 2020-01-06 17:37     ` David Sterba
  0 siblings, 0 replies; 28+ messages in thread
From: David Sterba @ 2020-01-06 17:37 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, David Sterba, Chris Mason, Josef Bacik,
	Omar Sandoval, kernel-team, linux-btrfs

On Mon, Jan 06, 2020 at 12:14:50PM -0500, Dennis Zhou wrote:
> On Mon, Jan 06, 2020 at 04:25:42PM +0100, David Sterba wrote:
> > On Thu, Jan 02, 2020 at 04:26:34PM -0500, Dennis Zhou wrote:
> > > Dave applied 1-12 from v6 [1]. This is a follow up cleaning up the
> > > remaining 10 patches adding 2 more to deal with a rare -1 [2] that I
> > > haven't quite figured out how to repro. This is also available at [3].
> > > 
> > > This series is on top of btrfs-devel#misc-next-with-discard-v6 0c7be920bd7d.
> > > 
> > > [1] https://lore.kernel.org/linux-btrfs/cover.1576195673.git.dennis@kernel.org/
> > > [2] https://lore.kernel.org/linux-btrfs/20191217145541.GE3929@suse.cz/
> > > [3] https://git.kernel.org/pub/scm/linux/kernel/git/dennis/misc.git/log/?h=async-discard
> > > 
> > > Dennis Zhou (12):
> > >   btrfs: calculate discard delay based on number of extents
> > >   btrfs: add bps discard rate limit for async discard
> > >   btrfs: limit max discard size for async discard
> > >   btrfs: make max async discard size tunable
> > >   btrfs: have multiple discard lists
> > >   btrfs: only keep track of data extents for async discard
> > >   btrfs: keep track of discard reuse stats
> > >   btrfs: add async discard header
> > >   btrfs: increase the metadata allowance for the free_space_cache
> > >   btrfs: make smaller extents more likely to go into bitmaps
> > >   btrfs: ensure removal of discardable_* in free_bitmap()
> > >   btrfs: add correction to handle -1 edge case in async discard
> > 
> > I found this lockdep warning on the machine but can't tell what was the
> > exact load at the time. I did a few copy/delete/balance and git checkout
> > rounds, similar to the first testing loads. The branch tested was
> > basically current misc-next:
> 
> I've definitely run into an mmap_sem circular lockdep warning before,
> but I believe at the time I was able to repro it without my patches on
> top.
> 
> Besides that, I'm not sure how my series would be the trigger for this.
> I'll take a closer look today.

Thanks. I don't remember the exact lockdep report, though I've seen some
transient mmap_sem warnings, but nothing recent. What's weird is the
sr_mutex, which comes from the scsi cdrom driver. There is a cdrom drive
on the machine, so I guess this is machine-specific and not related to
the patchset.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/12] btrfs: async discard follow up
  2020-01-06 17:28   ` Dennis Zhou
@ 2020-01-06 17:49     ` David Sterba
  0 siblings, 0 replies; 28+ messages in thread
From: David Sterba @ 2020-01-06 17:49 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, David Sterba, Chris Mason, Josef Bacik,
	Omar Sandoval, kernel-team, linux-btrfs

On Mon, Jan 06, 2020 at 12:28:47PM -0500, Dennis Zhou wrote:
> On Mon, Jan 06, 2020 at 05:30:01PM +0100, David Sterba wrote:
> > Is it expected to leave the counters in a state where there are
> > discardable extents that are not processed after a long period of
> > time? I found
> > 
> > discard_bitmap_bytes:316833792
> > discard_bytes_saved:59390722048
> > discard_extent_bytes:26122764288
> > discardable_bytes:44863488
> > discardable_extents:883
> > iops_limit:10
> > kbps_limit:0
> > max_discard_size:67108864
> > 
> > there was activity when the number of extents went from about 2000 to
> > that value (883), so this could be another instance of the -1
> > accounting bug.
> 
> There is no guarantee each invocation of the work item will find
> something to discard. This was designed to prevent any edge case from
> consuming the cpu.
> 
> If free space is added back while a block_group's cursor is being
> moved (unless the block_group is fully free), we will not go back and
> trim those extents. So we may leave stuff untrimmed until the next
> time around. This is also to prevent the pathological case of
> endlessly resetting the cursor in the same block_group. Therefore, we
> may be in a situation where we have discardable extents, but we
> aren't actively discarding them. The premise is that some filesystem
> usage will eventually occur and kick the block_group back onto the
> discard list. This also works because btrfs tries to reuse block
> groups before allocating another one.

Ok I see, thanks. Removing all the data again followed by balance
reached a state with 1 extent of some small size (corresponding to the
'used' value of the data block groups), which could be explained by the
above as well.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2020-01-06 17:49 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-02 21:26 [PATCH 00/12] btrfs: async discard follow up Dennis Zhou
2020-01-02 21:26 ` [PATCH 01/12] btrfs: calculate discard delay based on number of extents Dennis Zhou
2020-01-03 14:38   ` David Sterba
2020-01-02 21:26 ` [PATCH 02/12] btrfs: add bps discard rate limit for async discard Dennis Zhou
2020-01-03 14:40   ` David Sterba
2020-01-02 21:26 ` [PATCH 03/12] btrfs: limit max discard size " Dennis Zhou
2020-01-03 14:41   ` David Sterba
2020-01-02 21:26 ` [PATCH 04/12] btrfs: make max async discard size tunable Dennis Zhou
2020-01-03 14:44   ` David Sterba
2020-01-02 21:26 ` [PATCH 05/12] btrfs: have multiple discard lists Dennis Zhou
2020-01-02 21:26 ` [PATCH 06/12] btrfs: only keep track of data extents for async discard Dennis Zhou
2020-01-02 21:26 ` [PATCH 07/12] btrfs: keep track of discard reuse stats Dennis Zhou
2020-01-02 21:26 ` [PATCH 08/12] btrfs: add async discard header Dennis Zhou
2020-01-02 21:26 ` [PATCH 09/12] btrfs: increase the metadata allowance for the free_space_cache Dennis Zhou
2020-01-02 21:26 ` [PATCH 10/12] btrfs: make smaller extents more likely to go into bitmaps Dennis Zhou
2020-01-02 21:26 ` [PATCH 11/12] btrfs: ensure removal of discardable_* in free_bitmap() Dennis Zhou
2020-01-02 21:26 ` [PATCH 12/12] btrfs: add correction to handle -1 edge case in async discard Dennis Zhou
2020-01-03 14:42   ` David Sterba
2020-01-05 20:35   ` Nikolay Borisov
2020-01-06 13:44     ` David Sterba
2020-01-03 14:51 ` [PATCH 00/12] btrfs: async discard follow up David Sterba
2020-01-03 17:43   ` Dennis Zhou
2020-01-06 15:25 ` David Sterba
2020-01-06 17:14   ` Dennis Zhou
2020-01-06 17:37     ` David Sterba
2020-01-06 16:30 ` David Sterba
2020-01-06 17:28   ` Dennis Zhou
2020-01-06 17:49     ` David Sterba
