All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] ext4: avoid sysfs variables overflow causing BUG_ON/SOOB
@ 2024-01-26  8:57 Baokun Li
  2024-01-26  8:57 ` [PATCH 1/7] ext4: avoid overflow when setting values via sysfs Baokun Li
                   ` (6 more replies)
  0 siblings, 7 replies; 31+ messages in thread
From: Baokun Li @ 2024-01-26  8:57 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
	yangerkun, chengzhihao1, yukuai3, libaokun1

Hello everyone,

This patchset is intended to avoid variables that can be modified via sysfs
from overflowing when stored or used and thus causing various problems.

"kvm-xfstests -c ext4/all -g auto" has been executed with no new failures.

Baokun Li (7):
  ext4: avoid overflow when setting values via sysfs
  ext4: refactor out ext4_generic_attr_store()
  ext4: refactor out ext4_generic_attr_show()
  ext4: add positive int attr pointer to avoid sysfs variables overflow
  ext4: fix slab-out-of-bounds in
    ext4_mb_find_good_group_avg_frag_lists()
  ext4: set type of ac_groups_linear_remaining to __u32 to avoid
    overflow
  ext4: set the type of max_zeroout to unsigned int to avoid overflow

 fs/ext4/extents.c |   6 +-
 fs/ext4/mballoc.c |   2 +
 fs/ext4/mballoc.h |   2 +-
 fs/ext4/sysfs.c   | 159 +++++++++++++++++++++++++---------------------
 4 files changed, 92 insertions(+), 77 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/7] ext4: avoid overflow when setting values via sysfs
  2024-01-26  8:57 [PATCH 0/7] ext4: avoid sysfs variables overflow causing BUG_ON/SOOB Baokun Li
@ 2024-01-26  8:57 ` Baokun Li
  2024-01-26  9:28   ` Zhang Yi
  2024-02-13 16:05   ` Jan Kara
  2024-01-26  8:57 ` [PATCH 2/7] ext4: refactor out ext4_generic_attr_store() Baokun Li
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 31+ messages in thread
From: Baokun Li @ 2024-01-26  8:57 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
	yangerkun, chengzhihao1, yukuai3, libaokun1

When setting values of type unsigned int through sysfs, we use kstrtoul()
to parse it and then truncate part of it as the final set value, when the
set value is greater than UINT_MAX, the set value will not match what we
see because of the truncation. As follows:

  $ echo 4294967296 > /sys/fs/ext4/sda/mb_max_linear_groups
  $ cat /sys/fs/ext4/sda/mb_max_linear_groups
    0

So when the value set is outside the variable type range, -EINVAL is
returned to avoid the inconsistency described above. In addition, a
judgment is added to avoid setting s_resv_clusters less than 0.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/sysfs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 6d332dff79dd..3671a8aaf4af 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -104,7 +104,7 @@ static ssize_t reserved_clusters_store(struct ext4_sb_info *sbi,
 	int ret;
 
 	ret = kstrtoull(skip_spaces(buf), 0, &val);
-	if (ret || val >= clusters)
+	if (ret || val >= clusters || (s64)val < 0)
 		return -EINVAL;
 
 	atomic64_set(&sbi->s_resv_clusters, val);
@@ -463,6 +463,8 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
 		ret = kstrtoul(skip_spaces(buf), 0, &t);
 		if (ret)
 			return ret;
+		if (t != (unsigned int)t)
+			return -EINVAL;
 		if (a->attr_ptr == ptr_ext4_super_block_offset)
 			*((__le32 *) ptr) = cpu_to_le32(t);
 		else
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/7] ext4: refactor out ext4_generic_attr_store()
  2024-01-26  8:57 [PATCH 0/7] ext4: avoid sysfs variables overflow causing BUG_ON/SOOB Baokun Li
  2024-01-26  8:57 ` [PATCH 1/7] ext4: avoid overflow when setting values via sysfs Baokun Li
@ 2024-01-26  8:57 ` Baokun Li
  2024-01-26  9:37   ` Zhang Yi
  2024-02-13 16:47   ` Jan Kara
  2024-01-26  8:57 ` [PATCH 3/7] ext4: refactor out ext4_generic_attr_show() Baokun Li
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 31+ messages in thread
From: Baokun Li @ 2024-01-26  8:57 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
	yangerkun, chengzhihao1, yukuai3, libaokun1

Refactor out the function ext4_generic_attr_store() to handle the setting
of values of various common types, with no functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/sysfs.c | 49 ++++++++++++++++++++++++++++---------------------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 3671a8aaf4af..834f9a0eb641 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -443,26 +443,22 @@ static ssize_t ext4_attr_show(struct kobject *kobj,
 	return 0;
 }
 
-static ssize_t ext4_attr_store(struct kobject *kobj,
-			       struct attribute *attr,
-			       const char *buf, size_t len)
+static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
+				       struct ext4_sb_info *sbi,
+				       const char *buf, size_t len)
 {
-	struct ext4_sb_info *sbi = container_of(kobj, struct ext4_sb_info,
-						s_kobj);
-	struct ext4_attr *a = container_of(attr, struct ext4_attr, attr);
-	void *ptr = calc_ptr(a, sbi);
-	unsigned long t;
 	int ret;
+	unsigned long t;
+	void *ptr = calc_ptr(a, sbi);
+
+	if (!ptr)
+		return 0;
+	ret = kstrtoul(skip_spaces(buf), 0, &t);
+	if (ret)
+		return ret;
 
 	switch (a->attr_id) {
-	case attr_reserved_clusters:
-		return reserved_clusters_store(sbi, buf, len);
 	case attr_pointer_ui:
-		if (!ptr)
-			return 0;
-		ret = kstrtoul(skip_spaces(buf), 0, &t);
-		if (ret)
-			return ret;
 		if (t != (unsigned int)t)
 			return -EINVAL;
 		if (a->attr_ptr == ptr_ext4_super_block_offset)
@@ -471,19 +467,30 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
 			*((unsigned int *) ptr) = t;
 		return len;
 	case attr_pointer_ul:
-		if (!ptr)
-			return 0;
-		ret = kstrtoul(skip_spaces(buf), 0, &t);
-		if (ret)
-			return ret;
 		*((unsigned long *) ptr) = t;
 		return len;
+	}
+	return 0;
+}
+
+static ssize_t ext4_attr_store(struct kobject *kobj,
+			       struct attribute *attr,
+			       const char *buf, size_t len)
+{
+	struct ext4_sb_info *sbi = container_of(kobj, struct ext4_sb_info,
+						s_kobj);
+	struct ext4_attr *a = container_of(attr, struct ext4_attr, attr);
+
+	switch (a->attr_id) {
+	case attr_reserved_clusters:
+		return reserved_clusters_store(sbi, buf, len);
 	case attr_inode_readahead:
 		return inode_readahead_blks_store(sbi, buf, len);
 	case attr_trigger_test_error:
 		return trigger_test_error(sbi, buf, len);
+	default:
+		return ext4_generic_attr_store(a, sbi, buf, len);
 	}
-	return 0;
 }
 
 static void ext4_sb_release(struct kobject *kobj)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/7] ext4: refactor out ext4_generic_attr_show()
  2024-01-26  8:57 [PATCH 0/7] ext4: avoid sysfs variables overflow causing BUG_ON/SOOB Baokun Li
  2024-01-26  8:57 ` [PATCH 1/7] ext4: avoid overflow when setting values via sysfs Baokun Li
  2024-01-26  8:57 ` [PATCH 2/7] ext4: refactor out ext4_generic_attr_store() Baokun Li
@ 2024-01-26  8:57 ` Baokun Li
  2024-01-26 10:08   ` Zhang Yi
  2024-02-13 16:44   ` Jan Kara
  2024-01-26  8:57 ` [PATCH 4/7] ext4: add positive int attr pointer to avoid sysfs variables overflow Baokun Li
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 31+ messages in thread
From: Baokun Li @ 2024-01-26  8:57 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
	yangerkun, chengzhihao1, yukuai3, libaokun1

Refactor out the function ext4_generic_attr_show() to handle the reading
of values of various common types, with no functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/sysfs.c | 74 +++++++++++++++++++++----------------------------
 1 file changed, 32 insertions(+), 42 deletions(-)

diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 834f9a0eb641..a5d657fa05cb 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -366,13 +366,42 @@ static ssize_t __print_tstamp(char *buf, __le32 lo, __u8 hi)
 #define print_tstamp(buf, es, tstamp) \
 	__print_tstamp(buf, (es)->tstamp, (es)->tstamp ## _hi)
 
+static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
+				      struct ext4_sb_info *sbi, char *buf)
+{
+	void *ptr = calc_ptr(a, sbi);
+
+	if (!ptr)
+		return 0;
+
+	switch (a->attr_id) {
+	case attr_inode_readahead:
+	case attr_pointer_ui:
+		if (a->attr_ptr == ptr_ext4_super_block_offset)
+			return sysfs_emit(buf, "%u\n", le32_to_cpup(ptr));
+		return sysfs_emit(buf, "%u\n", *((unsigned int *) ptr));
+	case attr_pointer_ul:
+		return sysfs_emit(buf, "%lu\n", *((unsigned long *) ptr));
+	case attr_pointer_u8:
+		return sysfs_emit(buf, "%u\n", *((unsigned char *) ptr));
+	case attr_pointer_u64:
+		if (a->attr_ptr == ptr_ext4_super_block_offset)
+			return sysfs_emit(buf, "%llu\n", le64_to_cpup(ptr));
+		return sysfs_emit(buf, "%llu\n", *((unsigned long long *) ptr));
+	case attr_pointer_string:
+		return sysfs_emit(buf, "%.*s\n", a->attr_size, (char *) ptr);
+	case attr_pointer_atomic:
+		return sysfs_emit(buf, "%d\n", atomic_read((atomic_t *) ptr));
+	}
+	return 0;
+}
+
 static ssize_t ext4_attr_show(struct kobject *kobj,
 			      struct attribute *attr, char *buf)
 {
 	struct ext4_sb_info *sbi = container_of(kobj, struct ext4_sb_info,
 						s_kobj);
 	struct ext4_attr *a = container_of(attr, struct ext4_attr, attr);
-	void *ptr = calc_ptr(a, sbi);
 
 	switch (a->attr_id) {
 	case attr_delayed_allocation_blocks:
@@ -391,45 +420,6 @@ static ssize_t ext4_attr_show(struct kobject *kobj,
 		return sysfs_emit(buf, "%llu\n",
 				(unsigned long long)
 			percpu_counter_sum(&sbi->s_sra_exceeded_retry_limit));
-	case attr_inode_readahead:
-	case attr_pointer_ui:
-		if (!ptr)
-			return 0;
-		if (a->attr_ptr == ptr_ext4_super_block_offset)
-			return sysfs_emit(buf, "%u\n",
-					le32_to_cpup(ptr));
-		else
-			return sysfs_emit(buf, "%u\n",
-					*((unsigned int *) ptr));
-	case attr_pointer_ul:
-		if (!ptr)
-			return 0;
-		return sysfs_emit(buf, "%lu\n",
-				*((unsigned long *) ptr));
-	case attr_pointer_u8:
-		if (!ptr)
-			return 0;
-		return sysfs_emit(buf, "%u\n",
-				*((unsigned char *) ptr));
-	case attr_pointer_u64:
-		if (!ptr)
-			return 0;
-		if (a->attr_ptr == ptr_ext4_super_block_offset)
-			return sysfs_emit(buf, "%llu\n",
-					le64_to_cpup(ptr));
-		else
-			return sysfs_emit(buf, "%llu\n",
-					*((unsigned long long *) ptr));
-	case attr_pointer_string:
-		if (!ptr)
-			return 0;
-		return sysfs_emit(buf, "%.*s\n", a->attr_size,
-				(char *) ptr);
-	case attr_pointer_atomic:
-		if (!ptr)
-			return 0;
-		return sysfs_emit(buf, "%d\n",
-				atomic_read((atomic_t *) ptr));
 	case attr_feature:
 		return sysfs_emit(buf, "supported\n");
 	case attr_first_error_time:
@@ -438,9 +428,9 @@ static ssize_t ext4_attr_show(struct kobject *kobj,
 		return print_tstamp(buf, sbi->s_es, s_last_error_time);
 	case attr_journal_task:
 		return journal_task_show(sbi, buf);
+	default:
+		return ext4_generic_attr_show(a, sbi, buf);
 	}
-
-	return 0;
 }
 
 static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 4/7] ext4: add positive int attr pointer to avoid sysfs variables overflow
  2024-01-26  8:57 [PATCH 0/7] ext4: avoid sysfs variables overflow causing BUG_ON/SOOB Baokun Li
                   ` (2 preceding siblings ...)
  2024-01-26  8:57 ` [PATCH 3/7] ext4: refactor out ext4_generic_attr_show() Baokun Li
@ 2024-01-26  8:57 ` Baokun Li
  2024-01-27  2:07   ` Zhang Yi
  2024-02-13 16:58   ` Jan Kara
  2024-01-26  8:57 ` [PATCH 5/7] ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists() Baokun Li
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 31+ messages in thread
From: Baokun Li @ 2024-01-26  8:57 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
	yangerkun, chengzhihao1, yukuai3, libaokun1, stable

We can easily trigger a BUG_ON by using the following commands:

    mount /dev/$disk /tmp/test
    echo 2147483650 > /sys/fs/ext4/$disk/mb_group_prealloc
    echo test > /tmp/test/file && sync

==================================================================
kernel BUG at fs/ext4/mballoc.c:2029!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 320 Comm: kworker/u36:1 Not tainted 6.8.0-rc1 #462
RIP: 0010:mb_mark_used+0x358/0x370
[...]
Call Trace:
 ext4_mb_use_best_found+0x56/0x140
 ext4_mb_complex_scan_group+0x196/0x2f0
 ext4_mb_regular_allocator+0xa92/0xf00
 ext4_mb_new_blocks+0x302/0xbc0
 ext4_ext_map_blocks+0x95a/0xef0
 ext4_map_blocks+0x2b1/0x680
 ext4_do_writepages+0x733/0xbd0
[...]
==================================================================

In ext4_mb_normalize_group_request():
    ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;

Here fe_len is of type int, but s_mb_group_prealloc is of type unsigned
int, so setting s_mb_group_prealloc to 2147483650 overflows fe_len to a
negative number, which ultimately triggers a BUG_ON() in mb_mark_used().

Therefore, we add attr_pointer_pi (aka positive int attr pointer) with a
value range of 0-INT_MAX to avoid the above problem. In addition to the
mb_group_prealloc sysfs interface, the following interfaces also have uint
to int conversions that result in overflows, and are also fixed.

  err_ratelimit_burst
  msg_ratelimit_burst
  warning_ratelimit_burst
  err_ratelimit_interval_ms
  msg_ratelimit_interval_ms
  warning_ratelimit_interval_ms
  mb_best_avail_max_trim_order

CC: stable@vger.kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/sysfs.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index a5d657fa05cb..6f9f96e00f2f 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -30,6 +30,7 @@ typedef enum {
 	attr_first_error_time,
 	attr_last_error_time,
 	attr_feature,
+	attr_pointer_pi,
 	attr_pointer_ui,
 	attr_pointer_ul,
 	attr_pointer_u64,
@@ -178,6 +179,9 @@ static struct ext4_attr ext4_attr_##_name = {			\
 #define EXT4_RO_ATTR_ES_STRING(_name,_elname,_size)			\
 	EXT4_ATTR_STRING(_name, 0444, _size, ext4_super_block, _elname)
 
+#define EXT4_RW_ATTR_SBI_PI(_name,_elname)      \
+	EXT4_ATTR_OFFSET(_name, 0644, pointer_pi, ext4_sb_info, _elname)
+
 #define EXT4_RW_ATTR_SBI_UI(_name,_elname)	\
 	EXT4_ATTR_OFFSET(_name, 0644, pointer_ui, ext4_sb_info, _elname)
 
@@ -213,17 +217,17 @@ EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
 EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
 EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
 EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
-EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
+EXT4_RW_ATTR_SBI_PI(mb_group_prealloc, s_mb_group_prealloc);
 EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
 EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
 EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
-EXT4_RW_ATTR_SBI_UI(err_ratelimit_interval_ms, s_err_ratelimit_state.interval);
-EXT4_RW_ATTR_SBI_UI(err_ratelimit_burst, s_err_ratelimit_state.burst);
-EXT4_RW_ATTR_SBI_UI(warning_ratelimit_interval_ms, s_warning_ratelimit_state.interval);
-EXT4_RW_ATTR_SBI_UI(warning_ratelimit_burst, s_warning_ratelimit_state.burst);
-EXT4_RW_ATTR_SBI_UI(msg_ratelimit_interval_ms, s_msg_ratelimit_state.interval);
-EXT4_RW_ATTR_SBI_UI(msg_ratelimit_burst, s_msg_ratelimit_state.burst);
-EXT4_RW_ATTR_SBI_UI(mb_best_avail_max_trim_order, s_mb_best_avail_max_trim_order);
+EXT4_RW_ATTR_SBI_PI(err_ratelimit_interval_ms, s_err_ratelimit_state.interval);
+EXT4_RW_ATTR_SBI_PI(err_ratelimit_burst, s_err_ratelimit_state.burst);
+EXT4_RW_ATTR_SBI_PI(warning_ratelimit_interval_ms, s_warning_ratelimit_state.interval);
+EXT4_RW_ATTR_SBI_PI(warning_ratelimit_burst, s_warning_ratelimit_state.burst);
+EXT4_RW_ATTR_SBI_PI(msg_ratelimit_interval_ms, s_msg_ratelimit_state.interval);
+EXT4_RW_ATTR_SBI_PI(msg_ratelimit_burst, s_msg_ratelimit_state.burst);
+EXT4_RW_ATTR_SBI_PI(mb_best_avail_max_trim_order, s_mb_best_avail_max_trim_order);
 #ifdef CONFIG_EXT4_DEBUG
 EXT4_RW_ATTR_SBI_UL(simulate_fail, s_simulate_fail);
 #endif
@@ -376,6 +380,7 @@ static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
 
 	switch (a->attr_id) {
 	case attr_inode_readahead:
+	case attr_pointer_pi:
 	case attr_pointer_ui:
 		if (a->attr_ptr == ptr_ext4_super_block_offset)
 			return sysfs_emit(buf, "%u\n", le32_to_cpup(ptr));
@@ -448,6 +453,10 @@ static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
 		return ret;
 
 	switch (a->attr_id) {
+	case attr_pointer_pi:
+		if ((int)t < 0)
+			return -EINVAL;
+		fallthrough;
 	case attr_pointer_ui:
 		if (t != (unsigned int)t)
 			return -EINVAL;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 5/7] ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()
  2024-01-26  8:57 [PATCH 0/7] ext4: avoid sysfs variables overflow causing BUG_ON/SOOB Baokun Li
                   ` (3 preceding siblings ...)
  2024-01-26  8:57 ` [PATCH 4/7] ext4: add positive int attr pointer to avoid sysfs variables overflow Baokun Li
@ 2024-01-26  8:57 ` Baokun Li
  2024-01-27  2:09   ` Zhang Yi
                     ` (2 more replies)
  2024-01-26  8:57 ` [PATCH 6/7] ext4: set type of ac_groups_linear_remaining to __u32 to avoid overflow Baokun Li
  2024-01-26  8:57 ` [PATCH 7/7] ext4: set the type of max_zeroout to unsigned int " Baokun Li
  6 siblings, 3 replies; 31+ messages in thread
From: Baokun Li @ 2024-01-26  8:57 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
	yangerkun, chengzhihao1, yukuai3, libaokun1, stable

We can trigger a slab-out-of-bounds with the following commands:

    mkfs.ext4 -F /dev/$disk 10G
    mount /dev/$disk /tmp/test
    echo 2147483647 > /sys/fs/ext4/$disk/mb_group_prealloc
    echo test > /tmp/test/file && sync

==================================================================
BUG: KASAN: slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
Read of size 8 at addr ffff888121b9d0f0 by task kworker/u2:0/11
CPU: 0 PID: 11 Comm: kworker/u2:0 Tainted: GL 6.7.0-next-20240118 #521
Call Trace:
 dump_stack_lvl+0x2c/0x50
 kasan_report+0xb6/0xf0
 ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
 ext4_mb_regular_allocator+0x19e9/0x2370 [ext4]
 ext4_mb_new_blocks+0x88a/0x1370 [ext4]
 ext4_ext_map_blocks+0x14f7/0x2390 [ext4]
 ext4_map_blocks+0x569/0xea0 [ext4]
 ext4_do_writepages+0x10f6/0x1bc0 [ext4]
[...]
==================================================================

The flow of issue triggering is as follows:

// Set s_mb_group_prealloc to 2147483647 via sysfs
ext4_mb_new_blocks
  ext4_mb_normalize_request
    ext4_mb_normalize_group_request
      ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc
  ext4_mb_regular_allocator
    ext4_mb_choose_next_group
      ext4_mb_choose_next_group_best_avail
        mb_avg_fragment_size_order
          order = fls(len) - 2 = 29
        ext4_mb_find_good_group_avg_frag_lists
          frag_list = &sbi->s_mb_avg_fragment_size[order]
          if (list_empty(frag_list)) // Trigger SOOB!

At 4k block size, the length of the s_mb_avg_fragment_size list is 14, but
an oversized s_mb_group_prealloc is set, causing slab-out-of-bounds to be
triggered by an attempt to access an element at index 29.

Therefore it is not allowed to set s_mb_group_prealloc to a value greater
than s_clusters_per_group via sysfs, and to avoid returning an order from
mb_avg_fragment_size_order() that is greater than MB_NUM_ORDERS(sb).

Fixes: 7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
CC: stable@vger.kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/mballoc.c | 2 ++
 fs/ext4/sysfs.c   | 9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f44f668e407f..1ea6491b6b00 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -832,6 +832,8 @@ static int mb_avg_fragment_size_order(struct super_block *sb, ext4_grpblk_t len)
 		return 0;
 	if (order == MB_NUM_ORDERS(sb))
 		order--;
+	if (WARN_ON_ONCE(order > MB_NUM_ORDERS(sb)))
+		order = MB_NUM_ORDERS(sb) - 1;
 	return order;
 }
 
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 6f9f96e00f2f..60ca7b2797b2 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -29,6 +29,7 @@ typedef enum {
 	attr_trigger_test_error,
 	attr_first_error_time,
 	attr_last_error_time,
+	attr_group_prealloc,
 	attr_feature,
 	attr_pointer_pi,
 	attr_pointer_ui,
@@ -211,13 +212,14 @@ EXT4_ATTR_FUNC(sra_exceeded_retry_limit, 0444);
 
 EXT4_ATTR_OFFSET(inode_readahead_blks, 0644, inode_readahead,
 		 ext4_sb_info, s_inode_readahead_blks);
+EXT4_ATTR_OFFSET(mb_group_prealloc, 0644, group_prealloc,
+		 ext4_sb_info, s_mb_group_prealloc);
 EXT4_RW_ATTR_SBI_UI(inode_goal, s_inode_goal);
 EXT4_RW_ATTR_SBI_UI(mb_stats, s_mb_stats);
 EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
 EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
 EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
 EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
-EXT4_RW_ATTR_SBI_PI(mb_group_prealloc, s_mb_group_prealloc);
 EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
 EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
 EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
@@ -380,6 +382,7 @@ static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
 
 	switch (a->attr_id) {
 	case attr_inode_readahead:
+	case attr_group_prealloc:
 	case attr_pointer_pi:
 	case attr_pointer_ui:
 		if (a->attr_ptr == ptr_ext4_super_block_offset)
@@ -453,6 +456,10 @@ static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
 		return ret;
 
 	switch (a->attr_id) {
+	case attr_group_prealloc:
+		if (t > sbi->s_clusters_per_group)
+			return -EINVAL;
+		fallthrough;
 	case attr_pointer_pi:
 		if ((int)t < 0)
 			return -EINVAL;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 6/7] ext4: set type of ac_groups_linear_remaining to __u32 to avoid overflow
  2024-01-26  8:57 [PATCH 0/7] ext4: avoid sysfs variables overflow causing BUG_ON/SOOB Baokun Li
                   ` (4 preceding siblings ...)
  2024-01-26  8:57 ` [PATCH 5/7] ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists() Baokun Li
@ 2024-01-26  8:57 ` Baokun Li
  2024-01-27  2:10   ` Zhang Yi
  2024-02-13 16:15   ` Jan Kara
  2024-01-26  8:57 ` [PATCH 7/7] ext4: set the type of max_zeroout to unsigned int " Baokun Li
  6 siblings, 2 replies; 31+ messages in thread
From: Baokun Li @ 2024-01-26  8:57 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
	yangerkun, chengzhihao1, yukuai3, libaokun1, stable

Now ac_groups_linear_remaining is of type __u16 and s_mb_max_linear_groups
is of type unsigned int, so an overflow occurs when setting a value above
65535 through the mb_max_linear_groups sysfs interface. Therefore, the
type of ac_groups_linear_remaining is set to __u32 to avoid overflow.

Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning")
CC: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/mballoc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index d7aeb5da7d86..498af2abc5d8 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -194,8 +194,8 @@ struct ext4_allocation_context {
 
 	__u32 ac_groups_considered;
 	__u32 ac_flags;		/* allocation hints */
+	__u32 ac_groups_linear_remaining;
 	__u16 ac_groups_scanned;
-	__u16 ac_groups_linear_remaining;
 	__u16 ac_found;
 	__u16 ac_cX_found[EXT4_MB_NUM_CRS];
 	__u16 ac_tail;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 7/7] ext4: set the type of max_zeroout to unsigned int to avoid overflow
  2024-01-26  8:57 [PATCH 0/7] ext4: avoid sysfs variables overflow causing BUG_ON/SOOB Baokun Li
                   ` (5 preceding siblings ...)
  2024-01-26  8:57 ` [PATCH 6/7] ext4: set type of ac_groups_linear_remaining to __u32 to avoid overflow Baokun Li
@ 2024-01-26  8:57 ` Baokun Li
  2024-01-27  2:12   ` Zhang Yi
  2024-02-13 16:38   ` Jan Kara
  6 siblings, 2 replies; 31+ messages in thread
From: Baokun Li @ 2024-01-26  8:57 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
	yangerkun, chengzhihao1, yukuai3, libaokun1

The max_zeroout is of type int and the s_extent_max_zeroout_kb is of
type uint, and the s_extent_max_zeroout_kb can be freely modified via
the sysfs interface. When the block size is 1024, max_zeroout may
overflow, so declare it as unsigned int to avoid overflow.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/extents.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 01299b55a567..8653b13e8248 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3425,10 +3425,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
 	struct ext4_extent zero_ex1, zero_ex2;
 	struct ext4_extent *ex, *abut_ex;
 	ext4_lblk_t ee_block, eof_block;
-	unsigned int ee_len, depth, map_len = map->m_len;
-	int allocated = 0, max_zeroout = 0;
-	int err = 0;
-	int split_flag = EXT4_EXT_DATA_VALID2;
+	unsigned int ee_len, depth, map_len = map->m_len, max_zeroout = 0;
+	int err = 0, allocated = 0, split_flag = EXT4_EXT_DATA_VALID2;
 
 	ext_debug(inode, "logical block %llu, max_blocks %u\n",
 		  (unsigned long long)map->m_lblk, map_len);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ext4: avoid overflow when setting values via sysfs
  2024-01-26  8:57 ` [PATCH 1/7] ext4: avoid overflow when setting values via sysfs Baokun Li
@ 2024-01-26  9:28   ` Zhang Yi
  2024-02-13 16:05   ` Jan Kara
  1 sibling, 0 replies; 31+ messages in thread
From: Zhang Yi @ 2024-01-26  9:28 UTC (permalink / raw)
  To: Baokun Li, linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel,
	yangerkun, chengzhihao1, yukuai3

On 2024/1/26 16:57, Baokun Li wrote:
> When setting values of type unsigned int through sysfs, we use kstrtoul()
> to parse it and then truncate part of it as the final set value, when the
> set value is greater than UINT_MAX, the set value will not match what we
> see because of the truncation. As follows:
> 
>   $ echo 4294967296 > /sys/fs/ext4/sda/mb_max_linear_groups
>   $ cat /sys/fs/ext4/sda/mb_max_linear_groups
>     0
> 
> So when the value set is outside the variable type range, -EINVAL is
> returned to avoid the inconsistency described above. In addition, a
> judgment is added to avoid setting s_resv_clusters less than 0.
> 
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

Thanks for the patch. Looks good to me.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

> ---
>  fs/ext4/sysfs.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index 6d332dff79dd..3671a8aaf4af 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
> @@ -104,7 +104,7 @@ static ssize_t reserved_clusters_store(struct ext4_sb_info *sbi,
>  	int ret;
>  
>  	ret = kstrtoull(skip_spaces(buf), 0, &val);
> -	if (ret || val >= clusters)
> +	if (ret || val >= clusters || (s64)val < 0)
>  		return -EINVAL;
>  
>  	atomic64_set(&sbi->s_resv_clusters, val);
> @@ -463,6 +463,8 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
>  		ret = kstrtoul(skip_spaces(buf), 0, &t);
>  		if (ret)
>  			return ret;
> +		if (t != (unsigned int)t)
> +			return -EINVAL;
>  		if (a->attr_ptr == ptr_ext4_super_block_offset)
>  			*((__le32 *) ptr) = cpu_to_le32(t);
>  		else
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/7] ext4: refactor out ext4_generic_attr_store()
  2024-01-26  8:57 ` [PATCH 2/7] ext4: refactor out ext4_generic_attr_store() Baokun Li
@ 2024-01-26  9:37   ` Zhang Yi
  2024-02-13 16:47   ` Jan Kara
  1 sibling, 0 replies; 31+ messages in thread
From: Zhang Yi @ 2024-01-26  9:37 UTC (permalink / raw)
  To: Baokun Li, linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel,
	yangerkun, chengzhihao1, yukuai3

On 2024/1/26 16:57, Baokun Li wrote:
> Refactor out the function ext4_generic_attr_store() to handle the setting
> of values of various common types, with no functional changes.
> 
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

Looks good to me.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

> ---
>  fs/ext4/sysfs.c | 49 ++++++++++++++++++++++++++++---------------------
>  1 file changed, 28 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index 3671a8aaf4af..834f9a0eb641 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
> @@ -443,26 +443,22 @@ static ssize_t ext4_attr_show(struct kobject *kobj,
>  	return 0;
>  }
>  
> -static ssize_t ext4_attr_store(struct kobject *kobj,
> -			       struct attribute *attr,
> -			       const char *buf, size_t len)
> +static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
> +				       struct ext4_sb_info *sbi,
> +				       const char *buf, size_t len)
>  {
> -	struct ext4_sb_info *sbi = container_of(kobj, struct ext4_sb_info,
> -						s_kobj);
> -	struct ext4_attr *a = container_of(attr, struct ext4_attr, attr);
> -	void *ptr = calc_ptr(a, sbi);
> -	unsigned long t;
>  	int ret;
> +	unsigned long t;
> +	void *ptr = calc_ptr(a, sbi);
> +
> +	if (!ptr)
> +		return 0;
> +	ret = kstrtoul(skip_spaces(buf), 0, &t);
> +	if (ret)
> +		return ret;
>  
>  	switch (a->attr_id) {
> -	case attr_reserved_clusters:
> -		return reserved_clusters_store(sbi, buf, len);
>  	case attr_pointer_ui:
> -		if (!ptr)
> -			return 0;
> -		ret = kstrtoul(skip_spaces(buf), 0, &t);
> -		if (ret)
> -			return ret;
>  		if (t != (unsigned int)t)
>  			return -EINVAL;
>  		if (a->attr_ptr == ptr_ext4_super_block_offset)
> @@ -471,19 +467,30 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
>  			*((unsigned int *) ptr) = t;
>  		return len;
>  	case attr_pointer_ul:
> -		if (!ptr)
> -			return 0;
> -		ret = kstrtoul(skip_spaces(buf), 0, &t);
> -		if (ret)
> -			return ret;
>  		*((unsigned long *) ptr) = t;
>  		return len;
> +	}
> +	return 0;
> +}
> +
> +static ssize_t ext4_attr_store(struct kobject *kobj,
> +			       struct attribute *attr,
> +			       const char *buf, size_t len)
> +{
> +	struct ext4_sb_info *sbi = container_of(kobj, struct ext4_sb_info,
> +						s_kobj);
> +	struct ext4_attr *a = container_of(attr, struct ext4_attr, attr);
> +
> +	switch (a->attr_id) {
> +	case attr_reserved_clusters:
> +		return reserved_clusters_store(sbi, buf, len);
>  	case attr_inode_readahead:
>  		return inode_readahead_blks_store(sbi, buf, len);
>  	case attr_trigger_test_error:
>  		return trigger_test_error(sbi, buf, len);
> +	default:
> +		return ext4_generic_attr_store(a, sbi, buf, len);
>  	}
> -	return 0;
>  }
>  
>  static void ext4_sb_release(struct kobject *kobj)
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/7] ext4: refactor out ext4_generic_attr_show()
  2024-01-26  8:57 ` [PATCH 3/7] ext4: refactor out ext4_generic_attr_show() Baokun Li
@ 2024-01-26 10:08   ` Zhang Yi
  2024-02-13 16:44   ` Jan Kara
  1 sibling, 0 replies; 31+ messages in thread
From: Zhang Yi @ 2024-01-26 10:08 UTC (permalink / raw)
  To: Baokun Li, linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel,
	yangerkun, chengzhihao1, yukuai3

On 2024/1/26 16:57, Baokun Li wrote:
> Refactor out the function ext4_generic_attr_show() to handle the reading
> of values of various common types, with no functional changes.
> 
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

Looks good to me.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

> ---
>  fs/ext4/sysfs.c | 74 +++++++++++++++++++++----------------------------
>  1 file changed, 32 insertions(+), 42 deletions(-)
> 
> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index 834f9a0eb641..a5d657fa05cb 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
> @@ -366,13 +366,42 @@ static ssize_t __print_tstamp(char *buf, __le32 lo, __u8 hi)
>  #define print_tstamp(buf, es, tstamp) \
>  	__print_tstamp(buf, (es)->tstamp, (es)->tstamp ## _hi)
>  
> +static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
> +				      struct ext4_sb_info *sbi, char *buf)
> +{
> +	void *ptr = calc_ptr(a, sbi);
> +
> +	if (!ptr)
> +		return 0;
> +
> +	switch (a->attr_id) {
> +	case attr_inode_readahead:
> +	case attr_pointer_ui:
> +		if (a->attr_ptr == ptr_ext4_super_block_offset)
> +			return sysfs_emit(buf, "%u\n", le32_to_cpup(ptr));
> +		return sysfs_emit(buf, "%u\n", *((unsigned int *) ptr));
> +	case attr_pointer_ul:
> +		return sysfs_emit(buf, "%lu\n", *((unsigned long *) ptr));
> +	case attr_pointer_u8:
> +		return sysfs_emit(buf, "%u\n", *((unsigned char *) ptr));
> +	case attr_pointer_u64:
> +		if (a->attr_ptr == ptr_ext4_super_block_offset)
> +			return sysfs_emit(buf, "%llu\n", le64_to_cpup(ptr));
> +		return sysfs_emit(buf, "%llu\n", *((unsigned long long *) ptr));
> +	case attr_pointer_string:
> +		return sysfs_emit(buf, "%.*s\n", a->attr_size, (char *) ptr);
> +	case attr_pointer_atomic:
> +		return sysfs_emit(buf, "%d\n", atomic_read((atomic_t *) ptr));
> +	}
> +	return 0;
> +}
> +
>  static ssize_t ext4_attr_show(struct kobject *kobj,
>  			      struct attribute *attr, char *buf)
>  {
>  	struct ext4_sb_info *sbi = container_of(kobj, struct ext4_sb_info,
>  						s_kobj);
>  	struct ext4_attr *a = container_of(attr, struct ext4_attr, attr);
> -	void *ptr = calc_ptr(a, sbi);
>  
>  	switch (a->attr_id) {
>  	case attr_delayed_allocation_blocks:
> @@ -391,45 +420,6 @@ static ssize_t ext4_attr_show(struct kobject *kobj,
>  		return sysfs_emit(buf, "%llu\n",
>  				(unsigned long long)
>  			percpu_counter_sum(&sbi->s_sra_exceeded_retry_limit));
> -	case attr_inode_readahead:
> -	case attr_pointer_ui:
> -		if (!ptr)
> -			return 0;
> -		if (a->attr_ptr == ptr_ext4_super_block_offset)
> -			return sysfs_emit(buf, "%u\n",
> -					le32_to_cpup(ptr));
> -		else
> -			return sysfs_emit(buf, "%u\n",
> -					*((unsigned int *) ptr));
> -	case attr_pointer_ul:
> -		if (!ptr)
> -			return 0;
> -		return sysfs_emit(buf, "%lu\n",
> -				*((unsigned long *) ptr));
> -	case attr_pointer_u8:
> -		if (!ptr)
> -			return 0;
> -		return sysfs_emit(buf, "%u\n",
> -				*((unsigned char *) ptr));
> -	case attr_pointer_u64:
> -		if (!ptr)
> -			return 0;
> -		if (a->attr_ptr == ptr_ext4_super_block_offset)
> -			return sysfs_emit(buf, "%llu\n",
> -					le64_to_cpup(ptr));
> -		else
> -			return sysfs_emit(buf, "%llu\n",
> -					*((unsigned long long *) ptr));
> -	case attr_pointer_string:
> -		if (!ptr)
> -			return 0;
> -		return sysfs_emit(buf, "%.*s\n", a->attr_size,
> -				(char *) ptr);
> -	case attr_pointer_atomic:
> -		if (!ptr)
> -			return 0;
> -		return sysfs_emit(buf, "%d\n",
> -				atomic_read((atomic_t *) ptr));
>  	case attr_feature:
>  		return sysfs_emit(buf, "supported\n");
>  	case attr_first_error_time:
> @@ -438,9 +428,9 @@ static ssize_t ext4_attr_show(struct kobject *kobj,
>  		return print_tstamp(buf, sbi->s_es, s_last_error_time);
>  	case attr_journal_task:
>  		return journal_task_show(sbi, buf);
> +	default:
> +		return ext4_generic_attr_show(a, sbi, buf);
>  	}
> -
> -	return 0;
>  }
>  
>  static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] ext4: add positive int attr pointer to avoid sysfs variables overflow
  2024-01-26  8:57 ` [PATCH 4/7] ext4: add positive int attr pointer to avoid sysfs variables overflow Baokun Li
@ 2024-01-27  2:07   ` Zhang Yi
  2024-02-13 16:58   ` Jan Kara
  1 sibling, 0 replies; 31+ messages in thread
From: Zhang Yi @ 2024-01-27  2:07 UTC (permalink / raw)
  To: Baokun Li, linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel,
	yangerkun, chengzhihao1, yukuai3, stable

On 2024/1/26 16:57, Baokun Li wrote:
> We can easily trigger a BUG_ON by using the following commands:
> 
>     mount /dev/$disk /tmp/test
>     echo 2147483650 > /sys/fs/ext4/$disk/mb_group_prealloc
>     echo test > /tmp/test/file && sync
> 
> ==================================================================
> kernel BUG at fs/ext4/mballoc.c:2029!
> invalid opcode: 0000 [#1] PREEMPT SMP PTI
> CPU: 3 PID: 320 Comm: kworker/u36:1 Not tainted 6.8.0-rc1 #462
> RIP: 0010:mb_mark_used+0x358/0x370
> [...]
> Call Trace:
>  ext4_mb_use_best_found+0x56/0x140
>  ext4_mb_complex_scan_group+0x196/0x2f0
>  ext4_mb_regular_allocator+0xa92/0xf00
>  ext4_mb_new_blocks+0x302/0xbc0
>  ext4_ext_map_blocks+0x95a/0xef0
>  ext4_map_blocks+0x2b1/0x680
>  ext4_do_writepages+0x733/0xbd0
> [...]
> ==================================================================
> 
> In ext4_mb_normalize_group_request():
>     ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
> 
> Here fe_len is of type int, but s_mb_group_prealloc is of type unsigned
> int, so setting s_mb_group_prealloc to 2147483650 overflows fe_len to a
> negative number, which ultimately triggers a BUG_ON() in mb_mark_used().
> 
> Therefore, we add attr_pointer_pi (aka positive int attr pointer) with a
> value range of 0-INT_MAX to avoid the above problem. In addition to the
> mb_group_prealloc sysfs interface, the following interfaces also have uint
> to int conversions that result in overflows, and are also fixed.
> 
>   err_ratelimit_burst
>   msg_ratelimit_burst
>   warning_ratelimit_burst
>   err_ratelimit_interval_ms
>   msg_ratelimit_interval_ms
>   warning_ratelimit_interval_ms
>   mb_best_avail_max_trim_order
> 
> CC: stable@vger.kernel.org
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

Looks good to me.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

> ---
>  fs/ext4/sysfs.c | 25 +++++++++++++++++--------
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index a5d657fa05cb..6f9f96e00f2f 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
> @@ -30,6 +30,7 @@ typedef enum {
>  	attr_first_error_time,
>  	attr_last_error_time,
>  	attr_feature,
> +	attr_pointer_pi,
>  	attr_pointer_ui,
>  	attr_pointer_ul,
>  	attr_pointer_u64,
> @@ -178,6 +179,9 @@ static struct ext4_attr ext4_attr_##_name = {			\
>  #define EXT4_RO_ATTR_ES_STRING(_name,_elname,_size)			\
>  	EXT4_ATTR_STRING(_name, 0444, _size, ext4_super_block, _elname)
>  
> +#define EXT4_RW_ATTR_SBI_PI(_name,_elname)      \
> +	EXT4_ATTR_OFFSET(_name, 0644, pointer_pi, ext4_sb_info, _elname)
> +
>  #define EXT4_RW_ATTR_SBI_UI(_name,_elname)	\
>  	EXT4_ATTR_OFFSET(_name, 0644, pointer_ui, ext4_sb_info, _elname)
>  
> @@ -213,17 +217,17 @@ EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
>  EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
>  EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
>  EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
> -EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
> +EXT4_RW_ATTR_SBI_PI(mb_group_prealloc, s_mb_group_prealloc);
>  EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
>  EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
>  EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
> -EXT4_RW_ATTR_SBI_UI(err_ratelimit_interval_ms, s_err_ratelimit_state.interval);
> -EXT4_RW_ATTR_SBI_UI(err_ratelimit_burst, s_err_ratelimit_state.burst);
> -EXT4_RW_ATTR_SBI_UI(warning_ratelimit_interval_ms, s_warning_ratelimit_state.interval);
> -EXT4_RW_ATTR_SBI_UI(warning_ratelimit_burst, s_warning_ratelimit_state.burst);
> -EXT4_RW_ATTR_SBI_UI(msg_ratelimit_interval_ms, s_msg_ratelimit_state.interval);
> -EXT4_RW_ATTR_SBI_UI(msg_ratelimit_burst, s_msg_ratelimit_state.burst);
> -EXT4_RW_ATTR_SBI_UI(mb_best_avail_max_trim_order, s_mb_best_avail_max_trim_order);
> +EXT4_RW_ATTR_SBI_PI(err_ratelimit_interval_ms, s_err_ratelimit_state.interval);
> +EXT4_RW_ATTR_SBI_PI(err_ratelimit_burst, s_err_ratelimit_state.burst);
> +EXT4_RW_ATTR_SBI_PI(warning_ratelimit_interval_ms, s_warning_ratelimit_state.interval);
> +EXT4_RW_ATTR_SBI_PI(warning_ratelimit_burst, s_warning_ratelimit_state.burst);
> +EXT4_RW_ATTR_SBI_PI(msg_ratelimit_interval_ms, s_msg_ratelimit_state.interval);
> +EXT4_RW_ATTR_SBI_PI(msg_ratelimit_burst, s_msg_ratelimit_state.burst);
> +EXT4_RW_ATTR_SBI_PI(mb_best_avail_max_trim_order, s_mb_best_avail_max_trim_order);
>  #ifdef CONFIG_EXT4_DEBUG
>  EXT4_RW_ATTR_SBI_UL(simulate_fail, s_simulate_fail);
>  #endif
> @@ -376,6 +380,7 @@ static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
>  
>  	switch (a->attr_id) {
>  	case attr_inode_readahead:
> +	case attr_pointer_pi:
>  	case attr_pointer_ui:
>  		if (a->attr_ptr == ptr_ext4_super_block_offset)
>  			return sysfs_emit(buf, "%u\n", le32_to_cpup(ptr));
> @@ -448,6 +453,10 @@ static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
>  		return ret;
>  
>  	switch (a->attr_id) {
> +	case attr_pointer_pi:
> +		if ((int)t < 0)
> +			return -EINVAL;
> +		fallthrough;
>  	case attr_pointer_ui:
>  		if (t != (unsigned int)t)
>  			return -EINVAL;
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()
  2024-01-26  8:57 ` [PATCH 5/7] ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists() Baokun Li
@ 2024-01-27  2:09   ` Zhang Yi
  2024-02-13 16:14   ` Jan Kara
  2024-02-20  5:39   ` Ojaswin Mujoo
  2 siblings, 0 replies; 31+ messages in thread
From: Zhang Yi @ 2024-01-27  2:09 UTC (permalink / raw)
  To: Baokun Li, linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel,
	yangerkun, chengzhihao1, yukuai3, stable

On 2024/1/26 16:57, Baokun Li wrote:
> We can trigger a slab-out-of-bounds with the following commands:
> 
>     mkfs.ext4 -F /dev/$disk 10G
>     mount /dev/$disk /tmp/test
>     echo 2147483647 > /sys/fs/ext4/$disk/mb_group_prealloc
>     echo test > /tmp/test/file && sync
> 
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
> Read of size 8 at addr ffff888121b9d0f0 by task kworker/u2:0/11
> CPU: 0 PID: 11 Comm: kworker/u2:0 Tainted: GL 6.7.0-next-20240118 #521
> Call Trace:
>  dump_stack_lvl+0x2c/0x50
>  kasan_report+0xb6/0xf0
>  ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
>  ext4_mb_regular_allocator+0x19e9/0x2370 [ext4]
>  ext4_mb_new_blocks+0x88a/0x1370 [ext4]
>  ext4_ext_map_blocks+0x14f7/0x2390 [ext4]
>  ext4_map_blocks+0x569/0xea0 [ext4]
>  ext4_do_writepages+0x10f6/0x1bc0 [ext4]
> [...]
> ==================================================================
> 
> The flow of issue triggering is as follows:
> 
> // Set s_mb_group_prealloc to 2147483647 via sysfs
> ext4_mb_new_blocks
>   ext4_mb_normalize_request
>     ext4_mb_normalize_group_request
>       ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc
>   ext4_mb_regular_allocator
>     ext4_mb_choose_next_group
>       ext4_mb_choose_next_group_best_avail
>         mb_avg_fragment_size_order
>           order = fls(len) - 2 = 29
>         ext4_mb_find_good_group_avg_frag_lists
>           frag_list = &sbi->s_mb_avg_fragment_size[order]
>           if (list_empty(frag_list)) // Trigger SOOB!
> 
> At 4k block size, the length of the s_mb_avg_fragment_size list is 14, but
> an oversized s_mb_group_prealloc is set, causing slab-out-of-bounds to be
> triggered by an attempt to access an element at index 29.
> 
> Therefore it is not allowed to set s_mb_group_prealloc to a value greater
> than s_clusters_per_group via sysfs, and to avoid returning an order from
> mb_avg_fragment_size_order() that is greater than MB_NUM_ORDERS(sb).
> 
> Fixes: 7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
> CC: stable@vger.kernel.org
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

Looks good to me.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

> ---
>  fs/ext4/mballoc.c | 2 ++
>  fs/ext4/sysfs.c   | 9 ++++++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index f44f668e407f..1ea6491b6b00 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -832,6 +832,8 @@ static int mb_avg_fragment_size_order(struct super_block *sb, ext4_grpblk_t len)
>  		return 0;
>  	if (order == MB_NUM_ORDERS(sb))
>  		order--;
> +	if (WARN_ON_ONCE(order > MB_NUM_ORDERS(sb)))
> +		order = MB_NUM_ORDERS(sb) - 1;
>  	return order;
>  }
>  
> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index 6f9f96e00f2f..60ca7b2797b2 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
> @@ -29,6 +29,7 @@ typedef enum {
>  	attr_trigger_test_error,
>  	attr_first_error_time,
>  	attr_last_error_time,
> +	attr_group_prealloc,
>  	attr_feature,
>  	attr_pointer_pi,
>  	attr_pointer_ui,
> @@ -211,13 +212,14 @@ EXT4_ATTR_FUNC(sra_exceeded_retry_limit, 0444);
>  
>  EXT4_ATTR_OFFSET(inode_readahead_blks, 0644, inode_readahead,
>  		 ext4_sb_info, s_inode_readahead_blks);
> +EXT4_ATTR_OFFSET(mb_group_prealloc, 0644, group_prealloc,
> +		 ext4_sb_info, s_mb_group_prealloc);
>  EXT4_RW_ATTR_SBI_UI(inode_goal, s_inode_goal);
>  EXT4_RW_ATTR_SBI_UI(mb_stats, s_mb_stats);
>  EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
>  EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
>  EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
>  EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
> -EXT4_RW_ATTR_SBI_PI(mb_group_prealloc, s_mb_group_prealloc);
>  EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
>  EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
>  EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
> @@ -380,6 +382,7 @@ static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
>  
>  	switch (a->attr_id) {
>  	case attr_inode_readahead:
> +	case attr_group_prealloc:
>  	case attr_pointer_pi:
>  	case attr_pointer_ui:
>  		if (a->attr_ptr == ptr_ext4_super_block_offset)
> @@ -453,6 +456,10 @@ static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
>  		return ret;
>  
>  	switch (a->attr_id) {
> +	case attr_group_prealloc:
> +		if (t > sbi->s_clusters_per_group)
> +			return -EINVAL;
> +		fallthrough;
>  	case attr_pointer_pi:
>  		if ((int)t < 0)
>  			return -EINVAL;
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 6/7] ext4: set type of ac_groups_linear_remaining to __u32 to avoid overflow
  2024-01-26  8:57 ` [PATCH 6/7] ext4: set type of ac_groups_linear_remaining to __u32 to avoid overflow Baokun Li
@ 2024-01-27  2:10   ` Zhang Yi
  2024-02-13 16:15   ` Jan Kara
  1 sibling, 0 replies; 31+ messages in thread
From: Zhang Yi @ 2024-01-27  2:10 UTC (permalink / raw)
  To: Baokun Li, linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel,
	yangerkun, chengzhihao1, yukuai3, stable

On 2024/1/26 16:57, Baokun Li wrote:
> Now ac_groups_linear_remaining is of type __u16 and s_mb_max_linear_groups
> is of type unsigned int, so an overflow occurs when setting a value above
> 65535 through the mb_max_linear_groups sysfs interface. Therefore, the
> type of ac_groups_linear_remaining is set to __u32 to avoid overflow.
> 
> Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning")
> CC: stable@kernel.org
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

Looks good to me.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

> ---
>  fs/ext4/mballoc.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
> index d7aeb5da7d86..498af2abc5d8 100644
> --- a/fs/ext4/mballoc.h
> +++ b/fs/ext4/mballoc.h
> @@ -194,8 +194,8 @@ struct ext4_allocation_context {
>  
>  	__u32 ac_groups_considered;
>  	__u32 ac_flags;		/* allocation hints */
> +	__u32 ac_groups_linear_remaining;
>  	__u16 ac_groups_scanned;
> -	__u16 ac_groups_linear_remaining;
>  	__u16 ac_found;
>  	__u16 ac_cX_found[EXT4_MB_NUM_CRS];
>  	__u16 ac_tail;
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/7] ext4: set the type of max_zeroout to unsigned int to avoid overflow
  2024-01-26  8:57 ` [PATCH 7/7] ext4: set the type of max_zeroout to unsigned int " Baokun Li
@ 2024-01-27  2:12   ` Zhang Yi
  2024-02-13 16:38   ` Jan Kara
  1 sibling, 0 replies; 31+ messages in thread
From: Zhang Yi @ 2024-01-27  2:12 UTC (permalink / raw)
  To: Baokun Li, linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel,
	yangerkun, chengzhihao1, yukuai3

On 2024/1/26 16:57, Baokun Li wrote:
> The max_zeroout is of type int and the s_extent_max_zeroout_kb is of
> type uint, and the s_extent_max_zeroout_kb can be freely modified via
> the sysfs interface. When the block size is 1024, max_zeroout may
> overflow, so declare it as unsigned int to avoid overflow.
> 
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

Looks good to me.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

> ---
>  fs/ext4/extents.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 01299b55a567..8653b13e8248 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3425,10 +3425,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
>  	struct ext4_extent zero_ex1, zero_ex2;
>  	struct ext4_extent *ex, *abut_ex;
>  	ext4_lblk_t ee_block, eof_block;
> -	unsigned int ee_len, depth, map_len = map->m_len;
> -	int allocated = 0, max_zeroout = 0;
> -	int err = 0;
> -	int split_flag = EXT4_EXT_DATA_VALID2;
> +	unsigned int ee_len, depth, map_len = map->m_len, max_zeroout = 0;
> +	int err = 0, allocated = 0, split_flag = EXT4_EXT_DATA_VALID2;
>  
>  	ext_debug(inode, "logical block %llu, max_blocks %u\n",
>  		  (unsigned long long)map->m_lblk, map_len);
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ext4: avoid overflow when setting values via sysfs
  2024-01-26  8:57 ` [PATCH 1/7] ext4: avoid overflow when setting values via sysfs Baokun Li
  2024-01-26  9:28   ` Zhang Yi
@ 2024-02-13 16:05   ` Jan Kara
  2024-02-17  7:09     ` Baokun Li
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Kara @ 2024-02-13 16:05 UTC (permalink / raw)
  To: Baokun Li
  Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
	linux-kernel, yi.zhang, yangerkun, chengzhihao1, yukuai3

On Fri 26-01-24 16:57:10, Baokun Li wrote:
> When setting values of type unsigned int through sysfs, we use kstrtoul()
> to parse it and then truncate part of it as the final set value, when the
> set value is greater than UINT_MAX, the set value will not match what we
> see because of the truncation. As follows:
> 
>   $ echo 4294967296 > /sys/fs/ext4/sda/mb_max_linear_groups
>   $ cat /sys/fs/ext4/sda/mb_max_linear_groups
>     0
> 
> So when the value set is outside the variable type range, -EINVAL is
> returned to avoid the inconsistency described above. In addition, a
> judgment is added to avoid setting s_resv_clusters less than 0.
> 
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> ---
>  fs/ext4/sysfs.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index 6d332dff79dd..3671a8aaf4af 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
> @@ -104,7 +104,7 @@ static ssize_t reserved_clusters_store(struct ext4_sb_info *sbi,
>  	int ret;
>  
>  	ret = kstrtoull(skip_spaces(buf), 0, &val);
> -	if (ret || val >= clusters)
> +	if (ret || val >= clusters || (s64)val < 0)
>  		return -EINVAL;

This looks a bit pointless, doesn't it? 'val' is u64, clusters is u64. We
know that val < clusters so how could (s64)val be < 0?

> @@ -463,6 +463,8 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
>  		ret = kstrtoul(skip_spaces(buf), 0, &t);
>  		if (ret)
>  			return ret;
> +		if (t != (unsigned int)t)
> +			return -EINVAL;
>  		if (a->attr_ptr == ptr_ext4_super_block_offset)
>  			*((__le32 *) ptr) = cpu_to_le32(t);
>  		else

I kind of agree with Alexey that using kstrtouint() here instead would look
nicer. And it isn't like you have to define many new variables. You just
need unsigned long for attr_pointer_ul and unsigned int for
attr_pointer_ui.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()
  2024-01-26  8:57 ` [PATCH 5/7] ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists() Baokun Li
  2024-01-27  2:09   ` Zhang Yi
@ 2024-02-13 16:14   ` Jan Kara
  2024-02-20  5:39   ` Ojaswin Mujoo
  2 siblings, 0 replies; 31+ messages in thread
From: Jan Kara @ 2024-02-13 16:14 UTC (permalink / raw)
  To: Baokun Li
  Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
	linux-kernel, yi.zhang, yangerkun, chengzhihao1, yukuai3, stable

On Fri 26-01-24 16:57:14, Baokun Li wrote:
> We can trigger a slab-out-of-bounds with the following commands:
> 
>     mkfs.ext4 -F /dev/$disk 10G
>     mount /dev/$disk /tmp/test
>     echo 2147483647 > /sys/fs/ext4/$disk/mb_group_prealloc
>     echo test > /tmp/test/file && sync
> 
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
> Read of size 8 at addr ffff888121b9d0f0 by task kworker/u2:0/11
> CPU: 0 PID: 11 Comm: kworker/u2:0 Tainted: GL 6.7.0-next-20240118 #521
> Call Trace:
>  dump_stack_lvl+0x2c/0x50
>  kasan_report+0xb6/0xf0
>  ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
>  ext4_mb_regular_allocator+0x19e9/0x2370 [ext4]
>  ext4_mb_new_blocks+0x88a/0x1370 [ext4]
>  ext4_ext_map_blocks+0x14f7/0x2390 [ext4]
>  ext4_map_blocks+0x569/0xea0 [ext4]
>  ext4_do_writepages+0x10f6/0x1bc0 [ext4]
> [...]
> ==================================================================
> 
> The flow of issue triggering is as follows:
> 
> // Set s_mb_group_prealloc to 2147483647 via sysfs
> ext4_mb_new_blocks
>   ext4_mb_normalize_request
>     ext4_mb_normalize_group_request
>       ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc
>   ext4_mb_regular_allocator
>     ext4_mb_choose_next_group
>       ext4_mb_choose_next_group_best_avail
>         mb_avg_fragment_size_order
>           order = fls(len) - 2 = 29
>         ext4_mb_find_good_group_avg_frag_lists
>           frag_list = &sbi->s_mb_avg_fragment_size[order]
>           if (list_empty(frag_list)) // Trigger SOOB!
> 
> At 4k block size, the length of the s_mb_avg_fragment_size list is 14, but
> an oversized s_mb_group_prealloc is set, causing slab-out-of-bounds to be
> triggered by an attempt to access an element at index 29.
> 
> Therefore it is not allowed to set s_mb_group_prealloc to a value greater
> than s_clusters_per_group via sysfs, and to avoid returning an order from
> mb_avg_fragment_size_order() that is greater than MB_NUM_ORDERS(sb).
> 
> Fixes: 7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
> CC: stable@vger.kernel.org
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


> ---
>  fs/ext4/mballoc.c | 2 ++
>  fs/ext4/sysfs.c   | 9 ++++++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index f44f668e407f..1ea6491b6b00 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -832,6 +832,8 @@ static int mb_avg_fragment_size_order(struct super_block *sb, ext4_grpblk_t len)
>  		return 0;
>  	if (order == MB_NUM_ORDERS(sb))
>  		order--;
> +	if (WARN_ON_ONCE(order > MB_NUM_ORDERS(sb)))
> +		order = MB_NUM_ORDERS(sb) - 1;
>  	return order;
>  }
>  
> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index 6f9f96e00f2f..60ca7b2797b2 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
> @@ -29,6 +29,7 @@ typedef enum {
>  	attr_trigger_test_error,
>  	attr_first_error_time,
>  	attr_last_error_time,
> +	attr_group_prealloc,
>  	attr_feature,
>  	attr_pointer_pi,
>  	attr_pointer_ui,
> @@ -211,13 +212,14 @@ EXT4_ATTR_FUNC(sra_exceeded_retry_limit, 0444);
>  
>  EXT4_ATTR_OFFSET(inode_readahead_blks, 0644, inode_readahead,
>  		 ext4_sb_info, s_inode_readahead_blks);
> +EXT4_ATTR_OFFSET(mb_group_prealloc, 0644, group_prealloc,
> +		 ext4_sb_info, s_mb_group_prealloc);
>  EXT4_RW_ATTR_SBI_UI(inode_goal, s_inode_goal);
>  EXT4_RW_ATTR_SBI_UI(mb_stats, s_mb_stats);
>  EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
>  EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
>  EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
>  EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
> -EXT4_RW_ATTR_SBI_PI(mb_group_prealloc, s_mb_group_prealloc);
>  EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
>  EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
>  EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
> @@ -380,6 +382,7 @@ static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
>  
>  	switch (a->attr_id) {
>  	case attr_inode_readahead:
> +	case attr_group_prealloc:
>  	case attr_pointer_pi:
>  	case attr_pointer_ui:
>  		if (a->attr_ptr == ptr_ext4_super_block_offset)
> @@ -453,6 +456,10 @@ static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
>  		return ret;
>  
>  	switch (a->attr_id) {
> +	case attr_group_prealloc:
> +		if (t > sbi->s_clusters_per_group)
> +			return -EINVAL;
> +		fallthrough;
>  	case attr_pointer_pi:
>  		if ((int)t < 0)
>  			return -EINVAL;
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 6/7] ext4: set type of ac_groups_linear_remaining to __u32 to avoid overflow
  2024-01-26  8:57 ` [PATCH 6/7] ext4: set type of ac_groups_linear_remaining to __u32 to avoid overflow Baokun Li
  2024-01-27  2:10   ` Zhang Yi
@ 2024-02-13 16:15   ` Jan Kara
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Kara @ 2024-02-13 16:15 UTC (permalink / raw)
  To: Baokun Li
  Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
	linux-kernel, yi.zhang, yangerkun, chengzhihao1, yukuai3, stable

On Fri 26-01-24 16:57:15, Baokun Li wrote:
> Now ac_groups_linear_remaining is of type __u16 and s_mb_max_linear_groups
> is of type unsigned int, so an overflow occurs when setting a value above
> 65535 through the mb_max_linear_groups sysfs interface. Therefore, the
> type of ac_groups_linear_remaining is set to __u32 to avoid overflow.
> 
> Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning")
> CC: stable@kernel.org
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/mballoc.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
> index d7aeb5da7d86..498af2abc5d8 100644
> --- a/fs/ext4/mballoc.h
> +++ b/fs/ext4/mballoc.h
> @@ -194,8 +194,8 @@ struct ext4_allocation_context {
>  
>  	__u32 ac_groups_considered;
>  	__u32 ac_flags;		/* allocation hints */
> +	__u32 ac_groups_linear_remaining;
>  	__u16 ac_groups_scanned;
> -	__u16 ac_groups_linear_remaining;
>  	__u16 ac_found;
>  	__u16 ac_cX_found[EXT4_MB_NUM_CRS];
>  	__u16 ac_tail;
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/7] ext4: set the type of max_zeroout to unsigned int to avoid overflow
  2024-01-26  8:57 ` [PATCH 7/7] ext4: set the type of max_zeroout to unsigned int " Baokun Li
  2024-01-27  2:12   ` Zhang Yi
@ 2024-02-13 16:38   ` Jan Kara
  2024-02-17  7:45     ` Baokun Li
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Kara @ 2024-02-13 16:38 UTC (permalink / raw)
  To: Baokun Li
  Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
	linux-kernel, yi.zhang, yangerkun, chengzhihao1, yukuai3

On Fri 26-01-24 16:57:16, Baokun Li wrote:
> The max_zeroout is of type int and the s_extent_max_zeroout_kb is of
> type uint, and the s_extent_max_zeroout_kb can be freely modified via
> the sysfs interface. When the block size is 1024, max_zeroout may
> overflow, so declare it as unsigned int to avoid overflow.
> 
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> ---
>  fs/ext4/extents.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 01299b55a567..8653b13e8248 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3425,10 +3425,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
>  	struct ext4_extent zero_ex1, zero_ex2;
>  	struct ext4_extent *ex, *abut_ex;
>  	ext4_lblk_t ee_block, eof_block;
> -	unsigned int ee_len, depth, map_len = map->m_len;
> -	int allocated = 0, max_zeroout = 0;
> -	int err = 0;
> -	int split_flag = EXT4_EXT_DATA_VALID2;
> +	unsigned int ee_len, depth, map_len = map->m_len, max_zeroout = 0;
> +	int err = 0, allocated = 0, split_flag = EXT4_EXT_DATA_VALID2;

Honestly, I prefer if we keep unrelated variables on different lines,
especially when they have initializers. I find the code more readable that
way. So in this case:

	int err = 0;
	int split_flag = EXT4_EXT_DATA_VALID2;
	int allocated = 0;
	unsigned int max_zeroout = 0;

But otherwise the fix looks good!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/7] ext4: refactor out ext4_generic_attr_show()
  2024-01-26  8:57 ` [PATCH 3/7] ext4: refactor out ext4_generic_attr_show() Baokun Li
  2024-01-26 10:08   ` Zhang Yi
@ 2024-02-13 16:44   ` Jan Kara
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Kara @ 2024-02-13 16:44 UTC (permalink / raw)
  To: Baokun Li
  Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
	linux-kernel, yi.zhang, yangerkun, chengzhihao1, yukuai3

On Fri 26-01-24 16:57:12, Baokun Li wrote:
> Refactor out the function ext4_generic_attr_show() to handle the reading
> of values of various common types, with no functional changes.
> 
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/sysfs.c | 74 +++++++++++++++++++++----------------------------
>  1 file changed, 32 insertions(+), 42 deletions(-)
> 
> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index 834f9a0eb641..a5d657fa05cb 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
> @@ -366,13 +366,42 @@ static ssize_t __print_tstamp(char *buf, __le32 lo, __u8 hi)
>  #define print_tstamp(buf, es, tstamp) \
>  	__print_tstamp(buf, (es)->tstamp, (es)->tstamp ## _hi)
>  
> +static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
> +				      struct ext4_sb_info *sbi, char *buf)
> +{
> +	void *ptr = calc_ptr(a, sbi);
> +
> +	if (!ptr)
> +		return 0;
> +
> +	switch (a->attr_id) {
> +	case attr_inode_readahead:
> +	case attr_pointer_ui:
> +		if (a->attr_ptr == ptr_ext4_super_block_offset)
> +			return sysfs_emit(buf, "%u\n", le32_to_cpup(ptr));
> +		return sysfs_emit(buf, "%u\n", *((unsigned int *) ptr));
> +	case attr_pointer_ul:
> +		return sysfs_emit(buf, "%lu\n", *((unsigned long *) ptr));
> +	case attr_pointer_u8:
> +		return sysfs_emit(buf, "%u\n", *((unsigned char *) ptr));
> +	case attr_pointer_u64:
> +		if (a->attr_ptr == ptr_ext4_super_block_offset)
> +			return sysfs_emit(buf, "%llu\n", le64_to_cpup(ptr));
> +		return sysfs_emit(buf, "%llu\n", *((unsigned long long *) ptr));
> +	case attr_pointer_string:
> +		return sysfs_emit(buf, "%.*s\n", a->attr_size, (char *) ptr);
> +	case attr_pointer_atomic:
> +		return sysfs_emit(buf, "%d\n", atomic_read((atomic_t *) ptr));
> +	}
> +	return 0;
> +}
> +
>  static ssize_t ext4_attr_show(struct kobject *kobj,
>  			      struct attribute *attr, char *buf)
>  {
>  	struct ext4_sb_info *sbi = container_of(kobj, struct ext4_sb_info,
>  						s_kobj);
>  	struct ext4_attr *a = container_of(attr, struct ext4_attr, attr);
> -	void *ptr = calc_ptr(a, sbi);
>  
>  	switch (a->attr_id) {
>  	case attr_delayed_allocation_blocks:
> @@ -391,45 +420,6 @@ static ssize_t ext4_attr_show(struct kobject *kobj,
>  		return sysfs_emit(buf, "%llu\n",
>  				(unsigned long long)
>  			percpu_counter_sum(&sbi->s_sra_exceeded_retry_limit));
> -	case attr_inode_readahead:
> -	case attr_pointer_ui:
> -		if (!ptr)
> -			return 0;
> -		if (a->attr_ptr == ptr_ext4_super_block_offset)
> -			return sysfs_emit(buf, "%u\n",
> -					le32_to_cpup(ptr));
> -		else
> -			return sysfs_emit(buf, "%u\n",
> -					*((unsigned int *) ptr));
> -	case attr_pointer_ul:
> -		if (!ptr)
> -			return 0;
> -		return sysfs_emit(buf, "%lu\n",
> -				*((unsigned long *) ptr));
> -	case attr_pointer_u8:
> -		if (!ptr)
> -			return 0;
> -		return sysfs_emit(buf, "%u\n",
> -				*((unsigned char *) ptr));
> -	case attr_pointer_u64:
> -		if (!ptr)
> -			return 0;
> -		if (a->attr_ptr == ptr_ext4_super_block_offset)
> -			return sysfs_emit(buf, "%llu\n",
> -					le64_to_cpup(ptr));
> -		else
> -			return sysfs_emit(buf, "%llu\n",
> -					*((unsigned long long *) ptr));
> -	case attr_pointer_string:
> -		if (!ptr)
> -			return 0;
> -		return sysfs_emit(buf, "%.*s\n", a->attr_size,
> -				(char *) ptr);
> -	case attr_pointer_atomic:
> -		if (!ptr)
> -			return 0;
> -		return sysfs_emit(buf, "%d\n",
> -				atomic_read((atomic_t *) ptr));
>  	case attr_feature:
>  		return sysfs_emit(buf, "supported\n");
>  	case attr_first_error_time:
> @@ -438,9 +428,9 @@ static ssize_t ext4_attr_show(struct kobject *kobj,
>  		return print_tstamp(buf, sbi->s_es, s_last_error_time);
>  	case attr_journal_task:
>  		return journal_task_show(sbi, buf);
> +	default:
> +		return ext4_generic_attr_show(a, sbi, buf);
>  	}
> -
> -	return 0;
>  }
>  
>  static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/7] ext4: refactor out ext4_generic_attr_store()
  2024-01-26  8:57 ` [PATCH 2/7] ext4: refactor out ext4_generic_attr_store() Baokun Li
  2024-01-26  9:37   ` Zhang Yi
@ 2024-02-13 16:47   ` Jan Kara
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Kara @ 2024-02-13 16:47 UTC (permalink / raw)
  To: Baokun Li
  Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
	linux-kernel, yi.zhang, yangerkun, chengzhihao1, yukuai3

On Fri 26-01-24 16:57:11, Baokun Li wrote:
> Refactor out the function ext4_generic_attr_store() to handle the setting
> of values of various common types, with no functional changes.
> 
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
...
> -static ssize_t ext4_attr_store(struct kobject *kobj,
> -			       struct attribute *attr,
> -			       const char *buf, size_t len)
> +static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
> +				       struct ext4_sb_info *sbi,
> +				       const char *buf, size_t len)
>  {
> -	struct ext4_sb_info *sbi = container_of(kobj, struct ext4_sb_info,
> -						s_kobj);
> -	struct ext4_attr *a = container_of(attr, struct ext4_attr, attr);
> -	void *ptr = calc_ptr(a, sbi);
> -	unsigned long t;
>  	int ret;
> +	unsigned long t;
> +	void *ptr = calc_ptr(a, sbi);
> +
> +	if (!ptr)
> +		return 0;
> +	ret = kstrtoul(skip_spaces(buf), 0, &t);

The refactorization is nice but I'd keep the string to number conversion
inside the switch so that we can distinguish uint and ulong cases.

								Honza

> +	if (ret)
> +		return ret;
>  
>  	switch (a->attr_id) {
> -	case attr_reserved_clusters:
> -		return reserved_clusters_store(sbi, buf, len);
>  	case attr_pointer_ui:
> -		if (!ptr)
> -			return 0;
> -		ret = kstrtoul(skip_spaces(buf), 0, &t);
> -		if (ret)
> -			return ret;
>  		if (t != (unsigned int)t)
>  			return -EINVAL;
>  		if (a->attr_ptr == ptr_ext4_super_block_offset)
> @@ -471,19 +467,30 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
>  			*((unsigned int *) ptr) = t;
>  		return len;
>  	case attr_pointer_ul:
> -		if (!ptr)
> -			return 0;
> -		ret = kstrtoul(skip_spaces(buf), 0, &t);
> -		if (ret)
> -			return ret;
>  		*((unsigned long *) ptr) = t;
>  		return len;
> +	}
> +	return 0;
> +}
> +
> +static ssize_t ext4_attr_store(struct kobject *kobj,
> +			       struct attribute *attr,
> +			       const char *buf, size_t len)
> +{
> +	struct ext4_sb_info *sbi = container_of(kobj, struct ext4_sb_info,
> +						s_kobj);
> +	struct ext4_attr *a = container_of(attr, struct ext4_attr, attr);
> +
> +	switch (a->attr_id) {
> +	case attr_reserved_clusters:
> +		return reserved_clusters_store(sbi, buf, len);
>  	case attr_inode_readahead:
>  		return inode_readahead_blks_store(sbi, buf, len);
>  	case attr_trigger_test_error:
>  		return trigger_test_error(sbi, buf, len);
> +	default:
> +		return ext4_generic_attr_store(a, sbi, buf, len);
>  	}
> -	return 0;
>  }
>  
>  static void ext4_sb_release(struct kobject *kobj)
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] ext4: add positive int attr pointer to avoid sysfs variables overflow
  2024-01-26  8:57 ` [PATCH 4/7] ext4: add positive int attr pointer to avoid sysfs variables overflow Baokun Li
  2024-01-27  2:07   ` Zhang Yi
@ 2024-02-13 16:58   ` Jan Kara
  2024-02-17  7:41     ` Baokun Li
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Kara @ 2024-02-13 16:58 UTC (permalink / raw)
  To: Baokun Li
  Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
	linux-kernel, yi.zhang, yangerkun, chengzhihao1, yukuai3, stable

On Fri 26-01-24 16:57:13, Baokun Li wrote:
> We can easily trigger a BUG_ON by using the following commands:
> 
>     mount /dev/$disk /tmp/test
>     echo 2147483650 > /sys/fs/ext4/$disk/mb_group_prealloc
>     echo test > /tmp/test/file && sync
> 
> ==================================================================
> kernel BUG at fs/ext4/mballoc.c:2029!
> invalid opcode: 0000 [#1] PREEMPT SMP PTI
> CPU: 3 PID: 320 Comm: kworker/u36:1 Not tainted 6.8.0-rc1 #462
> RIP: 0010:mb_mark_used+0x358/0x370
> [...]
> Call Trace:
>  ext4_mb_use_best_found+0x56/0x140
>  ext4_mb_complex_scan_group+0x196/0x2f0
>  ext4_mb_regular_allocator+0xa92/0xf00
>  ext4_mb_new_blocks+0x302/0xbc0
>  ext4_ext_map_blocks+0x95a/0xef0
>  ext4_map_blocks+0x2b1/0x680
>  ext4_do_writepages+0x733/0xbd0
> [...]
> ==================================================================
> 
> In ext4_mb_normalize_group_request():
>     ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
> 
> Here fe_len is of type int, but s_mb_group_prealloc is of type unsigned
> int, so setting s_mb_group_prealloc to 2147483650 overflows fe_len to a
> negative number, which ultimately triggers a BUG_ON() in mb_mark_used().
> 
> Therefore, we add attr_pointer_pi (aka positive int attr pointer) with a
> value range of 0-INT_MAX to avoid the above problem. In addition to the
> mb_group_prealloc sysfs interface, the following interfaces also have uint
> to int conversions that result in overflows, and are also fixed.
> 
>   err_ratelimit_burst
>   msg_ratelimit_burst
>   warning_ratelimit_burst
>   err_ratelimit_interval_ms
>   msg_ratelimit_interval_ms
>   warning_ratelimit_interval_ms
>   mb_best_avail_max_trim_order
> 
> CC: stable@vger.kernel.org
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

I don't think you need to change s_mb_group_prealloc here and then restrict
it even further in the next patch. I'd just leave it alone here.

Also I think that limiting mb_best_avail_max_trim_order to 64 instead of
INT_MAX will make us more resilient to surprises in the future :) But I
don't really insist.

								Honza

> ---
>  fs/ext4/sysfs.c | 25 +++++++++++++++++--------
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index a5d657fa05cb..6f9f96e00f2f 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
> @@ -30,6 +30,7 @@ typedef enum {
>  	attr_first_error_time,
>  	attr_last_error_time,
>  	attr_feature,
> +	attr_pointer_pi,
>  	attr_pointer_ui,
>  	attr_pointer_ul,
>  	attr_pointer_u64,
> @@ -178,6 +179,9 @@ static struct ext4_attr ext4_attr_##_name = {			\
>  #define EXT4_RO_ATTR_ES_STRING(_name,_elname,_size)			\
>  	EXT4_ATTR_STRING(_name, 0444, _size, ext4_super_block, _elname)
>  
> +#define EXT4_RW_ATTR_SBI_PI(_name,_elname)      \
> +	EXT4_ATTR_OFFSET(_name, 0644, pointer_pi, ext4_sb_info, _elname)
> +
>  #define EXT4_RW_ATTR_SBI_UI(_name,_elname)	\
>  	EXT4_ATTR_OFFSET(_name, 0644, pointer_ui, ext4_sb_info, _elname)
>  
> @@ -213,17 +217,17 @@ EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
>  EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
>  EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
>  EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
> -EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
> +EXT4_RW_ATTR_SBI_PI(mb_group_prealloc, s_mb_group_prealloc);
>  EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
>  EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
>  EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
> -EXT4_RW_ATTR_SBI_UI(err_ratelimit_interval_ms, s_err_ratelimit_state.interval);
> -EXT4_RW_ATTR_SBI_UI(err_ratelimit_burst, s_err_ratelimit_state.burst);
> -EXT4_RW_ATTR_SBI_UI(warning_ratelimit_interval_ms, s_warning_ratelimit_state.interval);
> -EXT4_RW_ATTR_SBI_UI(warning_ratelimit_burst, s_warning_ratelimit_state.burst);
> -EXT4_RW_ATTR_SBI_UI(msg_ratelimit_interval_ms, s_msg_ratelimit_state.interval);
> -EXT4_RW_ATTR_SBI_UI(msg_ratelimit_burst, s_msg_ratelimit_state.burst);
> -EXT4_RW_ATTR_SBI_UI(mb_best_avail_max_trim_order, s_mb_best_avail_max_trim_order);
> +EXT4_RW_ATTR_SBI_PI(err_ratelimit_interval_ms, s_err_ratelimit_state.interval);
> +EXT4_RW_ATTR_SBI_PI(err_ratelimit_burst, s_err_ratelimit_state.burst);
> +EXT4_RW_ATTR_SBI_PI(warning_ratelimit_interval_ms, s_warning_ratelimit_state.interval);
> +EXT4_RW_ATTR_SBI_PI(warning_ratelimit_burst, s_warning_ratelimit_state.burst);
> +EXT4_RW_ATTR_SBI_PI(msg_ratelimit_interval_ms, s_msg_ratelimit_state.interval);
> +EXT4_RW_ATTR_SBI_PI(msg_ratelimit_burst, s_msg_ratelimit_state.burst);
> +EXT4_RW_ATTR_SBI_PI(mb_best_avail_max_trim_order, s_mb_best_avail_max_trim_order);
>  #ifdef CONFIG_EXT4_DEBUG
>  EXT4_RW_ATTR_SBI_UL(simulate_fail, s_simulate_fail);
>  #endif
> @@ -376,6 +380,7 @@ static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
>  
>  	switch (a->attr_id) {
>  	case attr_inode_readahead:
> +	case attr_pointer_pi:
>  	case attr_pointer_ui:
>  		if (a->attr_ptr == ptr_ext4_super_block_offset)
>  			return sysfs_emit(buf, "%u\n", le32_to_cpup(ptr));
> @@ -448,6 +453,10 @@ static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
>  		return ret;
>  
>  	switch (a->attr_id) {
> +	case attr_pointer_pi:
> +		if ((int)t < 0)
> +			return -EINVAL;
> +		fallthrough;
>  	case attr_pointer_ui:
>  		if (t != (unsigned int)t)
>  			return -EINVAL;
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ext4: avoid overflow when setting values via sysfs
  2024-02-13 16:05   ` Jan Kara
@ 2024-02-17  7:09     ` Baokun Li
  2024-02-23 11:54       ` Jan Kara
  0 siblings, 1 reply; 31+ messages in thread
From: Baokun Li @ 2024-02-17  7:09 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, linux-kernel,
	yi.zhang, yangerkun, chengzhihao1, yukuai3, Baokun Li

On 2024/2/14 0:05, Jan Kara wrote:
> On Fri 26-01-24 16:57:10, Baokun Li wrote:
>> When setting values of type unsigned int through sysfs, we use kstrtoul()
>> to parse it and then truncate part of it as the final set value, when the
>> set value is greater than UINT_MAX, the set value will not match what we
>> see because of the truncation. As follows:
>>
>>    $ echo 4294967296 > /sys/fs/ext4/sda/mb_max_linear_groups
>>    $ cat /sys/fs/ext4/sda/mb_max_linear_groups
>>      0
>>
>> So when the value set is outside the variable type range, -EINVAL is
>> returned to avoid the inconsistency described above. In addition, a
>> judgment is added to avoid setting s_resv_clusters less than 0.
>>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> ---
>>   fs/ext4/sysfs.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
>> index 6d332dff79dd..3671a8aaf4af 100644
>> --- a/fs/ext4/sysfs.c
>> +++ b/fs/ext4/sysfs.c
>> @@ -104,7 +104,7 @@ static ssize_t reserved_clusters_store(struct ext4_sb_info *sbi,
>>   	int ret;
>>   
>>   	ret = kstrtoull(skip_spaces(buf), 0, &val);
>> -	if (ret || val >= clusters)
>> +	if (ret || val >= clusters || (s64)val < 0)
>>   		return -EINVAL;
> This looks a bit pointless, doesn't it? 'val' is u64, clusters is u64. We
> know that val < clusters so how could (s64)val be < 0?
When clusters is bigger than LLONG_MAX, (s64)val may be less than 0.
Of course we don't have such a large storage device yet, so it's only
theoretically possible to overflow here. But the previous patches in this
patch set were intended to ensure that the values set via sysfs did not
exceed the range of the variable type, so I've modified that here as well.
>
>> @@ -463,6 +463,8 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
>>   		ret = kstrtoul(skip_spaces(buf), 0, &t);
>>   		if (ret)
>>   			return ret;
>> +		if (t != (unsigned int)t)
>> +			return -EINVAL;
>>   		if (a->attr_ptr == ptr_ext4_super_block_offset)
>>   			*((__le32 *) ptr) = cpu_to_le32(t);
>>   		else
> I kind of agree with Alexey that using kstrtouint() here instead would look
> nicer. And it isn't like you have to define many new variables. You just
> need unsigned long for attr_pointer_ul and unsigned int for
> attr_pointer_ui.
>
> 								Honza
If we use both kstrtouint() and kstrtoul(), then we need to add
kstrtouint() or kstrtoul() to each case, which would be a lot of
duplicate code as follows:

static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
                                        struct ext4_sb_info *sbi,
                                        const char *buf, size_t len)
{
         int ret;
         unsigned int t;
         unsigned long lt;
         void *ptr = calc_ptr(a, sbi);

         if (!ptr)
                 return 0;

         switch (a->attr_id) {
         case attr_group_prealloc:
                 ret = kstrtouint(skip_spaces(buf), 0, &t);
                 if (ret)
                         return ret;
                 if (t > sbi->s_clusters_per_group)
                         return -EINVAL;
                 return len;
         case attr_pointer_pi:
                 ret = kstrtouint(skip_spaces(buf), 0, &t);
                 if (ret)
                         return ret;
                 if ((int)t < 0)
                         return -EINVAL;
                 return len;
         case attr_pointer_ui:
                 ret = kstrtouint(skip_spaces(buf), 0, &t);
                 if (ret)
                         return ret;
                 if (t != (unsigned int)t)
                         return -EINVAL;
                 if (a->attr_ptr == ptr_ext4_super_block_offset)
                         *((__le32 *) ptr) = cpu_to_le32(t);
                 else
                         *((unsigned int *) ptr) = t;
                 return len;
         case attr_pointer_ul:
                 ret = kstrtoul(skip_spaces(buf), 0, &lt);
                 if (ret)
                         return ret;
                 *((unsigned long *) ptr) = lt;
                 return len;
         }
         return 0;

}

Also, both kstrtouint() and kstrtoul() are based on the kstrtoull()
implementation, so it feels better to opencode kstrtoul() and
kstrtouint() to reduce duplicate code.
Why is it better to distinguish uint and ulong cases here?

Thanks for your review!
Happy Chinese New Year!
-- 
With Best Regards,
Baokun Li
.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] ext4: add positive int attr pointer to avoid sysfs variables overflow
  2024-02-13 16:58   ` Jan Kara
@ 2024-02-17  7:41     ` Baokun Li
  2024-02-23 12:05       ` Jan Kara
  0 siblings, 1 reply; 31+ messages in thread
From: Baokun Li @ 2024-02-17  7:41 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, linux-kernel,
	yi.zhang, yangerkun, chengzhihao1, yukuai3, stable, Baokun Li

On 2024/2/14 0:58, Jan Kara wrote:
> On Fri 26-01-24 16:57:13, Baokun Li wrote:
>> We can easily trigger a BUG_ON by using the following commands:
>>
>>      mount /dev/$disk /tmp/test
>>      echo 2147483650 > /sys/fs/ext4/$disk/mb_group_prealloc
>>      echo test > /tmp/test/file && sync
>>
>> ==================================================================
>> kernel BUG at fs/ext4/mballoc.c:2029!
>> invalid opcode: 0000 [#1] PREEMPT SMP PTI
>> CPU: 3 PID: 320 Comm: kworker/u36:1 Not tainted 6.8.0-rc1 #462
>> RIP: 0010:mb_mark_used+0x358/0x370
>> [...]
>> Call Trace:
>>   ext4_mb_use_best_found+0x56/0x140
>>   ext4_mb_complex_scan_group+0x196/0x2f0
>>   ext4_mb_regular_allocator+0xa92/0xf00
>>   ext4_mb_new_blocks+0x302/0xbc0
>>   ext4_ext_map_blocks+0x95a/0xef0
>>   ext4_map_blocks+0x2b1/0x680
>>   ext4_do_writepages+0x733/0xbd0
>> [...]
>> ==================================================================
>>
>> In ext4_mb_normalize_group_request():
>>      ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
>>
>> Here fe_len is of type int, but s_mb_group_prealloc is of type unsigned
>> int, so setting s_mb_group_prealloc to 2147483650 overflows fe_len to a
>> negative number, which ultimately triggers a BUG_ON() in mb_mark_used().
>>
>> Therefore, we add attr_pointer_pi (aka positive int attr pointer) with a
>> value range of 0-INT_MAX to avoid the above problem. In addition to the
>> mb_group_prealloc sysfs interface, the following interfaces also have uint
>> to int conversions that result in overflows, and are also fixed.
>>
>>    err_ratelimit_burst
>>    msg_ratelimit_burst
>>    warning_ratelimit_burst
>>    err_ratelimit_interval_ms
>>    msg_ratelimit_interval_ms
>>    warning_ratelimit_interval_ms
>>    mb_best_avail_max_trim_order
>>
>> CC: stable@vger.kernel.org
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> I don't think you need to change s_mb_group_prealloc here and then restrict
> it even further in the next patch. I'd just leave it alone here.
Yes, we could put the next patch before this one, but using
s_mb_group_prealloc as an example makes it easier to understand
why the attr_pointer_pi case is added here.There are several other
variables that don't have more convincing examples.
>
> Also I think that limiting mb_best_avail_max_trim_order to 64 instead of
> INT_MAX will make us more resilient to surprises in the future :) But I
> don't really insist.
>
> 								Honza
I think it's enough here to make sure that mb_best_avail_max_trim_order
is a positive number, since we always make sure that min_order
is not less than 0, as follows:

          order = fls(ac->ac_g_ex.fe_len) - 1;
          min_order = order - sbi->s_mb_best_avail_max_trim_order;
          if (min_order < 0)
                  min_order = 0;

An oversized mb_best_avail_max_trim_order can be interpreted as
always being CR_ANY_FREE. 😄
>> ---
>>   fs/ext4/sysfs.c | 25 +++++++++++++++++--------
>>   1 file changed, 17 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
>> index a5d657fa05cb..6f9f96e00f2f 100644
>> --- a/fs/ext4/sysfs.c
>> +++ b/fs/ext4/sysfs.c
>> @@ -30,6 +30,7 @@ typedef enum {
>>   	attr_first_error_time,
>>   	attr_last_error_time,
>>   	attr_feature,
>> +	attr_pointer_pi,
>>   	attr_pointer_ui,
>>   	attr_pointer_ul,
>>   	attr_pointer_u64,
>> @@ -178,6 +179,9 @@ static struct ext4_attr ext4_attr_##_name = {			\
>>   #define EXT4_RO_ATTR_ES_STRING(_name,_elname,_size)			\
>>   	EXT4_ATTR_STRING(_name, 0444, _size, ext4_super_block, _elname)
>>   
>> +#define EXT4_RW_ATTR_SBI_PI(_name,_elname)      \
>> +	EXT4_ATTR_OFFSET(_name, 0644, pointer_pi, ext4_sb_info, _elname)
>> +
>>   #define EXT4_RW_ATTR_SBI_UI(_name,_elname)	\
>>   	EXT4_ATTR_OFFSET(_name, 0644, pointer_ui, ext4_sb_info, _elname)
>>   
>> @@ -213,17 +217,17 @@ EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
>>   EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
>>   EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
>>   EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
>> -EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
>> +EXT4_RW_ATTR_SBI_PI(mb_group_prealloc, s_mb_group_prealloc);
>>   EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
>>   EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
>>   EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
>> -EXT4_RW_ATTR_SBI_UI(err_ratelimit_interval_ms, s_err_ratelimit_state.interval);
>> -EXT4_RW_ATTR_SBI_UI(err_ratelimit_burst, s_err_ratelimit_state.burst);
>> -EXT4_RW_ATTR_SBI_UI(warning_ratelimit_interval_ms, s_warning_ratelimit_state.interval);
>> -EXT4_RW_ATTR_SBI_UI(warning_ratelimit_burst, s_warning_ratelimit_state.burst);
>> -EXT4_RW_ATTR_SBI_UI(msg_ratelimit_interval_ms, s_msg_ratelimit_state.interval);
>> -EXT4_RW_ATTR_SBI_UI(msg_ratelimit_burst, s_msg_ratelimit_state.burst);
>> -EXT4_RW_ATTR_SBI_UI(mb_best_avail_max_trim_order, s_mb_best_avail_max_trim_order);
>> +EXT4_RW_ATTR_SBI_PI(err_ratelimit_interval_ms, s_err_ratelimit_state.interval);
>> +EXT4_RW_ATTR_SBI_PI(err_ratelimit_burst, s_err_ratelimit_state.burst);
>> +EXT4_RW_ATTR_SBI_PI(warning_ratelimit_interval_ms, s_warning_ratelimit_state.interval);
>> +EXT4_RW_ATTR_SBI_PI(warning_ratelimit_burst, s_warning_ratelimit_state.burst);
>> +EXT4_RW_ATTR_SBI_PI(msg_ratelimit_interval_ms, s_msg_ratelimit_state.interval);
>> +EXT4_RW_ATTR_SBI_PI(msg_ratelimit_burst, s_msg_ratelimit_state.burst);
>> +EXT4_RW_ATTR_SBI_PI(mb_best_avail_max_trim_order, s_mb_best_avail_max_trim_order);
>>   #ifdef CONFIG_EXT4_DEBUG
>>   EXT4_RW_ATTR_SBI_UL(simulate_fail, s_simulate_fail);
>>   #endif
>> @@ -376,6 +380,7 @@ static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
>>   
>>   	switch (a->attr_id) {
>>   	case attr_inode_readahead:
>> +	case attr_pointer_pi:
>>   	case attr_pointer_ui:
>>   		if (a->attr_ptr == ptr_ext4_super_block_offset)
>>   			return sysfs_emit(buf, "%u\n", le32_to_cpup(ptr));
>> @@ -448,6 +453,10 @@ static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
>>   		return ret;
>>   
>>   	switch (a->attr_id) {
>> +	case attr_pointer_pi:
>> +		if ((int)t < 0)
>> +			return -EINVAL;
>> +		fallthrough;
>>   	case attr_pointer_ui:
>>   		if (t != (unsigned int)t)
>>   			return -EINVAL;
>> -- 
>> 2.31.1
>>


Thanks!
-- 
With Best Regards,
Baokun Li
.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/7] ext4: set the type of max_zeroout to unsigned int to avoid overflow
  2024-02-13 16:38   ` Jan Kara
@ 2024-02-17  7:45     ` Baokun Li
  0 siblings, 0 replies; 31+ messages in thread
From: Baokun Li @ 2024-02-17  7:45 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, linux-kernel,
	yi.zhang, yangerkun, chengzhihao1, yukuai3, Baokun Li

On 2024/2/14 0:38, Jan Kara wrote:
> On Fri 26-01-24 16:57:16, Baokun Li wrote:
>> The max_zeroout is of type int and the s_extent_max_zeroout_kb is of
>> type uint, and the s_extent_max_zeroout_kb can be freely modified via
>> the sysfs interface. When the block size is 1024, max_zeroout may
>> overflow, so declare it as unsigned int to avoid overflow.
>>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> ---
>>   fs/ext4/extents.c | 6 ++----
>>   1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 01299b55a567..8653b13e8248 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -3425,10 +3425,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
>>   	struct ext4_extent zero_ex1, zero_ex2;
>>   	struct ext4_extent *ex, *abut_ex;
>>   	ext4_lblk_t ee_block, eof_block;
>> -	unsigned int ee_len, depth, map_len = map->m_len;
>> -	int allocated = 0, max_zeroout = 0;
>> -	int err = 0;
>> -	int split_flag = EXT4_EXT_DATA_VALID2;
>> +	unsigned int ee_len, depth, map_len = map->m_len, max_zeroout = 0;
>> +	int err = 0, allocated = 0, split_flag = EXT4_EXT_DATA_VALID2;
> Honestly, I prefer if we keep unrelated variables on different lines,
> especially when they have initializers. I find the code more readable that
> way. So in this case:
>
> 	int err = 0;
> 	int split_flag = EXT4_EXT_DATA_VALID2;
> 	int allocated = 0;
> 	unsigned int max_zeroout = 0;
>
> But otherwise the fix looks good!
>
> 								Honza
Totally agree! I will replace it in the next version.

Thanks!
-- 
With Best Regards,
Baokun Li
.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()
  2024-01-26  8:57 ` [PATCH 5/7] ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists() Baokun Li
  2024-01-27  2:09   ` Zhang Yi
  2024-02-13 16:14   ` Jan Kara
@ 2024-02-20  5:39   ` Ojaswin Mujoo
  2024-02-20  6:31     ` Baokun Li
  2 siblings, 1 reply; 31+ messages in thread
From: Ojaswin Mujoo @ 2024-02-20  5:39 UTC (permalink / raw)
  To: Baokun Li
  Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
	linux-kernel, yi.zhang, yangerkun, chengzhihao1, yukuai3, stable

On Fri, Jan 26, 2024 at 04:57:14PM +0800, Baokun Li wrote:

Hey Baokun, 

Good catch! I've added some minor comments below. Other than that feel
free to add 

Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>

> We can trigger a slab-out-of-bounds with the following commands:
> 
>     mkfs.ext4 -F /dev/$disk 10G
>     mount /dev/$disk /tmp/test
>     echo 2147483647 > /sys/fs/ext4/$disk/mb_group_prealloc
>     echo test > /tmp/test/file && sync
> 
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
> Read of size 8 at addr ffff888121b9d0f0 by task kworker/u2:0/11
> CPU: 0 PID: 11 Comm: kworker/u2:0 Tainted: GL 6.7.0-next-20240118 #521
> Call Trace:
>  dump_stack_lvl+0x2c/0x50
>  kasan_report+0xb6/0xf0
>  ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
>  ext4_mb_regular_allocator+0x19e9/0x2370 [ext4]
>  ext4_mb_new_blocks+0x88a/0x1370 [ext4]
>  ext4_ext_map_blocks+0x14f7/0x2390 [ext4]
>  ext4_map_blocks+0x569/0xea0 [ext4]
>  ext4_do_writepages+0x10f6/0x1bc0 [ext4]
> [...]
> ==================================================================
> 
> The flow of issue triggering is as follows:
> 
> // Set s_mb_group_prealloc to 2147483647 via sysfs
> ext4_mb_new_blocks
>   ext4_mb_normalize_request
>     ext4_mb_normalize_group_request
>       ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc
>   ext4_mb_regular_allocator
>     ext4_mb_choose_next_group
>       ext4_mb_choose_next_group_best_avail
>         mb_avg_fragment_size_order
>           order = fls(len) - 2 = 29
>         ext4_mb_find_good_group_avg_frag_lists
>           frag_list = &sbi->s_mb_avg_fragment_size[order]
>           if (list_empty(frag_list)) // Trigger SOOB!
> 
> At 4k block size, the length of the s_mb_avg_fragment_size list is 14, but
> an oversized s_mb_group_prealloc is set, causing slab-out-of-bounds to be
> triggered by an attempt to access an element at index 29.
> 
> Therefore it is not allowed to set s_mb_group_prealloc to a value greater
> than s_clusters_per_group via sysfs, and to avoid returning an order from
> mb_avg_fragment_size_order() that is greater than MB_NUM_ORDERS(sb).
> 
> Fixes: 7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
> CC: stable@vger.kernel.org
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> ---
>  fs/ext4/mballoc.c | 2 ++
>  fs/ext4/sysfs.c   | 9 ++++++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index f44f668e407f..1ea6491b6b00 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -832,6 +832,8 @@ static int mb_avg_fragment_size_order(struct super_block *sb, ext4_grpblk_t len)
>     return 0;
>   if (order == MB_NUM_ORDERS(sb))
>     order--;
> + if (WARN_ON_ONCE(order > MB_NUM_ORDERS(sb)))
> +   order = MB_NUM_ORDERS(sb) - 1;
>   return order;
>  }

So along with this change, I think it'll also be good to add an extra 
check in ext4_mb_choose_next_group_best_avail() as:

  if (1 << min_order < ac->ac_o_ex.fe_len)
    min_order = fls(ac->ac_o_ex.fe_len);
 
+ if (order >= MB_NUM_ORDERS(ac->ac_sb))
+   order = MB_NUM_ORDERS(ac->ac_sb) - 1;
+
  for (i = order; i >= min_order; i--) {
    int frag_order;
    /*


The reason for this is that otherwise when order is large eg 29,
we would unnecessarily loop from i=29 to i=13 while always
looking at the same avg_fragment_list[13].

Regards,
ojaswin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()
  2024-02-20  5:39   ` Ojaswin Mujoo
@ 2024-02-20  6:31     ` Baokun Li
  0 siblings, 0 replies; 31+ messages in thread
From: Baokun Li @ 2024-02-20  6:31 UTC (permalink / raw)
  To: Ojaswin Mujoo
  Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
	linux-kernel, yi.zhang, yangerkun, chengzhihao1, yukuai3, stable,
	Baokun Li

On 2024/2/20 13:39, Ojaswin Mujoo wrote:
> On Fri, Jan 26, 2024 at 04:57:14PM +0800, Baokun Li wrote:
>
> Hey Baokun,
>
> Good catch! I've added some minor comments below. Other than that feel
> free to add
>
> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
>
>> We can trigger a slab-out-of-bounds with the following commands:
>>
>>      mkfs.ext4 -F /dev/$disk 10G
>>      mount /dev/$disk /tmp/test
>>      echo 2147483647 > /sys/fs/ext4/$disk/mb_group_prealloc
>>      echo test > /tmp/test/file && sync
>>
>> ==================================================================
>> BUG: KASAN: slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
>> Read of size 8 at addr ffff888121b9d0f0 by task kworker/u2:0/11
>> CPU: 0 PID: 11 Comm: kworker/u2:0 Tainted: GL 6.7.0-next-20240118 #521
>> Call Trace:
>>   dump_stack_lvl+0x2c/0x50
>>   kasan_report+0xb6/0xf0
>>   ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
>>   ext4_mb_regular_allocator+0x19e9/0x2370 [ext4]
>>   ext4_mb_new_blocks+0x88a/0x1370 [ext4]
>>   ext4_ext_map_blocks+0x14f7/0x2390 [ext4]
>>   ext4_map_blocks+0x569/0xea0 [ext4]
>>   ext4_do_writepages+0x10f6/0x1bc0 [ext4]
>> [...]
>> ==================================================================
>>
>> The flow of issue triggering is as follows:
>>
>> // Set s_mb_group_prealloc to 2147483647 via sysfs
>> ext4_mb_new_blocks
>>    ext4_mb_normalize_request
>>      ext4_mb_normalize_group_request
>>        ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc
>>    ext4_mb_regular_allocator
>>      ext4_mb_choose_next_group
>>        ext4_mb_choose_next_group_best_avail
>>          mb_avg_fragment_size_order
>>            order = fls(len) - 2 = 29
>>          ext4_mb_find_good_group_avg_frag_lists
>>            frag_list = &sbi->s_mb_avg_fragment_size[order]
>>            if (list_empty(frag_list)) // Trigger SOOB!
>>
>> At 4k block size, the length of the s_mb_avg_fragment_size list is 14, but
>> an oversized s_mb_group_prealloc is set, causing slab-out-of-bounds to be
>> triggered by an attempt to access an element at index 29.
>>
>> Therefore it is not allowed to set s_mb_group_prealloc to a value greater
>> than s_clusters_per_group via sysfs, and to avoid returning an order from
>> mb_avg_fragment_size_order() that is greater than MB_NUM_ORDERS(sb).
>>
>> Fixes: 7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
>> CC: stable@vger.kernel.org
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> ---
>>   fs/ext4/mballoc.c | 2 ++
>>   fs/ext4/sysfs.c   | 9 ++++++++-
>>   2 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index f44f668e407f..1ea6491b6b00 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -832,6 +832,8 @@ static int mb_avg_fragment_size_order(struct super_block *sb, ext4_grpblk_t len)
>>      return 0;
>>    if (order == MB_NUM_ORDERS(sb))
>>      order--;
>> + if (WARN_ON_ONCE(order > MB_NUM_ORDERS(sb)))
>> +   order = MB_NUM_ORDERS(sb) - 1;
>>    return order;
>>   }
> So along with this change, I think it'll also be good to add an extra
> check in ext4_mb_choose_next_group_best_avail() as:
>
>    if (1 << min_order < ac->ac_o_ex.fe_len)
>      min_order = fls(ac->ac_o_ex.fe_len);
>   
> + if (order >= MB_NUM_ORDERS(ac->ac_sb))
> +   order = MB_NUM_ORDERS(ac->ac_sb) - 1;
> +
>    for (i = order; i >= min_order; i--) {
>      int frag_order;
>      /*
>
>
> The reason for this is that otherwise when order is large eg 29,
> we would unnecessarily loop from i=29 to i=13 while always
> looking at the same avg_fragment_list[13].
>
> Regards,
> ojaswin


Yeah, good point! This will cut down on some unnecessary loops.

I'll add this extra check in the next version.

Thanks for having a look!

-- 
With Best Regards,
Baokun Li
.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ext4: avoid overflow when setting values via sysfs
  2024-02-17  7:09     ` Baokun Li
@ 2024-02-23 11:54       ` Jan Kara
  2024-02-24  1:59         ` Baokun Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kara @ 2024-02-23 11:54 UTC (permalink / raw)
  To: Baokun Li
  Cc: Jan Kara, linux-ext4, tytso, adilger.kernel, ritesh.list,
	linux-kernel, yi.zhang, yangerkun, chengzhihao1, yukuai3

On Sat 17-02-24 15:09:06, Baokun Li wrote:
> On 2024/2/14 0:05, Jan Kara wrote:
> > On Fri 26-01-24 16:57:10, Baokun Li wrote:
> > > When setting values of type unsigned int through sysfs, we use kstrtoul()
> > > to parse it and then truncate part of it as the final set value, when the
> > > set value is greater than UINT_MAX, the set value will not match what we
> > > see because of the truncation. As follows:
> > > 
> > >    $ echo 4294967296 > /sys/fs/ext4/sda/mb_max_linear_groups
> > >    $ cat /sys/fs/ext4/sda/mb_max_linear_groups
> > >      0
> > > 
> > > So when the value set is outside the variable type range, -EINVAL is
> > > returned to avoid the inconsistency described above. In addition, a
> > > judgment is added to avoid setting s_resv_clusters less than 0.
> > > 
> > > Signed-off-by: Baokun Li <libaokun1@huawei.com>
> > > ---
> > >   fs/ext4/sysfs.c | 4 +++-
> > >   1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> > > index 6d332dff79dd..3671a8aaf4af 100644
> > > --- a/fs/ext4/sysfs.c
> > > +++ b/fs/ext4/sysfs.c
> > > @@ -104,7 +104,7 @@ static ssize_t reserved_clusters_store(struct ext4_sb_info *sbi,
> > >   	int ret;
> > >   	ret = kstrtoull(skip_spaces(buf), 0, &val);
> > > -	if (ret || val >= clusters)
> > > +	if (ret || val >= clusters || (s64)val < 0)
> > >   		return -EINVAL;
> > This looks a bit pointless, doesn't it? 'val' is u64, clusters is u64. We
> > know that val < clusters so how could (s64)val be < 0?
> When clusters is bigger than LLONG_MAX, (s64)val may be less than 0.
> Of course we don't have such a large storage device yet, so it's only
> theoretically possible to overflow here. But the previous patches in this
> patch set were intended to ensure that the values set via sysfs did not
> exceed the range of the variable type, so I've modified that here as well.

Well, my point was that the on disk format is limited to much less than
2^63 blocks. But I guess having the additional check does not matter.

> > > @@ -463,6 +463,8 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
> > >   		ret = kstrtoul(skip_spaces(buf), 0, &t);
> > >   		if (ret)
> > >   			return ret;
> > > +		if (t != (unsigned int)t)
> > > +			return -EINVAL;
> > >   		if (a->attr_ptr == ptr_ext4_super_block_offset)
> > >   			*((__le32 *) ptr) = cpu_to_le32(t);
> > >   		else
> > I kind of agree with Alexey that using kstrtouint() here instead would look
> > nicer. And it isn't like you have to define many new variables. You just
> > need unsigned long for attr_pointer_ul and unsigned int for
> > attr_pointer_ui.
>
> If we use both kstrtouint() and kstrtoul(), then we need to add
> kstrtouint() or kstrtoul() to each case, which would be a lot of
> duplicate code as follows:

Well, it is 5 more lines if I'm counting right :) (3x 3 lines of conversion
- 2x 2 lines of boundary checks). I kind of find it easier to oversee the
boundary checks when everything is together at each parameter. But frankly
this is a bit of nitpicking so if you feel strongly about this I won't
insist.

> static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
>                                        struct ext4_sb_info *sbi,
>                                        const char *buf, size_t len)
> {
>         int ret;
>         unsigned int t;
>         unsigned long lt;
>         void *ptr = calc_ptr(a, sbi);
> 
>         if (!ptr)
>                 return 0;
> 
>         switch (a->attr_id) {
>         case attr_group_prealloc:
>                 ret = kstrtouint(skip_spaces(buf), 0, &t);
>                 if (ret)
>                         return ret;
>                 if (t > sbi->s_clusters_per_group)
>                         return -EINVAL;
>                 return len;
>         case attr_pointer_pi:
>                 ret = kstrtouint(skip_spaces(buf), 0, &t);
>                 if (ret)
>                         return ret;
>                 if ((int)t < 0)
>                         return -EINVAL;
>                 return len;
>         case attr_pointer_ui:
>                 ret = kstrtouint(skip_spaces(buf), 0, &t);
>                 if (ret)
>                         return ret;
>                 if (t != (unsigned int)t)
>                         return -EINVAL;
		  ^^^ this can go away

>                 if (a->attr_ptr == ptr_ext4_super_block_offset)
>                         *((__le32 *) ptr) = cpu_to_le32(t);
>                 else
>                         *((unsigned int *) ptr) = t;
>                 return len;
>         case attr_pointer_ul:
>                 ret = kstrtoul(skip_spaces(buf), 0, &lt);
>                 if (ret)
>                         return ret;
>                 *((unsigned long *) ptr) = lt;
>                 return len;
>         }
>         return 0;
> 
> }
> 
> Also, both kstrtouint() and kstrtoul() are based on the kstrtoull()
> implementation, so it feels better to opencode kstrtoul() and
> kstrtouint() to reduce duplicate code.
> Why is it better to distinguish uint and ulong cases here?

Hopefully explained above :)


								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] ext4: add positive int attr pointer to avoid sysfs variables overflow
  2024-02-17  7:41     ` Baokun Li
@ 2024-02-23 12:05       ` Jan Kara
  2024-02-24  2:46         ` Baokun Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kara @ 2024-02-23 12:05 UTC (permalink / raw)
  To: Baokun Li
  Cc: Jan Kara, linux-ext4, tytso, adilger.kernel, ritesh.list,
	linux-kernel, yi.zhang, yangerkun, chengzhihao1, yukuai3, stable

On Sat 17-02-24 15:41:43, Baokun Li wrote:
> On 2024/2/14 0:58, Jan Kara wrote:
> > On Fri 26-01-24 16:57:13, Baokun Li wrote:
> > > We can easily trigger a BUG_ON by using the following commands:
> > > 
> > >      mount /dev/$disk /tmp/test
> > >      echo 2147483650 > /sys/fs/ext4/$disk/mb_group_prealloc
> > >      echo test > /tmp/test/file && sync
> > > 
> > > ==================================================================
> > > kernel BUG at fs/ext4/mballoc.c:2029!
> > > invalid opcode: 0000 [#1] PREEMPT SMP PTI
> > > CPU: 3 PID: 320 Comm: kworker/u36:1 Not tainted 6.8.0-rc1 #462
> > > RIP: 0010:mb_mark_used+0x358/0x370
> > > [...]
> > > Call Trace:
> > >   ext4_mb_use_best_found+0x56/0x140
> > >   ext4_mb_complex_scan_group+0x196/0x2f0
> > >   ext4_mb_regular_allocator+0xa92/0xf00
> > >   ext4_mb_new_blocks+0x302/0xbc0
> > >   ext4_ext_map_blocks+0x95a/0xef0
> > >   ext4_map_blocks+0x2b1/0x680
> > >   ext4_do_writepages+0x733/0xbd0
> > > [...]
> > > ==================================================================
> > > 
> > > In ext4_mb_normalize_group_request():
> > >      ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
> > > 
> > > Here fe_len is of type int, but s_mb_group_prealloc is of type unsigned
> > > int, so setting s_mb_group_prealloc to 2147483650 overflows fe_len to a
> > > negative number, which ultimately triggers a BUG_ON() in mb_mark_used().
> > > 
> > > Therefore, we add attr_pointer_pi (aka positive int attr pointer) with a
> > > value range of 0-INT_MAX to avoid the above problem. In addition to the
> > > mb_group_prealloc sysfs interface, the following interfaces also have uint
> > > to int conversions that result in overflows, and are also fixed.
> > > 
> > >    err_ratelimit_burst
> > >    msg_ratelimit_burst
> > >    warning_ratelimit_burst
> > >    err_ratelimit_interval_ms
> > >    msg_ratelimit_interval_ms
> > >    warning_ratelimit_interval_ms
> > >    mb_best_avail_max_trim_order
> > > 
> > > CC: stable@vger.kernel.org
> > > Signed-off-by: Baokun Li <libaokun1@huawei.com>
> > I don't think you need to change s_mb_group_prealloc here and then restrict
> > it even further in the next patch. I'd just leave it alone here.
> Yes, we could put the next patch before this one, but using
> s_mb_group_prealloc as an example makes it easier to understand
> why the attr_pointer_pi case is added here.There are several other
> variables that don't have more convincing examples.

Yes, I think reordering would be good. Because I've read the convertion and
started wondering: "is this enough?"

> > Also I think that limiting mb_best_avail_max_trim_order to 64 instead of
> > INT_MAX will make us more resilient to surprises in the future :) But I
> > don't really insist.
> > 
> > 								Honza
> I think it's enough here to make sure that mb_best_avail_max_trim_order
> is a positive number, since we always make sure that min_order
> is not less than 0, as follows:
> 
>          order = fls(ac->ac_g_ex.fe_len) - 1;
>          min_order = order - sbi->s_mb_best_avail_max_trim_order;
>          if (min_order < 0)
>                  min_order = 0;
> 
> An oversized mb_best_avail_max_trim_order can be interpreted as
> always being CR_ANY_FREE. 😄

Well, s_mb_best_avail_max_trim_order is not about allocation passes but
about how many times are we willing to shorten the goal extent to half and
still use the advanced free blocks search. And I agree that the mballoc
code is careful enough that large numbers don't matter there but still why
allowing storing garbage values? It is nicer to tell sysadmin he did
something wrong right away.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/7] ext4: avoid overflow when setting values via sysfs
  2024-02-23 11:54       ` Jan Kara
@ 2024-02-24  1:59         ` Baokun Li
  0 siblings, 0 replies; 31+ messages in thread
From: Baokun Li @ 2024-02-24  1:59 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, linux-kernel,
	yi.zhang, yangerkun, chengzhihao1, yukuai3, Baokun Li

On 2024/2/23 19:54, Jan Kara wrote:
> On Sat 17-02-24 15:09:06, Baokun Li wrote:
>> On 2024/2/14 0:05, Jan Kara wrote:
>>> On Fri 26-01-24 16:57:10, Baokun Li wrote:
>>>> When setting values of type unsigned int through sysfs, we use kstrtoul()
>>>> to parse it and then truncate part of it as the final set value, when the
>>>> set value is greater than UINT_MAX, the set value will not match what we
>>>> see because of the truncation. As follows:
>>>>
>>>>     $ echo 4294967296 > /sys/fs/ext4/sda/mb_max_linear_groups
>>>>     $ cat /sys/fs/ext4/sda/mb_max_linear_groups
>>>>       0
>>>>
>>>> So when the value set is outside the variable type range, -EINVAL is
>>>> returned to avoid the inconsistency described above. In addition, a
>>>> judgment is added to avoid setting s_resv_clusters less than 0.
>>>>
>>>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>>>> ---
>>>>    fs/ext4/sysfs.c | 4 +++-
>>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
>>>> index 6d332dff79dd..3671a8aaf4af 100644
>>>> --- a/fs/ext4/sysfs.c
>>>> +++ b/fs/ext4/sysfs.c
>>>> @@ -104,7 +104,7 @@ static ssize_t reserved_clusters_store(struct ext4_sb_info *sbi,
>>>>    	int ret;
>>>>    	ret = kstrtoull(skip_spaces(buf), 0, &val);
>>>> -	if (ret || val >= clusters)
>>>> +	if (ret || val >= clusters || (s64)val < 0)
>>>>    		return -EINVAL;
>>> This looks a bit pointless, doesn't it? 'val' is u64, clusters is u64. We
>>> know that val < clusters so how could (s64)val be < 0?
>> When clusters is bigger than LLONG_MAX, (s64)val may be less than 0.
>> Of course we don't have such a large storage device yet, so it's only
>> theoretically possible to overflow here. But the previous patches in this
>> patch set were intended to ensure that the values set via sysfs did not
>> exceed the range of the variable type, so I've modified that here as well.
> Well, my point was that the on disk format is limited to much less than
> 2^63 blocks. But I guess having the additional check does not matter.
OK.
>>>> @@ -463,6 +463,8 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
>>>>    		ret = kstrtoul(skip_spaces(buf), 0, &t);
>>>>    		if (ret)
>>>>    			return ret;
>>>> +		if (t != (unsigned int)t)
>>>> +			return -EINVAL;
>>>>    		if (a->attr_ptr == ptr_ext4_super_block_offset)
>>>>    			*((__le32 *) ptr) = cpu_to_le32(t);
>>>>    		else
>>> I kind of agree with Alexey that using kstrtouint() here instead would look
>>> nicer. And it isn't like you have to define many new variables. You just
>>> need unsigned long for attr_pointer_ul and unsigned int for
>>> attr_pointer_ui.
>> If we use both kstrtouint() and kstrtoul(), then we need to add
>> kstrtouint() or kstrtoul() to each case, which would be a lot of
>> duplicate code as follows:
> Well, it is 5 more lines if I'm counting right :) (3x 3 lines of conversion
> - 2x 2 lines of boundary checks). I kind of find it easier to oversee the
> boundary checks when everything is together at each parameter. But frankly
> this is a bit of nitpicking so if you feel strongly about this I won't
> insist.
Makes sense, there may be some implicit checks that look unintuitive
this way in the original patch. Now keep the string to number conversion
inside the switch does look better. Let me send v2.
>> static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
>>                                         struct ext4_sb_info *sbi,
>>                                         const char *buf, size_t len)
>> {
>>          int ret;
>>          unsigned int t;
>>          unsigned long lt;
>>          void *ptr = calc_ptr(a, sbi);
>>
>>          if (!ptr)
>>                  return 0;
>>
>>          switch (a->attr_id) {
>>          case attr_group_prealloc:
>>                  ret = kstrtouint(skip_spaces(buf), 0, &t);
>>                  if (ret)
>>                          return ret;
>>                  if (t > sbi->s_clusters_per_group)
>>                          return -EINVAL;
>>                  return len;
>>          case attr_pointer_pi:
>>                  ret = kstrtouint(skip_spaces(buf), 0, &t);
>>                  if (ret)
>>                          return ret;
>>                  if ((int)t < 0)
>>                          return -EINVAL;
>>                  return len;
>>          case attr_pointer_ui:
>>                  ret = kstrtouint(skip_spaces(buf), 0, &t);
>>                  if (ret)
>>                          return ret;
>>                  if (t != (unsigned int)t)
>>                          return -EINVAL;
> 		  ^^^ this can go away
I forgot to delete this, thanks!
>>                  if (a->attr_ptr == ptr_ext4_super_block_offset)
>>                          *((__le32 *) ptr) = cpu_to_le32(t);
>>                  else
>>                          *((unsigned int *) ptr) = t;
>>                  return len;
>>          case attr_pointer_ul:
>>                  ret = kstrtoul(skip_spaces(buf), 0, &lt);
>>                  if (ret)
>>                          return ret;
>>                  *((unsigned long *) ptr) = lt;
>>                  return len;
>>          }
>>          return 0;
>>
>> }
>>
>> Also, both kstrtouint() and kstrtoul() are based on the kstrtoull()
>> implementation, so it feels better to opencode kstrtoul() and
>> kstrtouint() to reduce duplicate code.
>> Why is it better to distinguish uint and ulong cases here?
> Hopefully explained above :)
>
>
> 								Honza
Yes, now I understand what you're considering. 😊

Thank you for your explanation!
-- 
With Best Regards,
Baokun Li
.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] ext4: add positive int attr pointer to avoid sysfs variables overflow
  2024-02-23 12:05       ` Jan Kara
@ 2024-02-24  2:46         ` Baokun Li
  0 siblings, 0 replies; 31+ messages in thread
From: Baokun Li @ 2024-02-24  2:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, linux-kernel,
	yi.zhang, yangerkun, chengzhihao1, yukuai3, stable, Baokun Li

On 2024/2/23 20:05, Jan Kara wrote:
> On Sat 17-02-24 15:41:43, Baokun Li wrote:
>> On 2024/2/14 0:58, Jan Kara wrote:
>>> On Fri 26-01-24 16:57:13, Baokun Li wrote:
>>>> We can easily trigger a BUG_ON by using the following commands:
>>>>
>>>>       mount /dev/$disk /tmp/test
>>>>       echo 2147483650 > /sys/fs/ext4/$disk/mb_group_prealloc
>>>>       echo test > /tmp/test/file && sync
>>>>
>>>> ==================================================================
>>>> kernel BUG at fs/ext4/mballoc.c:2029!
>>>> invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>>> CPU: 3 PID: 320 Comm: kworker/u36:1 Not tainted 6.8.0-rc1 #462
>>>> RIP: 0010:mb_mark_used+0x358/0x370
>>>> [...]
>>>> Call Trace:
>>>>    ext4_mb_use_best_found+0x56/0x140
>>>>    ext4_mb_complex_scan_group+0x196/0x2f0
>>>>    ext4_mb_regular_allocator+0xa92/0xf00
>>>>    ext4_mb_new_blocks+0x302/0xbc0
>>>>    ext4_ext_map_blocks+0x95a/0xef0
>>>>    ext4_map_blocks+0x2b1/0x680
>>>>    ext4_do_writepages+0x733/0xbd0
>>>> [...]
>>>> ==================================================================
>>>>
>>>> In ext4_mb_normalize_group_request():
>>>>       ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
>>>>
>>>> Here fe_len is of type int, but s_mb_group_prealloc is of type unsigned
>>>> int, so setting s_mb_group_prealloc to 2147483650 overflows fe_len to a
>>>> negative number, which ultimately triggers a BUG_ON() in mb_mark_used().
>>>>
>>>> Therefore, we add attr_pointer_pi (aka positive int attr pointer) with a
>>>> value range of 0-INT_MAX to avoid the above problem. In addition to the
>>>> mb_group_prealloc sysfs interface, the following interfaces also have uint
>>>> to int conversions that result in overflows, and are also fixed.
>>>>
>>>>     err_ratelimit_burst
>>>>     msg_ratelimit_burst
>>>>     warning_ratelimit_burst
>>>>     err_ratelimit_interval_ms
>>>>     msg_ratelimit_interval_ms
>>>>     warning_ratelimit_interval_ms
>>>>     mb_best_avail_max_trim_order
>>>>
>>>> CC: stable@vger.kernel.org
>>>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>>> I don't think you need to change s_mb_group_prealloc here and then restrict
>>> it even further in the next patch. I'd just leave it alone here.
>> Yes, we could put the next patch before this one, but using
>> s_mb_group_prealloc as an example makes it easier to understand
>> why the attr_pointer_pi case is added here.There are several other
>> variables that don't have more convincing examples.
> Yes, I think reordering would be good. Because I've read the convertion and
> started wondering: "is this enough?"
Well, I will put the next patch before this one in the next version.
>>> Also I think that limiting mb_best_avail_max_trim_order to 64 instead of
>>> INT_MAX will make us more resilient to surprises in the future :) But I
>>> don't really insist.
>>>
>>> 								Honza
>> I think it's enough here to make sure that mb_best_avail_max_trim_order
>> is a positive number, since we always make sure that min_order
>> is not less than 0, as follows:
>>
>>           order = fls(ac->ac_g_ex.fe_len) - 1;
>>           min_order = order - sbi->s_mb_best_avail_max_trim_order;
>>           if (min_order < 0)
>>                   min_order = 0;
>>
>> An oversized mb_best_avail_max_trim_order can be interpreted as
>> always being CR_ANY_FREE. 😄
> Well, s_mb_best_avail_max_trim_order is not about allocation passes but
> about how many times are we willing to shorten the goal extent to half and
> still use the advanced free blocks search.
Yes, this means that in CR1.5, in case the original request is satisfied,
we allow allocation of blocks with an order of (goal_extent_order -
s_mb_best_avail_max_trim_order) to accelerate block allocation.
> And I agree that the mballoc
> code is careful enough that large numbers don't matter there but still why
> allowing storing garbage values? It is nicer to tell sysadmin he did
> something wrong right away.
>
> 								Honza
Yes, we shouldn't allow storing rubbish values, otherwise it may
mislead admins, I will add an extra type to check it.


Thanks!
-- 
With Best Regards,
Baokun Li
.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2024-02-24  2:46 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-26  8:57 [PATCH 0/7] ext4: avoid sysfs variables overflow causing BUG_ON/SOOB Baokun Li
2024-01-26  8:57 ` [PATCH 1/7] ext4: avoid overflow when setting values via sysfs Baokun Li
2024-01-26  9:28   ` Zhang Yi
2024-02-13 16:05   ` Jan Kara
2024-02-17  7:09     ` Baokun Li
2024-02-23 11:54       ` Jan Kara
2024-02-24  1:59         ` Baokun Li
2024-01-26  8:57 ` [PATCH 2/7] ext4: refactor out ext4_generic_attr_store() Baokun Li
2024-01-26  9:37   ` Zhang Yi
2024-02-13 16:47   ` Jan Kara
2024-01-26  8:57 ` [PATCH 3/7] ext4: refactor out ext4_generic_attr_show() Baokun Li
2024-01-26 10:08   ` Zhang Yi
2024-02-13 16:44   ` Jan Kara
2024-01-26  8:57 ` [PATCH 4/7] ext4: add positive int attr pointer to avoid sysfs variables overflow Baokun Li
2024-01-27  2:07   ` Zhang Yi
2024-02-13 16:58   ` Jan Kara
2024-02-17  7:41     ` Baokun Li
2024-02-23 12:05       ` Jan Kara
2024-02-24  2:46         ` Baokun Li
2024-01-26  8:57 ` [PATCH 5/7] ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists() Baokun Li
2024-01-27  2:09   ` Zhang Yi
2024-02-13 16:14   ` Jan Kara
2024-02-20  5:39   ` Ojaswin Mujoo
2024-02-20  6:31     ` Baokun Li
2024-01-26  8:57 ` [PATCH 6/7] ext4: set type of ac_groups_linear_remaining to __u32 to avoid overflow Baokun Li
2024-01-27  2:10   ` Zhang Yi
2024-02-13 16:15   ` Jan Kara
2024-01-26  8:57 ` [PATCH 7/7] ext4: set the type of max_zeroout to unsigned int " Baokun Li
2024-01-27  2:12   ` Zhang Yi
2024-02-13 16:38   ` Jan Kara
2024-02-17  7:45     ` Baokun Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.