From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Coly Li <colyli@suse.de>,
	Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>,
	Eric Wheeler <bcache@lists.ewheeler.net>,
	Jens Axboe <axboe@kernel.dk>, Sasha Levin <sashal@kernel.org>,
	kent.overstreet@gmail.com, linux-bcache@vger.kernel.org
Subject: [PATCH AUTOSEL 6.6 07/40] bcache: avoid oversize memory allocation by small stripe_size
Date: Tue, 28 Nov 2023 16:05:13 -0500
Message-ID: <20231128210615.875085-7-sashal@kernel.org>
In-Reply-To: <20231128210615.875085-1-sashal@kernel.org>

From: Coly Li <colyli@suse.de>

[ Upstream commit baf8fb7e0e5ec54ea0839f0c534f2cdcd79bea9c ]

The arrays bcache->stripe_sectors_dirty and bcache->full_dirty_stripes are
used for dirty data writeback, and their sizes are decided by the backing
device capacity and the stripe size. A larger backing device capacity or a
smaller stripe size makes these two arrays occupy more dynamic memory.

Currently bcache->stripe_size is directly inherited from
queue->limits.io_opt of the underlying storage device. For normal hard
drives, limits.io_opt is 0, and bcache sets the corresponding
stripe_size to 1TB (1<<31 sectors), which has worked fine for 10+ years.
But for devices that do declare a value for queue->limits.io_opt, a small
stripe_size (compared to 1TB) leads to oversize memory allocations for
bcache->stripe_sectors_dirty and bcache->full_dirty_stripes, all the more
so as hard drive capacities have grown much larger in the recent decade.

For example, for a raid5 array assembled from three 20TB hard drives, the
raid device capacity is 40TB with a typical 512KB limits.io_opt. By the
calculation in the bcache code, these two arrays will occupy 400MB of
dynamic memory. Even worse, Andrea Tomassetti reports that a new 2TB hard
drive declares a 4KB limits.io_opt, so these two arrays request 2GB and
512MB of dynamic memory from kzalloc(). As a result, the bcache device
always fails to initialize on his system.

To avoid the oversize memory allocation, bcache->stripe_size should not be
directly inherited from queue->limits.io_opt of the underlying device.
This patch defines BCH_MIN_STRIPE_SZ (4MB) as the minimal bcache stripe
size and sets the bcache device's stripe size according to the
limits.io_opt value declared by the underlying storage device:
- If the declared limits.io_opt >= BCH_MIN_STRIPE_SZ, the bcache device
  sets its stripe size directly to this limits.io_opt value.
- If the declared limits.io_opt < BCH_MIN_STRIPE_SZ, the bcache device
  sets its stripe size to the smallest multiple of limits.io_opt that is
  equal to or larger than BCH_MIN_STRIPE_SZ.

Then the stripe size of a bcache device will always be >= 4MB.
For a 40TB raid5 device with a 512KB limits.io_opt, the memory occupied by
bcache->stripe_sectors_dirty and bcache->full_dirty_stripes will be 50MB
in total. For a 2TB hard drive with a 4KB limits.io_opt, the memory
occupied by these two arrays will be 2.5MB in total.

Such an amount of memory allocated for bcache->stripe_sectors_dirty and
bcache->full_dirty_stripes is reasonable for most storage devices.

Reported-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Eric Wheeler <bcache@lists.ewheeler.net>
Link: https://lore.kernel.org/r/20231120052503.6122-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/md/bcache/bcache.h | 1 +
 drivers/md/bcache/super.c  | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 5a79bb3c272f1..83eb7f27db3d4 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -265,6 +265,7 @@ struct bcache_device {
 #define BCACHE_DEV_WB_RUNNING		3
 #define BCACHE_DEV_RATE_DW_RUNNING	4
 	int			nr_stripes;
+#define BCH_MIN_STRIPE_SZ		((4 << 20) >> SECTOR_SHIFT)
 	unsigned int		stripe_size;
 	atomic_t		*stripe_sectors_dirty;
 	unsigned long		*full_dirty_stripes;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 0ae2b36762930..93791e46b1e8f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -905,6 +905,8 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
 
 	if (!d->stripe_size)
 		d->stripe_size = 1 << 31;
+	else if (d->stripe_size < BCH_MIN_STRIPE_SZ)
+		d->stripe_size = roundup(BCH_MIN_STRIPE_SZ, d->stripe_size);
 
 	n = DIV_ROUND_UP_ULL(sectors, d->stripe_size);
 	if (!n || n > max_stripes) {
-- 
2.42.0


