From: Ming Lei <ming.lei@canonical.com>
To: Jens Axboe <axboe@fb.com>, linux-kernel@vger.kernel.org
Cc: linux-block@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	Eric Wheeler <bcache@lists.ewheeler.net>,
	Sebastian Roesner <sroesner-kernelorg@roesner-online.de>,
	Ming Lei <ming.lei@canonical.com>, stable@vger.kernel.org (4.3+),
	Shaohua Li <shli@fb.com>, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH v4] block: make sure big bio is splitted into at most 256 bvecs
Date: Tue, 23 Aug 2016 21:49:45 +0800
Message-ID: <1471960185-14044-1-git-send-email-ming.lei@canonical.com>

Now that arbitrarily sized bios are supported, an incoming bio may
be very big. We have to split such a bio into smaller bios so that
each holds at most BIO_MAX_PAGES bvecs; otherwise operations such
as bio_clone() can fail to allocate the bvec array.

This patch fixes the following kernel crash:

> [  172.660142] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> [  172.660229] IP: [<ffffffff811e53b4>] bio_trim+0xf/0x2a
> [  172.660289] PGD 7faf3e067 PUD 7f9279067 PMD 0
> [  172.660399] Oops: 0000 [#1] SMP
> [...]
> [  172.664780] Call Trace:
> [  172.664813]  [<ffffffffa007f3be>] ? raid1_make_request+0x2e8/0xad7 [raid1]
> [  172.664846]  [<ffffffff811f07da>] ? blk_queue_split+0x377/0x3d4
> [  172.664880]  [<ffffffffa005fb5f>] ? md_make_request+0xf6/0x1e9 [md_mod]
> [  172.664912]  [<ffffffff811eb860>] ? generic_make_request+0xb5/0x155
> [  172.664947]  [<ffffffffa0445c89>] ? prio_io+0x85/0x95 [bcache]
> [  172.664981]  [<ffffffffa0448252>] ? register_cache_set+0x355/0x8d0 [bcache]
> [  172.665016]  [<ffffffffa04497d3>] ? register_bcache+0x1006/0x1174 [bcache]

The issue can be reproduced with the following steps:
	- create one raid1 array over two virtio-blk devices
	- build a bcache device over the above raid1 and another cache
	device, with the bucket size set to 2 MB
	- set the cache mode to writeback
	- run random writes over ext4 on the bcache device

Fixes: 54efd50 ("block: make generic_make_request handle arbitrarily sized bios")
Reported-by: Sebastian Roesner <sroesner-kernelorg@roesner-online.de>
Reported-by: Eric Wheeler <bcache@lists.ewheeler.net>
Cc: stable@vger.kernel.org (4.3+)
Cc: Shaohua Li <shli@fb.com>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
V4:
	- don't consider merging bios that were split because they
	reached the max bvec limit, to keep the change simple,
	as requested by Christoph and Kent
V3:
	- rebase against v4.8-rc1, since .bi_rw of bio was renamed
	to .bi_opf
V2:
	- don't mark the bio as REQ_NOMERGE when it is split
	for reaching the bvec count limit
V1:
	- Kent pointed out that using the max io size can't cover
	the case of non-full bvecs/pages


 block/blk-merge.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 3eec75a..f6ae884 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -94,9 +94,31 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	bool do_split = true;
 	struct bio *new = NULL;
 	const unsigned max_sectors = get_max_io_size(q, bio);
+	unsigned bvecs = 0;
 
 	bio_for_each_segment(bv, bio, iter) {
 		/*
+		 * With arbitrary bio size, the incoming bio may be very
+		 * big. We have to split the bio into small bios so that
+		 * each holds at most BIO_MAX_PAGES bvecs because
+		 * bio_clone() can fail to allocate big bvecs.
+		 *
+		 * It would be better to apply the limit only to the
+		 * request queues in which bio_clone() is involved,
+		 * instead of globally. The biggest blocker is the
+		 * bio_clone() in bio bounce.
+		 *
+		 * If a bio is split for this reason, we should allow
+		 * bio merging to continue, but we don't do that now
+		 * in order to keep the change simple.
+		 *
+		 * TODO: deal with bio bounce's bio_clone() gracefully
+		 * and convert the global limit into per-queue limit.
+		 */
+		if (bvecs++ >= BIO_MAX_PAGES)
+			goto split;
+
+		/*
 		 * If the queue doesn't support SG gaps and adding this
 		 * offset would create a gap, disallow it.
 		 */
-- 
2.7.4
