All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] block: copy bi_vcnt in __bio_clone_fast
@ 2014-10-08 22:54 Jeff Mahoney
  2014-10-09 13:53 ` Jeff Moyer
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Mahoney @ 2014-10-08 22:54 UTC (permalink / raw)
  To: Jens Axboe, Linux Kernel Maling List

Commit 05f1dd53152173 (block: add queue flag for disabling SG merging) uses
bi_vcnt to assign bio->bi_phys_segments if sg merging is disabled. When
using device mapper on top of a blk-mq device (virtio_blk in my test),
we'd end up overflowing the scatterlist in __blk_bios_map_sg.

__bio_clone_fast copies bi_iter and bi_io_vec but not bi_vcnt, so
blk_recount_segments would report bi_phys_segments as 0. Since
rq->nr_phys_segments is 0 as well, the checks to ensure that we don't
exceed the queue's segment limit end up allowing more bios (and segments) to
attach the a request until we finally map it. That also means we
pass the BUG_ON at the beginning of virtio_queue_rq, ultimately causing
memory corruption and a crash.

If we copy bi_vcnt in __bio_clone_fast, the bios and requests properly
report the number of segments and everything works as expected.

Originally reported at http://bugzilla.opensuse.org/show_bug.cgi?id=888259

Reported-by: Stephen Kulow <coolo@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---

 block/bio.c |    1 +
 1 file changed, 1 insertion(+)

--- a/block/bio.c
+++ b/block/bio.c
@@ -564,6 +564,7 @@ void __bio_clone_fast(struct bio *bio, s
 	bio->bi_rw = bio_src->bi_rw;
 	bio->bi_iter = bio_src->bi_iter;
 	bio->bi_io_vec = bio_src->bi_io_vec;
+	bio->bi_vcnt = bio_src->bi_vcnt;
 }
 EXPORT_SYMBOL(__bio_clone_fast);
 

-- 
Jeff Mahoney
SUSE Labs


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] block: copy bi_vcnt in __bio_clone_fast
  2014-10-08 22:54 [PATCH] block: copy bi_vcnt in __bio_clone_fast Jeff Mahoney
@ 2014-10-09 13:53 ` Jeff Moyer
  2014-10-09 14:26   ` Jeff Mahoney
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Moyer @ 2014-10-09 13:53 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Jens Axboe, Linux Kernel Maling List, Ming Lei

Jeff Mahoney <jeffm@suse.com> writes:

> Commit 05f1dd53152173 (block: add queue flag for disabling SG merging) uses
> bi_vcnt to assign bio->bi_phys_segments if sg merging is disabled. When
> using device mapper on top of a blk-mq device (virtio_blk in my test),
> we'd end up overflowing the scatterlist in __blk_bios_map_sg.
>
> __bio_clone_fast copies bi_iter and bi_io_vec but not bi_vcnt, so
> blk_recount_segments would report bi_phys_segments as 0. Since
> rq->nr_phys_segments is 0 as well, the checks to ensure that we don't
> exceed the queue's segment limit end up allowing more bios (and segments) to
> attach the a request until we finally map it. That also means we
> pass the BUG_ON at the beginning of virtio_queue_rq, ultimately causing
> memory corruption and a crash.
>
> If we copy bi_vcnt in __bio_clone_fast, the bios and requests properly
> report the number of segments and everything works as expected.
>
> Originally reported at http://bugzilla.opensuse.org/show_bug.cgi?id=888259

Hi, Jeff,

Did you manage to reproduce this problem with commit 0738854 (blk-merge:
fix blk_recount_segments) applied?  Or perhaps with commit 200612e (dm
table: propagate QUEUE_FLAG_NO_SG_MERGE)?

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] block: copy bi_vcnt in __bio_clone_fast
  2014-10-09 13:53 ` Jeff Moyer
@ 2014-10-09 14:26   ` Jeff Mahoney
  2014-10-09 15:25     ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Mahoney @ 2014-10-09 14:26 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Jens Axboe, Linux Kernel Maling List, Ming Lei

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/9/14, 9:53 AM, Jeff Moyer wrote:
> Jeff Mahoney <jeffm@suse.com> writes:
> 
>> Commit 05f1dd53152173 (block: add queue flag for disabling SG
>> merging) uses bi_vcnt to assign bio->bi_phys_segments if sg
>> merging is disabled. When using device mapper on top of a blk-mq
>> device (virtio_blk in my test), we'd end up overflowing the
>> scatterlist in __blk_bios_map_sg.
>> 
>> __bio_clone_fast copies bi_iter and bi_io_vec but not bi_vcnt,
>> so blk_recount_segments would report bi_phys_segments as 0.
>> Since rq->nr_phys_segments is 0 as well, the checks to ensure
>> that we don't exceed the queue's segment limit end up allowing
>> more bios (and segments) to attach the a request until we finally
>> map it. That also means we pass the BUG_ON at the beginning of
>> virtio_queue_rq, ultimately causing memory corruption and a
>> crash.
>> 
>> If we copy bi_vcnt in __bio_clone_fast, the bios and requests
>> properly report the number of segments and everything works as
>> expected.
>> 
>> Originally reported at
>> http://bugzilla.opensuse.org/show_bug.cgi?id=888259
> 
> Hi, Jeff,
> 
> Did you manage to reproduce this problem with commit 0738854
> (blk-merge: fix blk_recount_segments) applied?  Or perhaps with
> commit 200612e (dm table: propagate QUEUE_FLAG_NO_SG_MERGE)?

Yep. I was able to reproduce it with 3.17. I did try 0738854 when I
was still using 3.16 since it looked like a good candidate. Neither of
those patches affect the problem here. bio->bi_phys_segments never
gets a value set in the fast clone case and that translates to
req->nr_phys_segments never getting properly accumulated. That might
not be a problem except that the NO_SG_MERGE behavior bypasses the
iteration that would come up with the correct value. In either case,
it's still correct to copy bi_vcnt from the source bio since it's
describing the same bvec. Doing the iteration with no merging would
just end up with the same value. bio_clone_bioset builds up bi_vcnt as
it iterates over the number of segments, so that works fine. It's only
__bio_clone_fast that's broken.

- -Jeff

- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)

iQIcBAEBAgAGBQJUNpsSAAoJEB57S2MheeWy5k4QAL2N5KQC3cAv+jYHal5qASaZ
5vr2A2VCofnz9ADV/CHhMNzcArcL1MBn/hJvXC8l/RZkEKY2GYaf4Uqu8LoYjPeJ
7rUrJGQ74H2IS63n7+qgBI5AlL+ndSsOruxiCSExIJPVu4t7MeKGnDKLApq9s7zl
JAh+ec+iJ5mT3Nth714zSyDVBFvBOWhZVm5uf/9QeLrFv9aBG4896u6P50SzA5zn
QU7r9sSwws7o30+2f2A9k57vEPm9DiqKQBgQoI0reZmrrs6YzP3PivLp6v2DLlED
WToOy3b5ZLw2GKeAgYFStQpSLvpwBqFdqjydzJ+ynZ+a07YATDuwj1iailSzY/g2
pCZaXazQ4jBJF5Qsy3C1IpN7OAQS1CBzQTEeLrYi0KQ3aM+VQDp9SV6s1vakFEVD
VsYGJxlUG8MvuPFfvUlqGvYtoudy5R0rdTCdIbWAAQlQosser+1u6mhyqHnSPqyW
Pja/hXmTsffASwqQYXd8HqEVZRDtaqX30dfuLYYWWGq9WBcLPpPFrYilwCwgFZ9C
8HGaMHAtEC3vAN8KBj5rVNgMILrxZlMXVqf+BTs+M9dq6Ed7Q+VYytK9WTM7G3Hy
WY/fhWgFoAOyaTHc1EmxbWjxb5wM2anK3zeiJAomAsZ/f/xhwnq/aLgu3x846Af5
V1v8uHhjR+HfHRn0wmaN
=bziR
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] block: copy bi_vcnt in __bio_clone_fast
  2014-10-09 14:26   ` Jeff Mahoney
@ 2014-10-09 15:25     ` Ming Lei
  2014-10-09 16:13       ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2014-10-09 15:25 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Jeff Moyer, Jens Axboe, Linux Kernel Maling List

[-- Attachment #1: Type: text/plain, Size: 2122 bytes --]

On Thu, Oct 9, 2014 at 10:26 PM, Jeff Mahoney <jeffm@suse.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 10/9/14, 9:53 AM, Jeff Moyer wrote:
>> Jeff Mahoney <jeffm@suse.com> writes:
>>
>>> Commit 05f1dd53152173 (block: add queue flag for disabling SG
>>> merging) uses bi_vcnt to assign bio->bi_phys_segments if sg
>>> merging is disabled. When using device mapper on top of a blk-mq
>>> device (virtio_blk in my test), we'd end up overflowing the
>>> scatterlist in __blk_bios_map_sg.
>>>
>>> __bio_clone_fast copies bi_iter and bi_io_vec but not bi_vcnt,
>>> so blk_recount_segments would report bi_phys_segments as 0.
>>> Since rq->nr_phys_segments is 0 as well, the checks to ensure
>>> that we don't exceed the queue's segment limit end up allowing
>>> more bios (and segments) to attach the a request until we finally
>>> map it. That also means we pass the BUG_ON at the beginning of
>>> virtio_queue_rq, ultimately causing memory corruption and a
>>> crash.
>>>
>>> If we copy bi_vcnt in __bio_clone_fast, the bios and requests
>>> properly report the number of segments and everything works as
>>> expected.
>>>
>>> Originally reported at
>>> http://bugzilla.opensuse.org/show_bug.cgi?id=888259
>>
>> Hi, Jeff,
>>
>> Did you manage to reproduce this problem with commit 0738854
>> (blk-merge: fix blk_recount_segments) applied?  Or perhaps with
>> commit 200612e (dm table: propagate QUEUE_FLAG_NO_SG_MERGE)?
>
> Yep. I was able to reproduce it with 3.17. I did try 0738854 when I
> was still using 3.16 since it looked like a good candidate. Neither of
> those patches affect the problem here. bio->bi_phys_segments never
> gets a value set in the fast clone case and that translates to
> req->nr_phys_segments never getting properly accumulated. That might
> not be a problem except that the NO_SG_MERGE behavior bypasses the
> iteration that would come up with the correct value. In either case,

This patch may get incorrect rq->nr_phys_segments because
bio cloning is often used in case of I/O splitting, so could you
test if the attached patch fixes your problem?

Thanks,
--
Ming Lei

[-- Attachment #2: 0001-blk-merge-don-t-compute-bi_phys_segments-from-bi_vcn.patch --]
[-- Type: text/x-patch, Size: 913 bytes --]

From d3946db93c8eb1da4a0c32249d603f415febf09e Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Thu, 9 Oct 2014 23:17:35 +0800
Subject: [PATCH] blk-merge: don't compute bi_phys_segments from bi_vcnt for
 cloned bio

It isn't correct to figure out req->bi_phys_segments from bio->bi_vcnt
if the bio is cloned.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-merge.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index f71bad3..d5cad1a 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -98,6 +98,7 @@ void blk_recalc_rq_segments(struct request *rq)
 void blk_recount_segments(struct request_queue *q, struct bio *bio)
 {
 	if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) &&
+			!bio_flagged(bio, BIO_CLONED) &&
 			bio->bi_vcnt < queue_max_segments(q))
 		bio->bi_phys_segments = bio->bi_vcnt;
 	else {
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] block: copy bi_vcnt in __bio_clone_fast
  2014-10-09 15:25     ` Ming Lei
@ 2014-10-09 16:13       ` Ming Lei
  2014-10-09 17:58         ` Jeff Mahoney
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2014-10-09 16:13 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Jeff Moyer, Jens Axboe, Linux Kernel Maling List

[-- Attachment #1: Type: text/plain, Size: 2372 bytes --]

On Thu, Oct 9, 2014 at 11:25 PM, Ming Lei <ming.lei@canonical.com> wrote:
> On Thu, Oct 9, 2014 at 10:26 PM, Jeff Mahoney <jeffm@suse.com> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On 10/9/14, 9:53 AM, Jeff Moyer wrote:
>>> Jeff Mahoney <jeffm@suse.com> writes:
>>>
>>>> Commit 05f1dd53152173 (block: add queue flag for disabling SG
>>>> merging) uses bi_vcnt to assign bio->bi_phys_segments if sg
>>>> merging is disabled. When using device mapper on top of a blk-mq
>>>> device (virtio_blk in my test), we'd end up overflowing the
>>>> scatterlist in __blk_bios_map_sg.
>>>>
>>>> __bio_clone_fast copies bi_iter and bi_io_vec but not bi_vcnt,
>>>> so blk_recount_segments would report bi_phys_segments as 0.
>>>> Since rq->nr_phys_segments is 0 as well, the checks to ensure
>>>> that we don't exceed the queue's segment limit end up allowing
>>>> more bios (and segments) to attach the a request until we finally
>>>> map it. That also means we pass the BUG_ON at the beginning of
>>>> virtio_queue_rq, ultimately causing memory corruption and a
>>>> crash.
>>>>
>>>> If we copy bi_vcnt in __bio_clone_fast, the bios and requests
>>>> properly report the number of segments and everything works as
>>>> expected.
>>>>
>>>> Originally reported at
>>>> http://bugzilla.opensuse.org/show_bug.cgi?id=888259
>>>
>>> Hi, Jeff,
>>>
>>> Did you manage to reproduce this problem with commit 0738854
>>> (blk-merge: fix blk_recount_segments) applied?  Or perhaps with
>>> commit 200612e (dm table: propagate QUEUE_FLAG_NO_SG_MERGE)?
>>
>> Yep. I was able to reproduce it with 3.17. I did try 0738854 when I
>> was still using 3.16 since it looked like a good candidate. Neither of
>> those patches affect the problem here. bio->bi_phys_segments never
>> gets a value set in the fast clone case and that translates to
>> req->nr_phys_segments never getting properly accumulated. That might
>> not be a problem except that the NO_SG_MERGE behavior bypasses the
>> iteration that would come up with the correct value. In either case,
>
> This patch may get incorrect rq->nr_phys_segments because
> bio cloning is often used in case of I/O splitting, so could you
> test if the attached patch fixes your problem?

Please ignore last patch and test the attached patch since we
still should use no sg merge to recalculate req's segments for
cloned bio.

Thanks,

[-- Attachment #2: 0001-blk-merge-don-t-compute-bi_phys_segments-from-bi_vcn.patch --]
[-- Type: text/x-patch, Size: 1275 bytes --]

From fd8be35c1b1e35781137b2020dd29552c3f8eb98 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Thu, 9 Oct 2014 23:17:35 +0800
Subject: [PATCH] blk-merge: don't compute bi_phys_segments from bi_vcnt for
 cloned bio

It isn't correct to figure out req->bi_phys_segments from bio->bi_vcnt
if the bio is cloned.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-merge.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index f71bad3..ba99351 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -97,14 +97,18 @@ void blk_recalc_rq_segments(struct request *rq)
 
 void blk_recount_segments(struct request_queue *q, struct bio *bio)
 {
-	if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) &&
+	bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE,
+			&q->queue_flags);
+
+	if (no_sg_merge && !bio_flagged(bio, BIO_CLONED) &&
 			bio->bi_vcnt < queue_max_segments(q))
 		bio->bi_phys_segments = bio->bi_vcnt;
 	else {
 		struct bio *nxt = bio->bi_next;
 
 		bio->bi_next = NULL;
-		bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio, false);
+		bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio,
+				no_sg_merge);
 		bio->bi_next = nxt;
 	}
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] block: copy bi_vcnt in __bio_clone_fast
  2014-10-09 16:13       ` Ming Lei
@ 2014-10-09 17:58         ` Jeff Mahoney
  2014-10-09 19:12           ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Mahoney @ 2014-10-09 17:58 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jeff Moyer, Jens Axboe, Linux Kernel Maling List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/9/14, 12:13 PM, Ming Lei wrote:
> On Thu, Oct 9, 2014 at 11:25 PM, Ming Lei <ming.lei@canonical.com>
> wrote:
>> On Thu, Oct 9, 2014 at 10:26 PM, Jeff Mahoney <jeffm@suse.com>
>> wrote:
>>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>>> 
>>> On 10/9/14, 9:53 AM, Jeff Moyer wrote:
>>>> Jeff Mahoney <jeffm@suse.com> writes:
>>>> 
>>>>> Commit 05f1dd53152173 (block: add queue flag for disabling
>>>>> SG merging) uses bi_vcnt to assign bio->bi_phys_segments if
>>>>> sg merging is disabled. When using device mapper on top of
>>>>> a blk-mq device (virtio_blk in my test), we'd end up
>>>>> overflowing the scatterlist in __blk_bios_map_sg.
>>>>> 
>>>>> __bio_clone_fast copies bi_iter and bi_io_vec but not
>>>>> bi_vcnt, so blk_recount_segments would report
>>>>> bi_phys_segments as 0. Since rq->nr_phys_segments is 0 as
>>>>> well, the checks to ensure that we don't exceed the queue's
>>>>> segment limit end up allowing more bios (and segments) to
>>>>> attach the a request until we finally map it. That also
>>>>> means we pass the BUG_ON at the beginning of 
>>>>> virtio_queue_rq, ultimately causing memory corruption and
>>>>> a crash.
>>>>> 
>>>>> If we copy bi_vcnt in __bio_clone_fast, the bios and
>>>>> requests properly report the number of segments and
>>>>> everything works as expected.
>>>>> 
>>>>> Originally reported at 
>>>>> http://bugzilla.opensuse.org/show_bug.cgi?id=888259
>>>> 
>>>> Hi, Jeff,
>>>> 
>>>> Did you manage to reproduce this problem with commit 0738854 
>>>> (blk-merge: fix blk_recount_segments) applied?  Or perhaps
>>>> with commit 200612e (dm table: propagate
>>>> QUEUE_FLAG_NO_SG_MERGE)?
>>> 
>>> Yep. I was able to reproduce it with 3.17. I did try 0738854
>>> when I was still using 3.16 since it looked like a good
>>> candidate. Neither of those patches affect the problem here.
>>> bio->bi_phys_segments never gets a value set in the fast clone
>>> case and that translates to req->nr_phys_segments never getting
>>> properly accumulated. That might not be a problem except that
>>> the NO_SG_MERGE behavior bypasses the iteration that would come
>>> up with the correct value. In either case,
>> 
>> This patch may get incorrect rq->nr_phys_segments because bio
>> cloning is often used in case of I/O splitting, so could you test
>> if the attached patch fixes your problem?

Ah. Right.

> Please ignore last patch and test the attached patch since we still
> should use no sg merge to recalculate req's segments for cloned
> bio.

Yep, this fix works as expected.

Thanks,

- -Jeff

- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)

iQIcBAEBAgAGBQJUNsyuAAoJEB57S2MheeWyCSoQALkgpSfaWXEV89Sg0oaVbI/b
KWaHv1lLKICdkIpU+ujwsZtbxOzgwhG13CeWmLZArVvVp2eljciOdpI4vpuXVtOK
12QFj7ShLtgCt7dw/2zE8NP7tY2nAWznHhd8RXrphwM2XJ2APUVGbiUld2e+IqlI
jgdFrSOOVK+ZTUlDTkxjGPD6Q5zCJ7JDZ7cN7IWQB6yu115UoCJ73j9YhpfcDkRA
TZKk5/1NsCq11Gn+a5zCx9WMfauwNEeuwDFXPol8BMAK6BjucrZJjPDG2ykihuuI
EWUUOdnCmXgQgpW9ARrOr7ldfJO7OYL2u9WIgoBwMl1r23ZmSvIvMKIbUFuSwPSz
8yuJpzoH4SaaEO7e2ud2+aMK88Ih1Rr2uN009WPMYn5WXeCWlCBxaLmrFN4nlu3Y
Z/CoFFlhjsNr9Z7QzCv+u2SyEllA0l3/u+t3SNts4jhRReDGrEQIjs0ASvbKqNYZ
+c3hgrZDwv/xL1f7hBSblwIE0DgBn1rPJBfSRkSfrLzSi103lZrBQxpjN3wtjcFK
037/BfkUuzqSTG9MAdZoVkdw/iNh0Q3qfC7PjcT/EDKn4g30+DE04Sh6gDyojyvu
Z0HtRYUPjnlH0OBzvK8AyXmzLoEjkiyL4rSie1F2cy+qK/uRCUjyon8oCK9nfrga
6KWf1kYhmEAO17mN1y6y
=iDIX
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] block: copy bi_vcnt in __bio_clone_fast
  2014-10-09 17:58         ` Jeff Mahoney
@ 2014-10-09 19:12           ` Jens Axboe
  2014-10-10  1:24             ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2014-10-09 19:12 UTC (permalink / raw)
  To: Jeff Mahoney, Ming Lei; +Cc: Jeff Moyer, Linux Kernel Maling List

On 10/09/2014 11:58 AM, Jeff Mahoney wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 10/9/14, 12:13 PM, Ming Lei wrote:
>> On Thu, Oct 9, 2014 at 11:25 PM, Ming Lei <ming.lei@canonical.com>
>> wrote:
>>> On Thu, Oct 9, 2014 at 10:26 PM, Jeff Mahoney <jeffm@suse.com>
>>> wrote:
>>>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>>>>
>>>> On 10/9/14, 9:53 AM, Jeff Moyer wrote:
>>>>> Jeff Mahoney <jeffm@suse.com> writes:
>>>>>
>>>>>> Commit 05f1dd53152173 (block: add queue flag for disabling
>>>>>> SG merging) uses bi_vcnt to assign bio->bi_phys_segments if
>>>>>> sg merging is disabled. When using device mapper on top of
>>>>>> a blk-mq device (virtio_blk in my test), we'd end up
>>>>>> overflowing the scatterlist in __blk_bios_map_sg.
>>>>>>
>>>>>> __bio_clone_fast copies bi_iter and bi_io_vec but not
>>>>>> bi_vcnt, so blk_recount_segments would report
>>>>>> bi_phys_segments as 0. Since rq->nr_phys_segments is 0 as
>>>>>> well, the checks to ensure that we don't exceed the queue's
>>>>>> segment limit end up allowing more bios (and segments) to
>>>>>> attach the a request until we finally map it. That also
>>>>>> means we pass the BUG_ON at the beginning of 
>>>>>> virtio_queue_rq, ultimately causing memory corruption and
>>>>>> a crash.
>>>>>>
>>>>>> If we copy bi_vcnt in __bio_clone_fast, the bios and
>>>>>> requests properly report the number of segments and
>>>>>> everything works as expected.
>>>>>>
>>>>>> Originally reported at 
>>>>>> http://bugzilla.opensuse.org/show_bug.cgi?id=888259
>>>>>
>>>>> Hi, Jeff,
>>>>>
>>>>> Did you manage to reproduce this problem with commit 0738854 
>>>>> (blk-merge: fix blk_recount_segments) applied?  Or perhaps
>>>>> with commit 200612e (dm table: propagate
>>>>> QUEUE_FLAG_NO_SG_MERGE)?
>>>>
>>>> Yep. I was able to reproduce it with 3.17. I did try 0738854
>>>> when I was still using 3.16 since it looked like a good
>>>> candidate. Neither of those patches affect the problem here.
>>>> bio->bi_phys_segments never gets a value set in the fast clone
>>>> case and that translates to req->nr_phys_segments never getting
>>>> properly accumulated. That might not be a problem except that
>>>> the NO_SG_MERGE behavior bypasses the iteration that would come
>>>> up with the correct value. In either case,
>>>
>>> This patch may get incorrect rq->nr_phys_segments because bio
>>> cloning is often used in case of I/O splitting, so could you test
>>> if the attached patch fixes your problem?
> 
> Ah. Right.
> 
>> Please ignore last patch and test the attached patch since we still
>> should use no sg merge to recalculate req's segments for cloned
>> bio.
> 
> Yep, this fix works as expected.

Thanks Ming, I've queued it up (and added Jeff's tested-by).

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] block: copy bi_vcnt in __bio_clone_fast
  2014-10-09 19:12           ` Jens Axboe
@ 2014-10-10  1:24             ` Ming Lei
  0 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2014-10-10  1:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Jeff Mahoney, Jeff Moyer, Linux Kernel Maling List

On Fri, Oct 10, 2014 at 3:12 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 10/09/2014 11:58 AM, Jeff Mahoney wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On 10/9/14, 12:13 PM, Ming Lei wrote:
>>> On Thu, Oct 9, 2014 at 11:25 PM, Ming Lei <ming.lei@canonical.com>
>>> wrote:
>>>> On Thu, Oct 9, 2014 at 10:26 PM, Jeff Mahoney <jeffm@suse.com>
>>>> wrote:
>>>>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>>>>>
>>>>> On 10/9/14, 9:53 AM, Jeff Moyer wrote:
>>>>>> Jeff Mahoney <jeffm@suse.com> writes:
>>>>>>
>>>>>>> Commit 05f1dd53152173 (block: add queue flag for disabling
>>>>>>> SG merging) uses bi_vcnt to assign bio->bi_phys_segments if
>>>>>>> sg merging is disabled. When using device mapper on top of
>>>>>>> a blk-mq device (virtio_blk in my test), we'd end up
>>>>>>> overflowing the scatterlist in __blk_bios_map_sg.
>>>>>>>
>>>>>>> __bio_clone_fast copies bi_iter and bi_io_vec but not
>>>>>>> bi_vcnt, so blk_recount_segments would report
>>>>>>> bi_phys_segments as 0. Since rq->nr_phys_segments is 0 as
>>>>>>> well, the checks to ensure that we don't exceed the queue's
>>>>>>> segment limit end up allowing more bios (and segments) to
>>>>>>> attach the a request until we finally map it. That also
>>>>>>> means we pass the BUG_ON at the beginning of
>>>>>>> virtio_queue_rq, ultimately causing memory corruption and
>>>>>>> a crash.
>>>>>>>
>>>>>>> If we copy bi_vcnt in __bio_clone_fast, the bios and
>>>>>>> requests properly report the number of segments and
>>>>>>> everything works as expected.
>>>>>>>
>>>>>>> Originally reported at
>>>>>>> http://bugzilla.opensuse.org/show_bug.cgi?id=888259
>>>>>>
>>>>>> Hi, Jeff,
>>>>>>
>>>>>> Did you manage to reproduce this problem with commit 0738854
>>>>>> (blk-merge: fix blk_recount_segments) applied?  Or perhaps
>>>>>> with commit 200612e (dm table: propagate
>>>>>> QUEUE_FLAG_NO_SG_MERGE)?
>>>>>
>>>>> Yep. I was able to reproduce it with 3.17. I did try 0738854
>>>>> when I was still using 3.16 since it looked like a good
>>>>> candidate. Neither of those patches affect the problem here.
>>>>> bio->bi_phys_segments never gets a value set in the fast clone
>>>>> case and that translates to req->nr_phys_segments never getting
>>>>> properly accumulated. That might not be a problem except that
>>>>> the NO_SG_MERGE behavior bypasses the iteration that would come
>>>>> up with the correct value. In either case,
>>>>
>>>> This patch may get incorrect rq->nr_phys_segments because bio
>>>> cloning is often used in case of I/O splitting, so could you test
>>>> if the attached patch fixes your problem?
>>
>> Ah. Right.
>>
>>> Please ignore last patch and test the attached patch since we still
>>> should use no sg merge to recalculate req's segments for cloned
>>> bio.
>>
>> Yep, this fix works as expected.
>
> Thanks Ming, I've queued it up (and added Jeff's tested-by).

Sorry, looks my patch is still wrong:

- merge should be done if bio->bi_vcnt is bigger than max segment
- if one bio has been cloned from, bi_vcnt can't be relied on too

So that means it may be incorrect to use bio->bi_vcnt in
blk_recalc_rq_segments(), but seems like using bio_segments()
is a bit expensive for NO_SG_MERGE.

Thanks,

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-10-10  1:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-08 22:54 [PATCH] block: copy bi_vcnt in __bio_clone_fast Jeff Mahoney
2014-10-09 13:53 ` Jeff Moyer
2014-10-09 14:26   ` Jeff Mahoney
2014-10-09 15:25     ` Ming Lei
2014-10-09 16:13       ` Ming Lei
2014-10-09 17:58         ` Jeff Mahoney
2014-10-09 19:12           ` Jens Axboe
2014-10-10  1:24             ` Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.