All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] block: note about cloned bios and bio_for_each_segment_all
@ 2017-07-14 13:40 David Sterba
  2017-07-14 13:47 ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2017-07-14 13:40 UTC (permalink / raw)
  To: linux-block; +Cc: David Sterba, bo.li.liu, fdmanana

We've switched to cloned bios in btrfs and hit a nasty bug leading to
corruptions, when cloned bios are iterated by bio_for_each_segment_all.

Report and fix:
https://patchwork.kernel.org/patch/9838535/

As a matter of precaution, we've added assertions to btrfs code to catch
the bad usage pattern:

https://patchwork.kernel.org/patch/9839267/

The cloned/bi_vcnt behaviour seems tobe implementation dependent and is
not documented, so this patch at least warns about this one particular
case but this might still be insufficient.

CC: linux-block@vger.kernel.org
Signed-off-by: David Sterba <dsterba@suse.com>
---
 include/linux/bio.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7b1cf4ba0902..f1ac84edcf39 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -156,6 +156,9 @@ static inline void *bio_data(struct bio *bio)
 /*
  * drivers should _never_ use the all version - the bio may have been split
  * before it got to the driver and the driver won't own all of it
+ *
+ * Note that cloned bios must not use this as their bi_vcnt may be invalid and
+ * this could lead to silent corruptions.
  */
 #define bio_for_each_segment_all(bvl, bio, i)				\
 	for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] block: note about cloned bios and bio_for_each_segment_all
  2017-07-14 13:40 [PATCH] block: note about cloned bios and bio_for_each_segment_all David Sterba
@ 2017-07-14 13:47 ` Ming Lei
  2017-07-14 14:22   ` Jens Axboe
  2017-07-14 15:03   ` David Sterba
  0 siblings, 2 replies; 13+ messages in thread
From: Ming Lei @ 2017-07-14 13:47 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-block, Liu Bo, fdmanana

On Fri, Jul 14, 2017 at 9:40 PM, David Sterba <dsterba@suse.com> wrote:
> We've switched to cloned bios in btrfs and hit a nasty bug leading to
> corruptions, when cloned bios are iterated by bio_for_each_segment_all.

No, you simply can't use bio_for_each_segment_all on cloned bio, and the
reason is obviously.

>
> Report and fix:
> https://patchwork.kernel.org/patch/9838535/
>
> As a matter of precaution, we've added assertions to btrfs code to catch
> the bad usage pattern:
>
> https://patchwork.kernel.org/patch/9839267/
>
> The cloned/bi_vcnt behaviour seems tobe implementation dependent and is
> not documented, so this patch at least warns about this one particular
> case but this might still be insufficient.
>
> CC: linux-block@vger.kernel.org
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  include/linux/bio.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 7b1cf4ba0902..f1ac84edcf39 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -156,6 +156,9 @@ static inline void *bio_data(struct bio *bio)
>  /*
>   * drivers should _never_ use the all version - the bio may have been split
>   * before it got to the driver and the driver won't own all of it
> + *
> + * Note that cloned bios must not use this as their bi_vcnt may be invalid and
> + * this could lead to silent corruptions.
>   */
>  #define bio_for_each_segment_all(bvl, bio, i)                          \
>         for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
> --
> 2.13.0
>

Maybe we can add a warning here if it is a cloned bio.


-- 
Ming Lei

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] block: note about cloned bios and bio_for_each_segment_all
  2017-07-14 13:47 ` Ming Lei
@ 2017-07-14 14:22   ` Jens Axboe
  2017-07-14 20:54     ` Liu Bo
  2017-07-15  0:28     ` David Sterba
  2017-07-14 15:03   ` David Sterba
  1 sibling, 2 replies; 13+ messages in thread
From: Jens Axboe @ 2017-07-14 14:22 UTC (permalink / raw)
  To: Ming Lei, David Sterba; +Cc: linux-block, Liu Bo, fdmanana

On 07/14/2017 07:47 AM, Ming Lei wrote:
>> @@ -156,6 +156,9 @@ static inline void *bio_data(struct bio *bio)
>>  /*
>>   * drivers should _never_ use the all version - the bio may have been split
>>   * before it got to the driver and the driver won't own all of it
>> + *
>> + * Note that cloned bios must not use this as their bi_vcnt may be invalid and
>> + * this could lead to silent corruptions.
>>   */
>>  #define bio_for_each_segment_all(bvl, bio, i)                          \
>>         for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
>> --
>> 2.13.0
>>
> 
> Maybe we can add a warning here if it is a cloned bio.

I think that's a good idea, it's easy for people to get this wrong, and
the consequences can be dire. How about something like this?

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7b1cf4ba0902..13b6ac6eae29 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -155,9 +155,12 @@ static inline void *bio_data(struct bio *bio)
 
 /*
  * drivers should _never_ use the all version - the bio may have been split
- * before it got to the driver and the driver won't own all of it
+ * before it got to the driver and the driver won't own all of it.
+ *
+ * Don't use this on cloned bio's.
  */
 #define bio_for_each_segment_all(bvl, bio, i)				\
+	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));			\
 	for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
 
 static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] block: note about cloned bios and bio_for_each_segment_all
  2017-07-14 13:47 ` Ming Lei
  2017-07-14 14:22   ` Jens Axboe
@ 2017-07-14 15:03   ` David Sterba
  2017-07-14 17:56     ` Filipe Manana
  1 sibling, 1 reply; 13+ messages in thread
From: David Sterba @ 2017-07-14 15:03 UTC (permalink / raw)
  To: Ming Lei; +Cc: Liu Bo, Filipe Manana, linux-block

On Fri, Jul 14, 2017 at 09:47:30PM +0800, Ming Lei wrote:
> On Fri, Jul 14, 2017 at 9:40 PM, David Sterba <dsterba@suse.com> wrote:
> > We've switched to cloned bios in btrfs and hit a nasty bug leading to
> > corruptions, when cloned bios are iterated by bio_for_each_segment_all.
> 
> No, you simply can't use bio_for_each_segment_all on cloned bio, and the
> reason is obviously.

This was not obvious to us, speaking for the btrfs developers trying to
make more use of the of the bio API, so we had to find out the hard way.

The proposed WARN_ON, possibly more sanity checks or documentation would
help us not to trip over similar problems in the future. I try to take
great care when dealing with code changing bios on our side so I read
the headers, and partially the implementation, but still can miss
something.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] block: note about cloned bios and bio_for_each_segment_all
  2017-07-14 15:03   ` David Sterba
@ 2017-07-14 17:56     ` Filipe Manana
  2017-07-14 18:19       ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Filipe Manana @ 2017-07-14 17:56 UTC (permalink / raw)
  To: dsterba, Ming Lei; +Cc: Liu Bo, linux-block



On 07/14/2017 04:03 PM, David Sterba wrote:
> On Fri, Jul 14, 2017 at 09:47:30PM +0800, Ming Lei wrote:
>> On Fri, Jul 14, 2017 at 9:40 PM, David Sterba <dsterba@suse.com> wrote:
>>> We've switched to cloned bios in btrfs and hit a nasty bug leading to
>>> corruptions, when cloned bios are iterated by bio_for_each_segment_all.
>>
>> No, you simply can't use bio_for_each_segment_all on cloned bio, and the
>> reason is obviously.
> 
> This was not obvious to us, speaking for the btrfs developers trying to
> make more use of the of the bio API, so we had to find out the hard way.

Yep, it might be obvious to those familiar with the block layer's
internals, but for those not so familiar, it's not. There's no mention
in bio_clone_fast() that the cloned bio's bi_vcnt shouldn't be used, and
after finding that, one has to check which bio APIs use it and which
don't. In this specific btrfs issue, it lead to silent write
corruptions, making it harder to find (as opposed to crashes or other
immediate failures).

> 
> The proposed WARN_ON, possibly more sanity checks or documentation would
> help us not to trip over similar problems in the future. I try to take
> great care when dealing with code changing bios on our side so I read
> the headers, and partially the implementation, but still can miss
> something.
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] block: note about cloned bios and bio_for_each_segment_all
  2017-07-14 17:56     ` Filipe Manana
@ 2017-07-14 18:19       ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2017-07-14 18:19 UTC (permalink / raw)
  To: Filipe Manana, dsterba, Ming Lei; +Cc: Liu Bo, linux-block

On 07/14/2017 11:56 AM, Filipe Manana wrote:
> 
> 
> On 07/14/2017 04:03 PM, David Sterba wrote:
>> On Fri, Jul 14, 2017 at 09:47:30PM +0800, Ming Lei wrote:
>>> On Fri, Jul 14, 2017 at 9:40 PM, David Sterba <dsterba@suse.com> wrote:
>>>> We've switched to cloned bios in btrfs and hit a nasty bug leading to
>>>> corruptions, when cloned bios are iterated by bio_for_each_segment_all.
>>>
>>> No, you simply can't use bio_for_each_segment_all on cloned bio, and the
>>> reason is obviously.
>>
>> This was not obvious to us, speaking for the btrfs developers trying to
>> make more use of the of the bio API, so we had to find out the hard way.
> 
> Yep, it might be obvious to those familiar with the block layer's
> internals, but for those not so familiar, it's not. There's no mention
> in bio_clone_fast() that the cloned bio's bi_vcnt shouldn't be used,
> and after finding that, one has to check which bio APIs use it and
> which don't. In this specific btrfs issue, it lead to silent write
> corruptions, making it harder to find (as opposed to crashes or other
> immediate failures).

It's hard to circulate info like that, but the WARN_ON() should have
been there from the get-go. I just need someone to test that patch
triggers for the problematic case, then I'd be happy to get it queued
up.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] block: note about cloned bios and bio_for_each_segment_all
  2017-07-14 14:22   ` Jens Axboe
@ 2017-07-14 20:54     ` Liu Bo
  2017-07-14 22:35       ` Ming Lei
  2017-07-18 22:19       ` Bart Van Assche
  2017-07-15  0:28     ` David Sterba
  1 sibling, 2 replies; 13+ messages in thread
From: Liu Bo @ 2017-07-14 20:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Ming Lei, David Sterba, linux-block, fdmanana

On Fri, Jul 14, 2017 at 08:22:31AM -0600, Jens Axboe wrote:
> On 07/14/2017 07:47 AM, Ming Lei wrote:
> >> @@ -156,6 +156,9 @@ static inline void *bio_data(struct bio *bio)
> >>  /*
> >>   * drivers should _never_ use the all version - the bio may have been split
> >>   * before it got to the driver and the driver won't own all of it
> >> + *
> >> + * Note that cloned bios must not use this as their bi_vcnt may be invalid and
> >> + * this could lead to silent corruptions.
> >>   */
> >>  #define bio_for_each_segment_all(bvl, bio, i)                          \
> >>         for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
> >> --
> >> 2.13.0
> >>
> > 
> > Maybe we can add a warning here if it is a cloned bio.
> 
> I think that's a good idea, it's easy for people to get this wrong, and
> the consequences can be dire. How about something like this?
> 
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 7b1cf4ba0902..13b6ac6eae29 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -155,9 +155,12 @@ static inline void *bio_data(struct bio *bio)
>  
>  /*
>   * drivers should _never_ use the all version - the bio may have been split
> - * before it got to the driver and the driver won't own all of it
> + * before it got to the driver and the driver won't own all of it.
> + *
> + * Don't use this on cloned bio's.
>   */
>  #define bio_for_each_segment_all(bvl, bio, i)				\
> +	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));			\
>  	for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
>  
>  static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
> 

This patch gave me a crash, I'm double checking it..

thanks,
-liubo

[  104.140220] BUG: unable to handle kernel paging request at ffffffffa0399c1a
[  104.140675] IP: report_bug+0xc4/0x180
[  104.140916] PGD 2626067 
[  104.140917] P4D 2626067 
[  104.141089] PUD 2627063 
[  104.141259] PMD 2346aa067 
[  104.141429] PTE 800000023569c161
[  104.141610] 
[  104.141926] Oops: 0003 [#1] SMP
[  104.142137] Dumping ftrace buffer:
[  104.142366]    (ftrace buffer empty)
[  104.142602] Modules linked in: btrfs(OE) xor raid6_pq ppdev parport_pc parport serio_raw nfsd auth_rpcgss nfs_acl lockd grace sunrpc [last unloaded$
 xor]
[  104.143493] CPU: 0 PID: 144 Comm: kworker/u16:4 Tainted: G        W  OE   4.12.0+ #801
[  104.144009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[  104.144654] Workqueue: btrfs-rmw btrfs_rmw_helper [btrfs]
[  104.145009] task: ffff8802382f8000 task.stack: ffffc90000f80000
[  104.145393] RIP: 0010:report_bug+0xc4/0x180
[  104.145668] RSP: 0018:ffffc90000f83ac8 EFLAGS: 00010002
[  104.146015] RAX: 0000000000000001 RBX: ffffffffa0356cff RCX: 0000000000000001
[  104.146474] RDX: ffffffffa0399c10 RSI: 0000000000000480 RDI: 0000000000000000
[  104.146931] RBP: ffffc90000f83ae8 R08: 0000000000000907 R09: 0000000000000000
[  104.147393] R10: 000000005170b2af R11: 000000002b881219 R12: ffffc90000f83c38
[  104.147852] R13: ffffffffa038a415 R14: 0000000000000004 R15: 0000000000000006
[  104.148315] FS:  0000000000000000(0000) GS:ffff88023a600000(0000) knlGS:0000000000000000
[  104.148836] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  104.149211] CR2: ffffffffa0399c1a CR3: 00000002244fc000 CR4: 00000000000006f0
[  104.149672] Call Trace:
[  104.149840]  fixup_bug+0x43/0x60
[  104.150060]  do_trap+0x18a/0x1f0
[  104.150276]  do_error_trap+0xdf/0x1a0
[  104.150606]  ? index_rbio_pages+0x14f/0x160 [btrfs]
[  104.150929]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  104.151241]  do_invalid_op+0x20/0x30
[  104.151478]  invalid_op+0x1e/0x30
[  104.151787] RIP: 0010:index_rbio_pages+0x14f/0x160 [btrfs]
[  104.152148] RSP: 0018:ffffc90000f83ce8 EFLAGS: 00010002
[  104.152488] RAX: ffffffffa0421938 RBX: 0000000000000000 RCX: 0000000000000000
[  104.152947] RDX: 0000000000000003 RSI: 0000000000000001 RDI: ffffffffa0440420
[  104.153410] RBP: ffffc90000f83d18 R08: 0000000000000000 R09: 0000000000000000
[  104.153869] R10: 0000000000000001 R11: 000000002b881219 R12: ffffffffa0421960
[  104.154330] R13: 0000000000000001 R14: ffff880236e00000 R15: ffff880227718080
[  104.154882]  ? index_rbio_pages+0xc6/0x160 [btrfs]
[  104.155286]  rmw_work+0x76/0x310 [btrfs]
[  104.155633]  btrfs_scrubparity_helper+0xad/0x8e0 [btrfs]
[  104.156070]  btrfs_rmw_helper+0xe/0x10 [btrfs]
[  104.156364]  process_one_work+0x34f/0x9c0
[  104.156631]  worker_thread+0x34a/0x6b0
[  104.156879]  kthread+0x180/0x190
[  104.157095]  ? create_worker+0x230/0x230
[  104.157352]  ? kthread_create_on_node+0x70/0x70
[  104.157648]  ? kthread_create_on_node+0x70/0x70
[  104.157944]  ret_from_fork+0x2a/0x40
[  104.158183] Code: 44 89 c7 31 c0 66 83 e7 04 0f 95 c0 48 83 c0 02 48 83 04 c5 18 e6 dc 82 01 66 85 ff b8 01 00 00 00 0f 85 72 ff ff ff 41 83 c8 04 <
66> 44 89 42 0a 48 89 c8 83 e0 01 48 83 c0 02 48 83 04 c5 f0 e5 
[  104.159436] RIP: report_bug+0xc4/0x180 RSP: ffffc90000f83ac8
[  104.159803] CR2: ffffffffa0399c1a
[  104.160026] ---[ end trace 78686c1f7150bacf ]---

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] block: note about cloned bios and bio_for_each_segment_all
  2017-07-14 20:54     ` Liu Bo
@ 2017-07-14 22:35       ` Ming Lei
  2017-07-14 22:37         ` Jens Axboe
  2017-07-14 23:20         ` Liu Bo
  2017-07-18 22:19       ` Bart Van Assche
  1 sibling, 2 replies; 13+ messages in thread
From: Ming Lei @ 2017-07-14 22:35 UTC (permalink / raw)
  To: Liu Bo; +Cc: Jens Axboe, David Sterba, linux-block, fdmanana

On Sat, Jul 15, 2017 at 4:54 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Fri, Jul 14, 2017 at 08:22:31AM -0600, Jens Axboe wrote:
>> On 07/14/2017 07:47 AM, Ming Lei wrote:
>> >> @@ -156,6 +156,9 @@ static inline void *bio_data(struct bio *bio)
>> >>  /*
>> >>   * drivers should _never_ use the all version - the bio may have been split
>> >>   * before it got to the driver and the driver won't own all of it
>> >> + *
>> >> + * Note that cloned bios must not use this as their bi_vcnt may be invalid and
>> >> + * this could lead to silent corruptions.
>> >>   */
>> >>  #define bio_for_each_segment_all(bvl, bio, i)                          \
>> >>         for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
>> >> --
>> >> 2.13.0
>> >>
>> >
>> > Maybe we can add a warning here if it is a cloned bio.
>>
>> I think that's a good idea, it's easy for people to get this wrong, and
>> the consequences can be dire. How about something like this?
>>
>> diff --git a/include/linux/bio.h b/include/linux/bio.h
>> index 7b1cf4ba0902..13b6ac6eae29 100644
>> --- a/include/linux/bio.h
>> +++ b/include/linux/bio.h
>> @@ -155,9 +155,12 @@ static inline void *bio_data(struct bio *bio)
>>
>>  /*
>>   * drivers should _never_ use the all version - the bio may have been split
>> - * before it got to the driver and the driver won't own all of it
>> + * before it got to the driver and the driver won't own all of it.
>> + *
>> + * Don't use this on cloned bio's.
>>   */
>>  #define bio_for_each_segment_all(bvl, bio, i)                                \
>> +     WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));                     \
>>       for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
>>
>>  static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
>>
>
> This patch gave me a crash, I'm double checking it..

Hi Liu Bo,

Looks one extra warning shouldn't have trigger a crash, please double
check and update
with us.

I just start a VM and run a quick test on ext4, btrfs, looks
everything is fine, and not see
any warning.

-- 
Ming Lei

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] block: note about cloned bios and bio_for_each_segment_all
  2017-07-14 22:35       ` Ming Lei
@ 2017-07-14 22:37         ` Jens Axboe
  2017-07-14 23:20         ` Liu Bo
  1 sibling, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2017-07-14 22:37 UTC (permalink / raw)
  To: Ming Lei, Liu Bo; +Cc: David Sterba, linux-block, fdmanana

On 07/14/2017 04:35 PM, Ming Lei wrote:
> On Sat, Jul 15, 2017 at 4:54 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
>> On Fri, Jul 14, 2017 at 08:22:31AM -0600, Jens Axboe wrote:
>>> On 07/14/2017 07:47 AM, Ming Lei wrote:
>>>>> @@ -156,6 +156,9 @@ static inline void *bio_data(struct bio *bio)
>>>>>  /*
>>>>>   * drivers should _never_ use the all version - the bio may have been split
>>>>>   * before it got to the driver and the driver won't own all of it
>>>>> + *
>>>>> + * Note that cloned bios must not use this as their bi_vcnt may be invalid and
>>>>> + * this could lead to silent corruptions.
>>>>>   */
>>>>>  #define bio_for_each_segment_all(bvl, bio, i)                          \
>>>>>         for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
>>>>> --
>>>>> 2.13.0
>>>>>
>>>>
>>>> Maybe we can add a warning here if it is a cloned bio.
>>>
>>> I think that's a good idea, it's easy for people to get this wrong, and
>>> the consequences can be dire. How about something like this?
>>>
>>> diff --git a/include/linux/bio.h b/include/linux/bio.h
>>> index 7b1cf4ba0902..13b6ac6eae29 100644
>>> --- a/include/linux/bio.h
>>> +++ b/include/linux/bio.h
>>> @@ -155,9 +155,12 @@ static inline void *bio_data(struct bio *bio)
>>>
>>>  /*
>>>   * drivers should _never_ use the all version - the bio may have been split
>>> - * before it got to the driver and the driver won't own all of it
>>> + * before it got to the driver and the driver won't own all of it.
>>> + *
>>> + * Don't use this on cloned bio's.
>>>   */
>>>  #define bio_for_each_segment_all(bvl, bio, i)                                \
>>> +     WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));                     \
>>>       for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
>>>
>>>  static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
>>>
>>
>> This patch gave me a crash, I'm double checking it..
> 
> Hi Liu Bo,
> 
> Looks one extra warning shouldn't have trigger a crash, please double
> check and update
> with us.
> 
> I just start a VM and run a quick test on ext4, btrfs, looks
> everything is fine, and not see
> any warning.

I booted it here too, and it works fine for me. So I agree, the crash seems
to be unrelated.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] block: note about cloned bios and bio_for_each_segment_all
  2017-07-14 22:35       ` Ming Lei
  2017-07-14 22:37         ` Jens Axboe
@ 2017-07-14 23:20         ` Liu Bo
  1 sibling, 0 replies; 13+ messages in thread
From: Liu Bo @ 2017-07-14 23:20 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, David Sterba, linux-block, fdmanana

On Sat, Jul 15, 2017 at 06:35:05AM +0800, Ming Lei wrote:
> On Sat, Jul 15, 2017 at 4:54 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> > On Fri, Jul 14, 2017 at 08:22:31AM -0600, Jens Axboe wrote:
> >> On 07/14/2017 07:47 AM, Ming Lei wrote:
> >> >> @@ -156,6 +156,9 @@ static inline void *bio_data(struct bio *bio)
> >> >>  /*
> >> >>   * drivers should _never_ use the all version - the bio may have been split
> >> >>   * before it got to the driver and the driver won't own all of it
> >> >> + *
> >> >> + * Note that cloned bios must not use this as their bi_vcnt may be invalid and
> >> >> + * this could lead to silent corruptions.
> >> >>   */
> >> >>  #define bio_for_each_segment_all(bvl, bio, i)                          \
> >> >>         for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
> >> >> --
> >> >> 2.13.0
> >> >>
> >> >
> >> > Maybe we can add a warning here if it is a cloned bio.
> >>
> >> I think that's a good idea, it's easy for people to get this wrong, and
> >> the consequences can be dire. How about something like this?
> >>
> >> diff --git a/include/linux/bio.h b/include/linux/bio.h
> >> index 7b1cf4ba0902..13b6ac6eae29 100644
> >> --- a/include/linux/bio.h
> >> +++ b/include/linux/bio.h
> >> @@ -155,9 +155,12 @@ static inline void *bio_data(struct bio *bio)
> >>
> >>  /*
> >>   * drivers should _never_ use the all version - the bio may have been split
> >> - * before it got to the driver and the driver won't own all of it
> >> + * before it got to the driver and the driver won't own all of it.
> >> + *
> >> + * Don't use this on cloned bio's.
> >>   */
> >>  #define bio_for_each_segment_all(bvl, bio, i)                                \
> >> +     WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));                     \
> >>       for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
> >>
> >>  static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
> >>
> >
> > This patch gave me a crash, I'm double checking it..
> 
> Hi Liu Bo,
> 
> Looks one extra warning shouldn't have trigger a crash, please double
> check and update
> with us.
> 
> I just start a VM and run a quick test on ext4, btrfs, looks
> everything is fine, and not see
> any warning.
> 
> -- 
> Ming Lei

I removed that WARN_ON() in btrfs's index_rbio_pages() which is simply
WARN_ON(bio_flagged(bio, BIO_CLONED));

And I still got the same crash.

The test I ran is

$ mkfs.btrfs -f -draid5 /dev/sd[cde]
$ mount /dev/sde /mnt/btrfs
$ xfs_io -f -d -c "pwrite 0 128K" /mnt/btrfs/foobar
# then kernel went to panic.

It's 4.12.0 vanilla + Jen's patch, but given that it's purely a
WARN_ON_ONCE(), I haven't figured out where the crash came from.

thanks,
-liubo

[   70.885850] BTRFS: device fsid b994b77f-a9a0-4fa2-8a66-5a36caa7c174 devid 1 transid 5 /dev/sdc
[   70.896194] BTRFS: device fsid b994b77f-a9a0-4fa2-8a66-5a36caa7c174 devid 2 transid 5 /dev/sdd
[   70.903853] BTRFS: device fsid b994b77f-a9a0-4fa2-8a66-5a36caa7c174 devid 3 transid 5 /dev/sde
[   72.991044] BTRFS info (device sde): disk space caching is enabled
[   72.991494] BTRFS info (device sde): has skinny extents
[   72.991836] BTRFS info (device sde): flagging fs with big metadata feature
[   73.015831] BTRFS info (device sde): creating UUID tree
[   82.798313] BUG: unable to handle kernel paging request at ffffffffa03bcc0e
[   82.799070] IP: report_bug+0xc4/0x180
[   82.799312] PGD 2626067 
[   82.799313] P4D 2626067 
[   82.799483] PUD 2627063 
[   82.799655] PMD 2346a3067 
[   82.799868] PTE 800000022019b161
[   82.800091] 
[   82.800471] Oops: 0003 [#1] SMP
[   82.800734] Dumping ftrace buffer:
[   82.800997]    (ftrace buffer empty)
[   82.801305] Modules linked in: btrfs(OE) xor raid6_pq ppdev parport_pc parport serio_raw nfsd auth_rpcgss nfs_acl lockd grace sunrpc
[   82.802312] CPU: 3 PID: 154 Comm: kworker/u16:5 Tainted: G           OE   4.12.0+ #802
[   82.802947] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[   82.803690] Workqueue: btrfs-rmw btrfs_rmw_helper [btrfs]
[   82.804176] task: ffff8802383bd200 task.stack: ffffc90000fa4000
[   82.804680] RIP: 0010:report_bug+0xc4/0x180
[   82.805082] RSP: 0018:ffffc90000fa7ac8 EFLAGS: 00010002
[   82.805454] RAX: 0000000000000001 RBX: ffffffffa0379ca5 RCX: 0000000000000001
[   82.805911] RDX: ffffffffa03bcc04 RSI: 000000000000047f RDI: 0000000000000000
[   82.806373] RBP: ffffc90000fa7ae8 R08: 0000000000000907 R09: 0000000000000000
[   82.806832] R10: 00000000e3e32d2f R11: 000000000526ce2c R12: ffffc90000fa7c38
[   82.807301] R13: ffffffffa03ad415 R14: 0000000000000004 R15: 0000000000000006
[   82.807758] FS:  0000000000000000(0000) GS:ffff88023ac00000(0000) knlGS:0000000000000000
[   82.808653] CR2: ffffffffa03bcc0e CR3: 00000002203f7000 CR4: 00000000000006e0
[   82.809129] Call Trace:
[   82.809297]  fixup_bug+0x43/0x60
[   82.809512]  do_trap+0x18a/0x1f0
[   82.809727]  do_error_trap+0xdf/0x1a0
[   82.810051]  ? index_rbio_pages+0xf5/0x100 [btrfs]
[   82.810366]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[   82.810672]  do_invalid_op+0x20/0x30
[   82.810907]  invalid_op+0x1e/0x30
[   82.811205] RIP: 0010:index_rbio_pages+0xf5/0x100 [btrfs]
[   82.811554] RSP: 0018:ffffc90000fa7ce8 EFLAGS: 00010002
[   82.811891] RAX: 0000000000000002 RBX: ffffffffa0444938 RCX: 0000000000000000
[   82.812349] RDX: 0000000000000003 RSI: 0000000000000001 RDI: ffffffffa04633f8
[   82.812805] RBP: ffffc90000fa7d18 R08: 0000000000000001 R09: 0000000000000000
[   82.813280] R10: 0000000000000001 R11: 000000000526ce2c R12: 0000000000000001
[   82.813738] R13: ffff8802386fc800 R14: ffff88022fe9a200 R15: 0000000000000000
[   82.814280]  ? index_rbio_pages+0x7a/0x100 [btrfs]
[   82.814670]  rmw_work+0x76/0x310 [btrfs]
[   82.815007]  btrfs_scrubparity_helper+0xad/0x8e0 [btrfs]
[   82.815430]  btrfs_rmw_helper+0xe/0x10 [btrfs]
[   82.815723]  process_one_work+0x34f/0x9c0
[   82.815992]  worker_thread+0x34a/0x6b0
[   82.816241]  kthread+0x180/0x190
[   82.816455]  ? create_worker+0x230/0x230
[   82.816712]  ? kthread_create_on_node+0x70/0x70
[   82.817019]  ret_from_fork+0x2a/0x40
[   82.817267] Code: 44 89 c7 31 c0 66 83 e7 04 0f 95 c0 48 83 c0 02 48 83 04 c5 18 e6 dc 82 01 66 85 ff b8 01 00 00 00 0f 85 72 ff ff ff 41 83 c8 04 <
66> 44 89 42 0a 48 89 c8 83 e0 01 48 83 c0 02 48 83 04 c5 f0 e5 
[   82.818516] RIP: report_bug+0xc4/0x180 RSP: ffffc90000fa7ac8
[   82.818883] CR2: ffffffffa03bcc0e
[   82.819110] ---[ end trace 5a34df2460aff289 ]---

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] block: note about cloned bios and bio_for_each_segment_all
  2017-07-14 14:22   ` Jens Axboe
  2017-07-14 20:54     ` Liu Bo
@ 2017-07-15  0:28     ` David Sterba
  1 sibling, 0 replies; 13+ messages in thread
From: David Sterba @ 2017-07-15  0:28 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Ming Lei, Liu Bo, Filipe Manana, linux-block

On Fri, Jul 14, 2017 at 08:22:31AM -0600, Jens Axboe wrote:
>  /*
>   * drivers should _never_ use the all version - the bio may have been split
> - * before it got to the driver and the driver won't own all of it
> + * before it got to the driver and the driver won't own all of it.
> + *
> + * Don't use this on cloned bio's.
>   */
>  #define bio_for_each_segment_all(bvl, bio, i)				\
> +	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));			\
>  	for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)

This needs a multiline statement protection, otherwise

if (...)
	bio_for_each_segment_all(...) {
		...
	}

would expand to

if (...)
	WARN_ON_ONCE(...);

bio_for_each_segment_all(...) { ... }

However "do { ... } while(0)" cannot be used here, so WARN_ON_ONCE could
be moved to the initialization block of for:

#define bio_for_each_segment_all(bvl, bio, i)				\
	for (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)),		\
	     i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)

I haven't found any code using the exact broken pattern. So this does
not explain the crash Liu Bo reports.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] block: note about cloned bios and bio_for_each_segment_all
  2017-07-14 20:54     ` Liu Bo
  2017-07-14 22:35       ` Ming Lei
@ 2017-07-18 22:19       ` Bart Van Assche
  2017-07-24 16:19         ` Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2017-07-18 22:19 UTC (permalink / raw)
  To: bo.li.liu, Jens Axboe
  Cc: Ming Lei, David Sterba, linux-block, fdmanana, Peter Zijlstra,
	Ingo Molnar

On 07/14/17 13:54, Liu Bo wrote:
> On Fri, Jul 14, 2017 at 08:22:31AM -0600, Jens Axboe wrote:
>> On 07/14/2017 07:47 AM, Ming Lei wrote:
>>>> @@ -156,6 +156,9 @@ static inline void *bio_data(struct bio *bio)
>>>>  /*
>>>>   * drivers should _never_ use the all version - the bio may have been split
>>>>   * before it got to the driver and the driver won't own all of it
>>>> + *
>>>> + * Note that cloned bios must not use this as their bi_vcnt may be invalid and
>>>> + * this could lead to silent corruptions.
>>>>   */
>>>>  #define bio_for_each_segment_all(bvl, bio, i)                          \
>>>>         for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
>>>> --
>>>> 2.13.0
>>>>
>>>
>>> Maybe we can add a warning here if it is a cloned bio.
>>
>> I think that's a good idea, it's easy for people to get this wrong, and
>> the consequences can be dire. How about something like this?
>>
>> diff --git a/include/linux/bio.h b/include/linux/bio.h
>> index 7b1cf4ba0902..13b6ac6eae29 100644
>> --- a/include/linux/bio.h
>> +++ b/include/linux/bio.h
>> @@ -155,9 +155,12 @@ static inline void *bio_data(struct bio *bio)
>>  
>>  /*
>>   * drivers should _never_ use the all version - the bio may have been split
>> - * before it got to the driver and the driver won't own all of it
>> + * before it got to the driver and the driver won't own all of it.
>> + *
>> + * Don't use this on cloned bio's.
>>   */
>>  #define bio_for_each_segment_all(bvl, bio, i)				\
>> +	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));			\
>>  	for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
>>  
>>  static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
>>
> 
> This patch gave me a crash, I'm double checking it..
> 
> thanks,
> -liubo
> 
> [  104.140220] BUG: unable to handle kernel paging request at ffffffffa0399c1a
> [  104.140675] IP: report_bug+0xc4/0x180
> [  104.140916] PGD 2626067 
> [  104.140917] P4D 2626067 
> [  104.141089] PUD 2627063 
> [  104.141259] PMD 2346aa067 
> [  104.141429] PTE 800000023569c161
> [  104.141610] 
> [  104.141926] Oops: 0003 [#1] SMP
> [  104.142137] Dumping ftrace buffer:
> [  104.142366]    (ftrace buffer empty)
> [  104.142602] Modules linked in: btrfs(OE) xor raid6_pq ppdev parport_pc parport serio_raw nfsd auth_rpcgss nfs_acl lockd grace sunrpc [last unloaded$
>  xor]
> [  104.143493] CPU: 0 PID: 144 Comm: kworker/u16:4 Tainted: G        W  OE   4.12.0+ #801
> [  104.144009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
> [  104.144654] Workqueue: btrfs-rmw btrfs_rmw_helper [btrfs]
> [  104.145009] task: ffff8802382f8000 task.stack: ffffc90000f80000
> [  104.145393] RIP: 0010:report_bug+0xc4/0x180
> [  104.145668] RSP: 0018:ffffc90000f83ac8 EFLAGS: 00010002
> [  104.146015] RAX: 0000000000000001 RBX: ffffffffa0356cff RCX: 0000000000000001
> [  104.146474] RDX: ffffffffa0399c10 RSI: 0000000000000480 RDI: 0000000000000000
> [  104.146931] RBP: ffffc90000f83ae8 R08: 0000000000000907 R09: 0000000000000000
> [  104.147393] R10: 000000005170b2af R11: 000000002b881219 R12: ffffc90000f83c38
> [  104.147852] R13: ffffffffa038a415 R14: 0000000000000004 R15: 0000000000000006
> [  104.148315] FS:  0000000000000000(0000) GS:ffff88023a600000(0000) knlGS:0000000000000000
> [  104.148836] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  104.149211] CR2: ffffffffa0399c1a CR3: 00000002244fc000 CR4: 00000000000006f0
> [  104.149672] Call Trace:
> [  104.149840]  fixup_bug+0x43/0x60
> [  104.150060]  do_trap+0x18a/0x1f0
> [  104.150276]  do_error_trap+0xdf/0x1a0
> [  104.150606]  ? index_rbio_pages+0x14f/0x160 [btrfs]
> [  104.150929]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> [  104.151241]  do_invalid_op+0x20/0x30
> [  104.151478]  invalid_op+0x1e/0x30
> [  104.151787] RIP: 0010:index_rbio_pages+0x14f/0x160 [btrfs]
> [  104.152148] RSP: 0018:ffffc90000f83ce8 EFLAGS: 00010002
> [  104.152488] RAX: ffffffffa0421938 RBX: 0000000000000000 RCX: 0000000000000000
> [  104.152947] RDX: 0000000000000003 RSI: 0000000000000001 RDI: ffffffffa0440420
> [  104.153410] RBP: ffffc90000f83d18 R08: 0000000000000000 R09: 0000000000000000
> [  104.153869] R10: 0000000000000001 R11: 000000002b881219 R12: ffffffffa0421960
> [  104.154330] R13: 0000000000000001 R14: ffff880236e00000 R15: ffff880227718080
> [  104.154882]  ? index_rbio_pages+0xc6/0x160 [btrfs]
> [  104.155286]  rmw_work+0x76/0x310 [btrfs]
> [  104.155633]  btrfs_scrubparity_helper+0xad/0x8e0 [btrfs]
> [  104.156070]  btrfs_rmw_helper+0xe/0x10 [btrfs]
> [  104.156364]  process_one_work+0x34f/0x9c0
> [  104.156631]  worker_thread+0x34a/0x6b0
> [  104.156879]  kthread+0x180/0x190
> [  104.157095]  ? create_worker+0x230/0x230
> [  104.157352]  ? kthread_create_on_node+0x70/0x70
> [  104.157648]  ? kthread_create_on_node+0x70/0x70
> [  104.157944]  ret_from_fork+0x2a/0x40
> [  104.159436] RIP: report_bug+0xc4/0x180 RSP: ffffc90000f83ac8
> [  104.159803] CR2: ffffffffa0399c1a
> [  104.160026] ---[ end trace 78686c1f7150bacf ]---

(+Peter)
 
Hello Peter,

In a test I ran myself with kernel v4.12-rc1 I also noticed that a
WARN_ON_ONCE() statement triggered an oops in report_bug() and killed the
kernel thread of the caller instead of letting the caller continue. What I
ran into is probably the same oops as in the above call trace. For the test
I ran myself the disassembly is as follows:

(gdb) list *(report_bug+0x94)
0xffffffff812ba024 is in report_bug (lib/bug.c:177).
172                                     return BUG_TRAP_TYPE_WARN;
173
174                             /*
175                              * Since this is the only store, concurrency is not an issue.
176                              */
177                             bug->flags |= BUGFLAG_DONE;
178                     }
179             }
180
181             if (warning) {

Could this be related to patch "debug: Add _ONCE() logic to report_bug()"?
Is this a known issue? If so, is a fix perhaps already available?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] block: note about cloned bios and bio_for_each_segment_all
  2017-07-18 22:19       ` Bart Van Assche
@ 2017-07-24 16:19         ` Peter Zijlstra
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2017-07-24 16:19 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: bo.li.liu, Jens Axboe, Ming Lei, David Sterba, linux-block,
	fdmanana, Ingo Molnar

On Tue, Jul 18, 2017 at 03:19:00PM -0700, Bart Van Assche wrote:
> Hello Peter,

Sorry for being late, I'm trying to recover from a few weeks of leave
and the inbox is in shambles.

> In a test I ran myself with kernel v4.12-rc1 I also noticed that a
> WARN_ON_ONCE() statement triggered an oops in report_bug() and killed the
> kernel thread of the caller instead of letting the caller continue. What I
> ran into is probably the same oops as in the above call trace. For the test
> I ran myself the disassembly is as follows:
> 
> (gdb) list *(report_bug+0x94)
> 0xffffffff812ba024 is in report_bug (lib/bug.c:177).
> 172                                     return BUG_TRAP_TYPE_WARN;
> 173
> 174                             /*
> 175                              * Since this is the only store, concurrency is not an issue.
> 176                              */
> 177                             bug->flags |= BUGFLAG_DONE;
> 178                     }
> 179             }
> 180
> 181             if (warning) {
> 
> Could this be related to patch "debug: Add _ONCE() logic to report_bug()"?
> Is this a known issue? If so, is a fix perhaps already available?

Yep..  please see:

  https://marc.info/?l=linux-kernel&m=150055119925677

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-07-24 16:19 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-14 13:40 [PATCH] block: note about cloned bios and bio_for_each_segment_all David Sterba
2017-07-14 13:47 ` Ming Lei
2017-07-14 14:22   ` Jens Axboe
2017-07-14 20:54     ` Liu Bo
2017-07-14 22:35       ` Ming Lei
2017-07-14 22:37         ` Jens Axboe
2017-07-14 23:20         ` Liu Bo
2017-07-18 22:19       ` Bart Van Assche
2017-07-24 16:19         ` Peter Zijlstra
2017-07-15  0:28     ` David Sterba
2017-07-14 15:03   ` David Sterba
2017-07-14 17:56     ` Filipe Manana
2017-07-14 18:19       ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.