All of lore.kernel.org
 help / color / mirror / Atom feed
* Crash in btrfs_uuid_tree_iterate during mount
@ 2016-08-05 11:08 Nikolay Borisov
  2016-08-05 15:12 ` Chris Mason
  0 siblings, 1 reply; 13+ messages in thread
From: Nikolay Borisov @ 2016-08-05 11:08 UTC (permalink / raw)
  To: linux-btrfs; +Cc: SiteGround Operations

Hello, 

Recently I started getting the following crashes on some servers, 
running btrfs: 

[340435.480338] BTRFS info (device loop7): disk space caching is enabled
[340435.480509] BTRFS: has skinny extents
[340441.716174] BTRFS: checking UUID tree
[340441.912070] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
[340441.912463] IP: [<ffffffffa081f774>] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
[340441.912823] PGD 0 
[340441.913035] Oops: 0000 [#1] SMP 
[340441.913302] Modules linked in: 
[340441.916996] CPU: 10 PID: 24990 Comm: btrfs-uuid Tainted: P        W  O    4.4.14-clouder1 #55
[340441.917287] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
[340441.917573] task: ffff8801b95c1b80 ti: ffff88034e504000 task.ti: ffff88034e504000
[340441.917859] RIP: 0010:[<ffffffffa081f774>]  [<ffffffffa081f774>] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
[340441.918212] RSP: 0018:ffff88034e507e20  EFLAGS: 00010246
[340441.918382] RAX: 0000000000000000 RBX: 0000160000000000 RCX: ffff880000000000
[340441.918665] RDX: 0000000000000001 RSI: ffff8801e3abd140 RDI: ffff88046f027f00
[340441.918952] RBP: ffff88034e507ea8 R08: 000060fb80001760 R09: ffffffffa07ac1de
[340441.919236] R10: ffffe8ffffd41760 R11: ffffea00078eaf40 R12: ffff8801b98ab750
[340441.919521] R13: 00000000fffffffe R14: ffff8801e3abd140 R15: ffff880049586000
[340441.919810] FS:  0000000000000000(0000) GS:ffff88047fd40000(0000) knlGS:0000000000000000
[340441.920097] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[340441.920267] CR2: 0000000000000098 CR3: 0000000001c0a000 CR4: 00000000000406e0
[340441.920554] Stack:
[340441.920717]  ffff880049586000 ffff8801b98ab750 00003f7b00014fc0 ffff8803711dec08
[340441.921186]  ffffffffa07d0c40 ffff880332342000 0000000000000114 1b7088046d7612f8
[340441.921655]  8cfb42689378e508 70157e0ade97f5d6 8c42689378e5081b 15157e0ade97f5d6
[340441.922126] Call Trace:
[340441.922315]  [<ffffffffa07d0c40>] ? find_live_mirror.isra.18+0xc0/0xc0 [btrfs]
[340441.922614]  [<ffffffffa07d0ae0>] ? btrfs_uuid_scan_kthread+0x3c0/0x3c0 [btrfs]
[340441.922917]  [<ffffffffa07d0afb>] btrfs_uuid_rescan_kthread+0x1b/0x60 [btrfs]
[340441.923197]  [<ffffffff8107161f>] kthread+0xef/0x110
[340441.923363]  [<ffffffff81071530>] ? kthread_park+0x60/0x60
[340441.923531]  [<ffffffff816149ff>] ret_from_fork+0x3f/0x70
[340441.923697]  [<ffffffff81071530>] ? kthread_park+0x60/0x60
[340441.923863] Code: 0f 86 a0 00 00 00 48 bb 00 00 00 00 00 16 00 00 41 8b 44 24 40 48 b9 00 00 00 00 00 88 ff ff 8d 50 01 49 8b 04 24 41 89 54 24 40 <48> 03 98 98 00 00 00 48 89 d8 48 c1 f8 06 48 c1 e0 0c 3b 54 08 
[340441.927296] RIP  [<ffffffffa081f774>] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
[340441.927641]  RSP <ffff88034e507e20>
[340441.927806] CR2: 0000000000000098


ffffffffa081f774 is in the heavily inlined btrfs_next_item. Here
is the decoded instructions, right before the crash with annotations:

   0:	0f 86 a0 00 00 00    	jbe    0xa6
   6:	48 bb 00 00 00 00 00 	mov    $0x160000000000,%rbx
   d:	16 00 00 
  10:	41 8b 44 24 40       	mov    0x40(%r12),%eax ; r12 is btrfs_path, eax points to first slot
  15:	48 b9 00 00 00 00 00 	mov    $0xffff880000000000,%rcx
  1c:	88 ff ff 
  1f:	8d 50 01             	lea    0x1(%rax),%edx ; incr slot
  22:	49 8b 04 24          	mov    (%r12),%rax ; load first extent_buffer in rax
  26:	41 89 54 24 40       	mov    %edx,0x40(%r12) ; save incremented slot
  2b:*	48 03 98 98 00 00 00 	add    0x98(%rax),%rbx <-- trapping instruction ; load the first page from the extent_buffer
  32:	48 89 d8             	mov    %rbx,%rax
  35:	48 c1 f8 06          	sar    $0x6,%rax
  39:	48 c1 e0 0c          	shl    $0xc,%rax
  3d:	3b                   	.byte 0x3b
  3e:	54                   	push   %rsp
  3f:	08                   	.byte 0x8

So as can be seen rax is zero and naturally dereferencing it is 
also zero. What's interesting is the content of the btrf_path:

struct btrfs_path {
  nodes = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  slots = {1, 0, 0, 0, 0, 0, 0, 0}, 
  locks = {0, 0, 0, 0, 0, 0, 0, 0}, 
  reada = 0, 
  lowest_level = 0, 
  search_for_split = 0, 
  keep_locks = 0, 
  skip_locking = 0, 
  leave_spinning = 0, 
  search_commit_root = 0, 
  need_commit_sem = 0, 
  skip_release_on_error = 0
}

Any ideas how come btrfs_path can be all zero, the one in
the first slot comes from the increment in btrfs_next_old_item.

Regards, 
Nikolay 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crash in btrfs_uuid_tree_iterate during mount
  2016-08-05 11:08 Crash in btrfs_uuid_tree_iterate during mount Nikolay Borisov
@ 2016-08-05 15:12 ` Chris Mason
  2016-08-05 19:14   ` Nikolay Borisov
  2016-08-08 10:49   ` Nikolay Borisov
  0 siblings, 2 replies; 13+ messages in thread
From: Chris Mason @ 2016-08-05 15:12 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: SiteGround Operations


On 08/05/2016 07:08 AM, Nikolay Borisov wrote:
> Hello,
>
> Recently I started getting the following crashes on some servers,
> running btrfs:
>
> [340435.480338] BTRFS info (device loop7): disk space caching is enabled
> [340435.480509] BTRFS: has skinny extents
> [340441.716174] BTRFS: checking UUID tree
> [340441.912070] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
> [340441.912463] IP: [<ffffffffa081f774>] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
> [340441.912823] PGD 0
> [340441.913035] Oops: 0000 [#1] SMP
> [340441.913302] Modules linked in:
> [340441.916996] CPU: 10 PID: 24990 Comm: btrfs-uuid Tainted: P        W  O    4.4.14-clouder1 #55
> [340441.917287] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
> [340441.917573] task: ffff8801b95c1b80 ti: ffff88034e504000 task.ti: ffff88034e504000
> [340441.917859] RIP: 0010:[<ffffffffa081f774>]  [<ffffffffa081f774>] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
> [340441.918212] RSP: 0018:ffff88034e507e20  EFLAGS: 00010246
> [340441.918382] RAX: 0000000000000000 RBX: 0000160000000000 RCX: ffff880000000000
> [340441.918665] RDX: 0000000000000001 RSI: ffff8801e3abd140 RDI: ffff88046f027f00
> [340441.918952] RBP: ffff88034e507ea8 R08: 000060fb80001760 R09: ffffffffa07ac1de
> [340441.919236] R10: ffffe8ffffd41760 R11: ffffea00078eaf40 R12: ffff8801b98ab750
> [340441.919521] R13: 00000000fffffffe R14: ffff8801e3abd140 R15: ffff880049586000
> [340441.919810] FS:  0000000000000000(0000) GS:ffff88047fd40000(0000) knlGS:0000000000000000
> [340441.920097] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [340441.920267] CR2: 0000000000000098 CR3: 0000000001c0a000 CR4: 00000000000406e0
> [340441.920554] Stack:
> [340441.920717]  ffff880049586000 ffff8801b98ab750 00003f7b00014fc0 ffff8803711dec08
> [340441.921186]  ffffffffa07d0c40 ffff880332342000 0000000000000114 1b7088046d7612f8
> [340441.921655]  8cfb42689378e508 70157e0ade97f5d6 8c42689378e5081b 15157e0ade97f5d6
> [340441.922126] Call Trace:
> [340441.922315]  [<ffffffffa07d0c40>] ? find_live_mirror.isra.18+0xc0/0xc0 [btrfs]
> [340441.922614]  [<ffffffffa07d0ae0>] ? btrfs_uuid_scan_kthread+0x3c0/0x3c0 [btrfs]
> [340441.922917]  [<ffffffffa07d0afb>] btrfs_uuid_rescan_kthread+0x1b/0x60 [btrfs]
> [340441.923197]  [<ffffffff8107161f>] kthread+0xef/0x110
> [340441.923363]  [<ffffffff81071530>] ? kthread_park+0x60/0x60
> [340441.923531]  [<ffffffff816149ff>] ret_from_fork+0x3f/0x70
> [340441.923697]  [<ffffffff81071530>] ? kthread_park+0x60/0x60
> [340441.923863] Code: 0f 86 a0 00 00 00 48 bb 00 00 00 00 00 16 00 00 41 8b 44 24 40 48 b9 00 00 00 00 00 88 ff ff 8d 50 01 49 8b 04 24 41 89 54 24 40 <48> 03 98 98 00 00 00 48 89 d8 48 c1 f8 06 48 c1 e0 0c 3b 54 08
> [340441.927296] RIP  [<ffffffffa081f774>] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
> [340441.927641]  RSP <ffff88034e507e20>
> [340441.927806] CR2: 0000000000000098
>
>
> ffffffffa081f774 is in the heavily inlined btrfs_next_item. Here
> is the decoded instructions, right before the crash with annotations:
>
>    0:	0f 86 a0 00 00 00    	jbe    0xa6
>    6:	48 bb 00 00 00 00 00 	mov    $0x160000000000,%rbx
>    d:	16 00 00
>   10:	41 8b 44 24 40       	mov    0x40(%r12),%eax ; r12 is btrfs_path, eax points to first slot
>   15:	48 b9 00 00 00 00 00 	mov    $0xffff880000000000,%rcx
>   1c:	88 ff ff
>   1f:	8d 50 01             	lea    0x1(%rax),%edx ; incr slot
>   22:	49 8b 04 24          	mov    (%r12),%rax ; load first extent_buffer in rax
>   26:	41 89 54 24 40       	mov    %edx,0x40(%r12) ; save incremented slot
>   2b:*	48 03 98 98 00 00 00 	add    0x98(%rax),%rbx <-- trapping instruction ; load the first page from the extent_buffer
>   32:	48 89 d8             	mov    %rbx,%rax
>   35:	48 c1 f8 06          	sar    $0x6,%rax
>   39:	48 c1 e0 0c          	shl    $0xc,%rax
>   3d:	3b                   	.byte 0x3b
>   3e:	54                   	push   %rsp
>   3f:	08                   	.byte 0x8
>
> So as can be seen rax is zero and naturally dereferencing it is
> also zero. What's interesting is the content of the btrf_path:
>
> struct btrfs_path {
>   nodes = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
>   slots = {1, 0, 0, 0, 0, 0, 0, 0},
>   locks = {0, 0, 0, 0, 0, 0, 0, 0},
>   reada = 0,
>   lowest_level = 0,
>   search_for_split = 0,
>   keep_locks = 0,
>   skip_locking = 0,
>   leave_spinning = 0,
>   search_commit_root = 0,
>   need_commit_sem = 0,
>   skip_release_on_error = 0
> }
>
> Any ideas how come btrfs_path can be all zero, the one in
> the first slot comes from the increment in btrfs_next_old_item.

Thanks for all the extra details.  It really must be this:

                        if (ret > 0) { 
                                 btrfs_release_path(path); 
                                 ret = btrfs_uuid_iter_rem(root, uuid, key.type,
                                                           subid_cpu); 
                                 if (ret == 0) { 
                                         /* 
                                          * this might look inefficient, but the
                                          * justification is that it is an
                                          * exception that check_func returns 1,
                                          * and that in the regular case only one
                                          * entry per UUID exists. 
                                          */ 
                                         goto again_search_slot; 
                                 } 
                                 if (ret < 0 && ret != -ENOENT) 
                                         goto out; 
                         } 
                         item_size -= sizeof(subid_le); 
                         offset += sizeof(subid_le);


We've released the path, which would explain why its full of NULL.  ret
was ENOENT, so it kept on going, and we fell through to
btrfs_next_item()

Once the path is released, we should either be searching again or
exiting.  A goto again_search_slot would probably fix it, but I'd want
to also bump the key so we don't just process the same item over and
over again.

Can you reproduce this reliably?  I'd hate to patch it now and make more
problems later just because we didn't fully understand the items we were
tripping over.

-chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crash in btrfs_uuid_tree_iterate during mount
  2016-08-05 15:12 ` Chris Mason
@ 2016-08-05 19:14   ` Nikolay Borisov
  2016-08-08 10:49   ` Nikolay Borisov
  1 sibling, 0 replies; 13+ messages in thread
From: Nikolay Borisov @ 2016-08-05 19:14 UTC (permalink / raw)
  To: Chris Mason; +Cc: Nikolay Borisov, linux-btrfs, SiteGround Operations

On Fri, Aug 5, 2016 at 6:12 PM, Chris Mason <clm@fb.com> wrote:
>
> On 08/05/2016 07:08 AM, Nikolay Borisov wrote:
>> Hello,
>>
>> Any ideas how come btrfs_path can be all zero, the one in
>> the first slot comes from the increment in btrfs_next_old_item.
>
> Thanks for all the extra details.  It really must be this:
>
>                         if (ret > 0) {
>                                  btrfs_release_path(path);
>                                  ret = btrfs_uuid_iter_rem(root, uuid, key.type,
>                                                            subid_cpu);
>                                  if (ret == 0) {
>                                          /*
>                                           * this might look inefficient, but the
>                                           * justification is that it is an
>                                           * exception that check_func returns 1,
>                                           * and that in the regular case only one
>                                           * entry per UUID exists.
>                                           */
>                                          goto again_search_slot;
>                                  }
>                                  if (ret < 0 && ret != -ENOENT)
>                                          goto out;
>                          }
>                          item_size -= sizeof(subid_le);
>                          offset += sizeof(subid_le);
>
>
> We've released the path, which would explain why its full of NULL.  ret
> was ENOENT, so it kept on going, and we fell through to
> btrfs_next_item()
>
> Once the path is released, we should either be searching again or
> exiting.  A goto again_search_slot would probably fix it, but I'd want
> to also bump the key so we don't just process the same item over and
> over again.
>
> Can you reproduce this reliably?  I'd hate to patch it now and make more
> problems later just because we didn't fully understand the items we were
> tripping over.

Well there are 2 things I can do:
 a) Dig more in the crash dump to see whether ret has been saved to
the stack and extract the return value. If your theory is correct I
should see the value of ENOENT.
 b) Patch the code to print a warn when btrfs_uuid_iter_rem returns an
ENOENT, that way at least we will know that this is happening.

In either cases this would take me until at least next week, at which
time I should be able to  give more information.

>
> -chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crash in btrfs_uuid_tree_iterate during mount
  2016-08-05 15:12 ` Chris Mason
  2016-08-05 19:14   ` Nikolay Borisov
@ 2016-08-08 10:49   ` Nikolay Borisov
  2016-08-08 14:16     ` Chris Mason
  1 sibling, 1 reply; 13+ messages in thread
From: Nikolay Borisov @ 2016-08-08 10:49 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs; +Cc: SiteGround Operations



On 08/05/2016 06:12 PM, Chris Mason wrote:
> 
> On 08/05/2016 07:08 AM, Nikolay Borisov wrote:
>> Hello,
>>
>> Recently I started getting the following crashes on some servers,
>> running btrfs:
>>
>> [340435.480338] BTRFS info (device loop7): disk space caching is enabled
>> [340435.480509] BTRFS: has skinny extents
>> [340441.716174] BTRFS: checking UUID tree
>> [340441.912070] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
>> [340441.912463] IP: [<ffffffffa081f774>] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
>> [340441.912823] PGD 0
>> [340441.913035] Oops: 0000 [#1] SMP
>> [340441.913302] Modules linked in:
>> [340441.916996] CPU: 10 PID: 24990 Comm: btrfs-uuid Tainted: P        W  O    4.4.14-clouder1 #55
>> [340441.917287] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
>> [340441.917573] task: ffff8801b95c1b80 ti: ffff88034e504000 task.ti: ffff88034e504000
>> [340441.917859] RIP: 0010:[<ffffffffa081f774>]  [<ffffffffa081f774>] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
>> [340441.918212] RSP: 0018:ffff88034e507e20  EFLAGS: 00010246
>> [340441.918382] RAX: 0000000000000000 RBX: 0000160000000000 RCX: ffff880000000000
>> [340441.918665] RDX: 0000000000000001 RSI: ffff8801e3abd140 RDI: ffff88046f027f00
>> [340441.918952] RBP: ffff88034e507ea8 R08: 000060fb80001760 R09: ffffffffa07ac1de
>> [340441.919236] R10: ffffe8ffffd41760 R11: ffffea00078eaf40 R12: ffff8801b98ab750
>> [340441.919521] R13: 00000000fffffffe R14: ffff8801e3abd140 R15: ffff880049586000
>> [340441.919810] FS:  0000000000000000(0000) GS:ffff88047fd40000(0000) knlGS:0000000000000000
>> [340441.920097] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [340441.920267] CR2: 0000000000000098 CR3: 0000000001c0a000 CR4: 00000000000406e0
>> [340441.920554] Stack:
>> [340441.920717]  ffff880049586000 ffff8801b98ab750 00003f7b00014fc0 ffff8803711dec08
>> [340441.921186]  ffffffffa07d0c40 ffff880332342000 0000000000000114 1b7088046d7612f8
>> [340441.921655]  8cfb42689378e508 70157e0ade97f5d6 8c42689378e5081b 15157e0ade97f5d6
>> [340441.922126] Call Trace:
>> [340441.922315]  [<ffffffffa07d0c40>] ? find_live_mirror.isra.18+0xc0/0xc0 [btrfs]
>> [340441.922614]  [<ffffffffa07d0ae0>] ? btrfs_uuid_scan_kthread+0x3c0/0x3c0 [btrfs]
>> [340441.922917]  [<ffffffffa07d0afb>] btrfs_uuid_rescan_kthread+0x1b/0x60 [btrfs]
>> [340441.923197]  [<ffffffff8107161f>] kthread+0xef/0x110
>> [340441.923363]  [<ffffffff81071530>] ? kthread_park+0x60/0x60
>> [340441.923531]  [<ffffffff816149ff>] ret_from_fork+0x3f/0x70
>> [340441.923697]  [<ffffffff81071530>] ? kthread_park+0x60/0x60
>> [340441.923863] Code: 0f 86 a0 00 00 00 48 bb 00 00 00 00 00 16 00 00 41 8b 44 24 40 48 b9 00 00 00 00 00 88 ff ff 8d 50 01 49 8b 04 24 41 89 54 24 40 <48> 03 98 98 00 00 00 48 89 d8 48 c1 f8 06 48 c1 e0 0c 3b 54 08
>> [340441.927296] RIP  [<ffffffffa081f774>] btrfs_uuid_tree_iterate+0xf4/0x2d0 [btrfs]
>> [340441.927641]  RSP <ffff88034e507e20>
>> [340441.927806] CR2: 0000000000000098
>>
>>
>> ffffffffa081f774 is in the heavily inlined btrfs_next_item. Here
>> is the decoded instructions, right before the crash with annotations:
>>
>>    0:	0f 86 a0 00 00 00    	jbe    0xa6
>>    6:	48 bb 00 00 00 00 00 	mov    $0x160000000000,%rbx
>>    d:	16 00 00
>>   10:	41 8b 44 24 40       	mov    0x40(%r12),%eax ; r12 is btrfs_path, eax points to first slot
>>   15:	48 b9 00 00 00 00 00 	mov    $0xffff880000000000,%rcx
>>   1c:	88 ff ff
>>   1f:	8d 50 01             	lea    0x1(%rax),%edx ; incr slot
>>   22:	49 8b 04 24          	mov    (%r12),%rax ; load first extent_buffer in rax
>>   26:	41 89 54 24 40       	mov    %edx,0x40(%r12) ; save incremented slot
>>   2b:*	48 03 98 98 00 00 00 	add    0x98(%rax),%rbx <-- trapping instruction ; load the first page from the extent_buffer
>>   32:	48 89 d8             	mov    %rbx,%rax
>>   35:	48 c1 f8 06          	sar    $0x6,%rax
>>   39:	48 c1 e0 0c          	shl    $0xc,%rax
>>   3d:	3b                   	.byte 0x3b
>>   3e:	54                   	push   %rsp
>>   3f:	08                   	.byte 0x8
>>
>> So as can be seen rax is zero and naturally dereferencing it is
>> also zero. What's interesting is the content of the btrf_path:
>>
>> struct btrfs_path {
>>   nodes = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
>>   slots = {1, 0, 0, 0, 0, 0, 0, 0},
>>   locks = {0, 0, 0, 0, 0, 0, 0, 0},
>>   reada = 0,
>>   lowest_level = 0,
>>   search_for_split = 0,
>>   keep_locks = 0,
>>   skip_locking = 0,
>>   leave_spinning = 0,
>>   search_commit_root = 0,
>>   need_commit_sem = 0,
>>   skip_release_on_error = 0
>> }
>>
>> Any ideas how come btrfs_path can be all zero, the one in
>> the first slot comes from the increment in btrfs_next_old_item.
> 
> Thanks for all the extra details.  It really must be this:
> 
>                         if (ret > 0) { 
>                                  btrfs_release_path(path); 
>                                  ret = btrfs_uuid_iter_rem(root, uuid, key.type,
>                                                            subid_cpu); 
>                                  if (ret == 0) { 
>                                          /* 
>                                           * this might look inefficient, but the
>                                           * justification is that it is an
>                                           * exception that check_func returns 1,
>                                           * and that in the regular case only one
>                                           * entry per UUID exists. 
>                                           */ 
>                                          goto again_search_slot; 
>                                  } 
>                                  if (ret < 0 && ret != -ENOENT) 
>                                          goto out; 
>                          } 
>                          item_size -= sizeof(subid_le); 
>                          offset += sizeof(subid_le);
> 
> 
> We've released the path, which would explain why its full of NULL.  ret
> was ENOENT, so it kept on going, and we fell through to
> btrfs_next_item()
> 
> Once the path is released, we should either be searching again or
> exiting.  A goto again_search_slot would probably fix it, but I'd want
> to also bump the key so we don't just process the same item over and
> over again.
> 
> Can you reproduce this reliably?  I'd hate to patch it now and make more
> problems later just because we didn't fully understand the items we were
> tripping over.

Hello Chris, 

Indeed it seems that btrfs_uuid_iter_rem returned a ENOENT: 

callq  0xffffffffa081f450 <btrfs_uuid_tree_rem>
mov    %eax,%r13d
je     0xffffffffa081f882 <btrfs_uuid_tree_iterate+514> ; if uuid_iter_rem returned -ENOENT; else fall through. 

I checked and r13d is not being touched between the invocation of 
btrfs_uuid_iter_rem and the btrfs_next_item: 
    RIP: ffffffffa081f774  RSP: ffff88034e507e20  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: 0000160000000000  RCX: ffff880000000000
    RDX: 0000000000000001  RSI: ffff8801e3abd140  RDI: ffff88046f027f00
    RBP: ffff88034e507ea8   R8: 000060fb80001760   R9: ffffffffa07ac1de
    R10: ffffe8ffffd41760  R11: ffffea00078eaf40  R12: ffff8801b98ab750
    R13: 00000000fffffffe  R14: ffff8801e3abd140  R15: ffff880049586000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

r13 is clearly -ENOENT. So your assumption was correct. 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crash in btrfs_uuid_tree_iterate during mount
  2016-08-08 10:49   ` Nikolay Borisov
@ 2016-08-08 14:16     ` Chris Mason
  2016-08-08 14:21       ` Nikolay Borisov
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Mason @ 2016-08-08 14:16 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: SiteGround Operations

On 08/08/2016 06:49 AM, Nikolay Borisov wrote:
>
>
> On 08/05/2016 06:12 PM, Chris Mason wrote:

> Hello Chris,
>
> Indeed it seems that btrfs_uuid_iter_rem returned a ENOENT:
>
> callq  0xffffffffa081f450 <btrfs_uuid_tree_rem>
> mov    %eax,%r13d
> je     0xffffffffa081f882 <btrfs_uuid_tree_iterate+514> ; if uuid_iter_rem returned -ENOENT; else fall through.
>
> I checked and r13d is not being touched between the invocation of
> btrfs_uuid_iter_rem and the btrfs_next_item:
>     RIP: ffffffffa081f774  RSP: ffff88034e507e20  RFLAGS: 00010246
>     RAX: 0000000000000000  RBX: 0000160000000000  RCX: ffff880000000000
>     RDX: 0000000000000001  RSI: ffff8801e3abd140  RDI: ffff88046f027f00
>     RBP: ffff88034e507ea8   R8: 000060fb80001760   R9: ffffffffa07ac1de
>     R10: ffffe8ffffd41760  R11: ffffea00078eaf40  R12: ffff8801b98ab750
>     R13: 00000000fffffffe  R14: ffff8801e3abd140  R15: ffff880049586000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>
> r13 is clearly -ENOENT. So your assumption was correct.

Fantastic, thanks again for digging through it.  Making the patch is 
much easier than testing the patch in this case.  If you can trigger 
this semi-reliably, we can add some debugging to make sure we're not 
papering over some other problem.

-chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crash in btrfs_uuid_tree_iterate during mount
  2016-08-08 14:16     ` Chris Mason
@ 2016-08-08 14:21       ` Nikolay Borisov
  2016-08-08 14:24         ` Chris Mason
  0 siblings, 1 reply; 13+ messages in thread
From: Nikolay Borisov @ 2016-08-08 14:21 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs; +Cc: SiteGround Operations



On 08/08/2016 05:16 PM, Chris Mason wrote:
> 
> Fantastic, thanks again for digging through it.  Making the patch is
> much easier than testing the patch in this case.  If you can trigger
> this semi-reliably, we can add some debugging to make sure we're not
> papering over some other problem.

I don;'t have a reproducer per-se, but I have a fair number of servers,
whose workload causes the issue, so if I patch btrfs and apply it to
5-10 server I;d expect at least one of them to show what the real issue is.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crash in btrfs_uuid_tree_iterate during mount
  2016-08-08 14:21       ` Nikolay Borisov
@ 2016-08-08 14:24         ` Chris Mason
  2016-08-29  7:25           ` Nikolay Borisov
  2016-09-07  7:38           ` [PATCH] btrfs: Fix handling of -ENOENT from btrfs_uuid_iter_rem Nikolay Borisov
  0 siblings, 2 replies; 13+ messages in thread
From: Chris Mason @ 2016-08-08 14:24 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: SiteGround Operations

On 08/08/2016 10:21 AM, Nikolay Borisov wrote:
>
>
> On 08/08/2016 05:16 PM, Chris Mason wrote:
>>
>> Fantastic, thanks again for digging through it.  Making the patch is
>> much easier than testing the patch in this case.  If you can trigger
>> this semi-reliably, we can add some debugging to make sure we're not
>> papering over some other problem.
>
> I don;'t have a reproducer per-se, but I have a fair number of servers,
> whose workload causes the issue, so if I patch btrfs and apply it to
> 5-10 server I;d expect at least one of them to show what the real issue is.
>

Ok, I'll try and get a patch out before my vacation on Wednesday.  Thanks!

-chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Crash in btrfs_uuid_tree_iterate during mount
  2016-08-08 14:24         ` Chris Mason
@ 2016-08-29  7:25           ` Nikolay Borisov
  2016-09-07  7:38           ` [PATCH] btrfs: Fix handling of -ENOENT from btrfs_uuid_iter_rem Nikolay Borisov
  1 sibling, 0 replies; 13+ messages in thread
From: Nikolay Borisov @ 2016-08-29  7:25 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs; +Cc: SiteGround Operations



On 08/08/2016 05:24 PM, Chris Mason wrote:
> On 08/08/2016 10:21 AM, Nikolay Borisov wrote:
>>
>>
>> On 08/08/2016 05:16 PM, Chris Mason wrote:
>>>
>>> Fantastic, thanks again for digging through it.  Making the patch is
>>> much easier than testing the patch in this case.  If you can trigger
>>> this semi-reliably, we can add some debugging to make sure we're not
>>> papering over some other problem.
>>
>> I don;'t have a reproducer per-se, but I have a fair number of servers,
>> whose workload causes the issue, so if I patch btrfs and apply it to
>> 5-10 server I;d expect at least one of them to show what the real
>> issue is.
>>
> 
> Ok, I'll try and get a patch out before my vacation on Wednesday.  Thanks!

Ping


> 
> -chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] btrfs: Fix handling of -ENOENT from btrfs_uuid_iter_rem
  2016-08-08 14:24         ` Chris Mason
  2016-08-29  7:25           ` Nikolay Borisov
@ 2016-09-07  7:38           ` Nikolay Borisov
  2016-09-19 18:13             ` David Sterba
  1 sibling, 1 reply; 13+ messages in thread
From: Nikolay Borisov @ 2016-09-07  7:38 UTC (permalink / raw)
  To: clm; +Cc: linux-btrfs, Nikolay Borisov

btrfs_uuid_iter_rem is able to return -ENOENT, however this condition
is not handled in btrfs_uuid_tree_iterate which can lead to calling
btrfs_next_item with freed path argument, leading to a null pointer
dereference. Fix it by redoing the search but with an incremented
objectid so we don't loop over the same key.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Suggested-by: Chris Mason <clm@fb.com>
Link: https://lkml.kernel.org/r/57A473B0.2040203@kyup.com
---
 fs/btrfs/uuid-tree.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Hello Chris, 

Since I keep getting those crashes I (hopefully correctly) implemented
your suggestion of redoing the search with an incremented key so we 
don't end up in a loop. Does that look correct? 


diff --git a/fs/btrfs/uuid-tree.c b/fs/btrfs/uuid-tree.c
index 778282944530..6e5b3866a65c 100644
--- a/fs/btrfs/uuid-tree.c
+++ b/fs/btrfs/uuid-tree.c
@@ -329,8 +329,12 @@ again_search_slot:
 					 * entry per UUID exists.
 					 */
 					goto again_search_slot;
-				}
-				if (ret < 0 && ret != -ENOENT)
+				} else if (ret == -ENOENT) {
+					key.type = 0;
+					key.offset = 0;
+					key.objectid++;
+					goto again_search_slot;
+				} else if (ret < 0)
 					goto out;
 			}
 			item_size -= sizeof(subid_le);
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] btrfs: Fix handling of -ENOENT from btrfs_uuid_iter_rem
  2016-09-07  7:38           ` [PATCH] btrfs: Fix handling of -ENOENT from btrfs_uuid_iter_rem Nikolay Borisov
@ 2016-09-19 18:13             ` David Sterba
  2016-09-19 18:49               ` Chris Mason
  0 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2016-09-19 18:13 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: clm, linux-btrfs

On Wed, Sep 07, 2016 at 10:38:58AM +0300, Nikolay Borisov wrote:
> btrfs_uuid_iter_rem is able to return -ENOENT, however this condition
> is not handled in btrfs_uuid_tree_iterate which can lead to calling
> btrfs_next_item with freed path argument, leading to a null pointer
> dereference. Fix it by redoing the search but with an incremented
> objectid so we don't loop over the same key.
> 
> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
> Suggested-by: Chris Mason <clm@fb.com>
> Link: https://lkml.kernel.org/r/57A473B0.2040203@kyup.com

I'll queue the patch for 4.9, thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] btrfs: Fix handling of -ENOENT from btrfs_uuid_iter_rem
  2016-09-19 18:13             ` David Sterba
@ 2016-09-19 18:49               ` Chris Mason
  2016-09-19 20:18                 ` David Sterba
       [not found]                 ` <CAJFSNy5eOdkn=YSA1-T7goOUNuX6ozUiGAM3tCTq7dvzsiJCug@mail.gmail.com>
  0 siblings, 2 replies; 13+ messages in thread
From: Chris Mason @ 2016-09-19 18:49 UTC (permalink / raw)
  To: dsterba, Nikolay Borisov, linux-btrfs

On 09/19/2016 02:13 PM, David Sterba wrote:
> On Wed, Sep 07, 2016 at 10:38:58AM +0300, Nikolay Borisov wrote:
>> btrfs_uuid_iter_rem is able to return -ENOENT, however this condition
>> is not handled in btrfs_uuid_tree_iterate which can lead to calling
>> btrfs_next_item with freed path argument, leading to a null pointer
>> dereference. Fix it by redoing the search but with an incremented
>> objectid so we don't loop over the same key.
>>
>> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
>> Suggested-by: Chris Mason <clm@fb.com>
>> Link: https://lkml.kernel.org/r/57A473B0.2040203@kyup.com
>
> I'll queue the patch for 4.9, thanks.
>

Not having a good test for this kept me from trying the patch cold.  I 
think bumping the objectid will end up missing items.

We know its returning -ENOENT, so it should in theory be enough to just 
goto again_search_slot, assuming that we just raced with the deletion.

-chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] btrfs: Fix handling of -ENOENT from btrfs_uuid_iter_rem
  2016-09-19 18:49               ` Chris Mason
@ 2016-09-19 20:18                 ` David Sterba
       [not found]                 ` <CAJFSNy5eOdkn=YSA1-T7goOUNuX6ozUiGAM3tCTq7dvzsiJCug@mail.gmail.com>
  1 sibling, 0 replies; 13+ messages in thread
From: David Sterba @ 2016-09-19 20:18 UTC (permalink / raw)
  To: Chris Mason; +Cc: Nikolay Borisov, linux-btrfs

On Mon, Sep 19, 2016 at 02:49:41PM -0400, Chris Mason wrote:
> On 09/19/2016 02:13 PM, David Sterba wrote:
> > On Wed, Sep 07, 2016 at 10:38:58AM +0300, Nikolay Borisov wrote:
> >> btrfs_uuid_iter_rem is able to return -ENOENT, however this condition
> >> is not handled in btrfs_uuid_tree_iterate which can lead to calling
> >> btrfs_next_item with freed path argument, leading to a null pointer
> >> dereference. Fix it by redoing the search but with an incremented
> >> objectid so we don't loop over the same key.
> >>
> >> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
> >> Suggested-by: Chris Mason <clm@fb.com>
> >> Link: https://lkml.kernel.org/r/57A473B0.2040203@kyup.com
> >
> > I'll queue the patch for 4.9, thanks.
> >
> 
> Not having a good test for this kept me from trying the patch cold.  I 
> think bumping the objectid will end up missing items.

Ok, so I can keep it in the branches that are not for the upcoming
merges but still in for-next.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] btrfs: Fix handling of -ENOENT from btrfs_uuid_iter_rem
       [not found]                 ` <CAJFSNy5eOdkn=YSA1-T7goOUNuX6ozUiGAM3tCTq7dvzsiJCug@mail.gmail.com>
@ 2016-09-20  7:36                   ` Nikolay Borisov
  0 siblings, 0 replies; 13+ messages in thread
From: Nikolay Borisov @ 2016-09-20  7:36 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Mason, dsterba

[Resend due to gmail mobile interface defaulting to html layout]
>>
>> We know its returning -ENOENT, so it should in theory be enough to just
>> goto again_search_slot, assuming that we just raced with the deletion.
>
>
> I will apply this on the machine which are exhibitting problems and will
> report whether it rectified the situation. i bump the objectid wince this is
> what you suggested. i can also try without it.
>>
>>
>> -chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-09-20  7:36 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-05 11:08 Crash in btrfs_uuid_tree_iterate during mount Nikolay Borisov
2016-08-05 15:12 ` Chris Mason
2016-08-05 19:14   ` Nikolay Borisov
2016-08-08 10:49   ` Nikolay Borisov
2016-08-08 14:16     ` Chris Mason
2016-08-08 14:21       ` Nikolay Borisov
2016-08-08 14:24         ` Chris Mason
2016-08-29  7:25           ` Nikolay Borisov
2016-09-07  7:38           ` [PATCH] btrfs: Fix handling of -ENOENT from btrfs_uuid_iter_rem Nikolay Borisov
2016-09-19 18:13             ` David Sterba
2016-09-19 18:49               ` Chris Mason
2016-09-19 20:18                 ` David Sterba
     [not found]                 ` <CAJFSNy5eOdkn=YSA1-T7goOUNuX6ozUiGAM3tCTq7dvzsiJCug@mail.gmail.com>
2016-09-20  7:36                   ` Nikolay Borisov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.