* Null pointer oops
@ 2014-08-13  5:02 Larkin Lowrey
       [not found] ` <CALJ65z=25CrrO9uMc2vfYVAQWb=6eK+OhB5TGJJrCp=D4ALvrQ@mail.gmail.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Larkin Lowrey @ 2014-08-13  5:02 UTC
  To: linux-bcache

I got an oops while doing some heavy I/O. I have an md raid10 cache
device (4 SSDs) and 3 md raid5/6 backing devices. This setup has been
well behaved for about 6 months.

If this isn't a known issue, is there anything I can do to provide more
useful information?

I'm running kernel 3.15.8-200.fc20.x86_64.

[210884.047249] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[210884.055605] IP: [<ffffffffa01625fc>] bch_btree_node_read_done+0x4c/0x450 [bcache]
[210884.063723] PGD 0
[210884.066053] Oops: 0002 [#1] SMP
[210884.069610] Modules linked in: lp parport binfmt_misc ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM iptable_mangle tun bridge stp llc xt_multiport ebtable_nat ebtables hwmon_vid ip6t_REJECT nf_conntrack_ipv6 nf_conntrack_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter xt_conntrack ip6_tables nf_conntrack keyspan ezusb kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel microcode serio_raw amd64_edac_mod edac_core fam15h_power k10temp edac_mce_amd sp5100_tco i2c_piix4 igb ptp pps_core dca shpchp acpi_cpufreq btrfs bcache raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid10 i2c_algo_bit drm_kms_helper ttm drm i2c_core mpt2sas mvsas libsas raid_class scsi_transport_sas cpufreq_stats
[210884.140704] CPU: 5 PID: 11188 Comm: kworker/5:1 Not tainted 3.15.8-200.fc20.x86_64 #1
[210884.149069] Hardware name:  /H8DG6/H8DGi, BIOS 3.0a       07/2
[210884.155280] Workqueue: bcache cache_lookup [bcache]
[210884.160531] task: ffff880218633160 ti: ffff8800217b8000 task.ti: ffff8800217b8000
[210884.168502] RIP: 0010:[<ffffffffa01625fc>]  [<ffffffffa01625fc>] bch_btree_node_read_done+0x4c/0x450 [bcache]
[210884.179105] RSP: 0000:ffff8800217bbbe8  EFLAGS: 00010212
[210884.184806] RAX: 0000000000000400 RBX: ffff880245ec0000 RCX: 0000000000000000
[210884.192480] RDX: 0000000000000000 RSI: ffff880418380000 RDI: 0000000000000246
[210884.200075] RBP: ffff8800217bbc10 R08: 0000000000000000 R09: 0000000000000f6b
[210884.207738] R10: 0000000000000000 R11: 0000000000000400 R12: ffff880413d06c00
[210884.215391] R13: 0000000000000000 R14: ffff8800217bbc20 R15: ffff880413d06c00
[210884.222961] FS:  00007f73bacd6880(0000) GS:ffff88021fd40000(0000) knlGS:0000000000000000
[210884.231516] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[210884.237557] CR2: 0000000000000008 CR3: 0000000001c11000 CR4: 00000000000407e0
[210884.245131] Stack:
[210884.247395]  ffff880274f4d020 ffff880413d06c00 0000bfcc44a463f8 ffff8800217bbc20
[210884.255337]  ffff880413d06c00 ffff8800217bbc78 ffffffffa0162b68 0000000000000000
[210884.263256]  ffff880218633160 0000000000000000 0000000000000000 0000000000000000
[210884.271234] Call Trace:
[210884.273985]  [<ffffffffa0162b68>] bch_btree_node_read+0x168/0x190 [bcache]
[210884.281258]  [<ffffffffa0163f69>] bch_btree_node_get+0x169/0x290 [bcache]
[210884.288377]  [<ffffffffa01642f5>] bch_btree_map_keys_recurse+0xd5/0x1d0 [bcache]
[210884.296311]  [<ffffffffa016dcb0>] ? cached_dev_congested+0x180/0x180 [bcache]
[210884.303953]  [<ffffffff8135b204>] ? call_rwsem_down_read_failed+0x14/0x30
[210884.311158]  [<ffffffffa01673f7>] bch_btree_map_keys+0x127/0x150 [bcache]
[210884.318273]  [<ffffffffa016dcb0>] ? cached_dev_congested+0x180/0x180 [bcache]
[210884.325826]  [<ffffffffa016e7f5>] cache_lookup+0xf5/0x1f0 [bcache]
[210884.332325]  [<ffffffff810a4af6>] process_one_work+0x176/0x430
[210884.338427]  [<ffffffff810a578b>] worker_thread+0x11b/0x3a0
[210884.344282]  [<ffffffff810a5670>] ? rescuer_thread+0x3b0/0x3b0
[210884.350447]  [<ffffffff810ac528>] kthread+0xd8/0xf0
[210884.355615]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
[210884.362017]  [<ffffffff816ff93c>] ret_from_fork+0x7c/0xb0
[210884.367756]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
[210884.374234] Code: 08 01 00 00 48 8b b8 58 cb 00 00 e8 bf 25 01 e1 49 8b b4 24 80 00 00 00 49 89 c5 31 d2 0f b7 86 32 04 00 00 66 f7 b6 30 04 00 00 <49> c7 45 08 00 00 00 00 0f b7 c0 49 89 45 00 48 8b 43 10 48 85
[210884.395405] RIP  [<ffffffffa01625fc>] bch_btree_node_read_done+0x4c/0x450 [bcache]
[210884.403389]  RSP <ffff8800217bbbe8>
[210884.407171] CR2: 0000000000000008
[210884.411233] ---[ end trace 0064e6abfd068c85 ]---
[210884.416352] BUG: unable to handle kernel paging request at ffffffffffffffd8
[210884.423871] IP: [<ffffffff810acb10>] kthread_data+0x10/0x20
[210884.429915] PGD 1c14067 PUD 1c16067 PMD 0
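
For anyone reading the header of the trace above: the number after "Oops:" is the x86 page-fault error code, printed in hex. The sketch below decodes its low bits. The helper name decode_oops_code and the PF_* constant names are chosen here for illustration and are not taken from this thread.

```python
# Bit meanings of the x86 page-fault error code printed after "Oops:".
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access,      1: write access
PF_USER = 1 << 2   # 0: kernel mode,      1: user mode
PF_RSVD = 1 << 3   # reserved bit set in a page-table entry
PF_INSTR = 1 << 4  # fault happened on an instruction fetch


def decode_oops_code(code: int) -> dict:
    """Return a readable breakdown of a page-fault error code."""
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


if __name__ == "__main__":
    # "Oops: 0002" from the trace, interpreted as hex.
    print(decode_oops_code(0x0002))
```

For 0002 this reports a kernel-mode write to a page that is not present, which fits a store through a pointer at or near address zero (CR2 is 0x8 here).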

--Larkin

* Re: Null pointer oops
       [not found] ` <CALJ65z=25CrrO9uMc2vfYVAQWb=6eK+OhB5TGJJrCp=D4ALvrQ@mail.gmail.com>
@ 2014-08-13 16:40   ` Larkin Lowrey
  2014-08-13 17:41     ` Slava Pestov
  0 siblings, 1 reply; 13+ messages in thread
From: Larkin Lowrey @ 2014-08-13 16:40 UTC
  To: Kent Overstreet; +Cc: linux-bcache

This is making me feel very dumb. I've googled extensively but can't
figure out how to run addr2line for a module.

I'm running Fedora 20 and the kernel did not have debugging symbols. I
downloaded the version with symbols but I don't know if the addresses
are going to be the same. Bcache is a module for me and that's where
things get tricky. Do you have any tips?
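
On the question of whether the addresses will match: the absolute address in the trace depends on where the module happened to be loaded, but the symbol+offset part does not, so that is usually the piece worth carrying over to a binary with symbols (assuming it is the same build). A small sketch that pulls those fields out of a trace line follows. The function name parse_frame and the regular expression are illustrative only.

```python
import re

# Matches frames such as "bch_btree_node_read_done+0x4c/0x450 [bcache]".
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_frame(text: str) -> dict:
    """Extract symbol, offset, function size and module from one oops line."""
    m = FRAME_RE.search(text)
    if m is None:
        raise ValueError(f"no symbol+offset/size found in {text!r}")
    return {
        "symbol": m["symbol"],
        "offset": int(m["offset"], 16),
        "size": int(m["size"], 16),
        "module": m["module"],  # None for symbols built into the kernel image
    }


if __name__ == "__main__":
    line = "IP: [<ffffffffa01625fc>] bch_btree_node_read_done+0x4c/0x450 [bcache]"
    print(parse_frame(line))
```

The offset (0x4c) is relative to the start of the function, and the second number (0x450) is the function's length as the kernel sees it.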

--Larkin

On 8/13/2014 12:04 AM, Kent Overstreet wrote:
>
> Any chance you could do an addr2line and get me the exact line where
> it happened?

* Re: Null pointer oops
  2014-08-13 16:40   ` Larkin Lowrey
@ 2014-08-13 17:41     ` Slava Pestov
  2014-08-13 18:35       ` Larkin Lowrey
  0 siblings, 1 reply; 13+ messages in thread
From: Slava Pestov @ 2014-08-13 17:41 UTC
  To: Larkin Lowrey; +Cc: Kent Overstreet, linux-bcache

You can try to use gdb:

gdb /lib/modules/.../foo.ko

list *(bch_btree_node_read_done+0x4c)
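
The same lookup can be done non-interactively with gdb's -batch and -ex options. The sketch below only builds the command line and does not run it. The helper name gdb_list_command is made up for this example, and "path/to/bcache.ko" is a placeholder, not the real location of the module on any system.

```python
import shlex


def gdb_list_command(module_path: str, symbol: str, offset: int) -> list:
    """Build (but do not run) a gdb invocation that lists source at symbol+offset."""
    return [
        "gdb", "-batch",
        "-ex", f"list *({symbol}+{offset:#x})",
        module_path,
    ]


if __name__ == "__main__":
    # Placeholder module path; substitute the .ko that carries debug info.
    argv = gdb_list_command("path/to/bcache.ko", "bch_btree_node_read_done", 0x4C)
    print(shlex.join(argv))
```

Passing the resulting list to subprocess.run would execute it, provided gdb is installed and the module file has debugging symbols.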



* Re: Null pointer oops
  2014-08-13 17:41     ` Slava Pestov
@ 2014-08-13 18:35       ` Larkin Lowrey
  2014-08-13 18:45         ` Slava Pestov
  0 siblings, 1 reply; 13+ messages in thread
From: Larkin Lowrey @ 2014-08-13 18:35 UTC
  To: Slava Pestov; +Cc: Kent Overstreet, linux-bcache

Thanks. Trying gdb helped me find the answer. I needed to install the
kernel-debuginfo-3.15.8-200.fc20.x86_64 package via yum.

From addr2line:
> bch_btree_node_read_done+0x4c
> drivers/md/bcache/btree.c:207

Here's a snippet from gdb:

> (gdb) list *(bch_btree_node_read_done+0x4c)
> 0x65fc is in bch_btree_node_read_done (drivers/md/bcache/btree.c:207).
> 202             struct bset *i = btree_bset_first(b);
> 203             struct btree_iter *iter;
> 204
> 205             iter = mempool_alloc(b->c->fill_iter, GFP_NOWAIT);
> 206             iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
> 207             iter->used = 0;
> 208
> 209     #ifdef CONFIG_BCACHE_DEBUG
> 210             iter->b = &b->keys;
> 211     #endif

This doesn't make any sense to me. If iter were NULL, I would expect line
206 to blow up first.
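
One reading that would be consistent with the oops: CR2 is 0x8 rather than 0x0. If iter were NULL and the used field sat 8 bytes into the struct, a store to iter->used would fault at exactly that address, and the compiler is allowed to emit that store before the one to iter->size, which has to wait for the division. The field order below is an assumption, since the struct definition is not shown in this thread, and BtreeIterHead and faulting_field are names invented for the sketch.

```python
import ctypes


class BtreeIterHead(ctypes.Structure):
    # Assumed layout: two 64-bit counters at the start of struct btree_iter.
    # On x86_64 size_t is 8 bytes, so c_uint64 stands in for it here.
    _fields_ = [
        ("size", ctypes.c_uint64),
        ("used", ctypes.c_uint64),
    ]


def faulting_field(base: int, fault_addr: int) -> str:
    """Name the field whose offset equals fault_addr - base, if any."""
    delta = fault_addr - base
    for name, _ctype in BtreeIterHead._fields_:
        if getattr(BtreeIterHead, name).offset == delta:
            return name
    return "unknown"


if __name__ == "__main__":
    # CR2 from the oops is 0x8; a NULL iter means a base address of 0.
    print(faulting_field(0, 0x8))
```

Under that assumed layout the faulting access maps to used, i.e. source line 207, which is what addr2line and gdb reported.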

--Larkin


* Re: Null pointer oops
  2014-08-13 18:35       ` Larkin Lowrey
@ 2014-08-13 18:45         ` Slava Pestov
  2014-08-13 21:21           ` Larkin Lowrey
  0 siblings, 1 reply; 13+ messages in thread
From: Slava Pestov @ 2014-08-13 18:45 UTC
  To: Larkin Lowrey; +Cc: Kent Overstreet, linux-bcache

Can you post the disassembly of the function?
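
Until the full disassembly is available, the bytes around the fault marker in the "Code:" line of the oops can be decoded by hand. The line marks the faulting instruction with <49>, giving 49 c7 45 08 00 00 00 00, and a few bytes later comes 49 89 45 00. The sketch below is a deliberately tiny decoder that handles only these two store encodings (REX.W prefix, 8-bit displacement, no SIB byte). It is not a general disassembler, and decode_store is a name made up for this example.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_store(code: bytes) -> str:
    """Decode one REX.W store of the form C7 /0 or 89 /r with an 8-bit displacement."""
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex & 0xF8 != 0x48:
        raise ValueError("expected a REX.W prefix")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [reg + disp8] operands without SIB are handled")
    disp = int.from_bytes(code[3:4], "little", signed=True)
    base = REGS[rm | ((rex & 1) << 3)]  # REX.B extends the base register
    if opcode == 0xC7 and reg == 0:
        imm = int.from_bytes(code[4:8], "little", signed=True)
        return f"movq ${imm:#x}, {disp:#x}(%{base})"
    if opcode == 0x89:
        src = REGS[reg | (((rex >> 2) & 1) << 3)]  # REX.R extends the source register
        return f"mov %{src}, {disp:#x}(%{base})"
    raise ValueError(f"unhandled opcode {opcode:#x}")


if __name__ == "__main__":
    print(decode_store(bytes.fromhex("49c7450800000000")))  # faulting instruction
    print(decode_store(bytes.fromhex("49894500")))          # store that follows it
```

If this hand decode is right, the faulting instruction is a store of zero to 8 bytes past R13, and the register dump shows R13 as 0, so the target address is 0x8, matching CR2. The store to offset 0 through the same register comes only afterwards.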

On Wed, Aug 13, 2014 at 11:35 AM, Larkin Lowrey
<llowrey@nuclearwinter.com> wrote:
> Thanks. Trying gdb helped me find the answer. I needed to install the
> kernel-debuginfo-3.15.8-200.fc20.x86_64  package via yum.
>
> From addr2line:
>> bch_btree_node_read_done+0x4c
>> drivers/md/bcache/btree.c:207
>
> Here'a a snippet from gdb:
>
>> (gdb) list *(bch_btree_node_read_done+0x4c)
>> 0x65fc is in bch_btree_node_read_done (drivers/md/bcache/btree.c:207).
>> 202             struct bset *i = btree_bset_first(b);
>> 203             struct btree_iter *iter;
>> 204
>> 205             iter = mempool_alloc(b->c->fill_iter, GFP_NOWAIT);
>> 206             iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
>> 207             iter->used = 0;
>> 208
>> 209     #ifdef CONFIG_BCACHE_DEBUG
>> 210             iter->b = &b->keys;
>> 211     #endif
>
> This doesn't make any sense to me. If iter was null I would expect line
> 206 to blow up first.
>
> --Larkin
>
> On 8/13/2014 12:41 PM, Slava Pestov wrote:
>> You can try to use gdb:
>>
>> gdb /lib/modules/.../foo.ko
>>
>> list *(bch_btree_node_read_done+0x4c)
>>
>>
>> On Wed, Aug 13, 2014 at 9:40 AM, Larkin Lowrey
>> <llowrey@nuclearwinter.com> wrote:
>>> This is making be feel very dumb. I've googled extensively but can't
>>> figure out how to run addr2line for a module.
>>>
>>> I'm running Fedora 20 and the kernel did not have debugging symbols. I
>>> downloaded the version with symbols but I don't know if the addresses
>>> are going to be the same. Bcache is a module for me and that's where
>>> things get tricky. Do you have any tips?
>>>
>>> --Larkin
>>>
>>> On 8/13/2014 12:04 AM, Kent Overstreet wrote:
>>>> Any chance you could do an addr2line and get me the exact line where
>>>> it happened?
>>>>
>>>> On Aug 12, 2014 10:02 PM, "Larkin Lowrey" <llowrey@nuclearwinter.com
>>>> <mailto:llowrey@nuclearwinter.com>> wrote:
>>>>
>>>>     I got an oops while doing some heavy I/O. I have an md raid10 cache
>>>>     device (4 SSDs) and 3 md raid5/6 backing devices. This setup has been
>>>>     well behaved for about 6 months.
>>>>
>>>>     If this isn't a known issue is there anything I can do to provide more
>>>>     useful information?
>>>>
>>>>     I'm running kernel 3.15.8-200.fc20.x86_64.
>>>>
>>>>     [210884.047249] BUG: unable to handle kernel NULL pointer
>>>>     dereference at 0000000000000008
>>>>     [210884.055605] IP: [<ffffffffa01625fc>]
>>>>     bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>     [210884.063723] PGD 0
>>>>     [210884.066053] Oops: 0002 [#1] SMP
>>>>     [210884.069610] Modules linked in: lp parport binfmt_misc
>>>>     ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM
>>>>     iptable_mangle tun bridge stp llc xt_multiport ebtable_nat
>>>>     ebtables hwmon_vid ip6t_REJECT nf_conntrack_ipv6 nf_conntrack_ipv4
>>>>     nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter xt_conntrack
>>>>     ip6_tables nf_conntrack keyspan ezusb kvm_amd kvm crct10dif_pclmul
>>>>     crc32_pclmul crc32c_intel ghash_clmulni_intel microcode serio_raw
>>>>     amd64_edac_mod edac_core fam15h_power k10temp edac_mce_amd
>>>>     sp5100_tco i2c_piix4 igb ptp pps_core dca shpchp acpi_cpufreq
>>>>     btrfs bcache raid456 async_raid6_recov async_memcpy async_pq
>>>>     async_xor async_tx xor raid6_pq raid10 i2c_algo_bit drm_kms_helper
>>>>     ttm drm i2c_core mpt2sas mvsas libsas raid_class
>>>>     scsi_transport_sas cpufreq_stats
>>>>     [210884.140704] CPU: 5 PID: 11188 Comm: kworker/5:1 Not tainted
>>>>     3.15.8-200.fc20.x86_64 #1
>>>>     [210884.149069] Hardware name:  /H8DG6/H8DGi, BIOS 3.0a       07/2
>>>>     [210884.155280] Workqueue: bcache cache_lookup [bcache]
>>>>     [210884.160531] task: ffff880218633160 ti: ffff8800217b8000
>>>>     task.ti: ffff8800217b8000
>>>>     [210884.168502] RIP: 0010:[<ffffffffa01625fc>]
>>>>      [<ffffffffa01625fc>] bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>     [210884.179105] RSP: 0000:ffff8800217bbbe8  EFLAGS: 00010212
>>>>     [210884.184806] RAX: 0000000000000400 RBX: ffff880245ec0000 RCX:
>>>>     0000000000000000
>>>>     [210884.192480] RDX: 0000000000000000 RSI: ffff880418380000 RDI:
>>>>     0000000000000246
>>>>     [210884.200075] RBP: ffff8800217bbc10 R08: 0000000000000000 R09:
>>>>     0000000000000f6b
>>>>     [210884.207738] R10: 0000000000000000 R11: 0000000000000400 R12:
>>>>     ffff880413d06c00
>>>>     [210884.215391] R13: 0000000000000000 R14: ffff8800217bbc20 R15:
>>>>     ffff880413d06c00
>>>>     [210884.222961] FS:  00007f73bacd6880(0000)
>>>>     GS:ffff88021fd40000(0000) knlGS:0000000000000000
>>>>     [210884.231516] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>     [210884.237557] CR2: 0000000000000008 CR3: 0000000001c11000 CR4:
>>>>     00000000000407e0
>>>>     [210884.245131] Stack:
>>>>     [210884.247395]  ffff880274f4d020 ffff880413d06c00
>>>>     0000bfcc44a463f8 ffff8800217bbc20
>>>>     [210884.255337]  ffff880413d06c00 ffff8800217bbc78
>>>>     ffffffffa0162b68 0000000000000000
>>>>     [210884.263256]  ffff880218633160 0000000000000000
>>>>     0000000000000000 0000000000000000
>>>>     [210884.271234] Call Trace:
>>>>     [210884.273985]  [<ffffffffa0162b68>]
>>>>     bch_btree_node_read+0x168/0x190 [bcache]
>>>>     [210884.281258]  [<ffffffffa0163f69>]
>>>>     bch_btree_node_get+0x169/0x290 [bcache]
>>>>     [210884.288377]  [<ffffffffa01642f5>]
>>>>     bch_btree_map_keys_recurse+0xd5/0x1d0 [bcache]
>>>>     [210884.296311]  [<ffffffffa016dcb0>] ?
>>>>     cached_dev_congested+0x180/0x180 [bcache]
>>>>     [210884.303953]  [<ffffffff8135b204>] ?
>>>>     call_rwsem_down_read_failed+0x14/0x30
>>>>     [210884.311158]  [<ffffffffa01673f7>]
>>>>     bch_btree_map_keys+0x127/0x150 [bcache]
>>>>     [210884.318273]  [<ffffffffa016dcb0>] ?
>>>>     cached_dev_congested+0x180/0x180 [bcache]
>>>>     [210884.325826]  [<ffffffffa016e7f5>] cache_lookup+0xf5/0x1f0 [bcache]
>>>>     [210884.332325]  [<ffffffff810a4af6>] process_one_work+0x176/0x430
>>>>     [210884.338427]  [<ffffffff810a578b>] worker_thread+0x11b/0x3a0
>>>>     [210884.344282]  [<ffffffff810a5670>] ? rescuer_thread+0x3b0/0x3b0
>>>>     [210884.350447]  [<ffffffff810ac528>] kthread+0xd8/0xf0
>>>>     [210884.355615]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
>>>>     [210884.362017]  [<ffffffff816ff93c>] ret_from_fork+0x7c/0xb0
>>>>     [210884.367756]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
>>>>     [210884.374234] Code: 08 01 00 00 48 8b b8 58 cb 00 00 e8 bf 25 01
>>>>     e1 49 8b b4 24 80 00 00 00 49 89 c5 31 d2 0f b7 86 32 04 00 00 66
>>>>     f7 b6 30 04 00 00 <49> c7 45 08 00 00 00 00 0f b7 c0 49 89 45 00
>>>>     48 8b 43 10 48 85
>>>>     [210884.395405] RIP  [<ffffffffa01625fc>]
>>>>     bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>     [210884.403389]  RSP <ffff8800217bbbe8>
>>>>     [210884.407171] CR2: 0000000000000008
>>>>     [210884.411233] ---[ end trace 0064e6abfd068c85 ]---
>>>>     [210884.416352] BUG: unable to handle kernel paging request at
>>>>     ffffffffffffffd8
>>>>     [210884.423871] IP: [<ffffffff810acb10>] kthread_data+0x10/0x20
>>>>     [210884.429915] PGD 1c14067 PUD 1c16067 PMD 0
>>>>
>>>>     --Larkin
>>>>
>>>>     --
>>>>     To unsubscribe from this list: send the line "unsubscribe
>>>>     linux-bcache" in
>>>>     the body of a message to majordomo@vger.kernel.org
>>>>     <mailto:majordomo@vger.kernel.org>
>>>>     More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Null pointer oops
  2014-08-13 18:45         ` Slava Pestov
@ 2014-08-13 21:21           ` Larkin Lowrey
  2014-08-13 21:25             ` Slava Pestov
  0 siblings, 1 reply; 13+ messages in thread
From: Larkin Lowrey @ 2014-08-13 21:21 UTC (permalink / raw)
  To: Slava Pestov; +Cc: Kent Overstreet, linux-bcache

Here's the disassembly of bch_btree_node_read_done. The offending line
is 207 and the instruction is at offset 76 (0x4c).
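
As a cross-check of those numbers, here is a small sketch in C. In the
listing, the store for line 207 at <+76> writes to 0x8(%r13) and comes
before the store for line 206 at <+87>, which writes to 0x0(%r13). With
R13 = 0 in the oops, a store to offset 8 lands on address 0x8, which
matches CR2. The struct below is a hypothetical stand-in with two 8-byte
members, assumed only from those two stores; it is not the real
struct btree_iter, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: two 8-byte members, mirroring the stores to
 * 0x0(%r13) and 0x8(%r13) in the listing. Not the real struct btree_iter. */
struct iter_sketch {
	uint64_t size;	/* stored at offset 0x0 (source line 206) */
	uint64_t used;	/* stored at offset 0x8 (source line 207) */
};

/* Address the CPU touches when storing to a member through `base`. */
static uintptr_t store_address(uintptr_t base, size_t member_offset)
{
	return base + member_offset;
}

/* Cross-check the numbers quoted from the oops and from gdb. */
static void check_oops_numbers(void)
{
	/* bch_btree_node_read_done+0x4c is the instruction at offset 76. */
	assert(0x4c == 76);
	/* A NULL base plus the offset of `used` gives the CR2 value 0x8. */
	assert(store_address(0, offsetof(struct iter_sketch, used)) == 0x8);
}
```

Calling check_oops_numbers() passes silently if the arithmetic holds. It
only confirms that the numbers are consistent with a NULL iter; it does
not by itself prove what returned NULL.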

--Larkin

199     void bch_btree_node_read_done(struct btree *b)
200     {
   0x00000000000065b0 <+0>:     callq  0x65b5 <bch_btree_node_read_done+5>
   0x00000000000065b5 <+5>:     push   %rbp
   0x00000000000065b8 <+8>:     mov    %rsp,%rbp
   0x00000000000065bb <+11>:    push   %r15
   0x00000000000065bd <+13>:    push   %r14
   0x00000000000065bf <+15>:    push   %r13
   0x00000000000065c1 <+17>:    push   %r12
   0x00000000000065c3 <+19>:    mov    %rdi,%r12
   0x00000000000065c6 <+22>:    push   %rbx

201             const char *err = "bad btree header";
   0x0000000000006800 <+592>:   mov    $0x0,%rdx

202             struct bset *i = btree_bset_first(b);
203             struct btree_iter *iter;
204
205             iter = mempool_alloc(b->c->fill_iter, GFP_NOWAIT);
   0x00000000000065b6 <+6>:     xor    %esi,%esi
   0x00000000000065c7 <+23>:    mov    0x80(%rdi),%rax
   0x00000000000065d5 <+37>:    mov    0xcb58(%rax),%rdi
   0x00000000000065dc <+44>:    callq  0x65e1 <bch_btree_node_read_done+49>
   0x00000000000065e9 <+57>:    mov    %rax,%r13

206             iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
   0x00000000000065e1 <+49>:    mov    0x80(%r12),%rsi
   0x00000000000065ec <+60>:    xor    %edx,%edx
   0x00000000000065ee <+62>:    movzwl 0x432(%rsi),%eax
   0x00000000000065f5 <+69>:    divw   0x430(%rsi)
   0x0000000000006604 <+84>:    movzwl %ax,%eax
   0x0000000000006607 <+87>:    mov    %rax,0x0(%r13)

207             iter->used = 0;
   0x00000000000065fc <+76>:    movq   $0x0,0x8(%r13)

208
209     #ifdef CONFIG_BCACHE_DEBUG
210             iter->b = &b->keys;
211     #endif
212
213             if (!i->seq)
   0x000000000000660b <+91>:    mov    0x10(%rbx),%rax
   0x000000000000660f <+95>:    test   %rax,%rax
   0x0000000000006612 <+98>:    je     0x6800 <bch_btree_node_read_done+592>

214                     goto err;
215
216             for (;
   0x000000000000664d <+157>:   cmp    %r9d,%ecx
   0x0000000000006650 <+160>:   jae    0x6882 <bch_btree_node_read_done+722>
   0x0000000000006744 <+404>:   cmp    %r9d,%r10d
   0x0000000000006747 <+407>:   jae    0x6898 <bch_btree_node_read_done+744>

217                  b->written < btree_blocks(b) && i->seq == b->keys.set[0].data->seq;
   0x0000000000006618 <+104>:   mov    0x80(%r12),%rsi
   0x0000000000006625 <+117>:   movzwl 0xc0(%r12),%edi
   0x000000000000662e <+126>:   mov    0x108(%r12),%r8
   0x0000000000006636 <+134>:   movzwl 0xde2(%rsi),%ecx
   0x0000000000006644 <+148>:   mov    %rdx,%r9
   0x0000000000006647 <+151>:   shr    %cl,%r9
   0x000000000000664a <+154>:   movzwl %di,%ecx
   0x0000000000006656 <+166>:   cmp    0x10(%r8),%rax
   0x000000000000665a <+170>:   jne    0x6882 <bch_btree_node_read_done+722>
   0x000000000000670f <+351>:   mov    %rdx,%r9
   0x000000000000672a <+378>:   movzwl 0xde2(%rsi),%ecx
   0x0000000000006738 <+392>:   shr    %cl,%r9
   0x000000000000674d <+413>:   mov    0x10(%r8),%rcx
   0x0000000000006751 <+417>:   cmp    %rcx,0x10(%rbx)
   0x0000000000006755 <+421>:   jne    0x6898 <bch_btree_node_read_done+744>
   0x0000000000006892 <+738>:   add    %r8,%rbx
   0x0000000000006895 <+741>:   nopl   (%rax)

218                  i = write_block(b)) {
219                     err = "unsupported bset version";
   0x00000000000069c0 <+1040>:  mov    $0x0,%rdx
   0x00000000000069c7 <+1047>:  jmpq   0x6807 <bch_btree_node_read_done+599>
   0x00000000000069cc <+1052>:  nopl   0x0(%rax)

220                     if (i->version > BCACHE_BSET_VERSION)
   0x0000000000006660 <+176>:   mov    0x18(%rbx),%r10d
   0x0000000000006664 <+180>:   cmp    $0x1,%r10d
   0x0000000000006668 <+184>:   ja     0x69c0 <bch_btree_node_read_done+1040>
   0x000000000000666e <+190>:   movzwl 0x430(%rsi),%r11d
   0x0000000000006676 <+198>:   jmpq   0x6769 <bch_btree_node_read_done+441>
   0x000000000000667b <+203>:   nopl   0x0(%rax,%rax,1)
   0x000000000000675b <+427>:   mov    0x18(%rbx),%r10d
   0x000000000000675f <+431>:   cmp    $0x1,%r10d
   0x0000000000006763 <+435>:   ja     0x69c0 <bch_btree_node_read_done+1040>

221                             goto err;
222
223                     err = "bad btree header";
224                     if (b->written + set_blocks(i, block_bytes(b->c)) >
   0x0000000000006769 <+441>:   mov    0x1c(%rbx),%eax
   0x000000000000676c <+444>:   mov    %r11,%rcx
   0x000000000000676f <+447>:   xor    %edx,%edx
   0x0000000000006771 <+449>:   shl    $0x9,%rcx
   0x0000000000006775 <+453>:   movzwl %di,%edi
   0x0000000000006778 <+456>:   mov    %r9d,%r9d
   0x000000000000677b <+459>:   and    $0x1fffe00,%ecx
   0x0000000000006781 <+465>:   lea    0x20(,%rax,8),%r8
   0x0000000000006789 <+473>:   lea    -0x1(%r8,%rcx,1),%rax
   0x000000000000678e <+478>:   div    %rcx
   0x0000000000006791 <+481>:   add    %rdi,%rax
   0x0000000000006794 <+484>:   cmp    %r9,%rax
   0x0000000000006797 <+487>:   ja     0x6800 <bch_btree_node_read_done+592>

225                         btree_blocks(b))
226                             goto err;
227
228                     err = "bad magic";
   0x00000000000069d0 <+1056>:  mov    $0x0,%rdx
   0x00000000000069d7 <+1063>:  jmpq   0x6807 <bch_btree_node_read_done+599>
   0x00000000000069dc <+1068>:  nopl   0x0(%rax)

229                     if (i->magic != bset_magic(&b->c->sb))
   0x00000000000067aa <+506>:   cmp    %rax,0x8(%rbx)
   0x00000000000067ae <+510>:   jne    0x69d0 <bch_btree_node_read_done+1056>

230                             goto err;
231
232                     err = "bad checksum";
   0x00000000000067df <+559>:   mov    $0x0,%rdx
   0x00000000000067e6 <+566>:   jmp    0x6807 <bch_btree_node_read_done+599>
   0x00000000000067e8 <+568>:   nopl   0x0(%rax,%rax,1)
   0x00000000000067f0 <+576>:   mov    0x1c(%rbx),%eax
   0x00000000000067f3 <+579>:   jmpq   0x66bf <bch_btree_node_read_done+271>
   0x00000000000067f8 <+584>:   nopl   0x0(%rax,%rax,1)

233                     switch (i->version) {
   0x00000000000067b4 <+516>:   cmp    $0x1,%r10d
   0x00000000000067bb <+523>:   je     0x6680 <bch_btree_node_read_done+208>

234                     case 0:
235                             if (i->csum != csum_set(i))
   0x00000000000067c1 <+529>:   lea    0x20(%rbx),%r14
   0x00000000000067c5 <+533>:   lea    0x8(%rbx),%rdi
   0x00000000000067ce <+542>:   sub    %rdi,%rsi
   0x00000000000067d1 <+545>:   callq  0x67d6 <bch_btree_node_read_done+550>
   0x00000000000067d6 <+550>:   cmp    %rax,%r15
   0x00000000000067d9 <+553>:   je     0x66a6 <bch_btree_node_read_done+246>
236                                     goto err;
237                             break;
238                     case BCACHE_BSET_VERSION:
239                             if (i->csum != btree_csum_set(b, i))
   0x000000000000669d <+237>:   cmp    %rax,%r15
   0x00000000000066a0 <+240>:   jne    0x67df <bch_btree_node_read_done+559>
   0x00000000000067b8 <+520>:   mov    (%rbx),%r15

240                                     goto err;
241                             break;
242                     }
243
244                     err = "empty set";
   0x00000000000069e0 <+1072>:  mov    $0x0,%rdx
   0x00000000000069e7 <+1079>:  jmpq   0x6807 <bch_btree_node_read_done+599>

245                     if (i != b->keys.set[0].data && !i->keys)
   0x00000000000066a6 <+246>:   cmp    %rbx,0x108(%r12)
   0x00000000000066ae <+254>:   je     0x67f0 <bch_btree_node_read_done+576>
   0x00000000000066b4 <+260>:   mov    0x1c(%rbx),%eax
   0x00000000000066b7 <+263>:   test   %eax,%eax
   0x00000000000066b9 <+265>:   je     0x69e0 <bch_btree_node_read_done+1072>

246                             goto err;
247
248                     bch_btree_iter_push(iter, i->start, bset_bkey_last(i));
   0x00000000000066c3 <+275>:   mov    %r14,%rsi
   0x00000000000066c6 <+278>:   mov    %r13,%rdi
   0x00000000000066c9 <+281>:   callq  0x66ce <bch_btree_node_read_done+286>

249
250                     b->written += set_blocks(i, block_bytes(b->c));
   0x00000000000066ce <+286>:   mov    0x80(%r12),%rsi
   0x00000000000066d6 <+294>:   mov    0x1c(%rbx),%eax
   0x00000000000066d9 <+297>:   xor    %edx,%edx
   0x00000000000066e3 <+307>:   movzwl 0x430(%rsi),%ecx
   0x00000000000066ea <+314>:   shl    $0x9,%ecx
   0x00000000000066ed <+317>:   movslq %ecx,%rcx
   0x00000000000066f0 <+320>:   lea    0x1f(%rcx,%rax,8),%rax
   0x00000000000066f5 <+325>:   div    %rcx
   0x0000000000006704 <+340>:   mov    %eax,%edi
   0x0000000000006706 <+342>:   add    0xc0(%r12),%di
   0x0000000000006712 <+354>:   mov    %di,0xc0(%r12)

251             }
252
253             err = "corrupted btree";
   0x00000000000069b0 <+1024>:  mov    $0x0,%rdx
   0x00000000000069b7 <+1031>:  jmpq   0x6807 <bch_btree_node_read_done+599>
   0x00000000000069bc <+1036>:  nopl   0x0(%rax)

254             for (i = write_block(b);
   0x00000000000068a1 <+753>:   cmp    %rdx,%rcx
   0x00000000000068a4 <+756>:   jae    0x68e5 <bch_btree_node_read_done+821>
   0x00000000000068e0 <+816>:   cmp    %rdx,%rcx
   0x00000000000068e3 <+819>:   jb     0x68c8 <bch_btree_node_read_done+792>

255                  bset_sector_offset(&b->keys, i) < KEY_SIZE(&b->key);
256                  i = ((void *) i) + block_bytes(b->c))
   0x00000000000068d7 <+807>:   mov    %rcx,%rbx
   0x00000000000068da <+810>:   sub    %r8d,%ecx

257                     if (i->seq == b->keys.set[0].data->seq)
   0x00000000000068a6 <+758>:   mov    0x10(%r8),%rdi
   0x00000000000068aa <+762>:   cmp    %rdi,0x10(%rbx)
   0x00000000000068ae <+766>:   je     0x69b0 <bch_btree_node_read_done+1024>
   0x00000000000068b4 <+772>:   cltq
   0x00000000000068b6 <+774>:   mov    %rax,%r9
   0x00000000000068b9 <+777>:   lea    (%rbx,%rax,1),%rcx
   0x00000000000068bd <+781>:   neg    %r9
   0x00000000000068c0 <+784>:   jmp    0x68d7 <bch_btree_node_read_done+807>
   0x00000000000068c2 <+786>:   nopw   0x0(%rax,%rax,1)
   0x00000000000068c8 <+792>:   lea    (%rbx,%rax,1),%rcx
   0x00000000000068cc <+796>:   cmp    0x10(%rcx,%r9,1),%rdi
   0x00000000000068d1 <+801>:   je     0x69b0 <bch_btree_node_read_done+1024>

258                             goto err;
259
260             bch_btree_sort_and_fix_extents(&b->keys, iter, &b->c->sort);
   0x00000000000068e5 <+821>:   lea    0xc8(%r12),%r14
   0x00000000000068ed <+829>:   lea    0xcb60(%rsi),%rdx
   0x00000000000068f4 <+836>:   mov    %r13,%rsi
   0x00000000000068f7 <+839>:   mov    %r14,%rdi
   0x00000000000068fa <+842>:   callq  0x68ff <bch_btree_node_read_done+847>

261
262             i = b->keys.set[0].data;
   0x0000000000006907 <+855>:   mov    0x108(%r12),%rbx

263             err = "short btree key";
   0x00000000000069ec <+1084>:  mov    $0x0,%rdx
   0x00000000000069f3 <+1091>:  jmpq   0x6807 <bch_btree_node_read_done+599>

264             if (b->keys.set[0].size &&
   0x00000000000068ff <+847>:   mov    0xe0(%r12),%eax
   0x0000000000006914 <+868>:   test   %eax,%eax
   0x0000000000006916 <+870>:   je     0x694d <bch_btree_node_read_done+925>
   0x0000000000006944 <+916>:   test   %rax,%rax
   0x0000000000006947 <+919>:   js     0x69ec <bch_btree_node_read_done+1084>

265                 bkey_cmp(&b->key, &b->keys.set[0].end) < 0)
266                     goto err;
267
268             if (b->written < btree_blocks(b))
   0x000000000000694d <+925>:   mov    0x80(%r12),%rax
   0x0000000000006955 <+933>:   movzwl 0xc0(%r12),%esi
   0x0000000000006965 <+949>:   movzwl 0xde2(%rax),%ecx
   0x000000000000696c <+956>:   shr    %cl,%rdx
   0x000000000000696f <+959>:   cmp    %edx,%esi
   0x0000000000006971 <+961>:   jae    0x6868 <bch_btree_node_read_done+696>

269                     bch_bset_init_next(&b->keys, write_block(b),
   0x000000000000698f <+991>:   mov    %r14,%rdi
   0x000000000000699e <+1006>:  callq  0x69a3 <bch_btree_node_read_done+1011>
   0x00000000000069a3 <+1011>:  mov    0x80(%r12),%rax
   0x00000000000069ab <+1019>:  jmpq   0x6868 <bch_btree_node_read_done+696>

270                                        bset_magic(&b->c->sb));
271     out:
272             mempool_free(iter, b->c->fill_iter);
   0x0000000000006868 <+696>:   mov    0xcb58(%rax),%rsi
   0x000000000000686f <+703>:   mov    %r13,%rdi
   0x0000000000006872 <+706>:   callq  0x6877 <bch_btree_node_read_done+711>

273             return;
274     err:
275             set_btree_node_io_error(b);
276             bch_cache_set_error(b->c, "%s at bucket %zu, block %u, %u keys",
   0x0000000000006829 <+633>:   mov    0x1c(%rbx),%r9d
   0x000000000000684a <+666>:   mov    %esi,%ecx
   0x000000000000684c <+668>:   mov    $0x0,%rsi
   0x0000000000006853 <+675>:   shr    %cl,%r8d
   0x0000000000006856 <+678>:   mov    %rax,%rcx
   0x0000000000006859 <+681>:   xor    %eax,%eax
   0x000000000000685b <+683>:   callq  0x6860 <bch_btree_node_read_done+688>
   0x0000000000006860 <+688>:   mov    0x80(%r12),%rax

277                                 err, PTR_BUCKET_NR(b->c, &b->key, 0),
278                                 bset_block_offset(b, i), i->keys);
279             goto out;
280     }
   0x0000000000006877 <+711>:   pop    %rbx
   0x0000000000006878 <+712>:   pop    %r12
   0x000000000000687a <+714>:   pop    %r13
   0x000000000000687c <+716>:   pop    %r14
   0x000000000000687e <+718>:   pop    %r15
   0x0000000000006880 <+720>:   pop    %rbp
   0x0000000000006881 <+721>:   retq
   0x0000000000006882 <+722>:   movzwl 0x430(%rsi),%eax
   0x0000000000006889 <+729>:   shl    $0x9,%eax
   0x000000000000688c <+732>:   imul   %eax,%ecx
   0x000000000000688f <+735>:   movslq %ecx,%rbx


On 8/13/2014 1:45 PM, Slava Pestov wrote:
> Can you post the disassembly of the function?
>
> On Wed, Aug 13, 2014 at 11:35 AM, Larkin Lowrey
> <llowrey@nuclearwinter.com> wrote:
>> Thanks. Trying gdb helped me find the answer. I needed to install the
>> kernel-debuginfo-3.15.8-200.fc20.x86_64  package via yum.
>>
>> From addr2line:
>>> bch_btree_node_read_done+0x4c
>>> drivers/md/bcache/btree.c:207
>> Here's a snippet from gdb:
>>
>>> (gdb) list *(bch_btree_node_read_done+0x4c)
>>> 0x65fc is in bch_btree_node_read_done (drivers/md/bcache/btree.c:207).
>>> 202             struct bset *i = btree_bset_first(b);
>>> 203             struct btree_iter *iter;
>>> 204
>>> 205             iter = mempool_alloc(b->c->fill_iter, GFP_NOWAIT);
>>> 206             iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
>>> 207             iter->used = 0;
>>> 208
>>> 209     #ifdef CONFIG_BCACHE_DEBUG
>>> 210             iter->b = &b->keys;
>>> 211     #endif
>> This doesn't make any sense to me. If iter was null I would expect line
>> 206 to blow up first.
>>
>> --Larkin
>>
>> On 8/13/2014 12:41 PM, Slava Pestov wrote:
>>> You can try to use gdb:
>>>
>>> gdb /lib/modules/.../foo.ko
>>>
>>> list *(bch_btree_node_read_done+0x4c)
>>>
>>>
>>> On Wed, Aug 13, 2014 at 9:40 AM, Larkin Lowrey
>>> <llowrey@nuclearwinter.com> wrote:
>>>> This is making me feel very dumb. I've googled extensively but can't
>>>> figure out how to run addr2line for a module.
>>>>
>>>> I'm running Fedora 20 and the kernel did not have debugging symbols. I
>>>> downloaded the version with symbols but I don't know if the addresses
>>>> are going to be the same. Bcache is a module for me and that's where
>>>> things get tricky. Do you have any tips?
>>>>
>>>> --Larkin
>>>>
>>>> On 8/13/2014 12:04 AM, Kent Overstreet wrote:
>>>>> Any chance you could do an addr2line and get me the exact line where
>>>>> it happened?
>>>>>
>>>>> On Aug 12, 2014 10:02 PM, "Larkin Lowrey" <llowrey@nuclearwinter.com
>>>>> <mailto:llowrey@nuclearwinter.com>> wrote:
>>>>>
>>>>>     I got an oops while doing some heavy I/O. I have an md raid10 cache
>>>>>     device (4 SSDs) and 3 md raid5/6 backing devices. This setup has been
>>>>>     well behaved for about 6 months.
>>>>>
>>>>>     If this isn't a known issue is there anything I can do to provide more
>>>>>     useful information?
>>>>>
>>>>>     I'm running kernel 3.15.8-200.fc20.x86_64.
>>>>>
>>>>>     [210884.047249] BUG: unable to handle kernel NULL pointer
>>>>>     dereference at 0000000000000008
>>>>>     [210884.055605] IP: [<ffffffffa01625fc>]
>>>>>     bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>     [210884.063723] PGD 0
>>>>>     [210884.066053] Oops: 0002 [#1] SMP
>>>>>     [210884.069610] Modules linked in: lp parport binfmt_misc
>>>>>     ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM
>>>>>     iptable_mangle tun bridge stp llc xt_multiport ebtable_nat
>>>>>     ebtables hwmon_vid ip6t_REJECT nf_conntrack_ipv6 nf_conntrack_ipv4
>>>>>     nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter xt_conntrack
>>>>>     ip6_tables nf_conntrack keyspan ezusb kvm_amd kvm crct10dif_pclmul
>>>>>     crc32_pclmul crc32c_intel ghash_clmulni_intel microcode serio_raw
>>>>>     amd64_edac_mod edac_core fam15h_power k10temp edac_mce_amd
>>>>>     sp5100_tco i2c_piix4 igb ptp pps_core dca shpchp acpi_cpufreq
>>>>>     btrfs bcache raid456 async_raid6_recov async_memcpy async_pq
>>>>>     async_xor async_tx xor raid6_pq raid10 i2c_algo_bit drm_kms_helper
>>>>>     ttm drm i2c_core mpt2sas mvsas libsas raid_class
>>>>>     scsi_transport_sas cpufreq_stats
>>>>>     [210884.140704] CPU: 5 PID: 11188 Comm: kworker/5:1 Not tainted
>>>>>     3.15.8-200.fc20.x86_64 #1
>>>>>     [210884.149069] Hardware name:  /H8DG6/H8DGi, BIOS 3.0a       07/2
>>>>>     [210884.155280] Workqueue: bcache cache_lookup [bcache]
>>>>>     [210884.160531] task: ffff880218633160 ti: ffff8800217b8000
>>>>>     task.ti: ffff8800217b8000
>>>>>     [210884.168502] RIP: 0010:[<ffffffffa01625fc>]
>>>>>      [<ffffffffa01625fc>] bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>     [210884.179105] RSP: 0000:ffff8800217bbbe8  EFLAGS: 00010212
>>>>>     [210884.184806] RAX: 0000000000000400 RBX: ffff880245ec0000 RCX:
>>>>>     0000000000000000
>>>>>     [210884.192480] RDX: 0000000000000000 RSI: ffff880418380000 RDI:
>>>>>     0000000000000246
>>>>>     [210884.200075] RBP: ffff8800217bbc10 R08: 0000000000000000 R09:
>>>>>     0000000000000f6b
>>>>>     [210884.207738] R10: 0000000000000000 R11: 0000000000000400 R12:
>>>>>     ffff880413d06c00
>>>>>     [210884.215391] R13: 0000000000000000 R14: ffff8800217bbc20 R15:
>>>>>     ffff880413d06c00
>>>>>     [210884.222961] FS:  00007f73bacd6880(0000)
>>>>>     GS:ffff88021fd40000(0000) knlGS:0000000000000000
>>>>>     [210884.231516] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>>     [210884.237557] CR2: 0000000000000008 CR3: 0000000001c11000 CR4:
>>>>>     00000000000407e0
>>>>>     [210884.245131] Stack:
>>>>>     [210884.247395]  ffff880274f4d020 ffff880413d06c00
>>>>>     0000bfcc44a463f8 ffff8800217bbc20
>>>>>     [210884.255337]  ffff880413d06c00 ffff8800217bbc78
>>>>>     ffffffffa0162b68 0000000000000000
>>>>>     [210884.263256]  ffff880218633160 0000000000000000
>>>>>     0000000000000000 0000000000000000
>>>>>     [210884.271234] Call Trace:
>>>>>     [210884.273985]  [<ffffffffa0162b68>]
>>>>>     bch_btree_node_read+0x168/0x190 [bcache]
>>>>>     [210884.281258]  [<ffffffffa0163f69>]
>>>>>     bch_btree_node_get+0x169/0x290 [bcache]
>>>>>     [210884.288377]  [<ffffffffa01642f5>]
>>>>>     bch_btree_map_keys_recurse+0xd5/0x1d0 [bcache]
>>>>>     [210884.296311]  [<ffffffffa016dcb0>] ?
>>>>>     cached_dev_congested+0x180/0x180 [bcache]
>>>>>     [210884.303953]  [<ffffffff8135b204>] ?
>>>>>     call_rwsem_down_read_failed+0x14/0x30
>>>>>     [210884.311158]  [<ffffffffa01673f7>]
>>>>>     bch_btree_map_keys+0x127/0x150 [bcache]
>>>>>     [210884.318273]  [<ffffffffa016dcb0>] ?
>>>>>     cached_dev_congested+0x180/0x180 [bcache]
>>>>>     [210884.325826]  [<ffffffffa016e7f5>] cache_lookup+0xf5/0x1f0 [bcache]
>>>>>     [210884.332325]  [<ffffffff810a4af6>] process_one_work+0x176/0x430
>>>>>     [210884.338427]  [<ffffffff810a578b>] worker_thread+0x11b/0x3a0
>>>>>     [210884.344282]  [<ffffffff810a5670>] ? rescuer_thread+0x3b0/0x3b0
>>>>>     [210884.350447]  [<ffffffff810ac528>] kthread+0xd8/0xf0
>>>>>     [210884.355615]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
>>>>>     [210884.362017]  [<ffffffff816ff93c>] ret_from_fork+0x7c/0xb0
>>>>>     [210884.367756]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
>>>>>     [210884.374234] Code: 08 01 00 00 48 8b b8 58 cb 00 00 e8 bf 25 01
>>>>>     e1 49 8b b4 24 80 00 00 00 49 89 c5 31 d2 0f b7 86 32 04 00 00 66
>>>>>     f7 b6 30 04 00 00 <49> c7 45 08 00 00 00 00 0f b7 c0 49 89 45 00
>>>>>     48 8b 43 10 48 85
>>>>>     [210884.395405] RIP  [<ffffffffa01625fc>]
>>>>>     bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>     [210884.403389]  RSP <ffff8800217bbbe8>
>>>>>     [210884.407171] CR2: 0000000000000008
>>>>>     [210884.411233] ---[ end trace 0064e6abfd068c85 ]---
>>>>>     [210884.416352] BUG: unable to handle kernel paging request at
>>>>>     ffffffffffffffd8
>>>>>     [210884.423871] IP: [<ffffffff810acb10>] kthread_data+0x10/0x20
>>>>>     [210884.429915] PGD 1c14067 PUD 1c16067 PMD 0
>>>>>
>>>>>     --Larkin
>>>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Null pointer oops
  2014-08-13 21:21           ` Larkin Lowrey
@ 2014-08-13 21:25             ` Slava Pestov
  2014-08-13 21:30               ` Slava Pestov
  2014-08-13 21:32               ` Larkin Lowrey
  0 siblings, 2 replies; 13+ messages in thread
From: Slava Pestov @ 2014-08-13 21:25 UTC (permalink / raw)
  To: Larkin Lowrey; +Cc: Kent Overstreet, linux-bcache

Indeed it looks like iter is NULL. I see the bug is still present in
the latest dev branch. The problem is that we're not checking the
return value of mempool_alloc(), which may be NULL if we pass
GFP_NOWAIT.
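
To illustrate the pattern, here is a minimal userspace sketch in C. It
is not kernel code and not a proposed patch: struct pool,
pool_alloc_nowait(), pool_free() and read_done() are hypothetical
stand-ins for a pool with a single reserved element whose non-blocking
allocation returns NULL when the reserve is in use. The point is only
that the caller has to test the result before writing through it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the iterator; not the real struct btree_iter. */
struct iter {
	size_t size;
	size_t used;
};

/* Hypothetical pool holding one preallocated element. */
struct pool {
	struct iter *reserve;	/* NULL while the element is handed out */
};

/* Non-blocking allocation: returns NULL instead of waiting when empty. */
static struct iter *pool_alloc_nowait(struct pool *p)
{
	struct iter *it = p->reserve;

	p->reserve = NULL;
	return it;
}

/* Return the element to the pool. */
static void pool_free(struct pool *p, struct iter *it)
{
	assert(p->reserve == NULL);
	p->reserve = it;
}

/* Reports failure to the caller instead of dereferencing a NULL iterator. */
static bool read_done(struct pool *p, size_t bucket_size, size_t block_size)
{
	struct iter *it = pool_alloc_nowait(p);

	if (!it)
		return false;	/* pool exhausted: bail out before any store */

	it->size = bucket_size / block_size;
	it->used = 0;

	pool_free(p, it);
	return true;
}
```

With the reserve available, read_done() fills in the iterator and
returns true. If another user is holding the element, it returns false
rather than faulting. Whether the real code should handle the failure
this way or use an allocation mode that is allowed to wait is a separate
question that this sketch does not answer.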

On Wed, Aug 13, 2014 at 2:21 PM, Larkin Lowrey
<llowrey@nuclearwinter.com> wrote:
> Here's the disassembly of bch_btree_node_read_done. The offending line
> is 207 and the instruction is at offset 76.
>
> --Larkin
>
> 199     void bch_btree_node_read_done(struct btree *b)
> 200     {
>    0x00000000000065b0 <+0>:     callq  0x65b5 <bch_btree_node_read_done+5>
>    0x00000000000065b5 <+5>:     push   %rbp
>    0x00000000000065b8 <+8>:     mov    %rsp,%rbp
>    0x00000000000065bb <+11>:    push   %r15
>    0x00000000000065bd <+13>:    push   %r14
>    0x00000000000065bf <+15>:    push   %r13
>    0x00000000000065c1 <+17>:    push   %r12
>    0x00000000000065c3 <+19>:    mov    %rdi,%r12
>    0x00000000000065c6 <+22>:    push   %rbx
>
> 201             const char *err = "bad btree header";
>    0x0000000000006800 <+592>:   mov    $0x0,%rdx
>
> 202             struct bset *i = btree_bset_first(b);
> 203             struct btree_iter *iter;
> 204
> 205             iter = mempool_alloc(b->c->fill_iter, GFP_NOWAIT);
>    0x00000000000065b6 <+6>:     xor    %esi,%esi
>    0x00000000000065c7 <+23>:    mov    0x80(%rdi),%rax
>    0x00000000000065d5 <+37>:    mov    0xcb58(%rax),%rdi
>    0x00000000000065dc <+44>:    callq  0x65e1 <bch_btree_node_read_done+49>
>    0x00000000000065e9 <+57>:    mov    %rax,%r13
>
> 206             iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
>    0x00000000000065e1 <+49>:    mov    0x80(%r12),%rsi
>    0x00000000000065ec <+60>:    xor    %edx,%edx
>    0x00000000000065ee <+62>:    movzwl 0x432(%rsi),%eax
>    0x00000000000065f5 <+69>:    divw   0x430(%rsi)
>    0x0000000000006604 <+84>:    movzwl %ax,%eax
>    0x0000000000006607 <+87>:    mov    %rax,0x0(%r13)
>
> 207             iter->used = 0;
>    0x00000000000065fc <+76>:    movq   $0x0,0x8(%r13)
>
> 208
> 209     #ifdef CONFIG_BCACHE_DEBUG
> 210             iter->b = &b->keys;
> 211     #endif
> 212
> 213             if (!i->seq)
>    0x000000000000660b <+91>:    mov    0x10(%rbx),%rax
>    0x000000000000660f <+95>:    test   %rax,%rax
>    0x0000000000006612 <+98>:    je     0x6800 <bch_btree_node_read_done+592>
>
> 214                     goto err;
> 215
> 216             for (;
>    0x000000000000664d <+157>:   cmp    %r9d,%ecx
>    0x0000000000006650 <+160>:   jae    0x6882 <bch_btree_node_read_done+722>
>    0x0000000000006744 <+404>:   cmp    %r9d,%r10d
>    0x0000000000006747 <+407>:   jae    0x6898 <bch_btree_node_read_done+744>
>
> 217                  b->written < btree_blocks(b) && i->seq == b->keys.set[0].data->seq;
>    0x0000000000006618 <+104>:   mov    0x80(%r12),%rsi
>    0x0000000000006625 <+117>:   movzwl 0xc0(%r12),%edi
>    0x000000000000662e <+126>:   mov    0x108(%r12),%r8
>    0x0000000000006636 <+134>:   movzwl 0xde2(%rsi),%ecx
>    0x0000000000006644 <+148>:   mov    %rdx,%r9
>    0x0000000000006647 <+151>:   shr    %cl,%r9
>    0x000000000000664a <+154>:   movzwl %di,%ecx
>    0x0000000000006656 <+166>:   cmp    0x10(%r8),%rax
>    0x000000000000665a <+170>:   jne    0x6882 <bch_btree_node_read_done+722>
>    0x000000000000670f <+351>:   mov    %rdx,%r9
>    0x000000000000672a <+378>:   movzwl 0xde2(%rsi),%ecx
>    0x0000000000006738 <+392>:   shr    %cl,%r9
>    0x000000000000674d <+413>:   mov    0x10(%r8),%rcx
>    0x0000000000006751 <+417>:   cmp    %rcx,0x10(%rbx)
>    0x0000000000006755 <+421>:   jne    0x6898 <bch_btree_node_read_done+744>
>    0x0000000000006892 <+738>:   add    %r8,%rbx
>    0x0000000000006895 <+741>:   nopl   (%rax)
>
> 218                  i = write_block(b)) {
> 219                     err = "unsupported bset version";
>    0x00000000000069c0 <+1040>:  mov    $0x0,%rdx
>    0x00000000000069c7 <+1047>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>    0x00000000000069cc <+1052>:  nopl   0x0(%rax)
>
> 220                     if (i->version > BCACHE_BSET_VERSION)
>    0x0000000000006660 <+176>:   mov    0x18(%rbx),%r10d
>    0x0000000000006664 <+180>:   cmp    $0x1,%r10d
>    0x0000000000006668 <+184>:   ja     0x69c0 <bch_btree_node_read_done+1040>
>    0x000000000000666e <+190>:   movzwl 0x430(%rsi),%r11d
>    0x0000000000006676 <+198>:   jmpq   0x6769 <bch_btree_node_read_done+441>
>    0x000000000000667b <+203>:   nopl   0x0(%rax,%rax,1)
>    0x000000000000675b <+427>:   mov    0x18(%rbx),%r10d
>    0x000000000000675f <+431>:   cmp    $0x1,%r10d
>    0x0000000000006763 <+435>:   ja     0x69c0 <bch_btree_node_read_done+1040>
>
> 221                             goto err;
> 222
> 223                     err = "bad btree header";
> 224                     if (b->written + set_blocks(i, block_bytes(b->c)) >
>    0x0000000000006769 <+441>:   mov    0x1c(%rbx),%eax
>    0x000000000000676c <+444>:   mov    %r11,%rcx
>    0x000000000000676f <+447>:   xor    %edx,%edx
>    0x0000000000006771 <+449>:   shl    $0x9,%rcx
>    0x0000000000006775 <+453>:   movzwl %di,%edi
>    0x0000000000006778 <+456>:   mov    %r9d,%r9d
>    0x000000000000677b <+459>:   and    $0x1fffe00,%ecx
>    0x0000000000006781 <+465>:   lea    0x20(,%rax,8),%r8
>    0x0000000000006789 <+473>:   lea    -0x1(%r8,%rcx,1),%rax
>    0x000000000000678e <+478>:   div    %rcx
>    0x0000000000006791 <+481>:   add    %rdi,%rax
>    0x0000000000006794 <+484>:   cmp    %r9,%rax
>    0x0000000000006797 <+487>:   ja     0x6800 <bch_btree_node_read_done+592>
>
> 225                         btree_blocks(b))
> 226                             goto err;
> 227
> 228                     err = "bad magic";
>    0x00000000000069d0 <+1056>:  mov    $0x0,%rdx
>    0x00000000000069d7 <+1063>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>    0x00000000000069dc <+1068>:  nopl   0x0(%rax)
>
> 229                     if (i->magic != bset_magic(&b->c->sb))
>    0x00000000000067aa <+506>:   cmp    %rax,0x8(%rbx)
>    0x00000000000067ae <+510>:   jne    0x69d0 <bch_btree_node_read_done+1056>
>
> 230                             goto err;
> 231
> 232                     err = "bad checksum";
>    0x00000000000067df <+559>:   mov    $0x0,%rdx
>    0x00000000000067e6 <+566>:   jmp    0x6807 <bch_btree_node_read_done+599>
>    0x00000000000067e8 <+568>:   nopl   0x0(%rax,%rax,1)
>    0x00000000000067f0 <+576>:   mov    0x1c(%rbx),%eax
>    0x00000000000067f3 <+579>:   jmpq   0x66bf <bch_btree_node_read_done+271>
>    0x00000000000067f8 <+584>:   nopl   0x0(%rax,%rax,1)
>
> 233                     switch (i->version) {
>    0x00000000000067b4 <+516>:   cmp    $0x1,%r10d
>    0x00000000000067bb <+523>:   je     0x6680 <bch_btree_node_read_done+208>
>
> 234                     case 0:
> 235                             if (i->csum != csum_set(i))
>    0x00000000000067c1 <+529>:   lea    0x20(%rbx),%r14
>    0x00000000000067c5 <+533>:   lea    0x8(%rbx),%rdi
>    0x00000000000067ce <+542>:   sub    %rdi,%rsi
>    0x00000000000067d1 <+545>:   callq  0x67d6 <bch_btree_node_read_done+550>
>    0x00000000000067d6 <+550>:   cmp    %rax,%r15
>    0x00000000000067d9 <+553>:   je     0x66a6 <bch_btree_node_read_done+246>
> 236                                     goto err;
> 237                             break;
> 238                     case BCACHE_BSET_VERSION:
> 239                             if (i->csum != btree_csum_set(b, i))
>    0x000000000000669d <+237>:   cmp    %rax,%r15
>    0x00000000000066a0 <+240>:   jne    0x67df <bch_btree_node_read_done+559>
>    0x00000000000067b8 <+520>:   mov    (%rbx),%r15
>
> 240                                     goto err;
> 241                             break;
> 242                     }
> 243
> 244                     err = "empty set";
>    0x00000000000069e0 <+1072>:  mov    $0x0,%rdx
>    0x00000000000069e7 <+1079>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>
> 245                     if (i != b->keys.set[0].data && !i->keys)
>    0x00000000000066a6 <+246>:   cmp    %rbx,0x108(%r12)
>    0x00000000000066ae <+254>:   je     0x67f0 <bch_btree_node_read_done+576>
>    0x00000000000066b4 <+260>:   mov    0x1c(%rbx),%eax
>    0x00000000000066b7 <+263>:   test   %eax,%eax
>    0x00000000000066b9 <+265>:   je     0x69e0 <bch_btree_node_read_done+1072>
>
> 246                             goto err;
> 247
> 248                     bch_btree_iter_push(iter, i->start, bset_bkey_last(i));
>    0x00000000000066c3 <+275>:   mov    %r14,%rsi
>    0x00000000000066c6 <+278>:   mov    %r13,%rdi
>    0x00000000000066c9 <+281>:   callq  0x66ce <bch_btree_node_read_done+286>
>
> 249
> 250                     b->written += set_blocks(i, block_bytes(b->c));
>    0x00000000000066ce <+286>:   mov    0x80(%r12),%rsi
>    0x00000000000066d6 <+294>:   mov    0x1c(%rbx),%eax
>    0x00000000000066d9 <+297>:   xor    %edx,%edx
>    0x00000000000066e3 <+307>:   movzwl 0x430(%rsi),%ecx
>    0x00000000000066ea <+314>:   shl    $0x9,%ecx
>    0x00000000000066ed <+317>:   movslq %ecx,%rcx
>    0x00000000000066f0 <+320>:   lea    0x1f(%rcx,%rax,8),%rax
>    0x00000000000066f5 <+325>:   div    %rcx
>    0x0000000000006704 <+340>:   mov    %eax,%edi
>    0x0000000000006706 <+342>:   add    0xc0(%r12),%di
>    0x0000000000006712 <+354>:   mov    %di,0xc0(%r12)
>
> 251             }
> 252
> 253             err = "corrupted btree";
>    0x00000000000069b0 <+1024>:  mov    $0x0,%rdx
>    0x00000000000069b7 <+1031>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>    0x00000000000069bc <+1036>:  nopl   0x0(%rax)
>
> 254             for (i = write_block(b);
>    0x00000000000068a1 <+753>:   cmp    %rdx,%rcx
>    0x00000000000068a4 <+756>:   jae    0x68e5 <bch_btree_node_read_done+821>
>    0x00000000000068e0 <+816>:   cmp    %rdx,%rcx
>    0x00000000000068e3 <+819>:   jb     0x68c8 <bch_btree_node_read_done+792>
>
> 255                  bset_sector_offset(&b->keys, i) < KEY_SIZE(&b->key);
> 256                  i = ((void *) i) + block_bytes(b->c))
>    0x00000000000068d7 <+807>:   mov    %rcx,%rbx
>    0x00000000000068da <+810>:   sub    %r8d,%ecx
>
> 257                     if (i->seq == b->keys.set[0].data->seq)
>    0x00000000000068a6 <+758>:   mov    0x10(%r8),%rdi
>    0x00000000000068aa <+762>:   cmp    %rdi,0x10(%rbx)
>    0x00000000000068ae <+766>:   je     0x69b0 <bch_btree_node_read_done+1024>
>    0x00000000000068b4 <+772>:   cltq
>    0x00000000000068b6 <+774>:   mov    %rax,%r9
>    0x00000000000068b9 <+777>:   lea    (%rbx,%rax,1),%rcx
>    0x00000000000068bd <+781>:   neg    %r9
>    0x00000000000068c0 <+784>:   jmp    0x68d7 <bch_btree_node_read_done+807>
>    0x00000000000068c2 <+786>:   nopw   0x0(%rax,%rax,1)
>    0x00000000000068c8 <+792>:   lea    (%rbx,%rax,1),%rcx
>    0x00000000000068cc <+796>:   cmp    0x10(%rcx,%r9,1),%rdi
>    0x00000000000068d1 <+801>:   je     0x69b0 <bch_btree_node_read_done+1024>
>
> 258                             goto err;
> 259
> 260             bch_btree_sort_and_fix_extents(&b->keys, iter, &b->c->sort);
>    0x00000000000068e5 <+821>:   lea    0xc8(%r12),%r14
>    0x00000000000068ed <+829>:   lea    0xcb60(%rsi),%rdx
>    0x00000000000068f4 <+836>:   mov    %r13,%rsi
>    0x00000000000068f7 <+839>:   mov    %r14,%rdi
>    0x00000000000068fa <+842>:   callq  0x68ff <bch_btree_node_read_done+847>
>
> 261
> 262             i = b->keys.set[0].data;
>    0x0000000000006907 <+855>:   mov    0x108(%r12),%rbx
>
> 263             err = "short btree key";
>    0x00000000000069ec <+1084>:  mov    $0x0,%rdx
>    0x00000000000069f3 <+1091>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>
> 264             if (b->keys.set[0].size &&
>    0x00000000000068ff <+847>:   mov    0xe0(%r12),%eax
>    0x0000000000006914 <+868>:   test   %eax,%eax
>    0x0000000000006916 <+870>:   je     0x694d <bch_btree_node_read_done+925>
>    0x0000000000006944 <+916>:   test   %rax,%rax
>    0x0000000000006947 <+919>:   js     0x69ec <bch_btree_node_read_done+1084>
>
> 265                 bkey_cmp(&b->key, &b->keys.set[0].end) < 0)
> 266                     goto err;
> 267
> 268             if (b->written < btree_blocks(b))
>    0x000000000000694d <+925>:   mov    0x80(%r12),%rax
>    0x0000000000006955 <+933>:   movzwl 0xc0(%r12),%esi
>    0x0000000000006965 <+949>:   movzwl 0xde2(%rax),%ecx
>    0x000000000000696c <+956>:   shr    %cl,%rdx
>    0x000000000000696f <+959>:   cmp    %edx,%esi
>    0x0000000000006971 <+961>:   jae    0x6868 <bch_btree_node_read_done+696>
>
> 269                     bch_bset_init_next(&b->keys, write_block(b),
>    0x000000000000698f <+991>:   mov    %r14,%rdi
>    0x000000000000699e <+1006>:  callq  0x69a3 <bch_btree_node_read_done+1011>
>    0x00000000000069a3 <+1011>:  mov    0x80(%r12),%rax
>    0x00000000000069ab <+1019>:  jmpq   0x6868 <bch_btree_node_read_done+696>
>
> 270                                        bset_magic(&b->c->sb));
> 271     out:
> 272             mempool_free(iter, b->c->fill_iter);
>    0x0000000000006868 <+696>:   mov    0xcb58(%rax),%rsi
>    0x000000000000686f <+703>:   mov    %r13,%rdi
>    0x0000000000006872 <+706>:   callq  0x6877 <bch_btree_node_read_done+711>
>
> 273             return;
> 274     err:
> 275             set_btree_node_io_error(b);
> 276             bch_cache_set_error(b->c, "%s at bucket %zu, block %u, %u keys",
>    0x0000000000006829 <+633>:   mov    0x1c(%rbx),%r9d
>    0x000000000000684a <+666>:   mov    %esi,%ecx
>    0x000000000000684c <+668>:   mov    $0x0,%rsi
>    0x0000000000006853 <+675>:   shr    %cl,%r8d
>    0x0000000000006856 <+678>:   mov    %rax,%rcx
>    0x0000000000006859 <+681>:   xor    %eax,%eax
>    0x000000000000685b <+683>:   callq  0x6860 <bch_btree_node_read_done+688>
>    0x0000000000006860 <+688>:   mov    0x80(%r12),%rax
>
> 277                                 err, PTR_BUCKET_NR(b->c, &b->key, 0),
> 278                                 bset_block_offset(b, i), i->keys);
> 279             goto out;
> 280     }
>    0x0000000000006877 <+711>:   pop    %rbx
>    0x0000000000006878 <+712>:   pop    %r12
>    0x000000000000687a <+714>:   pop    %r13
>    0x000000000000687c <+716>:   pop    %r14
>    0x000000000000687e <+718>:   pop    %r15
>    0x0000000000006880 <+720>:   pop    %rbp
>    0x0000000000006881 <+721>:   retq
>    0x0000000000006882 <+722>:   movzwl 0x430(%rsi),%eax
>    0x0000000000006889 <+729>:   shl    $0x9,%eax
>    0x000000000000688c <+732>:   imul   %eax,%ecx
>    0x000000000000688f <+735>:   movslq %ecx,%rbx
>
>
> On 8/13/2014 1:45 PM, Slava Pestov wrote:
>> Can you post the disassembly of the function?
>>
>> On Wed, Aug 13, 2014 at 11:35 AM, Larkin Lowrey
>> <llowrey@nuclearwinter.com> wrote:
>>> Thanks. Trying gdb helped me find the answer. I needed to install the
>>> kernel-debuginfo-3.15.8-200.fc20.x86_64  package via yum.
>>>
>>> From addr2line:
>>>> bch_btree_node_read_done+0x4c
>>>> drivers/md/bcache/btree.c:207
>>> Here's a snippet from gdb:
>>>
>>>> (gdb) list *(bch_btree_node_read_done+0x4c)
>>>> 0x65fc is in bch_btree_node_read_done (drivers/md/bcache/btree.c:207).
>>>> 202             struct bset *i = btree_bset_first(b);
>>>> 203             struct btree_iter *iter;
>>>> 204
>>>> 205             iter = mempool_alloc(b->c->fill_iter, GFP_NOWAIT);
>>>> 206             iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
>>>> 207             iter->used = 0;
>>>> 208
>>>> 209     #ifdef CONFIG_BCACHE_DEBUG
>>>> 210             iter->b = &b->keys;
>>>> 211     #endif
>>> This doesn't make any sense to me. If iter was null I would expect line
>>> 206 to blow up first.
>>>
>>> --Larkin
>>>
>>> On 8/13/2014 12:41 PM, Slava Pestov wrote:
>>>> You can try to use gdb:
>>>>
>>>> gdb /lib/modules/.../foo.ko
>>>>
>>>> list *(bch_btree_node_read_done+0x4c)
>>>>
>>>>
>>>> On Wed, Aug 13, 2014 at 9:40 AM, Larkin Lowrey
>>>> <llowrey@nuclearwinter.com> wrote:
>>>>> This is making me feel very dumb. I've googled extensively but can't
>>>>> figure out how to run addr2line for a module.
>>>>>
>>>>> I'm running Fedora 20 and the kernel did not have debugging symbols. I
>>>>> downloaded the version with symbols but I don't know if the addresses
>>>>> are going to be the same. Bcache is a module for me and that's where
>>>>> things get tricky. Do you have any tips?
>>>>>
>>>>> --Larkin
>>>>>
>>>>> On 8/13/2014 12:04 AM, Kent Overstreet wrote:
>>>>>> Any chance you could do an addr2line and get me the exact line where
>>>>>> it happened?
>>>>>>
>>>>>> On Aug 12, 2014 10:02 PM, "Larkin Lowrey" <llowrey@nuclearwinter.com
>>>>>> <mailto:llowrey@nuclearwinter.com>> wrote:
>>>>>>
>>>>>>     I got an oops while doing some heavy I/O. I have an md raid10 cache
>>>>>>     device (4 SSDs) and 3 md raid5/6 backing devices. This setup has been
>>>>>>     well behaved for about 6 months.
>>>>>>
>>>>>>     If this isn't a known issue is there anything I can do to provide more
>>>>>>     useful information?
>>>>>>
>>>>>>     I'm running kernel 3.15.8-200.fc20.x86_64.
>>>>>>
>>>>>>     [210884.047249] BUG: unable to handle kernel NULL pointer
>>>>>>     dereference at 0000000000000008
>>>>>>     [210884.055605] IP: [<ffffffffa01625fc>]
>>>>>>     bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>     [210884.063723] PGD 0
>>>>>>     [210884.066053] Oops: 0002 [#1] SMP
>>>>>>     [210884.069610] Modules linked in: lp parport binfmt_misc
>>>>>>     ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM
>>>>>>     iptable_mangle tun bridge stp llc xt_multiport ebtable_nat
>>>>>>     ebtables hwmon_vid ip6t_REJECT nf_conntrack_ipv6 nf_conntrack_ipv4
>>>>>>     nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter xt_conntrack
>>>>>>     ip6_tables nf_conntrack keyspan ezusb kvm_amd kvm crct10dif_pclmul
>>>>>>     crc32_pclmul crc32c_intel ghash_clmulni_intel microcode serio_raw
>>>>>>     amd64_edac_mod edac_core fam15h_power k10temp edac_mce_amd
>>>>>>     sp5100_tco i2c_piix4 igb ptp pps_core dca shpchp acpi_cpufreq
>>>>>>     btrfs bcache raid456 async_raid6_recov async_memcpy async_pq
>>>>>>     async_xor async_tx xor raid6_pq raid10 i2c_algo_bit drm_kms_helper
>>>>>>     ttm drm i2c_core mpt2sas mvsas libsas raid_class
>>>>>>     scsi_transport_sas cpufreq_stats
>>>>>>     [210884.140704] CPU: 5 PID: 11188 Comm: kworker/5:1 Not tainted
>>>>>>     3.15.8-200.fc20.x86_64 #1
>>>>>>     [210884.149069] Hardware name:  /H8DG6/H8DGi, BIOS 3.0a       07/2
>>>>>>     [210884.155280] Workqueue: bcache cache_lookup [bcache]
>>>>>>     [210884.160531] task: ffff880218633160 ti: ffff8800217b8000
>>>>>>     task.ti: ffff8800217b8000
>>>>>>     [210884.168502] RIP: 0010:[<ffffffffa01625fc>]
>>>>>>      [<ffffffffa01625fc>] bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>     [210884.179105] RSP: 0000:ffff8800217bbbe8  EFLAGS: 00010212
>>>>>>     [210884.184806] RAX: 0000000000000400 RBX: ffff880245ec0000 RCX:
>>>>>>     0000000000000000
>>>>>>     [210884.192480] RDX: 0000000000000000 RSI: ffff880418380000 RDI:
>>>>>>     0000000000000246
>>>>>>     [210884.200075] RBP: ffff8800217bbc10 R08: 0000000000000000 R09:
>>>>>>     0000000000000f6b
>>>>>>     [210884.207738] R10: 0000000000000000 R11: 0000000000000400 R12:
>>>>>>     ffff880413d06c00
>>>>>>     [210884.215391] R13: 0000000000000000 R14: ffff8800217bbc20 R15:
>>>>>>     ffff880413d06c00
>>>>>>     [210884.222961] FS:  00007f73bacd6880(0000)
>>>>>>     GS:ffff88021fd40000(0000) knlGS:0000000000000000
>>>>>>     [210884.231516] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>>>     [210884.237557] CR2: 0000000000000008 CR3: 0000000001c11000 CR4:
>>>>>>     00000000000407e0
>>>>>>     [210884.245131] Stack:
>>>>>>     [210884.247395]  ffff880274f4d020 ffff880413d06c00
>>>>>>     0000bfcc44a463f8 ffff8800217bbc20
>>>>>>     [210884.255337]  ffff880413d06c00 ffff8800217bbc78
>>>>>>     ffffffffa0162b68 0000000000000000
>>>>>>     [210884.263256]  ffff880218633160 0000000000000000
>>>>>>     0000000000000000 0000000000000000
>>>>>>     [210884.271234] Call Trace:
>>>>>>     [210884.273985]  [<ffffffffa0162b68>]
>>>>>>     bch_btree_node_read+0x168/0x190 [bcache]
>>>>>>     [210884.281258]  [<ffffffffa0163f69>]
>>>>>>     bch_btree_node_get+0x169/0x290 [bcache]
>>>>>>     [210884.288377]  [<ffffffffa01642f5>]
>>>>>>     bch_btree_map_keys_recurse+0xd5/0x1d0 [bcache]
>>>>>>     [210884.296311]  [<ffffffffa016dcb0>] ?
>>>>>>     cached_dev_congested+0x180/0x180 [bcache]
>>>>>>     [210884.303953]  [<ffffffff8135b204>] ?
>>>>>>     call_rwsem_down_read_failed+0x14/0x30
>>>>>>     [210884.311158]  [<ffffffffa01673f7>]
>>>>>>     bch_btree_map_keys+0x127/0x150 [bcache]
>>>>>>     [210884.318273]  [<ffffffffa016dcb0>] ?
>>>>>>     cached_dev_congested+0x180/0x180 [bcache]
>>>>>>     [210884.325826]  [<ffffffffa016e7f5>] cache_lookup+0xf5/0x1f0 [bcache]
>>>>>>     [210884.332325]  [<ffffffff810a4af6>] process_one_work+0x176/0x430
>>>>>>     [210884.338427]  [<ffffffff810a578b>] worker_thread+0x11b/0x3a0
>>>>>>     [210884.344282]  [<ffffffff810a5670>] ? rescuer_thread+0x3b0/0x3b0
>>>>>>     [210884.350447]  [<ffffffff810ac528>] kthread+0xd8/0xf0
>>>>>>     [210884.355615]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
>>>>>>     [210884.362017]  [<ffffffff816ff93c>] ret_from_fork+0x7c/0xb0
>>>>>>     [210884.367756]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
>>>>>>     [210884.374234] Code: 08 01 00 00 48 8b b8 58 cb 00 00 e8 bf 25 01
>>>>>>     e1 49 8b b4 24 80 00 00 00 49 89 c5 31 d2 0f b7 86 32 04 00 00 66
>>>>>>     f7 b6 30 04 00 00 <49> c7 45 08 00 00 00 00 0f b7 c0 49 89 45 00
>>>>>>     48 8b 43 10 48 85
>>>>>>     [210884.395405] RIP  [<ffffffffa01625fc>]
>>>>>>     bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>     [210884.403389]  RSP <ffff8800217bbbe8>
>>>>>>     [210884.407171] CR2: 0000000000000008
>>>>>>     [210884.411233] ---[ end trace 0064e6abfd068c85 ]---
>>>>>>     [210884.416352] BUG: unable to handle kernel paging request at
>>>>>>     ffffffffffffffd8
>>>>>>     [210884.423871] IP: [<ffffffff810acb10>] kthread_data+0x10/0x20
>>>>>>     [210884.429915] PGD 1c14067 PUD 1c16067 PMD 0
>>>>>>
>>>>>>     --Larkin
>>>>>>
>>>>>>     --
>>>>>>     To unsubscribe from this list: send the line "unsubscribe
>>>>>>     linux-bcache" in
>>>>>>     the body of a message to majordomo@vger.kernel.org
>>>>>>     <mailto:majordomo@vger.kernel.org>
>>>>>>     More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Null pointer oops
  2014-08-13 21:25             ` Slava Pestov
@ 2014-08-13 21:30               ` Slava Pestov
  2014-08-13 21:34                 ` Jianjian Huo
                                   ` (2 more replies)
  2014-08-13 21:32               ` Larkin Lowrey
  1 sibling, 3 replies; 13+ messages in thread
From: Slava Pestov @ 2014-08-13 21:30 UTC (permalink / raw)
  To: Larkin Lowrey; +Cc: Kent Overstreet, linux-bcache

I was mistaken. The bug is fixed in the pull request Kent sent to Jens for 3.16:

http://evilpiepirate.org/git/linux-bcache.git/commit/?h=bcache-dev&id=bcf090e0040e30f8409e6a535a01e6473afb096f
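
For anyone reading along, the failure mode described in the quoted message below can be modelled in userspace. This is only a sketch: pool_model, iter_model, pool_alloc_nowait() and read_done_model() are hypothetical names, not bcache or mempool APIs, and the commit linked above may well resolve the problem differently (for example by changing the allocation flags rather than adding a check). It shows a no-wait allocator that returns NULL when the pool is empty, and a caller that tests the result before writing through it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the iterator that gets allocated. */
struct iter_model {
	size_t size;
	size_t used;
};

/* Hypothetical pool: only counts how many objects may still be handed out. */
struct pool_model {
	int free_slots;
};

/* Models a no-wait allocation: returns NULL instead of sleeping when empty. */
static struct iter_model *pool_alloc_nowait(struct pool_model *p)
{
	struct iter_model *it;

	if (p->free_slots <= 0)
		return NULL;
	it = calloc(1, sizeof(*it));
	if (it)
		p->free_slots--;
	return it;
}

static void pool_free(struct pool_model *p, struct iter_model *it)
{
	free(it);
	p->free_slots++;
}

/* Returns 0 on success, -1 if no iterator could be obtained. */
static int read_done_model(struct pool_model *p, size_t bucket_size,
			   size_t block_size)
{
	struct iter_model *iter;

	assert(block_size != 0);

	iter = pool_alloc_nowait(p);
	if (!iter)
		return -1;	/* bail out instead of dereferencing NULL */

	iter->size = bucket_size / block_size;
	iter->used = 0;

	pool_free(p, iter);
	return 0;
}
```

With one free slot, read_done_model(&p, 1024, 8) returns 0 and hands the slot back; with none it returns -1 rather than writing through a NULL pointer.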

On Wed, Aug 13, 2014 at 2:25 PM, Slava Pestov <sp@datera.io> wrote:
> Indeed it looks like iter is NULL. I see the bug is still present in
> the latest dev branch. The problem is that we're not checking the
> return value of mempool_alloc(), which may be NULL if we pass
> GFP_NOWAIT.
>
> On Wed, Aug 13, 2014 at 2:21 PM, Larkin Lowrey
> <llowrey@nuclearwinter.com> wrote:
>> Here's the disassembly of bch_btree_node_read_done. The offending line
>> is 207 and the instruction is at offset 76.
>>
>> --Larkin
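
A note on why line 207 faults even though line 206 comes first in the source: in the disassembly quoted below, the store for line 207 (+76, movq $0x0,0x8(%r13)) is scheduled ahead of the store for line 206 (+87, mov %rax,0x0(%r13)), so with %r13 == 0 the first write lands on address 0x8, which matches CR2 in the oops. The arithmetic can be checked with a small sketch. The struct here is a hypothetical model inferred from those two stores, not the kernel's definition, and fault_address() and insn_address() are illustrative helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the first two fields of the iterator, laid out as
 * the disassembly suggests: a 64-bit store to 0x0(%r13) for size and a
 * 64-bit store to 0x8(%r13) for used.
 */
struct iter_model {
	uint64_t size;
	uint64_t used;
};

/* Address touched when storing to a field through a given base pointer. */
static uint64_t fault_address(uint64_t base, size_t field_offset)
{
	return base + field_offset;
}

/* Module-relative address of an instruction: symbol start plus offset. */
static uint64_t insn_address(uint64_t sym_start, uint64_t offset)
{
	return sym_start + offset;
}

/* Cross-checks the model against the numbers reported in this thread. */
static void check_against_oops(void)
{
	/* NULL base + offset of used gives the faulting address in CR2. */
	assert(fault_address(0, offsetof(struct iter_model, used)) == 0x8);
	/* The function starts at 0x65b0; +0x4c is the 0x65fc gdb reports. */
	assert(insn_address(0x65b0, 0x4c) == 0x65fc);
}
```

Offset 0x4c is 76 decimal, which is why the oops says +0x4c while the listing labels the same instruction <+76>.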
>>
>> 199     void bch_btree_node_read_done(struct btree *b)
>> 200     {
>>    0x00000000000065b0 <+0>:     callq  0x65b5 <bch_btree_node_read_done+5>
>>    0x00000000000065b5 <+5>:     push   %rbp
>>    0x00000000000065b8 <+8>:     mov    %rsp,%rbp
>>    0x00000000000065bb <+11>:    push   %r15
>>    0x00000000000065bd <+13>:    push   %r14
>>    0x00000000000065bf <+15>:    push   %r13
>>    0x00000000000065c1 <+17>:    push   %r12
>>    0x00000000000065c3 <+19>:    mov    %rdi,%r12
>>    0x00000000000065c6 <+22>:    push   %rbx
>>
>> 201             const char *err = "bad btree header";
>>    0x0000000000006800 <+592>:   mov    $0x0,%rdx
>>
>> 202             struct bset *i = btree_bset_first(b);
>> 203             struct btree_iter *iter;
>> 204
>> 205             iter = mempool_alloc(b->c->fill_iter, GFP_NOWAIT);
>>    0x00000000000065b6 <+6>:     xor    %esi,%esi
>>    0x00000000000065c7 <+23>:    mov    0x80(%rdi),%rax
>>    0x00000000000065d5 <+37>:    mov    0xcb58(%rax),%rdi
>>    0x00000000000065dc <+44>:    callq  0x65e1 <bch_btree_node_read_done+49>
>>    0x00000000000065e9 <+57>:    mov    %rax,%r13
>>
>> 206             iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
>>    0x00000000000065e1 <+49>:    mov    0x80(%r12),%rsi
>>    0x00000000000065ec <+60>:    xor    %edx,%edx
>>    0x00000000000065ee <+62>:    movzwl 0x432(%rsi),%eax
>>    0x00000000000065f5 <+69>:    divw   0x430(%rsi)
>>    0x0000000000006604 <+84>:    movzwl %ax,%eax
>>    0x0000000000006607 <+87>:    mov    %rax,0x0(%r13)
>>
>> 207             iter->used = 0;
>>    0x00000000000065fc <+76>:    movq   $0x0,0x8(%r13)
>>
>> 208
>> 209     #ifdef CONFIG_BCACHE_DEBUG
>> 210             iter->b = &b->keys;
>> 211     #endif
>> 212
>> 213             if (!i->seq)
>>    0x000000000000660b <+91>:    mov    0x10(%rbx),%rax
>>    0x000000000000660f <+95>:    test   %rax,%rax
>>    0x0000000000006612 <+98>:    je     0x6800 <bch_btree_node_read_done+592>
>>
>> 214                     goto err;
>> 215
>> 216             for (;
>>    0x000000000000664d <+157>:   cmp    %r9d,%ecx
>>    0x0000000000006650 <+160>:   jae    0x6882 <bch_btree_node_read_done+722>
>>    0x0000000000006744 <+404>:   cmp    %r9d,%r10d
>>    0x0000000000006747 <+407>:   jae    0x6898 <bch_btree_node_read_done+744>
>>
>> 217                  b->written < btree_blocks(b) && i->seq == b->keys.set[0].data->seq;
>>    0x0000000000006618 <+104>:   mov    0x80(%r12),%rsi
>>    0x0000000000006625 <+117>:   movzwl 0xc0(%r12),%edi
>>    0x000000000000662e <+126>:   mov    0x108(%r12),%r8
>>    0x0000000000006636 <+134>:   movzwl 0xde2(%rsi),%ecx
>>    0x0000000000006644 <+148>:   mov    %rdx,%r9
>>    0x0000000000006647 <+151>:   shr    %cl,%r9
>>    0x000000000000664a <+154>:   movzwl %di,%ecx
>>    0x0000000000006656 <+166>:   cmp    0x10(%r8),%rax
>>    0x000000000000665a <+170>:   jne    0x6882 <bch_btree_node_read_done+722>
>>    0x000000000000670f <+351>:   mov    %rdx,%r9
>>    0x000000000000672a <+378>:   movzwl 0xde2(%rsi),%ecx
>>    0x0000000000006738 <+392>:   shr    %cl,%r9
>>    0x000000000000674d <+413>:   mov    0x10(%r8),%rcx
>>    0x0000000000006751 <+417>:   cmp    %rcx,0x10(%rbx)
>>    0x0000000000006755 <+421>:   jne    0x6898 <bch_btree_node_read_done+744>
>>    0x0000000000006892 <+738>:   add    %r8,%rbx
>>    0x0000000000006895 <+741>:   nopl   (%rax)
>>
>> 218                  i = write_block(b)) {
>> 219                     err = "unsupported bset version";
>>    0x00000000000069c0 <+1040>:  mov    $0x0,%rdx
>>    0x00000000000069c7 <+1047>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>    0x00000000000069cc <+1052>:  nopl   0x0(%rax)
>>
>> 220                     if (i->version > BCACHE_BSET_VERSION)
>>    0x0000000000006660 <+176>:   mov    0x18(%rbx),%r10d
>>    0x0000000000006664 <+180>:   cmp    $0x1,%r10d
>>    0x0000000000006668 <+184>:   ja     0x69c0 <bch_btree_node_read_done+1040>
>>    0x000000000000666e <+190>:   movzwl 0x430(%rsi),%r11d
>>    0x0000000000006676 <+198>:   jmpq   0x6769 <bch_btree_node_read_done+441>
>>    0x000000000000667b <+203>:   nopl   0x0(%rax,%rax,1)
>>    0x000000000000675b <+427>:   mov    0x18(%rbx),%r10d
>>    0x000000000000675f <+431>:   cmp    $0x1,%r10d
>>    0x0000000000006763 <+435>:   ja     0x69c0 <bch_btree_node_read_done+1040>
>>
>> 221                             goto err;
>> 222
>> 223                     err = "bad btree header";
>> 224                     if (b->written + set_blocks(i, block_bytes(b->c)) >
>>    0x0000000000006769 <+441>:   mov    0x1c(%rbx),%eax
>>    0x000000000000676c <+444>:   mov    %r11,%rcx
>>    0x000000000000676f <+447>:   xor    %edx,%edx
>>    0x0000000000006771 <+449>:   shl    $0x9,%rcx
>>    0x0000000000006775 <+453>:   movzwl %di,%edi
>>    0x0000000000006778 <+456>:   mov    %r9d,%r9d
>>    0x000000000000677b <+459>:   and    $0x1fffe00,%ecx
>>    0x0000000000006781 <+465>:   lea    0x20(,%rax,8),%r8
>>    0x0000000000006789 <+473>:   lea    -0x1(%r8,%rcx,1),%rax
>>    0x000000000000678e <+478>:   div    %rcx
>>    0x0000000000006791 <+481>:   add    %rdi,%rax
>>    0x0000000000006794 <+484>:   cmp    %r9,%rax
>>    0x0000000000006797 <+487>:   ja     0x6800 <bch_btree_node_read_done+592>
>>
>> 225                         btree_blocks(b))
>> 226                             goto err;
>> 227
>> 228                     err = "bad magic";
>>    0x00000000000069d0 <+1056>:  mov    $0x0,%rdx
>>    0x00000000000069d7 <+1063>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>    0x00000000000069dc <+1068>:  nopl   0x0(%rax)
>>
>> 229                     if (i->magic != bset_magic(&b->c->sb))
>>    0x00000000000067aa <+506>:   cmp    %rax,0x8(%rbx)
>>    0x00000000000067ae <+510>:   jne    0x69d0 <bch_btree_node_read_done+1056>
>>
>> 230                             goto err;
>> 231
>> 232                     err = "bad checksum";
>>    0x00000000000067df <+559>:   mov    $0x0,%rdx
>>    0x00000000000067e6 <+566>:   jmp    0x6807 <bch_btree_node_read_done+599>
>>    0x00000000000067e8 <+568>:   nopl   0x0(%rax,%rax,1)
>>    0x00000000000067f0 <+576>:   mov    0x1c(%rbx),%eax
>>    0x00000000000067f3 <+579>:   jmpq   0x66bf <bch_btree_node_read_done+271>
>>    0x00000000000067f8 <+584>:   nopl   0x0(%rax,%rax,1)
>>
>> 233                     switch (i->version) {
>>    0x00000000000067b4 <+516>:   cmp    $0x1,%r10d
>>    0x00000000000067bb <+523>:   je     0x6680 <bch_btree_node_read_done+208>
>>
>> 234                     case 0:
>> 235                             if (i->csum != csum_set(i))
>>    0x00000000000067c1 <+529>:   lea    0x20(%rbx),%r14
>>    0x00000000000067c5 <+533>:   lea    0x8(%rbx),%rdi
>>    0x00000000000067ce <+542>:   sub    %rdi,%rsi
>>    0x00000000000067d1 <+545>:   callq  0x67d6 <bch_btree_node_read_done+550>
>>    0x00000000000067d6 <+550>:   cmp    %rax,%r15
>>    0x00000000000067d9 <+553>:   je     0x66a6 <bch_btree_node_read_done+246>
>> 236                                     goto err;
>> 237                             break;
>> 238                     case BCACHE_BSET_VERSION:
>> 239                             if (i->csum != btree_csum_set(b, i))
>>    0x000000000000669d <+237>:   cmp    %rax,%r15
>>    0x00000000000066a0 <+240>:   jne    0x67df <bch_btree_node_read_done+559>
>>    0x00000000000067b8 <+520>:   mov    (%rbx),%r15
>>
>> 240                                     goto err;
>> 241                             break;
>> 242                     }
>> 243
>> 244                     err = "empty set";
>>    0x00000000000069e0 <+1072>:  mov    $0x0,%rdx
>>    0x00000000000069e7 <+1079>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>
>> 245                     if (i != b->keys.set[0].data && !i->keys)
>>    0x00000000000066a6 <+246>:   cmp    %rbx,0x108(%r12)
>>    0x00000000000066ae <+254>:   je     0x67f0 <bch_btree_node_read_done+576>
>>    0x00000000000066b4 <+260>:   mov    0x1c(%rbx),%eax
>>    0x00000000000066b7 <+263>:   test   %eax,%eax
>>    0x00000000000066b9 <+265>:   je     0x69e0 <bch_btree_node_read_done+1072>
>>
>> 246                             goto err;
>> 247
>> 248                     bch_btree_iter_push(iter, i->start, bset_bkey_last(i));
>>    0x00000000000066c3 <+275>:   mov    %r14,%rsi
>>    0x00000000000066c6 <+278>:   mov    %r13,%rdi
>>    0x00000000000066c9 <+281>:   callq  0x66ce <bch_btree_node_read_done+286>
>>
>> 249
>> 250                     b->written += set_blocks(i, block_bytes(b->c));
>>    0x00000000000066ce <+286>:   mov    0x80(%r12),%rsi
>>    0x00000000000066d6 <+294>:   mov    0x1c(%rbx),%eax
>>    0x00000000000066d9 <+297>:   xor    %edx,%edx
>>    0x00000000000066e3 <+307>:   movzwl 0x430(%rsi),%ecx
>>    0x00000000000066ea <+314>:   shl    $0x9,%ecx
>>    0x00000000000066ed <+317>:   movslq %ecx,%rcx
>>    0x00000000000066f0 <+320>:   lea    0x1f(%rcx,%rax,8),%rax
>>    0x00000000000066f5 <+325>:   div    %rcx
>>    0x0000000000006704 <+340>:   mov    %eax,%edi
>>    0x0000000000006706 <+342>:   add    0xc0(%r12),%di
>>    0x0000000000006712 <+354>:   mov    %di,0xc0(%r12)
>>
>> 251             }
>> 252
>> 253             err = "corrupted btree";
>>    0x00000000000069b0 <+1024>:  mov    $0x0,%rdx
>>    0x00000000000069b7 <+1031>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>    0x00000000000069bc <+1036>:  nopl   0x0(%rax)
>>
>> 254             for (i = write_block(b);
>>    0x00000000000068a1 <+753>:   cmp    %rdx,%rcx
>>    0x00000000000068a4 <+756>:   jae    0x68e5 <bch_btree_node_read_done+821>
>>    0x00000000000068e0 <+816>:   cmp    %rdx,%rcx
>>    0x00000000000068e3 <+819>:   jb     0x68c8 <bch_btree_node_read_done+792>
>>
>> 255                  bset_sector_offset(&b->keys, i) < KEY_SIZE(&b->key);
>> 256                  i = ((void *) i) + block_bytes(b->c))
>>    0x00000000000068d7 <+807>:   mov    %rcx,%rbx
>>    0x00000000000068da <+810>:   sub    %r8d,%ecx
>>
>> 257                     if (i->seq == b->keys.set[0].data->seq)
>>    0x00000000000068a6 <+758>:   mov    0x10(%r8),%rdi
>>    0x00000000000068aa <+762>:   cmp    %rdi,0x10(%rbx)
>>    0x00000000000068ae <+766>:   je     0x69b0 <bch_btree_node_read_done+1024>
>>    0x00000000000068b4 <+772>:   cltq
>>    0x00000000000068b6 <+774>:   mov    %rax,%r9
>>    0x00000000000068b9 <+777>:   lea    (%rbx,%rax,1),%rcx
>>    0x00000000000068bd <+781>:   neg    %r9
>>    0x00000000000068c0 <+784>:   jmp    0x68d7 <bch_btree_node_read_done+807>
>>    0x00000000000068c2 <+786>:   nopw   0x0(%rax,%rax,1)
>>    0x00000000000068c8 <+792>:   lea    (%rbx,%rax,1),%rcx
>>    0x00000000000068cc <+796>:   cmp    0x10(%rcx,%r9,1),%rdi
>>    0x00000000000068d1 <+801>:   je     0x69b0 <bch_btree_node_read_done+1024>
>>
>> 258                             goto err;
>> 259
>> 260             bch_btree_sort_and_fix_extents(&b->keys, iter, &b->c->sort);
>>    0x00000000000068e5 <+821>:   lea    0xc8(%r12),%r14
>>    0x00000000000068ed <+829>:   lea    0xcb60(%rsi),%rdx
>>    0x00000000000068f4 <+836>:   mov    %r13,%rsi
>>    0x00000000000068f7 <+839>:   mov    %r14,%rdi
>>    0x00000000000068fa <+842>:   callq  0x68ff <bch_btree_node_read_done+847>
>>
>> 261
>> 262             i = b->keys.set[0].data;
>>    0x0000000000006907 <+855>:   mov    0x108(%r12),%rbx
>>
>> 263             err = "short btree key";
>>    0x00000000000069ec <+1084>:  mov    $0x0,%rdx
>>    0x00000000000069f3 <+1091>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>
>> 264             if (b->keys.set[0].size &&
>>    0x00000000000068ff <+847>:   mov    0xe0(%r12),%eax
>>    0x0000000000006914 <+868>:   test   %eax,%eax
>>    0x0000000000006916 <+870>:   je     0x694d <bch_btree_node_read_done+925>
>>    0x0000000000006944 <+916>:   test   %rax,%rax
>>    0x0000000000006947 <+919>:   js     0x69ec <bch_btree_node_read_done+1084>
>>
>> 265                 bkey_cmp(&b->key, &b->keys.set[0].end) < 0)
>> 266                     goto err;
>> 267
>> 268             if (b->written < btree_blocks(b))
>>    0x000000000000694d <+925>:   mov    0x80(%r12),%rax
>>    0x0000000000006955 <+933>:   movzwl 0xc0(%r12),%esi
>>    0x0000000000006965 <+949>:   movzwl 0xde2(%rax),%ecx
>>    0x000000000000696c <+956>:   shr    %cl,%rdx
>>    0x000000000000696f <+959>:   cmp    %edx,%esi
>>    0x0000000000006971 <+961>:   jae    0x6868 <bch_btree_node_read_done+696>
>>
>> 269                     bch_bset_init_next(&b->keys, write_block(b),
>>    0x000000000000698f <+991>:   mov    %r14,%rdi
>>    0x000000000000699e <+1006>:  callq  0x69a3 <bch_btree_node_read_done+1011>
>>    0x00000000000069a3 <+1011>:  mov    0x80(%r12),%rax
>>    0x00000000000069ab <+1019>:  jmpq   0x6868 <bch_btree_node_read_done+696>
>>
>> 270                                        bset_magic(&b->c->sb));
>> 271     out:
>> 272             mempool_free(iter, b->c->fill_iter);
>>    0x0000000000006868 <+696>:   mov    0xcb58(%rax),%rsi
>>    0x000000000000686f <+703>:   mov    %r13,%rdi
>>    0x0000000000006872 <+706>:   callq  0x6877 <bch_btree_node_read_done+711>
>>
>> 273             return;
>> 274     err:
>> 275             set_btree_node_io_error(b);
>> 276             bch_cache_set_error(b->c, "%s at bucket %zu, block %u, %u keys",
>>    0x0000000000006829 <+633>:   mov    0x1c(%rbx),%r9d
>>    0x000000000000684a <+666>:   mov    %esi,%ecx
>>    0x000000000000684c <+668>:   mov    $0x0,%rsi
>>    0x0000000000006853 <+675>:   shr    %cl,%r8d
>>    0x0000000000006856 <+678>:   mov    %rax,%rcx
>>    0x0000000000006859 <+681>:   xor    %eax,%eax
>>    0x000000000000685b <+683>:   callq  0x6860 <bch_btree_node_read_done+688>
>>    0x0000000000006860 <+688>:   mov    0x80(%r12),%rax
>>
>> 277                                 err, PTR_BUCKET_NR(b->c, &b->key, 0),
>> 278                                 bset_block_offset(b, i), i->keys);
>> 279             goto out;
>> 280     }
>>    0x0000000000006877 <+711>:   pop    %rbx
>>    0x0000000000006878 <+712>:   pop    %r12
>>    0x000000000000687a <+714>:   pop    %r13
>>    0x000000000000687c <+716>:   pop    %r14
>>    0x000000000000687e <+718>:   pop    %r15
>>    0x0000000000006880 <+720>:   pop    %rbp
>>    0x0000000000006881 <+721>:   retq
>>    0x0000000000006882 <+722>:   movzwl 0x430(%rsi),%eax
>>    0x0000000000006889 <+729>:   shl    $0x9,%eax
>>    0x000000000000688c <+732>:   imul   %eax,%ecx
>>    0x000000000000688f <+735>:   movslq %ecx,%rbx
>>
>>
>> On 8/13/2014 1:45 PM, Slava Pestov wrote:
>>> Can you post the disassembly of the function?
>>>
>>> On Wed, Aug 13, 2014 at 11:35 AM, Larkin Lowrey
>>> <llowrey@nuclearwinter.com> wrote:
>>>> Thanks. Trying gdb helped me find the answer. I needed to install the
>>>> kernel-debuginfo-3.15.8-200.fc20.x86_64  package via yum.
>>>>
>>>> From addr2line:
>>>>> bch_btree_node_read_done+0x4c
>>>>> drivers/md/bcache/btree.c:207
>>>> Here's a snippet from gdb:
>>>>
>>>>> (gdb) list *(bch_btree_node_read_done+0x4c)
>>>>> 0x65fc is in bch_btree_node_read_done (drivers/md/bcache/btree.c:207).
>>>>> 202             struct bset *i = btree_bset_first(b);
>>>>> 203             struct btree_iter *iter;
>>>>> 204
>>>>> 205             iter = mempool_alloc(b->c->fill_iter, GFP_NOWAIT);
>>>>> 206             iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
>>>>> 207             iter->used = 0;
>>>>> 208
>>>>> 209     #ifdef CONFIG_BCACHE_DEBUG
>>>>> 210             iter->b = &b->keys;
>>>>> 211     #endif
>>>> This doesn't make any sense to me. If iter was null I would expect line
>>>> 206 to blow up first.
>>>>
>>>> --Larkin
>>>>
>>>> On 8/13/2014 12:41 PM, Slava Pestov wrote:
>>>>> You can try to use gdb:
>>>>>
>>>>> gdb /lib/modules/.../foo.ko
>>>>>
>>>>> list *(bch_btree_node_read_done+0x4c)
>>>>>
>>>>>
>>>>> On Wed, Aug 13, 2014 at 9:40 AM, Larkin Lowrey
>>>>> <llowrey@nuclearwinter.com> wrote:
>>>>>> This is making me feel very dumb. I've googled extensively but can't
>>>>>> figure out how to run addr2line for a module.
>>>>>>
>>>>>> I'm running Fedora 20 and the kernel did not have debugging symbols. I
>>>>>> downloaded the version with symbols but I don't know if the addresses
>>>>>> are going to be the same. Bcache is a module for me and that's where
>>>>>> things get tricky. Do you have any tips?
>>>>>>
>>>>>> --Larkin
>>>>>>
>>>>>> On 8/13/2014 12:04 AM, Kent Overstreet wrote:
>>>>>>> Any chance you could do an addr2line and get me the exact line where
>>>>>>> it happened?
>>>>>>>
>>>>>>> On Aug 12, 2014 10:02 PM, "Larkin Lowrey" <llowrey@nuclearwinter.com
>>>>>>> <mailto:llowrey@nuclearwinter.com>> wrote:
>>>>>>>
>>>>>>>     I got an oops while doing some heavy I/O. I have an md raid10 cache
>>>>>>>     device (4 SSDs) and 3 md raid5/6 backing devices. This setup has been
>>>>>>>     well behaved for about 6 months.
>>>>>>>
>>>>>>>     If this isn't a known issue is there anything I can do to provide more
>>>>>>>     useful information?
>>>>>>>
>>>>>>>     I'm running kernel 3.15.8-200.fc20.x86_64.
>>>>>>>
>>>>>>>     [210884.047249] BUG: unable to handle kernel NULL pointer
>>>>>>>     dereference at 0000000000000008
>>>>>>>     [210884.055605] IP: [<ffffffffa01625fc>]
>>>>>>>     bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>>     [210884.063723] PGD 0
>>>>>>>     [210884.066053] Oops: 0002 [#1] SMP
>>>>>>>     [210884.069610] Modules linked in: lp parport binfmt_misc
>>>>>>>     ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM
>>>>>>>     iptable_mangle tun bridge stp llc xt_multiport ebtable_nat
>>>>>>>     ebtables hwmon_vid ip6t_REJECT nf_conntrack_ipv6 nf_conntrack_ipv4
>>>>>>>     nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter xt_conntrack
>>>>>>>     ip6_tables nf_conntrack keyspan ezusb kvm_amd kvm crct10dif_pclmul
>>>>>>>     crc32_pclmul crc32c_intel ghash_clmulni_intel microcode serio_raw
>>>>>>>     amd64_edac_mod edac_core fam15h_power k10temp edac_mce_amd
>>>>>>>     sp5100_tco i2c_piix4 igb ptp pps_core dca shpchp acpi_cpufreq
>>>>>>>     btrfs bcache raid456 async_raid6_recov async_memcpy async_pq
>>>>>>>     async_xor async_tx xor raid6_pq raid10 i2c_algo_bit drm_kms_helper
>>>>>>>     ttm drm i2c_core mpt2sas mvsas libsas raid_class
>>>>>>>     scsi_transport_sas cpufreq_stats
>>>>>>>     [210884.140704] CPU: 5 PID: 11188 Comm: kworker/5:1 Not tainted
>>>>>>>     3.15.8-200.fc20.x86_64 #1
>>>>>>>     [210884.149069] Hardware name:  /H8DG6/H8DGi, BIOS 3.0a       07/2
>>>>>>>     [210884.155280] Workqueue: bcache cache_lookup [bcache]
>>>>>>>     [210884.160531] task: ffff880218633160 ti: ffff8800217b8000
>>>>>>>     task.ti: ffff8800217b8000
>>>>>>>     [210884.168502] RIP: 0010:[<ffffffffa01625fc>]
>>>>>>>      [<ffffffffa01625fc>] bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>>     [210884.179105] RSP: 0000:ffff8800217bbbe8  EFLAGS: 00010212
>>>>>>>     [210884.184806] RAX: 0000000000000400 RBX: ffff880245ec0000 RCX:
>>>>>>>     0000000000000000
>>>>>>>     [210884.192480] RDX: 0000000000000000 RSI: ffff880418380000 RDI:
>>>>>>>     0000000000000246
>>>>>>>     [210884.200075] RBP: ffff8800217bbc10 R08: 0000000000000000 R09:
>>>>>>>     0000000000000f6b
>>>>>>>     [210884.207738] R10: 0000000000000000 R11: 0000000000000400 R12:
>>>>>>>     ffff880413d06c00
>>>>>>>     [210884.215391] R13: 0000000000000000 R14: ffff8800217bbc20 R15:
>>>>>>>     ffff880413d06c00
>>>>>>>     [210884.222961] FS:  00007f73bacd6880(0000)
>>>>>>>     GS:ffff88021fd40000(0000) knlGS:0000000000000000
>>>>>>>     [210884.231516] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>>>>     [210884.237557] CR2: 0000000000000008 CR3: 0000000001c11000 CR4:
>>>>>>>     00000000000407e0
>>>>>>>     [210884.245131] Stack:
>>>>>>>     [210884.247395]  ffff880274f4d020 ffff880413d06c00
>>>>>>>     0000bfcc44a463f8 ffff8800217bbc20
>>>>>>>     [210884.255337]  ffff880413d06c00 ffff8800217bbc78
>>>>>>>     ffffffffa0162b68 0000000000000000
>>>>>>>     [210884.263256]  ffff880218633160 0000000000000000
>>>>>>>     0000000000000000 0000000000000000
>>>>>>>     [210884.271234] Call Trace:
>>>>>>>     [210884.273985]  [<ffffffffa0162b68>]
>>>>>>>     bch_btree_node_read+0x168/0x190 [bcache]
>>>>>>>     [210884.281258]  [<ffffffffa0163f69>]
>>>>>>>     bch_btree_node_get+0x169/0x290 [bcache]
>>>>>>>     [210884.288377]  [<ffffffffa01642f5>]
>>>>>>>     bch_btree_map_keys_recurse+0xd5/0x1d0 [bcache]
>>>>>>>     [210884.296311]  [<ffffffffa016dcb0>] ?
>>>>>>>     cached_dev_congested+0x180/0x180 [bcache]
>>>>>>>     [210884.303953]  [<ffffffff8135b204>] ?
>>>>>>>     call_rwsem_down_read_failed+0x14/0x30
>>>>>>>     [210884.311158]  [<ffffffffa01673f7>]
>>>>>>>     bch_btree_map_keys+0x127/0x150 [bcache]
>>>>>>>     [210884.318273]  [<ffffffffa016dcb0>] ?
>>>>>>>     cached_dev_congested+0x180/0x180 [bcache]
>>>>>>>     [210884.325826]  [<ffffffffa016e7f5>] cache_lookup+0xf5/0x1f0 [bcache]
>>>>>>>     [210884.332325]  [<ffffffff810a4af6>] process_one_work+0x176/0x430
>>>>>>>     [210884.338427]  [<ffffffff810a578b>] worker_thread+0x11b/0x3a0
>>>>>>>     [210884.344282]  [<ffffffff810a5670>] ? rescuer_thread+0x3b0/0x3b0
>>>>>>>     [210884.350447]  [<ffffffff810ac528>] kthread+0xd8/0xf0
>>>>>>>     [210884.355615]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
>>>>>>>     [210884.362017]  [<ffffffff816ff93c>] ret_from_fork+0x7c/0xb0
>>>>>>>     [210884.367756]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
>>>>>>>     [210884.374234] Code: 08 01 00 00 48 8b b8 58 cb 00 00 e8 bf 25 01
>>>>>>>     e1 49 8b b4 24 80 00 00 00 49 89 c5 31 d2 0f b7 86 32 04 00 00 66
>>>>>>>     f7 b6 30 04 00 00 <49> c7 45 08 00 00 00 00 0f b7 c0 49 89 45 00
>>>>>>>     48 8b 43 10 48 85
>>>>>>>     [210884.395405] RIP  [<ffffffffa01625fc>]
>>>>>>>     bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>>     [210884.403389]  RSP <ffff8800217bbbe8>
>>>>>>>     [210884.407171] CR2: 0000000000000008
>>>>>>>     [210884.411233] ---[ end trace 0064e6abfd068c85 ]---
>>>>>>>     [210884.416352] BUG: unable to handle kernel paging request at
>>>>>>>     ffffffffffffffd8
>>>>>>>     [210884.423871] IP: [<ffffffff810acb10>] kthread_data+0x10/0x20
>>>>>>>     [210884.429915] PGD 1c14067 PUD 1c16067 PMD 0
>>>>>>>
>>>>>>>     --Larkin
>>>>>>>
>>>>>>>     --
>>>>>>>     To unsubscribe from this list: send the line "unsubscribe
>>>>>>>     linux-bcache" in
>>>>>>>     the body of a message to majordomo@vger.kernel.org
>>>>>>>     <mailto:majordomo@vger.kernel.org>
>>>>>>>     More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>

* Re: Null pointer oops
  2014-08-13 21:25             ` Slava Pestov
  2014-08-13 21:30               ` Slava Pestov
@ 2014-08-13 21:32               ` Larkin Lowrey
  2014-08-13 21:37                 ` Slava Pestov
  1 sibling, 1 reply; 13+ messages in thread
From: Larkin Lowrey @ 2014-08-13 21:32 UTC (permalink / raw)
  To: Slava Pestov; +Cc: Kent Overstreet, linux-bcache

My swap is an LVM LV on top of a raid10-backed bcache device. I have had
a few oopses in recent months but have not been able to pin down the
cause, and I have begun to suspect that the swap may be involved. The SSDs
in that raid10 are junky OCZ Agility3s, which seem to have a reputation
for periodic freezes or long pauses. Could it be that the kernel wanted
to write to swap but couldn't because the SSDs were in a long pause,
and that caused mempool_alloc() to return NULL, which then blew up the world?
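
To make the suspected failure mode concrete, here is a minimal userspace
sketch of a pool whose non-blocking allocation returns NULL once its
reserve is used up. Everything in it (toy_pool, toy_pool_alloc_nowait,
toy_use_element, TOY_RESERVE) is a made-up name for illustration; it is
not the kernel mempool API and not bcache code, just the general pattern
of checking the result before touching it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mempool: a small fixed reserve of elements. */
#define TOY_RESERVE 2

struct toy_pool {
	int elems[TOY_RESERVE];
	bool in_use[TOY_RESERVE];
};

/*
 * Non-blocking allocation: hand out a free reserve element, or return
 * NULL right away if the reserve is exhausted. A blocking variant would
 * wait here for toy_pool_free() instead of returning NULL.
 */
static int *toy_pool_alloc_nowait(struct toy_pool *p)
{
	for (size_t i = 0; i < TOY_RESERVE; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			return &p->elems[i];
		}
	}
	return NULL;
}

/* Return an element to the reserve. */
static void toy_pool_free(struct toy_pool *p, int *elem)
{
	assert(elem >= p->elems && elem < p->elems + TOY_RESERVE);
	p->in_use[elem - p->elems] = false;
}

/* Caller that checks the result instead of dereferencing it blindly. */
static int toy_use_element(struct toy_pool *p)
{
	int *e = toy_pool_alloc_nowait(p);

	if (!e)
		return -1;	/* report failure instead of writing through NULL */
	*e = 0;
	toy_pool_free(p, e);
	return 0;
}
```

With both reserve elements handed out, toy_use_element() returns -1
rather than crashing; after one element is freed it returns 0 again.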

Is there any reason not to put swap on top of a bcache device?

--Larkin

On 8/13/2014 4:25 PM, Slava Pestov wrote:
> Indeed it looks like iter is NULL. I see the bug is still present in
> the latest dev branch. The problem is that we're not checking the
> return value of mempool_alloc(), which may be NULL if we pass
> GFP_NOWAIT.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Null pointer oops
  2014-08-13 21:30               ` Slava Pestov
@ 2014-08-13 21:34                 ` Jianjian Huo
  2014-08-13 22:14                 ` Larkin Lowrey
  2014-08-16  5:48                 ` Peter Kieser
  2 siblings, 0 replies; 13+ messages in thread
From: Jianjian Huo @ 2014-08-13 21:34 UTC (permalink / raw)
  To: Slava Pestov; +Cc: Larkin Lowrey, Kent Overstreet, linux-bcache

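Yes, it's GFP_NOIO in 3.16.
And line 207 can execute before line 206 because the instructions were
reordered: in the disassembly, the store for line 207 (offset +76) comes
ahead of the store for line 206 (offset +87).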
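
To illustrate why the fault address is 0x8, here is a minimal sketch. It
assumes a stand-in struct whose two fields sit at the offsets seen in the
disassembly (0x0(%r13) for size, 0x8(%r13) for used). The names
iter_layout and fault_address_for_used are hypothetical and are not
bcache code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the first two fields of struct btree_iter,
 * laid out to match the two stores in the disassembly:
 *   mov  %rax,0x0(%r13)   -> iter->size  (line 206, offset +87)
 *   movq $0x0,0x8(%r13)   -> iter->used  (line 207, offset +76)
 */
struct iter_layout {
	uint64_t size;
	uint64_t used;
};

/* Address touched by a store to ->used when the base pointer is `base`. */
static uintptr_t fault_address_for_used(uintptr_t base)
{
	return base + offsetof(struct iter_layout, used);
}
```

With a base of 0 (a NULL iter), fault_address_for_used(0) gives 0x8,
which matches the CR2 value in the oops. Since the +76 store is the first
access through %r13, it faults before the line-206 store is reached.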

On Wed, Aug 13, 2014 at 2:30 PM, Slava Pestov <sp@datera.io> wrote:
> I was mistaken. The bug is fixed in the pull request Kent sent to Jens for 3.16:
>
> http://evilpiepirate.org/git/linux-bcache.git/commit/?h=bcache-dev&id=bcf090e0040e30f8409e6a535a01e6473afb096f
>
> On Wed, Aug 13, 2014 at 2:25 PM, Slava Pestov <sp@datera.io> wrote:
>> Indeed it looks like iter is NULL. I see the bug is still present in
>> the latest dev branch. The problem is that we're not checking the
>> return value of mempool_alloc(), which may be NULL if we pass
>> GFP_NOWAIT.
>>
>> On Wed, Aug 13, 2014 at 2:21 PM, Larkin Lowrey
>> <llowrey@nuclearwinter.com> wrote:
>>> Here's the disassembly of bch_btree_node_read_done. The offending line
>>> is 207 and the instruction is at offset 76.
>>>
>>> --Larkin
>>>
>>> 199     void bch_btree_node_read_done(struct btree *b)
>>> 200     {
>>>    0x00000000000065b0 <+0>:     callq  0x65b5 <bch_btree_node_read_done+5>
>>>    0x00000000000065b5 <+5>:     push   %rbp
>>>    0x00000000000065b8 <+8>:     mov    %rsp,%rbp
>>>    0x00000000000065bb <+11>:    push   %r15
>>>    0x00000000000065bd <+13>:    push   %r14
>>>    0x00000000000065bf <+15>:    push   %r13
>>>    0x00000000000065c1 <+17>:    push   %r12
>>>    0x00000000000065c3 <+19>:    mov    %rdi,%r12
>>>    0x00000000000065c6 <+22>:    push   %rbx
>>>
>>> 201             const char *err = "bad btree header";
>>>    0x0000000000006800 <+592>:   mov    $0x0,%rdx
>>>
>>> 202             struct bset *i = btree_bset_first(b);
>>> 203             struct btree_iter *iter;
>>> 204
>>> 205             iter = mempool_alloc(b->c->fill_iter, GFP_NOWAIT);
>>>    0x00000000000065b6 <+6>:     xor    %esi,%esi
>>>    0x00000000000065c7 <+23>:    mov    0x80(%rdi),%rax
>>>    0x00000000000065d5 <+37>:    mov    0xcb58(%rax),%rdi
>>>    0x00000000000065dc <+44>:    callq  0x65e1 <bch_btree_node_read_done+49>
>>>    0x00000000000065e9 <+57>:    mov    %rax,%r13
>>>
>>> 206             iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
>>>    0x00000000000065e1 <+49>:    mov    0x80(%r12),%rsi
>>>    0x00000000000065ec <+60>:    xor    %edx,%edx
>>>    0x00000000000065ee <+62>:    movzwl 0x432(%rsi),%eax
>>>    0x00000000000065f5 <+69>:    divw   0x430(%rsi)
>>>    0x0000000000006604 <+84>:    movzwl %ax,%eax
>>>    0x0000000000006607 <+87>:    mov    %rax,0x0(%r13)
>>>
>>> 207             iter->used = 0;
>>>    0x00000000000065fc <+76>:    movq   $0x0,0x8(%r13)
>>>
>>> 208
>>> 209     #ifdef CONFIG_BCACHE_DEBUG
>>> 210             iter->b = &b->keys;
>>> 211     #endif
>>> 212
>>> 213             if (!i->seq)
>>>    0x000000000000660b <+91>:    mov    0x10(%rbx),%rax
>>>    0x000000000000660f <+95>:    test   %rax,%rax
>>>    0x0000000000006612 <+98>:    je     0x6800 <bch_btree_node_read_done+592>
>>>
>>> 214                     goto err;
>>> 215
>>> 216             for (;
>>>    0x000000000000664d <+157>:   cmp    %r9d,%ecx
>>>    0x0000000000006650 <+160>:   jae    0x6882 <bch_btree_node_read_done+722>
>>>    0x0000000000006744 <+404>:   cmp    %r9d,%r10d
>>>    0x0000000000006747 <+407>:   jae    0x6898 <bch_btree_node_read_done+744>
>>>
>>> 217                  b->written < btree_blocks(b) && i->seq ==
>>> b->keys.set[0].data->seq;
>>>    0x0000000000006618 <+104>:   mov    0x80(%r12),%rsi
>>>    0x0000000000006625 <+117>:   movzwl 0xc0(%r12),%edi
>>>    0x000000000000662e <+126>:   mov    0x108(%r12),%r8
>>>    0x0000000000006636 <+134>:   movzwl 0xde2(%rsi),%ecx
>>>    0x0000000000006644 <+148>:   mov    %rdx,%r9
>>>    0x0000000000006647 <+151>:   shr    %cl,%r9
>>>    0x000000000000664a <+154>:   movzwl %di,%ecx
>>>    0x0000000000006656 <+166>:   cmp    0x10(%r8),%rax
>>>    0x000000000000665a <+170>:   jne    0x6882 <bch_btree_node_read_done+722>
>>>    0x000000000000670f <+351>:   mov    %rdx,%r9
>>>    0x000000000000672a <+378>:   movzwl 0xde2(%rsi),%ecx
>>>    0x0000000000006738 <+392>:   shr    %cl,%r9
>>>    0x000000000000674d <+413>:   mov    0x10(%r8),%rcx
>>>    0x0000000000006751 <+417>:   cmp    %rcx,0x10(%rbx)
>>>    0x0000000000006755 <+421>:   jne    0x6898 <bch_btree_node_read_done+744>
>>>    0x0000000000006892 <+738>:   add    %r8,%rbx
>>>    0x0000000000006895 <+741>:   nopl   (%rax)
>>>
>>> 218                  i = write_block(b)) {
>>> 219                     err = "unsupported bset version";
>>>    0x00000000000069c0 <+1040>:  mov    $0x0,%rdx
>>>    0x00000000000069c7 <+1047>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>    0x00000000000069cc <+1052>:  nopl   0x0(%rax)
>>>
>>> 220                     if (i->version > BCACHE_BSET_VERSION)
>>>    0x0000000000006660 <+176>:   mov    0x18(%rbx),%r10d
>>>    0x0000000000006664 <+180>:   cmp    $0x1,%r10d
>>>    0x0000000000006668 <+184>:   ja     0x69c0
>>> <bch_btree_node_read_done+1040>
>>>    0x000000000000666e <+190>:   movzwl 0x430(%rsi),%r11d
>>>    0x0000000000006676 <+198>:   jmpq   0x6769 <bch_btree_node_read_done+441>
>>>    0x000000000000667b <+203>:   nopl   0x0(%rax,%rax,1)
>>>    0x000000000000675b <+427>:   mov    0x18(%rbx),%r10d
>>>    0x000000000000675f <+431>:   cmp    $0x1,%r10d
>>>    0x0000000000006763 <+435>:   ja     0x69c0
>>> <bch_btree_node_read_done+1040>
>>>
>>> 221                             goto err;
>>> 222
>>> 223                     err = "bad btree header";
>>> 224                     if (b->written + set_blocks(i, block_bytes(b->c)) >
>>>    0x0000000000006769 <+441>:   mov    0x1c(%rbx),%eax
>>>    0x000000000000676c <+444>:   mov    %r11,%rcx
>>>    0x000000000000676f <+447>:   xor    %edx,%edx
>>>    0x0000000000006771 <+449>:   shl    $0x9,%rcx
>>>    0x0000000000006775 <+453>:   movzwl %di,%edi
>>>    0x0000000000006778 <+456>:   mov    %r9d,%r9d
>>>    0x000000000000677b <+459>:   and    $0x1fffe00,%ecx
>>>    0x0000000000006781 <+465>:   lea    0x20(,%rax,8),%r8
>>>    0x0000000000006789 <+473>:   lea    -0x1(%r8,%rcx,1),%rax
>>>    0x000000000000678e <+478>:   div    %rcx
>>>    0x0000000000006791 <+481>:   add    %rdi,%rax
>>>    0x0000000000006794 <+484>:   cmp    %r9,%rax
>>>    0x0000000000006797 <+487>:   ja     0x6800 <bch_btree_node_read_done+592>
>>>
>>> 225                         btree_blocks(b))
>>> 226                             goto err;
>>> 227
>>> 228                     err = "bad magic";
>>>    0x00000000000069d0 <+1056>:  mov    $0x0,%rdx
>>>    0x00000000000069d7 <+1063>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>    0x00000000000069dc <+1068>:  nopl   0x0(%rax)
>>>
>>> 229                     if (i->magic != bset_magic(&b->c->sb))
>>>    0x00000000000067aa <+506>:   cmp    %rax,0x8(%rbx)
>>>    0x00000000000067ae <+510>:   jne    0x69d0
>>> <bch_btree_node_read_done+1056>
>>>
>>> 230                             goto err;
>>> 231
>>> 232                     err = "bad checksum";
>>>    0x00000000000067df <+559>:   mov    $0x0,%rdx
>>>    0x00000000000067e6 <+566>:   jmp    0x6807 <bch_btree_node_read_done+599>
>>>    0x00000000000067e8 <+568>:   nopl   0x0(%rax,%rax,1)
>>>    0x00000000000067f0 <+576>:   mov    0x1c(%rbx),%eax
>>>    0x00000000000067f3 <+579>:   jmpq   0x66bf <bch_btree_node_read_done+271>
>>>    0x00000000000067f8 <+584>:   nopl   0x0(%rax,%rax,1)
>>>
>>> 233                     switch (i->version) {
>>>    0x00000000000067b4 <+516>:   cmp    $0x1,%r10d
>>>    0x00000000000067bb <+523>:   je     0x6680 <bch_btree_node_read_done+208>
>>>
>>> 234                     case 0:
>>> 235                             if (i->csum != csum_set(i))
>>>    0x00000000000067c1 <+529>:   lea    0x20(%rbx),%r14
>>>    0x00000000000067c5 <+533>:   lea    0x8(%rbx),%rdi
>>>    0x00000000000067ce <+542>:   sub    %rdi,%rsi
>>>    0x00000000000067d1 <+545>:   callq  0x67d6 <bch_btree_node_read_done+550>
>>>    0x00000000000067d6 <+550>:   cmp    %rax,%r15
>>>    0x00000000000067d9 <+553>:   je     0x66a6 <bch_btree_node_read_done+246>
>>> 236                                     goto err;
>>> 237                             break;
>>> 238                     case BCACHE_BSET_VERSION:
>>> 239                             if (i->csum != btree_csum_set(b, i))
>>>    0x000000000000669d <+237>:   cmp    %rax,%r15
>>>    0x00000000000066a0 <+240>:   jne    0x67df <bch_btree_node_read_done+559>
>>>    0x00000000000067b8 <+520>:   mov    (%rbx),%r15
>>>
>>> 240                                     goto err;
>>> 241                             break;
>>> 242                     }
>>> 243
>>> 244                     err = "empty set";
>>>    0x00000000000069e0 <+1072>:  mov    $0x0,%rdx
>>>    0x00000000000069e7 <+1079>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>
>>> 245                     if (i != b->keys.set[0].data && !i->keys)
>>>    0x00000000000066a6 <+246>:   cmp    %rbx,0x108(%r12)
>>>    0x00000000000066ae <+254>:   je     0x67f0 <bch_btree_node_read_done+576>
>>>    0x00000000000066b4 <+260>:   mov    0x1c(%rbx),%eax
>>>    0x00000000000066b7 <+263>:   test   %eax,%eax
>>>    0x00000000000066b9 <+265>:   je     0x69e0
>>> <bch_btree_node_read_done+1072>
>>>
>>> 246                             goto err;
>>> 247
>>> 248                     bch_btree_iter_push(iter, i->start,
>>> bset_bkey_last(i));
>>>    0x00000000000066c3 <+275>:   mov    %r14,%rsi
>>>    0x00000000000066c6 <+278>:   mov    %r13,%rdi
>>>    0x00000000000066c9 <+281>:   callq  0x66ce <bch_btree_node_read_done+286>
>>>
>>> 249
>>> 250                     b->written += set_blocks(i, block_bytes(b->c));
>>>    0x00000000000066ce <+286>:   mov    0x80(%r12),%rsi
>>>    0x00000000000066d6 <+294>:   mov    0x1c(%rbx),%eax
>>>    0x00000000000066d9 <+297>:   xor    %edx,%edx
>>>    0x00000000000066e3 <+307>:   movzwl 0x430(%rsi),%ecx
>>>    0x00000000000066ea <+314>:   shl    $0x9,%ecx
>>>    0x00000000000066ed <+317>:   movslq %ecx,%rcx
>>>    0x00000000000066f0 <+320>:   lea    0x1f(%rcx,%rax,8),%rax
>>>    0x00000000000066f5 <+325>:   div    %rcx
>>>    0x0000000000006704 <+340>:   mov    %eax,%edi
>>>    0x0000000000006706 <+342>:   add    0xc0(%r12),%di
>>>    0x0000000000006712 <+354>:   mov    %di,0xc0(%r12)
>>>
>>> 251             }
>>> 252
>>> 253             err = "corrupted btree";
>>>    0x00000000000069b0 <+1024>:  mov    $0x0,%rdx
>>>    0x00000000000069b7 <+1031>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>    0x00000000000069bc <+1036>:  nopl   0x0(%rax)
>>>
>>> 254             for (i = write_block(b);
>>>    0x00000000000068a1 <+753>:   cmp    %rdx,%rcx
>>>    0x00000000000068a4 <+756>:   jae    0x68e5 <bch_btree_node_read_done+821>
>>>    0x00000000000068e0 <+816>:   cmp    %rdx,%rcx
>>>    0x00000000000068e3 <+819>:   jb     0x68c8 <bch_btree_node_read_done+792>
>>>
>>> 255                  bset_sector_offset(&b->keys, i) < KEY_SIZE(&b->key);
>>> 256                  i = ((void *) i) + block_bytes(b->c))
>>>    0x00000000000068d7 <+807>:   mov    %rcx,%rbx
>>>    0x00000000000068da <+810>:   sub    %r8d,%ecx
>>>
>>> 257                     if (i->seq == b->keys.set[0].data->seq)
>>>    0x00000000000068a6 <+758>:   mov    0x10(%r8),%rdi
>>>    0x00000000000068aa <+762>:   cmp    %rdi,0x10(%rbx)
>>>    0x00000000000068ae <+766>:   je     0x69b0
>>> <bch_btree_node_read_done+1024>
>>>    0x00000000000068b4 <+772>:   cltq
>>>    0x00000000000068b6 <+774>:   mov    %rax,%r9
>>>    0x00000000000068b9 <+777>:   lea    (%rbx,%rax,1),%rcx
>>>    0x00000000000068bd <+781>:   neg    %r9
>>>    0x00000000000068c0 <+784>:   jmp    0x68d7 <bch_btree_node_read_done+807>
>>>    0x00000000000068c2 <+786>:   nopw   0x0(%rax,%rax,1)
>>>    0x00000000000068c8 <+792>:   lea    (%rbx,%rax,1),%rcx
>>>    0x00000000000068cc <+796>:   cmp    0x10(%rcx,%r9,1),%rdi
>>>    0x00000000000068d1 <+801>:   je     0x69b0
>>> <bch_btree_node_read_done+1024>
>>>
>>> 258                             goto err;
>>> 259
>>> 260             bch_btree_sort_and_fix_extents(&b->keys, iter, &b->c->sort);
>>>    0x00000000000068e5 <+821>:   lea    0xc8(%r12),%r14
>>>    0x00000000000068ed <+829>:   lea    0xcb60(%rsi),%rdx
>>>    0x00000000000068f4 <+836>:   mov    %r13,%rsi
>>>    0x00000000000068f7 <+839>:   mov    %r14,%rdi
>>>    0x00000000000068fa <+842>:   callq  0x68ff <bch_btree_node_read_done+847>
>>>
>>> 261
>>> 262             i = b->keys.set[0].data;
>>>    0x0000000000006907 <+855>:   mov    0x108(%r12),%rbx
>>>
>>> 263             err = "short btree key";
>>>    0x00000000000069ec <+1084>:  mov    $0x0,%rdx
>>>    0x00000000000069f3 <+1091>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>
>>> 264             if (b->keys.set[0].size &&
>>>    0x00000000000068ff <+847>:   mov    0xe0(%r12),%eax
>>>    0x0000000000006914 <+868>:   test   %eax,%eax
>>>    0x0000000000006916 <+870>:   je     0x694d <bch_btree_node_read_done+925>
>>>    0x0000000000006944 <+916>:   test   %rax,%rax
>>>    0x0000000000006947 <+919>:   js     0x69ec
>>> <bch_btree_node_read_done+1084>
>>>
>>> 265                 bkey_cmp(&b->key, &b->keys.set[0].end) < 0)
>>> 266                     goto err;
>>> 267
>>> 268             if (b->written < btree_blocks(b))
>>>    0x000000000000694d <+925>:   mov    0x80(%r12),%rax
>>>    0x0000000000006955 <+933>:   movzwl 0xc0(%r12),%esi
>>>    0x0000000000006965 <+949>:   movzwl 0xde2(%rax),%ecx
>>>    0x000000000000696c <+956>:   shr    %cl,%rdx
>>>    0x000000000000696f <+959>:   cmp    %edx,%esi
>>>    0x0000000000006971 <+961>:   jae    0x6868 <bch_btree_node_read_done+696>
>>>
>>> 269                     bch_bset_init_next(&b->keys, write_block(b),
>>>    0x000000000000698f <+991>:   mov    %r14,%rdi
>>>    0x000000000000699e <+1006>:  callq  0x69a3
>>> <bch_btree_node_read_done+1011>
>>>    0x00000000000069a3 <+1011>:  mov    0x80(%r12),%rax
>>>    0x00000000000069ab <+1019>:  jmpq   0x6868 <bch_btree_node_read_done+696>
>>>
>>> 270                                        bset_magic(&b->c->sb));
>>> 271     out:
>>> 272             mempool_free(iter, b->c->fill_iter);
>>>    0x0000000000006868 <+696>:   mov    0xcb58(%rax),%rsi
>>>    0x000000000000686f <+703>:   mov    %r13,%rdi
>>>    0x0000000000006872 <+706>:   callq  0x6877 <bch_btree_node_read_done+711>
>>>
>>> 273             return;
>>> 274     err:
>>> 275             set_btree_node_io_error(b);
>>> 276             bch_cache_set_error(b->c, "%s at bucket %zu, block %u,
>>> %u keys",
>>>    0x0000000000006829 <+633>:   mov    0x1c(%rbx),%r9d
>>>    0x000000000000684a <+666>:   mov    %esi,%ecx
>>>    0x000000000000684c <+668>:   mov    $0x0,%rsi
>>>    0x0000000000006853 <+675>:   shr    %cl,%r8d
>>>    0x0000000000006856 <+678>:   mov    %rax,%rcx
>>>    0x0000000000006859 <+681>:   xor    %eax,%eax
>>>    0x000000000000685b <+683>:   callq  0x6860 <bch_btree_node_read_done+688>
>>>    0x0000000000006860 <+688>:   mov    0x80(%r12),%rax
>>>
>>> 277                                 err, PTR_BUCKET_NR(b->c, &b->key, 0),
>>> 278                                 bset_block_offset(b, i), i->keys);
>>> 279             goto out;
>>> 280     }
>>>    0x0000000000006877 <+711>:   pop    %rbx
>>>    0x0000000000006878 <+712>:   pop    %r12
>>>    0x000000000000687a <+714>:   pop    %r13
>>>    0x000000000000687c <+716>:   pop    %r14
>>>    0x000000000000687e <+718>:   pop    %r15
>>>    0x0000000000006880 <+720>:   pop    %rbp
>>>    0x0000000000006881 <+721>:   retq
>>>    0x0000000000006882 <+722>:   movzwl 0x430(%rsi),%eax
>>>    0x0000000000006889 <+729>:   shl    $0x9,%eax
>>>    0x000000000000688c <+732>:   imul   %eax,%ecx
>>>    0x000000000000688f <+735>:   movslq %ecx,%rbx
>>>
>>>
>>> On 8/13/2014 1:45 PM, Slava Pestov wrote:
>>>> Can you post the disassembly of the function?
>>>>
>>>> On Wed, Aug 13, 2014 at 11:35 AM, Larkin Lowrey
>>>> <llowrey@nuclearwinter.com> wrote:
>>>>> Thanks. Trying gdb helped me find the answer. I needed to install the
>>>>> kernel-debuginfo-3.15.8-200.fc20.x86_64  package via yum.
>>>>>
>>>>> From addr2line:
>>>>>> bch_btree_node_read_done+0x4c
>>>>>> drivers/md/bcache/btree.c:207
>>>>> Here's a snippet from gdb:
>>>>>
>>>>>> (gdb) list *(bch_btree_node_read_done+0x4c)
>>>>>> 0x65fc is in bch_btree_node_read_done (drivers/md/bcache/btree.c:207).
>>>>>> 202             struct bset *i = btree_bset_first(b);
>>>>>> 203             struct btree_iter *iter;
>>>>>> 204
>>>>>> 205             iter = mempool_alloc(b->c->fill_iter, GFP_NOWAIT);
>>>>>> 206             iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
>>>>>> 207             iter->used = 0;
>>>>>> 208
>>>>>> 209     #ifdef CONFIG_BCACHE_DEBUG
>>>>>> 210             iter->b = &b->keys;
>>>>>> 211     #endif
>>>>> This doesn't make any sense to me. If iter was null I would expect line
>>>>> 206 to blow up first.
>>>>>
>>>>> --Larkin
>>>>>
>>>>> On 8/13/2014 12:41 PM, Slava Pestov wrote:
>>>>>> You can try to use gdb:
>>>>>>
>>>>>> gdb /lib/modules/.../foo.ko
>>>>>>
>>>>>> list *(bch_btree_node_read_done+0x4c)
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 13, 2014 at 9:40 AM, Larkin Lowrey
>>>>>> <llowrey@nuclearwinter.com> wrote:
>>>>>>> This is making me feel very dumb. I've googled extensively but can't
>>>>>>> figure out how to run addr2line for a module.
>>>>>>>
>>>>>>> I'm running Fedora 20 and the kernel did not have debugging symbols. I
>>>>>>> downloaded the version with symbols but I don't know if the addresses
>>>>>>> are going to be the same. Bcache is a module for me and that's where
>>>>>>> things get tricky. Do you have any tips?
>>>>>>>
>>>>>>> --Larkin
>>>>>>>
>>>>>>> On 8/13/2014 12:04 AM, Kent Overstreet wrote:
>>>>>>>> Any chance you could do an addr2line and get me the exact line where
>>>>>>>> it happened?
>>>>>>>>
>>>>>>>> On Aug 12, 2014 10:02 PM, "Larkin Lowrey" <llowrey@nuclearwinter.com
>>>>>>>> <mailto:llowrey@nuclearwinter.com>> wrote:
>>>>>>>>
>>>>>>>>     I got an oops while doing some heavy I/O. I have an md raid10 cache
>>>>>>>>     device (4 SSDs) and 3 md raid5/6 backing devices. This setup has been
>>>>>>>>     well behaved for about 6 months.
>>>>>>>>
>>>>>>>>     If this isn't a known issue is there anything I can do to provide more
>>>>>>>>     useful information?
>>>>>>>>
>>>>>>>>     I'm running kernel 3.15.8-200.fc20.x86_64.
>>>>>>>>
>>>>>>>>     [210884.047249] BUG: unable to handle kernel NULL pointer
>>>>>>>>     dereference at 0000000000000008
>>>>>>>>     [210884.055605] IP: [<ffffffffa01625fc>]
>>>>>>>>     bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>>>     [210884.063723] PGD 0
>>>>>>>>     [210884.066053] Oops: 0002 [#1] SMP
>>>>>>>>     [210884.069610] Modules linked in: lp parport binfmt_misc
>>>>>>>>     ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM
>>>>>>>>     iptable_mangle tun bridge stp llc xt_multiport ebtable_nat
>>>>>>>>     ebtables hwmon_vid ip6t_REJECT nf_conntrack_ipv6 nf_conntrack_ipv4
>>>>>>>>     nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter xt_conntrack
>>>>>>>>     ip6_tables nf_conntrack keyspan ezusb kvm_amd kvm crct10dif_pclmul
>>>>>>>>     crc32_pclmul crc32c_intel ghash_clmulni_intel microcode serio_raw
>>>>>>>>     amd64_edac_mod edac_core fam15h_power k10temp edac_mce_amd
>>>>>>>>     sp5100_tco i2c_piix4 igb ptp pps_core dca shpchp acpi_cpufreq
>>>>>>>>     btrfs bcache raid456 async_raid6_recov async_memcpy async_pq
>>>>>>>>     async_xor async_tx xor raid6_pq raid10 i2c_algo_bit drm_kms_helper
>>>>>>>>     ttm drm i2c_core mpt2sas mvsas libsas raid_class
>>>>>>>>     scsi_transport_sas cpufreq_stats
>>>>>>>>     [210884.140704] CPU: 5 PID: 11188 Comm: kworker/5:1 Not tainted
>>>>>>>>     3.15.8-200.fc20.x86_64 #1
>>>>>>>>     [210884.149069] Hardware name:  /H8DG6/H8DGi, BIOS 3.0a       07/2
>>>>>>>>     [210884.155280] Workqueue: bcache cache_lookup [bcache]
>>>>>>>>     [210884.160531] task: ffff880218633160 ti: ffff8800217b8000
>>>>>>>>     task.ti: ffff8800217b8000
>>>>>>>>     [210884.168502] RIP: 0010:[<ffffffffa01625fc>]
>>>>>>>>      [<ffffffffa01625fc>] bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>>>     [210884.179105] RSP: 0000:ffff8800217bbbe8  EFLAGS: 00010212
>>>>>>>>     [210884.184806] RAX: 0000000000000400 RBX: ffff880245ec0000 RCX:
>>>>>>>>     0000000000000000
>>>>>>>>     [210884.192480] RDX: 0000000000000000 RSI: ffff880418380000 RDI:
>>>>>>>>     0000000000000246
>>>>>>>>     [210884.200075] RBP: ffff8800217bbc10 R08: 0000000000000000 R09:
>>>>>>>>     0000000000000f6b
>>>>>>>>     [210884.207738] R10: 0000000000000000 R11: 0000000000000400 R12:
>>>>>>>>     ffff880413d06c00
>>>>>>>>     [210884.215391] R13: 0000000000000000 R14: ffff8800217bbc20 R15:
>>>>>>>>     ffff880413d06c00
>>>>>>>>     [210884.222961] FS:  00007f73bacd6880(0000)
>>>>>>>>     GS:ffff88021fd40000(0000) knlGS:0000000000000000
>>>>>>>>     [210884.231516] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>>>>>     [210884.237557] CR2: 0000000000000008 CR3: 0000000001c11000 CR4:
>>>>>>>>     00000000000407e0
>>>>>>>>     [210884.245131] Stack:
>>>>>>>>     [210884.247395]  ffff880274f4d020 ffff880413d06c00
>>>>>>>>     0000bfcc44a463f8 ffff8800217bbc20
>>>>>>>>     [210884.255337]  ffff880413d06c00 ffff8800217bbc78
>>>>>>>>     ffffffffa0162b68 0000000000000000
>>>>>>>>     [210884.263256]  ffff880218633160 0000000000000000
>>>>>>>>     0000000000000000 0000000000000000
>>>>>>>>     [210884.271234] Call Trace:
>>>>>>>>     [210884.273985]  [<ffffffffa0162b68>]
>>>>>>>>     bch_btree_node_read+0x168/0x190 [bcache]
>>>>>>>>     [210884.281258]  [<ffffffffa0163f69>]
>>>>>>>>     bch_btree_node_get+0x169/0x290 [bcache]
>>>>>>>>     [210884.288377]  [<ffffffffa01642f5>]
>>>>>>>>     bch_btree_map_keys_recurse+0xd5/0x1d0 [bcache]
>>>>>>>>     [210884.296311]  [<ffffffffa016dcb0>] ?
>>>>>>>>     cached_dev_congested+0x180/0x180 [bcache]
>>>>>>>>     [210884.303953]  [<ffffffff8135b204>] ?
>>>>>>>>     call_rwsem_down_read_failed+0x14/0x30
>>>>>>>>     [210884.311158]  [<ffffffffa01673f7>]
>>>>>>>>     bch_btree_map_keys+0x127/0x150 [bcache]
>>>>>>>>     [210884.318273]  [<ffffffffa016dcb0>] ?
>>>>>>>>     cached_dev_congested+0x180/0x180 [bcache]
>>>>>>>>     [210884.325826]  [<ffffffffa016e7f5>] cache_lookup+0xf5/0x1f0 [bcache]
>>>>>>>>     [210884.332325]  [<ffffffff810a4af6>] process_one_work+0x176/0x430
>>>>>>>>     [210884.338427]  [<ffffffff810a578b>] worker_thread+0x11b/0x3a0
>>>>>>>>     [210884.344282]  [<ffffffff810a5670>] ? rescuer_thread+0x3b0/0x3b0
>>>>>>>>     [210884.350447]  [<ffffffff810ac528>] kthread+0xd8/0xf0
>>>>>>>>     [210884.355615]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
>>>>>>>>     [210884.362017]  [<ffffffff816ff93c>] ret_from_fork+0x7c/0xb0
>>>>>>>>     [210884.367756]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
>>>>>>>>     [210884.374234] Code: 08 01 00 00 48 8b b8 58 cb 00 00 e8 bf 25 01
>>>>>>>>     e1 49 8b b4 24 80 00 00 00 49 89 c5 31 d2 0f b7 86 32 04 00 00 66
>>>>>>>>     f7 b6 30 04 00 00 <49> c7 45 08 00 00 00 00 0f b7 c0 49 89 45 00
>>>>>>>>     48 8b 43 10 48 85
>>>>>>>>     [210884.395405] RIP  [<ffffffffa01625fc>]
>>>>>>>>     bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>>>     [210884.403389]  RSP <ffff8800217bbbe8>
>>>>>>>>     [210884.407171] CR2: 0000000000000008
>>>>>>>>     [210884.411233] ---[ end trace 0064e6abfd068c85 ]---
>>>>>>>>     [210884.416352] BUG: unable to handle kernel paging request at
>>>>>>>>     ffffffffffffffd8
>>>>>>>>     [210884.423871] IP: [<ffffffff810acb10>] kthread_data+0x10/0x20
>>>>>>>>     [210884.429915] PGD 1c14067 PUD 1c16067 PMD 0
>>>>>>>>
>>>>>>>>     --Larkin
>>>>>>>>
>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Null pointer oops
  2014-08-13 21:32               ` Larkin Lowrey
@ 2014-08-13 21:37                 ` Slava Pestov
  0 siblings, 0 replies; 13+ messages in thread
From: Slava Pestov @ 2014-08-13 21:37 UTC (permalink / raw)
  To: Larkin Lowrey; +Cc: Kent Overstreet, linux-bcache

Hi Larkin,

A failing mempool_alloc() indicates memory pressure. The SSDs are not at
fault here.
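
As a rough illustration of the failure mode, here is a user-space sketch
of a non-blocking pool allocation whose result is checked before use.
Everything in it (toy_pool, toy_pool_alloc_nowait, toy_read_done) is
hypothetical and is not the kernel mempool API. Per this thread, 3.16
allocates with GFP_NOIO instead; the sketch only shows the alternative of
checking for NULL.

```c
#include <assert.h>
#include <stddef.h>

struct toy_iter {
	size_t size;
	size_t used;
};

/* Hypothetical two-slot pool standing in for a preallocated reserve. */
struct toy_pool {
	struct toy_iter slots[2];
	int in_use[2];
};

/* Non-blocking allocation: returns NULL when every slot is taken. */
static struct toy_iter *toy_pool_alloc_nowait(struct toy_pool *p)
{
	for (size_t i = 0; i < 2; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = 1;
			return &p->slots[i];
		}
	}
	return NULL;
}

static void toy_pool_free(struct toy_pool *p, struct toy_iter *it)
{
	p->in_use[it - p->slots] = 0;
}

/* Returns 0 on success, -1 if no iterator could be allocated. */
static int toy_read_done(struct toy_pool *p, size_t bucket_size,
			 size_t block_size)
{
	struct toy_iter *iter = toy_pool_alloc_nowait(p);

	if (!iter)
		return -1;	/* bail out instead of dereferencing NULL */

	iter->size = bucket_size / block_size;
	iter->used = 0;

	toy_pool_free(p, iter);
	return 0;
}
```

With a free slot, toy_read_done() returns 0. Once both slots are marked
in use it returns -1 rather than writing through a NULL pointer.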

On Wed, Aug 13, 2014 at 2:32 PM, Larkin Lowrey
<llowrey@nuclearwinter.com> wrote:
> My swap is an LVM LV on top of a raid10 backed bcache device. I have had
> a few oopses in recent months but have not been able to pin down the
> cause. I have begun to suspect that the swap may be involved. The SSDs
> in that raid10 are junky OCZ Agility3s. They seem to have a reputation
> for periodic freezes or long pauses. Could it be that the kernel wanted
> to write to the swap but couldn't because the SSDs were in a long pause
> and that caused mempool_alloc to return null which then blew up the world?
>
> Is there any reason not to put swap on top of a bcache device?
>
> --Larkin
>
> On 8/13/2014 4:25 PM, Slava Pestov wrote:
>> Indeed it looks like iter is NULL. I see the bug is still present in
>> the latest dev branch. The problem is that we're not checking the
>> return value of mempool_alloc(), which may be NULL if we pass
>> GFP_NOWAIT.
>>
>> On Wed, Aug 13, 2014 at 2:21 PM, Larkin Lowrey
>> <llowrey@nuclearwinter.com> wrote:
>>> Here's the disassembly of bch_btree_node_read_done. The offending line
>>> is 207 and the instruction is at offset 76.
>>>
>>> --Larkin
>>>
>>> 199     void bch_btree_node_read_done(struct btree *b)
>>> 200     {
>>>    0x00000000000065b0 <+0>:     callq  0x65b5 <bch_btree_node_read_done+5>
>>>    0x00000000000065b5 <+5>:     push   %rbp
>>>    0x00000000000065b8 <+8>:     mov    %rsp,%rbp
>>>    0x00000000000065bb <+11>:    push   %r15
>>>    0x00000000000065bd <+13>:    push   %r14
>>>    0x00000000000065bf <+15>:    push   %r13
>>>    0x00000000000065c1 <+17>:    push   %r12
>>>    0x00000000000065c3 <+19>:    mov    %rdi,%r12
>>>    0x00000000000065c6 <+22>:    push   %rbx
>>>
>>> 201             const char *err = "bad btree header";
>>>    0x0000000000006800 <+592>:   mov    $0x0,%rdx
>>>
>>> 202             struct bset *i = btree_bset_first(b);
>>> 203             struct btree_iter *iter;
>>> 204
>>> 205             iter = mempool_alloc(b->c->fill_iter, GFP_NOWAIT);
>>>    0x00000000000065b6 <+6>:     xor    %esi,%esi
>>>    0x00000000000065c7 <+23>:    mov    0x80(%rdi),%rax
>>>    0x00000000000065d5 <+37>:    mov    0xcb58(%rax),%rdi
>>>    0x00000000000065dc <+44>:    callq  0x65e1 <bch_btree_node_read_done+49>
>>>    0x00000000000065e9 <+57>:    mov    %rax,%r13
>>>
>>> 206             iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
>>>    0x00000000000065e1 <+49>:    mov    0x80(%r12),%rsi
>>>    0x00000000000065ec <+60>:    xor    %edx,%edx
>>>    0x00000000000065ee <+62>:    movzwl 0x432(%rsi),%eax
>>>    0x00000000000065f5 <+69>:    divw   0x430(%rsi)
>>>    0x0000000000006604 <+84>:    movzwl %ax,%eax
>>>    0x0000000000006607 <+87>:    mov    %rax,0x0(%r13)
>>>
>>> 207             iter->used = 0;
>>>    0x00000000000065fc <+76>:    movq   $0x0,0x8(%r13)
>>>
>>> 208
>>> 209     #ifdef CONFIG_BCACHE_DEBUG
>>> 210             iter->b = &b->keys;
>>> 211     #endif
>>> 212
>>> 213             if (!i->seq)
>>>    0x000000000000660b <+91>:    mov    0x10(%rbx),%rax
>>>    0x000000000000660f <+95>:    test   %rax,%rax
>>>    0x0000000000006612 <+98>:    je     0x6800 <bch_btree_node_read_done+592>
>>>
>>> 214                     goto err;
>>> 215
>>> 216             for (;
>>>    0x000000000000664d <+157>:   cmp    %r9d,%ecx
>>>    0x0000000000006650 <+160>:   jae    0x6882 <bch_btree_node_read_done+722>
>>>    0x0000000000006744 <+404>:   cmp    %r9d,%r10d
>>>    0x0000000000006747 <+407>:   jae    0x6898 <bch_btree_node_read_done+744>
>>>
>>> 217                  b->written < btree_blocks(b) && i->seq ==
>>> b->keys.set[0].data->seq;
>>>    0x0000000000006618 <+104>:   mov    0x80(%r12),%rsi
>>>    0x0000000000006625 <+117>:   movzwl 0xc0(%r12),%edi
>>>    0x000000000000662e <+126>:   mov    0x108(%r12),%r8
>>>    0x0000000000006636 <+134>:   movzwl 0xde2(%rsi),%ecx
>>>    0x0000000000006644 <+148>:   mov    %rdx,%r9
>>>    0x0000000000006647 <+151>:   shr    %cl,%r9
>>>    0x000000000000664a <+154>:   movzwl %di,%ecx
>>>    0x0000000000006656 <+166>:   cmp    0x10(%r8),%rax
>>>    0x000000000000665a <+170>:   jne    0x6882 <bch_btree_node_read_done+722>
>>>    0x000000000000670f <+351>:   mov    %rdx,%r9
>>>    0x000000000000672a <+378>:   movzwl 0xde2(%rsi),%ecx
>>>    0x0000000000006738 <+392>:   shr    %cl,%r9
>>>    0x000000000000674d <+413>:   mov    0x10(%r8),%rcx
>>>    0x0000000000006751 <+417>:   cmp    %rcx,0x10(%rbx)
>>>    0x0000000000006755 <+421>:   jne    0x6898 <bch_btree_node_read_done+744>
>>>    0x0000000000006892 <+738>:   add    %r8,%rbx
>>>    0x0000000000006895 <+741>:   nopl   (%rax)
>>>
>>> 218                  i = write_block(b)) {
>>> 219                     err = "unsupported bset version";
>>>    0x00000000000069c0 <+1040>:  mov    $0x0,%rdx
>>>    0x00000000000069c7 <+1047>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>    0x00000000000069cc <+1052>:  nopl   0x0(%rax)
>>>
>>> 220                     if (i->version > BCACHE_BSET_VERSION)
>>>    0x0000000000006660 <+176>:   mov    0x18(%rbx),%r10d
>>>    0x0000000000006664 <+180>:   cmp    $0x1,%r10d
>>>    0x0000000000006668 <+184>:   ja     0x69c0
>>> <bch_btree_node_read_done+1040>
>>>    0x000000000000666e <+190>:   movzwl 0x430(%rsi),%r11d
>>>    0x0000000000006676 <+198>:   jmpq   0x6769 <bch_btree_node_read_done+441>
>>>    0x000000000000667b <+203>:   nopl   0x0(%rax,%rax,1)
>>>    0x000000000000675b <+427>:   mov    0x18(%rbx),%r10d
>>>    0x000000000000675f <+431>:   cmp    $0x1,%r10d
>>>    0x0000000000006763 <+435>:   ja     0x69c0
>>> <bch_btree_node_read_done+1040>
>>>
>>> 221                             goto err;
>>> 222
>>> 223                     err = "bad btree header";
>>> 224                     if (b->written + set_blocks(i, block_bytes(b->c)) >
>>>    0x0000000000006769 <+441>:   mov    0x1c(%rbx),%eax
>>>    0x000000000000676c <+444>:   mov    %r11,%rcx
>>>    0x000000000000676f <+447>:   xor    %edx,%edx
>>>    0x0000000000006771 <+449>:   shl    $0x9,%rcx
>>>    0x0000000000006775 <+453>:   movzwl %di,%edi
>>>    0x0000000000006778 <+456>:   mov    %r9d,%r9d
>>>    0x000000000000677b <+459>:   and    $0x1fffe00,%ecx
>>>    0x0000000000006781 <+465>:   lea    0x20(,%rax,8),%r8
>>>    0x0000000000006789 <+473>:   lea    -0x1(%r8,%rcx,1),%rax
>>>    0x000000000000678e <+478>:   div    %rcx
>>>    0x0000000000006791 <+481>:   add    %rdi,%rax
>>>    0x0000000000006794 <+484>:   cmp    %r9,%rax
>>>    0x0000000000006797 <+487>:   ja     0x6800 <bch_btree_node_read_done+592>
>>>
>>> 225                         btree_blocks(b))
>>> 226                             goto err;
>>> 227
>>> 228                     err = "bad magic";
>>>    0x00000000000069d0 <+1056>:  mov    $0x0,%rdx
>>>    0x00000000000069d7 <+1063>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>    0x00000000000069dc <+1068>:  nopl   0x0(%rax)
>>>
>>> 229                     if (i->magic != bset_magic(&b->c->sb))
>>>    0x00000000000067aa <+506>:   cmp    %rax,0x8(%rbx)
>>>    0x00000000000067ae <+510>:   jne    0x69d0 <bch_btree_node_read_done+1056>
>>>
>>> 230                             goto err;
>>> 231
>>> 232                     err = "bad checksum";
>>>    0x00000000000067df <+559>:   mov    $0x0,%rdx
>>>    0x00000000000067e6 <+566>:   jmp    0x6807 <bch_btree_node_read_done+599>
>>>    0x00000000000067e8 <+568>:   nopl   0x0(%rax,%rax,1)
>>>    0x00000000000067f0 <+576>:   mov    0x1c(%rbx),%eax
>>>    0x00000000000067f3 <+579>:   jmpq   0x66bf <bch_btree_node_read_done+271>
>>>    0x00000000000067f8 <+584>:   nopl   0x0(%rax,%rax,1)
>>>
>>> 233                     switch (i->version) {
>>>    0x00000000000067b4 <+516>:   cmp    $0x1,%r10d
>>>    0x00000000000067bb <+523>:   je     0x6680 <bch_btree_node_read_done+208>
>>>
>>> 234                     case 0:
>>> 235                             if (i->csum != csum_set(i))
>>>    0x00000000000067c1 <+529>:   lea    0x20(%rbx),%r14
>>>    0x00000000000067c5 <+533>:   lea    0x8(%rbx),%rdi
>>>    0x00000000000067ce <+542>:   sub    %rdi,%rsi
>>>    0x00000000000067d1 <+545>:   callq  0x67d6 <bch_btree_node_read_done+550>
>>>    0x00000000000067d6 <+550>:   cmp    %rax,%r15
>>>    0x00000000000067d9 <+553>:   je     0x66a6 <bch_btree_node_read_done+246>
>>> 236                                     goto err;
>>> 237                             break;
>>> 238                     case BCACHE_BSET_VERSION:
>>> 239                             if (i->csum != btree_csum_set(b, i))
>>>    0x000000000000669d <+237>:   cmp    %rax,%r15
>>>    0x00000000000066a0 <+240>:   jne    0x67df <bch_btree_node_read_done+559>
>>>    0x00000000000067b8 <+520>:   mov    (%rbx),%r15
>>>
>>> 240                                     goto err;
>>> 241                             break;
>>> 242                     }
>>> 243
>>> 244                     err = "empty set";
>>>    0x00000000000069e0 <+1072>:  mov    $0x0,%rdx
>>>    0x00000000000069e7 <+1079>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>
>>> 245                     if (i != b->keys.set[0].data && !i->keys)
>>>    0x00000000000066a6 <+246>:   cmp    %rbx,0x108(%r12)
>>>    0x00000000000066ae <+254>:   je     0x67f0 <bch_btree_node_read_done+576>
>>>    0x00000000000066b4 <+260>:   mov    0x1c(%rbx),%eax
>>>    0x00000000000066b7 <+263>:   test   %eax,%eax
>>>    0x00000000000066b9 <+265>:   je     0x69e0 <bch_btree_node_read_done+1072>
>>>
>>> 246                             goto err;
>>> 247
>>> 248                     bch_btree_iter_push(iter, i->start, bset_bkey_last(i));
>>>    0x00000000000066c3 <+275>:   mov    %r14,%rsi
>>>    0x00000000000066c6 <+278>:   mov    %r13,%rdi
>>>    0x00000000000066c9 <+281>:   callq  0x66ce <bch_btree_node_read_done+286>
>>>
>>> 249
>>> 250                     b->written += set_blocks(i, block_bytes(b->c));
>>>    0x00000000000066ce <+286>:   mov    0x80(%r12),%rsi
>>>    0x00000000000066d6 <+294>:   mov    0x1c(%rbx),%eax
>>>    0x00000000000066d9 <+297>:   xor    %edx,%edx
>>>    0x00000000000066e3 <+307>:   movzwl 0x430(%rsi),%ecx
>>>    0x00000000000066ea <+314>:   shl    $0x9,%ecx
>>>    0x00000000000066ed <+317>:   movslq %ecx,%rcx
>>>    0x00000000000066f0 <+320>:   lea    0x1f(%rcx,%rax,8),%rax
>>>    0x00000000000066f5 <+325>:   div    %rcx
>>>    0x0000000000006704 <+340>:   mov    %eax,%edi
>>>    0x0000000000006706 <+342>:   add    0xc0(%r12),%di
>>>    0x0000000000006712 <+354>:   mov    %di,0xc0(%r12)
>>>
>>> 251             }
>>> 252
>>> 253             err = "corrupted btree";
>>>    0x00000000000069b0 <+1024>:  mov    $0x0,%rdx
>>>    0x00000000000069b7 <+1031>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>    0x00000000000069bc <+1036>:  nopl   0x0(%rax)
>>>
>>> 254             for (i = write_block(b);
>>>    0x00000000000068a1 <+753>:   cmp    %rdx,%rcx
>>>    0x00000000000068a4 <+756>:   jae    0x68e5 <bch_btree_node_read_done+821>
>>>    0x00000000000068e0 <+816>:   cmp    %rdx,%rcx
>>>    0x00000000000068e3 <+819>:   jb     0x68c8 <bch_btree_node_read_done+792>
>>>
>>> 255                  bset_sector_offset(&b->keys, i) < KEY_SIZE(&b->key);
>>> 256                  i = ((void *) i) + block_bytes(b->c))
>>>    0x00000000000068d7 <+807>:   mov    %rcx,%rbx
>>>    0x00000000000068da <+810>:   sub    %r8d,%ecx
>>>
>>> 257                     if (i->seq == b->keys.set[0].data->seq)
>>>    0x00000000000068a6 <+758>:   mov    0x10(%r8),%rdi
>>>    0x00000000000068aa <+762>:   cmp    %rdi,0x10(%rbx)
>>>    0x00000000000068ae <+766>:   je     0x69b0 <bch_btree_node_read_done+1024>
>>>    0x00000000000068b4 <+772>:   cltq
>>>    0x00000000000068b6 <+774>:   mov    %rax,%r9
>>>    0x00000000000068b9 <+777>:   lea    (%rbx,%rax,1),%rcx
>>>    0x00000000000068bd <+781>:   neg    %r9
>>>    0x00000000000068c0 <+784>:   jmp    0x68d7 <bch_btree_node_read_done+807>
>>>    0x00000000000068c2 <+786>:   nopw   0x0(%rax,%rax,1)
>>>    0x00000000000068c8 <+792>:   lea    (%rbx,%rax,1),%rcx
>>>    0x00000000000068cc <+796>:   cmp    0x10(%rcx,%r9,1),%rdi
>>>    0x00000000000068d1 <+801>:   je     0x69b0 <bch_btree_node_read_done+1024>
>>>
>>> 258                             goto err;
>>> 259
>>> 260             bch_btree_sort_and_fix_extents(&b->keys, iter, &b->c->sort);
>>>    0x00000000000068e5 <+821>:   lea    0xc8(%r12),%r14
>>>    0x00000000000068ed <+829>:   lea    0xcb60(%rsi),%rdx
>>>    0x00000000000068f4 <+836>:   mov    %r13,%rsi
>>>    0x00000000000068f7 <+839>:   mov    %r14,%rdi
>>>    0x00000000000068fa <+842>:   callq  0x68ff <bch_btree_node_read_done+847>
>>>
>>> 261
>>> 262             i = b->keys.set[0].data;
>>>    0x0000000000006907 <+855>:   mov    0x108(%r12),%rbx
>>>
>>> 263             err = "short btree key";
>>>    0x00000000000069ec <+1084>:  mov    $0x0,%rdx
>>>    0x00000000000069f3 <+1091>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>
>>> 264             if (b->keys.set[0].size &&
>>>    0x00000000000068ff <+847>:   mov    0xe0(%r12),%eax
>>>    0x0000000000006914 <+868>:   test   %eax,%eax
>>>    0x0000000000006916 <+870>:   je     0x694d <bch_btree_node_read_done+925>
>>>    0x0000000000006944 <+916>:   test   %rax,%rax
>>>    0x0000000000006947 <+919>:   js     0x69ec <bch_btree_node_read_done+1084>
>>>
>>> 265                 bkey_cmp(&b->key, &b->keys.set[0].end) < 0)
>>> 266                     goto err;
>>> 267
>>> 268             if (b->written < btree_blocks(b))
>>>    0x000000000000694d <+925>:   mov    0x80(%r12),%rax
>>>    0x0000000000006955 <+933>:   movzwl 0xc0(%r12),%esi
>>>    0x0000000000006965 <+949>:   movzwl 0xde2(%rax),%ecx
>>>    0x000000000000696c <+956>:   shr    %cl,%rdx
>>>    0x000000000000696f <+959>:   cmp    %edx,%esi
>>>    0x0000000000006971 <+961>:   jae    0x6868 <bch_btree_node_read_done+696>
>>>
>>> 269                     bch_bset_init_next(&b->keys, write_block(b),
>>>    0x000000000000698f <+991>:   mov    %r14,%rdi
>>>    0x000000000000699e <+1006>:  callq  0x69a3 <bch_btree_node_read_done+1011>
>>>    0x00000000000069a3 <+1011>:  mov    0x80(%r12),%rax
>>>    0x00000000000069ab <+1019>:  jmpq   0x6868 <bch_btree_node_read_done+696>
>>>
>>> 270                                        bset_magic(&b->c->sb));
>>> 271     out:
>>> 272             mempool_free(iter, b->c->fill_iter);
>>>    0x0000000000006868 <+696>:   mov    0xcb58(%rax),%rsi
>>>    0x000000000000686f <+703>:   mov    %r13,%rdi
>>>    0x0000000000006872 <+706>:   callq  0x6877 <bch_btree_node_read_done+711>
>>>
>>> 273             return;
>>> 274     err:
>>> 275             set_btree_node_io_error(b);
>>> 276             bch_cache_set_error(b->c, "%s at bucket %zu, block %u, %u keys",
>>>    0x0000000000006829 <+633>:   mov    0x1c(%rbx),%r9d
>>>    0x000000000000684a <+666>:   mov    %esi,%ecx
>>>    0x000000000000684c <+668>:   mov    $0x0,%rsi
>>>    0x0000000000006853 <+675>:   shr    %cl,%r8d
>>>    0x0000000000006856 <+678>:   mov    %rax,%rcx
>>>    0x0000000000006859 <+681>:   xor    %eax,%eax
>>>    0x000000000000685b <+683>:   callq  0x6860 <bch_btree_node_read_done+688>
>>>    0x0000000000006860 <+688>:   mov    0x80(%r12),%rax
>>>
>>> 277                                 err, PTR_BUCKET_NR(b->c, &b->key, 0),
>>> 278                                 bset_block_offset(b, i), i->keys);
>>> 279             goto out;
>>> 280     }
>>>    0x0000000000006877 <+711>:   pop    %rbx
>>>    0x0000000000006878 <+712>:   pop    %r12
>>>    0x000000000000687a <+714>:   pop    %r13
>>>    0x000000000000687c <+716>:   pop    %r14
>>>    0x000000000000687e <+718>:   pop    %r15
>>>    0x0000000000006880 <+720>:   pop    %rbp
>>>    0x0000000000006881 <+721>:   retq
>>>    0x0000000000006882 <+722>:   movzwl 0x430(%rsi),%eax
>>>    0x0000000000006889 <+729>:   shl    $0x9,%eax
>>>    0x000000000000688c <+732>:   imul   %eax,%ecx
>>>    0x000000000000688f <+735>:   movslq %ecx,%rbx
>>>
>>>
>>> On 8/13/2014 1:45 PM, Slava Pestov wrote:
>>>> Can you post the disassembly of the function?
>>>>
>>>> On Wed, Aug 13, 2014 at 11:35 AM, Larkin Lowrey
>>>> <llowrey@nuclearwinter.com> wrote:
>>>>> Thanks. Trying gdb helped me find the answer. I needed to install the
>>>>> kernel-debuginfo-3.15.8-200.fc20.x86_64  package via yum.
>>>>>
>>>>> From addr2line:
>>>>>> bch_btree_node_read_done+0x4c
>>>>>> drivers/md/bcache/btree.c:207
>>>>> Here's a snippet from gdb:
>>>>>
>>>>>> (gdb) list *(bch_btree_node_read_done+0x4c)
>>>>>> 0x65fc is in bch_btree_node_read_done (drivers/md/bcache/btree.c:207).
>>>>>> 202             struct bset *i = btree_bset_first(b);
>>>>>> 203             struct btree_iter *iter;
>>>>>> 204
>>>>>> 205             iter = mempool_alloc(b->c->fill_iter, GFP_NOWAIT);
>>>>>> 206             iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
>>>>>> 207             iter->used = 0;
>>>>>> 208
>>>>>> 209     #ifdef CONFIG_BCACHE_DEBUG
>>>>>> 210             iter->b = &b->keys;
>>>>>> 211     #endif
>>>>> This doesn't make any sense to me. If iter was null I would expect line
>>>>> 206 to blow up first.
>>>>>
>>>>> --Larkin
>>>>>
>>>>> On 8/13/2014 12:41 PM, Slava Pestov wrote:
>>>>>> You can try to use gdb:
>>>>>>
>>>>>> gdb /lib/modules/.../foo.ko
>>>>>>
>>>>>> list *(bch_btree_node_read_done+0x4c)
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 13, 2014 at 9:40 AM, Larkin Lowrey
>>>>>> <llowrey@nuclearwinter.com> wrote:
>>>>>>> This is making me feel very dumb. I've googled extensively but can't
>>>>>>> figure out how to run addr2line for a module.
>>>>>>>
>>>>>>> I'm running Fedora 20 and the kernel did not have debugging symbols. I
>>>>>>> downloaded the version with symbols but I don't know if the addresses
>>>>>>> are going to be the same. Bcache is a module for me and that's where
>>>>>>> things get tricky. Do you have any tips?
>>>>>>>
>>>>>>> --Larkin
>>>>>>>
>>>>>>> On 8/13/2014 12:04 AM, Kent Overstreet wrote:
>>>>>>>> Any chance you could do an addr2line and get me the exact line where
>>>>>>>> it happened?
>>>>>>>>
>>>>>>>> On Aug 12, 2014 10:02 PM, "Larkin Lowrey" <llowrey@nuclearwinter.com
>>>>>>>> <mailto:llowrey@nuclearwinter.com>> wrote:
>>>>>>>>
>>>>>>>>     I got an oops while doing some heavy I/O. I have an md raid10 cache
>>>>>>>>     device (4 SSDs) and 3 md raid5/6 backing devices. This setup has been
>>>>>>>>     well behaved for about 6 months.
>>>>>>>>
>>>>>>>>     If this isn't a known issue is there anything I can do to provide more
>>>>>>>>     useful information?
>>>>>>>>
>>>>>>>>     I'm running kernel 3.15.8-200.fc20.x86_64.
>>>>>>>>
>>>>>>>>     [210884.047249] BUG: unable to handle kernel NULL pointer
>>>>>>>>     dereference at 0000000000000008
>>>>>>>>     [210884.055605] IP: [<ffffffffa01625fc>]
>>>>>>>>     bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>>>     [210884.063723] PGD 0
>>>>>>>>     [210884.066053] Oops: 0002 [#1] SMP
>>>>>>>>     [210884.069610] Modules linked in: lp parport binfmt_misc
>>>>>>>>     ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM
>>>>>>>>     iptable_mangle tun bridge stp llc xt_multiport ebtable_nat
>>>>>>>>     ebtables hwmon_vid ip6t_REJECT nf_conntrack_ipv6 nf_conntrack_ipv4
>>>>>>>>     nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter xt_conntrack
>>>>>>>>     ip6_tables nf_conntrack keyspan ezusb kvm_amd kvm crct10dif_pclmul
>>>>>>>>     crc32_pclmul crc32c_intel ghash_clmulni_intel microcode serio_raw
>>>>>>>>     amd64_edac_mod edac_core fam15h_power k10temp edac_mce_amd
>>>>>>>>     sp5100_tco i2c_piix4 igb ptp pps_core dca shpchp acpi_cpufreq
>>>>>>>>     btrfs bcache raid456 async_raid6_recov async_memcpy async_pq
>>>>>>>>     async_xor async_tx xor raid6_pq raid10 i2c_algo_bit drm_kms_helper
>>>>>>>>     ttm drm i2c_core mpt2sas mvsas libsas raid_class
>>>>>>>>     scsi_transport_sas cpufreq_stats
>>>>>>>>     [210884.140704] CPU: 5 PID: 11188 Comm: kworker/5:1 Not tainted
>>>>>>>>     3.15.8-200.fc20.x86_64 #1
>>>>>>>>     [210884.149069] Hardware name:  /H8DG6/H8DGi, BIOS 3.0a       07/2
>>>>>>>>     [210884.155280] Workqueue: bcache cache_lookup [bcache]
>>>>>>>>     [210884.160531] task: ffff880218633160 ti: ffff8800217b8000
>>>>>>>>     task.ti: ffff8800217b8000
>>>>>>>>     [210884.168502] RIP: 0010:[<ffffffffa01625fc>]
>>>>>>>>      [<ffffffffa01625fc>] bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>>>     [210884.179105] RSP: 0000:ffff8800217bbbe8  EFLAGS: 00010212
>>>>>>>>     [210884.184806] RAX: 0000000000000400 RBX: ffff880245ec0000 RCX:
>>>>>>>>     0000000000000000
>>>>>>>>     [210884.192480] RDX: 0000000000000000 RSI: ffff880418380000 RDI:
>>>>>>>>     0000000000000246
>>>>>>>>     [210884.200075] RBP: ffff8800217bbc10 R08: 0000000000000000 R09:
>>>>>>>>     0000000000000f6b
>>>>>>>>     [210884.207738] R10: 0000000000000000 R11: 0000000000000400 R12:
>>>>>>>>     ffff880413d06c00
>>>>>>>>     [210884.215391] R13: 0000000000000000 R14: ffff8800217bbc20 R15:
>>>>>>>>     ffff880413d06c00
>>>>>>>>     [210884.222961] FS:  00007f73bacd6880(0000)
>>>>>>>>     GS:ffff88021fd40000(0000) knlGS:0000000000000000
>>>>>>>>     [210884.231516] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>>>>>     [210884.237557] CR2: 0000000000000008 CR3: 0000000001c11000 CR4:
>>>>>>>>     00000000000407e0
>>>>>>>>     [210884.245131] Stack:
>>>>>>>>     [210884.247395]  ffff880274f4d020 ffff880413d06c00
>>>>>>>>     0000bfcc44a463f8 ffff8800217bbc20
>>>>>>>>     [210884.255337]  ffff880413d06c00 ffff8800217bbc78
>>>>>>>>     ffffffffa0162b68 0000000000000000
>>>>>>>>     [210884.263256]  ffff880218633160 0000000000000000
>>>>>>>>     0000000000000000 0000000000000000
>>>>>>>>     [210884.271234] Call Trace:
>>>>>>>>     [210884.273985]  [<ffffffffa0162b68>]
>>>>>>>>     bch_btree_node_read+0x168/0x190 [bcache]
>>>>>>>>     [210884.281258]  [<ffffffffa0163f69>]
>>>>>>>>     bch_btree_node_get+0x169/0x290 [bcache]
>>>>>>>>     [210884.288377]  [<ffffffffa01642f5>]
>>>>>>>>     bch_btree_map_keys_recurse+0xd5/0x1d0 [bcache]
>>>>>>>>     [210884.296311]  [<ffffffffa016dcb0>] ?
>>>>>>>>     cached_dev_congested+0x180/0x180 [bcache]
>>>>>>>>     [210884.303953]  [<ffffffff8135b204>] ?
>>>>>>>>     call_rwsem_down_read_failed+0x14/0x30
>>>>>>>>     [210884.311158]  [<ffffffffa01673f7>]
>>>>>>>>     bch_btree_map_keys+0x127/0x150 [bcache]
>>>>>>>>     [210884.318273]  [<ffffffffa016dcb0>] ?
>>>>>>>>     cached_dev_congested+0x180/0x180 [bcache]
>>>>>>>>     [210884.325826]  [<ffffffffa016e7f5>] cache_lookup+0xf5/0x1f0 [bcache]
>>>>>>>>     [210884.332325]  [<ffffffff810a4af6>] process_one_work+0x176/0x430
>>>>>>>>     [210884.338427]  [<ffffffff810a578b>] worker_thread+0x11b/0x3a0
>>>>>>>>     [210884.344282]  [<ffffffff810a5670>] ? rescuer_thread+0x3b0/0x3b0
>>>>>>>>     [210884.350447]  [<ffffffff810ac528>] kthread+0xd8/0xf0
>>>>>>>>     [210884.355615]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
>>>>>>>>     [210884.362017]  [<ffffffff816ff93c>] ret_from_fork+0x7c/0xb0
>>>>>>>>     [210884.367756]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
>>>>>>>>     [210884.374234] Code: 08 01 00 00 48 8b b8 58 cb 00 00 e8 bf 25 01
>>>>>>>>     e1 49 8b b4 24 80 00 00 00 49 89 c5 31 d2 0f b7 86 32 04 00 00 66
>>>>>>>>     f7 b6 30 04 00 00 <49> c7 45 08 00 00 00 00 0f b7 c0 49 89 45 00
>>>>>>>>     48 8b 43 10 48 85
>>>>>>>>     [210884.395405] RIP  [<ffffffffa01625fc>]
>>>>>>>>     bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>>>     [210884.403389]  RSP <ffff8800217bbbe8>
>>>>>>>>     [210884.407171] CR2: 0000000000000008
>>>>>>>>     [210884.411233] ---[ end trace 0064e6abfd068c85 ]---
>>>>>>>>     [210884.416352] BUG: unable to handle kernel paging request at
>>>>>>>>     ffffffffffffffd8
>>>>>>>>     [210884.423871] IP: [<ffffffff810acb10>] kthread_data+0x10/0x20
>>>>>>>>     [210884.429915] PGD 1c14067 PUD 1c16067 PMD 0
>>>>>>>>
>>>>>>>>     --Larkin
>>>>>>>>
>>>>>>>>     --
>>>>>>>>     To unsubscribe from this list: send the line "unsubscribe
>>>>>>>>     linux-bcache" in
>>>>>>>>     the body of a message to majordomo@vger.kernel.org
>>>>>>>>     <mailto:majordomo@vger.kernel.org>
>>>>>>>>     More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>


* Re: Null pointer oops
  2014-08-13 21:30               ` Slava Pestov
  2014-08-13 21:34                 ` Jianjian Huo
@ 2014-08-13 22:14                 ` Larkin Lowrey
  2014-08-16  5:48                 ` Peter Kieser
  2 siblings, 0 replies; 13+ messages in thread
From: Larkin Lowrey @ 2014-08-13 22:14 UTC (permalink / raw)
  To: Slava Pestov; +Cc: Kent Overstreet, linux-bcache

Thanks for looking into this. It's good to know it has already been
addressed.

--Larkin

On 8/13/2014 4:30 PM, Slava Pestov wrote:
> I was mistaken. The bug is fixed in the pull request Kent sent to Jens for 3.16:
>
> http://evilpiepirate.org/git/linux-bcache.git/commit/?h=bcache-dev&id=bcf090e0040e30f8409e6a535a01e6473afb096f
>
> On Wed, Aug 13, 2014 at 2:25 PM, Slava Pestov <sp@datera.io> wrote:
>> Indeed it looks like iter is NULL. I see the bug is still present in
>> the latest dev branch. The problem is that we're not checking the
>> return value of mempool_alloc(), which may be NULL if we pass
>> GFP_NOWAIT.
>>
>> On Wed, Aug 13, 2014 at 2:21 PM, Larkin Lowrey
>> <llowrey@nuclearwinter.com> wrote:
>>> Here's the disassembly of bch_btree_node_read_done. The offending line
>>> is 207 and the instruction is at offset 76.
>>>
>>> --Larkin
>>>
>>> 199     void bch_btree_node_read_done(struct btree *b)
>>> 200     {
>>>    0x00000000000065b0 <+0>:     callq  0x65b5 <bch_btree_node_read_done+5>
>>>    0x00000000000065b5 <+5>:     push   %rbp
>>>    0x00000000000065b8 <+8>:     mov    %rsp,%rbp
>>>    0x00000000000065bb <+11>:    push   %r15
>>>    0x00000000000065bd <+13>:    push   %r14
>>>    0x00000000000065bf <+15>:    push   %r13
>>>    0x00000000000065c1 <+17>:    push   %r12
>>>    0x00000000000065c3 <+19>:    mov    %rdi,%r12
>>>    0x00000000000065c6 <+22>:    push   %rbx
>>>
>>> 201             const char *err = "bad btree header";
>>>    0x0000000000006800 <+592>:   mov    $0x0,%rdx
>>>
>>> 202             struct bset *i = btree_bset_first(b);
>>> 203             struct btree_iter *iter;
>>> 204
>>> 205             iter = mempool_alloc(b->c->fill_iter, GFP_NOWAIT);
>>>    0x00000000000065b6 <+6>:     xor    %esi,%esi
>>>    0x00000000000065c7 <+23>:    mov    0x80(%rdi),%rax
>>>    0x00000000000065d5 <+37>:    mov    0xcb58(%rax),%rdi
>>>    0x00000000000065dc <+44>:    callq  0x65e1 <bch_btree_node_read_done+49>
>>>    0x00000000000065e9 <+57>:    mov    %rax,%r13
>>>
>>> 206             iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
>>>    0x00000000000065e1 <+49>:    mov    0x80(%r12),%rsi
>>>    0x00000000000065ec <+60>:    xor    %edx,%edx
>>>    0x00000000000065ee <+62>:    movzwl 0x432(%rsi),%eax
>>>    0x00000000000065f5 <+69>:    divw   0x430(%rsi)
>>>    0x0000000000006604 <+84>:    movzwl %ax,%eax
>>>    0x0000000000006607 <+87>:    mov    %rax,0x0(%r13)
>>>
>>> 207             iter->used = 0;
>>>    0x00000000000065fc <+76>:    movq   $0x0,0x8(%r13)
>>>
>>> 208
>>> 209     #ifdef CONFIG_BCACHE_DEBUG
>>> 210             iter->b = &b->keys;
>>> 211     #endif
>>> 212
>>> 213             if (!i->seq)
>>>    0x000000000000660b <+91>:    mov    0x10(%rbx),%rax
>>>    0x000000000000660f <+95>:    test   %rax,%rax
>>>    0x0000000000006612 <+98>:    je     0x6800 <bch_btree_node_read_done+592>
>>>
>>> 214                     goto err;
>>> 215
>>> 216             for (;
>>>    0x000000000000664d <+157>:   cmp    %r9d,%ecx
>>>    0x0000000000006650 <+160>:   jae    0x6882 <bch_btree_node_read_done+722>
>>>    0x0000000000006744 <+404>:   cmp    %r9d,%r10d
>>>    0x0000000000006747 <+407>:   jae    0x6898 <bch_btree_node_read_done+744>
>>>
>>> 217                  b->written < btree_blocks(b) && i->seq == b->keys.set[0].data->seq;
>>>    0x0000000000006618 <+104>:   mov    0x80(%r12),%rsi
>>>    0x0000000000006625 <+117>:   movzwl 0xc0(%r12),%edi
>>>    0x000000000000662e <+126>:   mov    0x108(%r12),%r8
>>>    0x0000000000006636 <+134>:   movzwl 0xde2(%rsi),%ecx
>>>    0x0000000000006644 <+148>:   mov    %rdx,%r9
>>>    0x0000000000006647 <+151>:   shr    %cl,%r9
>>>    0x000000000000664a <+154>:   movzwl %di,%ecx
>>>    0x0000000000006656 <+166>:   cmp    0x10(%r8),%rax
>>>    0x000000000000665a <+170>:   jne    0x6882 <bch_btree_node_read_done+722>
>>>    0x000000000000670f <+351>:   mov    %rdx,%r9
>>>    0x000000000000672a <+378>:   movzwl 0xde2(%rsi),%ecx
>>>    0x0000000000006738 <+392>:   shr    %cl,%r9
>>>    0x000000000000674d <+413>:   mov    0x10(%r8),%rcx
>>>    0x0000000000006751 <+417>:   cmp    %rcx,0x10(%rbx)
>>>    0x0000000000006755 <+421>:   jne    0x6898 <bch_btree_node_read_done+744>
>>>    0x0000000000006892 <+738>:   add    %r8,%rbx
>>>    0x0000000000006895 <+741>:   nopl   (%rax)
>>>
>>> 218                  i = write_block(b)) {
>>> 219                     err = "unsupported bset version";
>>>    0x00000000000069c0 <+1040>:  mov    $0x0,%rdx
>>>    0x00000000000069c7 <+1047>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>    0x00000000000069cc <+1052>:  nopl   0x0(%rax)
>>>
>>> 220                     if (i->version > BCACHE_BSET_VERSION)
>>>    0x0000000000006660 <+176>:   mov    0x18(%rbx),%r10d
>>>    0x0000000000006664 <+180>:   cmp    $0x1,%r10d
>>>    0x0000000000006668 <+184>:   ja     0x69c0 <bch_btree_node_read_done+1040>
>>>    0x000000000000666e <+190>:   movzwl 0x430(%rsi),%r11d
>>>    0x0000000000006676 <+198>:   jmpq   0x6769 <bch_btree_node_read_done+441>
>>>    0x000000000000667b <+203>:   nopl   0x0(%rax,%rax,1)
>>>    0x000000000000675b <+427>:   mov    0x18(%rbx),%r10d
>>>    0x000000000000675f <+431>:   cmp    $0x1,%r10d
>>>    0x0000000000006763 <+435>:   ja     0x69c0 <bch_btree_node_read_done+1040>
>>>
>>> 221                             goto err;
>>> 222
>>> 223                     err = "bad btree header";
>>> 224                     if (b->written + set_blocks(i, block_bytes(b->c)) >
>>>    0x0000000000006769 <+441>:   mov    0x1c(%rbx),%eax
>>>    0x000000000000676c <+444>:   mov    %r11,%rcx
>>>    0x000000000000676f <+447>:   xor    %edx,%edx
>>>    0x0000000000006771 <+449>:   shl    $0x9,%rcx
>>>    0x0000000000006775 <+453>:   movzwl %di,%edi
>>>    0x0000000000006778 <+456>:   mov    %r9d,%r9d
>>>    0x000000000000677b <+459>:   and    $0x1fffe00,%ecx
>>>    0x0000000000006781 <+465>:   lea    0x20(,%rax,8),%r8
>>>    0x0000000000006789 <+473>:   lea    -0x1(%r8,%rcx,1),%rax
>>>    0x000000000000678e <+478>:   div    %rcx
>>>    0x0000000000006791 <+481>:   add    %rdi,%rax
>>>    0x0000000000006794 <+484>:   cmp    %r9,%rax
>>>    0x0000000000006797 <+487>:   ja     0x6800 <bch_btree_node_read_done+592>
>>>
>>> 225                         btree_blocks(b))
>>> 226                             goto err;
>>> 227
>>> 228                     err = "bad magic";
>>>    0x00000000000069d0 <+1056>:  mov    $0x0,%rdx
>>>    0x00000000000069d7 <+1063>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>    0x00000000000069dc <+1068>:  nopl   0x0(%rax)
>>>
>>> 229                     if (i->magic != bset_magic(&b->c->sb))
>>>    0x00000000000067aa <+506>:   cmp    %rax,0x8(%rbx)
>>>    0x00000000000067ae <+510>:   jne    0x69d0 <bch_btree_node_read_done+1056>
>>>
>>> 230                             goto err;
>>> 231
>>> 232                     err = "bad checksum";
>>>    0x00000000000067df <+559>:   mov    $0x0,%rdx
>>>    0x00000000000067e6 <+566>:   jmp    0x6807 <bch_btree_node_read_done+599>
>>>    0x00000000000067e8 <+568>:   nopl   0x0(%rax,%rax,1)
>>>    0x00000000000067f0 <+576>:   mov    0x1c(%rbx),%eax
>>>    0x00000000000067f3 <+579>:   jmpq   0x66bf <bch_btree_node_read_done+271>
>>>    0x00000000000067f8 <+584>:   nopl   0x0(%rax,%rax,1)
>>>
>>> 233                     switch (i->version) {
>>>    0x00000000000067b4 <+516>:   cmp    $0x1,%r10d
>>>    0x00000000000067bb <+523>:   je     0x6680 <bch_btree_node_read_done+208>
>>>
>>> 234                     case 0:
>>> 235                             if (i->csum != csum_set(i))
>>>    0x00000000000067c1 <+529>:   lea    0x20(%rbx),%r14
>>>    0x00000000000067c5 <+533>:   lea    0x8(%rbx),%rdi
>>>    0x00000000000067ce <+542>:   sub    %rdi,%rsi
>>>    0x00000000000067d1 <+545>:   callq  0x67d6 <bch_btree_node_read_done+550>
>>>    0x00000000000067d6 <+550>:   cmp    %rax,%r15
>>>    0x00000000000067d9 <+553>:   je     0x66a6 <bch_btree_node_read_done+246>
>>> 236                                     goto err;
>>> 237                             break;
>>> 238                     case BCACHE_BSET_VERSION:
>>> 239                             if (i->csum != btree_csum_set(b, i))
>>>    0x000000000000669d <+237>:   cmp    %rax,%r15
>>>    0x00000000000066a0 <+240>:   jne    0x67df <bch_btree_node_read_done+559>
>>>    0x00000000000067b8 <+520>:   mov    (%rbx),%r15
>>>
>>> 240                                     goto err;
>>> 241                             break;
>>> 242                     }
>>> 243
>>> 244                     err = "empty set";
>>>    0x00000000000069e0 <+1072>:  mov    $0x0,%rdx
>>>    0x00000000000069e7 <+1079>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>
>>> 245                     if (i != b->keys.set[0].data && !i->keys)
>>>    0x00000000000066a6 <+246>:   cmp    %rbx,0x108(%r12)
>>>    0x00000000000066ae <+254>:   je     0x67f0 <bch_btree_node_read_done+576>
>>>    0x00000000000066b4 <+260>:   mov    0x1c(%rbx),%eax
>>>    0x00000000000066b7 <+263>:   test   %eax,%eax
>>>    0x00000000000066b9 <+265>:   je     0x69e0 <bch_btree_node_read_done+1072>
>>>
>>> 246                             goto err;
>>> 247
>>> 248                     bch_btree_iter_push(iter, i->start, bset_bkey_last(i));
>>>    0x00000000000066c3 <+275>:   mov    %r14,%rsi
>>>    0x00000000000066c6 <+278>:   mov    %r13,%rdi
>>>    0x00000000000066c9 <+281>:   callq  0x66ce <bch_btree_node_read_done+286>
>>>
>>> 249
>>> 250                     b->written += set_blocks(i, block_bytes(b->c));
>>>    0x00000000000066ce <+286>:   mov    0x80(%r12),%rsi
>>>    0x00000000000066d6 <+294>:   mov    0x1c(%rbx),%eax
>>>    0x00000000000066d9 <+297>:   xor    %edx,%edx
>>>    0x00000000000066e3 <+307>:   movzwl 0x430(%rsi),%ecx
>>>    0x00000000000066ea <+314>:   shl    $0x9,%ecx
>>>    0x00000000000066ed <+317>:   movslq %ecx,%rcx
>>>    0x00000000000066f0 <+320>:   lea    0x1f(%rcx,%rax,8),%rax
>>>    0x00000000000066f5 <+325>:   div    %rcx
>>>    0x0000000000006704 <+340>:   mov    %eax,%edi
>>>    0x0000000000006706 <+342>:   add    0xc0(%r12),%di
>>>    0x0000000000006712 <+354>:   mov    %di,0xc0(%r12)
>>>
>>> 251             }
>>> 252
>>> 253             err = "corrupted btree";
>>>    0x00000000000069b0 <+1024>:  mov    $0x0,%rdx
>>>    0x00000000000069b7 <+1031>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>    0x00000000000069bc <+1036>:  nopl   0x0(%rax)
>>>
>>> 254             for (i = write_block(b);
>>>    0x00000000000068a1 <+753>:   cmp    %rdx,%rcx
>>>    0x00000000000068a4 <+756>:   jae    0x68e5 <bch_btree_node_read_done+821>
>>>    0x00000000000068e0 <+816>:   cmp    %rdx,%rcx
>>>    0x00000000000068e3 <+819>:   jb     0x68c8 <bch_btree_node_read_done+792>
>>>
>>> 255                  bset_sector_offset(&b->keys, i) < KEY_SIZE(&b->key);
>>> 256                  i = ((void *) i) + block_bytes(b->c))
>>>    0x00000000000068d7 <+807>:   mov    %rcx,%rbx
>>>    0x00000000000068da <+810>:   sub    %r8d,%ecx
>>>
>>> 257                     if (i->seq == b->keys.set[0].data->seq)
>>>    0x00000000000068a6 <+758>:   mov    0x10(%r8),%rdi
>>>    0x00000000000068aa <+762>:   cmp    %rdi,0x10(%rbx)
>>>    0x00000000000068ae <+766>:   je     0x69b0 <bch_btree_node_read_done+1024>
>>>    0x00000000000068b4 <+772>:   cltq
>>>    0x00000000000068b6 <+774>:   mov    %rax,%r9
>>>    0x00000000000068b9 <+777>:   lea    (%rbx,%rax,1),%rcx
>>>    0x00000000000068bd <+781>:   neg    %r9
>>>    0x00000000000068c0 <+784>:   jmp    0x68d7 <bch_btree_node_read_done+807>
>>>    0x00000000000068c2 <+786>:   nopw   0x0(%rax,%rax,1)
>>>    0x00000000000068c8 <+792>:   lea    (%rbx,%rax,1),%rcx
>>>    0x00000000000068cc <+796>:   cmp    0x10(%rcx,%r9,1),%rdi
>>>    0x00000000000068d1 <+801>:   je     0x69b0 <bch_btree_node_read_done+1024>
>>>
>>> 258                             goto err;
>>> 259
>>> 260             bch_btree_sort_and_fix_extents(&b->keys, iter, &b->c->sort);
>>>    0x00000000000068e5 <+821>:   lea    0xc8(%r12),%r14
>>>    0x00000000000068ed <+829>:   lea    0xcb60(%rsi),%rdx
>>>    0x00000000000068f4 <+836>:   mov    %r13,%rsi
>>>    0x00000000000068f7 <+839>:   mov    %r14,%rdi
>>>    0x00000000000068fa <+842>:   callq  0x68ff <bch_btree_node_read_done+847>
>>>
>>> 261
>>> 262             i = b->keys.set[0].data;
>>>    0x0000000000006907 <+855>:   mov    0x108(%r12),%rbx
>>>
>>> 263             err = "short btree key";
>>>    0x00000000000069ec <+1084>:  mov    $0x0,%rdx
>>>    0x00000000000069f3 <+1091>:  jmpq   0x6807 <bch_btree_node_read_done+599>
>>>
>>> 264             if (b->keys.set[0].size &&
>>>    0x00000000000068ff <+847>:   mov    0xe0(%r12),%eax
>>>    0x0000000000006914 <+868>:   test   %eax,%eax
>>>    0x0000000000006916 <+870>:   je     0x694d <bch_btree_node_read_done+925>
>>>    0x0000000000006944 <+916>:   test   %rax,%rax
>>>    0x0000000000006947 <+919>:   js     0x69ec <bch_btree_node_read_done+1084>
>>>
>>> 265                 bkey_cmp(&b->key, &b->keys.set[0].end) < 0)
>>> 266                     goto err;
>>> 267
>>> 268             if (b->written < btree_blocks(b))
>>>    0x000000000000694d <+925>:   mov    0x80(%r12),%rax
>>>    0x0000000000006955 <+933>:   movzwl 0xc0(%r12),%esi
>>>    0x0000000000006965 <+949>:   movzwl 0xde2(%rax),%ecx
>>>    0x000000000000696c <+956>:   shr    %cl,%rdx
>>>    0x000000000000696f <+959>:   cmp    %edx,%esi
>>>    0x0000000000006971 <+961>:   jae    0x6868 <bch_btree_node_read_done+696>
>>>
>>> 269                     bch_bset_init_next(&b->keys, write_block(b),
>>>    0x000000000000698f <+991>:   mov    %r14,%rdi
>>>    0x000000000000699e <+1006>:  callq  0x69a3 <bch_btree_node_read_done+1011>
>>>    0x00000000000069a3 <+1011>:  mov    0x80(%r12),%rax
>>>    0x00000000000069ab <+1019>:  jmpq   0x6868 <bch_btree_node_read_done+696>
>>>
>>> 270                                        bset_magic(&b->c->sb));
>>> 271     out:
>>> 272             mempool_free(iter, b->c->fill_iter);
>>>    0x0000000000006868 <+696>:   mov    0xcb58(%rax),%rsi
>>>    0x000000000000686f <+703>:   mov    %r13,%rdi
>>>    0x0000000000006872 <+706>:   callq  0x6877 <bch_btree_node_read_done+711>
>>>
>>> 273             return;
>>> 274     err:
>>> 275             set_btree_node_io_error(b);
>>> 276             bch_cache_set_error(b->c, "%s at bucket %zu, block %u, %u keys",
>>>    0x0000000000006829 <+633>:   mov    0x1c(%rbx),%r9d
>>>    0x000000000000684a <+666>:   mov    %esi,%ecx
>>>    0x000000000000684c <+668>:   mov    $0x0,%rsi
>>>    0x0000000000006853 <+675>:   shr    %cl,%r8d
>>>    0x0000000000006856 <+678>:   mov    %rax,%rcx
>>>    0x0000000000006859 <+681>:   xor    %eax,%eax
>>>    0x000000000000685b <+683>:   callq  0x6860 <bch_btree_node_read_done+688>
>>>    0x0000000000006860 <+688>:   mov    0x80(%r12),%rax
>>>
>>> 277                                 err, PTR_BUCKET_NR(b->c, &b->key, 0),
>>> 278                                 bset_block_offset(b, i), i->keys);
>>> 279             goto out;
>>> 280     }
>>>    0x0000000000006877 <+711>:   pop    %rbx
>>>    0x0000000000006878 <+712>:   pop    %r12
>>>    0x000000000000687a <+714>:   pop    %r13
>>>    0x000000000000687c <+716>:   pop    %r14
>>>    0x000000000000687e <+718>:   pop    %r15
>>>    0x0000000000006880 <+720>:   pop    %rbp
>>>    0x0000000000006881 <+721>:   retq
>>>    0x0000000000006882 <+722>:   movzwl 0x430(%rsi),%eax
>>>    0x0000000000006889 <+729>:   shl    $0x9,%eax
>>>    0x000000000000688c <+732>:   imul   %eax,%ecx
>>>    0x000000000000688f <+735>:   movslq %ecx,%rbx
>>>
>>>
>>> On 8/13/2014 1:45 PM, Slava Pestov wrote:
>>>> Can you post the disassembly of the function?
>>>>
>>>> On Wed, Aug 13, 2014 at 11:35 AM, Larkin Lowrey
>>>> <llowrey@nuclearwinter.com> wrote:
>>>>> Thanks. Trying gdb helped me find the answer. I needed to install the
>>>>> kernel-debuginfo-3.15.8-200.fc20.x86_64  package via yum.
>>>>>
>>>>> From addr2line:
>>>>>> bch_btree_node_read_done+0x4c
>>>>>> drivers/md/bcache/btree.c:207
>>>>> Here's a snippet from gdb:
>>>>>
>>>>>> (gdb) list *(bch_btree_node_read_done+0x4c)
>>>>>> 0x65fc is in bch_btree_node_read_done (drivers/md/bcache/btree.c:207).
>>>>>> 202             struct bset *i = btree_bset_first(b);
>>>>>> 203             struct btree_iter *iter;
>>>>>> 204
>>>>>> 205             iter = mempool_alloc(b->c->fill_iter, GFP_NOWAIT);
>>>>>> 206             iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
>>>>>> 207             iter->used = 0;
>>>>>> 208
>>>>>> 209     #ifdef CONFIG_BCACHE_DEBUG
>>>>>> 210             iter->b = &b->keys;
>>>>>> 211     #endif
>>>>> This doesn't make any sense to me. If iter was null I would expect line
>>>>> 206 to blow up first.
>>>>>
>>>>> --Larkin
>>>>>
>>>>> On 8/13/2014 12:41 PM, Slava Pestov wrote:
>>>>>> You can try to use gdb:
>>>>>>
>>>>>> gdb /lib/modules/.../foo.ko
>>>>>>
>>>>>> list *(bch_btree_node_read_done+0x4c)
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 13, 2014 at 9:40 AM, Larkin Lowrey
>>>>>> <llowrey@nuclearwinter.com> wrote:
>>>>>>> This is making me feel very dumb. I've googled extensively but can't
>>>>>>> figure out how to run addr2line for a module.
>>>>>>>
>>>>>>> I'm running Fedora 20 and the kernel did not have debugging symbols. I
>>>>>>> downloaded the version with symbols but I don't know if the addresses
>>>>>>> are going to be the same. Bcache is a module for me and that's where
>>>>>>> things get tricky. Do you have any tips?
>>>>>>>
>>>>>>> --Larkin
>>>>>>>
>>>>>>> On 8/13/2014 12:04 AM, Kent Overstreet wrote:
>>>>>>>> Any chance you could do an addr2line and get me the exact line where
>>>>>>>> it happened?
>>>>>>>>
>>>>>>>> On Aug 12, 2014 10:02 PM, "Larkin Lowrey" <llowrey@nuclearwinter.com
>>>>>>>> <mailto:llowrey@nuclearwinter.com>> wrote:
>>>>>>>>
>>>>>>>>     I got an oops while doing some heavy I/O. I have an md raid10 cache
>>>>>>>>     device (4 SSDs) and 3 md raid5/6 backing devices. This setup has been
>>>>>>>>     well behaved for about 6 months.
>>>>>>>>
>>>>>>>>     If this isn't a known issue is there anything I can do to provide more
>>>>>>>>     useful information?
>>>>>>>>
>>>>>>>>     I'm running kernel 3.15.8-200.fc20.x86_64.
>>>>>>>>
>>>>>>>>     [210884.047249] BUG: unable to handle kernel NULL pointer
>>>>>>>>     dereference at 0000000000000008
>>>>>>>>     [210884.055605] IP: [<ffffffffa01625fc>]
>>>>>>>>     bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>>>     [210884.063723] PGD 0
>>>>>>>>     [210884.066053] Oops: 0002 [#1] SMP
>>>>>>>>     [210884.069610] Modules linked in: lp parport binfmt_misc
>>>>>>>>     ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM
>>>>>>>>     iptable_mangle tun bridge stp llc xt_multiport ebtable_nat
>>>>>>>>     ebtables hwmon_vid ip6t_REJECT nf_conntrack_ipv6 nf_conntrack_ipv4
>>>>>>>>     nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter xt_conntrack
>>>>>>>>     ip6_tables nf_conntrack keyspan ezusb kvm_amd kvm crct10dif_pclmul
>>>>>>>>     crc32_pclmul crc32c_intel ghash_clmulni_intel microcode serio_raw
>>>>>>>>     amd64_edac_mod edac_core fam15h_power k10temp edac_mce_amd
>>>>>>>>     sp5100_tco i2c_piix4 igb ptp pps_core dca shpchp acpi_cpufreq
>>>>>>>>     btrfs bcache raid456 async_raid6_recov async_memcpy async_pq
>>>>>>>>     async_xor async_tx xor raid6_pq raid10 i2c_algo_bit drm_kms_helper
>>>>>>>>     ttm drm i2c_core mpt2sas mvsas libsas raid_class
>>>>>>>>     scsi_transport_sas cpufreq_stats
>>>>>>>>     [210884.140704] CPU: 5 PID: 11188 Comm: kworker/5:1 Not tainted
>>>>>>>>     3.15.8-200.fc20.x86_64 #1
>>>>>>>>     [210884.149069] Hardware name:  /H8DG6/H8DGi, BIOS 3.0a       07/2
>>>>>>>>     [210884.155280] Workqueue: bcache cache_lookup [bcache]
>>>>>>>>     [210884.160531] task: ffff880218633160 ti: ffff8800217b8000
>>>>>>>>     task.ti: ffff8800217b8000
>>>>>>>>     [210884.168502] RIP: 0010:[<ffffffffa01625fc>]
>>>>>>>>      [<ffffffffa01625fc>] bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>>>     [210884.179105] RSP: 0000:ffff8800217bbbe8  EFLAGS: 00010212
>>>>>>>>     [210884.184806] RAX: 0000000000000400 RBX: ffff880245ec0000 RCX:
>>>>>>>>     0000000000000000
>>>>>>>>     [210884.192480] RDX: 0000000000000000 RSI: ffff880418380000 RDI:
>>>>>>>>     0000000000000246
>>>>>>>>     [210884.200075] RBP: ffff8800217bbc10 R08: 0000000000000000 R09:
>>>>>>>>     0000000000000f6b
>>>>>>>>     [210884.207738] R10: 0000000000000000 R11: 0000000000000400 R12:
>>>>>>>>     ffff880413d06c00
>>>>>>>>     [210884.215391] R13: 0000000000000000 R14: ffff8800217bbc20 R15:
>>>>>>>>     ffff880413d06c00
>>>>>>>>     [210884.222961] FS:  00007f73bacd6880(0000)
>>>>>>>>     GS:ffff88021fd40000(0000) knlGS:0000000000000000
>>>>>>>>     [210884.231516] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>>>>>     [210884.237557] CR2: 0000000000000008 CR3: 0000000001c11000 CR4:
>>>>>>>>     00000000000407e0
>>>>>>>>     [210884.245131] Stack:
>>>>>>>>     [210884.247395]  ffff880274f4d020 ffff880413d06c00
>>>>>>>>     0000bfcc44a463f8 ffff8800217bbc20
>>>>>>>>     [210884.255337]  ffff880413d06c00 ffff8800217bbc78
>>>>>>>>     ffffffffa0162b68 0000000000000000
>>>>>>>>     [210884.263256]  ffff880218633160 0000000000000000
>>>>>>>>     0000000000000000 0000000000000000
>>>>>>>>     [210884.271234] Call Trace:
>>>>>>>>     [210884.273985]  [<ffffffffa0162b68>]
>>>>>>>>     bch_btree_node_read+0x168/0x190 [bcache]
>>>>>>>>     [210884.281258]  [<ffffffffa0163f69>]
>>>>>>>>     bch_btree_node_get+0x169/0x290 [bcache]
>>>>>>>>     [210884.288377]  [<ffffffffa01642f5>]
>>>>>>>>     bch_btree_map_keys_recurse+0xd5/0x1d0 [bcache]
>>>>>>>>     [210884.296311]  [<ffffffffa016dcb0>] ?
>>>>>>>>     cached_dev_congested+0x180/0x180 [bcache]
>>>>>>>>     [210884.303953]  [<ffffffff8135b204>] ?
>>>>>>>>     call_rwsem_down_read_failed+0x14/0x30
>>>>>>>>     [210884.311158]  [<ffffffffa01673f7>]
>>>>>>>>     bch_btree_map_keys+0x127/0x150 [bcache]
>>>>>>>>     [210884.318273]  [<ffffffffa016dcb0>] ?
>>>>>>>>     cached_dev_congested+0x180/0x180 [bcache]
>>>>>>>>     [210884.325826]  [<ffffffffa016e7f5>] cache_lookup+0xf5/0x1f0 [bcache]
>>>>>>>>     [210884.332325]  [<ffffffff810a4af6>] process_one_work+0x176/0x430
>>>>>>>>     [210884.338427]  [<ffffffff810a578b>] worker_thread+0x11b/0x3a0
>>>>>>>>     [210884.344282]  [<ffffffff810a5670>] ? rescuer_thread+0x3b0/0x3b0
>>>>>>>>     [210884.350447]  [<ffffffff810ac528>] kthread+0xd8/0xf0
>>>>>>>>     [210884.355615]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
>>>>>>>>     [210884.362017]  [<ffffffff816ff93c>] ret_from_fork+0x7c/0xb0
>>>>>>>>     [210884.367756]  [<ffffffff810ac450>] ? insert_kthread_work+0x40/0x40
>>>>>>>>     [210884.374234] Code: 08 01 00 00 48 8b b8 58 cb 00 00 e8 bf 25 01
>>>>>>>>     e1 49 8b b4 24 80 00 00 00 49 89 c5 31 d2 0f b7 86 32 04 00 00 66
>>>>>>>>     f7 b6 30 04 00 00 <49> c7 45 08 00 00 00 00 0f b7 c0 49 89 45 00
>>>>>>>>     48 8b 43 10 48 85
>>>>>>>>     [210884.395405] RIP  [<ffffffffa01625fc>]
>>>>>>>>     bch_btree_node_read_done+0x4c/0x450 [bcache]
>>>>>>>>     [210884.403389]  RSP <ffff8800217bbbe8>
>>>>>>>>     [210884.407171] CR2: 0000000000000008
>>>>>>>>     [210884.411233] ---[ end trace 0064e6abfd068c85 ]---
>>>>>>>>     [210884.416352] BUG: unable to handle kernel paging request at
>>>>>>>>     ffffffffffffffd8
>>>>>>>>     [210884.423871] IP: [<ffffffff810acb10>] kthread_data+0x10/0x20
>>>>>>>>     [210884.429915] PGD 1c14067 PUD 1c16067 PMD 0
>>>>>>>>
>>>>>>>>     --Larkin
>>>>>>>>
>>>>>>>>     --
>>>>>>>>     To unsubscribe from this list: send the line "unsubscribe
>>>>>>>>     linux-bcache" in
>>>>>>>>     the body of a message to majordomo@vger.kernel.org
>>>>>>>>     <mailto:majordomo@vger.kernel.org>
>>>>>>>>     More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>

* Re: Null pointer oops
  2014-08-13 21:30               ` Slava Pestov
  2014-08-13 21:34                 ` Jianjian Huo
  2014-08-13 22:14                 ` Larkin Lowrey
@ 2014-08-16  5:48                 ` Peter Kieser
  2 siblings, 0 replies; 13+ messages in thread
From: Peter Kieser @ 2014-08-16  5:48 UTC (permalink / raw)
  To: Slava Pestov, Larkin Lowrey; +Cc: Kent Overstreet, linux-bcache



On 2014-08-13 2:30 PM, Slava Pestov wrote:
> I was mistaken. The bug is fixed in the pull request Kent sent to Jens for 3.16:
>
> http://evilpiepirate.org/git/linux-bcache.git/commit/?h=bcache-dev&id=bcf090e0040e30f8409e6a535a01e6473afb096f
(Again) are these fixes going to be backported to Linux 3.10 (or other
longterm kernels)?

-Peter



end of thread, other threads:[~2014-08-16  5:48 UTC | newest]

Thread overview: 13+ messages
2014-08-13  5:02 Null pointer oops Larkin Lowrey
     [not found] ` <CALJ65z=25CrrO9uMc2vfYVAQWb=6eK+OhB5TGJJrCp=D4ALvrQ@mail.gmail.com>
2014-08-13 16:40   ` Larkin Lowrey
2014-08-13 17:41     ` Slava Pestov
2014-08-13 18:35       ` Larkin Lowrey
2014-08-13 18:45         ` Slava Pestov
2014-08-13 21:21           ` Larkin Lowrey
2014-08-13 21:25             ` Slava Pestov
2014-08-13 21:30               ` Slava Pestov
2014-08-13 21:34                 ` Jianjian Huo
2014-08-13 22:14                 ` Larkin Lowrey
2014-08-16  5:48                 ` Peter Kieser
2014-08-13 21:32               ` Larkin Lowrey
2014-08-13 21:37                 ` Slava Pestov
