bcache: for-next unable to handle kernel NULL pointer dereference at 0000000000000019


* bcache: for-next unable to handle kernel NULL pointer dereference at 0000000000000019
From: Stefan Priebe - Profihost AG @ 2017-10-23 11:42 UTC (permalink / raw)
  To: linux-bcache; +Cc: Michael Lyle, Coly Li

Hello,

I picked all bcache patches from for-next into my 4.4 kernel to test the
new controller.

After doing so I see random kernel panics with the following trace:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000019
IP: [<ffffffffc04ef62e>] closure_sub+0xe/0xc0 [bcache]
PGD 0
Oops: 0002 [#1] SMP
Modules linked in: netconsole xt_multiport ipt_REJECT nf_reject_ipv4
xt_set iptable_filter ip_tables x_tables ip_set_hash_net ip_set
nfnetlink bonding ipmi_devintf sb_edac edac_core x86_pkg_temp_thermal
mgag200 kvm_intel ttm drm_kms_helper kvm irqbypass drm fb_sys_fops
syscopyarea crc32_pclmul sysfillrect sysimgblt ghash_clmulni_intel wmi
ipmi_si ipmi_msghandler shpchp button coretemp 8021q garp fuse btrfs xor
raid6_pq dm_mod usb_storage ohci_hcd usbhid bcache sg sd_mod ahci
ehci_pci i2c_i801 libahci ehci_hcd isci igb i2c_algo_bit ixgbe usbcore
libsas i2c_core mdio usb_common scsi_transport_sas ptp pps_core
CPU: 6 PID: 50 Comm: ksoftirqd/6 Not tainted 4.4.92+534-ph #1
Hardware name: Supermicro
X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.0
07/05/2013
task: ffff8802196ccb00 ti: ffff8802196d4000 task.ti: ffff8802196d4000
RIP: 0010:[<ffffffffc04ef62e>] [<ffffffffc04ef62e>] closure_sub+0xe/0xc0
[bcache]
RSP: 0018:ffff8802196d7c20 EFLAGS: 00010297
RAX: 00000000fdffffff RBX: fffffffffffffff1 RCX: 00000000000f4240
RDX: 0000000000070651 RSI: 0000000002000001 RDI: fffffffffffffff1
RBP: ffff8802196d7c28 R08: 0000000000000007 R09: ffff880219568000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000e00
R13: 0000000000000000 R14: 0000000000000e00 R15: ffff8805eb2dbd60
FS: 0000000000000000(0000) GS:ffff880c7f580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000019 CR3: 0000000c7ee0b000 CR4: 00000000001406e0
Stack:
fffffffffffffff1 ffff8802196d7c40 ffffffffc04ef70d ffff880c459ecbc0
ffff8802196d7c58 ffffffffc04f20df ffff880c459eccf0 ffff8802196d7c70
ffffffffc04ef616 ffff880c459eccf0 ffff8802196d7c88 ffffffffc04f16a7
Call Trace:
[<ffffffffc04ef70d>] __closure_wake_up+0x2d/0x40 [bcache]
[<ffffffffc04f20df>] journal_write_done+0x2f/0xa0 [bcache]
[<ffffffffc04ef616>] closure_put+0xb6/0xc0 [bcache]
[<ffffffffc04f16a7>] journal_write_endio+0x37/0x40 [bcache]
[<ffffffff81397b16>] bio_endio+0x56/0x60
[<ffffffff813a00cb>] blk_update_request+0x8b/0x370
[<ffffffff8150b663>] scsi_end_request+0x33/0x1c0
[<ffffffff8150dddd>] scsi_io_completion+0x18d/0x660
[<ffffffff815046df>] scsi_finish_command+0xcf/0x120
[<ffffffff8150d566>] scsi_softirq_done+0x126/0x150
[<ffffffff813a7f88>] blk_done_softirq+0x78/0x90
[<ffffffff8108a48c>] __do_softirq+0x11c/0x2e0
[<ffffffff8108a678>] run_ksoftirqd+0x28/0x50
[<ffffffff810a6fb9>] smpboot_thread_fn+0x139/0x1a0
[<ffffffff810a3aeb>] kthread+0xeb/0x110
[<ffffffff816dbd0f>] ret_from_fork+0x3f/0x70
DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
Leftover inexact backtrace:
2017-10-23 13:35:09
[<ffffffff810a3a00>] ? kthread_park+0x60/0x60
Code: e8 a8 d6 ba c0 84 c0 75 83 0f 0b 48 8b 5f 20 eb af ff d1 e9 74 ff
ff ff 0f 0b 0f 1f 00 0f 1f 44 00 00 55 89 f0 f7 d8 48 89 e5 53 <f0> 0f
c1 47 28 29 f0 89 c1 81 e1 ff ff 7f 00 a9 00 00 00 55 75
RIP [<ffffffffc04ef62e>] closure_sub+0xe/0xc0 [bcache]
RSP <ffff8802196d7c20>
CR2: 0000000000000019
---[ end trace a4cc6c37159f8e49 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)


Greets,
Stefan
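
One way to read the registers in this trace, offered as a sketch and not
a confirmed diagnosis: the faulting bytes in the Code: line
(<f0> 0f c1 47 28) decode to lock xadd %eax,0x28(%rdi), so the write
targets RDI + 0x28. With RDI = fffffffffffffff1 that sum wraps around to
0x19, which is exactly CR2. In other words, the pointer passed to
closure_sub() looks like a small negative value (-15) rather than NULL
itself. The constants below are copied from the trace; the helper names
are made up for illustration.

```c
#include <stdint.h>

/* Register values copied from the oops above. */
static const uint64_t OOPS_RDI = 0xfffffffffffffff1ULL;
static const uint64_t OOPS_CR2 = 0x0000000000000019ULL;
static const uint32_t OOPS_RSI = 0x02000001U;
static const uint32_t OOPS_RAX = 0xfdffffffU;

/*
 * Effective address of a disp8(%reg) memory operand.
 * Hypothetical helper; the addition wraps modulo 2^64 like the CPU does.
 */
static uint64_t effective_address(uint64_t base, int8_t disp8)
{
	return base + (uint64_t)(int64_t)disp8;
}

/*
 * The bytes "89 f0 f7 d8" before the fault are mov %esi,%eax; neg %eax,
 * so EAX should hold the 32-bit negation of ESI at the time of the oops.
 */
static uint32_t negate32(uint32_t value)
{
	return (uint32_t)(0U - value);
}
```

effective_address(OOPS_RDI, 0x28) evaluates to OOPS_CR2, and
negate32(OOPS_RSI) equals OOPS_RAX, which fits the mov/neg pair just
before the faulting instruction.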


* Re: bcache: for-next unable to handle kernel NULL pointer dereference at 0000000000000019
From: Coly Li @ 2017-10-23 12:56 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, linux-bcache; +Cc: Michael Lyle

On 2017/10/23 7:42 PM, Stefan Priebe - Profihost AG wrote:
> Hello,
> 
> I picked all bcache patches from for-next into my 4.4 kernel to test the
> new controller.
> 
> After doing so I see random kernel panics with the following trace:

Hi Stefan,

Thanks for the report. This is the third NULL pointer dereference report
I have seen recently; they may be related (or may not). Does the panic
happen when bcache starts to run, or under heavy workload?

If I could trigger a similar oops on my own server, debugging would be
much easier. So far I cannot reproduce any oops, neither by rebooting and
letting udev rules assemble the bcache device, nor by composing the
bcache device and running it from bash scripts...

Thank you in advance.

Coly Li



* Re: bcache: for-next unable to handle kernel NULL pointer dereference at 0000000000000019
From: Stefan Priebe - Profihost AG @ 2017-10-23 12:59 UTC (permalink / raw)
  To: Coly Li, linux-bcache; +Cc: Michael Lyle

Hi Coly,


On 23.10.2017 at 14:56, Coly Li wrote:
> On 2017/10/23 7:42 PM, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> I picked all bcache patches from for-next into my 4.4 kernel to test the
>> new controller.
>>
>> After doing so I see random kernel panics with the following trace:
> 
> Hi Stefan,
> 
> Thanks for the report. This is the third NULL pointer dereference report
> I have seen recently; they may be related (or may not). Does the panic
> happen when bcache starts to run, or under heavy workload?

It's during heavy / normal workload.

> If I could trigger a similar oops on my own server, debugging would be
> much easier. So far I cannot reproduce any oops, neither by rebooting and
> letting udev rules assemble the bcache device, nor by composing the
> bcache device and running it from bash scripts...

Do you need the line where this happens? It should be possible to get
the line from the IP: [<ffffffffc04ef62e>] output?

Greets,
Stefan

> Thank you in advance.
> 
> Coly Li
> 


* Re: bcache: for-next unable to handle kernel NULL pointer dereference at 0000000000000019
From: Coly Li @ 2017-10-23 13:05 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, linux-bcache; +Cc: Michael Lyle

On 2017/10/23 8:59 PM, Stefan Priebe - Profihost AG wrote:
> Do you need the line where this happens? It should be possible to get
> the line from the IP: [<ffffffffc04ef62e>] output?
> 
This is very helpful. Is it possible to get a kdump crash dump for the
kernel oops? That would be much more informative :-)

Thanks.

-- 
Coly Li


* Re: bcache: for-next unable to handle kernel NULL pointer dereference at 0000000000000019
From: Stefan Priebe - Profihost AG @ 2017-10-23 13:16 UTC (permalink / raw)
  To: Coly Li, linux-bcache; +Cc: Michael Lyle

Hi,

On 23.10.2017 at 15:05, Coly Li wrote:
> This is very helpful.

Maybe I'm too stupid, but it does not print anything useful:

# addr2line -f -e \
    /usr/lib/debug/lib/modules/4.4.92+534-ph/kernel/drivers/md/bcache/bcache.ko \
    ffffffffc04ef62e closure_sub
??
??:0
bch_inc_gen
??:?

> Is it possible to get a kdump crash dump for the kernel oops? That
> would be much more informative :-)

No idea how to achieve this on a remote server.

Greets,
Stefan
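
A possible reason for the ?? output above, offered as an assumption and
not something confirmed in this thread: ffffffffc04ef62e is the address
at which the module happened to be loaded, while bcache.ko is a
relocatable object whose sections start at 0, so addr2line cannot match
the absolute address. The second pair of output lines probably comes from
the extra closure_sub argument being parsed as another address. The
symbol+offset form from the oops (closure_sub+0xe/0xc0) does not depend
on the load address; with debug info present, running gdb on bcache.ko
and typing list *(closure_sub+0xe) usually resolves it. The sketch below
only splits such a string into its parts and builds that gdb expression;
parse_oops_symbol() and format_gdb_list() are hypothetical helpers, not a
kernel or binutils API.

```c
#include <stdio.h>

/* Pieces of an oops location such as "closure_sub+0xe/0xc0". */
struct oops_symbol {
	char name[64];
	unsigned long offset;	/* bytes into the function */
	unsigned long size;	/* total size of the function */
};

/* Returns 0 on success, -1 if text is not of the form name+0xOFF/0xSIZE. */
static int parse_oops_symbol(const char *text, struct oops_symbol *out)
{
	if (sscanf(text, "%63[^+]+%lx/%lx",
		   out->name, &out->offset, &out->size) != 3)
		return -1;
	return 0;
}

/* Writes a gdb "list" expression for the symbol; -1 if buf is too small. */
static int format_gdb_list(const struct oops_symbol *sym,
			   char *buf, size_t len)
{
	int n = snprintf(buf, len, "list *(%s+0x%lx)", sym->name, sym->offset);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the trace in this thread, parsing "closure_sub+0xe/0xc0" gives the
name closure_sub, offset 0xe and size 0xc0, and the formatted expression
is list *(closure_sub+0xe).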


* Re: bcache: for-next unable to handle kernel NULL pointer dereference at 0000000000000019
From: Coly Li @ 2017-10-23 14:00 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, linux-bcache; +Cc: Michael Lyle

On 2017/10/23 9:16 PM, Stefan Priebe - Profihost AG wrote:
> 
> Maybe I'm too stupid, but it does not print anything useful:
> 
> # addr2line -f -e \
>     /usr/lib/debug/lib/modules/4.4.92+534-ph/kernel/drivers/md/bcache/bcache.ko \
>     ffffffffc04ef62e closure_sub
> ??
> ??:0
> bch_inc_gen
> ??:?
> 
>> Is it possible to get a kdump crash dump for the kernel oops? That
>> would be much more informative :-)
> 
> No idea how to achieve this on a remote server.

Hi Stefan,

In the code path of closure_wake_up(), I remember there were two patches
in the last run:
- commit a5f3d8a5eaaf ("bcache: use llist_for_each_entry_safe() in
__closure_wake_up()")
- commit 09b3efec81de ("bcache: Don't reinvent the wheel but use
existing llist API")

Can you check whether you have both of these patches? Or can we try
reverting these two patches and see whether the oops still happens?

Thanks.

-- 
Coly Li


* Re: bcache: for-next unable to handle kernel NULL pointer dereference at 0000000000000019
From: Stefan Priebe - Profihost AG @ 2017-10-23 14:26 UTC (permalink / raw)
  To: Coly Li, linux-bcache; +Cc: Michael Lyle

Hi,
On 23.10.2017 at 16:00, Coly Li wrote:
> 
> Hi Stefan,
> 
> In the code path of closure_wake_up(), I remember there were two patches
> in the last run:
> - commit a5f3d8a5eaaf ("bcache: use llist_for_each_entry_safe() in
> __closure_wake_up()")
> - commit 09b3efec81de ("bcache: Don't reinvent the wheel but use
> existing llist API")
> 
> Can you check whether you have both of these patches? Or can we try
> reverting these two patches and see whether the oops still happens?

It seems I'm missing a5f3d8a5eaaf but I have 09b3efec81de.

I missed it because
git log ..linux-block/for-next -- drivers/md/bcache/

does not show it. It seems linux-block/for-next does not contain it?
Which branch should I use?

Only these branches contain the mentioned commit:
  remotes/linux-block/for-linus
  remotes/linux-block/master
  remotes/linux-block/wbt-odirect

Greets,
Stefan

> 
> Thanks.
> 


* Re: bcache: for-next unable to handle kernel NULL pointer dereference at 0000000000000019
From: Coly Li @ 2017-10-23 14:39 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, linux-bcache; +Cc: Michael Lyle

On 2017/10/23 10:26 PM, Stefan Priebe - Profihost AG wrote:
> 
> It seems I'm missing a5f3d8a5eaaf but I have 09b3efec81de.
> 
> I missed it because
> git log ..linux-block/for-next -- drivers/md/bcache/
> 
> does not show it. It seems linux-block/for-next does not contain it?
> Which branch should I use?
> 
> Only these branches contain the mentioned commit:
>   remotes/linux-block/for-linus
>   remotes/linux-block/master
>   remotes/linux-block/wbt-odirect
> 

Hi Stefan,

Both patches are already in the 4.14 mainline kernel. It was my fault:
commit 09b3efec81de is buggy, and commit a5f3d8a5eaaf fixes it.

Could you please try again with the fix applied?

(And I guess the other two reports may also be missing this fix.)

Thanks.

Coly Li
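
Going only by the two commit subjects quoted above, the fix appears to
replace a plain list walk in __closure_wake_up() with
llist_for_each_entry_safe(), which fetches the next entry before the loop
body runs. That matters when the body can hand the current entry back to
its owner, as waking a waiter may do. The userspace sketch below
illustrates the general pattern only; struct node and the wake_all_*
helpers are hypothetical stand-ins and not bcache code, and releasing a
node is modelled by clearing its link so that the example stays well
defined.

```c
#include <stddef.h>

struct node {
	struct node *next;
	int woken;
};

/*
 * Stand-in for waking a waiter. Once released, the owner may reuse the
 * node; that is modelled here by clearing its link.
 */
static void wake_and_release(struct node *n)
{
	n->woken = 1;
	n->next = NULL;
}

/*
 * Reads n->next after the body has released n. In this model the walk
 * stops early; in real code it would touch freed or reused memory.
 */
static int wake_all_unsafe(struct node *head)
{
	int count = 0;
	struct node *n;

	for (n = head; n; n = n->next) {
		wake_and_release(n);
		count++;
	}
	return count;
}

/* Saves the successor before the body runs, like a "_safe" iterator. */
static int wake_all_safe(struct node *head)
{
	int count = 0;
	struct node *n, *next;

	for (n = head; n; n = next) {
		next = n->next;
		wake_and_release(n);
		count++;
	}
	return count;
}
```

With three linked nodes, wake_all_unsafe() visits only the first one in
this model, while wake_all_safe() visits all three.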


* Re: bcache: for-next unable to handle kernel NULL pointer dereference at 0000000000000019
From: Michael Lyle @ 2017-10-23 16:48 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Coly Li, linux-bcache

On 10/23/2017 07:26 AM, Stefan Priebe - Profihost AG wrote:
> It seems I'm missing a5f3d8a5eaaf but I have 09b3efec81de.
> 
> I missed it because
> git log ..linux-block/for-next -- drivers/md/bcache/
> 
> does not show it. It seems linux-block/for-next does not contain it?
> Which branch should I use?

This is one of the pitfalls of selectively taking a bunch of changes
back into an older kernel series, especially from development trees.
linux-block/for-next is guaranteed to contain all the changes needed to
take the linux-next tree into complete state, but not necessarily to be
based on a point in time on linux-next that captures all past changes.
Similarly, there's no guarantee that current bcache code will work well
on the 4.4 kernel without changes-- i.e. I am accepting changes that
rely on recent changes in kernel facilities to compile, and I would
accept changes that rely on recent changes in semantics to be correct.

Mike


* Re: bcache: for-next unable to handle kernel NULL pointer dereference at 0000000000000019
From: Stefan Priebe - Profihost AG @ 2017-10-23 17:49 UTC (permalink / raw)
  To: Michael Lyle, Coly Li, linux-bcache


On 23.10.2017 at 18:48, Michael Lyle wrote:
> On 10/23/2017 07:26 AM, Stefan Priebe - Profihost AG wrote:
>> It seems I'm missing a5f3d8a5eaaf but I have 09b3efec81de.
>>
>> I missed it because
>> git log ..linux-block/for-next -- drivers/md/bcache/
>>
>> does not show it. It seems linux-block/for-next does not contain it?
>> Which branch should I use?
> 
> This is one of the pitfalls of selectively taking a bunch of changes
> back into an older kernel series, especially from development trees.
> linux-block/for-next is guaranteed to contain all the changes needed to
> take the linux-next tree into complete state, but not necessarily to be
> based on a point in time on linux-next that captures all past changes.
> Similarly, there's no guarantee that current bcache code will work well
> on the 4.4 kernel without changes-- i.e. I am accepting changes that
> rely on recent changes in kernel facilities to compile, and I would
> accept changes that rely on recent changes in semantics to be correct.

Yes, sure, but that's no problem. I already backported them.

Stefan


* Re: bcache: for-next unable to handle kernel NULL pointer dereference at 0000000000000019
From: Stefan Priebe - Profihost AG @ 2017-10-25 13:26 UTC (permalink / raw)
  To: Coly Li, linux-bcache; +Cc: Michael Lyle


On 23.10.2017 at 16:39, Coly Li wrote:
> 
> Hi Stefan,
> 
> Both patches are already in the 4.14 mainline kernel. It was my fault:
> commit 09b3efec81de is buggy, and commit a5f3d8a5eaaf fixes it.
> 
> Could you please try again with the fix applied?
> 
> (And I guess the other two reports may also be missing this fix.)

OK, indeed, that seems to have fixed it. Sorry about that.

Greets,
Stefan

> Thanks.
> 
> Coly Li
> 


* Re: bcache: for-next unable to handle kernel NULL pointer dereference at 0000000000000019
From: Coly Li @ 2017-10-25 14:05 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, linux-bcache; +Cc: Michael Lyle

On 2017/10/25 9:26 PM, Stefan Priebe - Profihost AG wrote:
[snip]
> 
> OK, indeed, that seems to have fixed it. Sorry about that.
> 

Hi Stefan,

Thanks for the confirmation :-)

-- 
Coly Li
