* 4.1-rc6 radi5 OOPS @ 2015-06-03 20:20 Jes Sorensen 2015-06-03 20:40 ` NeilBrown 0 siblings, 1 reply; 14+ messages in thread From: Jes Sorensen @ 2015-06-03 20:20 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid, Xiao Ni Neil, I was running testing on the current 4.1-rc6 tree (Linus' top of trunk 8cd9234c64c584432f6992fe944ca9e46ca8ea76) and I am seeing the following OOPS which is reproducible. It shows up when running the mdadm test suite, 07changelevelintr to be specific. Is this something you have seen? Cheers, Jes ------------[ cut here ]------------ kernel BUG at drivers/md/raid5.c:5391! invalid opcode: 0000 [#1] SMP Modules linked in: raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_filter ip_tables tun bridge stp llc xfs x86_pkg_temp_thermal coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel libcrc32c aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ppdev raid0 microcode pcspkr iTCO_wdt iTCO_vendor_support parport_pc i2c_i801 i2c_core parport shpchp lpc_ich mfd_core video acpi_cpufreq nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd grace sunrpc uinput autofs4 ext4 mbcache jbd2 sd_mod e1000e ptp pps_core ahci r8169 mii libahci dm_mirror dm_region_hash dm_log dm_mod ipv6 CPU: 0 PID: 15142 Comm: md0_resync Tainted: G W 4.1.0-rc6+ #2 Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0035.030220120927 03/02/2012 task: ffff88022b2642d0 ti: ffff8802327e0000 task.ti: ffff8802327e0000 RIP: 0010:[<ffffffffa046d6f7>] [<ffffffffa046d6f7>] reshape_request+0x8a7/0x8b0 [raid456] RSP: 0018:ffff8802327e3b78 EFLAGS: 00010297 RAX: 0000000000000200 RBX: ffff88022b130400 RCX: 0000000000000200 RDX: 0000000000009801 RSI: 0000000000000200 RDI: 0000000000000001 RBP: ffff8802327e3c28 R08: 00000000000001ff R09: 0000000000000000 R10: fffffffffffffdff R11: 0000000000000001 R12: ffff8802327e3d0c R13: 
0000000000000000 R14: ffff8802327e3d0c R15: ffff880234577000 FS: 0000000000000000(0000) GS:ffff88023ee00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f985b677140 CR3: 0000000001a0b000 CR4: 00000000001406f0 Stack: 0000000000000000 0000000000000200 0000000000000001 0000000000000000 0000000000000000 fffffffffffffdff 0000020000000001 ffffffff815b34d3 ffff88022b2642d0 ffff880235f2f080 ffff8802327e3c08 ffff8802327e4000 Call Trace: [<ffffffff815b34d3>] ? __schedule+0x383/0x8e0 [<ffffffffa046da2e>] sync_request+0x32e/0x3a0 [raid456] [<ffffffff81092738>] ? __wake_up+0x48/0x60 [<ffffffff8148b974>] md_do_sync+0x8f4/0xe90 [<ffffffff810779bc>] ? update_rq_clock.part.89+0x1c/0x40 [<ffffffff81487c98>] md_thread+0x128/0x140 [<ffffffff81487b70>] ? find_pers+0x80/0x80 [<ffffffff81487b70>] ? find_pers+0x80/0x80 [<ffffffff81071b39>] kthread+0xc9/0xe0 [<ffffffff810eddd6>] ? __audit_syscall_exit+0x1e6/0x280 [<ffffffff81071a70>] ? kthread_create_on_node+0x170/0x170 [<ffffffff815b7352>] ret_from_fork+0x42/0x70 [<ffffffff81071a70>] ? 
kthread_create_on_node+0x170/0x170 Code: 00 00 be 03 00 00 00 e8 18 50 c2 e0 49 8d 7f 50 48 c7 c2 39 55 47 a0 31 f6 e8 56 21 da e0 48 8b 45 88 e9 ae fb ff ff 0f 0b 0f 0b <0f> 0b 0f 1f 80 00 00 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 RIP [<ffffffffa046d6f7>] reshape_request+0x8a7/0x8b0 [raid456] RSP <ffff8802327e3b78> ---[ end trace 1d72a07e246912ea ]--- BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81073c1f>] exit_creds+0x1f/0x70 PGD bdc0c067 PUD a6697067 PMD 0 Oops: 0000 [#2] SMP Modules linked in: raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_filter ip_tables tun bridge stp llc xfs x86_pkg_temp_thermal coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel libcrc32c aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ppdev raid0 microcode pcspkr iTCO_wdt iTCO_vendor_support parport_pc i2c_i801 i2c_core parport shpchp lpc_ich mfd_core video acpi_cpufreq nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd grace sunrpc uinput autofs4 ext4 mbcache jbd2 sd_mod e1000e ptp pps_core ahci r8169 mii libahci dm_mirror dm_region_hash dm_log dm_mod ipv6 CPU: 1 PID: 15030 Comm: mdadm Tainted: G D W 4.1.0-rc6+ #2 Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0035.030220120927 03/02/2012 task: ffff880233550110 ti: ffff8800a6cbc000 task.ti: ffff8800a6cbc000 RIP: 0010:[<ffffffff81073c1f>] [<ffffffff81073c1f>] exit_creds+0x1f/0x70 RSP: 0018:ffff8800a6cbfca8 EFLAGS: 00010292 RAX: 0000000000000000 RBX: ffff88022b2642d0 RCX: ffff8800a6cbfd00 RDX: 0000000000005110 RSI: 0000000000000296 RDI: 0000000000000000 RBP: ffff8800a6cbfcb8 R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000000b R11: 0000000000000246 R12: ffff88022b2642d0 R13: 000000000000ff00 R14: 0000000000000004 R15: 0000000000000004 FS: 00007f70ff26f740(0000) GS:ffff88023ee20000(0000) knlGS:0000000000000000 CS: 
0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000a66cf000 CR4: 00000000001406e0 Stack: ffff8800a6cbfcb8 ffff88022b2642d0 ffff8800a6cbfcd8 ffffffff8105264a 0000000000000000 ffff88022b2642d0 ffff8800a6cbfd08 ffffffff81072048 ffff880000000001 ffff880233cb9140 ffff880234577150 ffffffffa0477320 Call Trace: [<ffffffff8105264a>] __put_task_struct+0x4a/0x130 [<ffffffff81072048>] kthread_stop+0x88/0x100 [<ffffffff81487cf5>] md_unregister_thread+0x45/0x80 [<ffffffff8148e73d>] md_reap_sync_thread+0x1d/0x1a0 [<ffffffff8148ea68>] action_store+0x1a8/0x2b0 [<ffffffff8105cd4d>] ? ns_capable+0x2d/0x60 [<ffffffff8148bf8b>] md_attr_store+0x7b/0xd0 [<ffffffff8120f77d>] sysfs_kf_write+0x3d/0x50 [<ffffffff8120eeda>] kernfs_fop_write+0x12a/0x180 [<ffffffff81199ba8>] __vfs_write+0x28/0xf0 [<ffffffff8119c7a9>] ? __sb_start_write+0x49/0xf0 [<ffffffff81230b93>] ? security_file_permission+0x23/0xa0 [<ffffffff8119a299>] vfs_write+0xa9/0x1b0 [<ffffffff8119b066>] SyS_write+0x46/0xb0 [<ffffffff810edb94>] ? __audit_syscall_entry+0xb4/0x110 [<ffffffff815b6f97>] system_call_fastpath+0x12/0x6a Code: 0f 84 37 fe ff ff e9 10 fe ff ff 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 87 c8 09 00 00 48 8b bf c0 09 00 00 <8b> 00 48 c7 83 c0 09 00 00 00 00 00 00 f0 ff 0f 74 1f 48 8b bb RIP [<ffffffff81073c1f>] exit_creds+0x1f/0x70 RSP <ffff8800a6cbfca8> CR2: 0000000000000000 ---[ end trace 1d72a07e246912eb ]--- ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 4.1-rc6 radi5 OOPS 2015-06-03 20:20 4.1-rc6 radi5 OOPS Jes Sorensen @ 2015-06-03 20:40 ` NeilBrown 2015-06-03 21:57 ` Jes Sorensen 0 siblings, 1 reply; 14+ messages in thread From: NeilBrown @ 2015-06-03 20:40 UTC (permalink / raw) To: Jes Sorensen; +Cc: linux-raid, Xiao Ni [-- Attachment #1: Type: text/plain, Size: 7148 bytes --] On Wed, 03 Jun 2015 16:20:21 -0400 Jes Sorensen <Jes.Sorensen@redhat.com> wrote: > Neil, > > I was running testing on the current 4.1-rc6 tree (Linus' top of trunk > 8cd9234c64c584432f6992fe944ca9e46ca8ea76) and I am seeing the following > OOPS which is reproducible. > > It shows up when running the mdadm test suite, 07changelevelintr to be > specific. > > Is this something you have seen? > > Cheers, > Jes > > ------------[ cut here ]------------ > kernel BUG at drivers/md/raid5.c:5391! No, I haven't seen that. And I've been running the test suite quite a bit lately. Can you get it to print out the relevant numbers? Include readpos/writepos/safepos too. Thanks, NeilBrown > invalid opcode: 0000 [#1] SMP > Modules linked in: raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_filter ip_tables tun bridge stp llc xfs x86_pkg_temp_thermal coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel libcrc32c aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ppdev raid0 microcode pcspkr iTCO_wdt iTCO_vendor_support parport_pc i2c_i801 i2c_core parport shpchp lpc_ich mfd_core video acpi_cpufreq nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd grace sunrpc uinput autofs4 ext4 mbcache jbd2 sd_mod e1000e ptp pps_core ahci r8169 mii libahci dm_mirror dm_region_hash dm_log dm_mod ipv6 > CPU: 0 PID: 15142 Comm: md0_resync Tainted: G W 4.1.0-rc6+ #2 > Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0035.030220120927 03/02/2012 > task: ffff88022b2642d0 ti: ffff8802327e0000 task.ti: 
ffff8802327e0000
> [remainder of quoted oops trace snipped; identical to the trace in the original report above]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: 4.1-rc6 radi5 OOPS 2015-06-03 20:40 ` NeilBrown @ 2015-06-03 21:57 ` Jes Sorensen 2015-06-03 22:15 ` NeilBrown 2015-06-10 0:19 ` Neil Brown 0 siblings, 2 replies; 14+ messages in thread From: Jes Sorensen @ 2015-06-03 21:57 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid, Xiao Ni NeilBrown <neilb@suse.de> writes: > On Wed, 03 Jun 2015 16:20:21 -0400 Jes Sorensen <Jes.Sorensen@redhat.com> > wrote: > >> Neil, >> >> I was running testing on the current 4.1-rc6 tree (Linus' top of trunk >> 8cd9234c64c584432f6992fe944ca9e46ca8ea76) and I am seeing the following >> OOPS which is reproducible. >> >> It shows up when running the mdadm test suite, 07changelevelintr to be >> specific. >> >> Is this something you have seen? >> >> Cheers, >> Jes >> >> ------------[ cut here ]------------ >> kernel BUG at drivers/md/raid5.c:5391! > > No, I haven't seen that. And I've been running the test suite quite a bit > lately. > > Can you get it to print out the relevant numbers? Include > readpos/writepos/safepos too. This enough? Let me know if you need more. I suspect this started happening with the changes that went in between 4.1-rc5 and 4.1-rc6. I will try to bisect it tomorrow. Cheers, Jes mddev->dev_sectors: 0x9800, reshape_sectors: 0x0200 stripe_addr: fffffffffffffdff, sector_nr 0, readpos 511, writepos -513, safepos 512 ------------[ cut here ]------------ kernel BUG at drivers/md/raid5.c:5394! 
invalid opcode: 0000 [#1] SMP Modules linked in: raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_filter ip_tables tun bridge stp llc x86_pkg_temp_thermal coretemp kvm_intel kvm xfs crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw libcrc32c nfsd gf128mul ablk_helper auth_rpcgss oid_registry exportfs nfs_acl lockd cryptd iTCO_wdt grace sunrpc microcode raid0 ppdev iTCO_vendor_support shpchp i2c_i801 i2c_core pcspkr lpc_ich mfd_core parport_pc acpi_cpufreq parport video uinput autofs4 ext4 mbcache jbd2 sd_mod e1000e ptp pps_core ahci r8169 libahci mii dm_mirror dm_region_hash dm_log dm_mod ipv6 CPU: 1 PID: 14648 Comm: md0_resync Not tainted 4.1.0-rc2+ #5 Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0035.030220120927 03/02/2012 task: ffff880234b0f080 ti: ffff88022aadc000 task.ti: ffff88022aadc000 RIP: 0010:[<ffffffffa05db722>] [<ffffffffa05db722>] reshape_request+0x8d2/0x8e0 [raid456] RSP: 0018:ffff88022aadfb68 EFLAGS: 00010296 RAX: 0000000000000087 RBX: ffff88022a8d2400 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff88023ee2d368 RDI: ffff88023ee2d368 RBP: ffff88022aadfc28 R08: 0000000000000400 R09: ffffffff81d6c864 R10: 000000000000058f R11: 000000000000058e R12: ffff88022aadfd0c R13: 0000000000000000 R14: ffff88022aadfd0c R15: ffff8800a778c800 FS: 0000000000000000(0000) GS:ffff88023ee20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc8ddece000 CR3: 0000000001a0b000 CR4: 00000000001406e0 Stack: fffffffffffffdff 0000000000000200 0000000000000000 0000000000000200 0000000000000001 0000000000000000 0000000000000001 fffffffffffffdff 0000020000000001 ffffffff815b2d23 ffff880234b0f080 ffff880233748f00 Call Trace: [<ffffffff815b2d23>] ? __schedule+0x383/0x8e0 [<ffffffffa05dba5e>] sync_request+0x32e/0x3a0 [raid456] [<ffffffff81092708>] ? 
__wake_up+0x48/0x60 [<ffffffff8148b564>] md_do_sync+0x8f4/0xe90 [<ffffffff81487888>] md_thread+0x128/0x140 [<ffffffff81487760>] ? find_pers+0x80/0x80 [<ffffffff81487760>] ? find_pers+0x80/0x80 [<ffffffff81071b49>] kthread+0xc9/0xe0 [<ffffffff810edd76>] ? __audit_syscall_exit+0x1e6/0x280 [<ffffffff81071a80>] ? kthread_create_on_node+0x170/0x170 [<ffffffff815b6b92>] ret_from_fork+0x42/0x70 [<ffffffff81071a80>] ? kthread_create_on_node+0x170/0x170 Code: 48 89 44 24 08 48 8b 85 78 ff ff ff 48 c7 c7 e0 3d 5e a0 4c 8b 85 68 ff ff ff 8b 55 84 48 89 04 24 48 89 c1 31 c0 e8 fe 10 fd e0 <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 RIP [<ffffffffa05db722>] reshape_request+0x8d2/0x8e0 [raid456] RSP <ffff88022aadfb68> ---[ end trace 7eefd62cedfb2b13 ]--- BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81073c2f>] exit_creds+0x1f/0x70 PGD 2365b1067 PUD 235280067 PMD 0 Oops: 0000 [#2] SMP Modules linked in: raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_filter ip_tables tun bridge stp llc x86_pkg_temp_thermal coretemp kvm_intel kvm xfs crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw libcrc32c nfsd gf128mul ablk_helper auth_rpcgss oid_registry exportfs nfs_acl lockd cryptd iTCO_wdt grace sunrpc microcode raid0 ppdev iTCO_vendor_support shpchp i2c_i801 i2c_core pcspkr lpc_ich mfd_core parport_pc acpi_cpufreq parport video uinput autofs4 ext4 mbcache jbd2 sd_mod e1000e ptp pps_core ahci r8169 libahci mii dm_mirror dm_region_hash dm_log dm_mod ipv6 CPU: 1 PID: 14535 Comm: mdadm Tainted: G D 4.1.0-rc2+ #5 Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0035.030220120927 03/02/2012 task: ffff8800a6b67100 ti: ffff880232ae4000 task.ti: ffff880232ae4000 RIP: 0010:[<ffffffff81073c2f>] [<ffffffff81073c2f>] exit_creds+0x1f/0x70 RSP: 0018:ffff880232ae7ca8 EFLAGS: 
00010292 RAX: 0000000000000000 RBX: ffff880234b0f080 RCX: ffff880232ae7d00 RDX: 0000000000005a31 RSI: 0000000000000296 RDI: 0000000000000000 RBP: ffff880232ae7cb8 R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000000b R11: 0000000000000246 R12: ffff880234b0f080 R13: 000000000000000b R14: ffff8802338f0000 R15: 0000000000000004 FS: 00007f70bfc97740(0000) GS:ffff88023ee20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000234966000 CR4: 00000000001406e0 Stack: ffff880232ae7cb8 ffff880234b0f080 ffff880232ae7cd8 ffffffff8105265a 0000000000000000 ffff880234b0f080 ffff880232ae7d08 ffffffff81072058 ffff880200000004 ffff8802361046c0 ffff8800a778c950 0000000000000004 Call Trace: [<ffffffff8105265a>] __put_task_struct+0x4a/0x130 [<ffffffff81072058>] kthread_stop+0x88/0x100 [<ffffffff814878e5>] md_unregister_thread+0x45/0x80 [<ffffffff8148e32d>] md_reap_sync_thread+0x1d/0x1a0 [<ffffffff8148e650>] action_store+0x1a0/0x290 [<ffffffff8105cd5d>] ? ns_capable+0x2d/0x60 [<ffffffff8148bb7b>] md_attr_store+0x7b/0xd0 [<ffffffff8120f45d>] sysfs_kf_write+0x3d/0x50 [<ffffffff8120ebba>] kernfs_fop_write+0x12a/0x180 [<ffffffff81199878>] __vfs_write+0x28/0xf0 [<ffffffff8119c479>] ? __sb_start_write+0x49/0xf0 [<ffffffff81230873>] ? security_file_permission+0x23/0xa0 [<ffffffff81199f69>] vfs_write+0xa9/0x1b0 [<ffffffff8119ad36>] SyS_write+0x46/0xb0 [<ffffffff810edb34>] ? 
__audit_syscall_entry+0xb4/0x110 [<ffffffff815b67d7>] system_call_fastpath+0x12/0x6a Code: 0f 84 37 fe ff ff e9 10 fe ff ff 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 87 c8 09 00 00 48 8b bf c0 09 00 00 <8b> 00 48 c7 83 c0 09 00 00 00 00 00 00 f0 ff 0f 74 1f 48 8b bb RIP [<ffffffff81073c2f>] exit_creds+0x1f/0x70 RSP <ffff880232ae7ca8> CR2: 0000000000000000 ---[ end trace 7eefd62cedfb2b14 ]---

>
> Thanks,
> NeilBrown
>
>> [original oops traces quoted in full snipped; see the initial report above]

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: 4.1-rc6 radi5 OOPS 2015-06-03 21:57 ` Jes Sorensen @ 2015-06-03 22:15 ` NeilBrown 2015-06-04 1:44 ` Jes Sorensen 0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2015-06-03 22:15 UTC (permalink / raw)
To: Jes Sorensen; +Cc: linux-raid, Xiao Ni

[-- Attachment #1: Type: text/plain, Size: 1431 bytes --]

On Wed, 03 Jun 2015 17:57:43 -0400 Jes Sorensen <Jes.Sorensen@redhat.com> wrote:

> NeilBrown <neilb@suse.de> writes:
> > On Wed, 03 Jun 2015 16:20:21 -0400 Jes Sorensen <Jes.Sorensen@redhat.com>
> > wrote:
> >
> >> Neil,
> >>
> >> I was running testing on the current 4.1-rc6 tree (Linus' top of trunk
> >> 8cd9234c64c584432f6992fe944ca9e46ca8ea76) and I am seeing the following
> >> OOPS which is reproducible.
> >>
> >> It shows up when running the mdadm test suite, 07changelevelintr to be
> >> specific.
> >>
> >> Is this something you have seen?
> >>
> >> Cheers,
> >> Jes
> >>
> >> ------------[ cut here ]------------
> >> kernel BUG at drivers/md/raid5.c:5391!
> >
> > No, I haven't seen that.  And I've been running the test suite quite a bit
> > lately.
> >
> > Can you get it to print out the relevant numbers?  Include
> > readpos/writepos/safepos too.
>
> This enough? Let me know if you need more.
>
> I suspect this started happening with the changes that went in between
> 4.1-rc5 and 4.1-rc6. I will try to bisect it tomorrow.
>
> Cheers,
> Jes
>
> mddev->dev_sectors: 0x9800, reshape_sectors: 0x0200 stripe_addr: fffffffffffffdff, sector_nr 0, readpos 511, writepos -513, safepos 512

Those negative numbers look VERY suspicious.

I'm actually on leave this week so I won't be looking at it any more, but
I'll see what I can find on Monday.

Thanks,
NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: 4.1-rc6 radi5 OOPS 2015-06-03 22:15 ` NeilBrown @ 2015-06-04 1:44 ` Jes Sorensen 0 siblings, 0 replies; 14+ messages in thread
From: Jes Sorensen @ 2015-06-04 1:44 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid, Xiao Ni

NeilBrown <neilb@suse.de> writes:
> On Wed, 03 Jun 2015 17:57:43 -0400 Jes Sorensen <Jes.Sorensen@redhat.com>
> wrote:
>
>> NeilBrown <neilb@suse.de> writes:
>> > On Wed, 03 Jun 2015 16:20:21 -0400 Jes Sorensen <Jes.Sorensen@redhat.com>
>> > wrote:
>> >
>> >> Neil,
>> >>
>> >> I was running testing on the current 4.1-rc6 tree (Linus' top of trunk
>> >> 8cd9234c64c584432f6992fe944ca9e46ca8ea76) and I am seeing the following
>> >> OOPS which is reproducible.
>> >>
>> >> It shows up when running the mdadm test suite, 07changelevelintr to be
>> >> specific.
>> >>
>> >> Is this something you have seen?
>> >>
>> >> Cheers,
>> >> Jes
>> >>
>> >> ------------[ cut here ]------------
>> >> kernel BUG at drivers/md/raid5.c:5391!
>> >
>> > No, I haven't seen that.  And I've been running the test suite quite a bit
>> > lately.
>> >
>> > Can you get it to print out the relevant numbers?  Include
>> > readpos/writepos/safepos too.
>>
>> This enough? Let me know if you need more.
>>
>> I suspect this started happening with the changes that went in between
>> 4.1-rc5 and 4.1-rc6. I will try to bisect it tomorrow.
>>
>> Cheers,
>> Jes
>>
>> mddev->dev_sectors: 0x9800, reshape_sectors: 0x0200 stripe_addr:
>> fffffffffffffdff, sector_nr 0, readpos 511, writepos -513, safepos
>> 512
>
> Those negative numbers look VERY suspicious.
> I'm actually on leave this week so I won't be looking at it any more, but
> I'll see what I can find on Monday.
> Thanks,
> NeilBrown

Thanks - I'll try and dig more into this in the mean time.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: 4.1-rc6 radi5 OOPS 2015-06-03 21:57 ` Jes Sorensen 2015-06-03 22:15 ` NeilBrown @ 2015-06-10 0:19 ` Neil Brown 2015-06-10 1:57 ` Neil Brown 2015-06-10 21:02 ` Jes Sorensen 1 sibling, 2 replies; 14+ messages in thread
From: Neil Brown @ 2015-06-10 0:19 UTC (permalink / raw)
To: Jes Sorensen; +Cc: linux-raid, Xiao Ni

On Wed, 03 Jun 2015 17:57:43 -0400 Jes Sorensen <Jes.Sorensen@redhat.com> wrote:

> NeilBrown <neilb@suse.de> writes:
> > On Wed, 03 Jun 2015 16:20:21 -0400 Jes Sorensen
> > <Jes.Sorensen@redhat.com> wrote:
> >
> >> Neil,
> >>
> >> I was running testing on the current 4.1-rc6 tree (Linus' top of
> >> trunk 8cd9234c64c584432f6992fe944ca9e46ca8ea76) and I am seeing
> >> the following OOPS which is reproducible.
> >>
> >> It shows up when running the mdadm test suite, 07changelevelintr
> >> to be specific.
> >>
> >> Is this something you have seen?
> >>
> >> Cheers,
> >> Jes
> >>
> >> ------------[ cut here ]------------
> >> kernel BUG at drivers/md/raid5.c:5391!
> >
> > No, I haven't seen that.  And I've been running the test suite
> > quite a bit lately.
> >
> > Can you get it to print out the relevant numbers?  Include
> > readpos/writepos/safepos too.
>
> This enough? Let me know if you need more.
>
> I suspect this started happening with the changes that went in between
> 4.1-rc5 and 4.1-rc6. I will try to bisect it tomorrow.
>
> Cheers,
> Jes
>
> mddev->dev_sectors: 0x9800, reshape_sectors: 0x0200 stripe_addr:
> fffffffffffffdff, sector_nr 0, readpos 511, writepos -513, safepos
> 512

These numbers suggest that conf->reshape_progress divided by
"data_disks" or "new_data_disks" is -1 - or really the unsigned
equivalent, which is MaxSectors.
But unless data_disks is 1, ->reshape_progress must really be -2 or -3
or something.

So maybe if you could confirm the values of ->reshape_progress,
data_disks, and new_data_disks, that might help.

I don't think ->reshape_progress could get a negative value in any way
except by being assigned MaxSectors.
And that only happens when the reshape has really completely finished.

So it looks like some sort of race.  I have other evidence of a race
with the resync/reshape thread starting/stopping.  If I track that
down it'll probably fix this issue too.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: 4.1-rc6 radi5 OOPS 2015-06-10 0:19 ` Neil Brown @ 2015-06-10 1:57 ` Neil Brown 2015-06-10 16:27 ` Jes Sorensen 2015-06-10 21:02 ` Jes Sorensen 1 sibling, 1 reply; 14+ messages in thread
From: Neil Brown @ 2015-06-10 1:57 UTC (permalink / raw)
To: Jes Sorensen; +Cc: linux-raid, Xiao Ni

On Wed, 10 Jun 2015 10:19:42 +1000 Neil Brown <neilb@suse.de> wrote:

> So it looks like some sort of race.  I have other evidence of a race
> with the resync/reshape thread starting/stopping.  If I track that
> down it'll probably fix this issue too.

I think I have found just such a race.  If you request a reshape just
as a recovery completes, you can end up with two reshapes running.
This causes confusion :-)

Can you try this patch?  If I can remember how to reproduce my race
I'll test it on that too.

Thanks,
NeilBrown

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 83532fe84205..03f460a1de60 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4146,6 +4146,7 @@ static int raid10_start_reshape(struct mddev *mddev)
 
 	clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 	clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
+	clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
 	set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
 	set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
 
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0e49b2c94bdd..59e44e99eef3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7368,6 +7368,7 @@ static int raid5_start_reshape(struct mddev *mddev)
 
 	clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 	clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
+	clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
 	set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
 	set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
 	mddev->sync_thread = md_register_thread(md_do_sync, mddev,
* Re: 4.1-rc6 radi5 OOPS 2015-06-10 1:57 ` Neil Brown @ 2015-06-10 16:27 ` Jes Sorensen 2015-06-11 6:48 ` Neil Brown 0 siblings, 1 reply; 14+ messages in thread From: Jes Sorensen @ 2015-06-10 16:27 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid, Xiao Ni Neil Brown <neilb@suse.de> writes: > On Wed, 10 Jun 2015 10:19:42 +1000 Neil Brown <neilb@suse.de> wrote: > >> So it looks like some sort of race. I have other evidence of a race >> with the resync/reshape thread starting/stopping. If I track that >> down it'll probably fix this issue too. > > I think I have found just such a race. If you request a reshape just > as a recovery completes, you can end up with two reshapes running. > This causes confusion :-) > > Can you try this patch? If I can remember how to reproduce my race > I'll test it on that too. > > Thanks, > NeilBrown Hi Neil, Thanks for the patch - I tried with this applied, but it still crashed for me :( I had to mangle it manually, somehow it got modified in the email. Note this was a mangled RHEL kernel, but it's the same crash I see on the upstream kernel. [ 754.303561] md: using 128k window, over a total of 19456k. [ 754.309706] mddev->dev_sectors: 0x9800, reshape_sectors: 0x0200 stripe_addr: fffffffffffffdff, sector_nr 0, readpos 511, writepos -513, safepos 512 [ 754.324486] ------------[ cut here ]------------ [ 754.329649] kernel BUG at drivers/md/raid5.c:5388! Cheers, Jes ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 4.1-rc6 radi5 OOPS 2015-06-10 16:27 ` Jes Sorensen @ 2015-06-11 6:48 ` Neil Brown 2015-06-11 7:02 ` Neil Brown 2015-06-12 21:52 ` Jes Sorensen 0 siblings, 2 replies; 14+ messages in thread From: Neil Brown @ 2015-06-11 6:48 UTC (permalink / raw) To: Jes Sorensen; +Cc: linux-raid, Xiao Ni On Wed, 10 Jun 2015 12:27:35 -0400 Jes Sorensen <Jes.Sorensen@redhat.com> wrote: > Neil Brown <neilb@suse.de> writes: > > On Wed, 10 Jun 2015 10:19:42 +1000 Neil Brown <neilb@suse.de> wrote: > > > >> So it looks like some sort of race. I have other evidence of a race > >> with the resync/reshape thread starting/stopping. If I track that > >> down it'll probably fix this issue too. > > > > I think I have found just such a race. If you request a reshape just > > as a recovery completes, you can end up with two reshapes running. > > This causes confusion :-) > > > > Can you try this patch? If I can remember how to reproduce my race > > I'll test it on that too. > > > > Thanks, > > NeilBrown > > Hi Neil, > > Thanks for the patch - I tried with this applied, but it still crashed > for me :( I had to mangle it manually, somehow it got modified in the > email. Very :-( I had high hopes for that patch. I cannot find anything else that could lead to what you are seeing. I wish I could reproduce it but it is probably highly sensitive to timing so some hardware shows it and others don't. It looks very much like two 'resync' threads are running at the same time. When one finishes, it sets ->reshape_progress to -1 (MaxSector), which trips up the other one. In the hang that I very rarely see, one thread (presumably) finishes and sets MD_RECOVERY_DONE, so the raid5d threads waits for the resync thread to complete, and that thread is waiting for the raid5d to retire some stripe_heads. ... though the 'resync' thread is probably actually doing a 'reshape'... Did you get a chance to bisect it? I must admit that I doubt that would be useful. 
It probably starts when "md_start_sync" was introduced and maybe made
worse when some locking with mddev_lock was relaxed.

The only way I can see a race is if MD_RECOVERY_DONE gets left set when
a new thread is started. md_check_recovery always clears it before
starting a thread, but raid5_start_reshape doesn't - or didn't before
the patch I gave you.

It might make more sense to clear the bit in md_reap_sync_thread as
below, but if the first patch didn't work, this one is unlikely to.

Would you be able to test with the following patch? There is a chance
it might confirm whether two sync threads are running at the same time.

Thanks,
NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index d4f31e1..8eed951 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6945,7 +6945,7 @@ static int md_thread(void *arg)
 	 * bdflush, otherwise bdflush will deadlock if there are too
 	 * many dirty RAID5 blocks.
 	 */
-
+	printk("Started thread %p %s\n", thread, current->comm);
 	allow_signal(SIGKILL);
 	while (!kthread_should_stop()) {
 
@@ -6967,6 +6967,7 @@ static int md_thread(void *arg)
 		if (!kthread_should_stop())
 			thread->run(thread);
 	}
+	printk("Finished thread %p %s\n", thread, current->comm);
 
 	return 0;
 }
@@ -6992,6 +6993,7 @@ struct md_thread *md_register_thread(void (*run) (struct md_thread *),
 
 	init_waitqueue_head(&thread->wqueue);
 
+	printk("Register thread %p %s_%s\n", thread, mdname(mddev), name);
 	thread->run = run;
 	thread->mddev = mddev;
 	thread->timeout = MAX_SCHEDULE_TIMEOUT;
@@ -7954,7 +7956,7 @@ void md_do_sync(struct md_thread *thread)
 		mddev->resync_min = mddev->curr_resync_completed;
 	mddev->curr_resync = 0;
 	spin_unlock(&mddev->lock);
-
+	printk("Finished thread %p\n", thread);
 	wake_up(&resync_wait);
 	set_bit(MD_RECOVERY_DONE, &mddev->recovery);
 	md_wakeup_thread(mddev->thread);
@@ -8231,6 +8233,7 @@ void md_reap_sync_thread(struct mddev *mddev)
 	struct md_rdev *rdev;
 
 	/* resync has finished, collect result */
+	printk("Reap thread %p\n", mddev->sync_thread);
 	md_unregister_thread(&mddev->sync_thread);
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
@@ -8259,6 +8262,7 @@ void md_reap_sync_thread(struct mddev *mddev)
 	if (mddev_is_clustered(mddev))
 		md_cluster_ops->metadata_update_finish(mddev);
 	clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+	clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
 	clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 	clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
 	clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
* Re: 4.1-rc6 radi5 OOPS 2015-06-11 6:48 ` Neil Brown @ 2015-06-11 7:02 ` Neil Brown 2015-06-11 7:20 ` Neil Brown 0 siblings, 1 reply; 14+ messages in thread
From: Neil Brown @ 2015-06-11 7:02 UTC (permalink / raw)
To: Jes Sorensen; +Cc: linux-raid, Xiao Ni

On Thu, 11 Jun 2015 16:48:47 +1000 Neil Brown <neilb@suse.de> wrote:

> Would you be able to test with the following patch? There is a chance it might
> confirm whether two sync threads are running at the same time.

I tried the patch, and was fortunate:

[  208.626468] Register thread ffff88003cf53d00 md0_resync
[  208.629801] Started thread ffff88003cf53d00 md0_resync
[  208.629801] Finished thread ffff88003cf53d00
[  208.639800] Reap thread ffff88003cf53d00
[  208.639800] Finished thread ffff88003cf53d00 md0_resync
[  208.643133] Register thread ffff88003cf53700 md0_resync
[  208.643133] Reap thread (null)
[  208.646466] Started thread ffff88003cf53700 md0_resync
[  208.659799] Reap thread ffff88003cf53700
-HANG-

That "(null)" is a problem - that will clear MD_RECOVERY_RUNNING, but
there is a thread there.
Will keep looking.

NeilBrown
* Re: 4.1-rc6 radi5 OOPS 2015-06-11 7:02 ` Neil Brown @ 2015-06-11 7:20 ` Neil Brown 0 siblings, 0 replies; 14+ messages in thread From: Neil Brown @ 2015-06-11 7:20 UTC (permalink / raw) To: Jes Sorensen; +Cc: linux-raid, Xiao Ni On Thu, 11 Jun 2015 17:02:57 +1000 Neil Brown <neilb@suse.de> wrote: > On Thu, 11 Jun 2015 16:48:47 +1000 > Neil Brown <neilb@suse.de> wrote: > > > > Would you be able to test with the following patch? There is a chance it might > > confirm whether two sync threads are running at the same time. > > > I tried the patch, and was fortunate: > > [ 208.626468] Register thread ffff88003cf53d00 md0_resync > [ 208.629801] Started thread ffff88003cf53d00 md0_resync > [ 208.629801] Finished thread ffff88003cf53d00 > [ 208.639800] Reap thread ffff88003cf53d00 > [ 208.639800] Finished thread ffff88003cf53d00 md0_resync > [ 208.643133] Register thread ffff88003cf53700 md0_resync > [ 208.643133] Reap thread (null) > [ 208.646466] Started thread ffff88003cf53700 md0_resync > [ 208.659799] Reap thread ffff88003cf53700 > -HANG- > > That "(null)" is a problem - that will clear MD_RECOVERY_RUNNING, but there > is a thread there. > Will keep looking. > > NeilBrown Could you try this? It is just a hunch, but I'm about to stop for the evening so I won't find anything more until tomorrow. 
Thanks,
NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index d4f31e1..b966b5b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4217,13 +4217,13 @@ action_store(struct mddev *mddev, const char *page, size_t len)
 		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	if (cmd_match(page, "idle") || cmd_match(page, "frozen")) {
-		flush_workqueue(md_misc_wq);
-		if (mddev->sync_thread) {
-			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-			if (mddev_lock(mddev) == 0) {
+		if (mddev_lock(mddev) == 0) {
+			flush_workqueue(md_misc_wq);
+			if (mddev->sync_thread) {
+				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 				md_reap_sync_thread(mddev);
-				mddev_unlock(mddev);
 			}
+			mddev_unlock(mddev);
 		}
 	} else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
 		   test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
* Re: 4.1-rc6 radi5 OOPS 2015-06-11 6:48 ` Neil Brown 2015-06-11 7:02 ` Neil Brown @ 2015-06-12 21:52 ` Jes Sorensen 2015-06-13 4:26 ` Neil Brown 1 sibling, 1 reply; 14+ messages in thread From: Jes Sorensen @ 2015-06-12 21:52 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid, Xiao Ni Neil Brown <neilb@suse.de> writes: > On Wed, 10 Jun 2015 12:27:35 -0400 > Jes Sorensen <Jes.Sorensen@redhat.com> wrote: > >> Neil Brown <neilb@suse.de> writes: >> > On Wed, 10 Jun 2015 10:19:42 +1000 Neil Brown <neilb@suse.de> wrote: >> > >> >> So it looks like some sort of race. I have other evidence of a race >> >> with the resync/reshape thread starting/stopping. If I track that >> >> down it'll probably fix this issue too. >> > >> > I think I have found just such a race. If you request a reshape just >> > as a recovery completes, you can end up with two reshapes running. >> > This causes confusion :-) >> > >> > Can you try this patch? If I can remember how to reproduce my race >> > I'll test it on that too. >> > >> > Thanks, >> > NeilBrown >> >> Hi Neil, >> >> Thanks for the patch - I tried with this applied, but it still crashed >> for me :( I had to mangle it manually, somehow it got modified in the >> email. > > Very :-( > > I had high hopes for that patch. I cannot find anything else that could lead > to what you are seeing. I wish I could reproduce it but it is probably highly > sensitive to timing so some hardware shows it and others don't. > > It looks very much like two 'resync' threads are running at the same time. > When one finishes, it sets ->reshape_progress to -1 (MaxSector), which trips up > the other one. > > In the hang that I very rarely see, one thread (presumably) finishes and sets > MD_RECOVERY_DONE, so the raid5d threads waits for the resync thread to > complete, and that thread is waiting for the raid5d to retire some stripe_heads. > > ... though the 'resync' thread is probably actually doing a 'reshape'... Neil Good news - albeit not guaranteed yet. 
I tried with the full patch that you sent to Linus, and with that I haven't been able to reproduce the problem so far. I'll try and do some more testing over the weekend. The patch I manually applied only had two hunks in it, the one you pushed to Linus looks a lot more complete :) > Did you get a chance to bisect it? I must admit that I doubt that would be > useful. It probably starts when "md_start_sync" was introduced and maybe made > worse when some locking with mddev_lock was relaxed. > > The only way I can see a race is if MD_RECOVERY_DONE gets left set. When a new > thread is started. md_check_recovery always clears it before starting a thread, > but raid5_start_reshape doesn't - or didn't before the patch I gave you. > > It might make more sense to clear the bit in md_reap_sync_thread as below, > but if the first patch didn't work, this one is unlikely to. > > Would you be able to test with the following patch? There is a chance it might > confirm whether two sync threads are running at the same time. I can try with this patch on too, but I won't get to it before next week. It's been a week of non related MD issues. Thanks a lot! Cheers, Jes ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 4.1-rc6 radi5 OOPS 2015-06-12 21:52 ` Jes Sorensen @ 2015-06-13 4:26 ` Neil Brown 0 siblings, 0 replies; 14+ messages in thread From: Neil Brown @ 2015-06-13 4:26 UTC (permalink / raw) To: Jes Sorensen; +Cc: linux-raid, Xiao Ni On Fri, 12 Jun 2015 17:52:58 -0400 Jes Sorensen <Jes.Sorensen@redhat.com> wrote: > Neil Brown <neilb@suse.de> writes: > > On Wed, 10 Jun 2015 12:27:35 -0400 > > Jes Sorensen <Jes.Sorensen@redhat.com> wrote: > > > >> Neil Brown <neilb@suse.de> writes: > >> > On Wed, 10 Jun 2015 10:19:42 +1000 Neil Brown <neilb@suse.de> wrote: > >> > > >> >> So it looks like some sort of race. I have other evidence of a race > >> >> with the resync/reshape thread starting/stopping. If I track that > >> >> down it'll probably fix this issue too. > >> > > >> > I think I have found just such a race. If you request a reshape just > >> > as a recovery completes, you can end up with two reshapes running. > >> > This causes confusion :-) > >> > > >> > Can you try this patch? If I can remember how to reproduce my race > >> > I'll test it on that too. > >> > > >> > Thanks, > >> > NeilBrown > >> > >> Hi Neil, > >> > >> Thanks for the patch - I tried with this applied, but it still crashed > >> for me :( I had to mangle it manually, somehow it got modified in the > >> email. > > > > Very :-( > > > > I had high hopes for that patch. I cannot find anything else that could lead > > to what you are seeing. I wish I could reproduce it but it is probably highly > > sensitive to timing so some hardware shows it and others don't. > > > > It looks very much like two 'resync' threads are running at the same time. > > When one finishes, it sets ->reshape_progress to -1 (MaxSector), which trips up > > the other one. > > > > In the hang that I very rarely see, one thread (presumably) finishes and sets > > MD_RECOVERY_DONE, so the raid5d threads waits for the resync thread to > > complete, and that thread is waiting for the raid5d to retire some stripe_heads. > > > > ... 
though the 'resync' thread is probably actually doing a 'reshape'...
>
> Neil
>
> Good news - albeit not guaranteed yet. I tried with the full patch that
> you sent to Linus, and with that I haven't been able to reproduce the
> problem so far. I'll try and do some more testing over the weekend.
>
> The patch I manually applied only had two hunks in it, the one you
> pushed to Linus looks a lot more complete :)

Thanks for testing. I'm fairly sure your issue is fixed now, but it is
very nice to have it confirmed.

> > Did you get a chance to bisect it? I must admit that I doubt that would be
> > useful. It probably starts when "md_start_sync" was introduced and maybe made
> > worse when some locking with mddev_lock was relaxed.
> >
> > The only way I can see a race is if MD_RECOVERY_DONE gets left set when a new
> > thread is started. md_check_recovery always clears it before starting a thread,
> > but raid5_start_reshape doesn't - or didn't before the patch I gave you.
> >
> > It might make more sense to clear the bit in md_reap_sync_thread as below,
> > but if the first patch didn't work, this one is unlikely to.
> >
> > Would you be able to test with the following patch? There is a chance it might
> > confirm whether two sync threads are running at the same time.
>
> I can try with this patch on too, but I won't get to it before next
> week. It's been a week of non related MD issues.

Don't bother - that one is just an early version of one that went to
Linus, so you have tested the important bit.

Thanks,
NeilBrown

> Thanks a lot!
>
> Cheers,
> Jes
* Re: 4.1-rc6 radi5 OOPS 2015-06-10 0:19 ` Neil Brown 2015-06-10 1:57 ` Neil Brown @ 2015-06-10 21:02 ` Jes Sorensen 1 sibling, 0 replies; 14+ messages in thread From: Jes Sorensen @ 2015-06-10 21:02 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid, Xiao Ni Neil Brown <neilb@suse.de> writes: > On Wed, 03 Jun 2015 17:57:43 -0400 > Jes Sorensen <Jes.Sorensen@redhat.com> wrote: > >> NeilBrown <neilb@suse.de> writes: >> > On Wed, 03 Jun 2015 16:20:21 -0400 Jes Sorensen >> > <Jes.Sorensen@redhat.com> wrote: >> > >> >> Neil, >> >> >> >> I was running testing on the current 4.1-rc6 tree (Linus' top of >> >> trunk 8cd9234c64c584432f6992fe944ca9e46ca8ea76) and I am seeing >> >> the following OOPS which is reproducible. >> >> >> >> It shows up when running the mdadm test suite, 07changelevelintr >> >> to be specific. >> >> >> >> Is this something you have seen? >> >> >> >> Cheers, >> >> Jes >> >> >> >> ------------[ cut here ]------------ >> >> kernel BUG at drivers/md/raid5.c:5391! >> > >> > No, I haven't seen that. And I've been running the test suite >> > quite a bit lately. >> > >> > Can you get it to print out the relevant numbers? Include >> > readpos/writepos/safepos too. >> >> This enough? Let me know if you need more. >> >> I suspect this started happening with the changes that went in between >> 4.1-rc5 and 4.1-rc6. I will try to bisect it tomorrow. >> >> Cheers, >> Jes >> >> mddev->dev_sectors: 0x9800, reshape_sectors: 0x0200 stripe_addr: >> fffffffffffffdff, sector_nr 0, readpos 511, writepos -513, safepos >> 512 > > These numbers suggest that conf->reshape_progress divided by > "data_disks" or "new_data_disks" is -1 - or really the unsigned > equivalent, which is MaxSectors. > But unless data_disks is 1, ->reshape_progress must really be -2 or -3 > or something. > So maybe if you could confirm the values of ->reshape_progress, > data_disks, and new_data_disks, that might help. 
> > > I don't think ->reshape_progress could get a negative value in any way > except by being assigned MaxSectors. And that only happens when the > reshape has really completely finished. > > So it looks like some sort of race. I have other evidence of a race > with the resync/reshape thread starting/stopping. If I track that > down it'll probably fix this issue too. Hi Neil, I added the debug output you asked for - this is what I got. Looks like reshape_progress did get set to -1. Cheers, Jes mddev->dev_sectors: 0x9800, reshape_sectors: 0x0200 reshape_progress 0xffffffffffffffff, stripe_addr: fffffffffffffdff, sector_nr 0, readpos 511, writepos -513, safepos 512, data_disks 1, new_data_disks 1 ------------[ cut here ]------------ kernel BUG at drivers/md/raid5.c:5363! invalid opcode: 0000 [#1] SMP Modules linked in: raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_filter ip_tables tun bridge stp llc xfs x86_pkg_temp_thermal coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 libcrc32c glue_helper lrw gf128mul ablk_helper cryptd iTCO_wdt microcode iTCO_vendor_support raid0 ppdev pcspkr parport_pc nfsd parport shpchp i2c_i801 video i2c_core acpi_cpufreq lpc_ich mfd_core auth_rpcgss oid_registry exportfs nfs_acl lockd grace uinput sunrpc ext4 mbcache jbd2 sd_mod ahci libahci e1000e ptp pps_core r8169 mii dm_mirror dm_region_hash dm_log dm_mod ipv6 autofs4 CPU: 3 PID: 15216 Comm: md0_resync Not tainted 4.1.0-rc2+ #6 Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0035.030220120927 03/02/2012 task: ffff8800a582f040 ti: ffff8800bd410000 task.ti: ffff8800bd410000 RIP: 0010:[<ffffffffa04003a3>] [<ffffffffa04003a3>] reshape_request+0x8e3/0x8f0 [raid456] RSP: 0018:ffff8800bd413b48 EFLAGS: 00010296 RAX: 00000000000000cc RBX: 0000000000000001 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 
ffff88023ee6d368 RDI: ffff88023ee6d368 RBP: ffff8800bd413c28 R08: 0000000000000400 R09: ffffffff81d6c864 R10: 000000000000058b R11: 000000000000058a R12: ffff8800bd413d0c R13: 0000000000000000 R14: ffff8800bd413d0c R15: ffff8800bd6d6000 FS: 0000000000000000(0000) GS:ffff88023ee60000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f32b2cc7000 CR3: 0000000001a0b000 CR4: 00000000001406e0 Stack: 00000000000001ff fffffffffffffdff 0000000000000200 ffff880200000001 0000000000000001 0000000000010fb0 0000000000000000 0000000000000200 0000000000000001 0000000000000000 0000000000000003 fffffffffffffdff Call Trace: [<ffffffff815b2d33>] ? __schedule+0x383/0x8e0 [<ffffffffa04006de>] sync_request+0x32e/0x3a0 [raid456] [<ffffffff81092708>] ? __wake_up+0x48/0x60 [<ffffffff8148b564>] md_do_sync+0x8f4/0xe90 [<ffffffff810779cc>] ? update_rq_clock.part.89+0x1c/0x40 [<ffffffff81487888>] md_thread+0x128/0x140 [<ffffffff81487760>] ? find_pers+0x80/0x80 [<ffffffff81487760>] ? find_pers+0x80/0x80 [<ffffffff81071b49>] kthread+0xc9/0xe0 [<ffffffff810edd76>] ? __audit_syscall_exit+0x1e6/0x280 [<ffffffff81071a80>] ? kthread_create_on_node+0x170/0x170 [<ffffffff815b6b92>] ret_from_fork+0x42/0x70 [<ffffffff81071a80>] ? 
kthread_create_on_node+0x170/0x170 Code: 85 78 ff ff ff 4c 8b 8d 68 ff ff ff 8b 55 84 4c 89 04 24 89 5c 24 20 44 89 5c 24 18 48 89 44 24 08 49 89 c0 31 c0 e8 8d c4 1a e1 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 RIP [<ffffffffa04003a3>] reshape_request+0x8e3/0x8f0 [raid456] RSP <ffff8800bd413b48> ---[ end trace f745eac38e148690 ]--- BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81073c2f>] exit_creds+0x1f/0x70 PGD a58ee067 PUD bd4f7067 PMD 0 Oops: 0000 [#2] SMP Modules linked in: raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_filter ip_tables tun bridge stp llc xfs x86_pkg_temp_thermal coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 libcrc32c glue_helper lrw gf128mul ablk_helper cryptd iTCO_wdt microcode iTCO_vendor_support raid0 ppdev pcspkr parport_pc nfsd parport shpchp i2c_i801 video i2c_core acpi_cpufreq lpc_ich mfd_core auth_rpcgss oid_registry exportfs nfs_acl lockd grace uinput sunrpc ext4 mbcache jbd2 sd_mod ahci libahci e1000e ptp pps_core r8169 mii dm_mirror dm_region_hash dm_log dm_mod ipv6 autofs4 CPU: 1 PID: 15096 Comm: mdadm Tainted: G D 4.1.0-rc2+ #6 Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0035.030220120927 03/02/2012 task: ffff880233e17000 ti: ffff8800365a4000 task.ti: ffff8800365a4000 RIP: 0010:[<ffffffff81073c2f>] [<ffffffff81073c2f>] exit_creds+0x1f/0x70 RSP: 0018:ffff8800365a7ca8 EFLAGS: 00010292 RAX: 0000000000000000 RBX: ffff8800a582f040 RCX: ffff8800365a7d00 RDX: 0000000000006a70 RSI: 0000000000000296 RDI: 0000000000000000 RBP: ffff8800365a7cb8 R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000000b R11: 0000000000000246 R12: ffff8800a582f040 R13: 0000000000000000 R14: ffff8800a5d16000 R15: 0000000000000004 FS: 00007f3e1e2dc740(0000) GS:ffff88023ee20000(0000) knlGS:0000000000000000 CS: 0010 
DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000bd1b6000 CR4: 00000000001406e0 Stack: ffff8800365a7cb8 ffff8800a582f040 ffff8800365a7cd8 ffffffff8105265a 0000000000000000 ffff8800a582f040 ffff8800365a7d08 ffffffff81072058 ffff880200000006 ffff880234cb5cc0 ffff8800bd6d6150 0000000000000004 Call Trace: [<ffffffff8105265a>] __put_task_struct+0x4a/0x130 [<ffffffff81072058>] kthread_stop+0x88/0x100 [<ffffffff814878e5>] md_unregister_thread+0x45/0x80 [<ffffffff8148e32d>] md_reap_sync_thread+0x1d/0x1a0 [<ffffffff8148e650>] action_store+0x1a0/0x290 [<ffffffff8105cd5d>] ? ns_capable+0x2d/0x60 [<ffffffff8148bb7b>] md_attr_store+0x7b/0xd0 [<ffffffff8120f45d>] sysfs_kf_write+0x3d/0x50 [<ffffffff8120ebba>] kernfs_fop_write+0x12a/0x180 [<ffffffff81199878>] __vfs_write+0x28/0xf0 [<ffffffff8119c479>] ? __sb_start_write+0x49/0xf0 [<ffffffff81230873>] ? security_file_permission+0x23/0xa0 [<ffffffff81199f69>] vfs_write+0xa9/0x1b0 [<ffffffff8119ad36>] SyS_write+0x46/0xb0 [<ffffffff810edb34>] ? __audit_syscall_entry+0xb4/0x110 [<ffffffff815b67d7>] system_call_fastpath+0x12/0x6a Code: 0f 84 37 fe ff ff e9 10 fe ff ff 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 87 c8 09 00 00 48 8b bf c0 09 00 00 <8b> 00 48 c7 83 c0 09 00 00 00 00 00 00 f0 ff 0f 74 1f 48 8b bb RIP [<ffffffff81073c2f>] exit_creds+0x1f/0x70 RSP <ffff8800365a7ca8> CR2: 0000000000000000 ---[ end trace f745eac38e148691 ]--- ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads: [~2015-06-13 4:26 UTC | newest]

Thread overview: 14+ messages
2015-06-03 20:20 4.1-rc6 radi5 OOPS Jes Sorensen
2015-06-03 20:40 ` NeilBrown
2015-06-03 21:57 ` Jes Sorensen
2015-06-03 22:15 ` NeilBrown
2015-06-04  1:44 ` Jes Sorensen
2015-06-10  0:19 ` Neil Brown
2015-06-10  1:57 ` Neil Brown
2015-06-10 16:27 ` Jes Sorensen
2015-06-11  6:48 ` Neil Brown
2015-06-11  7:02 ` Neil Brown
2015-06-11  7:20 ` Neil Brown
2015-06-12 21:52 ` Jes Sorensen
2015-06-13  4:26 ` Neil Brown
2015-06-10 21:02 ` Jes Sorensen