* divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 @ 2016-03-17 18:38 Stefan Priebe 2016-03-17 18:45 ` Greg KH 0 siblings, 1 reply; 19+ messages in thread From: Stefan Priebe @ 2016-03-17 18:38 UTC (permalink / raw) To: LKML, stable, linux-mm, linux-mm Hi, while running qemu 2.5 on a host running 4.4.6 the host system has crashed (load > 200) 3 times in the last 3 days. Always with this stack trace: (copy left here: http://pastebin.com/raw/bCWTLKyt) [69068.874268] divide error: 0000 [#1] SMP [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm xt_multiport iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp 8021q garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid sd_mod ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core mdio ptp pps_core [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted 4.4.6+7-ph #1 [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 03/17/2012 [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti: ffff880fda2c4000 [69068.900377] RIP: 0010:[<ffffffff860b372c>] [<ffffffff860b372c>] task_h_load+0xcc/0x100 [69068.961763] RSP: 0000:ffff880fda2c7b50 EFLAGS: 00010257 [69069.023910] RAX: 0000000000000000 RBX: ffff880fda2c7c10 RCX: 0000000000000000 [69069.085953] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff880fc7f28000 [69069.151731] RBP: ffff880fda2c7bc8 R08: 00000001041955df R09: ffff880fffd153f8 [69069.213757] R10: 0000000000000009 R11: 0000000000000193 R12: ffff881f6832c780 [69069.274271] R13: ffff88203fc35380 R14: 0000000000000007 R15: 00000000000000b6 [69069.334727] FS: 00007f578a3fb700(0000) GS:ffff880fffd00000(0000) knlGS:0000000000000000 [69069.396435] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [69069.458522] CR2: 00007f5784f18468 CR3: 0000001fe9738000 CR4: 00000000000026e0 [69069.520799] Stack: [69069.581430] ffffffff860b6855 ffff880fda2c7b78 0000000000000000 0000000000000005 [69069.642629] ffff880fffd00000 0000000000000015 fffffffffffffe7d 0000000000000015 [69069.702815] 0000000000015380 ffff880fda2c7bc8 ffff880fc7f28000 00000000000001e9 [69069.761881] Call Trace: [69069.819883] [<ffffffff860b6855>] ? task_numa_find_cpu+0x225/0x670 [69069.878368] [<ffffffff860b79f0>] task_numa_migrate+0x550/0x950 [69069.936059] [<ffffffff863d9138>] ? find_next_bit+0x18/0x20 [69069.993262] [<ffffffff860b7e6d>] numa_migrate_preferred+0x7d/0x90 [69070.050528] [<ffffffff860b89a5>] task_numa_fault+0x7c5/0xaa0 [69070.106544] [<ffffffff861a2c0b>] ? mpol_misplaced+0x16b/0x1b0 [69070.163705] [<ffffffff8618104e>] __handle_mm_fault+0x9ae/0x11f0 [69070.220013] [<ffffffff865e4c52>] ? inet_recvmsg+0x72/0x90 [69070.276558] [<ffffffff8655240b>] ? SYSC_recvfrom+0x12b/0x170 [69070.332283] [<ffffffff8618196f>] handle_mm_fault+0xdf/0x180 [69070.388515] [<ffffffff8604f324>] __do_page_fault+0x164/0x380 [69070.443897] [<ffffffff860b25c3>] ? account_user_time+0x73/0x80 [69070.498534] [<ffffffff860b2b3e>] ? vtime_account_user+0x4e/0x70 [69070.552598] [<ffffffff8604f5a7>] do_page_fault+0x37/0x90 [69070.605960] [<ffffffff86002a23>] ? syscall_return_slowpath+0x83/0xf0 [69070.660705] [<ffffffff866b32f8>] page_fault+0x28/0x30 [69070.715707] Code: 86 b8 00 00 00 48 89 86 b0 00 00 00 48 85 c9 75 ca 49 8b 81 b0 00 00 00 49 8b 49 78 31 d2 48 0f af 87 d8 01 00 00 5d 48 83 c1 01 <48> f7 f1 c3 4c 89 ce 48 8b 8e c0 00 00 00 48 8b 46 78 4c 89 86 [69070.835144] RIP [<ffffffff860b372c>] task_h_load+0xcc/0x100 [69070.894095] RSP <ffff880fda2c7b50> [69070.953213] ---[ end trace 8d6f449a03dacfd4 ]--- Would be nice if we can fix this in 4.4? Greets, Stefan ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-03-17 18:38 divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 Stefan Priebe @ 2016-03-17 18:45 ` Greg KH 2016-03-19 22:26 ` Vlastimil Babka 0 siblings, 1 reply; 19+ messages in thread From: Greg KH @ 2016-03-17 18:45 UTC (permalink / raw) To: Stefan Priebe; +Cc: LKML, stable, linux-mm, linux-mm On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote: > Hi, > > while running qemu 2.5 on a host running 4.4.6 the host system has crashed > (load > 200) 3 times in the last 3 days. > > Always with this stack trace: (copy left here: > http://pastebin.com/raw/bCWTLKyt) > > [69068.874268] divide error: 0000 [#1] SMP > [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT > nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables > ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 > nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack > nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan > kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm xt_multiport > iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi > scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp 8021q > garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor > raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid sd_mod > ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb > i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core mdio ptp > pps_core > [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted 4.4.6+7-ph #1 > [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 03/17/2012 > [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti: > ffff880fda2c4000 > [69068.900377] RIP: 0010:[<ffffffff860b372c>] [<ffffffff860b372c>] > task_h_load+0xcc/0x100 > [69068.961763] RSP: 0000:ffff880fda2c7b50 EFLAGS: 00010257 > [69069.023910] RAX: 0000000000000000 RBX: ffff880fda2c7c10 RCX: > 0000000000000000 > [69069.085953] RDX: 0000000000000000 RSI: 0000000000000001 RDI: > ffff880fc7f28000 > [69069.151731] RBP: ffff880fda2c7bc8 R08: 00000001041955df R09: > ffff880fffd153f8 > [69069.213757] R10: 0000000000000009 R11: 0000000000000193 R12: > ffff881f6832c780 > [69069.274271] R13: ffff88203fc35380 R14: 0000000000000007 R15: > 00000000000000b6 > [69069.334727] FS: 00007f578a3fb700(0000) GS:ffff880fffd00000(0000) > knlGS:0000000000000000 > [69069.396435] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [69069.458522] CR2: 00007f5784f18468 CR3: 0000001fe9738000 CR4: > 00000000000026e0 > [69069.520799] Stack: > [69069.581430] ffffffff860b6855 ffff880fda2c7b78 0000000000000000 > 0000000000000005 > [69069.642629] ffff880fffd00000 0000000000000015 fffffffffffffe7d > 0000000000000015 > [69069.702815] 0000000000015380 ffff880fda2c7bc8 ffff880fc7f28000 > 00000000000001e9 > [69069.761881] Call Trace: > [69069.819883] [<ffffffff860b6855>] ? task_numa_find_cpu+0x225/0x670 > [69069.878368] [<ffffffff860b79f0>] task_numa_migrate+0x550/0x950 > [69069.936059] [<ffffffff863d9138>] ? find_next_bit+0x18/0x20 > [69069.993262] [<ffffffff860b7e6d>] numa_migrate_preferred+0x7d/0x90 > [69070.050528] [<ffffffff860b89a5>] task_numa_fault+0x7c5/0xaa0 > [69070.106544] [<ffffffff861a2c0b>] ? mpol_misplaced+0x16b/0x1b0 > [69070.163705] [<ffffffff8618104e>] __handle_mm_fault+0x9ae/0x11f0 > [69070.220013] [<ffffffff865e4c52>] ? inet_recvmsg+0x72/0x90 > [69070.276558] [<ffffffff8655240b>] ? SYSC_recvfrom+0x12b/0x170 > [69070.332283] [<ffffffff8618196f>] handle_mm_fault+0xdf/0x180 > [69070.388515] [<ffffffff8604f324>] __do_page_fault+0x164/0x380 > [69070.443897] [<ffffffff860b25c3>] ? account_user_time+0x73/0x80 > [69070.498534] [<ffffffff860b2b3e>] ? vtime_account_user+0x4e/0x70 > [69070.552598] [<ffffffff8604f5a7>] do_page_fault+0x37/0x90 > [69070.605960] [<ffffffff86002a23>] ? syscall_return_slowpath+0x83/0xf0 > [69070.660705] [<ffffffff866b32f8>] page_fault+0x28/0x30 > [69070.715707] Code: 86 b8 00 00 00 48 89 86 b0 00 00 00 48 85 c9 75 ca 49 > 8b 81 b0 00 00 00 49 8b 49 78 31 d2 48 0f af 87 d8 01 00 00 5d 48 83 c1 01 > <48> f7 f1 c3 4c 89 ce 48 8b 8e c0 00 00 00 48 8b 46 78 4c 89 86 > [69070.835144] RIP [<ffffffff860b372c>] task_h_load+0xcc/0x100 > [69070.894095] RSP <ffff880fda2c7b50> > [69070.953213] ---[ end trace 8d6f449a03dacfd4 ]--- > > Would be nice if we can fix this in 4.4? Does this also happen in 4.5? Did this work on 4.4.5? Some previous release? If you can find the offending patch using 'git bisect', that would be great. thanks, greg k-h ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-03-17 18:45 ` Greg KH @ 2016-03-19 22:26 ` Vlastimil Babka 2016-03-20 21:27 ` Stefan Priebe 0 siblings, 1 reply; 19+ messages in thread From: Vlastimil Babka @ 2016-03-19 22:26 UTC (permalink / raw) To: Greg KH, Stefan Priebe Cc: LKML, stable, linux-mm, linux-mm, Peter Zijlstra, Ingo Molnar, Rik van Riel On 03/17/2016 07:45 PM, Greg KH wrote: > On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote: >> Hi, >> >> while running qemu 2.5 on a host running 4.4.6 the host system has crashed >> (load > 200) 3 times in the last 3 days. >> >> Always with this stack trace: (copy left here: >> http://pastebin.com/raw/bCWTLKyt) >> >> [69068.874268] divide error: 0000 [#1] SMP >> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT >> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables >> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 >> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack >> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan >> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm xt_multiport >> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi >> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp 8021q >> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor >> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid sd_mod >> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb >> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core mdio ptp >> pps_core >> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted 4.4.6+7-ph #1 >> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 03/17/2012 >> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti: >> ffff880fda2c4000 >> [69068.900377] RIP: 0010:[<ffffffff860b372c>] [<ffffffff860b372c>] >> task_h_load+0xcc/0x100 decodecode says: 27: 48 83 c1 01 add $0x1,%rcx 2b:* 48 f7 f1 div %rcx <-- trapping instruction This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load: update_cfs_rq_h_load(cfs_rq); return div64_ul(p->se.avg.load_avg * cfs_rq->h_load, cfs_rq_load_avg(cfs_rq) + 1); So the load avg is -1, thus after adding 1 we get division by 0, huh? >> [69068.961763] RSP: 0000:ffff880fda2c7b50 EFLAGS: 00010257 >> [69069.023910] RAX: 0000000000000000 RBX: ffff880fda2c7c10 RCX: >> 0000000000000000 >> [69069.085953] RDX: 0000000000000000 RSI: 0000000000000001 RDI: >> ffff880fc7f28000 >> [69069.151731] RBP: ffff880fda2c7bc8 R08: 00000001041955df R09: >> ffff880fffd153f8 >> [69069.213757] R10: 0000000000000009 R11: 0000000000000193 R12: >> ffff881f6832c780 >> [69069.274271] R13: ffff88203fc35380 R14: 0000000000000007 R15: >> 00000000000000b6 >> [69069.334727] FS: 00007f578a3fb700(0000) GS:ffff880fffd00000(0000) >> knlGS:0000000000000000 >> [69069.396435] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [69069.458522] CR2: 00007f5784f18468 CR3: 0000001fe9738000 CR4: >> 00000000000026e0 >> [69069.520799] Stack: >> [69069.581430] ffffffff860b6855 ffff880fda2c7b78 0000000000000000 >> 0000000000000005 >> [69069.642629] ffff880fffd00000 0000000000000015 fffffffffffffe7d >> 0000000000000015 >> [69069.702815] 0000000000015380 ffff880fda2c7bc8 ffff880fc7f28000 >> 00000000000001e9 >> [69069.761881] Call Trace: >> [69069.819883] [<ffffffff860b6855>] ? task_numa_find_cpu+0x225/0x670 >> [69069.878368] [<ffffffff860b79f0>] task_numa_migrate+0x550/0x950 >> [69069.936059] [<ffffffff863d9138>] ? find_next_bit+0x18/0x20 >> [69069.993262] [<ffffffff860b7e6d>] numa_migrate_preferred+0x7d/0x90 >> [69070.050528] [<ffffffff860b89a5>] task_numa_fault+0x7c5/0xaa0 >> [69070.106544] [<ffffffff861a2c0b>] ? mpol_misplaced+0x16b/0x1b0 >> [69070.163705] [<ffffffff8618104e>] __handle_mm_fault+0x9ae/0x11f0 >> [69070.220013] [<ffffffff865e4c52>] ? inet_recvmsg+0x72/0x90 >> [69070.276558] [<ffffffff8655240b>] ? SYSC_recvfrom+0x12b/0x170 >> [69070.332283] [<ffffffff8618196f>] handle_mm_fault+0xdf/0x180 >> [69070.388515] [<ffffffff8604f324>] __do_page_fault+0x164/0x380 >> [69070.443897] [<ffffffff860b25c3>] ? account_user_time+0x73/0x80 >> [69070.498534] [<ffffffff860b2b3e>] ? vtime_account_user+0x4e/0x70 >> [69070.552598] [<ffffffff8604f5a7>] do_page_fault+0x37/0x90 >> [69070.605960] [<ffffffff86002a23>] ? syscall_return_slowpath+0x83/0xf0 >> [69070.660705] [<ffffffff866b32f8>] page_fault+0x28/0x30 >> [69070.715707] Code: 86 b8 00 00 00 48 89 86 b0 00 00 00 48 85 c9 75 ca 49 >> 8b 81 b0 00 00 00 49 8b 49 78 31 d2 48 0f af 87 d8 01 00 00 5d 48 83 c1 01 >> <48> f7 f1 c3 4c 89 ce 48 8b 8e c0 00 00 00 48 8b 46 78 4c 89 86 >> [69070.835144] RIP [<ffffffff860b372c>] task_h_load+0xcc/0x100 >> [69070.894095] RSP <ffff880fda2c7b50> >> [69070.953213] ---[ end trace 8d6f449a03dacfd4 ]--- >> >> Would be nice if we can fix this in 4.4? > > Does this also happen in 4.5? Did this work on 4.4.5? Some previous > release? If you can find the offending patch using 'git bisect', that > would be great. > > thanks, > > greg k-h > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-03-19 22:26 ` Vlastimil Babka @ 2016-03-20 21:27 ` Stefan Priebe 2016-03-20 21:41 ` Greg KH 0 siblings, 1 reply; 19+ messages in thread From: Stefan Priebe @ 2016-03-20 21:27 UTC (permalink / raw) To: Vlastimil Babka, Greg KH Cc: LKML, stable, linux-mm, linux-mm, Peter Zijlstra, Ingo Molnar, Rik van Riel Am 19.03.2016 um 23:26 schrieb Vlastimil Babka: > On 03/17/2016 07:45 PM, Greg KH wrote: >> On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote: >>> Hi, >>> >>> while running qemu 2.5 on a host running 4.4.6 the host system has >>> crashed >>> (load > 200) 3 times in the last 3 days. >>> >>> Always with this stack trace: (copy left here: >>> http://pastebin.com/raw/bCWTLKyt) >>> >>> [69068.874268] divide error: 0000 [#1] SMP >>> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT >>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter >>> ip6_tables >>> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 >>> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack >>> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan >>> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm >>> xt_multiport >>> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi >>> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp >>> 8021q >>> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor >>> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid >>> sd_mod >>> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb >>> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core >>> mdio ptp >>> pps_core >>> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted >>> 4.4.6+7-ph #1 >>> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 >>> 03/17/2012 >>> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti: >>> ffff880fda2c4000 >>> [69068.900377] RIP: 0010:[<ffffffff860b372c>] [<ffffffff860b372c>] >>> task_h_load+0xcc/0x100 > > decodecode says: > > 27: 48 83 c1 01 add $0x1,%rcx > 2b:* 48 f7 f1 div %rcx <-- trapping > instruction > > This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load: > > update_cfs_rq_h_load(cfs_rq); > return div64_ul(p->se.avg.load_avg * cfs_rq->h_load, > cfs_rq_load_avg(cfs_rq) + 1); > > So the load avg is -1, thus after adding 1 we get division by 0, huh? Yes CONFIG_FAIR_GROUP_SCHED is set. I cherry picked now all those commits up to 4.5 for fair.c: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5 It didn't happen again with v4.4.6 + 4.5 patches for fair.c Greets, Stefan > >>> [69068.961763] RSP: 0000:ffff880fda2c7b50 EFLAGS: 00010257 >>> [69069.023910] RAX: 0000000000000000 RBX: ffff880fda2c7c10 RCX: >>> 0000000000000000 >>> [69069.085953] RDX: 0000000000000000 RSI: 0000000000000001 RDI: >>> ffff880fc7f28000 >>> [69069.151731] RBP: ffff880fda2c7bc8 R08: 00000001041955df R09: >>> ffff880fffd153f8 >>> [69069.213757] R10: 0000000000000009 R11: 0000000000000193 R12: >>> ffff881f6832c780 >>> [69069.274271] R13: ffff88203fc35380 R14: 0000000000000007 R15: >>> 00000000000000b6 >>> [69069.334727] FS: 00007f578a3fb700(0000) GS:ffff880fffd00000(0000) >>> knlGS:0000000000000000 >>> [69069.396435] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [69069.458522] CR2: 00007f5784f18468 CR3: 0000001fe9738000 CR4: >>> 00000000000026e0 >>> [69069.520799] Stack: >>> [69069.581430] ffffffff860b6855 ffff880fda2c7b78 0000000000000000 >>> 0000000000000005 >>> [69069.642629] ffff880fffd00000 0000000000000015 fffffffffffffe7d >>> 0000000000000015 >>> [69069.702815] 0000000000015380 ffff880fda2c7bc8 ffff880fc7f28000 >>> 00000000000001e9 >>> [69069.761881] Call Trace: >>> [69069.819883] [<ffffffff860b6855>] ? task_numa_find_cpu+0x225/0x670 >>> [69069.878368] [<ffffffff860b79f0>] task_numa_migrate+0x550/0x950 >>> [69069.936059] [<ffffffff863d9138>] ? find_next_bit+0x18/0x20 >>> [69069.993262] [<ffffffff860b7e6d>] numa_migrate_preferred+0x7d/0x90 >>> [69070.050528] [<ffffffff860b89a5>] task_numa_fault+0x7c5/0xaa0 >>> [69070.106544] [<ffffffff861a2c0b>] ? mpol_misplaced+0x16b/0x1b0 >>> [69070.163705] [<ffffffff8618104e>] __handle_mm_fault+0x9ae/0x11f0 >>> [69070.220013] [<ffffffff865e4c52>] ? inet_recvmsg+0x72/0x90 >>> [69070.276558] [<ffffffff8655240b>] ? SYSC_recvfrom+0x12b/0x170 >>> [69070.332283] [<ffffffff8618196f>] handle_mm_fault+0xdf/0x180 >>> [69070.388515] [<ffffffff8604f324>] __do_page_fault+0x164/0x380 >>> [69070.443897] [<ffffffff860b25c3>] ? account_user_time+0x73/0x80 >>> [69070.498534] [<ffffffff860b2b3e>] ? vtime_account_user+0x4e/0x70 >>> [69070.552598] [<ffffffff8604f5a7>] do_page_fault+0x37/0x90 >>> [69070.605960] [<ffffffff86002a23>] ? syscall_return_slowpath+0x83/0xf0 >>> [69070.660705] [<ffffffff866b32f8>] page_fault+0x28/0x30 >>> [69070.715707] Code: 86 b8 00 00 00 48 89 86 b0 00 00 00 48 85 c9 75 >>> ca 49 >>> 8b 81 b0 00 00 00 49 8b 49 78 31 d2 48 0f af 87 d8 01 00 00 5d 48 83 >>> c1 01 >>> <48> f7 f1 c3 4c 89 ce 48 8b 8e c0 00 00 00 48 8b 46 78 4c 89 86 >>> [69070.835144] RIP [<ffffffff860b372c>] task_h_load+0xcc/0x100 >>> [69070.894095] RSP <ffff880fda2c7b50> >>> [69070.953213] ---[ end trace 8d6f449a03dacfd4 ]--- >>> >>> Would be nice if we can fix this in 4.4? >> >> Does this also happen in 4.5? Did this work on 4.4.5? Some previous >> release? If you can find the offending patch using 'git bisect', that >> would be great. >> >> thanks, >> >> greg k-h >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@kvack.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >> > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-03-20 21:27 ` Stefan Priebe @ 2016-03-20 21:41 ` Greg KH 2016-03-21 10:52 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 19+ messages in thread From: Greg KH @ 2016-03-20 21:41 UTC (permalink / raw) To: Stefan Priebe Cc: Vlastimil Babka, LKML, stable, linux-mm, linux-mm, Peter Zijlstra, Ingo Molnar, Rik van Riel On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote: > > Am 19.03.2016 um 23:26 schrieb Vlastimil Babka: > >On 03/17/2016 07:45 PM, Greg KH wrote: > >>On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote: > >>>Hi, > >>> > >>>while running qemu 2.5 on a host running 4.4.6 the host system has > >>>crashed > >>>(load > 200) 3 times in the last 3 days. > >>> > >>>Always with this stack trace: (copy left here: > >>>http://pastebin.com/raw/bCWTLKyt) > >>> > >>>[69068.874268] divide error: 0000 [#1] SMP > >>>[69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT > >>>nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter > >>>ip6_tables > >>>ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 > >>>nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack > >>>nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan > >>>kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm > >>>xt_multiport > >>>iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi > >>>scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp > >>>8021q > >>>garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor > >>>raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid > >>>sd_mod > >>>ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb > >>>i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core > >>>mdio ptp > >>>pps_core > >>>[69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted > >>>4.4.6+7-ph #1 > >>>[69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 > >>>03/17/2012 > >>>[69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti: > >>>ffff880fda2c4000 > >>>[69068.900377] RIP: 0010:[<ffffffff860b372c>] [<ffffffff860b372c>] > >>>task_h_load+0xcc/0x100 > > > >decodecode says: > > > > 27: 48 83 c1 01 add $0x1,%rcx > > 2b:* 48 f7 f1 div %rcx <-- trapping > >instruction > > > >This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load: > > > > update_cfs_rq_h_load(cfs_rq); > > return div64_ul(p->se.avg.load_avg * cfs_rq->h_load, > > cfs_rq_load_avg(cfs_rq) + 1); > > > >So the load avg is -1, thus after adding 1 we get division by 0, huh? > > Yes CONFIG_FAIR_GROUP_SCHED is set. I cherry picked now all those commits up > to 4.5 for fair.c: > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5 > > It didn't happen again with v4.4.6 + 4.5 patches for fair.c Ok, that's a lot of patches, how about figuring out which single patch, or shortest number of patches, makes things work again? thanks, greg k-h ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-03-20 21:41 ` Greg KH @ 2016-03-21 10:52 ` Stefan Priebe - Profihost AG 2016-03-21 13:38 ` Greg KH 0 siblings, 1 reply; 19+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-03-21 10:52 UTC (permalink / raw) To: Greg KH Cc: Vlastimil Babka, LKML, stable, linux-mm, linux-mm, Peter Zijlstra, Ingo Molnar, Rik van Riel Am 20.03.2016 um 22:41 schrieb Greg KH: > On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote: >> >> Am 19.03.2016 um 23:26 schrieb Vlastimil Babka: >>> On 03/17/2016 07:45 PM, Greg KH wrote: >>>> On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote: >>>>> Hi, >>>>> >>>>> while running qemu 2.5 on a host running 4.4.6 the host system has >>>>> crashed >>>>> (load > 200) 3 times in the last 3 days. >>>>> >>>>> Always with this stack trace: (copy left here: >>>>> http://pastebin.com/raw/bCWTLKyt) >>>>> >>>>> [69068.874268] divide error: 0000 [#1] SMP >>>>> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT >>>>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter >>>>> ip6_tables >>>>> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 >>>>> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack >>>>> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan >>>>> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm >>>>> xt_multiport >>>>> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi >>>>> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp >>>>> 8021q >>>>> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor >>>>> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid >>>>> sd_mod >>>>> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb >>>>> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core >>>>> mdio ptp >>>>> pps_core >>>>> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted >>>>> 4.4.6+7-ph #1 >>>>> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 >>>>> 03/17/2012 >>>>> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti: >>>>> ffff880fda2c4000 >>>>> [69068.900377] RIP: 0010:[<ffffffff860b372c>] [<ffffffff860b372c>] >>>>> task_h_load+0xcc/0x100 >>> >>> decodecode says: >>> >>> 27: 48 83 c1 01 add $0x1,%rcx >>> 2b:* 48 f7 f1 div %rcx <-- trapping >>> instruction >>> >>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load: >>> >>> update_cfs_rq_h_load(cfs_rq); >>> return div64_ul(p->se.avg.load_avg * cfs_rq->h_load, >>> cfs_rq_load_avg(cfs_rq) + 1); >>> >>> So the load avg is -1, thus after adding 1 we get division by 0, huh? >> >> Yes CONFIG_FAIR_GROUP_SCHED is set. I cherry picked now all those commits up >> to 4.5 for fair.c: >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5 >> >> It didn't happen again with v4.4.6 + 4.5 patches for fair.c > > Ok, that's a lot of patches, how about figuring out which single patch, > or shortest number of patches, makes things work again? will do so but it seems most out of those 9 patches are based on each other. So it wouldn't be easy. Stefan > > thanks, > > greg k-h > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-03-21 10:52 ` Stefan Priebe - Profihost AG @ 2016-03-21 13:38 ` Greg KH 2016-05-17 6:01 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 19+ messages in thread From: Greg KH @ 2016-03-21 13:38 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: Vlastimil Babka, LKML, stable, linux-mm, linux-mm, Peter Zijlstra, Ingo Molnar, Rik van Riel On Mon, Mar 21, 2016 at 11:52:23AM +0100, Stefan Priebe - Profihost AG wrote: > > Am 20.03.2016 um 22:41 schrieb Greg KH: > > On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote: > >> > >> Am 19.03.2016 um 23:26 schrieb Vlastimil Babka: > >>> On 03/17/2016 07:45 PM, Greg KH wrote: > >>>> On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote: > >>>>> Hi, > >>>>> > >>>>> while running qemu 2.5 on a host running 4.4.6 the host system has > >>>>> crashed > >>>>> (load > 200) 3 times in the last 3 days. > >>>>> > >>>>> Always with this stack trace: (copy left here: > >>>>> http://pastebin.com/raw/bCWTLKyt) > >>>>> > >>>>> [69068.874268] divide error: 0000 [#1] SMP > >>>>> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT > >>>>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter > >>>>> ip6_tables > >>>>> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 > >>>>> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack > >>>>> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan > >>>>> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm > >>>>> xt_multiport > >>>>> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi > >>>>> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp > >>>>> 8021q > >>>>> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor > >>>>> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid > >>>>> sd_mod > >>>>> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb > >>>>> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core > >>>>> mdio ptp > >>>>> pps_core > >>>>> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted > >>>>> 4.4.6+7-ph #1 > >>>>> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 > >>>>> 03/17/2012 > >>>>> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti: > >>>>> ffff880fda2c4000 > >>>>> [69068.900377] RIP: 0010:[<ffffffff860b372c>] [<ffffffff860b372c>] > >>>>> task_h_load+0xcc/0x100 > >>> > >>> decodecode says: > >>> > >>> 27: 48 83 c1 01 add $0x1,%rcx > >>> 2b:* 48 f7 f1 div %rcx <-- trapping > >>> instruction > >>> > >>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load: > >>> > >>> update_cfs_rq_h_load(cfs_rq); > >>> return div64_ul(p->se.avg.load_avg * cfs_rq->h_load, > >>> cfs_rq_load_avg(cfs_rq) + 1); > >>> > >>> So the load avg is -1, thus after adding 1 we get division by 0, huh? > >> > >> Yes CONFIG_FAIR_GROUP_SCHED is set. I cherry picked now all those commits up > >> to 4.5 for fair.c: > >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5 > >> > >> It didn't happen again with v4.4.6 + 4.5 patches for fair.c > > > > Ok, that's a lot of patches, how about figuring out which single patch, > > or shortest number of patches, makes things work again? > > will do so but it seems most out of those 9 patches are based on each > other. So it wouldn't be easy. Worst case, we take all 9. Best case, we only need one :) thanks, greg k-h ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-03-21 13:38 ` Greg KH @ 2016-05-17 6:01 ` Stefan Priebe - Profihost AG 2016-05-17 9:21 ` Campbell Steven 0 siblings, 1 reply; 19+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-05-17 6:01 UTC (permalink / raw) To: Greg KH Cc: Vlastimil Babka, LKML, Campbell Steven, linux-mm, linux-mm, Peter Zijlstra, Ingo Molnar, Rik van Riel Am 21.03.2016 um 14:38 schrieb Greg KH: > On Mon, Mar 21, 2016 at 11:52:23AM +0100, Stefan Priebe - Profihost AG wrote: >> >> Am 20.03.2016 um 22:41 schrieb Greg KH: >>> On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote: >>>> >>>> Am 19.03.2016 um 23:26 schrieb Vlastimil Babka: >>>>> On 03/17/2016 07:45 PM, Greg KH wrote: >>>>>> On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote: >>>>>>> Hi, >>>>>>> >>>>>>> while running qemu 2.5 on a host running 4.4.6 the host system has >>>>>>> crashed >>>>>>> (load > 200) 3 times in the last 3 days. >>>>>>> >>>>>>> Always with this stack trace: (copy left here: >>>>>>> http://pastebin.com/raw/bCWTLKyt) >>>>>>> >>>>>>> [69068.874268] divide error: 0000 [#1] SMP >>>>>>> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT >>>>>>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter >>>>>>> ip6_tables >>>>>>> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 >>>>>>> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack >>>>>>> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan >>>>>>> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm >>>>>>> xt_multiport >>>>>>> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi >>>>>>> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp >>>>>>> 8021q >>>>>>> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor >>>>>>> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid >>>>>>> sd_mod >>>>>>> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb >>>>>>> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core >>>>>>> mdio ptp >>>>>>> pps_core >>>>>>> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted >>>>>>> 4.4.6+7-ph #1 >>>>>>> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 >>>>>>> 03/17/2012 >>>>>>> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti: >>>>>>> ffff880fda2c4000 >>>>>>> [69068.900377] RIP: 0010:[<ffffffff860b372c>] [<ffffffff860b372c>] >>>>>>> task_h_load+0xcc/0x100 >>>>> >>>>> decodecode says: >>>>> >>>>> 27: 48 83 c1 01 add $0x1,%rcx >>>>> 2b:* 48 f7 f1 div %rcx <-- trapping >>>>> instruction >>>>> >>>>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load: >>>>> >>>>> update_cfs_rq_h_load(cfs_rq); >>>>> return div64_ul(p->se.avg.load_avg * cfs_rq->h_load, >>>>> cfs_rq_load_avg(cfs_rq) + 1); >>>>> >>>>> So the load avg is -1, thus after adding 1 we get division by 0, huh? >>>> >>>> Yes CONFIG_FAIR_GROUP_SCHED is set. I cherry picked now all those commits up >>>> to 4.5 for fair.c: >>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5 >>>> >>>> It didn't happen again with v4.4.6 + 4.5 patches for fair.c >>> >>> Ok, that's a lot of patches, how about figuring out which single patch, >>> or shortest number of patches, makes things work again? >> >> will do so but it seems most out of those 9 patches are based on each >> other. So it wouldn't be easy. > > Worst case, we take all 9. Best case, we only need one :) Today i got another report from: casteven (at) gmail.com having the same issue. I wasn't able to figure out which of those 9 patches helps - i still can just tell that it does not happen with this series on top of 4.4. Stefan ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-05-17 6:01 ` Stefan Priebe - Profihost AG @ 2016-05-17 9:21 ` Campbell Steven 2016-06-22 1:19 ` Campbell Steven 0 siblings, 1 reply; 19+ messages in thread From: Campbell Steven @ 2016-05-17 9:21 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: Greg KH, Vlastimil Babka, LKML, linux-mm, linux-mm, Peter Zijlstra, Ingo Molnar, Rik van Riel Thanks Stefan, I am seeing this on 4.5.0 and 4.5.4 both are compiled from mainline neither include any patches over and above the tree. I ran for well over a month in production on 4.5.0 with no issues at all on a single socket server (E5-2670 v3 @ 2.30GHz) but as soon as we try to run either 4.5.0 or 4.5.4 on a dual socket box we get these traces after 12 or so hours of uptime and the same sky rocketing load. Given you have stopped your issues by applying those patches to 4.4 it's possible I have different issues, but the symptoms are too similar to ignore, our trace is here: http://pastebin.com/Q7nd8nfP I am using the cfg scheduler, and not the fair scheduler as Stefan reported so that is another difference. Happy to proceed with any suggestions or further information that may help isolate this, use case on these servers are 12 x CEPH osd's with bcache so they are seeing alot of disk activity via NVMe and SATA buses. Thanks Campbell On 17 May 2016 at 18:01, Stefan Priebe - Profihost AG <s.priebe@profihost.ag> wrote: > Am 21.03.2016 um 14:38 schrieb Greg KH: >> On Mon, Mar 21, 2016 at 11:52:23AM +0100, Stefan Priebe - Profihost AG wrote: >>> >>> Am 20.03.2016 um 22:41 schrieb Greg KH: >>>> On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote: >>>>> >>>>> Am 19.03.2016 um 23:26 schrieb Vlastimil Babka: >>>>>> On 03/17/2016 07:45 PM, Greg KH wrote: >>>>>>> On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> while running qemu 2.5 on a host running 4.4.6 the host system has >>>>>>>> crashed >>>>>>>> (load > 200) 3 times in the last 3 days. >>>>>>>> >>>>>>>> Always with this stack trace: (copy left here: >>>>>>>> http://pastebin.com/raw/bCWTLKyt) >>>>>>>> >>>>>>>> [69068.874268] divide error: 0000 [#1] SMP >>>>>>>> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT >>>>>>>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter >>>>>>>> ip6_tables >>>>>>>> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 >>>>>>>> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack >>>>>>>> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan >>>>>>>> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm >>>>>>>> xt_multiport >>>>>>>> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi >>>>>>>> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp >>>>>>>> 8021q >>>>>>>> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor >>>>>>>> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid >>>>>>>> sd_mod >>>>>>>> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb >>>>>>>> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core >>>>>>>> mdio ptp >>>>>>>> pps_core >>>>>>>> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted >>>>>>>> 4.4.6+7-ph #1 >>>>>>>> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 >>>>>>>> 03/17/2012 >>>>>>>> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti: >>>>>>>> ffff880fda2c4000 >>>>>>>> [69068.900377] RIP: 0010:[<ffffffff860b372c>] [<ffffffff860b372c>] >>>>>>>> task_h_load+0xcc/0x100 >>>>>> >>>>>> decodecode says: >>>>>> >>>>>> 27: 48 83 c1 01 add $0x1,%rcx >>>>>> 2b:* 48 f7 f1 div %rcx <-- trapping >>>>>> instruction >>>>>> >>>>>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load: >>>>>> >>>>>> update_cfs_rq_h_load(cfs_rq); >>>>>> return div64_ul(p->se.avg.load_avg * cfs_rq->h_load, >>>>>> cfs_rq_load_avg(cfs_rq) + 1); >>>>>> >>>>>> So the load avg is -1, thus after adding 1 we get division by 0, huh? >>>>> >>>>> Yes CONFIG_FAIR_GROUP_SCHED is set. I cherry picked now all those commits up >>>>> to 4.5 for fair.c: >>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5 >>>>> >>>>> It didn't happen again with v4.4.6 + 4.5 patches for fair.c >>>> >>>> Ok, that's a lot of patches, how about figuring out which single patch, >>>> or shortest number of patches, makes things work again? >>> >>> will do so but it seems most out of those 9 patches are based on each >>> other. So it wouldn't be easy. >> >> Worst case, we take all 9. Best case, we only need one :) > > Today i got another report from: casteven (at) gmail.com having the same > issue. I wasn't able to figure out which of those 9 patches helps - i > still can just tell that it does not happen with this series on top of 4.4. > > Stefan ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-05-17 9:21 ` Campbell Steven @ 2016-06-22 1:19 ` Campbell Steven 2016-06-22 6:13 ` Peter Zijlstra 0 siblings, 1 reply; 19+ messages in thread From: Campbell Steven @ 2016-06-22 1:19 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: Greg KH, Vlastimil Babka, LKML, linux-mm, linux-mm, Peter Zijlstra, Ingo Molnar, Rik van Riel On 17 May 2016 at 21:21, Campbell Steven <casteven@gmail.com> wrote: > Thanks Stefan, > > I am seeing this on 4.5.0 and 4.5.4 both are compiled from mainline > neither include any patches over and above the tree. I ran for well > over a month in production on 4.5.0 with no issues at all on a single > socket server (E5-2670 v3 @ 2.30GHz) but as soon as we try to run > either 4.5.0 or 4.5.4 on a dual socket box we get these traces after > 12 or so hours of uptime and the same sky rocketing load. Given you > have stopped your issues by applying those patches to 4.4 it's > possible I have different issues, but the symptoms are too similar to > ignore, our trace is here: > > http://pastebin.com/Q7nd8nfP > > I am using the cfg scheduler, and not the fair scheduler as Stefan > reported so that is another difference. > > Happy to proceed with any suggestions or further information that may > help isolate this, use case on these servers are 12 x CEPH osd's with > bcache so they are seeing alot of disk activity via NVMe and SATA > buses. > > Thanks > > Campbell > > On 17 May 2016 at 18:01, Stefan Priebe - Profihost AG > <s.priebe@profihost.ag> wrote: >> Am 21.03.2016 um 14:38 schrieb Greg KH: >>> On Mon, Mar 21, 2016 at 11:52:23AM +0100, Stefan Priebe - Profihost AG wrote: >>>> >>>> Am 20.03.2016 um 22:41 schrieb Greg KH: >>>>> On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote: >>>>>> >>>>>> Am 19.03.2016 um 23:26 schrieb Vlastimil Babka: >>>>>>> On 03/17/2016 07:45 PM, Greg KH wrote: >>>>>>>> On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> while running qemu 2.5 on a host running 4.4.6 the host system has >>>>>>>>> crashed >>>>>>>>> (load > 200) 3 times in the last 3 days. >>>>>>>>> >>>>>>>>> Always with this stack trace: (copy left here: >>>>>>>>> http://pastebin.com/raw/bCWTLKyt) >>>>>>>>> >>>>>>>>> [69068.874268] divide error: 0000 [#1] SMP >>>>>>>>> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT >>>>>>>>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter >>>>>>>>> ip6_tables >>>>>>>>> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 >>>>>>>>> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack >>>>>>>>> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan >>>>>>>>> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm >>>>>>>>> xt_multiport >>>>>>>>> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi >>>>>>>>> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp >>>>>>>>> 8021q >>>>>>>>> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor >>>>>>>>> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid >>>>>>>>> sd_mod >>>>>>>>> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb >>>>>>>>> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core >>>>>>>>> mdio ptp >>>>>>>>> pps_core >>>>>>>>> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted >>>>>>>>> 4.4.6+7-ph #1 >>>>>>>>> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 >>>>>>>>> 03/17/2012 >>>>>>>>> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti: >>>>>>>>> ffff880fda2c4000 >>>>>>>>> [69068.900377] RIP: 0010:[<ffffffff860b372c>] [<ffffffff860b372c>] >>>>>>>>> task_h_load+0xcc/0x100 >>>>>>> >>>>>>> decodecode says: >>>>>>> >>>>>>> 27: 48 83 c1 01 add $0x1,%rcx >>>>>>> 2b:* 48 f7 f1 div %rcx <-- trapping >>>>>>> instruction >>>>>>> >>>>>>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load: >>>>>>> >>>>>>> update_cfs_rq_h_load(cfs_rq); >>>>>>> return div64_ul(p->se.avg.load_avg * cfs_rq->h_load, >>>>>>> cfs_rq_load_avg(cfs_rq) + 1); >>>>>>> >>>>>>> So the load avg is -1, thus after adding 1 we get division by 0, huh? >>>>>> >>>>>> Yes CONFIG_FAIR_GROUP_SCHED is set. I cherry picked now all those commits up >>>>>> to 4.5 for fair.c: >>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5 >>>>>> >>>>>> It didn't happen again with v4.4.6 + 4.5 patches for fair.c >>>>> >>>>> Ok, that's a lot of patches, how about figuring out which single patch, >>>>> or shortest number of patches, makes things work again? >>>> >>>> will do so but it seems most out of those 9 patches are based on each >>>> other. So it wouldn't be easy. >>> >>> Worst case, we take all 9. Best case, we only need one :) >> >> Today i got another report from: casteven (at) gmail.com having the same >> issue. I wasn't able to figure out which of those 9 patches helps - i >> still can just tell that it does not happen with this series on top of 4.4. >> >> Stefan Just reinvigorating this thread based off yesterdays post by Yannis here: https://lkml.org/lkml/headers/2016/6/21/491 of which the trace is more or less is identical to what we see. In addition to this we have since tested the 4.6.x releases and have seen the same issues as we have in all the 4.4 and 4.5 releases we tried. I have noticed a few commits in this area of the kernel into the 4.7 series, curious if this may be solved already in 4.7 and if there is a patch to 4.6 we could possibly test, our constraint is that we are only able to reproduce this in our production environment it seems, we never see it in our test/dev systems. We have also found we can work around the issue by disabling all cores on the second CPU socket (e.g. echo 1 > /sys/devices/system/cpu/cpuX/online), doing this "fixes" this problem for us but is obviously not ideal or long term but gives stability if like us you need to run kernels in these series for various other unrelated reasons and can do without your second socket. Campbell ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-06-22 1:19 ` Campbell Steven @ 2016-06-22 6:13 ` Peter Zijlstra 2016-07-06 23:20 ` Campbell Steven 0 siblings, 1 reply; 19+ messages in thread From: Peter Zijlstra @ 2016-06-22 6:13 UTC (permalink / raw) To: Campbell Steven Cc: Stefan Priebe - Profihost AG, Greg KH, Vlastimil Babka, LKML, linux-mm, linux-mm, Ingo Molnar, Rik van Riel On Wed, Jun 22, 2016 at 01:19:54PM +1200, Campbell Steven wrote: > >>>>>>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load: > >>>>>>> > >>>>>>> update_cfs_rq_h_load(cfs_rq); > >>>>>>> return div64_ul(p->se.avg.load_avg * cfs_rq->h_load, > >>>>>>> cfs_rq_load_avg(cfs_rq) + 1); > >>>>>>> --- commit 8974189222159154c55f24ddad33e3613960521a Author: Peter Zijlstra <peterz@infradead.org> Date: Thu Jun 16 10:50:40 2016 +0200 sched/fair: Fix cfs_rq avg tracking underflow As per commit: b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization") > the code generated from update_cfs_rq_load_avg(): > > if (atomic_long_read(&cfs_rq->removed_load_avg)) { > s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0); > sa->load_avg = max_t(long, sa->load_avg - r, 0); > sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0); > removed_load = 1; > } > > turns into: > > ffffffff81087064: 49 8b 85 98 00 00 00 mov 0x98(%r13),%rax > ffffffff8108706b: 48 85 c0 test %rax,%rax > ffffffff8108706e: 74 40 je ffffffff810870b0 <update_blocked_averages+0xc0> > ffffffff81087070: 4c 89 f8 mov %r15,%rax > ffffffff81087073: 49 87 85 98 00 00 00 xchg %rax,0x98(%r13) > ffffffff8108707a: 49 29 45 70 sub %rax,0x70(%r13) > ffffffff8108707e: 4c 89 f9 mov %r15,%rcx > ffffffff81087081: bb 01 00 00 00 mov $0x1,%ebx > ffffffff81087086: 49 83 7d 70 00 cmpq $0x0,0x70(%r13) > ffffffff8108708b: 49 0f 49 4d 70 cmovns 0x70(%r13),%rcx > > Which you'll note ends up with sa->load_avg -= r in memory at > ffffffff8108707a. So I _should_ have looked at other unserialized users of ->load_avg, but alas. Luckily nikbor reported a similar /0 from task_h_load() which instantly triggered recollection of this here problem. Aside from the intermediate value hitting memory and causing problems, there's another problem: the underflow detection relies on the signed bit. This reduces the effective width of the variables, IOW its effectively the same as having these variables be of signed type. This patch changes to a different means of unsigned underflow detection to not rely on the signed bit. This allows the variables to use the 'full' unsigned range. And it does so with explicit LOAD - STORE to ensure any intermediate value will never be visible in memory, allowing these unserialized loads. Note: GCC generates crap code for this, might warrant a look later. Note2: I say 'full' above, if we end up at U*_MAX we'll still explode; maybe we should do clamping on add too. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yuyang Du <yuyang.du@intel.com> Cc: bsegall@google.com Cc: kernel@kyup.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: steve.muckle@linaro.org Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking") Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index a2348deab7a3..2ae68f0e3bf5 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2904,6 +2904,23 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq) } } +/* + * Unsigned subtract and clamp on underflow. + * + * Explicitly do a load-store to ensure the intermediate value never hits + * memory. This allows lockless observations without ever seeing the negative + * values. + */ +#define sub_positive(_ptr, _val) do { \ + typeof(_ptr) ptr = (_ptr); \ + typeof(*ptr) val = (_val); \ + typeof(*ptr) res, var = READ_ONCE(*ptr); \ + res = var - val; \ + if (res > var) \ + res = 0; \ + WRITE_ONCE(*ptr, res); \ +} while (0) + /* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq) @@ -2913,15 +2930,15 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq) if (atomic_long_read(&cfs_rq->removed_load_avg)) { s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0); - sa->load_avg = max_t(long, sa->load_avg - r, 0); - sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0); + sub_positive(&sa->load_avg, r); + sub_positive(&sa->load_sum, r * LOAD_AVG_MAX); removed_load = 1; } if (atomic_long_read(&cfs_rq->removed_util_avg)) { long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0); - sa->util_avg = max_t(long, sa->util_avg - r, 0); - sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0); + sub_positive(&sa->util_avg, r); + sub_positive(&sa->util_sum, r * LOAD_AVG_MAX); removed_util = 1; } @@ -2994,10 +3011,10 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s &se->avg, se->on_rq * scale_load_down(se->load.weight), cfs_rq->curr == se, NULL); - cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0); - cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0); - cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0); - cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0); + sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg); + sub_positive(&cfs_rq->avg.load_sum, se->avg.load_sum); + sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg); + sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum); cfs_rq_util_change(cfs_rq); } ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-06-22 6:13 ` Peter Zijlstra @ 2016-07-06 23:20 ` Campbell Steven 2016-07-07 7:42 ` Peter Zijlstra 0 siblings, 1 reply; 19+ messages in thread From: Campbell Steven @ 2016-07-06 23:20 UTC (permalink / raw) To: Peter Zijlstra Cc: Stefan Priebe - Profihost AG, Greg KH, Vlastimil Babka, LKML, linux-mm, linux-mm, Ingo Molnar, Rik van Riel On 22 June 2016 at 18:13, Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Jun 22, 2016 at 01:19:54PM +1200, Campbell Steven wrote: >> >>>>>>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load: >> >>>>>>> >> >>>>>>> update_cfs_rq_h_load(cfs_rq); >> >>>>>>> return div64_ul(p->se.avg.load_avg * cfs_rq->h_load, >> >>>>>>> cfs_rq_load_avg(cfs_rq) + 1); >> >>>>>>> > > > --- > commit 8974189222159154c55f24ddad33e3613960521a > Author: Peter Zijlstra <peterz@infradead.org> > Date: Thu Jun 16 10:50:40 2016 +0200 > > sched/fair: Fix cfs_rq avg tracking underflow > > As per commit: > > b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization") > > > the code generated from update_cfs_rq_load_avg(): > > > > if (atomic_long_read(&cfs_rq->removed_load_avg)) { > > s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0); > > sa->load_avg = max_t(long, sa->load_avg - r, 0); > > sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0); > > removed_load = 1; > > } Hi Peter, I just wanted to report back to say thanks for this, and we have (and others) have tested this out in 4.7 rc6 and have not been able to repeat the issue. It seems that anyone running busy ceph osd's or high load KVM instances is able to trigger this on a dual socket box pretty easily. Since these early reports from Stefan and I it looks like it's been hit but alot more folks now so I'd like to ask what the process is for getting this backported into 4.6, 4.5 and 4.4 as in our testing all those versions for their latest point release seem to have the same problem. Thanks Campbell ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-07-06 23:20 ` Campbell Steven @ 2016-07-07 7:42 ` Peter Zijlstra 2016-07-09 5:21 ` Greg KH 2016-07-11 22:33 ` Greg KH 0 siblings, 2 replies; 19+ messages in thread From: Peter Zijlstra @ 2016-07-07 7:42 UTC (permalink / raw) To: Campbell Steven Cc: Stefan Priebe - Profihost AG, Greg KH, Vlastimil Babka, LKML, linux-mm, linux-mm, Ingo Molnar, Rik van Riel On Thu, Jul 07, 2016 at 11:20:36AM +1200, Campbell Steven wrote: > > commit 8974189222159154c55f24ddad33e3613960521a > > Author: Peter Zijlstra <peterz@infradead.org> > > Date: Thu Jun 16 10:50:40 2016 +0200 > Since these early reports from Stefan and I it looks like it's been > hit but alot more folks now so I'd like to ask what the process is for > getting this backported into 4.6, 4.5 and 4.4 as in our testing all > those versions for their latest point release seem to have the same > problem. I think this should do; Greg is on Cc and will mark the commit somewhere. It is already in Linus' tree and should indeed be sufficient. It has a Fixes tag referring the commit that introduced it, which IIRC is somewhere around v4.2. Greg, anything else required? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-07-07 7:42 ` Peter Zijlstra @ 2016-07-09 5:21 ` Greg KH 2016-07-11 22:33 ` Greg KH 1 sibling, 0 replies; 19+ messages in thread From: Greg KH @ 2016-07-09 5:21 UTC (permalink / raw) To: Peter Zijlstra Cc: Campbell Steven, Stefan Priebe - Profihost AG, Vlastimil Babka, LKML, linux-mm, linux-mm, Ingo Molnar, Rik van Riel On Thu, Jul 07, 2016 at 09:42:32AM +0200, Peter Zijlstra wrote: > On Thu, Jul 07, 2016 at 11:20:36AM +1200, Campbell Steven wrote: > > > > commit 8974189222159154c55f24ddad33e3613960521a > > > Author: Peter Zijlstra <peterz@infradead.org> > > > Date: Thu Jun 16 10:50:40 2016 +0200 > > > Since these early reports from Stefan and I it looks like it's been > > hit but alot more folks now so I'd like to ask what the process is for > > getting this backported into 4.6, 4.5 and 4.4 as in our testing all > > those versions for their latest point release seem to have the same > > problem. > > I think this should do; Greg is on Cc and will mark the commit > somewhere. It is already in Linus' tree and should indeed be sufficient. > > It has a Fixes tag referring the commit that introduced it, which IIRC > is somewhere around v4.2. > > Greg, anything else required? Nope, that should be fine. greg k-h ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-07-07 7:42 ` Peter Zijlstra 2016-07-09 5:21 ` Greg KH @ 2016-07-11 22:33 ` Greg KH 2016-07-12 13:12 ` Peter Zijlstra 1 sibling, 1 reply; 19+ messages in thread From: Greg KH @ 2016-07-11 22:33 UTC (permalink / raw) To: Peter Zijlstra Cc: Campbell Steven, Stefan Priebe - Profihost AG, Vlastimil Babka, LKML, linux-mm, linux-mm, Ingo Molnar, Rik van Riel On Thu, Jul 07, 2016 at 09:42:32AM +0200, Peter Zijlstra wrote: > On Thu, Jul 07, 2016 at 11:20:36AM +1200, Campbell Steven wrote: > > > > commit 8974189222159154c55f24ddad33e3613960521a > > > Author: Peter Zijlstra <peterz@infradead.org> > > > Date: Thu Jun 16 10:50:40 2016 +0200 > > > Since these early reports from Stefan and I it looks like it's been > > hit but alot more folks now so I'd like to ask what the process is for > > getting this backported into 4.6, 4.5 and 4.4 as in our testing all > > those versions for their latest point release seem to have the same > > problem. > > I think this should do; Greg is on Cc and will mark the commit > somewhere. It is already in Linus' tree and should indeed be sufficient. > > It has a Fixes tag referring the commit that introduced it, which IIRC > is somewhere around v4.2. > > Greg, anything else required? Oops, this commit does not apply cleanly to 4.6 or 4.4-stable trees. Can someone send me the backported verision that they have tested to work properly so I can queue it up? thanks, greg k-h ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-07-11 22:33 ` Greg KH @ 2016-07-12 13:12 ` Peter Zijlstra 2016-07-13 0:26 ` Greg KH 0 siblings, 1 reply; 19+ messages in thread From: Peter Zijlstra @ 2016-07-12 13:12 UTC (permalink / raw) To: Greg KH Cc: Campbell Steven, Stefan Priebe - Profihost AG, Vlastimil Babka, LKML, linux-mm, linux-mm, Ingo Molnar, Rik van Riel [-- Attachment #1: Type: text/plain, Size: 454 bytes --] On Mon, Jul 11, 2016 at 03:33:53PM -0700, Greg KH wrote: > Oops, this commit does not apply cleanly to 4.6 or 4.4-stable trees. > Can someone send me the backported verision that they have tested to > work properly so I can queue it up? I've never actually been able to reproduce, but the attached patches apply, the reject was trivial. They seem to compile and boot on my main test rig, but nothing else was done but build the next kernel with it. [-- Attachment #2: sched-stable-4.4.patch --] [-- Type: text/x-diff, Size: 5885 bytes --] commit 8974189222159154c55f24ddad33e3613960521a Author: Peter Zijlstra <peterz@infradead.org> Date: Thu Jun 16 10:50:40 2016 +0200 sched/fair: Fix cfs_rq avg tracking underflow As per commit: b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization") > the code generated from update_cfs_rq_load_avg(): > > if (atomic_long_read(&cfs_rq->removed_load_avg)) { > s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0); > sa->load_avg = max_t(long, sa->load_avg - r, 0); > sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0); > removed_load = 1; > } > > turns into: > > ffffffff81087064: 49 8b 85 98 00 00 00 mov 0x98(%r13),%rax > ffffffff8108706b: 48 85 c0 test %rax,%rax > ffffffff8108706e: 74 40 je ffffffff810870b0 <update_blocked_averages+0xc0> > ffffffff81087070: 4c 89 f8 mov %r15,%rax > ffffffff81087073: 49 87 85 98 00 00 00 xchg %rax,0x98(%r13) > ffffffff8108707a: 49 29 45 70 sub %rax,0x70(%r13) > ffffffff8108707e: 4c 89 f9 mov %r15,%rcx > ffffffff81087081: bb 01 00 00 00 mov $0x1,%ebx > ffffffff81087086: 49 83 7d 70 00 cmpq $0x0,0x70(%r13) > ffffffff8108708b: 49 0f 49 4d 70 cmovns 0x70(%r13),%rcx > > Which you'll note ends up with sa->load_avg -= r in memory at > ffffffff8108707a. So I _should_ have looked at other unserialized users of ->load_avg, but alas. Luckily nikbor reported a similar /0 from task_h_load() which instantly triggered recollection of this here problem. Aside from the intermediate value hitting memory and causing problems, there's another problem: the underflow detection relies on the signed bit. This reduces the effective width of the variables, IOW its effectively the same as having these variables be of signed type. This patch changes to a different means of unsigned underflow detection to not rely on the signed bit. This allows the variables to use the 'full' unsigned range. And it does so with explicit LOAD - STORE to ensure any intermediate value will never be visible in memory, allowing these unserialized loads. Note: GCC generates crap code for this, might warrant a look later. Note2: I say 'full' above, if we end up at U*_MAX we'll still explode; maybe we should do clamping on add too. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yuyang Du <yuyang.du@intel.com> Cc: bsegall@google.com Cc: kernel@kyup.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: steve.muckle@linaro.org Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking") Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> --- kernel/sched/fair.c | 33 +++++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 8 deletions(-) --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2682,6 +2682,23 @@ static inline void update_tg_load_avg(st static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq); +/* + * Unsigned subtract and clamp on underflow. + * + * Explicitly do a load-store to ensure the intermediate value never hits + * memory. This allows lockless observations without ever seeing the negative + * values. + */ +#define sub_positive(_ptr, _val) do { \ + typeof(_ptr) ptr = (_ptr); \ + typeof(*ptr) val = (_val); \ + typeof(*ptr) res, var = READ_ONCE(*ptr); \ + res = var - val; \ + if (res > var) \ + res = 0; \ + WRITE_ONCE(*ptr, res); \ +} while (0) + /* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq) { @@ -2690,15 +2707,15 @@ static inline int update_cfs_rq_load_avg if (atomic_long_read(&cfs_rq->removed_load_avg)) { s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0); - sa->load_avg = max_t(long, sa->load_avg - r, 0); - sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0); + sub_positive(&sa->load_avg, r); + sub_positive(&sa->load_sum, r * LOAD_AVG_MAX); removed = 1; } if (atomic_long_read(&cfs_rq->removed_util_avg)) { long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0); - sa->util_avg = max_t(long, sa->util_avg - r, 0); - sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0); + sub_positive(&sa->util_avg, r); + sub_positive(&sa->util_sum, r * LOAD_AVG_MAX); } decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa, @@ -2764,10 +2781,10 @@ static void detach_entity_load_avg(struc &se->avg, se->on_rq * scale_load_down(se->load.weight), cfs_rq->curr == se, NULL); - cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0); - cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0); - cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0); - cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0); + sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg); + sub_positive(&cfs_rq->avg.load_sum, se->avg.load_sum); + sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg); + sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum); } /* Add the load generated by se into cfs_rq's load average */ [-- Attachment #3: sched-stable-4.6.patch --] [-- Type: text/x-diff, Size: 5732 bytes --] commit 8974189222159154c55f24ddad33e3613960521a Author: Peter Zijlstra <peterz@infradead.org> Date: Thu Jun 16 10:50:40 2016 +0200 sched/fair: Fix cfs_rq avg tracking underflow As per commit: b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization") > the code generated from update_cfs_rq_load_avg(): > > if (atomic_long_read(&cfs_rq->removed_load_avg)) { > s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0); > sa->load_avg = max_t(long, sa->load_avg - r, 0); > sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0); > removed_load = 1; > } > > turns into: > > ffffffff81087064: 49 8b 85 98 00 00 00 mov 0x98(%r13),%rax > ffffffff8108706b: 48 85 c0 test %rax,%rax > ffffffff8108706e: 74 40 je ffffffff810870b0 <update_blocked_averages+0xc0> > ffffffff81087070: 4c 89 f8 mov %r15,%rax > ffffffff81087073: 49 87 85 98 00 00 00 xchg %rax,0x98(%r13) > ffffffff8108707a: 49 29 45 70 sub %rax,0x70(%r13) > ffffffff8108707e: 4c 89 f9 mov %r15,%rcx > ffffffff81087081: bb 01 00 00 00 mov $0x1,%ebx > ffffffff81087086: 49 83 7d 70 00 cmpq $0x0,0x70(%r13) > ffffffff8108708b: 49 0f 49 4d 70 cmovns 0x70(%r13),%rcx > > Which you'll note ends up with sa->load_avg -= r in memory at > ffffffff8108707a. So I _should_ have looked at other unserialized users of ->load_avg, but alas. Luckily nikbor reported a similar /0 from task_h_load() which instantly triggered recollection of this here problem. Aside from the intermediate value hitting memory and causing problems, there's another problem: the underflow detection relies on the signed bit. This reduces the effective width of the variables, IOW its effectively the same as having these variables be of signed type. This patch changes to a different means of unsigned underflow detection to not rely on the signed bit. This allows the variables to use the 'full' unsigned range. And it does so with explicit LOAD - STORE to ensure any intermediate value will never be visible in memory, allowing these unserialized loads. Note: GCC generates crap code for this, might warrant a look later. Note2: I say 'full' above, if we end up at U*_MAX we'll still explode; maybe we should do clamping on add too. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yuyang Du <yuyang.du@intel.com> Cc: bsegall@google.com Cc: kernel@kyup.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: steve.muckle@linaro.org Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking") Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2821,6 +2821,23 @@ static inline void update_tg_load_avg(st static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq); +/* + * Unsigned subtract and clamp on underflow. + * + * Explicitly do a load-store to ensure the intermediate value never hits + * memory. This allows lockless observations without ever seeing the negative + * values. + */ +#define sub_positive(_ptr, _val) do { \ + typeof(_ptr) ptr = (_ptr); \ + typeof(*ptr) val = (_val); \ + typeof(*ptr) res, var = READ_ONCE(*ptr); \ + res = var - val; \ + if (res > var) \ + res = 0; \ + WRITE_ONCE(*ptr, res); \ +} while (0) + /* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq) { @@ -2829,15 +2846,15 @@ static inline int update_cfs_rq_load_avg if (atomic_long_read(&cfs_rq->removed_load_avg)) { s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0); - sa->load_avg = max_t(long, sa->load_avg - r, 0); - sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0); + sub_positive(&sa->load_avg, r); + sub_positive(&sa->load_sum, r * LOAD_AVG_MAX); removed = 1; } if (atomic_long_read(&cfs_rq->removed_util_avg)) { long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0); - sa->util_avg = max_t(long, sa->util_avg - r, 0); - sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0); + sub_positive(&sa->util_avg, r); + sub_positive(&sa->util_sum, r * LOAD_AVG_MAX); } decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa, @@ -2927,10 +2944,10 @@ static void detach_entity_load_avg(struc &se->avg, se->on_rq * scale_load_down(se->load.weight), cfs_rq->curr == se, NULL); - cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0); - cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0); - cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0); - cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0); + sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg); + sub_positive(&cfs_rq->avg.load_sum, se->avg.load_sum); + sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg); + sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum); } /* Add the load generated by se into cfs_rq's load average */ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-07-12 13:12 ` Peter Zijlstra @ 2016-07-13 0:26 ` Greg KH 0 siblings, 0 replies; 19+ messages in thread From: Greg KH @ 2016-07-13 0:26 UTC (permalink / raw) To: Peter Zijlstra Cc: Campbell Steven, Stefan Priebe - Profihost AG, Vlastimil Babka, LKML, linux-mm, linux-mm, Ingo Molnar, Rik van Riel On Tue, Jul 12, 2016 at 03:12:35PM +0200, Peter Zijlstra wrote: > On Mon, Jul 11, 2016 at 03:33:53PM -0700, Greg KH wrote: > > > Oops, this commit does not apply cleanly to 4.6 or 4.4-stable trees. > > Can someone send me the backported verision that they have tested to > > work properly so I can queue it up? > > I've never actually been able to reproduce, but the attached patches > apply, the reject was trivial. > > They seem to compile and boot on my main test rig, but nothing else was > done but build the next kernel with it. Thanks for these, now applied. greg k-h ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 @ 2016-06-21 12:13 Yannis Aribaud 2016-06-22 15:42 ` Yannis Aribaud 0 siblings, 1 reply; 19+ messages in thread From: Yannis Aribaud @ 2016-06-21 12:13 UTC (permalink / raw) To: linux-kernel Hi everyone, I recently it this bug in the kernel using a vanilla 4.6.2 release. It seems that somewhere in the load average calculation a division by 0 occurs (see the stack trace at the end). After digging a bit (be fair it's my first time) in the kernel sources, I found that we "recently" added the function cfs_rq_load_avg (commit 6f2b04524f0b38bfbb8413f98d2d6af234508309) and started using it in the function task_h_load which do a division with the value returned (kernel/sched/fair.c) like this: static unsigned long task_h_load(struct task_struct *p) { struct cfs_rq *cfs_rq = task_cfs_rq(p); update_cfs_rq_h_load(cfs_rq); return div64_ul(p->se.avg.load_avg * cfs_rq->h_load, cfs_rq_load_avg(cfs_rq) + 1); } But the load_avg filed from sched_avg struct is an atomic_long_t and the cfs_rq_load_avg returns this field as an unsigned long without doing any type conversion. static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq) { return cfs_rq->avg.load_avg; } I'm not an expert at all but I suspect that is the issue's origin. Shouldn't the function cfs_rq_load_avg use an atomic_long_read() to avoid this ? Here is the stack trace: [534814.112500] divide error: 0000 [#1] SMP [534814.112550] Modules linked in: vhost_net vhost macvtap macvlan ipmi_si mpt3sas raid_class scsi_transport_sas ipmi_devintf dell_rbu tun nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc bridge 8021q garp mrp stp llc bonding xfs libcrc32c bcache usbhid hid uhci_hcd ohci_hcd x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass ghash_clmulni_intel iTCO_wdt iTCO_vendor_support sha256_generic hmac drbg dcdbas ansi_cprng aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper shpchp evdev sb_edac edac_core ehci_pci ehci_hcd lpc_ich usbcore mfd_core usb_common ipmi_msghandler acpi_cpufreq wmi tpm_tis tpm processor acpi_power_meter button ext4 crc16 jbd2 mbcache sg sd_mod dm_mod crc32c_intel igb megaraid_sas i2c_algo_bit i2c_core dca ptp scsi_mod pps_core [last unloaded: ipmi_si] [534814.113345] CPU: 10 PID: 38568 Comm: ceph-osd Not tainted 4.6.2-ig1virt #16 [534814.113390] Hardware name: Dell Inc. PowerEdge R730xd/0H21J3, BIOS 1.1.4 11/03/2014 [534814.113458] task: ffff88100cf5ef00 ti: ffff8814827e0000 task.ti: ffff8814827e0000 [534814.113525] RIP: 0010:[<ffffffff8106cfd7>] [<ffffffff8106cfd7>] task_h_load+0x4f/0xc7 [534814.113613] RSP: 0000:ffff8814827e3c00 EFLAGS: 00010256 [534814.113654] RAX: 0000000000000000 RBX: 00000000000000d7 RCX: 0000000000000000 [534814.113720] RDX: 0000000000000000 RSI: ffff88103d8a5f00 RDI: ffff88100cf5ef00 [534814.113786] RBP: ffff8814827e3c90 R08: 0000000107f70c76 R09: 0000000000000000 [534814.113851] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001 [534814.113917] R13: 0000000000000015 R14: 0000000000000000 R15: ffff88207ec14580 [534814.113984] FS: 00007eff83cbb700(0000) GS:ffff88107f4a0000(0000) knlGS:0000000000000000 [534814.114053] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [534814.114095] CR2: 00000000146b07c0 CR3: 000000103f605000 CR4: 00000000001426e0 [534814.119456] Stack: [534814.119488] ffffffff8106fc4c ffff88103d639400 00000000000000d7 0000000000014580 [534814.119571] fffffffffffffe19 ffff88107f4a0000 00000000000000d7 0000000000000027 [534814.119653] ffff88100cf5ef00 000000000000025f 0000000000000100 0000000000000188 [534814.119736] Call Trace: [534814.119774] [<ffffffff8106fc4c>] ? task_numa_find_cpu+0x1d2/0x2ec [534814.119819] [<ffffffff8106fe86>] ? task_numa_migrate+0x120/0x328 [534814.119864] [<ffffffff81067829>] ? ttwu_do_wakeup+0xf/0xcd [534814.119907] [<ffffffff81071176>] ? task_numa_fault+0x912/0x9a9 [534814.119954] [<ffffffff81128568>] ? mpol_misplaced+0x138/0x14a [534814.120001] [<ffffffff8110f39d>] ? handle_mm_fault+0xe28/0xf31 [534814.120046] [<ffffffff8113db0b>] ? fput+0xd/0x81 [534814.120087] [<ffffffff8103cd91>] ? __do_page_fault+0x425/0x485 [534814.120131] [<ffffffff813c85a2>] ? page_fault+0x22/0x30 [534814.120171] Code: 63 92 38 09 00 00 48 8b 80 b8 00 00 00 48 8b 04 d0 75 1c 48 8b 86 b0 00 00 00 48 8b 4e 78 31 d2 48 0f af 87 58 01 00 00 48 ff c1 <48> f7 f1 c3 48 c7 86 c0 00 00 00 00 00 00 00 48 89 f1 eb 18 48 [534814.120582] RIP [<ffffffff8106cfd7>] task_h_load+0x4f/0xc7 [534814.120628] RSP <ffff8814827e3c00> [534814.121242] ---[ end trace ca72a3c25fb6f0dc ]--- Best regards, -- Yannis Aribaud -- Yannis Aribaud ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 2016-06-21 12:13 Yannis Aribaud @ 2016-06-22 15:42 ` Yannis Aribaud 0 siblings, 0 replies; 19+ messages in thread From: Yannis Aribaud @ 2016-06-22 15:42 UTC (permalink / raw) To: linux-kernel 21 juin 2016 14:13 "Yannis Aribaud" <bugs@d6bell.net> a écrit: > Hi everyone, > > I recently it this bug in the kernel using a vanilla 4.6.2 release. > It seems that somewhere in the load average calculation a division by 0 occurs (see the stack trace > at the end). > > [snipped] > > I'm not an expert at all but I suspect that is the issue's origin. Shouldn't the function > cfs_rq_load_avg use an atomic_long_read() to avoid this ? After digging a bit more, this can't be the problem as this function obviously can't return negative value. I found that it can maybe come from the update_cfs_rq_load_avg function in the following block: if (atomic_long_read(&cfs_rq->removed_load_avg)) { s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0); sa->load_avg = max_t(long, sa->load_avg - r, 0); sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0); removed_load = 1; } The max_t(long, sa->load_avg - r, 0) can result in a negative value keeped by the max_t function as the long would wrap up then generate a division by zero in task_h_load function. Best regards, -- Yannis Aribaud ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2016-07-13 0:28 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-03-17 18:38 divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 Stefan Priebe 2016-03-17 18:45 ` Greg KH 2016-03-19 22:26 ` Vlastimil Babka 2016-03-20 21:27 ` Stefan Priebe 2016-03-20 21:41 ` Greg KH 2016-03-21 10:52 ` Stefan Priebe - Profihost AG 2016-03-21 13:38 ` Greg KH 2016-05-17 6:01 ` Stefan Priebe - Profihost AG 2016-05-17 9:21 ` Campbell Steven 2016-06-22 1:19 ` Campbell Steven 2016-06-22 6:13 ` Peter Zijlstra 2016-07-06 23:20 ` Campbell Steven 2016-07-07 7:42 ` Peter Zijlstra 2016-07-09 5:21 ` Greg KH 2016-07-11 22:33 ` Greg KH 2016-07-12 13:12 ` Peter Zijlstra 2016-07-13 0:26 ` Greg KH 2016-06-21 12:13 Yannis Aribaud 2016-06-22 15:42 ` Yannis Aribaud
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).