linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
@ 2016-03-17 18:38 Stefan Priebe
  2016-03-17 18:45 ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Priebe @ 2016-03-17 18:38 UTC (permalink / raw)
  To: LKML, stable, linux-mm, linux-mm

Hi,

while running qemu 2.5 on a host running 4.4.6 the host system has 
crashed (load > 200) 3 times in the last 3 days.

Always with this stack trace: (copy left here: 
http://pastebin.com/raw/bCWTLKyt)

[69068.874268] divide error: 0000 [#1] SMP
[69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT 
nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter 
ip6_tables ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype 
xt_conntrack nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost 
macvtap macvlan kvm_intel nfnetlink_log kvm nfnetlink irqbypass 
netconsole dlm xt_multiport iptable_filter ip_tables x_tables iscsi_tcp 
libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss oid_registry 
bonding coretemp 8021q garp fuse i2c_i801 i7core_edac edac_core 
i5500_temp button btrfs xor raid6_pq dm_mod raid1 md_mod usb_storage 
ohci_hcd bcache sg usbhid sd_mod ata_generic uhci_hcd ehci_pci ehci_hcd 
usbcore ata_piix usb_common igb i2c_algo_bit mpt3sas raid_class ixgbe 
scsi_transport_sas i2c_core mdio ptp pps_core
[69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted 4.4.6+7-ph #1
[69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 
03/17/2012
[69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti: 
ffff880fda2c4000
[69068.900377] RIP: 0010:[<ffffffff860b372c>]  [<ffffffff860b372c>] 
task_h_load+0xcc/0x100
[69068.961763] RSP: 0000:ffff880fda2c7b50  EFLAGS: 00010257
[69069.023910] RAX: 0000000000000000 RBX: ffff880fda2c7c10 RCX: 
0000000000000000
[69069.085953] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 
ffff880fc7f28000
[69069.151731] RBP: ffff880fda2c7bc8 R08: 00000001041955df R09: 
ffff880fffd153f8
[69069.213757] R10: 0000000000000009 R11: 0000000000000193 R12: 
ffff881f6832c780
[69069.274271] R13: ffff88203fc35380 R14: 0000000000000007 R15: 
00000000000000b6
[69069.334727] FS:  00007f578a3fb700(0000) GS:ffff880fffd00000(0000) 
knlGS:0000000000000000
[69069.396435] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[69069.458522] CR2: 00007f5784f18468 CR3: 0000001fe9738000 CR4: 
00000000000026e0
[69069.520799] Stack:
[69069.581430]  ffffffff860b6855 ffff880fda2c7b78 0000000000000000 
0000000000000005
[69069.642629]  ffff880fffd00000 0000000000000015 fffffffffffffe7d 
0000000000000015
[69069.702815]  0000000000015380 ffff880fda2c7bc8 ffff880fc7f28000 
00000000000001e9
[69069.761881] Call Trace:
[69069.819883]  [<ffffffff860b6855>] ? task_numa_find_cpu+0x225/0x670
[69069.878368]  [<ffffffff860b79f0>] task_numa_migrate+0x550/0x950
[69069.936059]  [<ffffffff863d9138>] ? find_next_bit+0x18/0x20
[69069.993262]  [<ffffffff860b7e6d>] numa_migrate_preferred+0x7d/0x90
[69070.050528]  [<ffffffff860b89a5>] task_numa_fault+0x7c5/0xaa0
[69070.106544]  [<ffffffff861a2c0b>] ? mpol_misplaced+0x16b/0x1b0
[69070.163705]  [<ffffffff8618104e>] __handle_mm_fault+0x9ae/0x11f0
[69070.220013]  [<ffffffff865e4c52>] ? inet_recvmsg+0x72/0x90
[69070.276558]  [<ffffffff8655240b>] ? SYSC_recvfrom+0x12b/0x170
[69070.332283]  [<ffffffff8618196f>] handle_mm_fault+0xdf/0x180
[69070.388515]  [<ffffffff8604f324>] __do_page_fault+0x164/0x380
[69070.443897]  [<ffffffff860b25c3>] ? account_user_time+0x73/0x80
[69070.498534]  [<ffffffff860b2b3e>] ? vtime_account_user+0x4e/0x70
[69070.552598]  [<ffffffff8604f5a7>] do_page_fault+0x37/0x90
[69070.605960]  [<ffffffff86002a23>] ? syscall_return_slowpath+0x83/0xf0
[69070.660705]  [<ffffffff866b32f8>] page_fault+0x28/0x30
[69070.715707] Code: 86 b8 00 00 00 48 89 86 b0 00 00 00 48 85 c9 75 ca 
49 8b 81 b0 00 00 00 49 8b 49 78 31 d2 48 0f af 87 d8 01 00 00 5d 48 83 
c1 01 <48> f7 f1 c3 4c 89 ce 48 8b 8e c0 00 00 00 48 8b 46 78 4c 89 86
[69070.835144] RIP  [<ffffffff860b372c>] task_h_load+0xcc/0x100
[69070.894095]  RSP <ffff880fda2c7b50>
[69070.953213] ---[ end trace 8d6f449a03dacfd4 ]---

Would be nice if we can fix this in 4.4?

Greets,
Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-03-17 18:38 divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 Stefan Priebe
@ 2016-03-17 18:45 ` Greg KH
  2016-03-19 22:26   ` Vlastimil Babka
  0 siblings, 1 reply; 17+ messages in thread
From: Greg KH @ 2016-03-17 18:45 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: LKML, stable, linux-mm, linux-mm

On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote:
> Hi,
> 
> while running qemu 2.5 on a host running 4.4.6 the host system has crashed
> (load > 200) 3 times in the last 3 days.
> 
> Always with this stack trace: (copy left here:
> http://pastebin.com/raw/bCWTLKyt)
> 
> [69068.874268] divide error: 0000 [#1] SMP
> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT
> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4
> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack
> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan
> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm xt_multiport
> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi
> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp 8021q
> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor
> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid sd_mod
> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb
> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core mdio ptp
> pps_core
> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted 4.4.6+7-ph #1
> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 03/17/2012
> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti:
> ffff880fda2c4000
> [69068.900377] RIP: 0010:[<ffffffff860b372c>]  [<ffffffff860b372c>]
> task_h_load+0xcc/0x100
> [69068.961763] RSP: 0000:ffff880fda2c7b50  EFLAGS: 00010257
> [69069.023910] RAX: 0000000000000000 RBX: ffff880fda2c7c10 RCX:
> 0000000000000000
> [69069.085953] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
> ffff880fc7f28000
> [69069.151731] RBP: ffff880fda2c7bc8 R08: 00000001041955df R09:
> ffff880fffd153f8
> [69069.213757] R10: 0000000000000009 R11: 0000000000000193 R12:
> ffff881f6832c780
> [69069.274271] R13: ffff88203fc35380 R14: 0000000000000007 R15:
> 00000000000000b6
> [69069.334727] FS:  00007f578a3fb700(0000) GS:ffff880fffd00000(0000)
> knlGS:0000000000000000
> [69069.396435] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [69069.458522] CR2: 00007f5784f18468 CR3: 0000001fe9738000 CR4:
> 00000000000026e0
> [69069.520799] Stack:
> [69069.581430]  ffffffff860b6855 ffff880fda2c7b78 0000000000000000
> 0000000000000005
> [69069.642629]  ffff880fffd00000 0000000000000015 fffffffffffffe7d
> 0000000000000015
> [69069.702815]  0000000000015380 ffff880fda2c7bc8 ffff880fc7f28000
> 00000000000001e9
> [69069.761881] Call Trace:
> [69069.819883]  [<ffffffff860b6855>] ? task_numa_find_cpu+0x225/0x670
> [69069.878368]  [<ffffffff860b79f0>] task_numa_migrate+0x550/0x950
> [69069.936059]  [<ffffffff863d9138>] ? find_next_bit+0x18/0x20
> [69069.993262]  [<ffffffff860b7e6d>] numa_migrate_preferred+0x7d/0x90
> [69070.050528]  [<ffffffff860b89a5>] task_numa_fault+0x7c5/0xaa0
> [69070.106544]  [<ffffffff861a2c0b>] ? mpol_misplaced+0x16b/0x1b0
> [69070.163705]  [<ffffffff8618104e>] __handle_mm_fault+0x9ae/0x11f0
> [69070.220013]  [<ffffffff865e4c52>] ? inet_recvmsg+0x72/0x90
> [69070.276558]  [<ffffffff8655240b>] ? SYSC_recvfrom+0x12b/0x170
> [69070.332283]  [<ffffffff8618196f>] handle_mm_fault+0xdf/0x180
> [69070.388515]  [<ffffffff8604f324>] __do_page_fault+0x164/0x380
> [69070.443897]  [<ffffffff860b25c3>] ? account_user_time+0x73/0x80
> [69070.498534]  [<ffffffff860b2b3e>] ? vtime_account_user+0x4e/0x70
> [69070.552598]  [<ffffffff8604f5a7>] do_page_fault+0x37/0x90
> [69070.605960]  [<ffffffff86002a23>] ? syscall_return_slowpath+0x83/0xf0
> [69070.660705]  [<ffffffff866b32f8>] page_fault+0x28/0x30
> [69070.715707] Code: 86 b8 00 00 00 48 89 86 b0 00 00 00 48 85 c9 75 ca 49
> 8b 81 b0 00 00 00 49 8b 49 78 31 d2 48 0f af 87 d8 01 00 00 5d 48 83 c1 01
> <48> f7 f1 c3 4c 89 ce 48 8b 8e c0 00 00 00 48 8b 46 78 4c 89 86
> [69070.835144] RIP  [<ffffffff860b372c>] task_h_load+0xcc/0x100
> [69070.894095]  RSP <ffff880fda2c7b50>
> [69070.953213] ---[ end trace 8d6f449a03dacfd4 ]---
> 
> Would be nice if we can fix this in 4.4?

Does this also happen in 4.5?  Did this work on 4.4.5?  Some previous
release?  If you can find the offending patch using 'git bisect', that
would be great.

thanks,

greg k-h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-03-17 18:45 ` Greg KH
@ 2016-03-19 22:26   ` Vlastimil Babka
  2016-03-20 21:27     ` Stefan Priebe
  0 siblings, 1 reply; 17+ messages in thread
From: Vlastimil Babka @ 2016-03-19 22:26 UTC (permalink / raw)
  To: Greg KH, Stefan Priebe
  Cc: LKML, stable, linux-mm, linux-mm, Peter Zijlstra, Ingo Molnar,
	Rik van Riel

On 03/17/2016 07:45 PM, Greg KH wrote:
> On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote:
>> Hi,
>>
>> while running qemu 2.5 on a host running 4.4.6 the host system has crashed
>> (load > 200) 3 times in the last 3 days.
>>
>> Always with this stack trace: (copy left here:
>> http://pastebin.com/raw/bCWTLKyt)
>>
>> [69068.874268] divide error: 0000 [#1] SMP
>> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT
>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
>> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4
>> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack
>> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan
>> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm xt_multiport
>> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi
>> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp 8021q
>> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor
>> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid sd_mod
>> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb
>> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core mdio ptp
>> pps_core
>> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted 4.4.6+7-ph #1
>> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 03/17/2012
>> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti:
>> ffff880fda2c4000
>> [69068.900377] RIP: 0010:[<ffffffff860b372c>]  [<ffffffff860b372c>]
>> task_h_load+0xcc/0x100

decodecode says:

   27:   48 83 c1 01             add    $0x1,%rcx
   2b:*  48 f7 f1                div    %rcx             <-- trapping instruction

This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:

         update_cfs_rq_h_load(cfs_rq);
         return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
                         cfs_rq_load_avg(cfs_rq) + 1);

So the load avg is -1, thus after adding 1 we get division by 0, huh?

>> [69068.961763] RSP: 0000:ffff880fda2c7b50  EFLAGS: 00010257
>> [69069.023910] RAX: 0000000000000000 RBX: ffff880fda2c7c10 RCX:
>> 0000000000000000
>> [69069.085953] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
>> ffff880fc7f28000
>> [69069.151731] RBP: ffff880fda2c7bc8 R08: 00000001041955df R09:
>> ffff880fffd153f8
>> [69069.213757] R10: 0000000000000009 R11: 0000000000000193 R12:
>> ffff881f6832c780
>> [69069.274271] R13: ffff88203fc35380 R14: 0000000000000007 R15:
>> 00000000000000b6
>> [69069.334727] FS:  00007f578a3fb700(0000) GS:ffff880fffd00000(0000)
>> knlGS:0000000000000000
>> [69069.396435] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [69069.458522] CR2: 00007f5784f18468 CR3: 0000001fe9738000 CR4:
>> 00000000000026e0
>> [69069.520799] Stack:
>> [69069.581430]  ffffffff860b6855 ffff880fda2c7b78 0000000000000000
>> 0000000000000005
>> [69069.642629]  ffff880fffd00000 0000000000000015 fffffffffffffe7d
>> 0000000000000015
>> [69069.702815]  0000000000015380 ffff880fda2c7bc8 ffff880fc7f28000
>> 00000000000001e9
>> [69069.761881] Call Trace:
>> [69069.819883]  [<ffffffff860b6855>] ? task_numa_find_cpu+0x225/0x670
>> [69069.878368]  [<ffffffff860b79f0>] task_numa_migrate+0x550/0x950
>> [69069.936059]  [<ffffffff863d9138>] ? find_next_bit+0x18/0x20
>> [69069.993262]  [<ffffffff860b7e6d>] numa_migrate_preferred+0x7d/0x90
>> [69070.050528]  [<ffffffff860b89a5>] task_numa_fault+0x7c5/0xaa0
>> [69070.106544]  [<ffffffff861a2c0b>] ? mpol_misplaced+0x16b/0x1b0
>> [69070.163705]  [<ffffffff8618104e>] __handle_mm_fault+0x9ae/0x11f0
>> [69070.220013]  [<ffffffff865e4c52>] ? inet_recvmsg+0x72/0x90
>> [69070.276558]  [<ffffffff8655240b>] ? SYSC_recvfrom+0x12b/0x170
>> [69070.332283]  [<ffffffff8618196f>] handle_mm_fault+0xdf/0x180
>> [69070.388515]  [<ffffffff8604f324>] __do_page_fault+0x164/0x380
>> [69070.443897]  [<ffffffff860b25c3>] ? account_user_time+0x73/0x80
>> [69070.498534]  [<ffffffff860b2b3e>] ? vtime_account_user+0x4e/0x70
>> [69070.552598]  [<ffffffff8604f5a7>] do_page_fault+0x37/0x90
>> [69070.605960]  [<ffffffff86002a23>] ? syscall_return_slowpath+0x83/0xf0
>> [69070.660705]  [<ffffffff866b32f8>] page_fault+0x28/0x30
>> [69070.715707] Code: 86 b8 00 00 00 48 89 86 b0 00 00 00 48 85 c9 75 ca 49
>> 8b 81 b0 00 00 00 49 8b 49 78 31 d2 48 0f af 87 d8 01 00 00 5d 48 83 c1 01
>> <48> f7 f1 c3 4c 89 ce 48 8b 8e c0 00 00 00 48 8b 46 78 4c 89 86
>> [69070.835144] RIP  [<ffffffff860b372c>] task_h_load+0xcc/0x100
>> [69070.894095]  RSP <ffff880fda2c7b50>
>> [69070.953213] ---[ end trace 8d6f449a03dacfd4 ]---
>>
>> Would be nice if we can fix this in 4.4?
>
> Does this also happen in 4.5?  Did this work on 4.4.5?  Some previous
> release?  If you can find the offending patch using 'git bisect', that
> would be great.
>
> thanks,
>
> greg k-h
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-03-19 22:26   ` Vlastimil Babka
@ 2016-03-20 21:27     ` Stefan Priebe
  2016-03-20 21:41       ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Priebe @ 2016-03-20 21:27 UTC (permalink / raw)
  To: Vlastimil Babka, Greg KH
  Cc: LKML, stable, linux-mm, linux-mm, Peter Zijlstra, Ingo Molnar,
	Rik van Riel


Am 19.03.2016 um 23:26 schrieb Vlastimil Babka:
> On 03/17/2016 07:45 PM, Greg KH wrote:
>> On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote:
>>> Hi,
>>>
>>> while running qemu 2.5 on a host running 4.4.6 the host system has
>>> crashed
>>> (load > 200) 3 times in the last 3 days.
>>>
>>> Always with this stack trace: (copy left here:
>>> http://pastebin.com/raw/bCWTLKyt)
>>>
>>> [69068.874268] divide error: 0000 [#1] SMP
>>> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT
>>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter
>>> ip6_tables
>>> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4
>>> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack
>>> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan
>>> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm
>>> xt_multiport
>>> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi
>>> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp
>>> 8021q
>>> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor
>>> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid
>>> sd_mod
>>> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb
>>> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core
>>> mdio ptp
>>> pps_core
>>> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted
>>> 4.4.6+7-ph #1
>>> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1
>>> 03/17/2012
>>> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti:
>>> ffff880fda2c4000
>>> [69068.900377] RIP: 0010:[<ffffffff860b372c>]  [<ffffffff860b372c>]
>>> task_h_load+0xcc/0x100
>
> decodecode says:
>
>    27:   48 83 c1 01             add    $0x1,%rcx
>    2b:*  48 f7 f1                div    %rcx             <-- trapping
> instruction
>
> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:
>
>          update_cfs_rq_h_load(cfs_rq);
>          return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
>                          cfs_rq_load_avg(cfs_rq) + 1);
>
> So the load avg is -1, thus after adding 1 we get division by 0, huh?

Yes CONFIG_FAIR_GROUP_SCHED is set. I cherry picked now all those 
commits up to 4.5 for fair.c:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5

It didn't happen again with v4.4.6 + 4.5 patches for fair.c

Greets,
Stefan

>
>>> [69068.961763] RSP: 0000:ffff880fda2c7b50  EFLAGS: 00010257
>>> [69069.023910] RAX: 0000000000000000 RBX: ffff880fda2c7c10 RCX:
>>> 0000000000000000
>>> [69069.085953] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
>>> ffff880fc7f28000
>>> [69069.151731] RBP: ffff880fda2c7bc8 R08: 00000001041955df R09:
>>> ffff880fffd153f8
>>> [69069.213757] R10: 0000000000000009 R11: 0000000000000193 R12:
>>> ffff881f6832c780
>>> [69069.274271] R13: ffff88203fc35380 R14: 0000000000000007 R15:
>>> 00000000000000b6
>>> [69069.334727] FS:  00007f578a3fb700(0000) GS:ffff880fffd00000(0000)
>>> knlGS:0000000000000000
>>> [69069.396435] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [69069.458522] CR2: 00007f5784f18468 CR3: 0000001fe9738000 CR4:
>>> 00000000000026e0
>>> [69069.520799] Stack:
>>> [69069.581430]  ffffffff860b6855 ffff880fda2c7b78 0000000000000000
>>> 0000000000000005
>>> [69069.642629]  ffff880fffd00000 0000000000000015 fffffffffffffe7d
>>> 0000000000000015
>>> [69069.702815]  0000000000015380 ffff880fda2c7bc8 ffff880fc7f28000
>>> 00000000000001e9
>>> [69069.761881] Call Trace:
>>> [69069.819883]  [<ffffffff860b6855>] ? task_numa_find_cpu+0x225/0x670
>>> [69069.878368]  [<ffffffff860b79f0>] task_numa_migrate+0x550/0x950
>>> [69069.936059]  [<ffffffff863d9138>] ? find_next_bit+0x18/0x20
>>> [69069.993262]  [<ffffffff860b7e6d>] numa_migrate_preferred+0x7d/0x90
>>> [69070.050528]  [<ffffffff860b89a5>] task_numa_fault+0x7c5/0xaa0
>>> [69070.106544]  [<ffffffff861a2c0b>] ? mpol_misplaced+0x16b/0x1b0
>>> [69070.163705]  [<ffffffff8618104e>] __handle_mm_fault+0x9ae/0x11f0
>>> [69070.220013]  [<ffffffff865e4c52>] ? inet_recvmsg+0x72/0x90
>>> [69070.276558]  [<ffffffff8655240b>] ? SYSC_recvfrom+0x12b/0x170
>>> [69070.332283]  [<ffffffff8618196f>] handle_mm_fault+0xdf/0x180
>>> [69070.388515]  [<ffffffff8604f324>] __do_page_fault+0x164/0x380
>>> [69070.443897]  [<ffffffff860b25c3>] ? account_user_time+0x73/0x80
>>> [69070.498534]  [<ffffffff860b2b3e>] ? vtime_account_user+0x4e/0x70
>>> [69070.552598]  [<ffffffff8604f5a7>] do_page_fault+0x37/0x90
>>> [69070.605960]  [<ffffffff86002a23>] ? syscall_return_slowpath+0x83/0xf0
>>> [69070.660705]  [<ffffffff866b32f8>] page_fault+0x28/0x30
>>> [69070.715707] Code: 86 b8 00 00 00 48 89 86 b0 00 00 00 48 85 c9 75
>>> ca 49
>>> 8b 81 b0 00 00 00 49 8b 49 78 31 d2 48 0f af 87 d8 01 00 00 5d 48 83
>>> c1 01
>>> <48> f7 f1 c3 4c 89 ce 48 8b 8e c0 00 00 00 48 8b 46 78 4c 89 86
>>> [69070.835144] RIP  [<ffffffff860b372c>] task_h_load+0xcc/0x100
>>> [69070.894095]  RSP <ffff880fda2c7b50>
>>> [69070.953213] ---[ end trace 8d6f449a03dacfd4 ]---
>>>
>>> Would be nice if we can fix this in 4.4?
>>
>> Does this also happen in 4.5?  Did this work on 4.4.5?  Some previous
>> release?  If you can find the offending patch using 'git bisect', that
>> would be great.
>>
>> thanks,
>>
>> greg k-h
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-03-20 21:27     ` Stefan Priebe
@ 2016-03-20 21:41       ` Greg KH
  2016-03-21 10:52         ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 17+ messages in thread
From: Greg KH @ 2016-03-20 21:41 UTC (permalink / raw)
  To: Stefan Priebe
  Cc: Vlastimil Babka, LKML, stable, linux-mm, linux-mm,
	Peter Zijlstra, Ingo Molnar, Rik van Riel

On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote:
> 
> Am 19.03.2016 um 23:26 schrieb Vlastimil Babka:
> >On 03/17/2016 07:45 PM, Greg KH wrote:
> >>On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote:
> >>>Hi,
> >>>
> >>>while running qemu 2.5 on a host running 4.4.6 the host system has
> >>>crashed
> >>>(load > 200) 3 times in the last 3 days.
> >>>
> >>>Always with this stack trace: (copy left here:
> >>>http://pastebin.com/raw/bCWTLKyt)
> >>>
> >>>[69068.874268] divide error: 0000 [#1] SMP
> >>>[69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT
> >>>nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter
> >>>ip6_tables
> >>>ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4
> >>>nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack
> >>>nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan
> >>>kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm
> >>>xt_multiport
> >>>iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi
> >>>scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp
> >>>8021q
> >>>garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor
> >>>raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid
> >>>sd_mod
> >>>ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb
> >>>i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core
> >>>mdio ptp
> >>>pps_core
> >>>[69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted
> >>>4.4.6+7-ph #1
> >>>[69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1
> >>>03/17/2012
> >>>[69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti:
> >>>ffff880fda2c4000
> >>>[69068.900377] RIP: 0010:[<ffffffff860b372c>]  [<ffffffff860b372c>]
> >>>task_h_load+0xcc/0x100
> >
> >decodecode says:
> >
> >   27:   48 83 c1 01             add    $0x1,%rcx
> >   2b:*  48 f7 f1                div    %rcx             <-- trapping
> >instruction
> >
> >This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:
> >
> >         update_cfs_rq_h_load(cfs_rq);
> >         return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
> >                         cfs_rq_load_avg(cfs_rq) + 1);
> >
> >So the load avg is -1, thus after adding 1 we get division by 0, huh?
> 
> Yes CONFIG_FAIR_GROUP_SCHED is set. I cherry picked now all those commits up
> to 4.5 for fair.c:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5
> 
> It didn't happen again with v4.4.6 + 4.5 patches for fair.c

Ok, that's a lot of patches, how about figuring out which single patch,
or shortest number of patches, makes things work again?

thanks,

greg k-h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-03-20 21:41       ` Greg KH
@ 2016-03-21 10:52         ` Stefan Priebe - Profihost AG
  2016-03-21 13:38           ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-03-21 10:52 UTC (permalink / raw)
  To: Greg KH
  Cc: Vlastimil Babka, LKML, stable, linux-mm, linux-mm,
	Peter Zijlstra, Ingo Molnar, Rik van Riel


Am 20.03.2016 um 22:41 schrieb Greg KH:
> On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote:
>>
>> Am 19.03.2016 um 23:26 schrieb Vlastimil Babka:
>>> On 03/17/2016 07:45 PM, Greg KH wrote:
>>>> On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote:
>>>>> Hi,
>>>>>
>>>>> while running qemu 2.5 on a host running 4.4.6 the host system has
>>>>> crashed
>>>>> (load > 200) 3 times in the last 3 days.
>>>>>
>>>>> Always with this stack trace: (copy left here:
>>>>> http://pastebin.com/raw/bCWTLKyt)
>>>>>
>>>>> [69068.874268] divide error: 0000 [#1] SMP
>>>>> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT
>>>>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter
>>>>> ip6_tables
>>>>> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4
>>>>> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack
>>>>> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan
>>>>> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm
>>>>> xt_multiport
>>>>> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi
>>>>> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp
>>>>> 8021q
>>>>> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor
>>>>> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid
>>>>> sd_mod
>>>>> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb
>>>>> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core
>>>>> mdio ptp
>>>>> pps_core
>>>>> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted
>>>>> 4.4.6+7-ph #1
>>>>> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1
>>>>> 03/17/2012
>>>>> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti:
>>>>> ffff880fda2c4000
>>>>> [69068.900377] RIP: 0010:[<ffffffff860b372c>]  [<ffffffff860b372c>]
>>>>> task_h_load+0xcc/0x100
>>>
>>> decodecode says:
>>>
>>>   27:   48 83 c1 01             add    $0x1,%rcx
>>>   2b:*  48 f7 f1                div    %rcx             <-- trapping
>>> instruction
>>>
>>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:
>>>
>>>         update_cfs_rq_h_load(cfs_rq);
>>>         return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
>>>                         cfs_rq_load_avg(cfs_rq) + 1);
>>>
>>> So the load avg is -1, thus after adding 1 we get division by 0, huh?
>>
>> Yes CONFIG_FAIR_GROUP_SCHED is set. I cherry picked now all those commits up
>> to 4.5 for fair.c:
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5
>>
>> It didn't happen again with v4.4.6 + 4.5 patches for fair.c
> 
> Ok, that's a lot of patches, how about figuring out which single patch,
> or shortest number of patches, makes things work again?

will do so but it seems most out of those 9 patches are based on each
other. So it wouldn't be easy.

Stefan

> 
> thanks,
> 
> greg k-h
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-03-21 10:52         ` Stefan Priebe - Profihost AG
@ 2016-03-21 13:38           ` Greg KH
  2016-05-17  6:01             ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 17+ messages in thread
From: Greg KH @ 2016-03-21 13:38 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Vlastimil Babka, LKML, stable, linux-mm, linux-mm,
	Peter Zijlstra, Ingo Molnar, Rik van Riel

On Mon, Mar 21, 2016 at 11:52:23AM +0100, Stefan Priebe - Profihost AG wrote:
> 
> Am 20.03.2016 um 22:41 schrieb Greg KH:
> > On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote:
> >>
> >> Am 19.03.2016 um 23:26 schrieb Vlastimil Babka:
> >>> On 03/17/2016 07:45 PM, Greg KH wrote:
> >>>> On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote:
> >>>>> Hi,
> >>>>>
> >>>>> while running qemu 2.5 on a host running 4.4.6 the host system has
> >>>>> crashed
> >>>>> (load > 200) 3 times in the last 3 days.
> >>>>>
> >>>>> Always with this stack trace: (copy left here:
> >>>>> http://pastebin.com/raw/bCWTLKyt)
> >>>>>
> >>>>> [69068.874268] divide error: 0000 [#1] SMP
> >>>>> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT
> >>>>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter
> >>>>> ip6_tables
> >>>>> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4
> >>>>> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack
> >>>>> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan
> >>>>> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm
> >>>>> xt_multiport
> >>>>> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi
> >>>>> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp
> >>>>> 8021q
> >>>>> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor
> >>>>> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid
> >>>>> sd_mod
> >>>>> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb
> >>>>> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core
> >>>>> mdio ptp
> >>>>> pps_core
> >>>>> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted
> >>>>> 4.4.6+7-ph #1
> >>>>> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1
> >>>>> 03/17/2012
> >>>>> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti:
> >>>>> ffff880fda2c4000
> >>>>> [69068.900377] RIP: 0010:[<ffffffff860b372c>]  [<ffffffff860b372c>]
> >>>>> task_h_load+0xcc/0x100
> >>>
> >>> decodecode says:
> >>>
> >>>   27:   48 83 c1 01             add    $0x1,%rcx
> >>>   2b:*  48 f7 f1                div    %rcx             <-- trapping
> >>> instruction
> >>>
> >>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:
> >>>
> >>>         update_cfs_rq_h_load(cfs_rq);
> >>>         return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
> >>>                         cfs_rq_load_avg(cfs_rq) + 1);
> >>>
> >>> So the load avg is -1, thus after adding 1 we get division by 0, huh?
> >>
> >> Yes CONFIG_FAIR_GROUP_SCHED is set. I cherry picked now all those commits up
> >> to 4.5 for fair.c:
> >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5
> >>
> >> It didn't happen again with v4.4.6 + 4.5 patches for fair.c
> > 
> > Ok, that's a lot of patches, how about figuring out which single patch,
> > or shortest number of patches, makes things work again?
> 
> will do so but it seems most out of those 9 patches are based on each
> other. So it wouldn't be easy.

Worst case, we take all 9.  Best case, we only need one :)

thanks,

greg k-h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-03-21 13:38           ` Greg KH
@ 2016-05-17  6:01             ` Stefan Priebe - Profihost AG
  2016-05-17  9:21               ` Campbell Steven
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-05-17  6:01 UTC (permalink / raw)
  To: Greg KH
  Cc: Vlastimil Babka, LKML, Campbell Steven, linux-mm, linux-mm,
	Peter Zijlstra, Ingo Molnar, Rik van Riel

Am 21.03.2016 um 14:38 schrieb Greg KH:
> On Mon, Mar 21, 2016 at 11:52:23AM +0100, Stefan Priebe - Profihost AG wrote:
>>
>> Am 20.03.2016 um 22:41 schrieb Greg KH:
>>> On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote:
>>>>
>>>> Am 19.03.2016 um 23:26 schrieb Vlastimil Babka:
>>>>> On 03/17/2016 07:45 PM, Greg KH wrote:
>>>>>> On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> while running qemu 2.5 on a host running 4.4.6 the host system has
>>>>>>> crashed
>>>>>>> (load > 200) 3 times in the last 3 days.
>>>>>>>
>>>>>>> Always with this stack trace: (copy left here:
>>>>>>> http://pastebin.com/raw/bCWTLKyt)
>>>>>>>
>>>>>>> [69068.874268] divide error: 0000 [#1] SMP
>>>>>>> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT
>>>>>>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter
>>>>>>> ip6_tables
>>>>>>> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4
>>>>>>> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack
>>>>>>> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan
>>>>>>> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm
>>>>>>> xt_multiport
>>>>>>> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi
>>>>>>> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp
>>>>>>> 8021q
>>>>>>> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor
>>>>>>> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid
>>>>>>> sd_mod
>>>>>>> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb
>>>>>>> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core
>>>>>>> mdio ptp
>>>>>>> pps_core
>>>>>>> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted
>>>>>>> 4.4.6+7-ph #1
>>>>>>> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1
>>>>>>> 03/17/2012
>>>>>>> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti:
>>>>>>> ffff880fda2c4000
>>>>>>> [69068.900377] RIP: 0010:[<ffffffff860b372c>]  [<ffffffff860b372c>]
>>>>>>> task_h_load+0xcc/0x100
>>>>>
>>>>> decodecode says:
>>>>>
>>>>>   27:   48 83 c1 01             add    $0x1,%rcx
>>>>>   2b:*  48 f7 f1                div    %rcx             <-- trapping
>>>>> instruction
>>>>>
>>>>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:
>>>>>
>>>>>         update_cfs_rq_h_load(cfs_rq);
>>>>>         return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
>>>>>                         cfs_rq_load_avg(cfs_rq) + 1);
>>>>>
>>>>> So the load avg is -1, thus after adding 1 we get division by 0, huh?
>>>>
>>>> Yes CONFIG_FAIR_GROUP_SCHED is set. I cherry picked now all those commits up
>>>> to 4.5 for fair.c:
>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5
>>>>
>>>> It didn't happen again with v4.4.6 + 4.5 patches for fair.c
>>>
>>> Ok, that's a lot of patches, how about figuring out which single patch,
>>> or shortest number of patches, makes things work again?
>>
>> will do so but it seems most out of those 9 patches are based on each
>> other. So it wouldn't be easy.
> 
> Worst case, we take all 9.  Best case, we only need one :)

Today i got another report from: casteven (at) gmail.com having the same
issue. I wasn't able to figure out which of those 9 patches helps - i
still can just tell that it does not happen with this series on top of 4.4.

Stefan

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-05-17  6:01             ` Stefan Priebe - Profihost AG
@ 2016-05-17  9:21               ` Campbell Steven
  2016-06-22  1:19                 ` Campbell Steven
  0 siblings, 1 reply; 17+ messages in thread
From: Campbell Steven @ 2016-05-17  9:21 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Greg KH, Vlastimil Babka, LKML, linux-mm, linux-mm,
	Peter Zijlstra, Ingo Molnar, Rik van Riel

Thanks Stefan,

I am seeing this on 4.5.0 and 4.5.4 both are compiled from mainline
neither include any patches over and above the tree. I ran for well
over a month in production on 4.5.0 with no issues at all on a single
socket server (E5-2670 v3 @ 2.30GHz) but as soon as we try to run
either 4.5.0 or 4.5.4 on a dual socket box we get these traces after
12 or so hours of uptime and the same sky rocketing load. Given you
have stopped your issues by applying those patches to 4.4 it's
possible I have different issues, but the symptoms are too similar to
ignore, our trace is here:

http://pastebin.com/Q7nd8nfP

I am using the cfg scheduler, and not the fair scheduler as Stefan
reported so that is another difference.

Happy to proceed with any suggestions or further information that may
help isolate this, use case on these servers are 12 x CEPH osd's with
bcache so they are seeing alot of disk activity via NVMe and SATA
buses.

Thanks

Campbell

On 17 May 2016 at 18:01, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
> Am 21.03.2016 um 14:38 schrieb Greg KH:
>> On Mon, Mar 21, 2016 at 11:52:23AM +0100, Stefan Priebe - Profihost AG wrote:
>>>
>>> Am 20.03.2016 um 22:41 schrieb Greg KH:
>>>> On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote:
>>>>>
>>>>> Am 19.03.2016 um 23:26 schrieb Vlastimil Babka:
>>>>>> On 03/17/2016 07:45 PM, Greg KH wrote:
>>>>>>> On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> while running qemu 2.5 on a host running 4.4.6 the host system has
>>>>>>>> crashed
>>>>>>>> (load > 200) 3 times in the last 3 days.
>>>>>>>>
>>>>>>>> Always with this stack trace: (copy left here:
>>>>>>>> http://pastebin.com/raw/bCWTLKyt)
>>>>>>>>
>>>>>>>> [69068.874268] divide error: 0000 [#1] SMP
>>>>>>>> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT
>>>>>>>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter
>>>>>>>> ip6_tables
>>>>>>>> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4
>>>>>>>> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack
>>>>>>>> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan
>>>>>>>> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm
>>>>>>>> xt_multiport
>>>>>>>> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi
>>>>>>>> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp
>>>>>>>> 8021q
>>>>>>>> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor
>>>>>>>> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid
>>>>>>>> sd_mod
>>>>>>>> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb
>>>>>>>> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core
>>>>>>>> mdio ptp
>>>>>>>> pps_core
>>>>>>>> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted
>>>>>>>> 4.4.6+7-ph #1
>>>>>>>> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1
>>>>>>>> 03/17/2012
>>>>>>>> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti:
>>>>>>>> ffff880fda2c4000
>>>>>>>> [69068.900377] RIP: 0010:[<ffffffff860b372c>]  [<ffffffff860b372c>]
>>>>>>>> task_h_load+0xcc/0x100
>>>>>>
>>>>>> decodecode says:
>>>>>>
>>>>>>   27:   48 83 c1 01             add    $0x1,%rcx
>>>>>>   2b:*  48 f7 f1                div    %rcx             <-- trapping
>>>>>> instruction
>>>>>>
>>>>>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:
>>>>>>
>>>>>>         update_cfs_rq_h_load(cfs_rq);
>>>>>>         return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
>>>>>>                         cfs_rq_load_avg(cfs_rq) + 1);
>>>>>>
>>>>>> So the load avg is -1, thus after adding 1 we get division by 0, huh?
>>>>>
>>>>> Yes CONFIG_FAIR_GROUP_SCHED is set. I cherry picked now all those commits up
>>>>> to 4.5 for fair.c:
>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5
>>>>>
>>>>> It didn't happen again with v4.4.6 + 4.5 patches for fair.c
>>>>
>>>> Ok, that's a lot of patches, how about figuring out which single patch,
>>>> or shortest number of patches, makes things work again?
>>>
>>> will do so but it seems most out of those 9 patches are based on each
>>> other. So it wouldn't be easy.
>>
>> Worst case, we take all 9.  Best case, we only need one :)
>
> Today i got another report from: casteven (at) gmail.com having the same
> issue. I wasn't able to figure out which of those 9 patches helps - i
> still can just tell that it does not happen with this series on top of 4.4.
>
> Stefan

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-05-17  9:21               ` Campbell Steven
@ 2016-06-22  1:19                 ` Campbell Steven
  2016-06-22  6:13                   ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Campbell Steven @ 2016-06-22  1:19 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Greg KH, Vlastimil Babka, LKML, linux-mm, linux-mm,
	Peter Zijlstra, Ingo Molnar, Rik van Riel

On 17 May 2016 at 21:21, Campbell Steven <casteven@gmail.com> wrote:
> Thanks Stefan,
>
> I am seeing this on 4.5.0 and 4.5.4 both are compiled from mainline
> neither include any patches over and above the tree. I ran for well
> over a month in production on 4.5.0 with no issues at all on a single
> socket server (E5-2670 v3 @ 2.30GHz) but as soon as we try to run
> either 4.5.0 or 4.5.4 on a dual socket box we get these traces after
> 12 or so hours of uptime and the same sky rocketing load. Given you
> have stopped your issues by applying those patches to 4.4 it's
> possible I have different issues, but the symptoms are too similar to
> ignore, our trace is here:
>
> http://pastebin.com/Q7nd8nfP
>
> I am using the cfg scheduler, and not the fair scheduler as Stefan
> reported so that is another difference.
>
> Happy to proceed with any suggestions or further information that may
> help isolate this, use case on these servers are 12 x CEPH osd's with
> bcache so they are seeing alot of disk activity via NVMe and SATA
> buses.
>
> Thanks
>
> Campbell
>
> On 17 May 2016 at 18:01, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>> Am 21.03.2016 um 14:38 schrieb Greg KH:
>>> On Mon, Mar 21, 2016 at 11:52:23AM +0100, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> Am 20.03.2016 um 22:41 schrieb Greg KH:
>>>>> On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote:
>>>>>>
>>>>>> Am 19.03.2016 um 23:26 schrieb Vlastimil Babka:
>>>>>>> On 03/17/2016 07:45 PM, Greg KH wrote:
>>>>>>>> On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> while running qemu 2.5 on a host running 4.4.6 the host system has
>>>>>>>>> crashed
>>>>>>>>> (load > 200) 3 times in the last 3 days.
>>>>>>>>>
>>>>>>>>> Always with this stack trace: (copy left here:
>>>>>>>>> http://pastebin.com/raw/bCWTLKyt)
>>>>>>>>>
>>>>>>>>> [69068.874268] divide error: 0000 [#1] SMP
>>>>>>>>> [69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT
>>>>>>>>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter
>>>>>>>>> ip6_tables
>>>>>>>>> ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4
>>>>>>>>> nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack
>>>>>>>>> nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan
>>>>>>>>> kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm
>>>>>>>>> xt_multiport
>>>>>>>>> iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi
>>>>>>>>> scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp
>>>>>>>>> 8021q
>>>>>>>>> garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor
>>>>>>>>> raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid
>>>>>>>>> sd_mod
>>>>>>>>> ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb
>>>>>>>>> i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core
>>>>>>>>> mdio ptp
>>>>>>>>> pps_core
>>>>>>>>> [69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted
>>>>>>>>> 4.4.6+7-ph #1
>>>>>>>>> [69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1
>>>>>>>>> 03/17/2012
>>>>>>>>> [69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti:
>>>>>>>>> ffff880fda2c4000
>>>>>>>>> [69068.900377] RIP: 0010:[<ffffffff860b372c>]  [<ffffffff860b372c>]
>>>>>>>>> task_h_load+0xcc/0x100
>>>>>>>
>>>>>>> decodecode says:
>>>>>>>
>>>>>>>   27:   48 83 c1 01             add    $0x1,%rcx
>>>>>>>   2b:*  48 f7 f1                div    %rcx             <-- trapping
>>>>>>> instruction
>>>>>>>
>>>>>>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:
>>>>>>>
>>>>>>>         update_cfs_rq_h_load(cfs_rq);
>>>>>>>         return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
>>>>>>>                         cfs_rq_load_avg(cfs_rq) + 1);
>>>>>>>
>>>>>>> So the load avg is -1, thus after adding 1 we get division by 0, huh?
>>>>>>
>>>>>> Yes CONFIG_FAIR_GROUP_SCHED is set. I cherry picked now all those commits up
>>>>>> to 4.5 for fair.c:
>>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5
>>>>>>
>>>>>> It didn't happen again with v4.4.6 + 4.5 patches for fair.c
>>>>>
>>>>> Ok, that's a lot of patches, how about figuring out which single patch,
>>>>> or shortest number of patches, makes things work again?
>>>>
>>>> will do so but it seems most out of those 9 patches are based on each
>>>> other. So it wouldn't be easy.
>>>
>>> Worst case, we take all 9.  Best case, we only need one :)
>>
>> Today i got another report from: casteven (at) gmail.com having the same
>> issue. I wasn't able to figure out which of those 9 patches helps - i
>> still can just tell that it does not happen with this series on top of 4.4.
>>
>> Stefan

Just reinvigorating this thread based off yesterdays post by Yannis
here: https://lkml.org/lkml/headers/2016/6/21/491 of which the trace
is more or less is identical to what we see.

In addition to this we have since tested the 4.6.x releases and have
seen the same issues as we have in all the 4.4 and 4.5 releases we
tried. I have noticed a few commits in this area of the kernel into
the 4.7 series, curious if this may be solved already in 4.7 and if
there is a patch to 4.6 we could possibly test, our constraint is that
we are only able to reproduce this in our production environment it
seems, we never see it in our test/dev systems.

We have also found we can work around the issue by disabling all cores
on the second CPU socket (e.g. echo 1 >
/sys/devices/system/cpu/cpuX/online), doing this "fixes" this problem
for us but is obviously not ideal or long term but gives stability if
like us you need to run kernels in these series for various other
unrelated reasons and can do without your second socket.

Campbell

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-06-22  1:19                 ` Campbell Steven
@ 2016-06-22  6:13                   ` Peter Zijlstra
  2016-07-06 23:20                     ` Campbell Steven
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2016-06-22  6:13 UTC (permalink / raw)
  To: Campbell Steven
  Cc: Stefan Priebe - Profihost AG, Greg KH, Vlastimil Babka, LKML,
	linux-mm, linux-mm, Ingo Molnar, Rik van Riel

On Wed, Jun 22, 2016 at 01:19:54PM +1200, Campbell Steven wrote:
> >>>>>>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:
> >>>>>>>
> >>>>>>>         update_cfs_rq_h_load(cfs_rq);
> >>>>>>>         return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
> >>>>>>>                         cfs_rq_load_avg(cfs_rq) + 1);
> >>>>>>>


---
commit 8974189222159154c55f24ddad33e3613960521a
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Thu Jun 16 10:50:40 2016 +0200

    sched/fair: Fix cfs_rq avg tracking underflow
    
    As per commit:
    
      b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization")
    
    > the code generated from update_cfs_rq_load_avg():
    >
    > 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
    > 		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
    > 		sa->load_avg = max_t(long, sa->load_avg - r, 0);
    > 		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
    > 		removed_load = 1;
    > 	}
    >
    > turns into:
    >
    > ffffffff81087064:       49 8b 85 98 00 00 00    mov    0x98(%r13),%rax
    > ffffffff8108706b:       48 85 c0                test   %rax,%rax
    > ffffffff8108706e:       74 40                   je     ffffffff810870b0 <update_blocked_averages+0xc0>
    > ffffffff81087070:       4c 89 f8                mov    %r15,%rax
    > ffffffff81087073:       49 87 85 98 00 00 00    xchg   %rax,0x98(%r13)
    > ffffffff8108707a:       49 29 45 70             sub    %rax,0x70(%r13)
    > ffffffff8108707e:       4c 89 f9                mov    %r15,%rcx
    > ffffffff81087081:       bb 01 00 00 00          mov    $0x1,%ebx
    > ffffffff81087086:       49 83 7d 70 00          cmpq   $0x0,0x70(%r13)
    > ffffffff8108708b:       49 0f 49 4d 70          cmovns 0x70(%r13),%rcx
    >
    > Which you'll note ends up with sa->load_avg -= r in memory at
    > ffffffff8108707a.
    
    So I _should_ have looked at other unserialized users of ->load_avg,
    but alas. Luckily nikbor reported a similar /0 from task_h_load() which
    instantly triggered recollection of this here problem.
    
    Aside from the intermediate value hitting memory and causing problems,
    there's another problem: the underflow detection relies on the signed
    bit. This reduces the effective width of the variables, IOW its
    effectively the same as having these variables be of signed type.
    
    This patch changes to a different means of unsigned underflow
    detection to not rely on the signed bit. This allows the variables to
    use the 'full' unsigned range. And it does so with explicit LOAD -
    STORE to ensure any intermediate value will never be visible in
    memory, allowing these unserialized loads.
    
    Note: GCC generates crap code for this, might warrant a look later.
    
    Note2: I say 'full' above, if we end up at U*_MAX we'll still explode;
           maybe we should do clamping on add too.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Yuyang Du <yuyang.du@intel.com>
    Cc: bsegall@google.com
    Cc: kernel@kyup.com
    Cc: morten.rasmussen@arm.com
    Cc: pjt@google.com
    Cc: steve.muckle@linaro.org
    Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
    Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a2348deab7a3..2ae68f0e3bf5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2904,6 +2904,23 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
 	}
 }
 
+/*
+ * Unsigned subtract and clamp on underflow.
+ *
+ * Explicitly do a load-store to ensure the intermediate value never hits
+ * memory. This allows lockless observations without ever seeing the negative
+ * values.
+ */
+#define sub_positive(_ptr, _val) do {				\
+	typeof(_ptr) ptr = (_ptr);				\
+	typeof(*ptr) val = (_val);				\
+	typeof(*ptr) res, var = READ_ONCE(*ptr);		\
+	res = var - val;					\
+	if (res > var)						\
+		res = 0;					\
+	WRITE_ONCE(*ptr, res);					\
+} while (0)
+
 /* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */
 static inline int
 update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
@@ -2913,15 +2930,15 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
 
 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
 		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
-		sa->load_avg = max_t(long, sa->load_avg - r, 0);
-		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
+		sub_positive(&sa->load_avg, r);
+		sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
 		removed_load = 1;
 	}
 
 	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
 		long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
-		sa->util_avg = max_t(long, sa->util_avg - r, 0);
-		sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
+		sub_positive(&sa->util_avg, r);
+		sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
 		removed_util = 1;
 	}
 
@@ -2994,10 +3011,10 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 			  &se->avg, se->on_rq * scale_load_down(se->load.weight),
 			  cfs_rq->curr == se, NULL);
 
-	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
-	cfs_rq->avg.load_sum = max_t(s64,  cfs_rq->avg.load_sum - se->avg.load_sum, 0);
-	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
-	cfs_rq->avg.util_sum = max_t(s32,  cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+	sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
+	sub_positive(&cfs_rq->avg.load_sum, se->avg.load_sum);
+	sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
+	sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
 
 	cfs_rq_util_change(cfs_rq);
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-06-22  6:13                   ` Peter Zijlstra
@ 2016-07-06 23:20                     ` Campbell Steven
  2016-07-07  7:42                       ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Campbell Steven @ 2016-07-06 23:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stefan Priebe - Profihost AG, Greg KH, Vlastimil Babka, LKML,
	linux-mm, linux-mm, Ingo Molnar, Rik van Riel

On 22 June 2016 at 18:13, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jun 22, 2016 at 01:19:54PM +1200, Campbell Steven wrote:
>> >>>>>>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:
>> >>>>>>>
>> >>>>>>>         update_cfs_rq_h_load(cfs_rq);
>> >>>>>>>         return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
>> >>>>>>>                         cfs_rq_load_avg(cfs_rq) + 1);
>> >>>>>>>
>
>
> ---
> commit 8974189222159154c55f24ddad33e3613960521a
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Thu Jun 16 10:50:40 2016 +0200
>
>     sched/fair: Fix cfs_rq avg tracking underflow
>
>     As per commit:
>
>       b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization")
>
>     > the code generated from update_cfs_rq_load_avg():
>     >
>     >   if (atomic_long_read(&cfs_rq->removed_load_avg)) {
>     >           s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
>     >           sa->load_avg = max_t(long, sa->load_avg - r, 0);
>     >           sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
>     >           removed_load = 1;
>     >   }


Hi Peter,

I just wanted to report back to say thanks for this, and we have (and
others) have tested this out in 4.7 rc6 and have not been able to
repeat the issue. It seems that anyone running busy ceph osd's or high
load KVM instances is able to trigger this on a dual socket box pretty
easily.

Since these early reports from Stefan and I it looks like it's been
hit but alot more folks now so I'd like to ask what the process is for
getting this backported into 4.6, 4.5 and 4.4 as in our testing all
those versions for their latest point release seem to have the same
problem.

Thanks

Campbell

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-07-06 23:20                     ` Campbell Steven
@ 2016-07-07  7:42                       ` Peter Zijlstra
  2016-07-09  5:21                         ` Greg KH
  2016-07-11 22:33                         ` Greg KH
  0 siblings, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2016-07-07  7:42 UTC (permalink / raw)
  To: Campbell Steven
  Cc: Stefan Priebe - Profihost AG, Greg KH, Vlastimil Babka, LKML,
	linux-mm, linux-mm, Ingo Molnar, Rik van Riel

On Thu, Jul 07, 2016 at 11:20:36AM +1200, Campbell Steven wrote:

> > commit 8974189222159154c55f24ddad33e3613960521a
> > Author: Peter Zijlstra <peterz@infradead.org>
> > Date:   Thu Jun 16 10:50:40 2016 +0200

> Since these early reports from Stefan and I it looks like it's been
> hit but alot more folks now so I'd like to ask what the process is for
> getting this backported into 4.6, 4.5 and 4.4 as in our testing all
> those versions for their latest point release seem to have the same
> problem.

I think this should do; Greg is on Cc and will mark the commit
somewhere. It is already in Linus' tree and should indeed be sufficient.

It has a Fixes tag referring the commit that introduced it, which IIRC
is somewhere around v4.2.

Greg, anything else required?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-07-07  7:42                       ` Peter Zijlstra
@ 2016-07-09  5:21                         ` Greg KH
  2016-07-11 22:33                         ` Greg KH
  1 sibling, 0 replies; 17+ messages in thread
From: Greg KH @ 2016-07-09  5:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Campbell Steven, Stefan Priebe - Profihost AG, Vlastimil Babka,
	LKML, linux-mm, linux-mm, Ingo Molnar, Rik van Riel

On Thu, Jul 07, 2016 at 09:42:32AM +0200, Peter Zijlstra wrote:
> On Thu, Jul 07, 2016 at 11:20:36AM +1200, Campbell Steven wrote:
> 
> > > commit 8974189222159154c55f24ddad33e3613960521a
> > > Author: Peter Zijlstra <peterz@infradead.org>
> > > Date:   Thu Jun 16 10:50:40 2016 +0200
> 
> > Since these early reports from Stefan and I it looks like it's been
> > hit but alot more folks now so I'd like to ask what the process is for
> > getting this backported into 4.6, 4.5 and 4.4 as in our testing all
> > those versions for their latest point release seem to have the same
> > problem.
> 
> I think this should do; Greg is on Cc and will mark the commit
> somewhere. It is already in Linus' tree and should indeed be sufficient.
> 
> It has a Fixes tag referring the commit that introduced it, which IIRC
> is somewhere around v4.2.
> 
> Greg, anything else required?

Nope, that should be fine.

greg k-h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-07-07  7:42                       ` Peter Zijlstra
  2016-07-09  5:21                         ` Greg KH
@ 2016-07-11 22:33                         ` Greg KH
  2016-07-12 13:12                           ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Greg KH @ 2016-07-11 22:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Campbell Steven, Stefan Priebe - Profihost AG, Vlastimil Babka,
	LKML, linux-mm, linux-mm, Ingo Molnar, Rik van Riel

On Thu, Jul 07, 2016 at 09:42:32AM +0200, Peter Zijlstra wrote:
> On Thu, Jul 07, 2016 at 11:20:36AM +1200, Campbell Steven wrote:
> 
> > > commit 8974189222159154c55f24ddad33e3613960521a
> > > Author: Peter Zijlstra <peterz@infradead.org>
> > > Date:   Thu Jun 16 10:50:40 2016 +0200
> 
> > Since these early reports from Stefan and I it looks like it's been
> > hit but alot more folks now so I'd like to ask what the process is for
> > getting this backported into 4.6, 4.5 and 4.4 as in our testing all
> > those versions for their latest point release seem to have the same
> > problem.
> 
> I think this should do; Greg is on Cc and will mark the commit
> somewhere. It is already in Linus' tree and should indeed be sufficient.
> 
> It has a Fixes tag referring the commit that introduced it, which IIRC
> is somewhere around v4.2.
> 
> Greg, anything else required?

Oops, this commit does not apply cleanly to 4.6 or 4.4-stable trees.
Can someone send me the backported verision that they have tested to
work properly so I can queue it up?

thanks,

greg k-h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-07-11 22:33                         ` Greg KH
@ 2016-07-12 13:12                           ` Peter Zijlstra
  2016-07-13  0:26                             ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2016-07-12 13:12 UTC (permalink / raw)
  To: Greg KH
  Cc: Campbell Steven, Stefan Priebe - Profihost AG, Vlastimil Babka,
	LKML, linux-mm, linux-mm, Ingo Molnar, Rik van Riel

[-- Attachment #1: Type: text/plain, Size: 454 bytes --]

On Mon, Jul 11, 2016 at 03:33:53PM -0700, Greg KH wrote:

> Oops, this commit does not apply cleanly to 4.6 or 4.4-stable trees.
> Can someone send me the backported verision that they have tested to
> work properly so I can queue it up?

I've never actually been able to reproduce, but the attached patches
apply, the reject was trivial.

They seem to compile and boot on my main test rig, but nothing else was
done but build the next kernel with it.



[-- Attachment #2: sched-stable-4.4.patch --]
[-- Type: text/x-diff, Size: 5885 bytes --]

commit 8974189222159154c55f24ddad33e3613960521a
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Thu Jun 16 10:50:40 2016 +0200

    sched/fair: Fix cfs_rq avg tracking underflow
    
    As per commit:
    
      b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization")
    
    > the code generated from update_cfs_rq_load_avg():
    >
    > 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
    > 		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
    > 		sa->load_avg = max_t(long, sa->load_avg - r, 0);
    > 		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
    > 		removed_load = 1;
    > 	}
    >
    > turns into:
    >
    > ffffffff81087064:       49 8b 85 98 00 00 00    mov    0x98(%r13),%rax
    > ffffffff8108706b:       48 85 c0                test   %rax,%rax
    > ffffffff8108706e:       74 40                   je     ffffffff810870b0 <update_blocked_averages+0xc0>
    > ffffffff81087070:       4c 89 f8                mov    %r15,%rax
    > ffffffff81087073:       49 87 85 98 00 00 00    xchg   %rax,0x98(%r13)
    > ffffffff8108707a:       49 29 45 70             sub    %rax,0x70(%r13)
    > ffffffff8108707e:       4c 89 f9                mov    %r15,%rcx
    > ffffffff81087081:       bb 01 00 00 00          mov    $0x1,%ebx
    > ffffffff81087086:       49 83 7d 70 00          cmpq   $0x0,0x70(%r13)
    > ffffffff8108708b:       49 0f 49 4d 70          cmovns 0x70(%r13),%rcx
    >
    > Which you'll note ends up with sa->load_avg -= r in memory at
    > ffffffff8108707a.
    
    So I _should_ have looked at other unserialized users of ->load_avg,
    but alas. Luckily nikbor reported a similar /0 from task_h_load() which
    instantly triggered recollection of this here problem.
    
    Aside from the intermediate value hitting memory and causing problems,
    there's another problem: the underflow detection relies on the signed
    bit. This reduces the effective width of the variables, IOW its
    effectively the same as having these variables be of signed type.
    
    This patch changes to a different means of unsigned underflow
    detection to not rely on the signed bit. This allows the variables to
    use the 'full' unsigned range. And it does so with explicit LOAD -
    STORE to ensure any intermediate value will never be visible in
    memory, allowing these unserialized loads.
    
    Note: GCC generates crap code for this, might warrant a look later.
    
    Note2: I say 'full' above, if we end up at U*_MAX we'll still explode;
           maybe we should do clamping on add too.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Yuyang Du <yuyang.du@intel.com>
    Cc: bsegall@google.com
    Cc: kernel@kyup.com
    Cc: morten.rasmussen@arm.com
    Cc: pjt@google.com
    Cc: steve.muckle@linaro.org
    Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
    Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

---
 kernel/sched/fair.c |   33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2682,6 +2682,23 @@ static inline void update_tg_load_avg(st
 
 static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
 
+/*
+ * Unsigned subtract and clamp on underflow.
+ *
+ * Explicitly do a load-store to ensure the intermediate value never hits
+ * memory. This allows lockless observations without ever seeing the negative
+ * values.
+ */
+#define sub_positive(_ptr, _val) do {				\
+	typeof(_ptr) ptr = (_ptr);				\
+	typeof(*ptr) val = (_val);				\
+	typeof(*ptr) res, var = READ_ONCE(*ptr);		\
+	res = var - val;					\
+	if (res > var)						\
+		res = 0;					\
+	WRITE_ONCE(*ptr, res);					\
+} while (0)
+
 /* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */
 static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 {
@@ -2690,15 +2707,15 @@ static inline int update_cfs_rq_load_avg
 
 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
 		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
-		sa->load_avg = max_t(long, sa->load_avg - r, 0);
-		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
+		sub_positive(&sa->load_avg, r);
+		sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
 		removed = 1;
 	}
 
 	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
 		long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
-		sa->util_avg = max_t(long, sa->util_avg - r, 0);
-		sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
+		sub_positive(&sa->util_avg, r);
+		sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
 	}
 
 	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
@@ -2764,10 +2781,10 @@ static void detach_entity_load_avg(struc
 			  &se->avg, se->on_rq * scale_load_down(se->load.weight),
 			  cfs_rq->curr == se, NULL);
 
-	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
-	cfs_rq->avg.load_sum = max_t(s64,  cfs_rq->avg.load_sum - se->avg.load_sum, 0);
-	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
-	cfs_rq->avg.util_sum = max_t(s32,  cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+	sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
+	sub_positive(&cfs_rq->avg.load_sum, se->avg.load_sum);
+	sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
+	sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
 }
 
 /* Add the load generated by se into cfs_rq's load average */

[-- Attachment #3: sched-stable-4.6.patch --]
[-- Type: text/x-diff, Size: 5732 bytes --]

commit 8974189222159154c55f24ddad33e3613960521a
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Thu Jun 16 10:50:40 2016 +0200

    sched/fair: Fix cfs_rq avg tracking underflow

    As per commit:

      b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization")

    > the code generated from update_cfs_rq_load_avg():
    >
    > 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
    > 		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
    > 		sa->load_avg = max_t(long, sa->load_avg - r, 0);
    > 		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
    > 		removed_load = 1;
    > 	}
    >
    > turns into:
    >
    > ffffffff81087064:       49 8b 85 98 00 00 00    mov    0x98(%r13),%rax
    > ffffffff8108706b:       48 85 c0                test   %rax,%rax
    > ffffffff8108706e:       74 40                   je     ffffffff810870b0 <update_blocked_averages+0xc0>
    > ffffffff81087070:       4c 89 f8                mov    %r15,%rax
    > ffffffff81087073:       49 87 85 98 00 00 00    xchg   %rax,0x98(%r13)
    > ffffffff8108707a:       49 29 45 70             sub    %rax,0x70(%r13)
    > ffffffff8108707e:       4c 89 f9                mov    %r15,%rcx
    > ffffffff81087081:       bb 01 00 00 00          mov    $0x1,%ebx
    > ffffffff81087086:       49 83 7d 70 00          cmpq   $0x0,0x70(%r13)
    > ffffffff8108708b:       49 0f 49 4d 70          cmovns 0x70(%r13),%rcx
    >
    > Which you'll note ends up with sa->load_avg -= r in memory at
    > ffffffff8108707a.

    So I _should_ have looked at other unserialized users of ->load_avg,
    but alas. Luckily nikbor reported a similar /0 from task_h_load() which
    instantly triggered recollection of this here problem.

    Aside from the intermediate value hitting memory and causing problems,
    there's another problem: the underflow detection relies on the signed
    bit. This reduces the effective width of the variables, IOW its
    effectively the same as having these variables be of signed type.

    This patch changes to a different means of unsigned underflow
    detection to not rely on the signed bit. This allows the variables to
    use the 'full' unsigned range. And it does so with explicit LOAD -
    STORE to ensure any intermediate value will never be visible in
    memory, allowing these unserialized loads.

    Note: GCC generates crap code for this, might warrant a look later.

    Note2: I say 'full' above, if we end up at U*_MAX we'll still explode;
           maybe we should do clamping on add too.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Yuyang Du <yuyang.du@intel.com>
    Cc: bsegall@google.com
    Cc: kernel@kyup.com
    Cc: morten.rasmussen@arm.com
    Cc: pjt@google.com
    Cc: steve.muckle@linaro.org
    Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
    Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2821,6 +2821,23 @@ static inline void update_tg_load_avg(st
 
 static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
 
+/*
+ * Unsigned subtract and clamp on underflow.
+ *
+ * Explicitly do a load-store to ensure the intermediate value never hits
+ * memory. This allows lockless observations without ever seeing the negative
+ * values.
+ */
+#define sub_positive(_ptr, _val) do {				\
+	typeof(_ptr) ptr = (_ptr);				\
+	typeof(*ptr) val = (_val);				\
+	typeof(*ptr) res, var = READ_ONCE(*ptr);		\
+	res = var - val;					\
+	if (res > var)						\
+		res = 0;					\
+	WRITE_ONCE(*ptr, res);					\
+} while (0)
+
 /* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */
 static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 {
@@ -2829,15 +2846,15 @@ static inline int update_cfs_rq_load_avg
 
 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
 		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
-		sa->load_avg = max_t(long, sa->load_avg - r, 0);
-		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
+		sub_positive(&sa->load_avg, r);
+		sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
 		removed = 1;
 	}
 
 	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
 		long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
-		sa->util_avg = max_t(long, sa->util_avg - r, 0);
-		sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
+		sub_positive(&sa->util_avg, r);
+		sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
 	}
 
 	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
@@ -2927,10 +2944,10 @@ static void detach_entity_load_avg(struc
 			  &se->avg, se->on_rq * scale_load_down(se->load.weight),
 			  cfs_rq->curr == se, NULL);
 
-	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
-	cfs_rq->avg.load_sum = max_t(s64,  cfs_rq->avg.load_sum - se->avg.load_sum, 0);
-	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
-	cfs_rq->avg.util_sum = max_t(s32,  cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+	sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
+	sub_positive(&cfs_rq->avg.load_sum, se->avg.load_sum);
+	sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
+	sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
 }
 
 /* Add the load generated by se into cfs_rq's load average */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
  2016-07-12 13:12                           ` Peter Zijlstra
@ 2016-07-13  0:26                             ` Greg KH
  0 siblings, 0 replies; 17+ messages in thread
From: Greg KH @ 2016-07-13  0:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Campbell Steven, Stefan Priebe - Profihost AG, Vlastimil Babka,
	LKML, linux-mm, linux-mm, Ingo Molnar, Rik van Riel

On Tue, Jul 12, 2016 at 03:12:35PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 11, 2016 at 03:33:53PM -0700, Greg KH wrote:
> 
> > Oops, this commit does not apply cleanly to 4.6 or 4.4-stable trees.
> > Can someone send me the backported verision that they have tested to
> > work properly so I can queue it up?
> 
> I've never actually been able to reproduce, but the attached patches
> apply, the reject was trivial.
> 
> They seem to compile and boot on my main test rig, but nothing else was
> done but build the next kernel with it.

Thanks for these, now applied.

greg k-h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-07-13  0:28 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-17 18:38 divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6 Stefan Priebe
2016-03-17 18:45 ` Greg KH
2016-03-19 22:26   ` Vlastimil Babka
2016-03-20 21:27     ` Stefan Priebe
2016-03-20 21:41       ` Greg KH
2016-03-21 10:52         ` Stefan Priebe - Profihost AG
2016-03-21 13:38           ` Greg KH
2016-05-17  6:01             ` Stefan Priebe - Profihost AG
2016-05-17  9:21               ` Campbell Steven
2016-06-22  1:19                 ` Campbell Steven
2016-06-22  6:13                   ` Peter Zijlstra
2016-07-06 23:20                     ` Campbell Steven
2016-07-07  7:42                       ` Peter Zijlstra
2016-07-09  5:21                         ` Greg KH
2016-07-11 22:33                         ` Greg KH
2016-07-12 13:12                           ` Peter Zijlstra
2016-07-13  0:26                             ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).