From: Greg KH <greg@kroah.com>
To: Stefan Priebe <s.priebe@profihost.ag>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	LKML <linux-kernel@vger.kernel.org>,
	stable <stable@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	linux-mm@vger.kernel.org, Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Rik van Riel <riel@redhat.com>
Subject: Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6
Date: Sun, 20 Mar 2016 17:41:30 -0400
Message-ID: <20160320214130.GB23920@kroah.com>
In-Reply-To: <56EF15BB.3080509@profihost.ag>

On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote:
> 
> On 19.03.2016 at 23:26, Vlastimil Babka wrote:
> >On 03/17/2016 07:45 PM, Greg KH wrote:
> >>On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote:
> >>>Hi,
> >>>
> >>>While running qemu 2.5 on a host running 4.4.6, the host system has
> >>>crashed (load > 200) 3 times in the last 3 days.
> >>>
> >>>It always crashes with this stack trace (a copy is here:
> >>>http://pastebin.com/raw/bCWTLKyt):
> >>>
> >>>[69068.874268] divide error: 0000 [#1] SMP
> >>>[69068.875242] Modules linked in: ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_conntrack nf_conntrack ip_set_hash_net ip_set vhost_net tun vhost macvtap macvlan kvm_intel nfnetlink_log kvm nfnetlink irqbypass netconsole dlm xt_multiport iptable_filter ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss oid_registry bonding coretemp 8021q garp fuse i2c_i801 i7core_edac edac_core i5500_temp button btrfs xor raid6_pq dm_mod raid1 md_mod usb_storage ohci_hcd bcache sg usbhid sd_mod ata_generic uhci_hcd ehci_pci ehci_hcd usbcore ata_piix usb_common igb i2c_algo_bit mpt3sas raid_class ixgbe scsi_transport_sas i2c_core mdio ptp pps_core
> >>>[69068.895604] CPU: 14 PID: 6673 Comm: ceph-osd Not tainted 4.4.6+7-ph #1
> >>>[69068.897052] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 03/17/2012
> >>>[69068.898578] task: ffff880fc7f28000 ti: ffff880fda2c4000 task.ti: ffff880fda2c4000
> >>>[69068.900377] RIP: 0010:[<ffffffff860b372c>]  [<ffffffff860b372c>] task_h_load+0xcc/0x100
> >
> >decodecode says:
> >
> >   27:   48 83 c1 01             add    $0x1,%rcx
> >   2b:*  48 f7 f1                div    %rcx             <-- trapping instruction
> >
> >This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:
> >
> >         update_cfs_rq_h_load(cfs_rq);
> >         return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
> >                         cfs_rq_load_avg(cfs_rq) + 1);
> >
> >So the load avg is -1, thus after adding 1 we get division by 0, huh?
> 
> Yes, CONFIG_FAIR_GROUP_SCHED is set. I have now cherry-picked all the commits
> to fair.c up to 4.5:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/kernel/sched/fair.c?h=v4.5
> 
> It hasn't happened again with v4.4.6 plus the 4.5 patches for fair.c.

Ok, that's a lot of patches. How about figuring out which single patch,
or smallest set of patches, makes things work again?

thanks,

greg k-h
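A note on Vlastimil's reasoning above. cfs_rq_load_avg() returns an unsigned long, so a load average of "-1" is really ULONG_MAX. ULONG_MAX + 1 wraps to 0, and that 0 is the %rcx that the div instruction traps on. The following is a minimal user-space sketch in C, not kernel code, and it assumes unsigned long arithmetic as on x86-64. The names h_load_divisor(), h_load_checked(), sub_unclamped() and sub_clamped() are hypothetical and exist only for illustration. The guard and the clamped subtraction show one defensive pattern. They are not the upstream fix, and this message does not establish how the underflow actually happened.

```c
/*
 * User-space illustration (not kernel code) of why the divisor in
 * task_h_load(), cfs_rq_load_avg(cfs_rq) + 1, can become 0.
 * Assumption: load averages are unsigned long, so "-1" means ULONG_MAX.
 * All function names below are hypothetical.
 */
#include <limits.h>
#include <stdbool.h>

/* Mimics the divisor passed to div64_ul() in task_h_load(). */
static unsigned long h_load_divisor(unsigned long cfs_rq_load_avg)
{
	return cfs_rq_load_avg + 1;	/* wraps to 0 when load_avg == ULONG_MAX */
}

/*
 * Hypothetical guarded division: reports failure instead of dividing by 0.
 * Plain '/' stands in for div64_ul() in user space.
 */
static bool h_load_checked(unsigned long se_load_avg, unsigned long h_load,
			   unsigned long cfs_rq_load_avg, unsigned long *out)
{
	unsigned long div = h_load_divisor(cfs_rq_load_avg);

	if (div == 0)
		return false;
	*out = se_load_avg * h_load / div;
	return true;
}

/* Removing more load than is present wraps around to a huge value. */
static unsigned long sub_unclamped(unsigned long avg, unsigned long contrib)
{
	return avg - contrib;
}

/* Hypothetical clamped variant: never goes below 0. */
static unsigned long sub_clamped(unsigned long avg, unsigned long contrib)
{
	return avg > contrib ? avg - contrib : 0;
}
```

For example, sub_unclamped(10, 11) yields ULONG_MAX, and h_load_divisor() of that value is 0. By contrast, sub_clamped(10, 11) yields 0, and the divisor stays 1.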
