From mboxrd@z Thu Jan 1 00:00:00 1970
From: Boris Ostrovsky
Subject: Re: 3.16-rc2 WARNING: CPU: 0 PID: 0 at arch/x86/xen/multicalls.c:129 xen_mc_flush+0x1ab/0x1c0()
Date: Mon, 30 Jun 2014 12:29:30 -0400
Message-ID: <53B1906A.8040206@oracle.com>
References: <1683024832.20140626135707@eikelenboom.it> <53AC476B.1000305@oracle.com> <20140627141530.GC12518@laptop.dumpdata.com> <24264213.20140629115349@eikelenboom.it>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta4.messagelabs.com ([85.158.143.247]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1X1eQh-0003sb-Kq for xen-devel@lists.xenproject.org; Mon, 30 Jun 2014 16:27:43 +0000
In-Reply-To: <24264213.20140629115349@eikelenboom.it>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Sander Eikelenboom
Cc: xen-devel@lists.xenproject.org, David Vrabel
List-Id: xen-devel@lists.xenproject.org

On 06/29/2014 05:53 AM, Sander Eikelenboom wrote:
> Friday, June 27, 2014, 4:15:30 PM, you wrote:
>
>> On Thu, Jun 26, 2014 at 12:16:43PM -0400, Boris Ostrovsky wrote:
>>> On 06/26/2014 07:57 AM, Sander Eikelenboom wrote:
>>>> Hi,
>>>>
>>>> Got the warning below on dom0 boot with a 3.16-rc2 kernel, doesn't seem to do
>>>> much immediate harm:
>>>>
>>>> [ 12.723393] Calibrating delay loop (skipped), value calculated using timer frequency.. 6402.05 BogoMIPS (lpj=10667280)
>>>> [ 12.723414] pid_max: default: 32768 minimum: 301
>>>> [ 12.723433] ACPI: Core revision 20140424
>>>> [ 12.740681] ACPI: All ACPI Tables successfully acquired
>>>> [ 12.743145] ------------[ cut here ]------------
>>>> [ 12.743170] WARNING: CPU: 0 PID: 0 at arch/x86/xen/multicalls.c:129 xen_mc_flush+0x1ab/0x1c0()
>>>> [ 12.743186] Modules linked in:
>>>> [ 12.743196] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc2-20140626+ #1
>>>> [ 12.743209] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
>>>> [ 12.743223] 0000000000000009 ffffffff82203d88 ffffffff81afdba7 ffffffff822184e0
>>>> [ 12.743242] 0000000000000000 ffffffff82203dc8 ffffffff810bfdb2 ffffffff81007858
>>>> [ 12.743261] 0000000000000000 0000000000000001 0000000000000000 ffff88005f60b120
>>>> [ 12.743280] Call Trace:
>>>> [ 12.743290] [] dump_stack+0x46/0x58
>>>> [ 12.743302] [] warn_slowpath_common+0x82/0xb0
>>>> [ 12.743316] [] ? pte_pfn_to_mfn+0x88/0xa0
>>>> [ 12.743328] [] warn_slowpath_null+0x15/0x20
>>>> [ 12.743341] [] xen_mc_flush+0x1ab/0x1c0
>>>> [ 12.743353] [] xen_alloc_pte+0x1ad/0x220
>>>> [ 12.743364] [] ? xen_make_pte+0x1b/0x70
>>>> [ 12.743378] [] init_espfix_ap+0x22f/0x3f0
>>>> [ 12.743392] [] init_espfix_bsp+0xee/0xf3
>>>> [ 12.743404] [] start_kernel+0x3d6/0x441
>>>> [ 12.743416] [] ? set_init_arg+0x58/0x58
>>>> [ 12.743428] [] x86_64_start_reservations+0x2a/0x2c
>>>> [ 12.743441] [] xen_start_kernel+0x59b/0x59d
>>>> [ 12.743468] ---[ end trace 20d3292a87f35842 ]---
>>>> [ 12.743943] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
>>>> [ 12.744726] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
>>>> [ 12.745043] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
>>>> [ 12.745064] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes)
>>>> [ 12.745709] Initializing cgroup subsys freezer
>>>> [ 12.745733] Initializing cgroup subsys blkio
>>>>
>>>> If you need more info / complete logs, give me a call.
>>> This is caused by hpa's ESP fix for 16-bit programs leaking SP bits from
>>> 64-bit kernels (commit 3891a04aafd668686239349ea58f3314ea2af86b).
>>>
>>> It's harmless in the sense that the only effect is that the workaround won't
>>> get enabled.
>> But oddly enough it only shows up on certain machines. I am wondering
>> if the issue is some E820 is not rounded up to a page size and we allocate
>> for the ESPFIX an not-4KB (ie, we cross-over) page. Hopefully the E820
>> will shed the light.
> I think it could be related to this in xl dmesg (sorry .. somehow didn't notice
> it before):
>
> (XEN) [2014-06-29 09:43:48.956] mm.c:1215:d0v0 Global bit is set to kernel page 7fe8d2ffb9
> (XEN) [2014-06-29 09:43:48.956] mm.c:766:d0v0 Bad L1 flags 400000
> (XEN) [2014-06-29 09:43:48.956] mm.c:1222:d0v0 Failure in alloc_l1_table: entry 3
> (XEN) [2014-06-29 09:43:48.956] mm.c:2100:d0v0 Error while validating mfn 5469ad (pfn 59850) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
> (XEN) [2014-06-29 09:43:48.956] mm.c:2996:d0v0 Error while pinning mfn 5469ad

It almost certainly is related. Most likely result of espfix code
creating translations for the page that it allocates. At least that's
what it was last time I looked at this.

-boris

>