* GPF in mcheck_init() when booting xen-unstable on VMware ESX 5.1
@ 2013-05-31 19:19 Aravindh Puthiyaparambil (aravindp)
  2013-05-31 19:32 ` Andrew Cooper
  0 siblings, 1 reply; 4+ messages in thread
From: Aravindh Puthiyaparambil (aravindp) @ 2013-05-31 19:19 UTC
  To: xen-devel

I am trying to boot xen-unstable (9204bc654562976c7cdebf21c6b5013f6e3057b3) on VMware ESX 5.1 and Workstation 9. I have enabled the "Virtualize Intel VT-x/EPT" option. I am seeing the following GPF during boot:

(XEN) mce_intel.c:717: MCA Capability: BCAST 1 SER 0 CMCI 0 firstbank 0 extended MCE MSR 0
(XEN) Intel machine check reporting enabled
(XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c4c01aa71e>] mcheck_init+0x38a/0x45d
(XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff82c4c026ca80   rcx: 0000000000000000
(XEN) rdx: ffff83001d6b2fe0   rsi: bad0bad0bad0bad0   rdi: bad0bad0bad0bad0
(XEN) rbp: ffff82c4c02cfe08   rsp: ffff82c4c02cfde8   r8:  ffff8300000b8f00
(XEN) r9:  0000000000000010   r10: bad0bad0bad0bad0   r11: 0000000000000010
(XEN) r12: ffff83001ffd9fe0   r13: 0000000000000000   r14: ffff82c4c02c8000
(XEN) r15: ffff83000008efb0   cr0: 000000008005003b   cr4: 00000000000400f0
(XEN) cr3: 000000001fc7b000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff82c4c02cfde8:
(XEN)    0000000000000000 ffff82c4c026ca80 0000000080000008 00000000ffffffff
(XEN)    ffff82c4c02cfe48 ffff82c4c01a7356 1fabfbff000206a7 0000000096ba2223
(XEN)    ffff83001ffd9820 0000000000000002 ffff83001ffd9820 ffff82c4c02c8000
(XEN)    ffff82c4c02cff08 ffff82c4c02a4536 0000000200000000 0000000000000000
(XEN)    ffff83000008ed90 00000000011fb000 0000000000100000 ffff83000008efb0
(XEN)    0000000000000000 ffff83000051bc90 ffff830000000010 ffff8300ffffff00
(XEN)    ffff83000008ef40 ffff82c400000001 0000000800000000 000000010000006e
(XEN)    0000000000000003 00000000000002f8 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffff82c4c01000b5 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffff83001d6b0000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c4c01aa71e>] mcheck_init+0x38a/0x45d
(XEN)    [<ffff82c4c01a7356>] identify_cpu+0x2b4/0x2d0
(XEN)    [<ffff82c4c02a4536>] __start_xen+0x26e9/0x2c98
(XEN)    
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=0000]
(XEN) ****************************************
(XEN)

I have narrowed it down to line 631 in set_poll_bankmask():
	bitmap_copy(mb->bank_map, mca_allbanks->bank_map, nr_mce_banks);

What is happening is that mca_cap_init() sets nr_mce_banks to 0. As a result, the bank_map allocation returns ZERO_BLOCK_PTR, which is what xzalloc_array()/_xmalloc() return for a zero-size allocation, and bitmap_copy() on that pointer then fails disastrously. Is it correct to disable MCE if nr_mce_banks is 0? Or should this be treated as a quirk of the VMware virtual platform, to be worked around by running with mce=0? Linux appears to be able to handle this gracefully.

Another question: callers of xzalloc_array() and friends only check for a NULL return as an error. What about cases like the one above, which fall through the cracks because the return value is ZERO_BLOCK_PTR? Should they all check for ZERO_BLOCK_PTR too, or ensure that no zero-size allocations are made?
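
For illustration, here is a minimal user-space sketch of the situation described above. It is not Xen code: toy_zalloc_array(), toy_free(), toy_is_usable() and TOY_ZERO_BLOCK_PTR are hypothetical stand-ins, and the sentinel value is only an assumption chosen to resemble the bad0bad0bad0bad0 pattern visible in rsi/rdi in the register dump. It shows why a caller that only tests for NULL lets a zero-size allocation through.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for ZERO_BLOCK_PTR: non-NULL, but never valid to
 * dereference. The exact value is an assumption made for this sketch. */
#define TOY_ZERO_BLOCK_PTR ((void *)(uintptr_t)0xBAD0BAD0BAD0BAD0ULL)

/* Toy zeroing array allocator: a zero-size request yields the sentinel
 * rather than NULL, so "allocation failed" and "nothing to allocate"
 * remain distinguishable. */
static void *toy_zalloc_array(size_t num, size_t size)
{
    if (num == 0 || size == 0)
        return TOY_ZERO_BLOCK_PTR;
    return calloc(num, size);
}

/* Matching free: the sentinel must never be handed to free(). */
static void toy_free(void *p)
{
    if (p != NULL && p != TOY_ZERO_BLOCK_PTR)
        free(p);
}

/* A pointer may be dereferenced only if it is neither NULL nor the
 * sentinel. A plain "p != NULL" check accepts the sentinel. */
static int toy_is_usable(const void *p)
{
    return p != NULL && p != TOY_ZERO_BLOCK_PTR;
}
```

With these helpers, toy_zalloc_array(0, sizeof(unsigned long)) passes a NULL check yet is rejected by toy_is_usable(), which is the gap the question above is about.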

Thanks,
Aravindh


* Re: GPF in mcheck_init() when booting xen-unstable on VMware ESX 5.1
  2013-05-31 19:19 GPF in mcheck_init() when booting xen-unstable on VMware ESX 5.1 Aravindh Puthiyaparambil (aravindp)
@ 2013-05-31 19:32 ` Andrew Cooper
  2013-05-31 19:40   ` Aravindh Puthiyaparambil (aravindp)
  2013-06-03  8:59   ` Jan Beulich
  0 siblings, 2 replies; 4+ messages in thread
From: Andrew Cooper @ 2013-05-31 19:32 UTC
  To: Aravindh Puthiyaparambil (aravindp); +Cc: xen-devel

On 31/05/13 20:19, Aravindh Puthiyaparambil (aravindp) wrote:
> I have narrowed it down to line 631 in set_poll_bankmask():
> 	bitmap_copy(mb->bank_map, mca_allbanks->bank_map, nr_mce_banks);
>
> What is happening is that mca_cap_init() sets nr_mce_banks to 0. As a result, the bank_map allocation returns ZERO_BLOCK_PTR, which is what xzalloc_array()/_xmalloc() return for a zero-size allocation, and bitmap_copy() on that pointer then fails disastrously. Is it correct to disable MCE if nr_mce_banks is 0? Or should this be treated as a quirk of the VMware virtual platform, to be worked around by running with mce=0? Linux appears to be able to handle this gracefully.
>
> Another question: callers of xzalloc_array() and friends only check for a NULL return as an error. What about cases like the one above, which fall through the cracks because the return value is ZERO_BLOCK_PTR? Should they all check for ZERO_BLOCK_PTR too, or ensure that no zero-size allocations are made?
>
> Thanks,
> Aravindh

ZERO_BLOCK_PTR is specifically distinguished from NULL (as the comment
beside it says).

The real bug is calling **alloc() with 0 as a parameter.

I would say that an nr_mce_banks of 0 should result in an implicit mce=0.
You certainly can't sensibly use MCEs with 0 banks to play with.
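
A rough sketch of what "implicit mce=0" could look like, under stated assumptions: struct toy_mce_state and toy_mca_cap_init() are hypothetical names invented for this example and do not mirror the actual Xen functions. The only hardware detail relied on is that the low 8 bits of IA32_MCG_CAP hold the bank count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_CAP[7:0] is the number of machine-check banks. */
#define TOY_MCG_CAP_COUNT_MASK 0xffULL

/* Hypothetical per-system MCE state for this sketch. */
struct toy_mce_state {
    unsigned int nr_banks;  /* banks reported by the (virtual) CPU */
    bool enabled;           /* whether MCE handling will be set up */
};

/* Derive the bank count from the capability MSR value and treat a count
 * of zero as if the user had asked for mce=0. */
static void toy_mca_cap_init(struct toy_mce_state *st, uint64_t mcg_cap,
                             bool mce_requested)
{
    st->nr_banks = (unsigned int)(mcg_cap & TOY_MCG_CAP_COUNT_MASK);
    st->enabled = mce_requested && st->nr_banks > 0;
}
```

Callers would then skip the bank_map allocation and every later bitmap operation whenever enabled is false, so no zero-size allocation is made on this path.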

~Andrew


* Re: GPF in mcheck_init() when booting xen-unstable on VMware ESX 5.1
  2013-05-31 19:32 ` Andrew Cooper
@ 2013-05-31 19:40   ` Aravindh Puthiyaparambil (aravindp)
  2013-06-03  8:59   ` Jan Beulich
  1 sibling, 0 replies; 4+ messages in thread
From: Aravindh Puthiyaparambil (aravindp) @ 2013-05-31 19:40 UTC
  To: Andrew Cooper; +Cc: xen-devel

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, May 31, 2013 12:32 PM
> To: Aravindh Puthiyaparambil (aravindp)
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] GPF in mcheck_init() when booting xen-unstable on
> VMware ESX 5.1
> 
> ZERO_BLOCK_PTR is specifically distinguished from NULL (as the comment
> beside it says).
> 
> The real bug is calling **alloc() with 0 as a parameter.
> 
> I would say that an nr_mce_banks of 0 should result in an implicit mce=0.
> You certainly can't sensibly use MCEs with 0 banks to play with.

OK. I will submit a patch.

Aravindh


* Re: GPF in mcheck_init() when booting xen-unstable on VMware ESX 5.1
  2013-05-31 19:32 ` Andrew Cooper
  2013-05-31 19:40   ` Aravindh Puthiyaparambil (aravindp)
@ 2013-06-03  8:59   ` Jan Beulich
  1 sibling, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2013-06-03  8:59 UTC
  To: Aravindh Puthiyaparambil (aravindp), Andrew Cooper; +Cc: xen-devel

>>> On 31.05.13 at 21:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 31/05/13 20:19, Aravindh Puthiyaparambil (aravindp) wrote:
>> I have narrowed it down to line 631 in set_poll_bankmask():
>> 	bitmap_copy(mb->bank_map, mca_allbanks->bank_map, nr_mce_banks);
>>
>> What is happening is that mca_cap_init() sets nr_mce_banks to 0. As a
>> result, the bank_map allocation returns ZERO_BLOCK_PTR, which is what
>> xzalloc_array()/_xmalloc() return for a zero-size allocation, and
>> bitmap_copy() on that pointer then fails disastrously. Is it correct to
>> disable MCE if nr_mce_banks is 0? Or should this be treated as a quirk
>> of the VMware virtual platform, to be worked around by running with
>> mce=0? Linux appears to be able to handle this gracefully.
>>
>> Another question: callers of xzalloc_array() and friends only check for
>> a NULL return as an error. What about cases like the one above, which
>> fall through the cracks because the return value is ZERO_BLOCK_PTR?
>> Should they all check for ZERO_BLOCK_PTR too, or ensure that no
>> zero-size allocations are made?
> 
> ZERO_BLOCK_PTR is specifically distinguished from NULL (as the comment
> beside it says).
> 
> The real bug is calling **alloc() with 0 as a parameter.

That's not really a bug - there are cases (hence the returning of
ZERO_BLOCK_PTR in the first place) where this is okay (e.g. in
order to simplify code). memcpy(), memset(), etc. are all well
capable of dealing with that situation. The fact that bitmap_copy()
isn't is the root bug from my point of view (and similarly for other
bitmap operations - they should all bail on non-positive nbits input
to be generically usable, but perhaps we should make nbits of unsigned
type in the first place, as negative values are senseless anyway).
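
As a sketch of the behaviour suggested above (not the actual Xen bitmap code), a copy helper that takes an unsigned nbits and returns early for zero might look like this. toy_bitmap_copy() and the TOY_* macros are hypothetical names used only for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Number of unsigned longs needed to hold n bits, rounded up. */
#define TOY_BITS_TO_LONGS(n) \
    (((n) + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* Copy nbits bits (rounded up to whole words) from src to dst.
 * nbits is unsigned, so negative sizes cannot be expressed, and a
 * zero-bit copy returns before either pointer is touched. */
static void toy_bitmap_copy(unsigned long *dst, const unsigned long *src,
                            unsigned int nbits)
{
    if (nbits == 0)
        return;
    memcpy(dst, src, TOY_BITS_TO_LONGS(nbits) * sizeof(unsigned long));
}
```

With the early return, passing a zero-size allocation's placeholder pointer together with nbits == 0 is harmless, in the same way that a zero-length memcpy() or memset() is.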

> I would say that an nr_mce_banks of 0 should result in an implicit mce=0.
> You certainly can't sensibly use MCEs with 0 banks to play with.

Yes, it makes sense to fix this regardless of the above.

Jan
