* Real-mode bug with AMD, gPXE, and 32-bit rep movs
@ 2009-03-26 12:25 George Dunlap
  2009-03-26 14:43 ` Keir Fraser
  0 siblings, 1 reply; 9+ messages in thread
From: George Dunlap @ 2009-03-26 12:25 UTC (permalink / raw)
  To: Huang2, Wei, Christoph Egger, Xen-devel

With recent builds of -unstable, I've been unable to get an HVM domain
to boot on an AMD box with the virtual network card enabled.  The exact
same binaries work on an Intel box just fine.

The problem turns out to be with the handling of a 32-bit rep MOVS
instruction in the 16-bit gPXE initialization code.  The offending
code is here:

f3 67 a4 66 89 3e 7b 02

nasm interprets it this way:

00000000  F3                db 0xF3
00000001  67A4              a32 movsb
00000003  66893E7B02        mov [0x27b],edi

In other words, a 32-bit rep movs in 16-bit mode.  (gPXE appears to be
copying itself from where it was installed at 0xc9000 to somewhere
higher in memory, 0x200000.  Not clear why it wants to do that.)
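
To make the prefix bytes in the sequence above easier to read, here is a
minimal Python sketch that splits the legacy prefixes off the front of the
byte string.  It is not a disassembler; the table and the helper name
split_prefixes are made up for illustration and cover only the three
prefix bytes that occur here.

```python
# Legacy x86 prefix bytes that occur in the sequence above (not a full list).
PREFIXES = {
    0xF3: "rep",
    0x66: "operand-size override",
    0x67: "address-size override",
}

def split_prefixes(code: bytes):
    """Return (prefix names, remaining bytes) for the instruction at code[0]."""
    names = []
    i = 0
    while i < len(code) and code[i] in PREFIXES:
        names.append(PREFIXES[code[i]])
        i += 1
    return names, code[i:]

stream = bytes.fromhex("f367a466893e7b02")
prefixes, rest = split_prefixes(stream)
# rest[0] is 0xA4, the MOVSB opcode; the prefixes are REP plus the
# address-size override that selects 32-bit ESI/EDI/ECX in 16-bit code.
print(prefixes, hex(rest[0]))
```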

On Intel boxes, the code causes a #GP (not surprisingly), and the
emulator handles it successfully.

On AMD boxes (at least two of them), this causes a #GP (surprisingly).
That invokes the BIOS "null trap handler", which simply does an iret
back to the same instruction, causing a busy loop.

There are three possibilities I came up with:
1) The same thing would happen outside of SVM; in which case it's
(sort of) a gPXE bug for using an instruction that won't work on AMD
boxes.
2) Xen is subtly screwing up the VM state, causing the AMD hardware
not to recognize that this shouldn't cause a #GP
3) AMD hardware (at least some of it) doesn't handle 32-bit rep movs
instructions in 16-bit mode.

If it's #1, we should try to build gPXE without the 32-bit instructions.

If it's #2, we need to track down what state is being corrupted by Xen.

If it's #3, the simplest solution is probably to take vmexits on GP
faults and attempt to emulate the instruction if we're in real mode,
as we do for vmx.

Wei, Christoph: any ideas?

The /proc/cpuinfo output of the two boxes I've tried this on is below.

Thanks,
 -George Dunlap

[elite]
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: Quad-Core AMD Opteron(tm) Processor 2352
stepping	: 3
cpu MHz		: 2094.850
cache size	: 512 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu de tsc msr pae cx8 apic mtrr cmov pat clflush mmx fxsr
sse sse2 ht syscall nx mmxext fxsr_opt 3dnowext 3dnow constant_tsc pni
cmp_legacy cr8legacy ts ttp tm stc [6] [7] [8]
bogomips	: 4190.72

[dakota]
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 65
model name	: Dual-Core AMD Opteron(tm) Processor 2218
stepping	: 2
cpu MHz		: 2593.560
cache size	: 1024 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu de tsc msr pae mce cx8 apic mtrr mca cmov pat clflush mmx
fxsr sse sse2 ht nx mmxext fxsr_opt 3dnowext 3dnow pni cmp_legacy
cr8legacy ts fid vid ttp tm stc
bogomips	: 5188.45

* Re: Real-mode bug with AMD, gPXE, and 32-bit rep movs
  2009-03-26 12:25 Real-mode bug with AMD, gPXE, and 32-bit rep movs George Dunlap
@ 2009-03-26 14:43 ` Keir Fraser
  2009-03-26 14:54   ` Tim Deegan
  2009-03-26 15:15   ` George Dunlap
  0 siblings, 2 replies; 9+ messages in thread
From: Keir Fraser @ 2009-03-26 14:43 UTC (permalink / raw)
  To: George Dunlap, Huang2, Wei, Christoph Egger, Xen-devel

On 26/03/2009 12:25, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:

> There are three possibilities I came up with:
> 1) The same thing would happen outside of SVM; in which case it's
> (sort of) a gPXE bug for using an instruction that won't work on AMD
> boxes.
> 2) Xen is subtly screwing up the VM state, causing the AMD hardware
> not to recognize that this shouldn't cause a #GP
> 3) AMD hardware (at least some of it) doesn't handle 32-bit rep movs
> instructions in 16-bit mode.

It must surely be a Xen bug. Doing 32-bit ops in 16-bit mode is a completely
standard thing that all processors will support. The other alternative is
perhaps we have somehow managed to build ourselves a bogus gpxe image.

Your assertion that it causes GP on Intel is weird. We should be running in
the emulator already since for the movs to 0x200000 to work we must be
running in big real mode (i.e., one of the segment registers has a limit
greater than 0xffff) and so we cannot be emulating that by running the guest
in vm86 mode.

I can give some help tracking this down when I'm back next week, if it's not
resolved by then. It's also the sort of thing which may interest Tim Deegan,
who has also worked on real mode support on the Intel side in the past.

 -- Keir

* Re: Real-mode bug with AMD, gPXE, and 32-bit rep movs
  2009-03-26 14:43 ` Keir Fraser
@ 2009-03-26 14:54   ` Tim Deegan
  2009-03-26 15:15   ` George Dunlap
  1 sibling, 0 replies; 9+ messages in thread
From: Tim Deegan @ 2009-03-26 14:54 UTC (permalink / raw)
  To: Keir Fraser; +Cc: George Dunlap, Christoph Egger, Huang2, Wei, Xen-devel

At 14:43 +0000 on 26 Mar (1238078627), Keir Fraser wrote:
> Your assertion that it causes GP on Intel is weird. We should be running in
> the emulator already since for the movs to 0x200000 to work we must be
> running in big real mode (i.e., one of the segment registers has a limit
> greater than 0xffff) and so we cannot be emulating that by running the guest
> in vm86 mode.

We do use vm86 mode for big-real-mode; we just clip the segment limits
to 16 bits and carry on, since almost all instructions don't use the big
segments.  Then when we take a fault for the A32 REP MOVS with the
>16-bit offset we go into the emulator and it does the right thing.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]

* Re: Real-mode bug with AMD, gPXE, and 32-bit rep movs
  2009-03-26 14:43 ` Keir Fraser
  2009-03-26 14:54   ` Tim Deegan
@ 2009-03-26 15:15   ` George Dunlap
  2009-03-26 16:24     ` Christoph Egger
  1 sibling, 1 reply; 9+ messages in thread
From: George Dunlap @ 2009-03-26 15:15 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Christoph Egger, Huang2, Wei, Xen-devel

Keir Fraser wrote:
> On 26/03/2009 12:25, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
>
>   
>> There are three possibilities I came up with:
>> 1) The same thing would happen outside of SVM; in which case it's
>> (sort of) a gPXE bug for using an instruction that won't work on AMD
>> boxes.
>> 2) Xen is subtly screwing up the VM state, causing the AMD hardware
>> not to recognize that this shouldn't cause a #GP
>> 3) AMD hardware (at least some of it) doesn't handle 32-bit rep movs
>> instructions in 16-bit mode.
>>     
>
> It must surely be a Xen bug. Doing 32-bit ops in 16-bit mode is a completely
> standard thing that all processors will support. The other alternative is
> perhaps we have somehow managed to build ourselves a bogus gpxe image.
>   
For #3, I meant that perhaps the AMD hardware didn't handle it properly 
in non-root mode (as opposed to #1, which suggested it may not work on 
AMD hardware at all).  I'm not that familiar with this level of the x86 
architecture, so I'll take your word for it. :-)
> Your assertion that it causes GP on Intel is weird. We should be running in
> the emulator already since for the movs to 0x200000 to work we must be
> running in big real mode (i.e., one of the segment registers has a limit
> greater than 0xffff) and so we cannot be emulating that by running the guest
> in vm86 mode.
>   
Maybe I wasn't clear, or didn't use the technical terms properly; in any 
case, here's a trace from an Intel box about the code in question.  I 
added some extra tracing to gather information about what happened in 
the emulation.  You see:
* An io port write (the last thing before the instruction).
* An EXCEPTION_NMI exit at the code in question (cs=c900 eip=1cb, linear 
address = c91cb) caused by a trap 13 (GP fault)
* The emulator copies 1 page from c9000 to 200000
* Repeats for ca000 -> 201000

!  4.110129337 -x  vmentry
]  4.110130683 -x  vmexit exit_reason IO_INSTRUCTION eip 7b16
   4.110130683 -x io write port 981 val 40
   4.110133785 -x  runstate_change d2v0 running->offline
 [dom0 handles the io write]
   4.110142327 -x  runstate_change d2v0 runnable->running
!  4.110144371 -x  vmentry
]  4.110145950 -x  vmexit exit_reason EXCEPTION_NMI eip 1cb
   4.110145950 -x realmode (trap 13)
   4.110145950 -x rep_mov sseg 2 soff 0 dseg 3 doff 200000
   4.110145950 -x rep_mov2 saddr c9000 sgpa c9000 daddr 200000 dgpa 200000
]  4.110156960 -x  vmentry cycles 26424 !
]  4.110158295 -x  vmexit exit_reason EXCEPTION_NMI eip 1cb
   4.110158295 -x realmode (trap 13)
   4.110158295 -x rep_mov sseg 2 soff 1000 dseg 3 doff 201000
   4.110158295 -x rep_mov2 saddr ca000 sgpa ca000 daddr 201000 dgpa 201000
]  4.110162836 -x  vmentry cycles 10899 !

So it seems clear that:
* it was not in all-emulation mode
* it took a GP fault at that instruction
* it emulated it successfully. 
Is this not what's expected?
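
For readers checking the numbers in the trace above: the linear address
quoted for the faulting instruction follows from the usual real-mode rule
that a segment's base is its selector value times 16.  A minimal Python
sketch (the helper name real_mode_linear is hypothetical, not something
from Xen or the tracing tools):

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address of a real-mode segment:offset pair (base = segment * 16)."""
    return (segment << 4) + offset

# cs=c900, eip=1cb, as reported for the EXCEPTION_NMI exit in the trace.
fault_addr = real_mode_linear(0xC900, 0x1CB)
print(hex(fault_addr))
```

This only covers the architectural real-mode base calculation; it says
nothing about segment limits, which are a separate piece of state.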
> I can give some help tracking this down when I'm back next week, if it's not
> resolved by then. It's also the sort of thing which may interest Tim Deegan,
> who has also worked on real mode support on the Intel side in the past.
>   
Tim gave me a hand to get this far.  I'm going to try to get the rep 
movs instruction into Gianluca's "xentest" framework when he comes back 
next week, so we can isolate different variables better.

 -George

* Re: Real-mode bug with AMD, gPXE, and 32-bit rep movs
  2009-03-26 15:15   ` George Dunlap
@ 2009-03-26 16:24     ` Christoph Egger
  2009-03-26 16:31       ` Tim Deegan
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Egger @ 2009-03-26 16:24 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Huang2, Wei, Keir Fraser

On Thursday 26 March 2009 16:15:06 George Dunlap wrote:
> Keir Fraser wrote:
> > On 26/03/2009 12:25, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
> >> There are three possibilities I came up with:
> >> 1) The same thing would happen outside of SVM; in which case it's
> >> (sort of) a gPXE bug for using an instruction that won't work on AMD
> >> boxes.
> >> 2) Xen is subtly screwing up the VM state, causing the AMD hardware
> >> not to recognize that this shouldn't cause a #GP

I think it's #2. Look at the #GP causes in APM 
Volume 2 for MOVSx: the only one in real mode is if the address 
exceeded a data segment limit.  And the comment from Deegan about 
clipping segment limits to 16 bits makes me think that the clipping is 
happening on AMD machines and it shouldn't be.

So probably, VMCB.DS.LIMIT is smaller than it should be. Note that
AMD requires the segment limit to be the effective limit; the
granularity segment attribute is ignored.
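
As an illustration of what "effective limit" means here, the Python
sketch below expands a 20-bit descriptor limit according to the
granularity bit, which is the standard x86 descriptor rule.  The function
name effective_limit is hypothetical and this is not Xen code; it only
shows the value that would have to be stored if the hardware ignores the
granularity attribute.

```python
def effective_limit(raw_limit: int, granularity: bool) -> int:
    """Expand a 20-bit descriptor limit to a byte-granular limit."""
    raw_limit &= 0xFFFFF  # descriptor limit field is 20 bits wide
    if granularity:
        # G=1: the limit counts 4 KiB pages, and the low 12 bits are all ones.
        return (raw_limit << 12) | 0xFFF
    # G=0: the limit is already in bytes.
    return raw_limit

flat = effective_limit(0xFFFFF, True)       # 4 GB segment
real_64k = effective_limit(0x0FFFF, False)  # classic 64 KB real-mode segment
print(hex(flat), hex(real_64k))
```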

Christoph


-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

* Re: Real-mode bug with AMD, gPXE, and 32-bit rep movs
  2009-03-26 16:24     ` Christoph Egger
@ 2009-03-26 16:31       ` Tim Deegan
  2009-03-30 15:02         ` George Dunlap
  0 siblings, 1 reply; 9+ messages in thread
From: Tim Deegan @ 2009-03-26 16:31 UTC (permalink / raw)
  To: Christoph Egger; +Cc: George Dunlap, xen-devel, Keir Fraser, Huang2, Wei

At 16:24 +0000 on 26 Mar (1238084646), Christoph Egger wrote:
> I think it's #2. Look at the #GP causes in APM 
> Volume 2 for MOVSx: the only one in real mode is if the address 
> exceeded a data segment limit.  And the comment from Deegan about 
> clipping segment limits to 16 bits makes me think that the clipping is 
> happening on AMD machines and it shouldn't be.

That particular clipping happens in vmx.c so I certainly hope it doesn't
get called on AMD machines. :)  But yes, it's likely that some
big-real-mode segment state has got lost somewhere.

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]

* Re: Real-mode bug with AMD, gPXE, and 32-bit rep movs
  2009-03-26 16:31       ` Tim Deegan
@ 2009-03-30 15:02         ` George Dunlap
  0 siblings, 0 replies; 9+ messages in thread
From: George Dunlap @ 2009-03-30 15:02 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Huang2, Wei, Christoph Egger, xen-devel, Keir Fraser

So it turns out this is actually a bug in gPXE, combined with a
limit-check bug in HVM emulation.  The segment limit was only set to
16 bits, but gPXE was clearly trying to use a 32-bit address, so the AMD
hardware was doing exactly what it should have done.

The reason it worked on an Intel box was that the hvm address checking
logic doesn't do *any* limit checking when in real mode.  If you add a
real-mode segment limit check in hvm.c:hvm_virtual_to_linear_addr(),
then the VM has the exact same issue.
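
A minimal Python sketch of the kind of check being described, assuming an
ordinary expand-up data segment.  The name exceeds_limit is made up for
illustration; this is not the code in hvm_virtual_to_linear_addr(), just
the comparison such a check would perform.

```python
def exceeds_limit(offset: int, nbytes: int, limit: int) -> bool:
    """True if an access of nbytes (>= 1) at offset runs past the segment limit."""
    last_byte = offset + nbytes - 1
    return last_byte > limit

# The MOVSB destination offset from this thread against a 16-bit limit:
faults_with_64k = exceeds_limit(0x200000, 1, 0xFFFF)
# The same access against a flat 4 GB limit:
faults_with_4g = exceeds_limit(0x200000, 1, 0xFFFFFFFF)
print(faults_with_64k, faults_with_4g)
```

With a 0xffff limit the access is rejected, which matches the #GP seen on
the AMD hardware; with a 4 GB limit the same access passes.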

So one thing is clear: we need to re-compile the gPXE "binary" that
comes in xen so that it doesn't violate segment limits.

We might think about checking some limits in real mode as well, just
for good measure.

 -George

On Thu, Mar 26, 2009 at 5:31 PM, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> At 16:24 +0000 on 26 Mar (1238084646), Christoph Egger wrote:
>> I think it's #2. Look at the #GP causes in APM
>> Volume 2 for MOVSx: the only one in real mode is if the address
>> exceeded a data segment limit.  And the comment from Deegan about
>> clipping segment limits to 16 bits makes me think that the clipping is
>> happening on AMD machines and it shouldn't be.
>
> That particular clipping happens in vmx.c so I certainly hope it doesn't
> get called on AMD machines. :)  But yes, it's likely that some
> big-real-mode segment state has got lost somewhere.
>
> Tim.

* Re: Real-mode bug with AMD, gPXE, and 32-bit rep movs
       [not found] <200903310456.39523.mcb30@etherboot.org>
  2009-03-31  7:21 ` Keir Fraser
@ 2009-03-31 13:06 ` Keir Fraser
  1 sibling, 0 replies; 9+ messages in thread
From: Keir Fraser @ 2009-03-31 13:06 UTC (permalink / raw)
  To: Michael Brown, George Dunlap
  Cc: Tim Deegan, Christoph Egger, Huang2, Wei, xen-devel, Sebastian Herbszt

On 31/03/2009 04:56, "Michael Brown" <mcb30@etherboot.org> wrote:

> It's a bug in the BIOS; it should either not advertise PMM, or it should
> follow the PMM spec and call the initialisation entry point in flat real
> mode.

George, can you please test with xen-unstable:19477. This should give all
our real-mode segments 4GB limits, and thus fix our PMM support.

 -- Keir

* Re: Real-mode bug with AMD, gPXE, and 32-bit rep movs
       [not found] <200903310456.39523.mcb30@etherboot.org>
@ 2009-03-31  7:21 ` Keir Fraser
  2009-03-31 13:06 ` Keir Fraser
  1 sibling, 0 replies; 9+ messages in thread
From: Keir Fraser @ 2009-03-31  7:21 UTC (permalink / raw)
  To: Michael Brown, George Dunlap
  Cc: Christoph Egger, xen-devel, etherboot-developers, Tim Deegan,
	Huang2, Wei, Sebastian Herbszt

On 31/03/2009 04:56, "Michael Brown" <mcb30@etherboot.org> wrote:

> It's a bug in the BIOS; it should either not advertise PMM, or it should
> follow the PMM spec and call the initialisation entry point in flat real
> mode.

Thanks Michael. I'll get this fixed.

 -- Keir
