* x86/AMD: Nested hvm crashes in 4.3
@ 2013-06-27  0:24 Suravee Suthikulanit
  2013-06-27  8:22 ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Suravee Suthikulanit @ 2013-06-27  0:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Shin, Jacob, Hurwitz, Sherry, Keir Fraser, Suravee Suthikulanit,
	xen-devel

Hi Jan,

I have found an issue where the system crashes as soon as I start
another HVM guest inside an HVM guest.  I have traced it back to the
patch with which the issue started.

commit f1bde87fc08ce8c818a1640a8fe4765d48923091
Author: Jan Beulich <jbeulich@suse.com>
Date:   Fri Feb 8 11:06:04 2013 +0100

     x86: debugging code for testing 16Tb support on smaller memory systems

     Signed-off-by: Jan Beulich <jbeulich@suse.com>
     Acked-by: Keir Fraser <keir@xen.org>

The issue doesn't reproduce when starting a PV (L2) guest inside an HVM 
(L1) guest.

Suravee

PS: The L2 Xen is running Xen-4.3, but I think the issue is at the L1 
Xen since it crashes the system.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-27  0:24 x86/AMD: Nested hvm crashes in 4.3 Suravee Suthikulanit
@ 2013-06-27  8:22 ` Jan Beulich
  2013-06-27  9:20   ` Suravee Suthikulpanit
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2013-06-27  8:22 UTC (permalink / raw)
  To: Suravee Suthikulanit; +Cc: Keir Fraser, xen-devel, Jacob Shin, Sherry Hurwitz

>>> On 27.06.13 at 02:24, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
> I have found an issue in where the system crash right when I start 
> another HVM guest inside an HVM guest.  I have traced back to the patch 
> which the issue started.
> 
> commit f1bde87fc08ce8c818a1640a8fe4765d48923091
> Author: Jan Beulich <jbeulich@suse.com>
> Date:   Fri Feb 8 11:06:04 2013 +0100
> 
>      x86: debugging code for testing 16Tb support on smaller memory systems
> 
>      Signed-off-by: Jan Beulich <jbeulich@suse.com>
>      Acked-by: Keir Fraser <keir@xen.org>

We had issues exposed by this patch before, but any such issue
would just have been masked before that patch (and would
surface on a system with more than 5Tb of memory anyway). So
it is very unlikely for the patch itself to be at fault.

Furthermore, the crash then supposedly is because of the code
added conditional upon NDEBUG, and hence would (on a smaller
memory system) otherwise not surface at all for a production
(debug=n) build.

> The issue doesn't reproduce when starting a PV (L2) guest inside an HVM 
> (L1) guest.

"Does not" or "does"? In the former case - what is this supposed to
tell me?

In any case - without you sharing technical details (register/stack
dump of the crash at the very least) I don't think I have anything
at hand to look for possible problems.

Jan

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-27  8:22 ` Jan Beulich
@ 2013-06-27  9:20   ` Suravee Suthikulpanit
  2013-06-27  9:50     ` Egger, Christoph
  2013-06-27 10:08     ` Jan Beulich
  0 siblings, 2 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2013-06-27  9:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, xen-devel, Jacob Shin, Sherry Hurwitz

On 6/27/2013 3:22 AM, Jan Beulich wrote:
>>>> On 27.06.13 at 02:24, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
>> I have found an issue in where the system crash right when I start
>> another HVM guest inside an HVM guest.  I have traced back to the patch
>> which the issue started.
>>
>> commit f1bde87fc08ce8c818a1640a8fe4765d48923091
>> Author: Jan Beulich <jbeulich@suse.com>
>> Date:   Fri Feb 8 11:06:04 2013 +0100
>>
>>       x86: debugging code for testing 16Tb support on smaller memory systems
>>
>>       Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>       Acked-by: Keir Fraser <keir@xen.org>
> We had issues exposed by this patch before, but any such issue
> would just have been masked before that patch (and would
> surface on a system with more than 5Tb of memory anyway).

The system on which I am seeing the issue has 48GB of memory.

> So it is very unlikely for the patch itself to be at fault.

I have traced the issue and found that the crashes start with this commit id.
(i.e. the system does not crash with commit id ed759d20249197cf87b338ff0ed328052ca3b8e7)
So, I still believe that this patch has somehow triggered the issue.

> Furthermore, the crash them supposedly is because of the code
> added conditional upon NDEBUG, and hence would (on a smaller
> memory system) otherwise not surface at all for a production
> (debug=n) build.
>
>> The issue doesn't reproduce when starting a PV (L2) guest inside an HVM
>> (L1) guest.
> "Does not" or "does"? In the former case - what is this supposed to
> tell me?

What I am trying to say here is that the system _does not_ crash when starting a PV guest as the level 2 guest.
This is meant to be another data point to help analyze the issue.

> In any case - without you sharing technical details (register/stack
> dump of the crash at the very least) I don't think I have anything
> at hand to look for possible problems.

At this point, I am just reporting the issue.  I have not been able to get the crash dump because the system immediately reboots.
I'll try to boot Xen with "noreboot" and inspect the log for more clues. Any suggestions are welcome.

Thank you,

Suravee

>
> Jan
>
>

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-27  9:20   ` Suravee Suthikulpanit
@ 2013-06-27  9:50     ` Egger, Christoph
  2013-06-27 10:08     ` Jan Beulich
  1 sibling, 0 replies; 17+ messages in thread
From: Egger, Christoph @ 2013-06-27  9:50 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: Jacob Shin, Sherry Hurwitz, Keir Fraser, Jan Beulich, xen-devel

Running a PV guest as the L2 guest makes no difference to how the p2m
code is used in L0 Xen (L0 == OS that runs on bare metal hardware).

I assume you use the default settings which means you use NPT-on-NPT.
Try shadow-on-npt. You can do this with

  cpuid="host,svm_npt=0"

in the guest config file. Then in the L1 guest you should see
that the NPT SVM feature bit is not available.
Then launch an L2 guest and check if it still crashes.

I agree with Jan: Please provide the crash logs he requested.

Christoph

On 27.06.13 11:20, Suravee Suthikulpanit wrote:
> On 6/27/2013 3:22 AM, Jan Beulich wrote:
>>>>> On 27.06.13 at 02:24, Suravee Suthikulanit
>>>>> <suravee.suthikulpanit@amd.com> wrote:
>>> I have found an issue in where the system crash right when I start
>>> another HVM guest inside an HVM guest.  I have traced back to the patch
>>> which the issue started.
>>>
>>> commit f1bde87fc08ce8c818a1640a8fe4765d48923091
>>> Author: Jan Beulich <jbeulich@suse.com>
>>> Date:   Fri Feb 8 11:06:04 2013 +0100
>>>
>>>       x86: debugging code for testing 16Tb support on smaller memory
>>> systems
>>>
>>>       Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>       Acked-by: Keir Fraser <keir@xen.org>
>> We had issues exposed by this patch before, but any such issue
>> would just have been masked before that patch (and would
>> surface on a system with more than 5Tb of memory anyway).
> 
> The system I am having the issue has 48GB of memory.
> 
>> So it is very unlikely for the patch itself to be at fault.
> 
> I have traced the issue and found that the system crashing starts from
> this commit id and onward.
> (i.e. The system does not crash with commit id
> ed759d20249197cf87b338ff0ed328052ca3b8e7)
> So, I am still believe that this patch has somehow triggered the issue.
> 
>> Furthermore, the crash them supposedly is because of the code
>> added conditional upon NDEBUG, and hence would (on a smaller
>> memory system) otherwise not surface at all for a production
>> (debug=n) build.
>>
>>> The issue doesn't reproduce when starting a PV (L2) guest inside an HVM
>>> (L1) guest.
>> "Does not" or "does"? In the former case - what is this supposed to
>> tell me?
> 
> What I am trying to say here is that the system_does not_ crash when
> starting the PV guest as level 2 guest.
> This is meant to be another data point to help analyzing the issue.
> 
>> In any case - without you sharing technical details (register/stack
>> dump of the crash at the very least) I don't think I have anything
>> at hand to look for possible problems.
> 
> At this point, I am just reporting the issue.  I have not been able to
> get the crash dump because the system immediately reboot.
> I'll try to boot Xen with "noreboot" and inspect the log for more clues.
> Any suggestions are welcome.
> 
> Thank you,
> 
> Suravee

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-27  9:20   ` Suravee Suthikulpanit
  2013-06-27  9:50     ` Egger, Christoph
@ 2013-06-27 10:08     ` Jan Beulich
  2013-06-27 10:24       ` Suravee Suthikulpanit
  1 sibling, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2013-06-27 10:08 UTC (permalink / raw)
  To: Suravee Suthikulpanit; +Cc: Keir Fraser, xen-devel, Jacob Shin, Sherry Hurwitz

>>> On 27.06.13 at 11:20, Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> wrote:
> On 6/27/2013 3:22 AM, Jan Beulich wrote:
>>>>> On 27.06.13 at 02:24, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
>>> I have found an issue in where the system crash right when I start
>>> another HVM guest inside an HVM guest.  I have traced back to the patch
>>> which the issue started.
>>>
>>> commit f1bde87fc08ce8c818a1640a8fe4765d48923091
>>> Author: Jan Beulich <jbeulich@suse.com>
>>> Date:   Fri Feb 8 11:06:04 2013 +0100
>>>
>>>       x86: debugging code for testing 16Tb support on smaller memory systems
>>>
>>>       Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>       Acked-by: Keir Fraser <keir@xen.org>
>> We had issues exposed by this patch before, but any such issue
>> would just have been masked before that patch (and would
>> surface on a system with more than 5Tb of memory anyway).
> 
> The system I am having the issue has 48GB of memory.

Which is why you're seeing the problem only with the debugging
code enabled. (And of course I didn't really expect you to have
tried this on a huge memory system - they're just too rare still
for this to be likely.)

>> So it is very unlikely for the patch itself to be at fault.
> 
> I have traced the issue and found that the system crashing starts from this 
> commit id and onward.
> (i.e. The system does not crash with commit id 
> ed759d20249197cf87b338ff0ed328052ca3b8e7)
> So, I am still believe that this patch has somehow triggered the issue.

As said - I'm pretty certain this merely unmasked an already
lurking issue. And that's what the purpose of that patch is.

Jan

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-27 10:08     ` Jan Beulich
@ 2013-06-27 10:24       ` Suravee Suthikulpanit
  2013-06-27 10:28         ` Andrew Cooper
                           ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2013-06-27 10:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, xen-devel, Jacob Shin, Sherry Hurwitz

On 6/27/2013 5:08 AM, Jan Beulich wrote:
>>>> On 27.06.13 at 11:20, Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> wrote:
>> On 6/27/2013 3:22 AM, Jan Beulich wrote:
>>>>>> On 27.06.13 at 02:24, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
>>>> I have found an issue in where the system crash right when I start
>>>> another HVM guest inside an HVM guest.  I have traced back to the patch
>>>> which the issue started.
>>>>
>>>> commit f1bde87fc08ce8c818a1640a8fe4765d48923091
>>>> Author: Jan Beulich <jbeulich@suse.com>
>>>> Date:   Fri Feb 8 11:06:04 2013 +0100
>>>>
>>>>        x86: debugging code for testing 16Tb support on smaller memory systems
>>>>
>>>>        Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>        Acked-by: Keir Fraser <keir@xen.org>
>>> We had issues exposed by this patch before, but any such issue
>>> would just have been masked before that patch (and would
>>> surface on a system with more than 5Tb of memory anyway).
>> The system I am having the issue has 48GB of memory.
> Which is why you're seeing the problem only with the debugging
> code enabled.
Is "debugging" enabled by default?  I didn't specify any debug option
when building.
How can I check and disable debugging?

> (And of course I didn't really expect you to have
> tried this on a huge memory system - they're just too rare still
> for this to be likely.)
>
>>> So it is very unlikely for the patch itself to be at fault.
>> I have traced the issue and found that the system crashing starts from this
>> commit id and onward.
>> (i.e. The system does not crash with commit id
>> ed759d20249197cf87b338ff0ed328052ca3b8e7)
>> So, I am still believe that this patch has somehow triggered the issue.
> As said - I'm pretty certain this merely unmasked an already
> lurking issue.
I'm not quite sure what you meant here.  Are you saying that this 
"crashing" is a known issue?

>   And that's what the purpose of that patch is.
This patch is crashing the system. What do you mean by "And that's what 
the purpose of that patch is"?

Suravee
>
> Jan
>
>

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-27 10:24       ` Suravee Suthikulpanit
@ 2013-06-27 10:28         ` Andrew Cooper
  2013-06-27 10:33         ` Egger, Christoph
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Andrew Cooper @ 2013-06-27 10:28 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: Jacob Shin, Sherry Hurwitz, Keir Fraser, Jan Beulich, xen-devel

On 27/06/13 11:24, Suravee Suthikulpanit wrote:
> On 6/27/2013 5:08 AM, Jan Beulich wrote:
>>>>> On 27.06.13 at 11:20, Suravee Suthikulpanit
>>>>> <suravee.suthikulpanit@amd.com> wrote:
>>> On 6/27/2013 3:22 AM, Jan Beulich wrote:
>>>>>>> On 27.06.13 at 02:24, Suravee Suthikulanit
>>>>>>> <suravee.suthikulpanit@amd.com> wrote:
>>>>> I have found an issue in where the system crash right when I start
>>>>> another HVM guest inside an HVM guest.  I have traced back to the
>>>>> patch
>>>>> which the issue started.
>>>>>
>>>>> commit f1bde87fc08ce8c818a1640a8fe4765d48923091
>>>>> Author: Jan Beulich <jbeulich@suse.com>
>>>>> Date:   Fri Feb 8 11:06:04 2013 +0100
>>>>>
>>>>>        x86: debugging code for testing 16Tb support on smaller
>>>>> memory systems
>>>>>
>>>>>        Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>        Acked-by: Keir Fraser <keir@xen.org>
>>>> We had issues exposed by this patch before, but any such issue
>>>> would just have been masked before that patch (and would
>>>> surface on a system with more than 5Tb of memory anyway).
>>> The system I am having the issue has 48GB of memory.
>> Which is why you're seeing the problem only with the debugging
>> code enabled.
> Is the "debugging" enabled by default?  I didn't specify any debug
> when building.
> How can I check and disable debugging?
>
>> (And of course I didn't really expect you to have
>> tried this on a huge memory system - they're just too rare still
>> for this to be likely.)
>>
>>>> So it is very unlikely for the patch itself to be at fault.
>>> I have traced the issue and found that the system crashing starts
>>> from this
>>> commit id and onward.
>>> (i.e. The system does not crash with commit id
>>> ed759d20249197cf87b338ff0ed328052ca3b8e7)
>>> So, I am still believe that this patch has somehow triggered the issue.
>> As said - I'm pretty certain this merely unmasked an already
>> lurking issue.
> I'm not quite sure what you meant here.  Are you saying that this
> "crashing" is a known issue?
>
>>   And that's what the purpose of that patch is.
> This patch is crashing the system. What do you mean by "And that's
> what the purpose of that patch is"?
>
> Suravee
>>
>> Jan
>>
>>
>
>

It means that this patch is exposing a latent bug where the nested hvm
code is already wrong.  It will be something in the nested hvm code
which is not using map_domain_page() when it really should be.

Unless you post a stack trace, there is nothing we can do to help narrow
down the issue.

~Andrew



* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-27 10:24       ` Suravee Suthikulpanit
  2013-06-27 10:28         ` Andrew Cooper
@ 2013-06-27 10:33         ` Egger, Christoph
  2013-06-27 11:14           ` Suravee Suthikulpanit
  2013-06-27 11:20         ` George Dunlap
  2013-06-27 11:37         ` Jan Beulich
  3 siblings, 1 reply; 17+ messages in thread
From: Egger, Christoph @ 2013-06-27 10:33 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: Jacob Shin, Sherry Hurwitz, Keir Fraser, Jan Beulich, xen-devel

On 27.06.13 12:24, Suravee Suthikulpanit wrote:
> On 6/27/2013 5:08 AM, Jan Beulich wrote:
>>>>> On 27.06.13 at 11:20, Suravee Suthikulpanit
>>>>> <suravee.suthikulpanit@amd.com> wrote:
>>> On 6/27/2013 3:22 AM, Jan Beulich wrote:
>>>>>>> On 27.06.13 at 02:24, Suravee Suthikulanit
>>>>>>> <suravee.suthikulpanit@amd.com> wrote:
>>>>> I have found an issue in where the system crash right when I start
>>>>> another HVM guest inside an HVM guest.  I have traced back to the
>>>>> patch
>>>>> which the issue started.
>>>>>
>>>>> commit f1bde87fc08ce8c818a1640a8fe4765d48923091
>>>>> Author: Jan Beulich <jbeulich@suse.com>
>>>>> Date:   Fri Feb 8 11:06:04 2013 +0100
>>>>>
>>>>>        x86: debugging code for testing 16Tb support on smaller
>>>>> memory systems
>>>>>
>>>>>        Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>        Acked-by: Keir Fraser <keir@xen.org>
>>>> We had issues exposed by this patch before, but any such issue
>>>> would just have been masked before that patch (and would
>>>> surface on a system with more than 5Tb of memory anyway).
>>> The system I am having the issue has 48GB of memory.
>> Which is why you're seeing the problem only with the debugging
>> code enabled.
> Is the "debugging" enabled by default?  I didn't specify any debug when
> building.

"Debugging" is enabled by default in the development tree.

> How can I check and disable debugging?

In the toplevel source directory look into Config.mk
and set the line

   debug ?= y

accordingly.

> 
>> (And of course I didn't really expect you to have
>> tried this on a huge memory system - they're just too rare still
>> for this to be likely.)
>>
>>>> So it is very unlikely for the patch itself to be at fault.
>>> I have traced the issue and found that the system crashing starts
>>> from this
>>> commit id and onward.
>>> (i.e. The system does not crash with commit id
>>> ed759d20249197cf87b338ff0ed328052ca3b8e7)
>>> So, I am still believe that this patch has somehow triggered the issue.
>> As said - I'm pretty certain this merely unmasked an already
>> lurking issue.
> I'm not quite sure what you meant here.  Are you saying that this
> "crashing" is a known issue?

He means nestedhvm reveals an existing bug in his patch.
If he is right, then you will not see nestedhvm crashing with a non-debug
xen-kernel (unless something else broke it).

> 
>>   And that's what the purpose of that patch is.
> This patch is crashing the system. What do you mean by "And that's what
> the purpose of that patch is"?

The purpose is "People, please test".

Christoph

> 
> Suravee
>>
>> Jan

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-27 10:33         ` Egger, Christoph
@ 2013-06-27 11:14           ` Suravee Suthikulpanit
  2013-06-28  0:44             ` Suravee Suthikulanit
  0 siblings, 1 reply; 17+ messages in thread
From: Suravee Suthikulpanit @ 2013-06-27 11:14 UTC (permalink / raw)
  To: Egger, Christoph
  Cc: Jacob Shin, Sherry Hurwitz, Keir Fraser, Jan Beulich, xen-devel

On 6/27/2013 5:33 AM, Egger, Christoph wrote:
> On 27.06.13 12:24, Suravee Suthikulpanit wrote:
>> On 6/27/2013 5:08 AM, Jan Beulich wrote:
>>>>>> On 27.06.13 at 11:20, Suravee Suthikulpanit
>>>>>> <suravee.suthikulpanit@amd.com> wrote:
>>>> On 6/27/2013 3:22 AM, Jan Beulich wrote:
>>>>>>>> On 27.06.13 at 02:24, Suravee Suthikulanit
>>>>>>>> <suravee.suthikulpanit@amd.com> wrote:
>>>>>> I have found an issue in where the system crash right when I start
>>>>>> another HVM guest inside an HVM guest.  I have traced back to the
>>>>>> patch
>>>>>> which the issue started.
>>>>>>
>>>>>> commit f1bde87fc08ce8c818a1640a8fe4765d48923091
>>>>>> Author: Jan Beulich <jbeulich@suse.com>
>>>>>> Date:   Fri Feb 8 11:06:04 2013 +0100
>>>>>>
>>>>>>         x86: debugging code for testing 16Tb support on smaller
>>>>>> memory systems
>>>>>>
>>>>>>         Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>>         Acked-by: Keir Fraser <keir@xen.org>
>>>>> We had issues exposed by this patch before, but any such issue
>>>>> would just have been masked before that patch (and would
>>>>> surface on a system with more than 5Tb of memory anyway).
>>>> The system I am having the issue has 48GB of memory.
>>> Which is why you're seeing the problem only with the debugging
>>> code enabled.
>> Is the "debugging" enabled by default?  I didn't specify any debug when
>> building.
> "Debugging" is enabled by default in the development tree.
>
>> How can I check and disable debugging?
> In the toplevel source directory look into Config.mk
> and set the line
>
>     debug ?= y
>
> accordingly.

Thank you for the clarification.

>>> (And of course I didn't really expect you to have
>>> tried this on a huge memory system - they're just too rare still
>>> for this to be likely.)
>>>
>>>>> So it is very unlikely for the patch itself to be at fault.
>>>> I have traced the issue and found that the system crashing starts
>>>> from this
>>>> commit id and onward.
>>>> (i.e. The system does not crash with commit id
>>>> ed759d20249197cf87b338ff0ed328052ca3b8e7)
>>>> So, I am still believe that this patch has somehow triggered the issue.
>>> As said - I'm pretty certain this merely unmasked an already
>>> lurking issue.
>> I'm not quite sure what you meant here.  Are you saying that this
>> "crashing" is a known issue?
> He means nestedhvm reveals an existing bug in his patch.
> If he is right then you do not see nestedhvm crashing with a non-debug
> xen-kernel (unless something else broke it).

After I rebuilt the Xen kernel with debug=n, the system no longer crashes when starting npt-on-npt and shadow-on-npt guests.
I was not able to get the crash dump previously. I will try again tomorrow at work and will post it.

Thank you,

Suravee

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-27 10:24       ` Suravee Suthikulpanit
  2013-06-27 10:28         ` Andrew Cooper
  2013-06-27 10:33         ` Egger, Christoph
@ 2013-06-27 11:20         ` George Dunlap
  2013-06-27 11:37         ` Jan Beulich
  3 siblings, 0 replies; 17+ messages in thread
From: George Dunlap @ 2013-06-27 11:20 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: Jacob Shin, Sherry Hurwitz, Keir Fraser, Jan Beulich, xen-devel

On Thu, Jun 27, 2013 at 11:24 AM, Suravee Suthikulpanit
<suravee.suthikulpanit@amd.com> wrote:
> On 6/27/2013 5:08 AM, Jan Beulich wrote:
>>>>>
>>>>> On 27.06.13 at 11:20, Suravee Suthikulpanit
>>>>> <suravee.suthikulpanit@amd.com> wrote:
>>>
>>> On 6/27/2013 3:22 AM, Jan Beulich wrote:
>>>>>>>
>>>>>>> On 27.06.13 at 02:24, Suravee Suthikulanit
>>>>>>> <suravee.suthikulpanit@amd.com> wrote:
>>>>>
>>>>> I have found an issue in where the system crash right when I start
>>>>> another HVM guest inside an HVM guest.  I have traced back to the patch
>>>>> which the issue started.
>>>>>
>>>>> commit f1bde87fc08ce8c818a1640a8fe4765d48923091
>>>>> Author: Jan Beulich <jbeulich@suse.com>
>>>>> Date:   Fri Feb 8 11:06:04 2013 +0100
>>>>>
>>>>>        x86: debugging code for testing 16Tb support on smaller memory
>>>>> systems
>>>>>
>>>>>        Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>        Acked-by: Keir Fraser <keir@xen.org>
>>>>
>>>> We had issues exposed by this patch before, but any such issue
>>>> would just have been masked before that patch (and would
>>>> surface on a system with more than 5Tb of memory anyway).
>>>
>>> The system I am having the issue has 48GB of memory.
>>
>> Which is why you're seeing the problem only with the debugging
>> code enabled.
>
> Is the "debugging" enabled by default?  I didn't specify any debug when
> building.
> How can I check and disable debugging?
>
>
>> (And of course I didn't really expect you to have
>> tried this on a huge memory system - they're just too rare still
>> for this to be likely.)
>>
>>>> So it is very unlikely for the patch itself to be at fault.
>>>
>>> I have traced the issue and found that the system crashing starts from
>>> this
>>> commit id and onward.
>>> (i.e. The system does not crash with commit id
>>> ed759d20249197cf87b338ff0ed328052ca3b8e7)
>>> So, I am still believe that this patch has somehow triggered the issue.
>>
>> As said - I'm pretty certain this merely unmasked an already
>> lurking issue.
>
> I'm not quite sure what you meant here.  Are you saying that this "crashing"
> is a known issue?
>
>
>>   And that's what the purpose of that patch is.
>
> This patch is crashing the system. What do you mean by "And that's what the
> purpose of that patch is"?

*If* you had had >5TiB, then you would have crashed even without this patch.

The purpose of the patch is to make it so that if there is a bug that
will crash for >5TiB, then it will *also* crash for <5TiB.  Since the
vast majority of people have <5TiB of RAM, this results in better
testing coverage for those with >5TiB of RAM.

On production systems, we want it to work as often as possible, so
this test is disabled when debug=n, which is the default for released
versions of Xen.  But on the development branch we very much want to find
bugs, so during development, we set debug=y by default.
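
The explanation above can be condensed into a small truth table. What
follows is a toy model only, assuming the simplified rule stated in this
thread (a latent bug of this class crashes the host either above 5TiB of
RAM or in any debug=y build). The names latent_bug_crashes and
HUGE_MEMORY_THRESHOLD_TIB are hypothetical and are not Xen code.

```python
# Toy model of the behaviour described in this thread; this is not Xen code.
# All names here are hypothetical and exist only for this illustration.

HUGE_MEMORY_THRESHOLD_TIB = 5.0  # threshold quoted in the thread


def latent_bug_crashes(ram_tib: float, debug: bool) -> bool:
    """Return True if a latent bug of this class would crash the host."""
    if debug:
        # debug=y: the debugging code makes the bug surface on any system.
        return True
    # debug=n: the bug stays masked unless the system really is that large.
    return ram_tib > HUGE_MEMORY_THRESHOLD_TIB


if __name__ == "__main__":
    reported_ram_tib = 48 / 1024  # the 48GB system from the report
    cases = [(reported_ram_tib, True), (reported_ram_tib, False), (6.0, False)]
    for ram_tib, debug in cases:
        flag = "y" if debug else "n"
        print(f"ram={ram_tib:.3f}TiB debug={flag} "
              f"crash={latent_bug_crashes(ram_tib, debug)}")
```

Under this model the 48GB system crashes only with debug=y, which matches
what was reported after rebuilding with debug=n.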

 -George

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-27 10:24       ` Suravee Suthikulpanit
                           ` (2 preceding siblings ...)
  2013-06-27 11:20         ` George Dunlap
@ 2013-06-27 11:37         ` Jan Beulich
  3 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2013-06-27 11:37 UTC (permalink / raw)
  To: Suravee Suthikulpanit; +Cc: Keir Fraser, xen-devel, Jacob Shin, Sherry Hurwitz

>>> On 27.06.13 at 12:24, Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> wrote:
> On 6/27/2013 5:08 AM, Jan Beulich wrote:
>>>>> On 27.06.13 at 11:20, Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> wrote:
>>> On 6/27/2013 3:22 AM, Jan Beulich wrote:
>>>> We had issues exposed by this patch before, but any such issue
>>>> would just have been masked before that patch (and would
>>>> surface on a system with more than 5Tb of memory anyway).
>>> The system I am having the issue has 48GB of memory.
>> Which is why you're seeing the problem only with the debugging
>> code enabled.
> Is the "debugging" enabled by default?  I didn't specify any debug when 
> building.
> How can I check and disable debugging?

Set

debug := n

close to the top of ./Config.mk.
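
For the "how can I check" half of the question, a minimal sketch follows,
assuming the setting appears in Config.mk as a plain Makefile-style
assignment like the two forms shown in this thread ("debug ?= y" and
"debug := n"). The helper name read_debug_setting is hypothetical, and the
sketch only reads the text of the file; it does not model values passed on
the make command line or through the environment.

```python
import re

# Matches a Makefile-style assignment such as "debug ?= y" or "debug := n".
_DEBUG_ASSIGNMENT = re.compile(r"^[ \t]*debug[ \t]*(\?=|:=|=)[ \t]*(\S+)",
                               re.MULTILINE)


def read_debug_setting(config_text: str):
    """Return (operator, value) of the last 'debug' assignment, or None."""
    matches = _DEBUG_ASSIGNMENT.findall(config_text)
    return matches[-1] if matches else None


if __name__ == "__main__":
    # Sample lines taken from this thread, not from a real Config.mk.
    print(read_debug_setting("debug ?= y\n"))
    print(read_debug_setting("debug := n\n"))
```

In GNU make, "?=" assigns only when the variable is not already defined,
whereas ":=" assigns unconditionally within the file, so the operator is
returned together with the value.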

>> (And of course I didn't really expect you to have
>> tried this on a huge memory system - they're just too rare still
>> for this to be likely.)
>>
>>>> So it is very unlikely for the patch itself to be at fault.
>>> I have traced the issue and found that the system crashing starts from this
>>> commit id and onward.
>>> (i.e. The system does not crash with commit id
>>> ed759d20249197cf87b338ff0ed328052ca3b8e7)
>>> So, I am still believe that this patch has somehow triggered the issue.
>> As said - I'm pretty certain this merely unmasked an already
>> lurking issue.
> I'm not quite sure what you meant here.  Are you saying that this 
> "crashing" is a known issue?

No, I'm unaware of any issue similar to what you describe.

>>   And that's what the purpose of that patch is.
> This patch is crashing the system. What do you mean by "And that's what 
> the purpose of that patch is"?

The purpose is finding bugs that would otherwise surface only when
actually running on a huge memory system. If such a bug would crash
the host on such a system, then with this debugging code it will do
so on "normal" systems too (as long as they are not running in
production mode). The alternative would be to keep the bug masked
until someone really tried to run Xen on such a huge system, and
debugging it then would be quite a bit more expensive (if nothing
else, in the amount of electrical power needed).

Jan

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-27 11:14           ` Suravee Suthikulpanit
@ 2013-06-28  0:44             ` Suravee Suthikulanit
  2013-06-28  7:58               ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Suravee Suthikulanit @ 2013-06-28  0:44 UTC (permalink / raw)
  To: Egger, Christoph, Jan Beulich; +Cc: xen-devel, Jacob Shin, Sherry Hurwitz

On 6/27/2013 6:14 AM, Suravee Suthikulpanit wrote:
> On 6/27/2013 5:33 AM, Egger, Christoph wrote:
>> On 27.06.13 12:24, Suravee Suthikulpanit wrote:
>>> On 6/27/2013 5:08 AM, Jan Beulich wrote:
>>>>>>> On 27.06.13 at 11:20, Suravee Suthikulpanit
>>>>>>> <suravee.suthikulpanit@amd.com> wrote:
>>>>> On 6/27/2013 3:22 AM, Jan Beulich wrote:
>>>>>>>>> On 27.06.13 at 02:24, Suravee Suthikulanit
>>>>>>>>> <suravee.suthikulpanit@amd.com> wrote:
>>>>>>> I have found an issue in where the system crash right when I start
>>>>>>> another HVM guest inside an HVM guest.  I have traced back to the
>>>>>>> patch
>>>>>>> which the issue started.
>>>>>>>
>>>>>>> commit f1bde87fc08ce8c818a1640a8fe4765d48923091
>>>>>>> Author: Jan Beulich <jbeulich@suse.com>
>>>>>>> Date:   Fri Feb 8 11:06:04 2013 +0100
>>>>>>>
>>>>>>>         x86: debugging code for testing 16Tb support on smaller
>>>>>>> memory systems
>>>>>>>
>>>>>>>         Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>>>         Acked-by: Keir Fraser <keir@xen.org>
>>>>>> We had issues exposed by this patch before, but any such issue
>>>>>> would just have been masked before that patch (and would
>>>>>> surface on a system with more than 5Tb of memory anyway).
>>>>> The system I am having the issue has 48GB of memory.
>>>> Which is why you're seeing the problem only with the debugging
>>>> code enabled.
>>> Is the "debugging" enabled by default?  I didn't specify any debug when
>>> building.
>> "Debugging" is enabled by default in the development tree.
>>
>>> How can I check and disable debugging?
>> In the toplevel source directory look into Config.mk
>> and set the line
>>
>>     debug ?= y
>>
>> accordingly.
>
> Thank you for clarification.
>
>>>> (And of course I didn't really expect you to have
>>>> tried this on a huge memory system - they're just too rare still
>>>> for this to be likely.)
>>>>
>>>>>> So it is very unlikely for the patch itself to be at fault.
>>>>> I have traced the issue and found that the system crashing starts
>>>>> from this
>>>>> commit id and onward.
>>>>> (i.e. The system does not crash with commit id
>>>>> ed759d20249197cf87b338ff0ed328052ca3b8e7)
>>>>> So, I am still believe that this patch has somehow triggered the 
>>>>> issue.
>>>> As said - I'm pretty certain this merely unmasked an already
>>>> lurking issue.
>>> I'm not quite sure what you meant here.  Are you saying that this
>>> "crashing" is a known issue?
>> He means nestedhvm reveals an existing bug in his patch.
>> If he is right then you do not see nestedhvm crashing with a non-debug
>> xen-kernel (unless something else broke it).
>
> After I rebuilt Xen kernel with debug=n, the system no longer crash 
> when starting npt-on-npt and shadown-on-npt guests.
> I was not able to get to the crash dump previously. I will try again 
> tomorrow at work and will post them.
>
> Thank you,
>
> Suravee

So, I have finally been able to get the crash dump (see below). The crash is due to a failed assertion:

     (XEN) Assertion 'va >= XEN_VIRT_START' failed at /sandbox/xen/xen.git/xen/include/asm/x86_64/page.h:86

* Debugging shows va=ffff82c40002d000, XEN_VIRT_START=ffff82c4c0000000, DIRECTMAP_VIRT_END=ffffff8000000000.
* The backtrace symbol shows the crash in "svm_vmexit_handler()", into which "svm_vmexit_do_vmsave()" and "svm_vmsave()" are inlined.

CRASH DUMP
==========

(XEN) Assertion 'va >= XEN_VIRT_START' failed at /sandbox/xen/xen.git/xen/include/asm/x86_64/page.h:86
(XEN) Debugging connection not set up.
(XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    17
(XEN) RIP:    e008:[<ffff82c4c01cfbfc>] svm_vmexit_handler+0x1574/0x1a2a
(XEN) RFLAGS: 0000000000010293   CONTEXT: hypervisor
(XEN) rax: ffff82c4bfffffff   rbx: ffff830852ec1000   rcx: 0000000000000000
(XEN) rdx: ffff830434757020   rsi: 000000000000000a   rdi: ffff82c4c0283740
(XEN) rbp: ffff83043474ff08   rsp: ffff83043474fd28   r8:  0000000000000004
(XEN) r9:  0000000000000010   r10: ffffff8000000000   r11: 0000000000000010
(XEN) r12: ffff83000e010000   r13: 0000000000000003   r14: 0000000000000000
(XEN) r15: ffff82c40002d000   cr0: 000000008005003b   cr4: 00000000000406f0
(XEN) cr3: 000000086d9dd000   cr2: 00007fe7f8e99120
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff83043474fd28:
(XEN)    ffff83000e010000 ffff83043474fd70 ffff82c4c01bb001 0000000000000000
(XEN)    ffff83000e010000 ffff830852ec1000 0000000000000000 0000000000000000
(XEN)    ffff830434757080 ffff83043474fda0 ffff82c4c01cca33 0000000000000000
(XEN)    ffff8300c7ea6000 00000000000fee00 ffff83043474ff18 ffff830400000000
(XEN)    ffff82c4c015fe19 ffff83043474fe10 ffff82c4c0185827 00000000000000fc
(XEN)    0000003b5c327b44 0000000a0000000d 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffff83043474fe20 ffff8300c7ea6000 00000049a0b0dcf5
(XEN)    0000000000000286 ffff83043474fe28 ffff82c4c0125e9e ffff83000e010000
(XEN)    ffff83043474fe98 ffff82c4c01c8048 ffff82c4c0125e9e ffff83000e010488
(XEN)    ffff83043474fe98 ffffffffffffffff ffff83043474fe78 ffff82c4c01c5e56
(XEN)    ffff83000e010000 ffff830853ea1000 ffff83043474fe98 ffff82c4c01be614
(XEN)    ffff83000e010000 0000000000000007 ffff83043474ff08 ffff82c4c01c8e66
(XEN)    ffff830434757080 000000fc3474fee0 ffff82c4c0125a52 ffff830434748000
(XEN)    ffff830434748000 00000000ffffffff ffff830852ec1000 ffff83000e010000
(XEN)    ffff830209c87000 0000000000000007 0000000000000003 ffff830203ddff18
(XEN)    ffff830203ddfd70 ffff82c4c01d1c45 ffff830203ddff18 0000000000000003
(XEN)    0000000000000007 ffff830209c87000 ffff830203ddfd70 ffff8300d4b46000
(XEN)    0000000000000246 00000000deadbeef 00000013eabcc169 0000000000000003
(XEN)    0000000203de2000 0000000000000000 0000000203de2000 ffff830203de4000
(XEN)    ffff830209c87000 0000beef0000beef ffff82c4c01ce158 0000beef0000beef
(XEN) Xen call trace:
(XEN)    [<ffff82c4c01cfbfc>] svm_vmexit_handler+0x1574/0x1a2a
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 17:
(XEN) Assertion 'va >= XEN_VIRT_START' failed at /sandbox/xen/xen.git/xen/include/asm/x86_64/page.h:92
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)
(XEN) Debugging connection not set up.

Suravee

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-28  0:44             ` Suravee Suthikulanit
@ 2013-06-28  7:58               ` Jan Beulich
  2013-06-28 14:20                 ` Suravee Suthikulanit
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2013-06-28  7:58 UTC (permalink / raw)
  To: Suravee Suthikulanit
  Cc: xen-devel, Christoph Egger, Jacob Shin, Sherry Hurwitz

>>> On 28.06.13 at 02:44, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
> So, I have finally able to get the crash dump (see below). The crash is due 
> to an assert
> 
>      (XEN) Assertion 'va >= XEN_VIRT_START' failed at 
> /sandbox/xen/xen.git/xen/include/asm/x86_64/page.h:86
> 
> * Debugging show the va=ffff82c40002d000, XEN_VIRT_START=ffff82c4c0000000, 
> DIRECTMAP_VIRT_END=ffffff8000000000.
> * Backtrace symbol showing the crash is in "svm_vmexit_handler()", which is 
> inlined from "svm_vmexit_do_vmsave()" and "svm_vmsave()".

Which helps in no way identifying where the problem is -
svm_vmexit_handler() is just too large to spot this without either
the matching xen-syms at hand, or you adding further
instrumentation.

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-28  7:58               ` Jan Beulich
@ 2013-06-28 14:20                 ` Suravee Suthikulanit
  2013-06-28 14:24                   ` Andrew Cooper
  2013-06-28 14:52                   ` Jan Beulich
  0 siblings, 2 replies; 17+ messages in thread
From: Suravee Suthikulanit @ 2013-06-28 14:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Christoph Egger, Jacob Shin, Sherry Hurwitz

On 6/28/2013 2:58 AM, Jan Beulich wrote:
>>>> On 28.06.13 at 02:44, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
>> So, I have finally able to get the crash dump (see below). The crash is due
>> to an assert
>>
>>       (XEN) Assertion 'va >= XEN_VIRT_START' failed at
>> /sandbox/xen/xen.git/xen/include/asm/x86_64/page.h:86
>>
>> * Debugging show the va=ffff82c40002d000, XEN_VIRT_START=ffff82c4c0000000,
>> DIRECTMAP_VIRT_END=ffffff8000000000.
>> * Backtrace symbol showing the crash is in "svm_vmexit_handler()", which is
>> inlined from "svm_vmexit_do_vmsave()" and "svm_vmsave()".
> Which helps in no way identifying where the problem is -
> svm_vmexit_handler() is just too large to spot this without either
> the matching xen-syms at hand, or you adding further
> instrumentation.
>
> Jan

What I am trying to say is that the assertion is in __virt_to_maddr(), which is reached from
svm_vmexit_do_vmsave().  However, this is a bit complicated to see due to macros and inlines.
Here is what the call chain is supposed to look like:

     ASSERT(va >= XEN_VIRT_START )
     __virt_to_maddr        <---- inlined
     virt_to_mfn ()         <---- macro
     __pa ()                <---- macro
     svm_vmsave()           <---- inlined
     svm_vmexit_do_vmsave() <---- inlined
     svm_vmexit_handler()   <---- symbol
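
To make the failing check concrete, here is a small user-space sketch that replays the comparison with the values from the debugging output above. It is a simplified model, not the actual Xen __virt_to_maddr(); the helper names va_passes_lower_bound() and bytes_below_xen_virt_start() are made up for illustration, and only the lower-bound check named in the assertion message is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constant copied from the debugging output in this report. */
#define XEN_VIRT_START UINT64_C(0xffff82c4c0000000)

/*
 * Hypothetical helper: models the condition of the failing assertion,
 * 'va >= XEN_VIRT_START'. Returns true if the assertion would hold.
 */
static bool va_passes_lower_bound(uint64_t va)
{
    return va >= XEN_VIRT_START;
}

/*
 * Hypothetical helper: how far below XEN_VIRT_START the address lies,
 * or 0 if it is not below it.
 */
static uint64_t bytes_below_xen_virt_start(uint64_t va)
{
    return va < XEN_VIRT_START ? XEN_VIRT_START - va : 0;
}
```

With va=ffff82c40002d000 the lower-bound check fails, and the address sits 0xbffd3000 bytes below XEN_VIRT_START, so it is not a value that the translation in this call chain expects to see.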

Suravee

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-28 14:20                 ` Suravee Suthikulanit
@ 2013-06-28 14:24                   ` Andrew Cooper
  2013-06-28 14:52                   ` Jan Beulich
  1 sibling, 0 replies; 17+ messages in thread
From: Andrew Cooper @ 2013-06-28 14:24 UTC (permalink / raw)
  To: Suravee Suthikulanit
  Cc: Sherry Hurwitz, Christoph Egger, Jacob Shin, Jan Beulich, xen-devel

On 28/06/13 15:20, Suravee Suthikulanit wrote:
> On 6/28/2013 2:58 AM, Jan Beulich wrote:
>>>>> On 28.06.13 at 02:44, Suravee Suthikulanit
>>>>> <suravee.suthikulpanit@amd.com> wrote:
>>> So, I have finally able to get the crash dump (see below). The crash
>>> is due
>>> to an assert
>>>
>>>       (XEN) Assertion 'va >= XEN_VIRT_START' failed at
>>> /sandbox/xen/xen.git/xen/include/asm/x86_64/page.h:86
>>>
>>> * Debugging show the va=ffff82c40002d000,
>>> XEN_VIRT_START=ffff82c4c0000000,
>>> DIRECTMAP_VIRT_END=ffffff8000000000.
>>> * Backtrace symbol showing the crash is in "svm_vmexit_handler()",
>>> which is
>>> inlined from "svm_vmexit_do_vmsave()" and "svm_vmsave()".
>> Which helps in no way identifying where the problem is -
>> svm_vmexit_handler() is just too large to spot this without either
>> the matching xen-syms at hand, or you adding further
>> instrumentation.
>>
>> Jan
>
> What I am trying to say is, the assertion is in the __virt_to_maddr
> which is called from
> svm_vmexit_do_vmsave().  However, this is a bit complicate due to
> macros and inlines.
> Here is the callchain supposed to look like:
>
>     ASSERT(va >= XEN_VIRT_START )
>     __virt_to_maddr        <---- inlined
>     virt_to_mfn ()         <---- macro
>     __pa ()                <---- macro
>     smv_vmasave()          <---- inlined
>     svm_vmexit_do_vmsave() <---- inlined
>     svm_vmexit_handler()   <---- symbol
>
> Suravee

The code is assuming that the virtual address is mapped into the Xen
pagetables when in fact it is not.

The code needs to be corrected to use map_domain_page() to correctly
access a domheap page.

~Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-28 14:20                 ` Suravee Suthikulanit
  2013-06-28 14:24                   ` Andrew Cooper
@ 2013-06-28 14:52                   ` Jan Beulich
  2013-06-28 15:05                     ` Egger, Christoph
  1 sibling, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2013-06-28 14:52 UTC (permalink / raw)
  To: Suravee Suthikulanit
  Cc: xen-devel, Christoph Egger, Jacob Shin, Sherry Hurwitz

>>> On 28.06.13 at 16:20, Suravee Suthikulanit <suravee.suthikulpanit@amd.com>
wrote:
> On 6/28/2013 2:58 AM, Jan Beulich wrote:
>>>>> On 28.06.13 at 02:44, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> 
> wrote:
>>> So, I have finally able to get the crash dump (see below). The crash is due
>>> to an assert
>>>
>>>       (XEN) Assertion 'va >= XEN_VIRT_START' failed at
>>> /sandbox/xen/xen.git/xen/include/asm/x86_64/page.h:86
>>>
>>> * Debugging show the va=ffff82c40002d000, XEN_VIRT_START=ffff82c4c0000000,
>>> DIRECTMAP_VIRT_END=ffffff8000000000.
>>> * Backtrace symbol showing the crash is in "svm_vmexit_handler()", which is
>>> inlined from "svm_vmexit_do_vmsave()" and "svm_vmsave()".
>> Which helps in no way identifying where the problem is -
>> svm_vmexit_handler() is just too large to spot this without either
>> the matching xen-syms at hand, or you adding further
>> instrumentation.
>>
>> Jan
> 
> What I am trying to say is, the assertion is in the __virt_to_maddr which is 
> called from
> svm_vmexit_do_vmsave().  However, this is a bit complicate due to macros and 
> inlines.
> Here is the callchain supposed to look like:
> 
>      ASSERT(va >= XEN_VIRT_START )
>      __virt_to_maddr        <---- inlined
>      virt_to_mfn ()         <---- macro
>      __pa ()                <---- macro
>      smv_vmasave()          <---- inlined
>      svm_vmexit_do_vmsave() <---- inlined
>      svm_vmexit_handler()   <---- symbol

So the problem is the inverse of the usual one (and that's part of
why I didn't spot it when searching the tree for code that needs
fixing; the other part is that while running into these functions I
knew that VMCBs get allocated from the Xen heap, but didn't
notice that the same functions also get used for dealing with
guest VMCBs):

nestedsvm_vmcb_map() properly does the necessary mapping,
but svm_vmsave() (just like svm_vmload()) blindly uses __pa() on
something that's not an address in the direct mapping region.
Which means that on 4.2.0, where we still had a 32-bit hypervisor,
nested SVM was completely broken (and presumably never tested)
in that 32-bit case. Luckily we meanwhile disabled the use of nested
HVM in 4.2.x's 32-bit builds.
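
A minimal user-space sketch of the distinction, under simplifying assumptions: the direct map is modelled as a plain fixed offset, DIRECTMAP_VIRT_START is an assumed value that is not stated in this thread, and all helper and type names are hypothetical. It is not the Xen implementation. It only illustrates why an offset-based translation is valid for direct-map addresses alone, and why a mapping created on demand has to carry its machine address with it.

```c
#include <stdbool.h>
#include <stdint.h>

#define DIRECTMAP_VIRT_START UINT64_C(0xffff830000000000) /* assumed value */
#define DIRECTMAP_VIRT_END   UINT64_C(0xffffff8000000000) /* from the report */

/* True if va lies inside the modelled direct-map region. */
static bool in_directmap(uint64_t va)
{
    return va >= DIRECTMAP_VIRT_START && va < DIRECTMAP_VIRT_END;
}

/*
 * Offset-based translation in the style of __pa(): meaningful only for
 * direct-map addresses. Returns false instead of asserting otherwise.
 */
static bool directmap_va_to_ma(uint64_t va, uint64_t *ma)
{
    if (!in_directmap(va))
        return false;
    *ma = va - DIRECTMAP_VIRT_START;
    return true;
}

/*
 * Hypothetical record for a page mapped on demand: the virtual address
 * says nothing about the machine address, so the latter is stored.
 */
struct mapped_page {
    uint64_t va; /* where the page was mapped */
    uint64_t ma; /* machine address it was mapped from */
};

/* The machine address comes from the record, not from arithmetic on va. */
static uint64_t mapped_page_ma(const struct mapped_page *p)
{
    return p->ma;
}
```

In this model the reported va=ffff82c40002d000 is rejected by directmap_va_to_ma(), while an address such as ffff830852ec1000 (the rbx value in the dump) translates by simple subtraction.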

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: x86/AMD: Nested hvm crashes in 4.3
  2013-06-28 14:52                   ` Jan Beulich
@ 2013-06-28 15:05                     ` Egger, Christoph
  0 siblings, 0 replies; 17+ messages in thread
From: Egger, Christoph @ 2013-06-28 15:05 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Jacob Shin, Suravee Suthikulanit, Sherry Hurwitz

On 28.06.13 16:52, Jan Beulich wrote:
>>>> On 28.06.13 at 16:20, Suravee Suthikulanit <suravee.suthikulpanit@amd.com>
> wrote:
>> On 6/28/2013 2:58 AM, Jan Beulich wrote:
>>>>>> On 28.06.13 at 02:44, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> 
>> wrote:
>>>> So, I have finally able to get the crash dump (see below). The crash is due
>>>> to an assert
>>>>
>>>>       (XEN) Assertion 'va >= XEN_VIRT_START' failed at
>>>> /sandbox/xen/xen.git/xen/include/asm/x86_64/page.h:86
>>>>
>>>> * Debugging show the va=ffff82c40002d000, XEN_VIRT_START=ffff82c4c0000000,
>>>> DIRECTMAP_VIRT_END=ffffff8000000000.
>>>> * Backtrace symbol showing the crash is in "svm_vmexit_handler()", which is
>>>> inlined from "svm_vmexit_do_vmsave()" and "svm_vmsave()".
>>> Which helps in no way identifying where the problem is -
>>> svm_vmexit_handler() is just too large to spot this without either
>>> the matching xen-syms at hand, or you adding further
>>> instrumentation.
>>>
>>> Jan
>>
>> What I am trying to say is, the assertion is in the __virt_to_maddr which is 
>> called from
>> svm_vmexit_do_vmsave().  However, this is a bit complicate due to macros and 
>> inlines.
>> Here is the callchain supposed to look like:
>>
>>      ASSERT(va >= XEN_VIRT_START )
>>      __virt_to_maddr        <---- inlined
>>      virt_to_mfn ()         <---- macro
>>      __pa ()                <---- macro
>>      smv_vmasave()          <---- inlined
>>      svm_vmexit_do_vmsave() <---- inlined
>>      svm_vmexit_handler()   <---- symbol
> 
> So the problem is the inverse of the usual one (and that's part of
> why I didn't spot it when searching the tree for code that needs
> fixing; the other part is that while running into these functions I
> knew that VMCBs get allocated from the Xen heap, but didn't
> notice that the same functions also get used for dealing with
> guest VMCBs):
> 
> nestedsvm_vmcb_map() properly does the necessary mapping,
> but svm_vmsave() (just like svm_vmload()) blindly uses __pa() on
> something that's not an address in the direct mapping region.
> Which means that on 4.2.0, where we still had a 32-bit hypervisor,
> nested SVM was completely broken (and presumably never tested)
> in that 32-bit case. Luckily we meanwhile disabled the use of nested
> HVM in 4.2.x's 32-bit builds.

I never tested nested SVM with a 32-bit hypervisor on the host. I did
test 32-bit hypervisors as guests.

Christoph

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-06-28 15:05 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-27  0:24 x86/AMD: Nested hvm crashes in 4.3 Suravee Suthikulanit
2013-06-27  8:22 ` Jan Beulich
2013-06-27  9:20   ` Suravee Suthikulpanit
2013-06-27  9:50     ` Egger, Christoph
2013-06-27 10:08     ` Jan Beulich
2013-06-27 10:24       ` Suravee Suthikulpanit
2013-06-27 10:28         ` Andrew Cooper
2013-06-27 10:33         ` Egger, Christoph
2013-06-27 11:14           ` Suravee Suthikulpanit
2013-06-28  0:44             ` Suravee Suthikulanit
2013-06-28  7:58               ` Jan Beulich
2013-06-28 14:20                 ` Suravee Suthikulanit
2013-06-28 14:24                   ` Andrew Cooper
2013-06-28 14:52                   ` Jan Beulich
2013-06-28 15:05                     ` Egger, Christoph
2013-06-27 11:20         ` George Dunlap
2013-06-27 11:37         ` Jan Beulich
