From: "Kinney, Michael D" <michael.d.kinney@intel.com>
To: Laszlo Ersek <lersek@redhat.com>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	"Kinney, Michael D" <michael.d.kinney@intel.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Justen, Jordan L" <jordan.l.justen@intel.com>,
	"edk2-devel@ml01.01.org" <edk2-devel@ml01.01.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	"Chen Fan" <chen.fan.fnst@cn.fujitsu.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Wanpeng Li <wanpeng.li@hotmail.com>
Subject: RE: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled
Date: Thu, 15 Oct 2015 16:53:25 +0000
Message-ID: <E92EE9817A31E24EB0585FDF735412F563254548@ORSMSX113.amr.corp.intel.com>
In-Reply-To: <561FD1D8.3030605@redhat.com>

Laszlo,

There is already a PCD for this timeout that is used by CpuMpPei.

	gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds

I noticed that CpuDxe uses a hard-coded AP timeout.  I think we should use this same PCD for both the PEI and DXE CPU modules, and then set it to a compatible value for OVMF.
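
A rough sketch of what reading the timeout from the PCD could look like in the AP startup wait. This is an illustration only, not the actual CpuDxe change: the PcdGet32 macro, MicroSecondDelay, and the PCD backing variable below are local stand-ins (assumptions) so the fragment compiles outside edk2, and WaitForApsToArrive is a hypothetical helper name. In a real edk2 module, PcdGet32 comes from PcdLib, MicroSecondDelay comes from TimerLib, and the PCD would have to be listed in the module's INF.

```c
#include <stdint.h>

typedef uint32_t  UINT32;
typedef uintptr_t UINTN;

/* Stand-in for the platform-configured PCD value (assumption: in edk2 the
   value is set in the platform DSC, it is not a plain C variable). */
static UINT32 mPcdCpuApInitTimeOutInMicroSeconds = 100 * 1000;

/* Stand-in for PcdLib's PcdGet32(): maps the token name to the variable. */
#define PcdGet32(TokenName)  (m##TokenName)

/* Stand-in for TimerLib's MicroSecondDelay(): records the requested delay
   instead of stalling, and returns its argument. */
static UINTN mLastDelayInMicroSeconds;

static UINTN
MicroSecondDelay (UINTN MicroSeconds)
{
  mLastDelayInMicroSeconds = MicroSeconds;
  return MicroSeconds;
}

/* Hypothetical helper: wait for the APs using the PCD, not a literal. */
static void
WaitForApsToArrive (void)
{
  MicroSecondDelay (PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds));
}
```

With this shape, a platform such as OVMF would only change the PCD value in its build configuration, and the C code would stay the same for PEI and DXE.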

Mike

>-----Original Message-----
>From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
>Laszlo Ersek
>Sent: Thursday, October 15, 2015 9:19 AM
>To: Xiao Guangrong
>Cc: kvm@vger.kernel.org; Justen, Jordan L; edk2-devel@ml01.01.org; Alex
>Williamson; Chen Fan; Paolo Bonzini; Wanpeng Li
>Subject: Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is
>completely disabled
>
>CC'ing Jordan and Chen Fan.
>
>On 10/15/15 09:10, Xiao Guangrong wrote:
>>
>>
>> On 10/15/2015 02:58 PM, Janusz wrote:
>>> On 15.10.2015 at 08:41, Xiao Guangrong wrote:
>>>>
>>>>
>>>> On 10/15/2015 02:19 PM, Janusz wrote:
>>>>> On 15.10.2015 at 06:19, Xiao Guangrong wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Well, the bug may not be in KVM. When this bug happened, I saw that
>>>>>> OVMF detected only 1 CPU. Here is the log from OVMF's debug output:
>>>>>>
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCDs
>>>>>> Detect CPU count: 1
>>>>>>
>>>>>> So the startup code has been freed while the APs are still running;
>>>>>> I think that is why we saw the vCPUs executing at unexpected addresses.
>>>>>>
>>>>>> After digging into OVMF's code, I noticed that the BSP waits for APs
>>>>>> for a fixed timer period, however, KVM recent changes require zap all
>>>>>> mappings if CR0.CD is changed, that means the APs need more time to
>>>>>> startup.
>>>>>>
>>>>>> After following changes to OVMF, the bug is completely gone on my
>>>>>> side:
>>>>>>
>>>>>> --- a/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>>> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>>> @@ -454,7 +454,9 @@ StartApsStackless (
>>>>>>      //
>>>>>>      // Wait 100 milliseconds for APs to arrive at the ApEntryPoint
>>>>>> routine
>>>>>>      //
>>>>>> -  MicroSecondDelay (100 * 1000);
>>>>>> +  MicroSecondDelay (10 * 100 * 1000);
>>>>>>
>>>>>>      return EFI_SUCCESS;
>>>>>>    }
>>>>>>
>>>>>> Janusz, could you please check this instead? You can switch to your
>>>>>> previous kernel to do this test.
>>>>>>
>>>>>>
>>>>> OK, the first time I started the VM, the system booted
>>>>> successfully. When I turned it off and started it again, the VM
>>>>> restarted a couple of times during system boot. Sometimes I also get
>>>>> very high CPU usage for no reason. Also, I get fewer fps in GTA 5 than
>>>>> with kernel 4.1: something like 30-55, whereas on 4.1 I get 60 fps
>>>>> all the time. This is my new log: https://bpaste.net/show/61a122ad7fe5
>>>>>
>>>>
>>>> Just to confirm: the QEMU internal error no longer appears, right?
>>> Yes, when I reverted your first patch, switched from -vga none to
>>> -vga std and didn't pass through my GPU (the case where I got this
>>> internal error), the VM started without problems. I didn't even get
>>> any VM restarts like I did with passthrough.
>>>
>>
>> Wow, it seems we have fixed the QEMU internal error now. :)
>>
>> Recently, Paolo reverted some MTRR patches; was your test
>> based on these reverted patches?
>>
>> The GPU passthrough issue may be related to vfio (not sure). Alex, do
>> you have any idea?
>>
>> Laszlo, could you please check whether the root cause is reasonable and
>> fix it in OVMF if it's right?
>
>The code that you have found is in edk2's EFI_MP_SERVICES_PROTOCOL
>implementation -- more closely, its initial CPU counter code --, from
>edk2 git commit 533263ee5a7f. It is not specific to OVMF -- it is
>generic edk2 code for Intel processors. (I'm CC'ing Jordan and Chen Fan
>because they authored the patch in question.)
>
>If VCPUs need more time to rendezvous than written in the code, on
>recent KVM, then I think we should introduce a new FixedPCD in
>UefiCpuPkg (practically: a compile time constant) for the timeout. Which
>is not hard to do.
>
>However, we'll need two things:
>- an idea about the concrete rendezvous timeout to set, from OvmfPkg
>
>- a *detailed* explanation / elaboration on your words:
>
>  "KVM recent changes require zap all mappings if CR0.CD is changed,
>  that means the APs need more time to startup"
>
>  Preferably with references to Linux kernel commits and the Intel SDM,
>  so that n00bs like me can get a fleeting idea. Do you mean that with
>  caching disabled, the APs execute their rendezvous code (from memory)
>  more slowly?
>
>> BTW, OVMF handles #UD with no trace - nothing is killed, and there is no
>> call trace in the debug output...
>
>There *is* a trace (of any unexpected exception -- at least for the
>BSP), but unfortunately its location is not intuitive.
>
>The exception handler that is built into OVMF
>("UefiCpuPkg/Library/CpuExceptionHandlerLib") is again generic edk2
>code, and it prints the trace directly to the serial port, regardless of
>the fact that OVMF's DebugLib instance logs explicit DEBUGs to the QEMU
>debug port. (The latter can be directed to the serial port as well, if
>you build OVMF with -D DEBUG_ON_SERIAL_PORT, but this is not relevant
>here.)
>
>If you reproduce the issue while looking at the (virtual) serial port of
>the guest, I trust you will get a register dump.
>
>Thanks!
>Laszlo
