From: Laszlo Ersek <lersek@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	pbonzini@redhat.com, gleb@kernel.org, mtosatti@redhat.com,
	stefanha@redhat.com, rth@twiddle.net, ehabkost@redhat.com,
	dan.j.williams@intel.com, kvm@vger.kernel.org,
	qemu-devel@nongnu.org, Marcel Apfelbaum <marcel@redhat.com>
Subject: Re: How to reserve guest physical region for ACPI
Date: Thu, 7 Jan 2016 18:33:05 +0100	[thread overview]
Message-ID: <568EA151.5040702@redhat.com> (raw)
In-Reply-To: <20160107145113.7b459368@nial.brq.redhat.com>

On 01/07/16 14:51, Igor Mammedov wrote:
> On Mon, 4 Jan 2016 21:17:31 +0100
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> Michael CC'd me on the grandparent of the email below. I'll try to add
>> my thoughts in a single go, with regard to OVMF.
>>
>> On 12/30/15 20:52, Michael S. Tsirkin wrote:
>>> On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:  
>>>> On Mon, 28 Dec 2015 14:50:15 +0200
>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>>  
>>>>> On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:  
>>>>>>
>>>>>> Hi Michael, Paolo,
>>>>>>
>>>>>> Now it is time to return to the challenge of how to reserve the
>>>>>> guest physical region used internally by ACPI.
>>>>>>
>>>>>> Igor suggested that:
>>>>>> | An alternative place to allocate reserve from could be high memory.
>>>>>> | For pc we have "reserved-memory-end" which currently makes sure
>>>>>> | that hotpluggable memory range isn't used by firmware
>>>>>> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)  
>>
>> OVMF has no support for the "reserved-memory-end" fw_cfg file. The
>> reason is that nobody wrote that patch, nor asked for the patch to be
>> written. (Not implying that just requesting the patch would be
>> sufficient for the patch to be written.)
> Hijacking this part of the thread to check whether OVMF works with
> memory hotplug, and whether it needs "reserved-memory-end" support at all.
> 
> How does OVMF determine which GPA ranges to use for initializing PCI BARs
> at boot time,

I'm glad you asked this question. This is an utterly sordid area with quite a bit of history. We've discussed it several times in the past; for example, recall the "etc/pci-info" discussion...

Fact is, OVMF has no way to dynamically determine the PCI MMIO aperture to allocate BARs from. (Obviously, parsing AML is out of the question, especially in the early firmware stage where this information would be needed. Plus, that would be a chicken-and-egg problem anyway: QEMU composes the _CRS in the AML *based on* the enumeration that was completed by the guest.)

Search "OvmfPkg/PlatformPei/Platform.c" for the string "PciBase"; it all originates there. I can also quote it:

    UINT32  TopOfLowRam;
    UINT32  PciBase;

    TopOfLowRam = GetSystemMemorySizeBelow4gb ();
    if (mHostBridgeDevId == INTEL_Q35_MCH_DEVICE_ID) {
      //
      // A 3GB base will always fall into Q35's 32-bit PCI host aperture,
      // regardless of the Q35 MMCONFIG BAR. Correspondingly, QEMU never lets
      // the RAM below 4 GB exceed it.
      //
      PciBase = BASE_2GB + BASE_1GB;
      ASSERT (TopOfLowRam <= PciBase);
    } else {
      //
      // On i440fx, start the aperture at 2 GB, or at the top of low RAM if
      // that is higher. [comment added by me for this email]
      //
      PciBase = (TopOfLowRam < BASE_2GB) ? BASE_2GB : TopOfLowRam;
    }

    ...

    //
    // Advertise the MMIO aperture [PciBase, 0xFC000000) to the DXE phase.
    // [comment added by me for this email]
    //
    AddIoMemoryRangeHob (PciBase, 0xFC000000);

That's it.
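
(A worked example, with my own numbers, not taken from the code above: on i440fx with exactly 2 GB of guest RAM, GetSystemMemorySizeBelow4gb() returns 0x80000000, the else branch yields PciBase = TopOfLowRam, and OVMF ends up advertising the 32-bit MMIO aperture [0x80000000, 0xFC000000). All BARs are allocated from that range.)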

In the past, it repeatedly happened that OVMF's calculation didn't match QEMU's, and PCI MMIO BARs ended up allocated outside of QEMU's actual MMIO aperture. This caused two kinds of breakage:
- the video display didn't work (the framebuffer was accessed at a bogus address),
- Windows and Linux guests noticed that the BARs fell outside the ranges exposed in _CRS, and disabled the affected devices.

We kept duct-taping this, with patches in both OVMF and QEMU (see e.g. Gerd's QEMU commit ddaaefb4dd42).

It has been working fine for quite a long time now, but it is still not dynamic -- the calculations are duplicated between QEMU and OVMF.

To this day, I maintain that the "etc/pci-info" fw_cfg file would have been ideal for OVMF's purposes; and I still don't understand why it was ultimately removed.
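
Just to make that concrete, here's a minimal sketch (mine, not actual OVMF code) of how consuming such a file could look with the existing QemuFwCfgLib interface. The PCI_INFO layout below is my assumption -- four little-endian UINT64s, mirroring what I recall "etc/pci-info" carried -- so treat it as illustrative, not as the authoritative QEMU format:

    #include <Library/QemuFwCfgLib.h>

    //
    // Assumed layout: 32-bit and 64-bit PCI hole boundaries, each as a
    // [Begin, End) pair of little-endian UINT64s.
    //
    #pragma pack (1)
    typedef struct {
      UINT64 HoleBegin32, HoleEnd32;
      UINT64 HoleBegin64, HoleEnd64;
    } PCI_INFO;
    #pragma pack ()

    STATIC
    RETURN_STATUS
    GetPciInfo (
      OUT PCI_INFO *PciInfo
      )
    {
      RETURN_STATUS        Status;
      FIRMWARE_CONFIG_ITEM Item;
      UINTN                Size;

      Status = QemuFwCfgFindFile ("etc/pci-info", &Item, &Size);
      if (RETURN_ERROR (Status) || Size != sizeof *PciInfo) {
        //
        // File absent or malformed -- fall back to the hardcoded
        // calculation quoted above.
        //
        return RETURN_NOT_FOUND;
      }
      QemuFwCfgSelectItem (Item);
      QemuFwCfgReadBytes (sizeof *PciInfo, PciInfo);
      return RETURN_SUCCESS;
    }

With something like this, PlatformPei could replace the PciBase guesswork with PciInfo.HoleBegin32 / PciInfo.HoleEnd32, and the two calculations would stop being duplicated.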

> more specifically 64-bit BARs.

Ha. Haha. Hahahaha.

OVMF doesn't support 64-bit BARs *at all*. In order to implement that, I would have to (1) understand PCI about ten billion percent better than I do now, and (2) extend the mostly *impenetrable* PCI host bridge / root bridge driver in "OvmfPkg/PciHostBridgeDxe" to support this functionality.

Unfortunately, the parts of the UEFI & Platform Init specs that seem to talk about this functionality are super complex and obscure.

We have plans with Marcel and others to understand this better and perhaps do something about it.

Anyway, the basic premise bears repeating: even for the 32-bit case, OVMF has no way to dynamically retrieve the PCI hole's boundaries from QEMU.

Honestly, I'm confused. If "reserved-memory-end" is exposed over fw_cfg, and it -- apparently! -- partakes in communicating the 64-bit PCI hole to the guest, then why again was "etc/pci-info" removed in the first place?
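
For what it's worth, if OVMF were to consume "etc/reserved-memory-end", I'd expect something like the sketch below -- again mine, not existing code. I'm assuming the file holds a single little-endian UINT64 GPA, I'm ignoring RAM above 4 GB for simplicity, and the 1 GB alignment is my own illustrative choice:

    STATIC
    UINT64
    GetPci64Base (
      VOID
      )
    {
      FIRMWARE_CONFIG_ITEM Item;
      UINTN                Size;
      UINT64               ReservedMemoryEnd;

      //
      // Default: start the 64-bit aperture right above 4 GB.
      //
      ReservedMemoryEnd = BASE_4GB;

      if (!RETURN_ERROR (QemuFwCfgFindFile ("etc/reserved-memory-end",
                           &Item, &Size)) &&
          Size == sizeof ReservedMemoryEnd) {
        QemuFwCfgSelectItem (Item);
        QemuFwCfgReadBytes (Size, &ReservedMemoryEnd);
      }

      //
      // Round up so the 64-bit aperture starts above the hotpluggable
      // memory range and never overlaps it.
      //
      return ALIGN_VALUE (ReservedMemoryEnd, (UINT64)BASE_1GB);
    }

That would at least keep 64-bit BARs clear of hotplugged memory.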

Thanks
Laszlo
