From: Don Slutz <dslutz@verizon.com>
To: George Dunlap <George.Dunlap@eu.citrix.com>,
	Konrad Rzeszutek Wilk <konrad@darnok.org>
Cc: Ian Campbell <ian.campbell@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Don Slutz <dslutz@verizon.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Gordan Bobic <gordan@bobich.net>
Subject: Re: [RFC PATCH 1/1] Add pci_hole_min_size
Date: Tue, 11 Mar 2014 13:16:48 -0400
Message-ID: <531F4500.703@terremark.com>
In-Reply-To: <CAFLBxZahEBrHP=yEOQAZjXkts-zMqigmHuUA_JRumTz8c=N8-A@mail.gmail.com>

On 03/11/14 08:54, George Dunlap wrote:
> On Fri, Mar 7, 2014 at 7:28 PM, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
>> On Tue, Mar 04, 2014 at 01:57:26PM -0500, Don Slutz wrote:
>>> On 03/04/14 08:25, George Dunlap wrote:
>>>> On Fri, Feb 28, 2014 at 8:15 PM, Don Slutz <dslutz@verizon.com> wrote:
>>>>> This allows growing the pci_hole to the size needed.
>>>> You mean, it allows the pci hole size to be specified at boot
>>> Yes.
>>>
>>>>   -- the
>>>> pci hole still cannot be enlarged dynamically in hvmloader, correct?
>>> If I am correctly understanding you, this is in reference to:
>>>
>>> /*
>>>       * At the moment qemu-xen can't deal with relocated memory regions.
>>>       * It's too close to the release to make a proper fix; for now,
>>>       * only allow the MMIO hole to grow large enough to move guest memory
>>>       * if we're running qemu-traditional.  Items that don't fit will be
>>>       * relocated into the 64-bit address space.   */
>>>
>>>
>>> so the answer is no, however using pci_hole_min_size can mean that
>>> allow_memory_relocate is not needed for upstream QEMU.
>>>
>>>
>>>
>>>> What's your intended use case for this?
>>>>
>>>>   -George
>>> If you add enough PCI devices then all mmio may not fit below 4G which may
>>> not be the layout the user wanted. This allows you to increase the below 4G
>>> address space that PCI devices can use and therefore in more cases not have
>>> any mmio that is above 4G.
>>>
>>> There are real PCI cards that do not support mmio over 4G, so if you want
>>> to emulate them precisely, you may also need to increase the space below 4G
>>> for them.  There are drivers for these cards that also do not work if they
>>> have their mmio space mapped above 4G.
>> Would it be better if the HVM guests had something similar to what we
>> manufacture for PV guests with PCI passthrough: a filtered version of
>> the host's E820?
>>
>> That way you don't have to worry about resizing just right and instead
>> the E820 looks like the hosts one. Granted you can't migrate, but I
>> don't think that is a problem in your use-case?
> Having the guest PCI hole the same size as the host PCI hole also gets
> rid of a whole class of (unfortunately very common) bugs in PCI
> hardware, such that if guest paddrs overlap with device IO
> ranges the PCI hardware sends the DMA requests to the wrong place.
> (In other words, VT-d as implemented in a very large number of
> motherboards is utterly broken -- total fail on someone's part.)
>
> The main disadvantage of this is that it unnecessarily reduces the
> amount of lowmem available -- and for 32-bit non-PAE guests, reduces
> the total amount of memory available at all.
>
> I think long-term, it would be best to:
> * Have the pci hole be small for VMs without devices passed through
> * Have the pci hole default to the host pci hole for VMs with devices
> passed through
> * Have the pci hole size able to be specified, either as a size, or as "host".
>
> As long as the size specification can be extended to this
> functionality easily, I think just having a size to begin with is OK.

I see no problem with extending this later to accept "host", so I am
starting with just a size.  Note: the new QEMU way is simpler to decode
from an e820 map.

For example:


Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] e820: BIOS-provided physical RAM map:
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x0000000000000000-0x000000000009afff] usable
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x000000000009b800-0x00000000000fffff] reserved
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x0000000000100000-0x00000000bf63efff] usable
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x00000000bf63f000-0x00000000bf6befff] reserved
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x00000000bf6bf000-0x00000000bf7befff] ACPI NVS
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x00000000bf7bf000-0x00000000bf7fefff] ACPI data
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x00000000bf7ff000-0x00000000bf7fffff] usable
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x00000000bf800000-0x00000000bfffffff] reserved
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x00000000e0000000-0x00000000efffffff] reserved
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x00000000feb00000-0x00000000feb03fff] reserved
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x00000000fed10000-0x00000000fed19fff] reserved
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x00000000ffd80000-0x00000000ffffffff] reserved
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x0000000100000000-0x00000005c0e16fff] usable
Mar 10 13:08:28 dcs-xen-54 kernel: [    0.000000] Xen: [mem 0x00000005c0e17000-0x000000083fffffff] unusable


pci_hole_min_size =  1082130432 (0x40800000) for 0xbf800000
pci_hole_min_size =  536870912 (0x20000000) for 0xe0000000

Note: It is not obvious to me from the e820 map which of these two is the host one.
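
The two pci_hole_min_size values above are simply 4G minus the address
where the hole would start.  A minimal sketch of that arithmetic, in
Python for convenience; the helper name and the range check are
illustrative assumptions, not code from the patch:

```python
FOUR_GIB = 1 << 32  # top of the 32-bit physical address space


def pci_hole_min_size(hole_start: int) -> int:
    """Bytes between hole_start and 4G (hypothetical helper, not Xen code)."""
    if not 0 < hole_start < FOUR_GIB:
        raise ValueError("hole start must lie below 4G")
    return FOUR_GIB - hole_start


# The two candidate hole starts taken from the e820 map above.
for start in (0xBF800000, 0xE0000000):
    size = pci_hole_min_size(start)
    print(f"pci_hole_min_size = {size} ({size:#x}) for {start:#x}")
```

Which of the two start addresses is the right one to feed in still has
to be decided by reading the map.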

> I think the qemu guys didn't like the term "pci_hole" and wanted
> something like "lowmem" instead -- that will need to be sorted out.

Next version of QEMU patch out with new name.  I was not sure which name or size makes the most sense.


lowmem == (1ULL << 32) - pci_hole_min_size.
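
The relation is "4G minus the hole size".  If it is typed as code, the
shift needs its own parentheses, because subtraction binds tighter than
<< (in C, and in Python as used here).  A small sketch under that
reading; the function name is hypothetical and not taken from the QEMU
patch:

```python
def lowmem(pci_hole_min_size: int) -> int:
    """RAM limit below 4G for a given hole size (hypothetical helper)."""
    return (1 << 32) - pci_hole_min_size


# Precedence check with small numbers: subtraction binds tighter than <<,
# so the unparenthesized form shifts by (8 - 3) instead of subtracting 3.
assert 1 << 8 - 3 == 1 << (8 - 3) == 32
assert (1 << 8) - 3 == 253

# Converting the two hole sizes discussed in this mail back to lowmem.
print(hex(lowmem(0x40800000)))
print(hex(lowmem(0x20000000)))
```

With the two sizes from this mail, this gives back the hole start
addresses 0xbf800000 and 0xe0000000.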

     -Don Slutz

>   -George
