All of lore.kernel.org
 help / color / mirror / Atom feed
From: Igor Mammedov <imammedo@redhat.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	qemu-devel@nongnu.org, Daniel Jordan <daniel.m.jordan@oracle.com>,
	David Edmondson <david.edmondson@oracle.com>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	lersek@redhat.com
Subject: Re: [PATCH RFC 4/6] i386/pc: Keep PCI 64-bit hole within usable IOVA space
Date: Mon, 28 Jun 2021 17:37:19 +0200	[thread overview]
Message-ID: <20210628173719.2822ce0d@redhat.com> (raw)
In-Reply-To: <57664249-003c-7e62-2a16-925b3bb33418@oracle.com>

On Wed, 23 Jun 2021 14:22:29 +0100
Joao Martins <joao.m.martins@oracle.com> wrote:

> On 6/23/21 1:30 PM, Igor Mammedov wrote:
> > On Tue, 22 Jun 2021 16:49:03 +0100
> > Joao Martins <joao.m.martins@oracle.com> wrote:
> >   
> >> pci_memory initialized by q35 and i440fx is set to a range
> >> of 0 .. UINT64_MAX, and as a consequence when ACPI and pci-host
> >> pick the hole64_start it does not account for allowed IOVA ranges.
> >>
> >> Rather than blindly returning, round up the hole64_start
> >> value to the allowable IOVA range, such that it accounts for
> >> the 1Tb hole *on AMD*. On Intel it returns the input value
> >> for hole64 start.  
> > 
> > wouldn't that work only in case where guest firmware hadn't
> > mapped any PCI bars above 4Gb (possibly in not allowed region).
> > 
> > Looking at Seabios, it uses reserved_memory_end as the start
> > of PCI64 window. Not sure about OVMF,
> >  CCing Laszlo.
> >   
> Hmmm, perhaps only in the case that I don't advertise the reserved
> region (i.e. mem size not being big enough to let the guest know in
> the e820). But then that it begs the question, should we then always
> advertise the said HT region as reserved regardless of memory size?

Where in firmware that e820 reserved entry will be accounted for?
All I see in Seabios is qemu_cfg_e820() which accounts for E820_RAM (sometimes)
and there the high RAM is hardwired to 4G. Then later on it might get used
to setup the start of 64-bit PCI hole if etc/reserved-memory-end
doesn't exist.

And for E820_RAM to work, the entry with highest address has to be
the last in the table, which is fragile and it's only a question of
time when someone breaks that.
(well if it's explicitly specified in ACPI spec, I won't object
if properly documented)
So firmware side will also need changes to account for this,
and make it independent if 4G starting point.

For Seabios it might be better to make use of fwcfg file Laszlo has
suggested in his reply than relying on e820 & reserved-memory-end,
either way Seabios would need related patches for this to work
properly. For OVMF you can get away with QEMU only patches,
if using that knob.

(PS:
I'm not PCI expert so there might be more issues hidden there
)

> >> Suggested-by: David Edmondson <david.edmondson@oracle.com>
> >> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> >> ---
> >>  hw/i386/pc.c         | 17 +++++++++++++++--
> >>  hw/pci-host/i440fx.c |  4 +++-
> >>  hw/pci-host/q35.c    |  4 +++-
> >>  include/hw/i386/pc.h |  3 ++-
> >>  4 files changed, 23 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >> index 2e2ea82a4661..65885cc16037 100644
> >> --- a/hw/i386/pc.c
> >> +++ b/hw/i386/pc.c
> >> @@ -1141,7 +1141,7 @@ void pc_memory_init(PCMachineState *pcms,
> >>   * The 64bit pci hole starts after "above 4G RAM" and
> >>   * potentially the space reserved for memory hotplug.
> >>   */
> >> -uint64_t pc_pci_hole64_start(void)
> >> +uint64_t pc_pci_hole64_start(uint64_t size)
> >>  {
> >>      PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
> >>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> >> @@ -1155,12 +1155,25 @@ uint64_t pc_pci_hole64_start(void)
> >>              hole64_start += memory_region_size(&ms->device_memory->mr);
> >>          }
> >>      } else {
> >> -        hole64_start = 0x100000000ULL + x86ms->above_4g_mem_size;
> >> +        if (!x86ms->above_1t_mem_size) {
> >> +            hole64_start = 0x100000000ULL + x86ms->above_4g_mem_size;
> >> +        } else {
> >> +            hole64_start = x86ms->above_1t_maxram_start;
> >> +        }
> >>      }  
> >   
> >> +    hole64_start = allowed_round_up(hole64_start, size);  
> > 
> > I'm not sure but, might it cause problems if there were BARs placed
> > by firmware in region below rounded up value?
> > (i.e. ACPI description which uses PCI_HOST_PROP_PCI_HOLE_START property
> > won't match whatever firmware programmed due to rounding pushing hole
> > start up)
> >   
> But wouldn't then the problem be firmware ignoring E820 to avoid putting
> firmware region where it shouldn't? Unless of course, it wasn't advertised
> like I mentioned earlier.
> 
> >>      return ROUND_UP(hole64_start, 1 * GiB);
> >>  }
> >>  
> >> +uint64_t pc_pci_hole64_start_aligned(uint64_t start, uint64_t size)
> >> +{
> >> +    if (nb_iova_ranges == DEFAULT_NR_USABLE_IOVAS) {
> >> +        return start;
> >> +    }
> >> +    return allowed_round_up(start, size);
> >> +}
> >> +
> >>  DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus)
> >>  {
> >>      DeviceState *dev = NULL;
> >> diff --git a/hw/pci-host/i440fx.c b/hw/pci-host/i440fx.c
> >> index 28c9bae89944..e8eaebfe1034 100644
> >> --- a/hw/pci-host/i440fx.c
> >> +++ b/hw/pci-host/i440fx.c
> >> @@ -163,8 +163,10 @@ static uint64_t i440fx_pcihost_get_pci_hole64_start_value(Object *obj)
> >>      pci_bus_get_w64_range(h->bus, &w64);
> >>      value = range_is_empty(&w64) ? 0 : range_lob(&w64);
> >>      if (!value && s->pci_hole64_fix) {
> >> -        value = pc_pci_hole64_start();
> >> +        value = pc_pci_hole64_start(s->pci_hole64_size);
> >>      }
> >> +    /* This returns @value when not on AMD */
> >> +    value = pc_pci_hole64_start_aligned(value, s->pci_hole64_size);
> >>      return value;
> >>  }
> >>  
> >> diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> >> index 2eb729dff585..d556eb965ddb 100644
> >> --- a/hw/pci-host/q35.c
> >> +++ b/hw/pci-host/q35.c
> >> @@ -126,8 +126,10 @@ static uint64_t q35_host_get_pci_hole64_start_value(Object *obj)
> >>      pci_bus_get_w64_range(h->bus, &w64);
> >>      value = range_is_empty(&w64) ? 0 : range_lob(&w64);
> >>      if (!value && s->pci_hole64_fix) {
> >> -        value = pc_pci_hole64_start();
> >> +        value = pc_pci_hole64_start(s->mch.pci_hole64_size);
> >>      }
> >> +    /* This returns @value when not on AMD */
> >> +    value = pc_pci_hole64_start_aligned(value, s->mch.pci_hole64_size);
> >>      return value;
> >>  }
> >>  
> >> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> >> index 73b8e2900c72..b924aef3a218 100644
> >> --- a/include/hw/i386/pc.h
> >> +++ b/include/hw/i386/pc.h
> >> @@ -217,7 +217,8 @@ void pc_memory_init(PCMachineState *pcms,
> >>                      MemoryRegion *system_memory,
> >>                      MemoryRegion *rom_memory,
> >>                      MemoryRegion **ram_memory);
> >> -uint64_t pc_pci_hole64_start(void);
> >> +uint64_t pc_pci_hole64_start(uint64_t size);
> >> +uint64_t pc_pci_hole64_start_aligned(uint64_t value, uint64_t size);
> >>  DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus);
> >>  void pc_basic_device_init(struct PCMachineState *pcms,
> >>                            ISABus *isa_bus, qemu_irq *gsi,  
> >   
> 



  reply	other threads:[~2021-06-28 15:38 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-22 15:48 [PATCH RFC 0/6] i386/pc: Fix creation of >= 1Tb guests on AMD systems with IOMMU Joao Martins
2021-06-22 15:49 ` [PATCH RFC 1/6] i386/pc: Account IOVA reserved ranges above 4G boundary Joao Martins
2021-06-23  7:11   ` Igor Mammedov
2021-06-23  9:37     ` Joao Martins
2021-06-23 11:39       ` Igor Mammedov
2021-06-23 13:04         ` Joao Martins
2021-06-28 14:32           ` Igor Mammedov
2021-08-06 10:41             ` Joao Martins
2021-06-23  9:03   ` Igor Mammedov
2021-06-23  9:51     ` Joao Martins
2021-06-23 12:09       ` Igor Mammedov
2021-06-23 13:07         ` Joao Martins
2021-06-28 13:25           ` Igor Mammedov
2021-06-28 13:43             ` Joao Martins
2021-06-28 15:21               ` Igor Mammedov
2021-06-24  9:32     ` Dr. David Alan Gilbert
2021-06-28 14:42       ` Igor Mammedov
2021-06-22 15:49 ` [PATCH RFC 2/6] i386/pc: Round up the hotpluggable memory within valid IOVA ranges Joao Martins
2021-06-22 15:49 ` [PATCH RFC 3/6] pc/cmos: Adjust CMOS above 4G memory size according to 1Tb boundary Joao Martins
2021-06-22 15:49 ` [PATCH RFC 4/6] i386/pc: Keep PCI 64-bit hole within usable IOVA space Joao Martins
2021-06-23 12:30   ` Igor Mammedov
2021-06-23 13:22     ` Joao Martins
2021-06-28 15:37       ` Igor Mammedov [this message]
2021-06-23 16:33     ` Laszlo Ersek
2021-06-25 17:19       ` Joao Martins
2021-06-22 15:49 ` [PATCH RFC 5/6] i386/acpi: Fix SRAT ranges in accordance to usable IOVA Joao Martins
2021-06-22 15:49 ` [PATCH RFC 6/6] i386/pc: Add a machine property for AMD-only enforcing of valid IOVAs Joao Martins
2021-06-23  9:18   ` Igor Mammedov
2021-06-23  9:59     ` Joao Martins
2021-06-22 21:16 ` [PATCH RFC 0/6] i386/pc: Fix creation of >= 1Tb guests on AMD systems with IOMMU Alex Williamson
2021-06-23  7:40   ` David Edmondson
2021-06-23 19:13     ` Alex Williamson
2021-06-23  9:30   ` Joao Martins
2021-06-23 11:58     ` Igor Mammedov
2021-06-23 13:15       ` Joao Martins
2021-06-23 19:27     ` Alex Williamson
2021-06-24  9:22       ` Dr. David Alan Gilbert
2021-06-25 16:54       ` Joao Martins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210628173719.2822ce0d@redhat.com \
    --to=imammedo@redhat.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=david.edmondson@oracle.com \
    --cc=ehabkost@redhat.com \
    --cc=joao.m.martins@oracle.com \
    --cc=lersek@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=suravee.suthikulpanit@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.