From: Igor Mammedov <imammedo@redhat.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	qemu-devel@nongnu.org, Daniel Jordan <daniel.m.jordan@oracle.com>,
	David Edmondson <david.edmondson@oracle.com>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH RFC 1/6] i386/pc: Account IOVA reserved ranges above 4G boundary
Date: Mon, 28 Jun 2021 17:21:50 +0200	[thread overview]
Message-ID: <20210628172150.672072f4@redhat.com> (raw)
In-Reply-To: <e4be49b5-69ea-5f15-8c35-bd4be51f5adc@oracle.com>

On Mon, 28 Jun 2021 14:43:48 +0100
Joao Martins <joao.m.martins@oracle.com> wrote:

> On 6/28/21 2:25 PM, Igor Mammedov wrote:
> > On Wed, 23 Jun 2021 14:07:29 +0100
> > Joao Martins <joao.m.martins@oracle.com> wrote:
> >   
> >> On 6/23/21 1:09 PM, Igor Mammedov wrote:  
> >>> On Wed, 23 Jun 2021 10:51:59 +0100
> >>> Joao Martins <joao.m.martins@oracle.com> wrote:
> >>>     
> >>>> On 6/23/21 10:03 AM, Igor Mammedov wrote:    
> >>>>> On Tue, 22 Jun 2021 16:49:00 +0100
> >>>>> Joao Martins <joao.m.martins@oracle.com> wrote:
> >>>>>       
> >>>>>> It is assumed that the whole GPA space is available to be
> >>>>>> DMA addressable, within a given address space limit. Since
> >>>>>> Linux v5.4 that is no longer true, and VFIO will validate
> >>>>>> whether the selected IOVA is indeed valid, i.e. not reserved
> >>>>>> by the IOMMU on behalf of specific devices or the platform.
> >>>>>>
> >>>>>> AMD systems with an IOMMU are examples of such platforms and
> >>>>>> particularly may export only these ranges as allowed:
> >>>>>>
> >>>>>> 	0000000000000000 - 00000000fedfffff (0      .. 3.982G)
> >>>>>> 	00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
> >>>>>> 	0000010000000000 - ffffffffffffffff (1Tb    .. 16Pb)
> >>>>>>
> >>>>>> We already account for the 4G hole, but if the guest is big
> >>>>>> enough we will fail to allocate a >1010G guest, given the ~12G
> >>>>>> hole at the 1Tb boundary, reserved for HyperTransport.
> >>>>>>
> >>>>>> When creating the region above 4G, take into account which
> >>>>>> IOVAs are allowed by defining the known allowed ranges
> >>>>>> and searching for the next free IOVA range. On finding an
> >>>>>> invalid IOVA, mark it as reserved and proceed to the
> >>>>>> next allowed IOVA region.
> >>>>>>
> >>>>>> After accounting for the 1Tb hole on AMD hosts, mtree should
> >>>>>> look like:
> >>>>>>
> >>>>>> 0000000100000000-000000fcffffffff (prio 0, i/o):
> >>>>>> 	alias ram-above-4g @pc.ram 0000000080000000-000000fc7fffffff
> >>>>>> 0000010000000000-000001037fffffff (prio 0, i/o):
> >>>>>> 	alias ram-above-1t @pc.ram 000000fc80000000-000000ffffffffff      
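
[The mtree split above can be reproduced with a little address arithmetic.
A minimal sketch, not QEMU code; the 1 TiB guest size and 2 GiB low-RAM
split are assumptions chosen to match the example output:]

```python
GIB = 1 << 30

# Assumed example layout: a 1 TiB guest with 2 GiB of RAM mapped below 4G.
ram_size = 1024 * GIB
below_4g = 2 * GIB

above_4g_base = 4 * GIB          # RAM above 4G starts at the 4G boundary
ht_hole_start = 0xFD_0000_0000   # start of the ~12G HyperTransport window
boundary_1t = 0x100_0000_0000    # 1 TiB: first valid GPA above the window

# Bytes of RAM that fit between 4G and the start of the HT window;
# whatever is left spills over to above the 1Tb boundary.
fits_below_hole = ht_hole_start - above_4g_base
remainder = (ram_size - below_4g) - fits_below_hole

ram_above_4g = (above_4g_base, above_4g_base + fits_below_hole - 1)
ram_above_1t = (boundary_1t, boundary_1t + remainder - 1)

print(hex(ram_above_4g[0]), hex(ram_above_4g[1]))  # 0x100000000 0xfcffffffff
print(hex(ram_above_1t[0]), hex(ram_above_1t[1]))  # 0x10000000000 0x1037fffffff
```

[The two tuples match the GPA ranges of the ram-above-4g and ram-above-1t
aliases in the mtree output above.]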
> >>>>>
> >>>>> You are talking here about GPA, which is a guest-specific thing,
> >>>>> and then somehow it becomes tied to the host. For bystanders it's
> >>>>> not clear from the above commit message how the two are related.
> >>>>> I'd add an explicit explanation of how the AMD host is related to
> >>>>> GPAs, and clarify where you are talking about the guest/host side.
> >>>>>       
> >>>> OK, makes sense.
> >>>>
> >>>> Perhaps using IOVA makes it easier to understand. I said GPA because
> >>>> there's a 1:1 mapping between GPA and IOVA (if you're not using a vIOMMU).    
> >>>
> >>> IOVA may be too broad a term; maybe explain it in terms of GPA and HPA
> >>> and why it matters on each side (host/guest)
> >>>     
> >>
> >> I used the term IOVA specifically because it applies to both host IOVA and
> >> guest IOVA (the same rules apply, as this is not special-cased for VMs). So,
> >> regardless of whether we have guest-mode page tables, or just host
> >> IOMMU page tables, this address range should be reserved and not used.  
> > 
> > IOVA doesn't make it any clearer; on the contrary, it's more confusing.
> > 
> > And does the host's HPA matter at all? (If the host's firmware isn't
> > broken, it should never use nor advertise the 1Tb hole.)
> > So we are probably talking here about GPA only.
> >   
> For the case in point for the series, yes it's only GPA that we care about.
> 
> Perhaps I misunderstood your earlier comment where you said how HPAs were
> affected, so I was trying to frame the problem statement in a guest/host-agnostic
> manner by using IOVA, given this is all related to IOMMU reserved ranges.
> I'll stick to GPA to avoid any confusion -- as that's what matters for this series.

Even better would be to add here a reference to the spec where it says so.

> 
> >>>>> also what about usecases:
> >>>>>  * start QEMU with Intel cpu model on AMD host with Intel's IOMMU      
> >>>>
> >>>> In principle it would be less likely to occur. But you would still need
> >>>> to mark the same range as reserved. The limitation is on DMA occurring
> >>>> on those IOVAs (host or guest) coinciding with that range, so you would
> >>>> want to inform the guest that at least those should be avoided.
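
[For illustration, a check along these lines is what the series effectively
has to do for every candidate range. A hypothetical sketch, not the actual
QEMU or VFIO code; the window bounds come from the allowed ranges quoted
earlier in the thread:]

```python
# The AMD HyperTransport reserved window, inclusive bounds derived from the
# allowed IOVA ranges quoted in the commit message above.
HT_RESERVED = (0xFD_0000_0000, 0xFF_FFFF_FFFF)

def overlaps_reserved(start, end, hole=HT_RESERVED):
    """True if the candidate IOVA range [start, end] intersects the hole."""
    return start <= hole[1] and end >= hole[0]

# ram-above-4g as created in this series: stops just short of the window.
print(overlaps_reserved(0x1_0000_0000, 0xFC_FFFF_FFFF))   # False
# A naive allocation crossing the 1Tb boundary would hit the reserved window.
print(overlaps_reserved(0xFC_0000_0000, 0xFE_0000_0000))  # True
```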
> >>>>    
> >>>>>  * start QEMU with AMD cpu model and AMD's IOMMU on Intel host      
> >>>>
> >>>> Here you would probably only mark the range, solely to honor how the
> >>>> hardware is usually represented. But really, on Intel, nothing stops you
> >>>> from exposing the aforementioned range as RAM.
> >>>>    
> >>>>>  * start QEMU in TCG mode on AMD host (mostly from qtest point of view)
> >>>>>       
> >>>> This one is tricky. Because you can hotplug a VFIO device later on,
> >>>> I opted for always marking the reserved range. If you don't use VFIO you're
> >>>> fine, but otherwise you would still need it reserved. But I am not sure how
> >>>> qtest is used today for testing huge guests.    
> >>> I do not know if there are VFIO tests in qtest (probably not, since that
> >>> could require a host configured for it), but we can add a test
> >>> for this memory quirk (assuming phys-bits won't get in the way)
> >>>     
> >>
> >> 	Joao
> >>  
> >   
> 



Thread overview: 38+ messages
2021-06-22 15:48 [PATCH RFC 0/6] i386/pc: Fix creation of >= 1Tb guests on AMD systems with IOMMU Joao Martins
2021-06-22 15:49 ` [PATCH RFC 1/6] i386/pc: Account IOVA reserved ranges above 4G boundary Joao Martins
2021-06-23  7:11   ` Igor Mammedov
2021-06-23  9:37     ` Joao Martins
2021-06-23 11:39       ` Igor Mammedov
2021-06-23 13:04         ` Joao Martins
2021-06-28 14:32           ` Igor Mammedov
2021-08-06 10:41             ` Joao Martins
2021-06-23  9:03   ` Igor Mammedov
2021-06-23  9:51     ` Joao Martins
2021-06-23 12:09       ` Igor Mammedov
2021-06-23 13:07         ` Joao Martins
2021-06-28 13:25           ` Igor Mammedov
2021-06-28 13:43             ` Joao Martins
2021-06-28 15:21               ` Igor Mammedov [this message]
2021-06-24  9:32     ` Dr. David Alan Gilbert
2021-06-28 14:42       ` Igor Mammedov
2021-06-22 15:49 ` [PATCH RFC 2/6] i386/pc: Round up the hotpluggable memory within valid IOVA ranges Joao Martins
2021-06-22 15:49 ` [PATCH RFC 3/6] pc/cmos: Adjust CMOS above 4G memory size according to 1Tb boundary Joao Martins
2021-06-22 15:49 ` [PATCH RFC 4/6] i386/pc: Keep PCI 64-bit hole within usable IOVA space Joao Martins
2021-06-23 12:30   ` Igor Mammedov
2021-06-23 13:22     ` Joao Martins
2021-06-28 15:37       ` Igor Mammedov
2021-06-23 16:33     ` Laszlo Ersek
2021-06-25 17:19       ` Joao Martins
2021-06-22 15:49 ` [PATCH RFC 5/6] i386/acpi: Fix SRAT ranges in accordance to usable IOVA Joao Martins
2021-06-22 15:49 ` [PATCH RFC 6/6] i386/pc: Add a machine property for AMD-only enforcing of valid IOVAs Joao Martins
2021-06-23  9:18   ` Igor Mammedov
2021-06-23  9:59     ` Joao Martins
2021-06-22 21:16 ` [PATCH RFC 0/6] i386/pc: Fix creation of >= 1Tb guests on AMD systems with IOMMU Alex Williamson
2021-06-23  7:40   ` David Edmondson
2021-06-23 19:13     ` Alex Williamson
2021-06-23  9:30   ` Joao Martins
2021-06-23 11:58     ` Igor Mammedov
2021-06-23 13:15       ` Joao Martins
2021-06-23 19:27     ` Alex Williamson
2021-06-24  9:22       ` Dr. David Alan Gilbert
2021-06-25 16:54       ` Joao Martins
