RE: [RFC v1 04/18] intel_iommu: add "sm_model" option

From: "Liu, Yi L" <yi.l.liu@intel.com>
To: Peter Xu <zhexu@redhat.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"mst@redhat.com" <mst@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"david@gibson.dropbear.id.au" <david@gibson.dropbear.id.au>,
	"tianyu.lan@intel.com" <tianyu.lan@intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"Tian, Jun J" <jun.j.tian@intel.com>,
	"Sun, Yi Y" <yi.y.sun@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Jacob Pan <jacob.jun.pan@linux.intel.com>,
	Yi Sun <yi.y.sun@linux.intel.com>
Subject: RE: [RFC v1 04/18] intel_iommu: add "sm_model" option
Date: Wed, 10 Jul 2019 12:14:44 +0000
Message-ID: <A2975661238FB949B60364EF0F2C257439F2A6D3@SHSMSX104.ccr.corp.intel.com>
In-Reply-To: <20190709021554.GB5178@xz-x1>

> From: Peter Xu [mailto:zhexu@redhat.com]
> Sent: Tuesday, July 9, 2019 10:16 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v1 04/18] intel_iommu: add "sm_model" option
> 
> On Fri, Jul 05, 2019 at 07:01:37PM +0800, Liu Yi L wrote:
> > Intel VT-d 3.0 introduces scalable mode, and it has a bunch of
> > capabilities related to scalable mode translation, thus there
> > are multiple combinations. This vIOMMU implementation wants to
> > simplify that for the user by providing typical combinations.
> > The user can configure it with the "sm_model" option. The usage
> > is as below:
> >
> > "-device intel-iommu,x-scalable-mode=on,sm_model=["legacy"|"scalable"]"
> 
> Is it a requirement to split into two parameters, instead of just
> exposing everything about scalable mode when x-scalable-mode is set?

Yes, it is. Scalable mode has multiple capabilities, and we want to support
the most typical combinations to simplify software. E.g. the current scalable
mode vIOMMU exposes only 2nd level translation to the guest, and guest IOVA
support is via shadowing the guest 2nd level page table. We plan to move IOVA
from the 2nd level page table to the 1st level page table, so that guest IOVA
can be supported with nested translation. This also addresses the co-existence
issue of guest SVA and guest IOVA. So in future we will have the scalable mode
vIOMMU expose 1st level translation only. To differentiate this config from the
current vIOMMU, we need an extra option to control it. But yes, it is still a
scalable mode vIOMMU; it just exposes different capabilities to the guest.
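
To make the two combinations concrete, here is a minimal sketch of how the
"sm_model" values map onto ECAP bits. The bit positions are copied from the
intel_iommu_internal.h hunk quoted below; the helper name sm_model_to_ecap()
is hypothetical and not part of the patch, it only restates the logic in
vtd_init() as a standalone function.

```c
#include <stdint.h>
#include <string.h>

/* Bit positions as defined in the intel_iommu_internal.h hunk of this patch. */
#define VTD_ECAP_SRS    (1ULL << 31)
#define VTD_ECAP_PASID  (1ULL << 40)
#define VTD_ECAP_SMTS   (1ULL << 43)
#define VTD_ECAP_SLTS   (1ULL << 46)
#define VTD_ECAP_FLTS   (1ULL << 47)

/*
 * Hypothetical helper (not in the patch): return the ECAP bits for a
 * given sm_model string, or 0 if the string is not recognised.
 * A NULL model is treated as "legacy", matching the proposed default.
 */
static uint64_t sm_model_to_ecap(const char *sm_model)
{
    /* Both models are scalable mode, so SMTS and SRS are always set. */
    uint64_t common = VTD_ECAP_SMTS | VTD_ECAP_SRS;

    if (!sm_model || !strcmp(sm_model, "legacy")) {
        /* Second-level translation only; IOVA via shadow page table. */
        return common | VTD_ECAP_SLTS;
    }
    if (!strcmp(sm_model, "scalable")) {
        /* First-level translation plus PASID; no SLTS exposed. */
        return common | VTD_ECAP_PASID | VTD_ECAP_FLTS;
    }
    return 0;
}
```

Note that the "scalable" case deliberately leaves VTD_ECAP_SLTS clear, which
is the part that differs from simply turning everything on.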

BTW, do you know if I can add sub-options under "x-scalable-mode"? I think
that may demonstrate the dependency better.

> >
> >  - "legacy": gives support for SL page table
> >  - "scalable": gives support for FL page table, pasid, virtual command
> >  - defaults to "legacy" if "x-scalable-mode=on" is set while no
> >    sm_model is configured
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> > ---
> >  hw/i386/intel_iommu.c          | 28 +++++++++++++++++++++++++++-
> >  hw/i386/intel_iommu_internal.h |  2 ++
> >  include/hw/i386/intel_iommu.h  |  1 +
> >  3 files changed, 30 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 44b1231..3160a05 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -3014,6 +3014,7 @@ static Property vtd_properties[] = {
> >      DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
> >      DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
> >      DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
> > +    DEFINE_PROP_STRING("sm_model", IntelIOMMUState, sm_model),
> 
> Can do 's/_/-/' to follow the rest if we need it.

Do you mean sub-options after "x-scalable-mode"?

> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >
> > @@ -3489,6 +3490,14 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> >      return;
> >  }
> >
> > +const char sm_model_manual[] =
> > +        "\"-device intel-iommu,x-scalable-mode=on,"
> > +        "sm_model=[\"legacy\"|\"scalable\"]\"\n"
> > +        " - \"legacy\" gives support for SL page table based IOVA\n"
> > +        " - \"scalable\" gives support for FL page table based IOVA and SVA\n"
> > +        " - default to be \"legacy\" if \"x-scalable-mode=on\""
> > +        " while no sm_model is configured\n";
> > +
> >  /* Do the initialization. It will also be called when reset, so pay
> >   * attention when adding new initialization stuff.
> >   */
> > @@ -3557,9 +3566,26 @@ static void vtd_init(IntelIOMMUState *s)
> >          s->cap |= VTD_CAP_CM;
> >      }
> >
> > +    if (s->sm_model && !s->scalable_mode) {
> > +        printf("\n\"sm_model\" depends on \"x-scalable-mode\"\n"
> > +               "please check if \"x-scalable-mode\" is expected\n"
> > +               "\"sm_model\" manual:\n%s", sm_model_manual);
> > +        exit(1);
> 
> Let's avoid calling exit() directly considering that we've had things
> like vtd_decide_config() already which allows an Error**.  We can also
> introduce that too into vtd_init() and pass the error to upper to
> handle the failure.

Sure.
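
A rough sketch of the shape this could take, with the check returning a
failure to the caller instead of calling exit(). To keep it compilable on its
own it uses a stand-in SketchError type; real code would take an Error **
and report via error_setg() from "qapi/error.h". The names SketchError,
sketch_error_set() and sketch_check_sm_model() are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Stand-in for QEMU's Error object so the sketch builds standalone.
 * Real code would use Error ** and error_setg() instead.
 */
typedef struct SketchError {
    char msg[128];
} SketchError;

/* Hypothetical helper: record a message if the caller passed a buffer. */
static void sketch_error_set(SketchError *err, const char *msg)
{
    if (err) {
        snprintf(err->msg, sizeof(err->msg), "%s", msg);
    }
}

/*
 * Hypothetical validation step: returns true if the option combination
 * is acceptable, otherwise records a message in *err and returns false.
 * No exit() here; the caller decides how to fail.
 */
static bool sketch_check_sm_model(bool scalable_mode, const char *sm_model,
                                  SketchError *err)
{
    if (sm_model && !scalable_mode) {
        sketch_error_set(err,
                         "\"sm_model\" depends on \"x-scalable-mode=on\"");
        return false;
    }
    if (sm_model && strcmp(sm_model, "legacy") &&
        strcmp(sm_model, "scalable")) {
        sketch_error_set(err,
                         "sm_model must be \"legacy\" or \"scalable\"");
        return false;
    }
    return true;
}
```

The caller would then check the return value and pass the error upward, the
same way the existing vtd_decide_config() path does.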

> > +    }
> > +
> >      /* TODO: read cap/ecap from host to decide which cap to be exposed. */
> >      if (s->scalable_mode) {
> > -        s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
> > +        if (!s->sm_model || !strcmp(s->sm_model, "legacy")) {
> > +            s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
> > +        } else if (!strcmp(s->sm_model, "scalable")) {
> > +            s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
> > +                       | VTD_ECAP_FLTS;
> 
> Do you also need VTD_ECAP_SLTS here?

As mentioned above, in the long term we want to expose only FLT to the guest.

> > +        } else {
> > +            printf("\n!!!!! Invalid sm_model config !!!!!\n"
> > +                "Please config sm_model=[\"legacy\"|\"scalable\"]\n"
> > +                "\"sm_model\" manual:\n%s", sm_model_manual);
> > +            exit(1);
> 
> Same here.

Got it.

> Thanks,
> 
> > +        }
> >      }
> >
> >      vtd_reset_caches(s);
> > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> > index c1235a7..adae198 100644
> > --- a/hw/i386/intel_iommu_internal.h
> > +++ b/hw/i386/intel_iommu_internal.h
> > @@ -190,8 +190,10 @@
> >  #define VTD_ECAP_PT                 (1ULL << 6)
> >  #define VTD_ECAP_MHMV               (15ULL << 20)
> >  #define VTD_ECAP_SRS                (1ULL << 31)
> > +#define VTD_ECAP_PASID              (1ULL << 40)
> >  #define VTD_ECAP_SMTS               (1ULL << 43)
> >  #define VTD_ECAP_SLTS               (1ULL << 46)
> > +#define VTD_ECAP_FLTS               (1ULL << 47)
> >
> >  /* CAP_REG */
> >  /* (offset >> 4) << 24 */
> > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > index 12f3d26..b51cc9f 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -270,6 +270,7 @@ struct IntelIOMMUState {
> >      bool buggy_eim;                 /* Force buggy EIM unless eim=off */
> >      uint8_t aw_bits;                /* Host/IOVA address width (in bits) */
> >      bool dma_drain;                 /* Whether DMA r/w draining enabled */
> > +    char *sm_model;          /* identify actual scalable mode iommu model*/
> >
> >      /*
> >       * Protects IOMMU states in general.  Currently it protects the
> > --
> > 2.7.4
> >
> 
> Regards,
> 
> --
> Peter Xu

Thanks,
Yi Liu