Re: [PATCH v9 2/3] VT-d: wrap a _sync version for all VT-d flush interfaces

From: "Xu, Quan" <quan.xu@intel.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: "dario.faggioli@citrix.com" <dario.faggioli@citrix.com>,
	"Wu, Feng" <feng.wu@intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH v9 2/3] VT-d: wrap a _sync version for all VT-d flush interfaces
Date: Thu, 7 Apr 2016 07:44:51 +0000	[thread overview]
Message-ID: <945CA011AD5F084CBEA3E851C0AB28894B878D2E@SHSMSX101.ccr.corp.intel.com> (raw)
In-Reply-To: <5703A2EC02000078000E3151@prv-mh.provo.novell.com>

On April 05, 2016 5:35pm, Jan Beulich <JBeulich@suse.com> wrote:
> >>> On 01.04.16 at 16:47, <quan.xu@intel.com> wrote:
> > The dev_invalidate_iotlb() scans ats_devices list to flush ATS
> > devices, and the invalidate_sync() is put after dev_invalidate_iotlb()
> > to synchronize with hardware for flush status. If we assign multiple
> > ATS devices to a domain, the flush status is about all these multiple
> > ATS devices. Once flush timeout expires, we couldn't find out which
> > one is the buggy ATS device.
> 
> Is that true? Or is that just a limitation of our implementation?
> 

IMO, both.
I hope vt-d maintainers help me double check it.

> > Then, The invalidate_sync() variant (We need to pass down the device's
> > SBDF to hide the ATS device) is put within dev_invalidate_iotlb() to
> > synchronize for the flush status one by one.
> 
> I don't think this is stating current state of things. So ...
> 
> > If flush timeout expires,
> > we could find out the buggy ATS device and hide it. However, for other
> > VT-d flush interfaces, the invalidate_sync() is still put after at present.
> > This is inconsistent.
> 
> ... taken together, what is inconsistent here needs to be described better, as well
> as what it is you do to eliminate the inconsistency. Note that you should not
> refer back (or imply knowledge of) the previous discussion on the earlier
> version.
> In any of that discussion is useful here, you need to summarize it instead.
> 

I will continue to summarize it and send out later.

> > --- a/xen/drivers/passthrough/vtd/extern.h
> > +++ b/xen/drivers/passthrough/vtd/extern.h
> > @@ -61,6 +61,8 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16
> > did,
> >
> >  int qinval_device_iotlb(struct iommu *iommu,
> >                          u32 max_invs_pend, u16 sid, u16 size, u64
> > addr);
> > +int qinval_device_iotlb_sync(struct iommu *iommu, u32 max_invs_pend,
> > +                             u16 sid, u16 size, u64 addr);
> 
> So are then both functions needed to be externally accessible?
> That would seem contrary to the last paragraph of the patch description.
> 

I was aware of this. I'd better make the qinval_device_iotlb() a static one in next v10.

[...]

> > +static int queue_invalidate_context_sync(struct iommu *iommu,
> 
> __must_check?
> 

Agreed.

[...]

> > +{
> > +    queue_invalidate_context(iommu, did, source_id,
> > +                             function_mask, granu);
> > +
> > +    return invalidate_sync(iommu);
> > +}
> 
> Further down you replace the only call to
> queue_invalidate_context() - why keep both functions instead of just making the
> existing one do the sync? (That would the likely also apply to
> qinval_device_iotlb() and others below.)
> 

It is optional.
 I think:
1. in the long term, we may need no _sync version.
2. At least, the current wrap looks good to me. e.g. queue_invalidate_context() is for context-cache Invalidate Descriptor, and the
invalidate_sync() is for Invalidation Wait Descriptor. It is much clearer.

> > @@ -338,23 +365,24 @@ static int flush_iotlb_qi(
> >
> >      if ( qi_ctrl->qinval_maddr != 0 )
> >      {
> > -        int rc;
> > -
> >          /* use queued invalidation */
> >          if (cap_write_drain(iommu->cap))
> >              dw = 1;
> >          if (cap_read_drain(iommu->cap))
> >              dr = 1;
> >          /* Need to conside the ih bit later */
> > -        queue_invalidate_iotlb(iommu,
> > -                               type >>
> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
> > -                               dw, did, size_order, 0, addr);
> > +        ret = queue_invalidate_iotlb_sync(iommu,
> > +                  type >> DMA_TLB_FLUSH_GRANU_OFFSET, dr, dw, did,
> > +                  size_order, 0, addr);
> > +
> > +        /* TODO: Timeout error handling to be added later */
> 
> As of today queue_invalidate_wait() panics, so this comment is not very helpful
> as there is not timeout that could possibly be detected here.
> 

Okay, I will drop it.

> > +        if ( ret )
> > +            return ret;
> > +
> >          if ( flush_dev_iotlb )
> >              ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);
> > -        rc = invalidate_sync(iommu);
> > -        if ( !ret )
> > -            ret = rc;
> >      }
> 
> I think leaving the existing logic as is would be better - best effort invalidation
> even when an error has occurred.
> 

I have an open:
As vt-d spec(:Queued Invalidation Ordering Considerations) said,
     1. If the Fence(FN) flag is 1 in a inv_wait_dsc, hardware must execute descriptors following the inv_wait_dsc only after wait command is completed.
     2. when a Device-TLB invalidation timeout is detected, hardware must not complete any pending inv_wait_dsc commands.
In current code, the Fence(FN) is always 1.
if a Device-TLB invalidation timeout is detected, this additional inv_wait_dsc is not completed.
__iiuc__, 
the new coming descriptors, in that queue, _might_ be not executed any more, waiting for this additional inv_wait_dsc which is not completed.
is it true?

> > --- a/xen/drivers/passthrough/vtd/x86/ats.c
> > +++ b/xen/drivers/passthrough/vtd/x86/ats.c
> > @@ -118,7 +118,6 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16
> did,
> >      {
> >          u16 sid = PCI_BDF2(pdev->bus, pdev->devfn);
> >          bool_t sbit;
> > -        int rc = 0;
> >
> >          /* Only invalidate devices that belong to this IOMMU */
> >          if ( pdev->iommu != iommu )
> > @@ -134,8 +133,8 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16
> did,
> >              /* invalidate all translations: sbit=1,bit_63=0,bit[62:12]=1 */
> >              sbit = 1;
> >              addr = (~0UL << PAGE_SHIFT_4K) & 0x7FFFFFFFFFFFFFFF;
> > -            rc = qinval_device_iotlb(iommu, pdev->ats_queue_depth,
> > -                                     sid, sbit, addr);
> > +            ret = qinval_device_iotlb_sync(iommu,
> pdev->ats_queue_depth,
> > +                                           sid, sbit, addr);
> >              break;
> >          case DMA_TLB_PSI_FLUSH:
> >              if ( !device_in_domain(iommu, pdev, did) ) @@ -154,16
> > +153,13 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16 did,
> >                  addr |= (((u64)1 << (size_order - 1)) - 1) <<
> PAGE_SHIFT_4K;
> >              }
> >
> > -            rc = qinval_device_iotlb(iommu, pdev->ats_queue_depth,
> > -                                     sid, sbit, addr);
> > +            ret = qinval_device_iotlb_sync(iommu,
> pdev->ats_queue_depth,
> > +                                           sid, sbit, addr);
> >              break;
> >          default:
> >              dprintk(XENLOG_WARNING VTDPREFIX, "invalid vt-d flush
> type\n");
> >              return -EOPNOTSUPP;
> >          }
> > -
> > -        if ( !ret )
> > -            ret = rc;
> >      }
> 
> The removal of "rc" (which btw is unrelated to the purpose of your
> patch) means that if an earlier iteration encountering an error is followed by
> later successful ones, no error would get reported to the caller. Hence this part
> of the change needs to be undone.
> 

Agreed.
I tried to drop rc, as the ret was always zero in previous code.

Quan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel