Re: [PATCH 2/3] ACPI: Add driver for the VIOT table - Jean-Philippe Brucker

From: Jean-Philippe Brucker <jean-philippe@linaro.org>
To: Robin Murphy <robin.murphy@arm.com>
Cc: rjw@rjwysocki.net, lenb@kernel.org, joro@8bytes.org,
	mst@redhat.com, kevin.tian@intel.com,
	virtualization@lists.linux-foundation.org,
	linux-acpi@vger.kernel.org, iommu@lists.linux-foundation.org,
	sebastien.boeuf@intel.com, will@kernel.org
Subject: Re: [PATCH 2/3] ACPI: Add driver for the VIOT table
Date: Thu, 15 Apr 2021 16:31:59 +0200
Message-ID: <YHhOX6yZi1bxifDp@myrica>
In-Reply-To: <2f081b8f-98e2-2ce1-6be6-bb81aab8e153@arm.com>

On Thu, Mar 18, 2021 at 07:36:50PM +0000, Robin Murphy wrote:
> On 2021-03-16 19:16, Jean-Philippe Brucker wrote:
> > The ACPI Virtual I/O Translation Table describes topology of
> > para-virtual platforms. For now it describes the relation between
> > virtio-iommu and the endpoints it manages. Supporting that requires
> > three steps:
> > 
> > (1) acpi_viot_init(): parse the VIOT table, build a list of endpoints
> >      and vIOMMUs.
> > 
> > (2) acpi_viot_set_iommu_ops(): when the vIOMMU driver is loaded and the
> >      device probed, register it to the VIOT driver. This step is required
> >      because unlike similar drivers, VIOT doesn't create the vIOMMU
> >      device.
> 
> Note that you're basically the same as the DT case in this regard, so I'd
> expect things to be closer to that pattern than to that of IORT.
> 
> [...]
> > @@ -1506,12 +1507,17 @@ int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
> >   {
> >   	const struct iommu_ops *iommu;
> >   	u64 dma_addr = 0, size = 0;
> > +	int ret;
> >   	if (attr == DEV_DMA_NOT_SUPPORTED) {
> >   		set_dma_ops(dev, &dma_dummy_ops);
> >   		return 0;
> >   	}
> > +	ret = acpi_viot_dma_setup(dev, attr);
> > +	if (ret)
> > +		return ret > 0 ? 0 : ret;
> 
> I think things could do with a fair bit of refactoring here. Ideally we want
> to process a possible _DMA method (acpi_dma_get_range()) regardless of which
> flavour of IOMMU table might be present, and the amount of duplication we
> fork into at this point is unfortunate.
> 
> > +
> >   	iort_dma_setup(dev, &dma_addr, &size);
> 
> For starters I think most of that should be dragged out to this level here -
> it's really only the {rc,nc}_dma_get_range() bit that deserves to be the
> IORT-specific call.

Makes sense, though I'll move it to drivers/acpi/arm64/dma.c instead of
here, because it has only ever run on CONFIG_ARM64. I don't want to
accidentally break some x86 platform with an invalid _DMA method (the same
reason commits 7ad426398082 and 18b709beb503 kept this code in IORT).

> 
> >   	iommu = iort_iommu_configure_id(dev, input_id);
> 
> Similarly, it feels like it's only the table scan part in the middle of that
> that needs dispatching between IORT/VIOT, and its head and tail pulled out
> into a common path.

Agreed
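
As a rough illustration of that split, here is a userspace model, not
kernel code: every name in it is hypothetical, and the "head" and "tail"
steps are reduced to counters. It only shows the shape being discussed,
where the table scan is the one step dispatched between IORT and VIOT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct model_dev { int in_iort; int in_viot; };
struct model_ops { const char *name; };
struct model_stats { int head; int tail; };

static const struct model_ops model_smmu_ops = { "smmu" };
static const struct model_ops model_viommu_ops = { "virtio-iommu" };

/* Table-specific scans: return ops if the table describes dev, else NULL. */
static const struct model_ops *model_iort_scan(const struct model_dev *dev)
{
	return dev->in_iort ? &model_smmu_ops : NULL;
}

static const struct model_ops *model_viot_scan(const struct model_dev *dev)
{
	return dev->in_viot ? &model_viommu_ops : NULL;
}

typedef const struct model_ops *(*model_scan_fn)(const struct model_dev *);

/* Common path: shared head, per-table scan in the middle, shared tail. */
static const struct model_ops *
model_iommu_configure(const struct model_dev *dev, struct model_stats *st)
{
	static const model_scan_fn scans[] = { model_iort_scan, model_viot_scan };
	const struct model_ops *ops = NULL;
	size_t i;

	assert(dev && st);

	st->head++;	/* shared head, e.g. an "already configured?" check */

	for (i = 0; i < sizeof(scans) / sizeof(scans[0]) && !ops; i++)
		ops = scans[i](dev);

	if (ops)
		st->tail++;	/* shared tail, e.g. replaying the IOMMU probe */

	return ops;
}
```

In this model the head runs for every device, while the tail runs only
when one of the tables actually describes the device.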

> 
> [...]
> > +static const struct iommu_ops *viot_iommu_setup(struct device *dev)
> > +{
> > +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> > +	struct viot_iommu *viommu = NULL;
> > +	struct viot_endpoint *ep;
> > +	u32 epid;
> > +	int ret;
> > +
> > +	/* Already translated? */
> > +	if (fwspec && fwspec->ops)
> > +		return NULL;
> > +
> > +	mutex_lock(&viommus_lock);
> > +	list_for_each_entry(ep, &viot_endpoints, list) {
> > +		if (viot_device_match(dev, &ep->dev_id, &epid)) {
> > +			epid += ep->endpoint_id;
> > +			viommu = ep->viommu;
> > +			break;
> > +		}
> > +	}
> > +	mutex_unlock(&viommus_lock);
> > +	if (!viommu)
> > +		return NULL;
> > +
> > +	/* We're not translating ourself */
> > +	if (viot_device_match(dev, &viommu->dev_id, &epid))
> > +		return NULL;
> > +
> > +	/*
> > +	 * If we found a PCI range managed by the viommu, we're the one that has
> > +	 * to request ACS.
> > +	 */
> > +	if (dev_is_pci(dev))
> > +		pci_request_acs();
> > +
> > +	if (!viommu->ops || WARN_ON(!viommu->dev))
> > +		return ERR_PTR(-EPROBE_DEFER);
> 
> Can you create (or look up) a viommu->fwnode when initially parsing the VIOT
> to represent the IOMMU devices to wait for, such that the
> viot_device_match() lookup can resolve to that and let you fall into the
> standard iommu_ops_from_fwnode() path? That's what I mean about following
> the DT pattern - I guess it might need a bit of trickery to rewrite things
> if iommu_device_register() eventually turns up with a new fwnode, so I doubt
> we can get away without *some* kind of private interface between
> virtio-iommu and VIOT, but it would be nice for the common(ish) DMA paths to
> stay as unaware of the specifics as possible.

Yes, I can reuse iommu_ops_from_fwnode(). Turns out it's really easy: if we
move the VIOT initialization after acpi_scan_init(), we can use
pci_get_domain_bus_and_slot() directly and create missing fwnodes. That
gets rid of any need for a private interface between virtio-iommu and
VIOT.
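
The fwnode-based flow can be sketched with a small userspace model. This
is an assumption-laden illustration, not the kernel implementation: the
registry, the model_* names and MODEL_EPROBE_DEFER are all hypothetical
stand-ins. The endpoint records only the fwnode of its IOMMU at
table-parse time, and the ops lookup defers until a driver has registered
ops for that fwnode.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel-internal EPROBE_DEFER (517), absent from userspace. */
#define MODEL_EPROBE_DEFER 517
#define MODEL_MAX_IOMMUS 4

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct model_fwnode { int id; };
struct model_iommu_ops { const char *name; };

static struct {
	const struct model_fwnode *fwnode;
	const struct model_iommu_ops *ops;
} model_registry[MODEL_MAX_IOMMUS];
static size_t model_nr_iommus;

/* Models what an IOMMU driver does once its device has been probed. */
static int model_iommu_register(const struct model_fwnode *fwnode,
				const struct model_iommu_ops *ops)
{
	assert(fwnode && ops);
	if (model_nr_iommus == MODEL_MAX_IOMMUS)
		return -ENOSPC;
	model_registry[model_nr_iommus].fwnode = fwnode;
	model_registry[model_nr_iommus].ops = ops;
	model_nr_iommus++;
	return 0;
}

/* Models the generic lookup: ops registered for fwnode, or NULL. */
static const struct model_iommu_ops *
model_ops_from_fwnode(const struct model_fwnode *fwnode)
{
	size_t i;

	for (i = 0; i < model_nr_iommus; i++)
		if (model_registry[i].fwnode == fwnode)
			return model_registry[i].ops;
	return NULL;
}

/* Endpoint as recorded at table-parse time: only its IOMMU's fwnode. */
struct model_endpoint { const struct model_fwnode *iommu_fwnode; };

/* Returns 0 and sets *ops, or -MODEL_EPROBE_DEFER if no driver yet. */
static int model_endpoint_configure(const struct model_endpoint *ep,
				    const struct model_iommu_ops **ops)
{
	assert(ep && ops);
	*ops = model_ops_from_fwnode(ep->iommu_fwnode);
	return *ops ? 0 : -MODEL_EPROBE_DEFER;
}
```

Because the endpoint side only ever deals in fwnodes, nothing in this
model needs to know which driver ends up registering the ops.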

> 
> > +
> > +	ret = iommu_fwspec_init(dev, viommu->dev->fwnode, viommu->ops);
> > +	if (ret)
> > +		return ERR_PTR(ret);
> > +
> > +	iommu_fwspec_add_ids(dev, &epid, 1);
> > +
> > +	/*
> > +	 * If we have reason to believe the IOMMU driver missed the initial
> > +	 * add_device callback for dev, replay it to get things in order.
> > +	 */
> > +	if (dev->bus && !device_iommu_mapped(dev))
> > +		iommu_probe_device(dev);
> > +
> > +	return viommu->ops;
> > +}
> > +
> > +/**
> > + * acpi_viot_dma_setup - Configure DMA for an endpoint described in VIOT
> > + * @dev: the endpoint
> > + * @attr: coherency property of the endpoint
> > + *
> > + * Setup the DMA and IOMMU ops for an endpoint described by the VIOT table.
> > + *
> > + * Return:
> > + * * 0 - @dev doesn't match any VIOT node
> > + * * 1 - ops for @dev were successfully installed
> > + * * -EPROBE_DEFER - ops for @dev aren't yet available
> > + */
> > +int acpi_viot_dma_setup(struct device *dev, enum dev_dma_attr attr)
> > +{
> > +	const struct iommu_ops *iommu_ops = viot_iommu_setup(dev);
> > +
> > +	if (IS_ERR_OR_NULL(iommu_ops)) {
> > +		int ret = PTR_ERR(iommu_ops);
> > +
> > +		if (ret == -EPROBE_DEFER || ret == 0)
> > +			return ret;
> > +		dev_err(dev, "error %d while setting up virt IOMMU\n", ret);
> > +		return 0;
> > +	}
> > +
> > +#ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
> > +	arch_setup_dma_ops(dev, 0, ~0ULL, iommu_ops, attr == DEV_DMA_COHERENT);
> > +#else
> > +	iommu_setup_dma_ops(dev, 0, ~0ULL);
> > +#endif
> 
> Duplicating all of this feels particularly wrong... :(

Right, I still don't have a good solution for this last part. Ideally I'd
implement arch_setup_dma_ops() on x86, but virtio-iommu alone isn't enough
justification, and changing DMAR and IVRS to use it is too much work. For
the next version I added a probe_finalize() method in virtio-iommu that
does the same as VT-d and AMD IOMMU on x86. Hopefully that's the only wart
in the series.

Thanks,
Jean