Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / Atom feed
From: Robin Murphy <robin.murphy@arm.com>
To: John Garry <john.garry@huawei.com>, Marc Zyngier <maz@kernel.org>,
	Will Deacon <will@kernel.org>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Sudeep Holla <sudeep.holla@arm.com>,
	"Guohanjun (Hanjun Guo)" <guohanjun@huawei.com>
Cc: Saravana Kannan <saravanak@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>,
	Linuxarm <linuxarm@huawei.com>,
	iommu <iommu@lists.linux-foundation.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: arm64 iommu groups issue
Date: Thu, 13 Feb 2020 19:40:42 +0000
Message-ID: <be464e2a-03d5-0b2e-24ee-96d0d14fd739@arm.com> (raw)
In-Reply-To: <ddc7eaff-c3f9-4304-9b4e-75eff2c66cd5@huawei.com>

On 13/02/2020 3:49 pm, John Garry wrote:
>>>
>>> The underlying issue is that, for historical reasons, OF/IORT-based
>>> IOMMU drivers have ended up with group allocation being tied to endpoint
>>> driver probing via the dma_configure() mechanism (long story short,
>>> driver probe is the only thing which can be delayed in order to wait for
>>> a specific IOMMU instance to be ready).However, in the meantime, the
>>> IOMMU API internals have evolved sufficiently that I think there's a way
>>> to really put things right - I have the spark of an idea which I'll try
>>> to sketch out ASAP...
>>>
>>
>> OK, great.
> 
> Hi Robin,
> 
> I was wondering if you have had a chance to consider this problem again?

Yeah, sorry, more things came up such that it still hasn't been P yet ;)

Lorenzo and I did get as far as discussing it a bit more and writing up 
a ticket, so it's formally on our internal radar now, at least.

> One simple idea could be to introduce a device link between the endpoint 
> device and its parent bridge to ensure that they probe in order, as 
> expected in pci_device_group():
> 
> Subject: [PATCH] PCI: Add device link to ensure endpoint device driver 
> probes after bridge
> 
> It is required to ensure that a device driver for an endpoint will probe
> after the parent port driver, so add a device link for this.
> 
> ---
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 512cb4312ddd..4b832ad25b20 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2383,6 +2383,7 @@ static void pci_set_msi_domain(struct pci_dev *dev)
>   void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
>   {
>       int ret;
> +    struct device *parent;
> 
>       pci_configure_device(dev);
> 
> @@ -2420,6 +2421,10 @@ void pci_device_add(struct pci_dev *dev, struct 
> pci_bus *bus)
>       /* Set up MSI IRQ domain */
>       pci_set_msi_domain(dev);
> 
> +    parent = dev->dev.parent;
> +    if (parent && parent->bus == &pci_bus_type)
> +        device_link_add(&dev->dev, parent, DL_FLAG_AUTOPROBE_CONSUMER);
> +
>       /* Notifier could use PCI capabilities */
>       dev->match_driver = false;
>       ret = device_add(&dev->dev);
> -- 
> 
> This would work, but the problem is that if the port driver fails in 
> probing - and not just for -EPROBE_DEFER - then the child device will 
> never probe. This very thing happens on my dev board. However we could 
> expand the device links API to cover this sort of scenario.

Yes, that's an undesirable issue, but in fact I think it's mostly 
indicative that involving drivers in something which is designed to 
happen at a level below drivers is still fundamentally wrong and doomed 
to be fragile at best.

Another thought that crosses my mind is that when pci_device_group() 
walks up to the point of ACS isolation and doesn't find an existing 
group, it can still infer that everything it walked past *should* be put 
in the same group it's then eventually going to return. Unfortunately I 
can't see an obvious way for it to act on that knowledge, though, since 
recursive iommu_probe_device() is unlikely to end well.

> As for alternatives, it looks pretty difficult to me to disassociate the 
> group allocation from the dma_configure path.

Indeed it's non-trivial, but it really does need cleaning up at some point.

Having just had yet another spark, does something like the untested 
super-hack below work at all? I doubt it's a viable direction to take in 
itself, but it could be food for thought if it at least proves the concept.

Robin.

----->8-----
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index aa3ac2a03807..554cde76c766 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3841,20 +3841,20 @@ static int arm_smmu_set_bus_ops(struct iommu_ops 
*ops)
  	int err;

  #ifdef CONFIG_PCI
-	if (pci_bus_type.iommu_ops != ops) {
+	if (1) {
  		err = bus_set_iommu(&pci_bus_type, ops);
  		if (err)
  			return err;
  	}
  #endif
  #ifdef CONFIG_ARM_AMBA
-	if (amba_bustype.iommu_ops != ops) {
+	if (1) {
  		err = bus_set_iommu(&amba_bustype, ops);
  		if (err)
  			goto err_reset_pci_ops;
  	}
  #endif
-	if (platform_bus_type.iommu_ops != ops) {
+	if (1) {
  		err = bus_set_iommu(&platform_bus_type, ops);
  		if (err)
  			goto err_reset_amba_ops;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 660eea8d1d2f..b81ae2b4d4fb 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1542,6 +1542,14 @@ static int iommu_bus_init(struct bus_type *bus, 
const struct iommu_ops *ops)
  	return err;
  }

+static int iommu_bus_replay(struct device *dev, void *data)
+{
+	if (dev->iommu_group)
+		return 0;
+
+	return add_iommu_group(dev, data);
+}
+
  /**
   * bus_set_iommu - set iommu-callbacks for the bus
   * @bus: bus.
@@ -1564,6 +1572,9 @@ int bus_set_iommu(struct bus_type *bus, const 
struct iommu_ops *ops)
  		return 0;
  	}

+	if (bus->iommu_ops == ops)
+		return bus_for_each_dev(bus, NULL, NULL, iommu_bus_replay);
+
  	if (bus->iommu_ops != NULL)
  		return -EBUSY;


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply index

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-19  8:43 John Garry
2019-09-19 13:25 ` Robin Murphy
2019-09-19 14:35   ` John Garry
2019-11-04 12:18     ` John Garry
2020-02-13 15:49     ` John Garry
2020-02-13 19:40       ` Robin Murphy [this message]
2020-02-14 14:09         ` John Garry
2020-02-14 18:35           ` Robin Murphy
2020-02-17 12:08             ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=be464e2a-03d5-0b2e-24ee-96d0d14fd739@arm.com \
    --to=robin.murphy@arm.com \
    --cc=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=guohanjun@huawei.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=john.garry@huawei.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=maz@kernel.org \
    --cc=saravanak@google.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=sudeep.holla@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ARM-Kernel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-arm-kernel/0 linux-arm-kernel/git/0.git
	git clone --mirror https://lore.kernel.org/linux-arm-kernel/1 linux-arm-kernel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-arm-kernel linux-arm-kernel/ https://lore.kernel.org/linux-arm-kernel \
		linux-arm-kernel@lists.infradead.org
	public-inbox-index linux-arm-kernel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-arm-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git