iommu.lists.linux-foundation.org archive mirror
* [PATCH 0/4] Sort out SMMUv3 ATC invalidation and locking
@ 2019-08-20 15:45 Will Deacon
  2019-08-20 15:45 ` [PATCH 1/4] iommu/arm-smmu-v3: Document ordering guarantees of command insertion Will Deacon
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Will Deacon @ 2019-08-20 15:45 UTC (permalink / raw)
  To: iommu; +Cc: Jean-Philippe Brucker, Will Deacon, Robin Murphy

Hi all,

This series arose from my attempt to remove the 'devices_lock' from the
->unmap() path. In actual fact, I think the current code in mainline
gets this wrong, so I've fixed up the ordering and then removed the lock.
Unfortunately, this relies on my deferred invalidation work so that the
invalidation range is propagated through to the sync callback:

  https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq

At this point, if we decide to do anything for earlier kernels, the easiest
thing is probably to nobble the ATS feature at probe time.

NOTE: this has not been tested, since I don't have access to any systems
capable of ATS!

Feedback welcome,

Will

Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Robin Murphy <robin.murphy@arm.com>

--->8

Will Deacon (4):
  iommu/arm-smmu-v3: Document ordering guarantees of command insertion
  iommu/arm-smmu-v3: Rework enabling/disabling of ATS for PCI masters
  iommu/arm-smmu-v3: Fix ATC invalidation ordering wrt main TLBs
  iommu/arm-smmu-v3: Avoid locking on invalidation path when not using
    ATS

 drivers/iommu/arm-smmu-v3.c | 105 ++++++++++++++++++++++++++++++++------------
 1 file changed, 77 insertions(+), 28 deletions(-)

-- 
2.11.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


* [PATCH 1/4] iommu/arm-smmu-v3: Document ordering guarantees of command insertion
  2019-08-20 15:45 [PATCH 0/4] Sort out SMMUv3 ATC invalidation and locking Will Deacon
@ 2019-08-20 15:45 ` Will Deacon
  2019-08-20 15:45 ` [PATCH 2/4] iommu/arm-smmu-v3: Rework enabling/disabling of ATS for PCI masters Will Deacon
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Will Deacon @ 2019-08-20 15:45 UTC (permalink / raw)
  To: iommu; +Cc: Jean-Philippe Brucker, Will Deacon, Robin Murphy

It turns out that we've always relied on some subtle ordering guarantees
when inserting commands into the SMMUv3 command queue. With the recent
changes to elide locking when possible, these guarantees become more
subtle and even more important.

Add a comment documenting the barrier semantics of command insertion so
that we don't have to derive the behaviour from scratch each time it
comes up on the list.

Signed-off-by: Will Deacon <will@kernel.org>
---
 drivers/iommu/arm-smmu-v3.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index b36a99971401..3402b1bc8e94 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1286,6 +1286,22 @@ static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq, u64 *cmds,
 	}
 }
 
+/*
+ * This is the actual insertion function, and provides the following
+ * ordering guarantees to callers:
+ *
+ * - There is a dma_wmb() before publishing any commands to the queue.
+ *   This can be relied upon to order prior writes to data structures
+ *   in memory (such as a CD or an STE) before the command.
+ *
+ * - On completion of a CMD_SYNC, there is a control dependency.
+ *   This can be relied upon to order subsequent writes to memory (e.g.
+ *   freeing an IOVA) after completion of the CMD_SYNC.
+ *
+ * - Command insertion is totally ordered, so if two CPUs each race to
+ *   insert their own list of commands then all of the commands from one
+ *   CPU will appear before any of the commands from the other CPU.
+ */
 static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 				       u64 *cmds, int n, bool sync)
 {
-- 
2.11.0
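
For readers without the driver to hand, the first guarantee in the comment
above (payload written before the command is published) can be sketched in
userspace with C11 atomics. This is only a loose analogy under stated
assumptions: 'toy_cmdq', its helpers and QUEUE_ENTRIES are hypothetical
names, a release store stands in for dma_wmb(), there is a single producer,
and queue-full handling is left out. It is not the driver's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define QUEUE_ENTRIES 8u	/* hypothetical queue depth */

/* Hypothetical single-producer queue, not the SMMUv3 command queue. */
struct toy_cmdq {
	uint64_t slots[QUEUE_ENTRIES];
	atomic_uint prod;	/* index of the next free slot */
};

static void toy_cmdq_init(struct toy_cmdq *q)
{
	atomic_init(&q->prod, 0);
}

/* Write the payload first, then publish it by bumping the index. */
static void toy_cmdq_publish(struct toy_cmdq *q, uint64_t cmd)
{
	unsigned int prod = atomic_load_explicit(&q->prod, memory_order_relaxed);

	q->slots[prod % QUEUE_ENTRIES] = cmd;
	/* Release store: the slot write above is ordered before the new index. */
	atomic_store_explicit(&q->prod, prod + 1, memory_order_release);
}

/* Returns 1 and fills *out if a published command is available, else 0. */
static int toy_cmdq_consume(struct toy_cmdq *q, unsigned int *cons, uint64_t *out)
{
	/* Acquire load pairs with the release store in toy_cmdq_publish(). */
	unsigned int prod = atomic_load_explicit(&q->prod, memory_order_acquire);

	if (*cons == prod)
		return 0;

	*out = q->slots[*cons % QUEUE_ENTRIES];
	(*cons)++;
	return 1;
}
```

A consumer that observes the updated index is guaranteed to observe the
payload written before it, which is the property callers rely on when they
update a CD or an STE and then queue a command that refers to it.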


* [PATCH 2/4] iommu/arm-smmu-v3: Rework enabling/disabling of ATS for PCI masters
  2019-08-20 15:45 [PATCH 0/4] Sort out SMMUv3 ATC invalidation and locking Will Deacon
  2019-08-20 15:45 ` [PATCH 1/4] iommu/arm-smmu-v3: Document ordering guarantees of command insertion Will Deacon
@ 2019-08-20 15:45 ` Will Deacon
  2019-08-20 16:12   ` Robin Murphy
  2019-08-20 15:45 ` [PATCH 3/4] iommu/arm-smmu-v3: Fix ATC invalidation ordering wrt main TLBs Will Deacon
  2019-08-20 15:45 ` [PATCH 4/4] iommu/arm-smmu-v3: Avoid locking on invalidation path when not using ATS Will Deacon
  3 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2019-08-20 15:45 UTC (permalink / raw)
  To: iommu; +Cc: Jean-Philippe Brucker, Will Deacon, Robin Murphy

To prevent any potential issues arising from speculative Address
Translation Requests from an ATS-enabled PCIe endpoint, rework our ATS
enabling/disabling logic so that we enable ATS at the SMMU before we
enable it at the endpoint, and disable things in the opposite order.

Signed-off-by: Will Deacon <will@kernel.org>
---
 drivers/iommu/arm-smmu-v3.c | 44 ++++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 3402b1bc8e94..9096eca0c480 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2283,31 +2283,34 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master)
 	}
 }
 
-static int arm_smmu_enable_ats(struct arm_smmu_master *master)
+static bool arm_smmu_ats_supported(struct arm_smmu_master *master)
 {
-	int ret;
-	size_t stu;
 	struct pci_dev *pdev;
 	struct arm_smmu_device *smmu = master->smmu;
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
 
 	if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) ||
 	    !(fwspec->flags & IOMMU_FWSPEC_PCI_RC_ATS) || pci_ats_disabled())
-		return -ENXIO;
+		return false;
 
 	pdev = to_pci_dev(master->dev);
-	if (pdev->untrusted)
-		return -EPERM;
+	return !pdev->untrusted && pdev->ats_cap;
+}
 
-	/* Smallest Translation Unit: log2 of the smallest supported granule */
-	stu = __ffs(smmu->pgsize_bitmap);
+static void arm_smmu_enable_ats(struct arm_smmu_master *master)
+{
+	size_t stu;
+	struct pci_dev *pdev;
+	struct arm_smmu_device *smmu = master->smmu;
 
-	ret = pci_enable_ats(pdev, stu);
-	if (ret)
-		return ret;
+	if (master->ats_enabled || !dev_is_pci(master->dev))
+		return;
 
-	master->ats_enabled = true;
-	return 0;
+	/* Smallest Translation Unit: log2 of the smallest supported granule */
+	stu = __ffs(smmu->pgsize_bitmap);
+	pdev = to_pci_dev(master->dev);
+	if (pci_enable_ats(pdev, stu))
+		dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu);
 }
 
 static void arm_smmu_disable_ats(struct arm_smmu_master *master)
@@ -2317,10 +2320,14 @@ static void arm_smmu_disable_ats(struct arm_smmu_master *master)
 	if (!master->ats_enabled || !dev_is_pci(master->dev))
 		return;
 
+	pci_disable_ats(to_pci_dev(master->dev));
+	/*
+	 * Ensure ATS is disabled at the endpoint before we issue the
+	 * ATC invalidation via the SMMU.
+	 */
+	wmb();
 	arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
 	arm_smmu_atc_inv_master(master, &cmd);
-	pci_disable_ats(to_pci_dev(master->dev));
-	master->ats_enabled = false;
 }
 
 static void arm_smmu_detach_dev(struct arm_smmu_master *master)
@@ -2335,10 +2342,10 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 	list_del(&master->domain_head);
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
+	arm_smmu_disable_ats(master);
 	master->domain = NULL;
+	master->ats_enabled = false;
 	arm_smmu_install_ste_for_dev(master);
-
-	arm_smmu_disable_ats(master);
 }
 
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
@@ -2383,12 +2390,13 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
 	if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS)
-		arm_smmu_enable_ats(master);
+		master->ats_enabled = arm_smmu_ats_supported(master);
 
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
 		arm_smmu_write_ctx_desc(smmu, &smmu_domain->s1_cfg);
 
 	arm_smmu_install_ste_for_dev(master);
+	arm_smmu_enable_ats(master);
 out_unlock:
 	mutex_unlock(&smmu_domain->init_mutex);
 	return ret;
-- 
2.11.0
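
The intended ordering from the commit message (SMMU side on before the
endpoint, and the reverse on the way out) can be modelled with a small
userspace sketch. Everything here is hypothetical ('toy_master', the
one-letter event log); it only records the order of steps and does not
touch any hardware or kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of one master; not the driver's arm_smmu_master. */
struct toy_master {
	bool smmu_ats;		/* stands in for the STE permitting ATS */
	bool endpoint_ats;	/* stands in for the PCIe ATS enable bit */
	char log[16];		/* one letter per step, in order */
};

static void toy_log(struct toy_master *m, char step)
{
	size_t len = strlen(m->log);

	if (len + 1 < sizeof(m->log)) {
		m->log[len] = step;
		m->log[len + 1] = '\0';
	}
}

/* The endpoint must never have ATS on while the SMMU side has it off. */
static bool toy_invariant(const struct toy_master *m)
{
	return !m->endpoint_ats || m->smmu_ats;
}

static void toy_attach(struct toy_master *m)
{
	m->smmu_ats = true;		/* 'S': SMMU side first */
	toy_log(m, 'S');
	assert(toy_invariant(m));
	m->endpoint_ats = true;		/* 'E': then the endpoint */
	toy_log(m, 'E');
	assert(toy_invariant(m));
}

static void toy_detach(struct toy_master *m)
{
	m->endpoint_ats = false;	/* 'e': endpoint off first */
	toy_log(m, 'e');
	toy_log(m, 'i');		/* 'i': ATC invalidation */
	m->smmu_ats = false;		/* 's': SMMU side last */
	toy_log(m, 's');
	assert(toy_invariant(m));
}
```

Running toy_attach() followed by toy_detach() on a zeroed 'toy_master'
leaves "SEeis" in the log, and the invariant holds after every step.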


* [PATCH 3/4] iommu/arm-smmu-v3: Fix ATC invalidation ordering wrt main TLBs
  2019-08-20 15:45 [PATCH 0/4] Sort out SMMUv3 ATC invalidation and locking Will Deacon
  2019-08-20 15:45 ` [PATCH 1/4] iommu/arm-smmu-v3: Document ordering guarantees of command insertion Will Deacon
  2019-08-20 15:45 ` [PATCH 2/4] iommu/arm-smmu-v3: Rework enabling/disabling of ATS for PCI masters Will Deacon
@ 2019-08-20 15:45 ` Will Deacon
  2019-08-20 16:50   ` Robin Murphy
  2019-08-20 15:45 ` [PATCH 4/4] iommu/arm-smmu-v3: Avoid locking on invalidation path when not using ATS Will Deacon
  3 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2019-08-20 15:45 UTC (permalink / raw)
  To: iommu; +Cc: Jean-Philippe Brucker, Will Deacon, Robin Murphy

When invalidating the ATC for a PCIe endpoint using ATS, we must take
care to complete invalidation of the main SMMU TLBs beforehand, otherwise
the device could immediately repopulate its ATC with stale translations.

Hooking the ATC invalidation into ->unmap() as we currently do does the
exact opposite: it ensures that the ATC is invalidated *before* the
main TLBs, which is bogus.

Move ATC invalidation into the actual (leaf) invalidation routines so
that it is always called after completing main TLB invalidation.

Signed-off-by: Will Deacon <will@kernel.org>
---
 drivers/iommu/arm-smmu-v3.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 9096eca0c480..183a1c121179 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1961,6 +1961,7 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	 */
 	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
 	arm_smmu_cmdq_issue_sync(smmu);
+	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
 }
 
 static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
@@ -1969,7 +1970,7 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
 {
 	u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
-	unsigned long end = iova + size;
+	unsigned long start = iova, end = iova + size;
 	int i = 0;
 	struct arm_smmu_cmdq_ent cmd = {
 		.tlbi = {
@@ -1998,6 +1999,8 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
 	}
 
 	arm_smmu_cmdq_issue_cmdlist(smmu, cmds, i, true);
+	if (leaf)
+		arm_smmu_atc_inv_domain(smmu_domain, 0, start, size);
 }
 
 static void arm_smmu_tlb_inv_page_nosync(struct iommu_iotlb_gather *gather,
@@ -2416,18 +2419,13 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 			     size_t size, struct iommu_iotlb_gather *gather)
 {
-	int ret;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
 
 	if (!ops)
 		return 0;
 
-	ret = ops->unmap(ops, iova, size, gather);
-	if (ret && arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size))
-		return 0;
-
-	return ret;
+	return ops->unmap(ops, iova, size, gather);
 }
 
 static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
-- 
2.11.0
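
To make the stale-translation argument concrete, here is a toy userspace
model with one page-table entry, one main-TLB entry and one ATC entry. All
names ('toy_iommu', 'toy_unmap', NO_ADDR) are hypothetical, and the model
assumes the SMMU answers a translation request from its TLB whenever it
holds an entry. It is a sketch of the race, not of the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_ADDR 0ul	/* hypothetical marker for "unmapped" or "empty" */

struct toy_iommu {
	unsigned long pt;	/* page-table entry */
	unsigned long tlb;	/* copy cached in the main SMMU TLB */
	unsigned long atc;	/* copy cached in the endpoint's ATC */
};

/* ATS translation request: on an ATC miss, refill it via the SMMU. */
static unsigned long toy_device_translate(struct toy_iommu *s)
{
	if (s->atc == NO_ADDR) {
		if (s->tlb == NO_ADDR)
			s->tlb = s->pt;	/* TLB miss: walk the page table */
		s->atc = s->tlb;	/* may copy a stale TLB entry */
	}
	return s->atc;
}

static void toy_tlb_inv(struct toy_iommu *s)
{
	s->tlb = NO_ADDR;
}

static void toy_atc_inv(struct toy_iommu *s)
{
	s->atc = NO_ADDR;
}

/*
 * Unmap with one device access racing in between the two invalidations.
 * Returns what the device translates to once the unmap has finished.
 */
static unsigned long toy_unmap(struct toy_iommu *s, bool atc_first)
{
	s->pt = NO_ADDR;

	if (atc_first) {
		toy_atc_inv(s);
		toy_device_translate(s);	/* refills ATC from stale TLB */
		toy_tlb_inv(s);
	} else {
		toy_tlb_inv(s);
		toy_device_translate(s);	/* ATC still stale, but... */
		toy_atc_inv(s);			/* ...it is flushed here */
	}

	return toy_device_translate(s);
}
```

With the ATC invalidated first, the device still sees the old address after
the unmap returns; with the main TLB invalidated first, it sees NO_ADDR.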


* [PATCH 4/4] iommu/arm-smmu-v3: Avoid locking on invalidation path when not using ATS
  2019-08-20 15:45 [PATCH 0/4] Sort out SMMUv3 ATC invalidation and locking Will Deacon
                   ` (2 preceding siblings ...)
  2019-08-20 15:45 ` [PATCH 3/4] iommu/arm-smmu-v3: Fix ATC invalidation ordering wrt main TLBs Will Deacon
@ 2019-08-20 15:45 ` Will Deacon
  3 siblings, 0 replies; 9+ messages in thread
From: Will Deacon @ 2019-08-20 15:45 UTC (permalink / raw)
  To: iommu; +Cc: Jean-Philippe Brucker, Will Deacon, Robin Murphy

When ATS is not in use, we can avoid taking the 'devices_lock' for the
domain on the invalidation path by simply caching the number of ATS
masters currently attached. The fiddly part is handling a concurrent
->attach() of an ATS-enabled master to a domain that is being
invalidated, but we can handle this using an 'smp_mb()' to ensure that
our check of the count is ordered after completion of our prior TLB
invalidation.

This also makes our ->attach() and ->detach() flows symmetric wrt ATS
interactions.

Signed-off-by: Will Deacon <will@kernel.org>
---
 drivers/iommu/arm-smmu-v3.c | 37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 183a1c121179..835fd14b071d 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -654,6 +654,7 @@ struct arm_smmu_domain {
 
 	struct io_pgtable_ops		*pgtbl_ops;
 	bool				non_strict;
+	atomic_t			nr_ats_masters;
 
 	enum arm_smmu_domain_stage	stage;
 	union {
@@ -1926,6 +1927,23 @@ static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
 	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
 		return 0;
 
+	/*
+	 * Ensure that we've completed prior invalidation of the main TLBs
+	 * before we read 'nr_ats_masters' in case of a concurrent call to
+	 * arm_smmu_enable_ats():
+	 *
+	 *	// unmap()			// arm_smmu_enable_ats()
+	 *	TLBI+SYNC			atomic_inc(&nr_ats_masters);
+	 *	smp_mb();			[...]
+	 *	atomic_read(&nr_ats_masters);	pci_enable_ats() // writel()
+	 *
+	 * Ensures that we always see the incremented 'nr_ats_masters' count if
+	 * ATS was enabled at the PCI device before completion of the TLBI.
+	 */
+	smp_mb();
+	if (!atomic_read(&smmu_domain->nr_ats_masters))
+		return 0;
+
 	arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
 
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
@@ -2305,6 +2323,7 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master)
 	size_t stu;
 	struct pci_dev *pdev;
 	struct arm_smmu_device *smmu = master->smmu;
+	struct arm_smmu_domain *smmu_domain = master->domain;
 
 	if (master->ats_enabled || !dev_is_pci(master->dev))
 		return;
@@ -2312,6 +2331,9 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master)
 	/* Smallest Translation Unit: log2 of the smallest supported granule */
 	stu = __ffs(smmu->pgsize_bitmap);
 	pdev = to_pci_dev(master->dev);
+
+	atomic_inc(&smmu_domain->nr_ats_masters);
+	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
 	if (pci_enable_ats(pdev, stu))
 		dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu);
 }
@@ -2319,6 +2341,7 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master)
 static void arm_smmu_disable_ats(struct arm_smmu_master *master)
 {
 	struct arm_smmu_cmdq_ent cmd;
+	struct arm_smmu_domain *smmu_domain = master->domain;
 
 	if (!master->ats_enabled || !dev_is_pci(master->dev))
 		return;
@@ -2331,6 +2354,7 @@ static void arm_smmu_disable_ats(struct arm_smmu_master *master)
 	wmb();
 	arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
 	arm_smmu_atc_inv_master(master, &cmd);
+	atomic_dec(&smmu_domain->nr_ats_masters);
 }
 
 static void arm_smmu_detach_dev(struct arm_smmu_master *master)
@@ -2341,11 +2365,12 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 	if (!smmu_domain)
 		return;
 
+	arm_smmu_disable_ats(master);
+
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 	list_del(&master->domain_head);
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
-	arm_smmu_disable_ats(master);
 	master->domain = NULL;
 	master->ats_enabled = false;
 	arm_smmu_install_ste_for_dev(master);
@@ -2388,10 +2413,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
 	master->domain = smmu_domain;
 
-	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-	list_add(&master->domain_head, &smmu_domain->devices);
-	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
-
 	if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS)
 		master->ats_enabled = arm_smmu_ats_supported(master);
 
@@ -2399,7 +2420,13 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 		arm_smmu_write_ctx_desc(smmu, &smmu_domain->s1_cfg);
 
 	arm_smmu_install_ste_for_dev(master);
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_add(&master->domain_head, &smmu_domain->devices);
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
 	arm_smmu_enable_ats(master);
+
 out_unlock:
 	mutex_unlock(&smmu_domain->init_mutex);
 	return ret;
-- 
2.11.0
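
The fast path described in the commit message (skip the lock when no
ATS-enabled master is attached) can be sketched in userspace with C11
atomics. The assumptions are loud ones: 'toy_domain' and its helpers are
hypothetical, atomic_thread_fence(memory_order_seq_cst) stands in for
smp_mb(), and the locked walk over the masters is reduced to a counter.
The sketch only shows the shape of the check; it does not demonstrate the
concurrent case.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the domain; not arm_smmu_domain. */
struct toy_domain {
	atomic_int nr_ats_masters;
	int atc_invalidations;	/* how many times the slow path ran */
};

static void toy_domain_init(struct toy_domain *d)
{
	atomic_init(&d->nr_ats_masters, 0);
	d->atc_invalidations = 0;
}

/* Returns true if the (locked) slow path had to be taken. */
static bool toy_atc_inv_domain(struct toy_domain *d)
{
	/* Stand-in for smp_mb(): prior TLBI+SYNC is ordered before the read. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&d->nr_ats_masters, memory_order_relaxed) == 0)
		return false;

	/* The real driver would take devices_lock and walk the masters here. */
	d->atc_invalidations++;
	return true;
}

static void toy_enable_ats(struct toy_domain *d)
{
	/* Bump the count before the endpoint can issue translation requests. */
	atomic_fetch_add(&d->nr_ats_masters, 1);
	toy_atc_inv_domain(d);
	/* The endpoint itself would be enabled only after this point. */
}

static void toy_disable_ats(struct toy_domain *d)
{
	/* Endpoint already disabled and its ATC flushed; now drop the count. */
	atomic_fetch_sub(&d->nr_ats_masters, 1);
}
```

With no ATS masters attached, toy_atc_inv_domain() returns before touching
anything that would need the lock, which is the saving the patch is after.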


* Re: [PATCH 2/4] iommu/arm-smmu-v3: Rework enabling/disabling of ATS for PCI masters
  2019-08-20 15:45 ` [PATCH 2/4] iommu/arm-smmu-v3: Rework enabling/disabling of ATS for PCI masters Will Deacon
@ 2019-08-20 16:12   ` Robin Murphy
  2019-08-20 16:31     ` Will Deacon
  0 siblings, 1 reply; 9+ messages in thread
From: Robin Murphy @ 2019-08-20 16:12 UTC (permalink / raw)
  To: Will Deacon, iommu; +Cc: Jean-Philippe Brucker

On 20/08/2019 16:45, Will Deacon wrote:
> To prevent any potential issues arising from speculative Address
> Translation Requests from an ATS-enabled PCIe endpoint, rework our ATS
> enabling/disabling logic so that we enable ATS at the SMMU before we
> enable it at the endpoint, and disable things in the opposite order.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   drivers/iommu/arm-smmu-v3.c | 44 ++++++++++++++++++++++++++------------------
>   1 file changed, 26 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 3402b1bc8e94..9096eca0c480 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2283,31 +2283,34 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master)
>   	}
>   }
>   
> -static int arm_smmu_enable_ats(struct arm_smmu_master *master)
> +static bool arm_smmu_ats_supported(struct arm_smmu_master *master)
>   {
> -	int ret;
> -	size_t stu;
>   	struct pci_dev *pdev;
>   	struct arm_smmu_device *smmu = master->smmu;
>   	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>   
>   	if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) ||
>   	    !(fwspec->flags & IOMMU_FWSPEC_PCI_RC_ATS) || pci_ats_disabled())
> -		return -ENXIO;
> +		return false;
>   
>   	pdev = to_pci_dev(master->dev);
> -	if (pdev->untrusted)
> -		return -EPERM;
> +	return !pdev->untrusted && pdev->ats_cap;
> +}
>   
> -	/* Smallest Translation Unit: log2 of the smallest supported granule */
> -	stu = __ffs(smmu->pgsize_bitmap);
> +static void arm_smmu_enable_ats(struct arm_smmu_master *master)
> +{
> +	size_t stu;
> +	struct pci_dev *pdev;
> +	struct arm_smmu_device *smmu = master->smmu;
>   
> -	ret = pci_enable_ats(pdev, stu);
> -	if (ret)
> -		return ret;
> +	if (master->ats_enabled || !dev_is_pci(master->dev))
> +		return;
>   
> -	master->ats_enabled = true;
> -	return 0;
> +	/* Smallest Translation Unit: log2 of the smallest supported granule */
> +	stu = __ffs(smmu->pgsize_bitmap);
> +	pdev = to_pci_dev(master->dev);
> +	if (pci_enable_ats(pdev, stu))
> +		dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu);
>   }
>   
>   static void arm_smmu_disable_ats(struct arm_smmu_master *master)
> @@ -2317,10 +2320,14 @@ static void arm_smmu_disable_ats(struct arm_smmu_master *master)
>   	if (!master->ats_enabled || !dev_is_pci(master->dev))

Hmm, while you've got the lid off, that dev_is_pci() test is clearly 
redundant.

>   		return;
>   
> +	pci_disable_ats(to_pci_dev(master->dev));
> +	/*
> +	 * Ensure ATS is disabled at the endpoint before we issue the
> +	 * ATC invalidation via the SMMU.
> +	 */
> +	wmb();
>   	arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
>   	arm_smmu_atc_inv_master(master, &cmd);
> -	pci_disable_ats(to_pci_dev(master->dev));
> -	master->ats_enabled = false;
>   }
>   
>   static void arm_smmu_detach_dev(struct arm_smmu_master *master)
> @@ -2335,10 +2342,10 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>   	list_del(&master->domain_head);
>   	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>   
> +	arm_smmu_disable_ats(master);
>   	master->domain = NULL;
> +	master->ats_enabled = false;
>   	arm_smmu_install_ste_for_dev(master);
> -
> -	arm_smmu_disable_ats(master);
>   }
>   
>   static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
> @@ -2383,12 +2390,13 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>   	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>   
>   	if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS)
> -		arm_smmu_enable_ats(master);
> +		master->ats_enabled = arm_smmu_ats_supported(master);

So for non-bypass domains we pretend ATS is already enabled iff it could 
possibly be...

>   	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
>   		arm_smmu_write_ctx_desc(smmu, &smmu_domain->s1_cfg);
>   
>   	arm_smmu_install_ste_for_dev(master);
> +	arm_smmu_enable_ats(master);

...which ensures this won't actually touch the PCIe cap, unless of 
course when STE.EATS == 0. Are you sure about that?

Robin.

>   out_unlock:
>   	mutex_unlock(&smmu_domain->init_mutex);
>   	return ret;
> 

* Re: [PATCH 2/4] iommu/arm-smmu-v3: Rework enabling/disabling of ATS for PCI masters
  2019-08-20 16:12   ` Robin Murphy
@ 2019-08-20 16:31     ` Will Deacon
  0 siblings, 0 replies; 9+ messages in thread
From: Will Deacon @ 2019-08-20 16:31 UTC (permalink / raw)
  To: Robin Murphy; +Cc: Jean-Philippe Brucker, iommu

On Tue, Aug 20, 2019 at 05:12:11PM +0100, Robin Murphy wrote:
> On 20/08/2019 16:45, Will Deacon wrote:
> > To prevent any potential issues arising from speculative Address
> > Translation Requests from an ATS-enabled PCIe endpoint, rework our ATS
> > enabling/disabling logic so that we enable ATS at the SMMU before we
> > enable it at the endpoint, and disable things in the opposite order.
> > 
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >   drivers/iommu/arm-smmu-v3.c | 44 ++++++++++++++++++++++++++------------------
> >   1 file changed, 26 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index 3402b1bc8e94..9096eca0c480 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -2283,31 +2283,34 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master)
> >   	}
> >   }
> > -static int arm_smmu_enable_ats(struct arm_smmu_master *master)
> > +static bool arm_smmu_ats_supported(struct arm_smmu_master *master)
> >   {
> > -	int ret;
> > -	size_t stu;
> >   	struct pci_dev *pdev;
> >   	struct arm_smmu_device *smmu = master->smmu;
> >   	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
> >   	if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) ||
> >   	    !(fwspec->flags & IOMMU_FWSPEC_PCI_RC_ATS) || pci_ats_disabled())
> > -		return -ENXIO;
> > +		return false;
> >   	pdev = to_pci_dev(master->dev);
> > -	if (pdev->untrusted)
> > -		return -EPERM;
> > +	return !pdev->untrusted && pdev->ats_cap;
> > +}
> > -	/* Smallest Translation Unit: log2 of the smallest supported granule */
> > -	stu = __ffs(smmu->pgsize_bitmap);
> > +static void arm_smmu_enable_ats(struct arm_smmu_master *master)
> > +{
> > +	size_t stu;
> > +	struct pci_dev *pdev;
> > +	struct arm_smmu_device *smmu = master->smmu;
> > -	ret = pci_enable_ats(pdev, stu);
> > -	if (ret)
> > -		return ret;
> > +	if (master->ats_enabled || !dev_is_pci(master->dev))
> > +		return;
> > -	master->ats_enabled = true;
> > -	return 0;
> > +	/* Smallest Translation Unit: log2 of the smallest supported granule */
> > +	stu = __ffs(smmu->pgsize_bitmap);
> > +	pdev = to_pci_dev(master->dev);
> > +	if (pci_enable_ats(pdev, stu))
> > +		dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu);
> >   }
> >   static void arm_smmu_disable_ats(struct arm_smmu_master *master)
> > @@ -2317,10 +2320,14 @@ static void arm_smmu_disable_ats(struct arm_smmu_master *master)
> >   	if (!master->ats_enabled || !dev_is_pci(master->dev))
> 
> Hmm, while you've got the lid off, that dev_is_pci() test is clearly
> redundant.

Good point; I'll kill it.

> >   		return;
> > +	pci_disable_ats(to_pci_dev(master->dev));
> > +	/*
> > +	 * Ensure ATS is disabled at the endpoint before we issue the
> > +	 * ATC invalidation via the SMMU.
> > +	 */
> > +	wmb();
> >   	arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
> >   	arm_smmu_atc_inv_master(master, &cmd);
> > -	pci_disable_ats(to_pci_dev(master->dev));
> > -	master->ats_enabled = false;
> >   }
> >   static void arm_smmu_detach_dev(struct arm_smmu_master *master)
> > @@ -2335,10 +2342,10 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
> >   	list_del(&master->domain_head);
> >   	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> > +	arm_smmu_disable_ats(master);
> >   	master->domain = NULL;
> > +	master->ats_enabled = false;
> >   	arm_smmu_install_ste_for_dev(master);
> > -
> > -	arm_smmu_disable_ats(master);
> >   }
> >   static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
> > @@ -2383,12 +2390,13 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
> >   	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> >   	if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS)
> > -		arm_smmu_enable_ats(master);
> > +		master->ats_enabled = arm_smmu_ats_supported(master);
> 
> So for non-bypass domains we pretend ATS is already enabled iff it could
> possibly be...
> 
> >   	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
> >   		arm_smmu_write_ctx_desc(smmu, &smmu_domain->s1_cfg);
> >   	arm_smmu_install_ste_for_dev(master);
> > +	arm_smmu_enable_ats(master);
> 
> ...which ensures this won't actually touch the PCIe cap, unless of course
> when STE.EATS == 0. Are you sure about that?

Argh, too many "ats_enabled" flags! (there's another one in the pci
device). I should probably invert the check, but let me have a play --
the idea is that arm_smmu_master::ats_enabled is initially used to
configure the STE and then acts as a proxy for the STE state after that.

Thanks for the review.

Will
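
The flag confusion discussed above is easy to reproduce in a toy model. In
the sketch below, 'toy_master' and all helpers are hypothetical, and the
"inverted" variant is only one reading of "I should probably invert the
check", not a statement of what was eventually merged. 'pci_ats_on' stands
in for whether the PCIe capability would be touched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not the driver's arm_smmu_master. */
struct toy_master {
	bool ats_supported;	/* result of the "supported" check */
	bool ats_enabled;	/* proxy for the STE state (STE.EATS) */
	bool pci_ats_on;	/* would the PCIe ATS cap be enabled? */
};

/* Early return as in the posted patch: bail out if the flag is set. */
static void toy_enable_ats_as_posted(struct toy_master *m)
{
	if (m->ats_enabled)
		return;
	m->pci_ats_on = true;
}

static void toy_attach_as_posted(struct toy_master *m)
{
	m->ats_enabled = m->ats_supported;	/* configures the STE */
	toy_enable_ats_as_posted(m);
}

/* Inverted check: only touch the endpoint if the STE permits ATS. */
static void toy_enable_ats_inverted(struct toy_master *m)
{
	if (!m->ats_enabled)
		return;
	m->pci_ats_on = true;
}

static void toy_attach_inverted(struct toy_master *m)
{
	m->ats_enabled = m->ats_supported;
	toy_enable_ats_inverted(m);
}
```

As posted, a master that supports ATS never gets it enabled at the
endpoint, while one that does not support it reaches the enable call with
the STE forbidding ATS. With the check inverted, both cases come out the
expected way round.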

* Re: [PATCH 3/4] iommu/arm-smmu-v3: Fix ATC invalidation ordering wrt main TLBs
  2019-08-20 15:45 ` [PATCH 3/4] iommu/arm-smmu-v3: Fix ATC invalidation ordering wrt main TLBs Will Deacon
@ 2019-08-20 16:50   ` Robin Murphy
  2019-08-20 17:07     ` Will Deacon
  0 siblings, 1 reply; 9+ messages in thread
From: Robin Murphy @ 2019-08-20 16:50 UTC (permalink / raw)
  To: Will Deacon, iommu; +Cc: Jean-Philippe Brucker

On 20/08/2019 16:45, Will Deacon wrote:
> When invalidating the ATC for a PCIe endpoint using ATS, we must take
> care to complete invalidation of the main SMMU TLBs beforehand, otherwise
> the device could immediately repopulate its ATC with stale translations.
> 
> Hooking the ATC invalidation into ->unmap() as we currently do does the
> exact opposite: it ensures that the ATC is invalidated *before* the
> main TLBs, which is bogus.
> 
> Move ATC invalidation into the actual (leaf) invalidation routines so
> that it is always called after completing main TLB invalidation.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   drivers/iommu/arm-smmu-v3.c | 12 +++++-------
>   1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 9096eca0c480..183a1c121179 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1961,6 +1961,7 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>   	 */
>   	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>   	arm_smmu_cmdq_issue_sync(smmu);
> +	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
>   }
>   
>   static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
> @@ -1969,7 +1970,7 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
>   {
>   	u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
>   	struct arm_smmu_device *smmu = smmu_domain->smmu;
> -	unsigned long end = iova + size;
> +	unsigned long start = iova, end = iova + size;
>   	int i = 0;
>   	struct arm_smmu_cmdq_ent cmd = {
>   		.tlbi = {
> @@ -1998,6 +1999,8 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
>   	}
>   
>   	arm_smmu_cmdq_issue_cmdlist(smmu, cmds, i, true);
> +	if (leaf)
> +		arm_smmu_atc_inv_domain(smmu_domain, 0, start, size);

I still need to get up to speed on your cmdlist and unmap changes, but 
in isolation this "if (leaf)" guard looks a bit dodgy - in the case 
where io-pgtable goes to unmap a 2MB block, finds it's mapped as a 
table, and blows it away in one go, we'll only see a non-leaf TLBI call 
for that range, no?

Tangentially, does arm_smmu_atc_inv_domain() really need to sync once 
for each individual master, or could that do better as well? Not 
something we should worry about right now, but now that I'm looking I 
may as well note it for the record.

Robin.

>   }
>   
>   static void arm_smmu_tlb_inv_page_nosync(struct iommu_iotlb_gather *gather,
> @@ -2416,18 +2419,13 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>   			     size_t size, struct iommu_iotlb_gather *gather)
>   {
> -	int ret;
>   	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>   	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>   
>   	if (!ops)
>   		return 0;
>   
> -	ret = ops->unmap(ops, iova, size, gather);
> -	if (ret && arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size))
> -		return 0;
> -
> -	return ret;
> +	return ops->unmap(ops, iova, size, gather);
>   }
>   
>   static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
> 

* Re: [PATCH 3/4] iommu/arm-smmu-v3: Fix ATC invalidation ordering wrt main TLBs
  2019-08-20 16:50   ` Robin Murphy
@ 2019-08-20 17:07     ` Will Deacon
  0 siblings, 0 replies; 9+ messages in thread
From: Will Deacon @ 2019-08-20 17:07 UTC (permalink / raw)
  To: Robin Murphy; +Cc: Jean-Philippe Brucker, iommu

On Tue, Aug 20, 2019 at 05:50:06PM +0100, Robin Murphy wrote:
> On 20/08/2019 16:45, Will Deacon wrote:
> > > When invalidating the ATC for a PCIe endpoint using ATS, we must take
> > care to complete invalidation of the main SMMU TLBs beforehand, otherwise
> > the device could immediately repopulate its ATC with stale translations.
> > 
> > Hooking the ATC invalidation into ->unmap() as we currently do does the
> > > exact opposite: it ensures that the ATC is invalidated *before* the
> > main TLBs, which is bogus.
> > 
> > Move ATC invalidation into the actual (leaf) invalidation routines so
> > that it is always called after completing main TLB invalidation.
> > 
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >   drivers/iommu/arm-smmu-v3.c | 12 +++++-------
> >   1 file changed, 5 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index 9096eca0c480..183a1c121179 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -1961,6 +1961,7 @@ static void arm_smmu_tlb_inv_context(void *cookie)
> >   	 */
> >   	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
> >   	arm_smmu_cmdq_issue_sync(smmu);
> > +	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
> >   }
> >   static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
> > @@ -1969,7 +1970,7 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
> >   {
> >   	u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
> >   	struct arm_smmu_device *smmu = smmu_domain->smmu;
> > -	unsigned long end = iova + size;
> > +	unsigned long start = iova, end = iova + size;
> >   	int i = 0;
> >   	struct arm_smmu_cmdq_ent cmd = {
> >   		.tlbi = {
> > @@ -1998,6 +1999,8 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
> >   	}
> >   	arm_smmu_cmdq_issue_cmdlist(smmu, cmds, i, true);
> > +	if (leaf)
> > +		arm_smmu_atc_inv_domain(smmu_domain, 0, start, size);
> 
> I still need to get up to speed on your cmdlist and unmap changes, but in
> isolation this "if (leaf)" guard looks a bit dodgy - in the case where
> io-pgtable goes to unmap a 2MB block, finds it's mapped as a table, and
> blows it away in one go, we'll only see a non-leaf TLBI call for that range,
> no?

Yuck, this is quite horrible. I don't think the ATC is permitted to cache
intermediate walks, so we actually don't need the thing to be synchronous
here. But if we update the gather structure as well, then we risk
over-invalidating for the non-ATS case when we get to the sync.
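
As a rough illustration of the over-invalidation risk, the sketch below models a gather as a single [start, end) span. This is an assumption made for illustration: `toy_gather`, `toy_gather_add()` and `toy_gather_span()` are hypothetical and do not mirror the layout of `struct iommu_iotlb_gather`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a gather structure: one [start, end) span. */
struct toy_gather {
	unsigned long start;
	unsigned long end;	/* exclusive; start == end means empty */
};

/* Grow the span so that it also covers [iova, iova + size). */
static void toy_gather_add(struct toy_gather *g, unsigned long iova,
			   size_t size)
{
	unsigned long end = iova + size;

	assert(size != 0);

	if (g->start == g->end) {	/* empty gather: adopt the range */
		g->start = iova;
		g->end = end;
		return;
	}
	if (iova < g->start)
		g->start = iova;
	if (end > g->end)
		g->end = end;
}

/* Number of bytes that a later sync would have to invalidate. */
static size_t toy_gather_span(const struct toy_gather *g)
{
	return g->end - g->start;
}
```

In this model, gathering one 4KB leaf page gives a 4KB span; also recording a non-leaf 2MB range at the same address grows the span to 2MB, and a later sync would cover all of it whether or not ATS is in use.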

I'll have a think.

> Tangentially, does arm_smmu_atc_inv_domain() really need to sync once for
> each individual master, or could that do better as well? Not something we
> should worry about right now, but now that I'm looking I may as well note it
> for the record.

Indeed -- that function should be rewritten using the cmdlist() stuff I've
done. I'm just reluctant to start optimising for the ATS case when I'm not
able to test it.

Thanks,

Will

Thread overview: 9+ messages
2019-08-20 15:45 [PATCH 0/4] Sort out SMMUv3 ATC invalidation and locking Will Deacon
2019-08-20 15:45 ` [PATCH 1/4] iommu/arm-smmu-v3: Document ordering guarantees of command insertion Will Deacon
2019-08-20 15:45 ` [PATCH 2/4] iommu/arm-smmu-v3: Rework enabling/disabling of ATS for PCI masters Will Deacon
2019-08-20 16:12   ` Robin Murphy
2019-08-20 16:31     ` Will Deacon
2019-08-20 15:45 ` [PATCH 3/4] iommu/arm-smmu-v3: Fix ATC invalidation ordering wrt main TLBs Will Deacon
2019-08-20 16:50   ` Robin Murphy
2019-08-20 17:07     ` Will Deacon
2019-08-20 15:45 ` [PATCH 4/4] iommu/arm-smmu-v3: Avoid locking on invalidation path when not using ATS Will Deacon
