linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC v2 0/4] arm-smmu-v3 tlbi-on-map option
@ 2017-08-11 13:45 Eric Auger
  2017-08-11 13:45 ` [RFC v2 1/4] iommu/io-pgtable-arm: flush TLBs when IO_PGTABLE_QUIRK_TLBI_ON_MAP Eric Auger
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Eric Auger @ 2017-08-11 13:45 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, Will.Deacon,
	robin.murphy, Jean-Philippe.Brucker
  Cc: christoffer.dall, Marc.Zyngier, alex.williamson, peterx, mst, tn,
	bharat.bhushan

This series adds a new tlbi-on-map option to the smmuv3 driver.
When set, the IO_PGTABLE_QUIRK_TLBI_ON_MAP quirk is applied for
LPAE tables and the smmuv3 driver sends TLB invalidations on map.

This mode is useful when running the driver on a guest as it allows
the virtualizer to trap any change to the translation structures.
This is similar to the Intel vtd caching mode (CM).

This is mandated for vSMMUv3/VFIO integration where guest mappings
must be applied to the physical IOMMU and also for VHOST.

When this mode is set we use an implementation defined TLBI
invalidation command which allows to invalidate a range of IOVA
instead of using CMD_TLBI_NH_VA which works by page. This
is needed for sake of efficiency when running DPDK use case on
guest as DPDK uses hugepages. As far as I understand the intel
IOMMU provides such invalidation command (using the address mask
parameter) and at the moment, I haven't found something similar
in the smmuv3 architecture specification.

Best Regards

Eric

Git: complete series available at
https://github.com/eauger/linux/tree/v4.13-rc3-tlbi-on-map-rfcv2

History:
v1 -> v2:
- add support for ACPI probing
- add implementation defined CMD_TLBI_NH_VA_AM which allows
  IOVA range invalidation


Eric Auger (4):
  iommu/io-pgtable-arm: flush TLBs when IO_PGTABLE_QUIRK_TLBI_ON_MAP
  iommu/arm-smmu-v3: Add tlbi_on_map option
  iommu/arm-smmu-v3: Add hypothetical caching mode model
  iommu/arm-smmu-v3: add CMD_TLBI_NH_VA_AM command for iova range
    invalidation

 .../devicetree/bindings/iommu/arm,smmu-v3.txt      |  4 +++
 drivers/iommu/arm-smmu-v3.c                        | 34 ++++++++++++++++++++--
 drivers/iommu/io-pgtable-arm.c                     | 14 +++++++--
 3 files changed, 48 insertions(+), 4 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC v2 1/4] iommu/io-pgtable-arm: flush TLBs when IO_PGTABLE_QUIRK_TLBI_ON_MAP
  2017-08-11 13:45 [RFC v2 0/4] arm-smmu-v3 tlbi-on-map option Eric Auger
@ 2017-08-11 13:45 ` Eric Auger
  2017-08-11 13:45 ` [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option Eric Auger
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: Eric Auger @ 2017-08-11 13:45 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, Will.Deacon,
	robin.murphy, Jean-Philippe.Brucker
  Cc: christoffer.dall, Marc.Zyngier, alex.williamson, peterx, mst, tn,
	bharat.bhushan

This patch implements the IO_PGTABLE_QUIRK_TLBI_ON_MAP quirk for
LPAE page tables. It forces TLB invalidations on map.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v1 -> v2:
- rebase on v4.13-rc2
---
 drivers/iommu/io-pgtable-arm.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index b182039..198a042 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -447,6 +447,8 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
 	arm_lpae_iopte *ptep = data->pgd;
 	int ret, lvl = ARM_LPAE_START_LVL(data);
 	arm_lpae_iopte prot;
+	struct io_pgtable *iop = &data->iop;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 
 	/* If no access, then nothing to do */
 	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
@@ -454,6 +456,12 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
 
 	prot = arm_lpae_prot_to_pte(data, iommu_prot);
 	ret = __arm_lpae_map(data, iova, paddr, size, prot, lvl, ptep);
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_TLBI_ON_MAP) {
+		io_pgtable_tlb_add_flush(iop, iova, size,
+					 ARM_LPAE_GRANULE(data), false);
+		io_pgtable_tlb_sync(iop);
+	}
 	/*
 	 * Synchronise all PTE updates for the new mapping before there's
 	 * a chance for anything to kick off a table walk for the new iova.
@@ -739,7 +747,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	u64 reg;
 	struct arm_lpae_io_pgtable *data;
 
-	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA))
+	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
+			    IO_PGTABLE_QUIRK_TLBI_ON_MAP))
 		return NULL;
 
 	data = arm_lpae_alloc_pgtable(cfg);
@@ -828,7 +837,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 	struct arm_lpae_io_pgtable *data;
 
 	/* The NS quirk doesn't apply at stage 2 */
-	if (cfg->quirks & ~IO_PGTABLE_QUIRK_NO_DMA)
+	if (cfg->quirks &
+		~(IO_PGTABLE_QUIRK_NO_DMA | IO_PGTABLE_QUIRK_TLBI_ON_MAP))
 		return NULL;
 
 	data = arm_lpae_alloc_pgtable(cfg);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-11 13:45 [RFC v2 0/4] arm-smmu-v3 tlbi-on-map option Eric Auger
  2017-08-11 13:45 ` [RFC v2 1/4] iommu/io-pgtable-arm: flush TLBs when IO_PGTABLE_QUIRK_TLBI_ON_MAP Eric Auger
@ 2017-08-11 13:45 ` Eric Auger
  2017-08-17 16:34   ` Will Deacon
  2017-08-11 13:45 ` [RFC v2 3/4] iommu/arm-smmu-v3: Add hypothetical caching mode model Eric Auger
  2017-08-11 13:45 ` [RFC v2 4/4] iommu/arm-smmu-v3: add CMD_TLBI_NH_VA_AM command for iova range invalidation Eric Auger
  3 siblings, 1 reply; 19+ messages in thread
From: Eric Auger @ 2017-08-11 13:45 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, Will.Deacon,
	robin.murphy, Jean-Philippe.Brucker
  Cc: christoffer.dall, Marc.Zyngier, alex.williamson, peterx, mst, tn,
	bharat.bhushan

When running a virtual SMMU on a guest we sometimes need to trap
all changes to the translation structures. This is especially useful
to integrate with VFIO. This patch adds a new option that forces
the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.

TLBI commands then can be trapped.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v1 -> v2:
- rebase on v4.13-rc2
---
 Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
 drivers/iommu/arm-smmu-v3.c                             | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
index c9abbf3..ebb85e9 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
@@ -52,6 +52,10 @@ the PCIe specification.
                         devicetree/bindings/interrupt-controller/msi.txt
                       for a description of the msi-parent property.
 
+- tlbi-on-map       : invalidate caches whenever there is an update of
+                      any remapping structure (updates to not-present or
+                      present entries).
+
 - hisilicon,broken-prefetch-cmd
                     : Avoid sending CMD_PREFETCH_* commands to the SMMU.
 
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 568c400..690247b 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -608,6 +608,7 @@ struct arm_smmu_device {
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
 #define ARM_SMMU_OPT_PAGE0_REGS_ONLY	(1 << 1)
+#define ARM_SMMU_OPT_TLBI_ON_MAP	(1 << 2)
 	u32				options;
 
 	struct arm_smmu_cmdq		cmdq;
@@ -675,6 +676,7 @@ struct arm_smmu_option_prop {
 static struct arm_smmu_option_prop arm_smmu_options[] = {
 	{ ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" },
 	{ ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"},
+	{ ARM_SMMU_OPT_TLBI_ON_MAP, "tlbi-on-map" },
 	{ 0, NULL},
 };
 
@@ -1604,6 +1606,9 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
 	if (smmu->features & ARM_SMMU_FEAT_COHERENCY)
 		pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
 
+	if (smmu->options & ARM_SMMU_OPT_TLBI_ON_MAP)
+		pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_TLBI_ON_MAP;
+
 	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
 	if (!pgtbl_ops)
 		return -ENOMEM;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC v2 3/4] iommu/arm-smmu-v3: Add hypothetical caching mode model
  2017-08-11 13:45 [RFC v2 0/4] arm-smmu-v3 tlbi-on-map option Eric Auger
  2017-08-11 13:45 ` [RFC v2 1/4] iommu/io-pgtable-arm: flush TLBs when IO_PGTABLE_QUIRK_TLBI_ON_MAP Eric Auger
  2017-08-11 13:45 ` [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option Eric Auger
@ 2017-08-11 13:45 ` Eric Auger
  2017-08-11 13:45 ` [RFC v2 4/4] iommu/arm-smmu-v3: add CMD_TLBI_NH_VA_AM command for iova range invalidation Eric Auger
  3 siblings, 0 replies; 19+ messages in thread
From: Eric Auger @ 2017-08-11 13:45 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, Will.Deacon,
	robin.murphy, Jean-Philippe.Brucker
  Cc: christoffer.dall, Marc.Zyngier, alex.williamson, peterx, mst, tn,
	bharat.bhushan

Let's add an hypothetical "caching mode" smmuv3 model (not yet
discussed for IORT spec) and enable the TLBI_ON_MAP option for
this latter.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/arm-smmu-v3.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 690247b..a1c10af 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -422,6 +422,10 @@
 #define ACPI_IORT_SMMU_V3_CAVIUM_CN99XX		0x2
 #endif
 
+#ifndef ACPI_IORT_SMMU_V3_CACHING_MODE
+#define ACPI_IORT_SMMU_V3_CACHING_MODE		0x3
+#endif
+
 static bool disable_bypass;
 module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
@@ -2673,6 +2677,9 @@ static void acpi_smmu_get_options(u32 model, struct arm_smmu_device *smmu)
 	case ACPI_IORT_SMMU_HISILICON_HI161X:
 		smmu->options |= ARM_SMMU_OPT_SKIP_PREFETCH;
 		break;
+	case ACPI_IORT_SMMU_V3_CACHING_MODE:
+		smmu->options |= ARM_SMMU_OPT_TLBI_ON_MAP;
+		break;
 	}
 
 	dev_notice(smmu->dev, "option mask 0x%x\n", smmu->options);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC v2 4/4] iommu/arm-smmu-v3: add CMD_TLBI_NH_VA_AM command for iova range invalidation
  2017-08-11 13:45 [RFC v2 0/4] arm-smmu-v3 tlbi-on-map option Eric Auger
                   ` (2 preceding siblings ...)
  2017-08-11 13:45 ` [RFC v2 3/4] iommu/arm-smmu-v3: Add hypothetical caching mode model Eric Auger
@ 2017-08-11 13:45 ` Eric Auger
  3 siblings, 0 replies; 19+ messages in thread
From: Eric Auger @ 2017-08-11 13:45 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, Will.Deacon,
	robin.murphy, Jean-Philippe.Brucker
  Cc: christoffer.dall, Marc.Zyngier, alex.williamson, peterx, mst, tn,
	bharat.bhushan

When using a virtual SMMU and running the driver in TLBI_ON_MAP
mode we need invalidate large IOVA ranges. This typically happens
in DPDK use case where hugepages are used. In that case, invalidating
pages by page is really inefficient and we would need to invalidate
by iova range. Unfortunately there is no such command specified in the
SMMUv3 architecture spec. Let's add a new implementation defined command
that takes an address mask.

The CMD_TLBI_NH_VA_AM command format is inherited from CMD_TLBI_NH_VA's
one, replace the currently unused VMID field by the AM field.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/arm-smmu-v3.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index a1c10af..9da2785 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -492,9 +492,15 @@ struct arm_smmu_cmdq_ent {
 		#define CMDQ_OP_TLBI_S12_VMALL	0x28
 		#define CMDQ_OP_TLBI_S2_IPA	0x2a
 		#define CMDQ_OP_TLBI_NSNH_ALL	0x30
+
+		/* vIOMMU ASID/IOVA Range Invalidation */
+		#define CMDQ_OP_TLBI_NH_VA_AM	0x8F
 		struct {
 			u16			asid;
-			u16			vmid;
+			union {
+				u16		vmid;
+				u16		am; /* address mask */
+			};
 			bool			leaf;
 			u64			addr;
 		} tlbi;
@@ -853,6 +859,12 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[1] |= ent->tlbi.leaf ? CMDQ_TLBI_1_LEAF : 0;
 		cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
 		break;
+	case CMDQ_OP_TLBI_NH_VA_AM:
+		cmd[0] |= (u64)ent->tlbi.asid << CMDQ_TLBI_0_ASID_SHIFT;
+		cmd[0] |= (u64)ent->tlbi.am << CMDQ_TLBI_0_VMID_SHIFT;
+		cmd[1] |= ent->tlbi.leaf ? CMDQ_TLBI_1_LEAF : 0;
+		cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
+		break;
 	case CMDQ_OP_TLBI_S2_IPA:
 		cmd[0] |= (u64)ent->tlbi.vmid << CMDQ_TLBI_0_VMID_SHIFT;
 		cmd[1] |= ent->tlbi.leaf ? CMDQ_TLBI_1_LEAF : 0;
@@ -1402,8 +1414,14 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 	};
 
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
-		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
 		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
+		if (smmu->options & ARM_SMMU_OPT_TLBI_ON_MAP) {
+			cmd.opcode	= CMDQ_OP_TLBI_NH_VA_AM;
+			cmd.tlbi.am   = size >> 12;
+			granule = size;
+		} else {
+			cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
+		}
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
 		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-11 13:45 ` [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option Eric Auger
@ 2017-08-17 16:34   ` Will Deacon
  2017-08-17 17:47     ` Auger Eric
  2017-08-18  2:49     ` Michael S. Tsirkin
  0 siblings, 2 replies; 19+ messages in thread
From: Will Deacon @ 2017-08-17 16:34 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, iommu, linux-kernel, robin.murphy,
	Jean-Philippe.Brucker, christoffer.dall, Marc.Zyngier,
	alex.williamson, peterx, mst, tn, bharat.bhushan

On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
> When running a virtual SMMU on a guest we sometimes need to trap
> all changes to the translation structures. This is especially useful
> to integrate with VFIO. This patch adds a new option that forces
> the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
> 
> TLBI commands then can be trapped.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v1 -> v2:
> - rebase on v4.13-rc2
> ---
>  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
>  drivers/iommu/arm-smmu-v3.c                             | 5 +++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> index c9abbf3..ebb85e9 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> @@ -52,6 +52,10 @@ the PCIe specification.
>                          devicetree/bindings/interrupt-controller/msi.txt
>                        for a description of the msi-parent property.
>  
> +- tlbi-on-map       : invalidate caches whenever there is an update of
> +                      any remapping structure (updates to not-present or
> +                      present entries).
> +

My position on this hasn't changed, so NAK for this patch. If you want to
emulate something outside of the SMMUv3 architecture, please do so, but
don't pretend that it's an SMMUv3.

Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-17 16:34   ` Will Deacon
@ 2017-08-17 17:47     ` Auger Eric
  2017-08-18  2:54       ` Michael S. Tsirkin
  2017-08-18  2:49     ` Michael S. Tsirkin
  1 sibling, 1 reply; 19+ messages in thread
From: Auger Eric @ 2017-08-17 17:47 UTC (permalink / raw)
  To: Will Deacon
  Cc: eric.auger.pro, iommu, linux-kernel, robin.murphy,
	Jean-Philippe.Brucker, christoffer.dall, Marc.Zyngier,
	alex.williamson, peterx, mst, tn, bharat.bhushan

Hi Will,

On 17/08/2017 18:34, Will Deacon wrote:
> On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
>> When running a virtual SMMU on a guest we sometimes need to trap
>> all changes to the translation structures. This is especially useful
>> to integrate with VFIO. This patch adds a new option that forces
>> the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
>>
>> TLBI commands then can be trapped.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>> v1 -> v2:
>> - rebase on v4.13-rc2
>> ---
>>  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
>>  drivers/iommu/arm-smmu-v3.c                             | 5 +++++
>>  2 files changed, 9 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>> index c9abbf3..ebb85e9 100644
>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>> @@ -52,6 +52,10 @@ the PCIe specification.
>>                          devicetree/bindings/interrupt-controller/msi.txt
>>                        for a description of the msi-parent property.
>>  
>> +- tlbi-on-map       : invalidate caches whenever there is an update of
>> +                      any remapping structure (updates to not-present or
>> +                      present entries).
>> +
> 
> My position on this hasn't changed, so NAK for this patch. If you want to
> emulate something outside of the SMMUv3 architecture, please do so, but
> don't pretend that it's an SMMUv3.
OK understood. I wanted to go down the road and enable DPDK use case
using the same solution as Intel.

So to me the approach is not adapted to SMMUv3 because
- SMMUv3 does not expose anything similar to the Intel Caching Mode
(when set, indicates the OS needs to invalidate TLB on map)
- SMMUv3 does not expose any IOVA range TLB invalidation command.

Do you agree with this conclusion? ACPI enablement was not a showstopper
I think.

I will see with Peter and other potential users in the community whether
it is worth to pursue the efforts on upstreaming the QEMU vSMMUv3
device, considering the VFIO/VHOST integration is made impossible.

Thanks

Eric

> 
> Will
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-17 16:34   ` Will Deacon
  2017-08-17 17:47     ` Auger Eric
@ 2017-08-18  2:49     ` Michael S. Tsirkin
  2017-08-22 19:09       ` Michael S. Tsirkin
  1 sibling, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-08-18  2:49 UTC (permalink / raw)
  To: Will Deacon
  Cc: Eric Auger, eric.auger.pro, iommu, linux-kernel, robin.murphy,
	Jean-Philippe.Brucker, christoffer.dall, Marc.Zyngier,
	alex.williamson, peterx, tn, bharat.bhushan

On Thu, Aug 17, 2017 at 05:34:25PM +0100, Will Deacon wrote:
> On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
> > When running a virtual SMMU on a guest we sometimes need to trap
> > all changes to the translation structures. This is especially useful
> > to integrate with VFIO. This patch adds a new option that forces
> > the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
> > 
> > TLBI commands then can be trapped.
> > 
> > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > 
> > ---
> > v1 -> v2:
> > - rebase on v4.13-rc2
> > ---
> >  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
> >  drivers/iommu/arm-smmu-v3.c                             | 5 +++++
> >  2 files changed, 9 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > index c9abbf3..ebb85e9 100644
> > --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > @@ -52,6 +52,10 @@ the PCIe specification.
> >                          devicetree/bindings/interrupt-controller/msi.txt
> >                        for a description of the msi-parent property.
> >  
> > +- tlbi-on-map       : invalidate caches whenever there is an update of
> > +                      any remapping structure (updates to not-present or
> > +                      present entries).
> > +
> 
> My position on this hasn't changed, so NAK for this patch. If you want to
> emulate something outside of the SMMUv3 architecture, please do so, but
> don't pretend that it's an SMMUv3.
> 
> Will

What if the emulated device does not list arm,smmu-v3, listing
qemu,ssmu-v3 as compatible? Would that address the concern?

Way I see it down the road most people will want to use nested
mmu support in hardware. So things will work fine without
this hack.

For those that can't for some reason, reusing most of the code
in a real smmu driver allows better coverage than a completely
separate device.

Consider hypervisor as hardware - yes it's
not exactly an smmu-v3 but it's very similar so imho it's better to
reuse existing code than duplicate all of it.



-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-17 17:47     ` Auger Eric
@ 2017-08-18  2:54       ` Michael S. Tsirkin
  2017-08-18  6:50         ` Auger Eric
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-08-18  2:54 UTC (permalink / raw)
  To: Auger Eric
  Cc: Will Deacon, eric.auger.pro, iommu, linux-kernel, robin.murphy,
	Jean-Philippe.Brucker, christoffer.dall, Marc.Zyngier,
	alex.williamson, peterx, tn, bharat.bhushan

On Thu, Aug 17, 2017 at 07:47:04PM +0200, Auger Eric wrote:
> I will see with Peter and other potential users in the community whether
> it is worth to pursue the efforts on upstreaming the QEMU vSMMUv3
> device, considering the VFIO/VHOST integration is made impossible.

I posted more ideas on finding a way to support it after all separately.
Even without this:
	1. VHOST does not need to be notified on map.
	2. VFIO might be possible for hardware that supports nested page tables.

IMHO 2 is worth looking into.

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-18  2:54       ` Michael S. Tsirkin
@ 2017-08-18  6:50         ` Auger Eric
  0 siblings, 0 replies; 19+ messages in thread
From: Auger Eric @ 2017-08-18  6:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Will Deacon, eric.auger.pro, iommu, linux-kernel, robin.murphy,
	Jean-Philippe.Brucker, christoffer.dall, Marc.Zyngier,
	alex.williamson, peterx, tn, bharat.bhushan

Hi Michael,
On 18/08/2017 04:54, Michael S. Tsirkin wrote:
> On Thu, Aug 17, 2017 at 07:47:04PM +0200, Auger Eric wrote:
>> I will see with Peter and other potential users in the community whether
>> it is worth to pursue the efforts on upstreaming the QEMU vSMMUv3
>> device, considering the VFIO/VHOST integration is made impossible.
> 
> I posted more ideas on finding a way to support it after all separately.
> Even without this:
> 	1. VHOST does not need to be notified on map.
My mistake. I will split the QEMU series accordingly.
> 	2. VFIO might be possible for hardware that supports nested page tables.
> 
> IMHO 2 is worth looking into.
OK That's my next step.

Thank you for your suggestions.

Eric

> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-18  2:49     ` Michael S. Tsirkin
@ 2017-08-22 19:09       ` Michael S. Tsirkin
  2017-08-23 10:25         ` Will Deacon
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-08-22 19:09 UTC (permalink / raw)
  To: Will Deacon
  Cc: Eric Auger, eric.auger.pro, iommu, linux-kernel, robin.murphy,
	Jean-Philippe.Brucker, christoffer.dall, Marc.Zyngier,
	alex.williamson, peterx, tn, bharat.bhushan

On Fri, Aug 18, 2017 at 05:49:42AM +0300, Michael S. Tsirkin wrote:
> On Thu, Aug 17, 2017 at 05:34:25PM +0100, Will Deacon wrote:
> > On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
> > > When running a virtual SMMU on a guest we sometimes need to trap
> > > all changes to the translation structures. This is especially useful
> > > to integrate with VFIO. This patch adds a new option that forces
> > > the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
> > > 
> > > TLBI commands then can be trapped.
> > > 
> > > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > > 
> > > ---
> > > v1 -> v2:
> > > - rebase on v4.13-rc2
> > > ---
> > >  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
> > >  drivers/iommu/arm-smmu-v3.c                             | 5 +++++
> > >  2 files changed, 9 insertions(+)
> > > 
> > > diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > > index c9abbf3..ebb85e9 100644
> > > --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > > +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > > @@ -52,6 +52,10 @@ the PCIe specification.
> > >                          devicetree/bindings/interrupt-controller/msi.txt
> > >                        for a description of the msi-parent property.
> > >  
> > > +- tlbi-on-map       : invalidate caches whenever there is an update of
> > > +                      any remapping structure (updates to not-present or
> > > +                      present entries).
> > > +
> > 
> > My position on this hasn't changed, so NAK for this patch. If you want to
> > emulate something outside of the SMMUv3 architecture, please do so, but
> > don't pretend that it's an SMMUv3.
> > 
> > Will
> 
> What if the emulated device does not list arm,smmu-v3, listing
> qemu,ssmu-v3 as compatible? Would that address the concern?

Will, can you comment on this please? Are you open to reusing the code
in drivers/iommu/arm-smmu-v3.c to support a paravirtual device that does
not claim to be compatible with smmuv3 but does try to behave very close to
it except it can cache non-present structures? Or would you rather
the code to support this is forked to qemu-smmu-v3.c?


> Way I see it down the road most people will want to use nested
> mmu support in hardware. So things will work fine without
> this hack.
> 
> For those that can't for some reason, reusing most of the code
> in a real smmu driver allows better coverage than a completely
> separate device.
> 
> Consider hypervisor as hardware - yes it's
> not exactly an smmu-v3 but it's very similar so imho it's better to
> reuse existing code than duplicate all of it.
> 
> 
> 
> -- 
> MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-22 19:09       ` Michael S. Tsirkin
@ 2017-08-23 10:25         ` Will Deacon
  2017-08-23 12:36           ` Auger Eric
  2017-08-23 14:05           ` Michael S. Tsirkin
  0 siblings, 2 replies; 19+ messages in thread
From: Will Deacon @ 2017-08-23 10:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eric Auger, eric.auger.pro, iommu, linux-kernel, robin.murphy,
	Jean-Philippe.Brucker, christoffer.dall, Marc.Zyngier,
	alex.williamson, peterx, tn, bharat.bhushan

On Tue, Aug 22, 2017 at 10:09:15PM +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 18, 2017 at 05:49:42AM +0300, Michael S. Tsirkin wrote:
> > On Thu, Aug 17, 2017 at 05:34:25PM +0100, Will Deacon wrote:
> > > On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
> > > > When running a virtual SMMU on a guest we sometimes need to trap
> > > > all changes to the translation structures. This is especially useful
> > > > to integrate with VFIO. This patch adds a new option that forces
> > > > the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
> > > > 
> > > > TLBI commands then can be trapped.
> > > > 
> > > > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > > > 
> > > > ---
> > > > v1 -> v2:
> > > > - rebase on v4.13-rc2
> > > > ---
> > > >  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
> > > >  drivers/iommu/arm-smmu-v3.c                             | 5 +++++
> > > >  2 files changed, 9 insertions(+)
> > > > 
> > > > diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > > > index c9abbf3..ebb85e9 100644
> > > > --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > > > +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > > > @@ -52,6 +52,10 @@ the PCIe specification.
> > > >                          devicetree/bindings/interrupt-controller/msi.txt
> > > >                        for a description of the msi-parent property.
> > > >  
> > > > +- tlbi-on-map       : invalidate caches whenever there is an update of
> > > > +                      any remapping structure (updates to not-present or
> > > > +                      present entries).
> > > > +
> > > 
> > > My position on this hasn't changed, so NAK for this patch. If you want to
> > > emulate something outside of the SMMUv3 architecture, please do so, but
> > > don't pretend that it's an SMMUv3.
> > > 
> > > Will
> > 
> > What if the emulated device does not list arm,smmu-v3, listing
> > qemu,ssmu-v3 as compatible? Would that address the concern?
> 
> Will, can you comment on this please? Are you open to reusing the code
> in drivers/iommu/arm-smmu-v3.c to support a paravirtual device that does
> not claim to be compatible with smmuv3 but does try to behave very close to
> it except it can cache non-present structures? Or would you rather
> the code to support this is forked to qemu-smmu-v3.c?

I still don't understand why this is preferable to a PV IOMMU
implementation. Not only is this proposing to issue TLB maintenance on
map, but the maintenance command itself is entirely made up. Why not just
have a map command? Anyway, I'm reluctant to add this hack to the driver until:

  1. There is a compelling reason to pursue this approach instead of a
     PV approach (including performance measurements).

  2. There is a specification for the QEMU fork of the ARM SMMUv3
     architecture, including the semantics of the new command being proposed
     and what exactly the TLB maintenance requirements are on map (for
     example, what if I change an STE or a CD -- are they cached too?).

  3. The ACPI IORT spec is updated to recognise this implementation

  4. There is an implementation that can use the guest page tables directly,
     because that may well make all of this moot.

Forking the driver doesn't sound very sensible to me.

Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-23 10:25         ` Will Deacon
@ 2017-08-23 12:36           ` Auger Eric
  2017-08-23 14:12             ` Michael S. Tsirkin
  2017-08-23 16:42             ` Will Deacon
  2017-08-23 14:05           ` Michael S. Tsirkin
  1 sibling, 2 replies; 19+ messages in thread
From: Auger Eric @ 2017-08-23 12:36 UTC (permalink / raw)
  To: Will Deacon, Michael S. Tsirkin
  Cc: eric.auger.pro, iommu, linux-kernel, robin.murphy,
	Jean-Philippe.Brucker, christoffer.dall, Marc.Zyngier,
	alex.williamson, peterx, tn, bharat.bhushan

Hi Will,

On 23/08/2017 12:25, Will Deacon wrote:
> On Tue, Aug 22, 2017 at 10:09:15PM +0300, Michael S. Tsirkin wrote:
>> On Fri, Aug 18, 2017 at 05:49:42AM +0300, Michael S. Tsirkin wrote:
>>> On Thu, Aug 17, 2017 at 05:34:25PM +0100, Will Deacon wrote:
>>>> On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
>>>>> When running a virtual SMMU on a guest we sometimes need to trap
>>>>> all changes to the translation structures. This is especially useful
>>>>> to integrate with VFIO. This patch adds a new option that forces
>>>>> the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
>>>>>
>>>>> TLBI commands then can be trapped.
>>>>>
>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>
>>>>> ---
>>>>> v1 -> v2:
>>>>> - rebase on v4.13-rc2
>>>>> ---
>>>>>  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
>>>>>  drivers/iommu/arm-smmu-v3.c                             | 5 +++++
>>>>>  2 files changed, 9 insertions(+)
>>>>>
>>>>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>>>>> index c9abbf3..ebb85e9 100644
>>>>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>>>>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>>>>> @@ -52,6 +52,10 @@ the PCIe specification.
>>>>>                          devicetree/bindings/interrupt-controller/msi.txt
>>>>>                        for a description of the msi-parent property.
>>>>>  
>>>>> +- tlbi-on-map       : invalidate caches whenever there is an update of
>>>>> +                      any remapping structure (updates to not-present or
>>>>> +                      present entries).
>>>>> +
>>>>
>>>> My position on this hasn't changed, so NAK for this patch. If you want to
>>>> emulate something outside of the SMMUv3 architecture, please do so, but
>>>> don't pretend that it's an SMMUv3.
>>>>
>>>> Will
>>>
>>> What if the emulated device does not list arm,smmu-v3, listing
>>> qemu,ssmu-v3 as compatible? Would that address the concern?
>>
>> Will, can you comment on this please? Are you open to reusing the code
>> in drivers/iommu/arm-smmu-v3.c to support a paravirtual device that does
>> not claim to be compatible with smmuv3 but does try to behave very close to
>> it except it can cache non-present structures? Or would you rather
>> the code to support this is forked to qemu-smmu-v3.c?
> 
> I still don't understand why this is preferable to a PV IOMMU
> implementation. Not only is this proposing to issue TLB maintenance on
> map, but the maintenance command itself is entirely made up. Why not just
> have a map command? Anyway, I'm reluctant to add this hack to the driver until:
> 
>   1. There is a compelling reason to pursue this approach instead of a
>      PV approach (including performance measurements).
> 
>   2. There is a specification for the QEMU fork of the ARM SMMUv3
>      architecture, including the semantics of the new command being proposed
>      and what exactly the TLB maintenance requirements are on map (for
>      example, what if I change an STE or a CD -- are they cached too?).
I am not sure I catch this last point. At the moment whenever the smmuv3
driver issues data structure invalidation commands (CMD_CFGI_*), those
are trapped and I replay the mappings on host side. I have not changed
anything on that side.

I introduced a new map implementation defined command because the per
page CMD_TLBI_NH_VA IOVA invalidation command was not efficient/usable
with use cases such as DPDK on guest. I understood the spec provisions
for such implementation defined commands.
> 
>   3. The ACPI IORT spec is updated to recognise this implementation
> 
>   4. There is an implementation that can use the guest page tables directly,
>      because that may well make all of this moot.
Most probably I will come back to you with questions on stage 1 + stage2
enablement and "4.8 Virtualisation" chapter of smmuv3 spec. Besides I
also need to get access to some HW with smmuv3 ;-)

Thanks

Eric
> 
> Forking the driver doesn't sound very sensible to me.
> 
> Will
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-23 10:25         ` Will Deacon
  2017-08-23 12:36           ` Auger Eric
@ 2017-08-23 14:05           ` Michael S. Tsirkin
  1 sibling, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-08-23 14:05 UTC (permalink / raw)
  To: Will Deacon
  Cc: Eric Auger, eric.auger.pro, iommu, linux-kernel, robin.murphy,
	Jean-Philippe.Brucker, christoffer.dall, Marc.Zyngier,
	alex.williamson, peterx, tn, bharat.bhushan

On Wed, Aug 23, 2017 at 11:25:17AM +0100, Will Deacon wrote:
> On Tue, Aug 22, 2017 at 10:09:15PM +0300, Michael S. Tsirkin wrote:
> > On Fri, Aug 18, 2017 at 05:49:42AM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Aug 17, 2017 at 05:34:25PM +0100, Will Deacon wrote:
> > > > On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
> > > > > When running a virtual SMMU on a guest we sometimes need to trap
> > > > > all changes to the translation structures. This is especially useful
> > > > > to integrate with VFIO. This patch adds a new option that forces
> > > > > the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
> > > > > 
> > > > > TLBI commands then can be trapped.
> > > > > 
> > > > > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > > > > 
> > > > > ---
> > > > > v1 -> v2:
> > > > > - rebase on v4.13-rc2
> > > > > ---
> > > > >  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
> > > > >  drivers/iommu/arm-smmu-v3.c                             | 5 +++++
> > > > >  2 files changed, 9 insertions(+)
> > > > > 
> > > > > diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > > > > index c9abbf3..ebb85e9 100644
> > > > > --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > > > > +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > > > > @@ -52,6 +52,10 @@ the PCIe specification.
> > > > >                          devicetree/bindings/interrupt-controller/msi.txt
> > > > >                        for a description of the msi-parent property.
> > > > >  
> > > > > +- tlbi-on-map       : invalidate caches whenever there is an update of
> > > > > +                      any remapping structure (updates to not-present or
> > > > > +                      present entries).
> > > > > +
> > > > 
> > > > My position on this hasn't changed, so NAK for this patch. If you want to
> > > > emulate something outside of the SMMUv3 architecture, please do so, but
> > > > don't pretend that it's an SMMUv3.
> > > > 
> > > > Will
> > > 
> > > What if the emulated device does not list arm,smmu-v3, listing
> > > qemu,ssmu-v3 as compatible? Would that address the concern?
> > 
> > Will, can you comment on this please? Are you open to reusing the code
> > in drivers/iommu/arm-smmu-v3.c to support a paravirtual device that does
> > not claim to be compatible with smmuv3 but does try to behave very close to
> > it except it can cache non-present structures? Or would you rather
> > the code to support this is forked to qemu-smmu-v3.c?
> 
> I still don't understand why this is preferable to a PV IOMMU
> implementation.

It has advantages and disadvantages as everything. To list some
advantages:

- Because this reuses all of the code we need for emulating SMMU anyway.
Just look at size of the patches and compare to virtio iommu patches.

- I think this is a reasonable stepping stone for using nested support in
host SMMU which is obviously faster as you don't need to send mappings
to host. We can get guest and QEMU working, then work on support
using guest page tables directly.

- With virtio IOMMU you will never be able to switch to using
guest page tables directly without upheaving to host/guest
interfaces.

> Not only is this proposing to issue TLB maintenance on
> map, but the maintenance command itself is entirely made up. Why not just
> have a map command? Anyway, I'm reluctant to add this hack to the driver until:
> 
>   1. There is a compelling reason to pursue this approach instead of a
>      PV approach (including performance measurements).

In fact the question does not make a lot of sense.  This interface is PV
right there. But this PV is waaay less code since we need the emulation
anyway. I don't even need to look at performance to see a compelling
reason on the QEMU side.  So it's a question of reduced maintainance
host side.

>   2. There is a specification for the QEMU fork of the ARM SMMUv3
>      architecture, including the semantics of the new command being proposed
>      and what exactly the TLB maintenance requirements are on map (for
>      example, what if I change an STE or a CD -- are they cached too?).

Makes sense.

>   3. The ACPI IORT spec is updated to recognise this implementation

I don't think we have to gate on this. IORT is ARM spec for ARM
hardware.  This should be a device specific quirk only triggering for
the QEMU (non-ARM) implementation (which in this patchset it isn't, and
this is something to fix IMO).

>   4. There is an implementation that can use the guest page tables directly,
>      because that may well make all of this moot.

That will depend on the specific host capability. So this is actually
another part of the motivation here. Guest / host interface will be very
similar with this and with using guest page tables directly. So most of
the same code gets to run with and without, good for testing, coverage etc.

But I agree this item should at least be on the roadmap,
I'm somewhat concerned it isn't. In fact the same applies to PV IOMMU.


> Forking the driver doesn't sound very sensible to me.
> 
> Will

There's still time as patches on qemu side are RFC

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-23 12:36           ` Auger Eric
@ 2017-08-23 14:12             ` Michael S. Tsirkin
  2017-08-23 16:42             ` Will Deacon
  1 sibling, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-08-23 14:12 UTC (permalink / raw)
  To: Auger Eric
  Cc: Will Deacon, eric.auger.pro, iommu, linux-kernel, robin.murphy,
	Jean-Philippe.Brucker, christoffer.dall, Marc.Zyngier,
	alex.williamson, peterx, tn, bharat.bhushan

On Wed, Aug 23, 2017 at 02:36:53PM +0200, Auger Eric wrote:
> Hi Will,
> 
> On 23/08/2017 12:25, Will Deacon wrote:
> > On Tue, Aug 22, 2017 at 10:09:15PM +0300, Michael S. Tsirkin wrote:
> >> On Fri, Aug 18, 2017 at 05:49:42AM +0300, Michael S. Tsirkin wrote:
> >>> On Thu, Aug 17, 2017 at 05:34:25PM +0100, Will Deacon wrote:
> >>>> On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
> >>>>> When running a virtual SMMU on a guest we sometimes need to trap
> >>>>> all changes to the translation structures. This is especially useful
> >>>>> to integrate with VFIO. This patch adds a new option that forces
> >>>>> the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
> >>>>>
> >>>>> TLBI commands then can be trapped.
> >>>>>
> >>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>>>
> >>>>> ---
> >>>>> v1 -> v2:
> >>>>> - rebase on v4.13-rc2
> >>>>> ---
> >>>>>  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
> >>>>>  drivers/iommu/arm-smmu-v3.c                             | 5 +++++
> >>>>>  2 files changed, 9 insertions(+)
> >>>>>
> >>>>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> >>>>> index c9abbf3..ebb85e9 100644
> >>>>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> >>>>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> >>>>> @@ -52,6 +52,10 @@ the PCIe specification.
> >>>>>                          devicetree/bindings/interrupt-controller/msi.txt
> >>>>>                        for a description of the msi-parent property.
> >>>>>  
> >>>>> +- tlbi-on-map       : invalidate caches whenever there is an update of
> >>>>> +                      any remapping structure (updates to not-present or
> >>>>> +                      present entries).
> >>>>> +
> >>>>
> >>>> My position on this hasn't changed, so NAK for this patch. If you want to
> >>>> emulate something outside of the SMMUv3 architecture, please do so, but
> >>>> don't pretend that it's an SMMUv3.
> >>>>
> >>>> Will
> >>>
> >>> What if the emulated device does not list arm,smmu-v3, listing
> >>> qemu,ssmu-v3 as compatible? Would that address the concern?
> >>
> >> Will, can you comment on this please? Are you open to reusing the code
> >> in drivers/iommu/arm-smmu-v3.c to support a paravirtual device that does
> >> not claim to be compatible with smmuv3 but does try to behave very close to
> >> it except it can cache non-present structures? Or would you rather
> >> the code to support this is forked to qemu-smmu-v3.c?
> > 
> > I still don't understand why this is preferable to a PV IOMMU
> > implementation. Not only is this proposing to issue TLB maintenance on
> > map, but the maintenance command itself is entirely made up. Why not just
> > have a map command? Anyway, I'm reluctant to add this hack to the driver until:
> > 
> >   1. There is a compelling reason to pursue this approach instead of a
> >      PV approach (including performance measurements).
> > 
> >   2. There is a specification for the QEMU fork of the ARM SMMUv3
> >      architecture, including the semantics of the new command being proposed
> >      and what exactly the TLB maintenance requirements are on map (for
> >      example, what if I change an STE or a CD -- are they cached too?).
> I am not sure I catch this last point. At the moment whenever the smmuv3
> driver issues data structure invalidation commands (CMD_CFGI_*), those
> are trapped and I replay the mappings on host side. I have not changed
> anything on that side.

Right but you do need to include a text document with QEMU patches that
details how it differs from SMMUv3. That should explain what
exactly does QEMU cache that real hardware does not,
without relying on current driver behaviour.

> I introduced a new map implementation defined command because the per
> page CMD_TLBI_NH_VA IOVA invalidation command was not efficient/usable
> with use cases such as DPDK on guest. I understood the spec provisions
> for such implementation defined commands.

I think it makes sense to add code to limit these command to the QEMU
implementation only. When not implementing SMMUv3 spec verbatim
QEMU will not pretend it's compatible. This way future hardware
will not conflict if it does this.

> > 
> >   3. The ACPI IORT spec is updated to recognise this implementation
> > 
> >   4. There is an implementation that can use the guest page tables directly,
> >      because that may well make all of this moot.
> Most probably I will come back to you with questions on stage 1 + stage2
> enablement and "4.8 Virtualisation" chapter of smmuv3 spec. Besides I
> also need to get access to some HW with smmuv3 ;-)
> 
> Thanks
> 
> Eric

So support on older hosts is a valid motivation too, you want
to include the motivation with the patches.


> > 
> > Forking the driver doesn't sound very sensible to me.
> > 
> > Will
> > 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-23 12:36           ` Auger Eric
  2017-08-23 14:12             ` Michael S. Tsirkin
@ 2017-08-23 16:42             ` Will Deacon
  2017-08-23 19:43               ` Michael S. Tsirkin
                                 ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Will Deacon @ 2017-08-23 16:42 UTC (permalink / raw)
  To: Auger Eric
  Cc: Michael S. Tsirkin, eric.auger.pro, iommu, linux-kernel,
	robin.murphy, Jean-Philippe.Brucker, christoffer.dall,
	Marc.Zyngier, alex.williamson, peterx, tn, bharat.bhushan

Hi Eric,

On Wed, Aug 23, 2017 at 02:36:53PM +0200, Auger Eric wrote:
> On 23/08/2017 12:25, Will Deacon wrote:
> > On Tue, Aug 22, 2017 at 10:09:15PM +0300, Michael S. Tsirkin wrote:
> >> On Fri, Aug 18, 2017 at 05:49:42AM +0300, Michael S. Tsirkin wrote:
> >>> On Thu, Aug 17, 2017 at 05:34:25PM +0100, Will Deacon wrote:
> >>>> On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
> >>>>> When running a virtual SMMU on a guest we sometimes need to trap
> >>>>> all changes to the translation structures. This is especially useful
> >>>>> to integrate with VFIO. This patch adds a new option that forces
> >>>>> the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
> >>>>>
> >>>>> TLBI commands then can be trapped.
> >>>>>
> >>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>>>
> >>>>> ---
> >>>>> v1 -> v2:
> >>>>> - rebase on v4.13-rc2
> >>>>> ---
> >>>>>  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
> >>>>>  drivers/iommu/arm-smmu-v3.c                             | 5 +++++
> >>>>>  2 files changed, 9 insertions(+)
> >>>>>
> >>>>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> >>>>> index c9abbf3..ebb85e9 100644
> >>>>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> >>>>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> >>>>> @@ -52,6 +52,10 @@ the PCIe specification.
> >>>>>                          devicetree/bindings/interrupt-controller/msi.txt
> >>>>>                        for a description of the msi-parent property.
> >>>>>  
> >>>>> +- tlbi-on-map       : invalidate caches whenever there is an update of
> >>>>> +                      any remapping structure (updates to not-present or
> >>>>> +                      present entries).
> >>>>> +
> >>>>
> >>>> My position on this hasn't changed, so NAK for this patch. If you want to
> >>>> emulate something outside of the SMMUv3 architecture, please do so, but
> >>>> don't pretend that it's an SMMUv3.
> >>>>
> >>>> Will
> >>>
> >>> What if the emulated device does not list arm,smmu-v3, listing
> >>> qemu,ssmu-v3 as compatible? Would that address the concern?
> >>
> >> Will, can you comment on this please? Are you open to reusing the code
> >> in drivers/iommu/arm-smmu-v3.c to support a paravirtual device that does
> >> not claim to be compatible with smmuv3 but does try to behave very close to
> >> it except it can cache non-present structures? Or would you rather
> >> the code to support this is forked to qemu-smmu-v3.c?
> > 
> > I still don't understand why this is preferable to a PV IOMMU
> > implementation. Not only is this proposing to issue TLB maintenance on
> > map, but the maintenance command itself is entirely made up. Why not just
> > have a map command? Anyway, I'm reluctant to add this hack to the driver until:
> > 
> >   1. There is a compelling reason to pursue this approach instead of a
> >      PV approach (including performance measurements).
> > 
> >   2. There is a specification for the QEMU fork of the ARM SMMUv3
> >      architecture, including the semantics of the new command being proposed
> >      and what exactly the TLB maintenance requirements are on map (for
> >      example, what if I change an STE or a CD -- are they cached too?).
> I am not sure I catch this last point. At the moment whenever the smmuv3
> driver issues data structure invalidation commands (CMD_CFGI_*), those
> are trapped and I replay the mappings on host side. I have not changed
> anything on that side.

But STEs and CDs have very similar rules to TLBs: you don't need to issue
invalidation if the data structure is transitioning from invalid to valid.
If you're caching those in QEMU, how do you keep them up-to-date? I can
also guarantee you that there will be additional data structures added
in future versions of the architecture, so you'll need to consider how
you want to operate when running on newer hardware.

> I introduced a new map implementation defined command because the per
> page CMD_TLBI_NH_VA IOVA invalidation command was not efficient/usable
> with use cases such as DPDK on guest. I understood the spec provisions
> for such implementation defined commands.

Whilst there is a space for IMP DEF commands, this doesn't generally mean
that they can be repurposed by software. What if the underlying hardware
has an IMP DEF command that you want to export? Besides, my main points
here are that your command isn't well-specified and if you have to add
a command, why not just add a "map" command (i.e. implement a PV interface
instead)?

> >   3. The ACPI IORT spec is updated to recognise this implementation
> > 
> >   4. There is an implementation that can use the guest page tables directly,
> >      because that may well make all of this moot.
> Most probably I will come back to you with questions on stage 1 + stage2
> enablement and "4.8 Virtualisation" chapter of smmuv3 spec. Besides I
> also need to get access to some HW with smmuv3 ;-)

Ok.

Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-23 16:42             ` Will Deacon
@ 2017-08-23 19:43               ` Michael S. Tsirkin
  2017-08-24  7:09               ` Auger Eric
  2017-10-05 15:14               ` Auger Eric
  2 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-08-23 19:43 UTC (permalink / raw)
  To: Will Deacon
  Cc: Auger Eric, eric.auger.pro, iommu, linux-kernel, robin.murphy,
	Jean-Philippe.Brucker, christoffer.dall, Marc.Zyngier,
	alex.williamson, peterx, tn, bharat.bhushan

On Wed, Aug 23, 2017 at 05:42:55PM +0100, Will Deacon wrote:
> Hi Eric,
> 
> On Wed, Aug 23, 2017 at 02:36:53PM +0200, Auger Eric wrote:
> > On 23/08/2017 12:25, Will Deacon wrote:
> > > On Tue, Aug 22, 2017 at 10:09:15PM +0300, Michael S. Tsirkin wrote:
> > >> On Fri, Aug 18, 2017 at 05:49:42AM +0300, Michael S. Tsirkin wrote:
> > >>> On Thu, Aug 17, 2017 at 05:34:25PM +0100, Will Deacon wrote:
> > >>>> On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
> > >>>>> When running a virtual SMMU on a guest we sometimes need to trap
> > >>>>> all changes to the translation structures. This is especially useful
> > >>>>> to integrate with VFIO. This patch adds a new option that forces
> > >>>>> the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
> > >>>>>
> > >>>>> TLBI commands then can be trapped.
> > >>>>>
> > >>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > >>>>>
> > >>>>> ---
> > >>>>> v1 -> v2:
> > >>>>> - rebase on v4.13-rc2
> > >>>>> ---
> > >>>>>  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
> > >>>>>  drivers/iommu/arm-smmu-v3.c                             | 5 +++++
> > >>>>>  2 files changed, 9 insertions(+)
> > >>>>>
> > >>>>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > >>>>> index c9abbf3..ebb85e9 100644
> > >>>>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > >>>>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
> > >>>>> @@ -52,6 +52,10 @@ the PCIe specification.
> > >>>>>                          devicetree/bindings/interrupt-controller/msi.txt
> > >>>>>                        for a description of the msi-parent property.
> > >>>>>  
> > >>>>> +- tlbi-on-map       : invalidate caches whenever there is an update of
> > >>>>> +                      any remapping structure (updates to not-present or
> > >>>>> +                      present entries).
> > >>>>> +
> > >>>>
> > >>>> My position on this hasn't changed, so NAK for this patch. If you want to
> > >>>> emulate something outside of the SMMUv3 architecture, please do so, but
> > >>>> don't pretend that it's an SMMUv3.
> > >>>>
> > >>>> Will
> > >>>
> > >>> What if the emulated device does not list arm,smmu-v3, listing
> > >>> qemu,ssmu-v3 as compatible? Would that address the concern?
> > >>
> > >> Will, can you comment on this please? Are you open to reusing the code
> > >> in drivers/iommu/arm-smmu-v3.c to support a paravirtual device that does
> > >> not claim to be compatible with smmuv3 but does try to behave very close to
> > >> it except it can cache non-present structures? Or would you rather
> > >> the code to support this is forked to qemu-smmu-v3.c?
> > > 
> > > I still don't understand why this is preferable to a PV IOMMU
> > > implementation. Not only is this proposing to issue TLB maintenance on
> > > map, but the maintenance command itself is entirely made up. Why not just
> > > have a map command? Anyway, I'm reluctant to add this hack to the driver until:
> > > 
> > >   1. There is a compelling reason to pursue this approach instead of a
> > >      PV approach (including performance measurements).
> > > 
> > >   2. There is a specification for the QEMU fork of the ARM SMMUv3
> > >      architecture, including the semantics of the new command being proposed
> > >      and what exactly the TLB maintenance requirements are on map (for
> > >      example, what if I change an STE or a CD -- are they cached too?).
> > I am not sure I catch this last point. At the moment whenever the smmuv3
> > driver issues data structure invalidation commands (CMD_CFGI_*), those
> > are trapped and I replay the mappings on host side. I have not changed
> > anything on that side.
> 
> But STEs and CDs have very similar rules to TLBs: you don't need to issue
> invalidation if the data structure is transitioning from invalid to valid.
> If you're caching those in QEMU, how do you keep them up-to-date? I can
> also guarantee you that there will be additional data structures added
> in future versions of the architecture, so you'll need to consider how
> you want to operate when running on newer hardware.
>
> > I introduced a new map implementation defined command because the per
> > page CMD_TLBI_NH_VA IOVA invalidation command was not efficient/usable
> > with use cases such as DPDK on guest. I understood the spec provisions
> > for such implementation defined commands.
> 
> Whilst there is a space for IMP DEF commands, this doesn't generally mean
> that they can be repurposed by software. What if the underlying hardware
> has an IMP DEF command that you want to export?

The code needs to change to only issue these commands for the QEMU SMMU,
not the arm one.  Then if necessary we will be able to add some kind of
handshake with guest and transition to a different command for newer
guests.

> Besides, my main points
> here are that your command isn't well-specified and if you have to add
> a command, why not just add a "map" command (i.e. implement a PV interface
> instead)?

Yes it's PV anyway, just do what's easier to code up and support.


> > >   3. The ACPI IORT spec is updated to recognise this implementation
> > > 
> > >   4. There is an implementation that can use the guest page tables directly,
> > >      because that may well make all of this moot.
> > Most probably I will come back to you with questions on stage 1 + stage2
> > enablement and "4.8 Virtualisation" chapter of smmuv3 spec. Besides I
> > also need to get access to some HW with smmuv3 ;-)
> 
> Ok.
> 
> Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-23 16:42             ` Will Deacon
  2017-08-23 19:43               ` Michael S. Tsirkin
@ 2017-08-24  7:09               ` Auger Eric
  2017-10-05 15:14               ` Auger Eric
  2 siblings, 0 replies; 19+ messages in thread
From: Auger Eric @ 2017-08-24  7:09 UTC (permalink / raw)
  To: Will Deacon
  Cc: Michael S. Tsirkin, eric.auger.pro, iommu, linux-kernel,
	robin.murphy, Jean-Philippe.Brucker, christoffer.dall,
	Marc.Zyngier, alex.williamson, peterx, tn, bharat.bhushan

Hi Will,

On 23/08/2017 18:42, Will Deacon wrote:
> Hi Eric,
> 
> On Wed, Aug 23, 2017 at 02:36:53PM +0200, Auger Eric wrote:
>> On 23/08/2017 12:25, Will Deacon wrote:
>>> On Tue, Aug 22, 2017 at 10:09:15PM +0300, Michael S. Tsirkin wrote:
>>>> On Fri, Aug 18, 2017 at 05:49:42AM +0300, Michael S. Tsirkin wrote:
>>>>> On Thu, Aug 17, 2017 at 05:34:25PM +0100, Will Deacon wrote:
>>>>>> On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
>>>>>>> When running a virtual SMMU on a guest we sometimes need to trap
>>>>>>> all changes to the translation structures. This is especially useful
>>>>>>> to integrate with VFIO. This patch adds a new option that forces
>>>>>>> the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
>>>>>>>
>>>>>>> TLBI commands then can be trapped.
>>>>>>>
>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>>
>>>>>>> ---
>>>>>>> v1 -> v2:
>>>>>>> - rebase on v4.13-rc2
>>>>>>> ---
>>>>>>>  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
>>>>>>>  drivers/iommu/arm-smmu-v3.c                             | 5 +++++
>>>>>>>  2 files changed, 9 insertions(+)
>>>>>>>
>>>>>>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>>>>>>> index c9abbf3..ebb85e9 100644
>>>>>>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>>>>>>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>>>>>>> @@ -52,6 +52,10 @@ the PCIe specification.
>>>>>>>                          devicetree/bindings/interrupt-controller/msi.txt
>>>>>>>                        for a description of the msi-parent property.
>>>>>>>  
>>>>>>> +- tlbi-on-map       : invalidate caches whenever there is an update of
>>>>>>> +                      any remapping structure (updates to not-present or
>>>>>>> +                      present entries).
>>>>>>> +
>>>>>>
>>>>>> My position on this hasn't changed, so NAK for this patch. If you want to
>>>>>> emulate something outside of the SMMUv3 architecture, please do so, but
>>>>>> don't pretend that it's an SMMUv3.
>>>>>>
>>>>>> Will
>>>>>
>>>>> What if the emulated device does not list arm,smmu-v3, listing
>>>>> qemu,ssmu-v3 as compatible? Would that address the concern?
>>>>
>>>> Will, can you comment on this please? Are you open to reusing the code
>>>> in drivers/iommu/arm-smmu-v3.c to support a paravirtual device that does
>>>> not claim to be compatible with smmuv3 but does try to behave very close to
>>>> it except it can cache non-present structures? Or would you rather
>>>> the code to support this is forked to qemu-smmu-v3.c?
>>>
>>> I still don't understand why this is preferable to a PV IOMMU
>>> implementation. Not only is this proposing to issue TLB maintenance on
>>> map, but the maintenance command itself is entirely made up. Why not just
>>> have a map command? Anyway, I'm reluctant to add this hack to the driver until:
>>>
>>>   1. There is a compelling reason to pursue this approach instead of a
>>>      PV approach (including performance measurements).
>>>
>>>   2. There is a specification for the QEMU fork of the ARM SMMUv3
>>>      architecture, including the semantics of the new command being proposed
>>>      and what exactly the TLB maintenance requirements are on map (for
>>>      example, what if I change an STE or a CD -- are they cached too?).
>> I am not sure I catch this last point. At the moment whenever the smmuv3
>> driver issues data structure invalidation commands (CMD_CFGI_*), those
>> are trapped and I replay the mappings on host side. I have not changed
>> anything on that side.
> 
> But STEs and CDs have very similar rules to TLBs: you don't need to issue
> invalidation if the data structure is transitioning from invalid to valid.
> If you're caching those in QEMU, how do you keep them up-to-date? I can
> also guarantee you that there will be additional data structures added
> in future versions of the architecture, so you'll need to consider how
> you want to operate when running on newer hardware.
OK I get your point now. I thought the driver was currently sending
CMD_CFGI_STE on each config change and I did not notice any issue yet
with the various use cases. So that should be part of the FW quirk to
guarantee that each time this happens, commands are sent for QEMU to trap.

This FW quirk was devised to do exactly the same as the Intel vtd
caching mode. Technically I guess Intel driver implements the exact same
hooks (but they did to comply with the spec, I agree).

Thanks

Eric
> 
>> I introduced a new map implementation defined command because the per
>> page CMD_TLBI_NH_VA IOVA invalidation command was not efficient/usable
>> with use cases such as DPDK on guest. I understood the spec provisions
>> for such implementation defined commands.
> 
> Whilst there is a space for IMP DEF commands, this doesn't generally mean
> that they can be repurposed by software. What if the underlying hardware
> has an IMP DEF command that you want to export? Besides, my main points
> here are that your command isn't well-specified and if you have to add
> a command, why not just add a "map" command (i.e. implement a PV interface
> instead)?
> 
>>>   3. The ACPI IORT spec is updated to recognise this implementation
>>>
>>>   4. There is an implementation that can use the guest page tables directly,
>>>      because that may well make all of this moot.
>> Most probably I will come back to you with questions on stage 1 + stage2
>> enablement and "4.8 Virtualisation" chapter of smmuv3 spec. Besides I
>> also need to get access to some HW with smmuv3 ;-)
> 
> Ok.
> 
> Will
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option
  2017-08-23 16:42             ` Will Deacon
  2017-08-23 19:43               ` Michael S. Tsirkin
  2017-08-24  7:09               ` Auger Eric
@ 2017-10-05 15:14               ` Auger Eric
  2 siblings, 0 replies; 19+ messages in thread
From: Auger Eric @ 2017-10-05 15:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: Michael S. Tsirkin, eric.auger.pro, iommu, linux-kernel,
	robin.murphy, Jean-Philippe.Brucker, christoffer.dall,
	Marc.Zyngier, alex.williamson, peterx, tn, bharat.bhushan

Hi Will,

On 23/08/2017 18:42, Will Deacon wrote:
> Hi Eric,
> 
> On Wed, Aug 23, 2017 at 02:36:53PM +0200, Auger Eric wrote:
>> On 23/08/2017 12:25, Will Deacon wrote:
>>> On Tue, Aug 22, 2017 at 10:09:15PM +0300, Michael S. Tsirkin wrote:
>>>> On Fri, Aug 18, 2017 at 05:49:42AM +0300, Michael S. Tsirkin wrote:
>>>>> On Thu, Aug 17, 2017 at 05:34:25PM +0100, Will Deacon wrote:
>>>>>> On Fri, Aug 11, 2017 at 03:45:28PM +0200, Eric Auger wrote:
>>>>>>> When running a virtual SMMU on a guest we sometimes need to trap
>>>>>>> all changes to the translation structures. This is especially useful
>>>>>>> to integrate with VFIO. This patch adds a new option that forces
>>>>>>> the IO_PGTABLE_QUIRK_TLBI_ON_MAP to be applied on LPAE page tables.
>>>>>>>
>>>>>>> TLBI commands then can be trapped.
>>>>>>>
>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>>
>>>>>>> ---
>>>>>>> v1 -> v2:
>>>>>>> - rebase on v4.13-rc2
>>>>>>> ---
>>>>>>>  Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt | 4 ++++
>>>>>>>  drivers/iommu/arm-smmu-v3.c                             | 5 +++++
>>>>>>>  2 files changed, 9 insertions(+)
>>>>>>>
>>>>>>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>>>>>>> index c9abbf3..ebb85e9 100644
>>>>>>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>>>>>>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>>>>>>> @@ -52,6 +52,10 @@ the PCIe specification.
>>>>>>>                          devicetree/bindings/interrupt-controller/msi.txt
>>>>>>>                        for a description of the msi-parent property.
>>>>>>>  
>>>>>>> +- tlbi-on-map       : invalidate caches whenever there is an update of
>>>>>>> +                      any remapping structure (updates to not-present or
>>>>>>> +                      present entries).
>>>>>>> +
>>>>>>
>>>>>> My position on this hasn't changed, so NAK for this patch. If you want to
>>>>>> emulate something outside of the SMMUv3 architecture, please do so, but
>>>>>> don't pretend that it's an SMMUv3.
>>>>>>
>>>>>> Will
>>>>>
>>>>> What if the emulated device does not list arm,smmu-v3, listing
>>>>> qemu,ssmu-v3 as compatible? Would that address the concern?
>>>>
>>>> Will, can you comment on this please? Are you open to reusing the code
>>>> in drivers/iommu/arm-smmu-v3.c to support a paravirtual device that does
>>>> not claim to be compatible with smmuv3 but does try to behave very close to
>>>> it except it can cache non-present structures? Or would you rather
>>>> the code to support this is forked to qemu-smmu-v3.c?
>>>
>>> I still don't understand why this is preferable to a PV IOMMU
>>> implementation. Not only is this proposing to issue TLB maintenance on
>>> map, but the maintenance command itself is entirely made up. Why not just
>>> have a map command? Anyway, I'm reluctant to add this hack to the driver until:
>>>
>>>   1. There is a compelling reason to pursue this approach instead of a
>>>      PV approach (including performance measurements).
>>>
>>>   2. There is a specification for the QEMU fork of the ARM SMMUv3
>>>      architecture, including the semantics of the new command being proposed
>>>      and what exactly the TLB maintenance requirements are on map (for
>>>      example, what if I change an STE or a CD -- are they cached too?).
>> I am not sure I catch this last point. At the moment whenever the smmuv3
>> driver issues data structure invalidation commands (CMD_CFGI_*), those
>> are trapped and I replay the mappings on host side. I have not changed
>> anything on that side.
> 
> But STEs and CDs have very similar rules to TLBs: you don't need to issue
> invalidation if the data structure is transitioning from invalid to valid.

While looking at chapter "4.8 virtualisation" of the smmuv3 spec, I
understand that if we were to use the 2 stages we would need to trap on
STE updates since they are owned by the hyp.

Spec says "updates to a guest STE are accompanied by a CMD_CFGI_STE (or
similar) issued from the guest. So I understand invalidation of CDs are
not mandated by the spec but invalidation of STEs if the data structure
is transitioning from invalid to valid would be requested. Is that
correct? I fail to understand if this is currently done by the smmuv3
driver though.


> If you're caching those in QEMU, how do you keep them up-to-date? I can
> also guarantee you that there will be additional data structures added
> in future versions of the architecture, so you'll need to consider how
> you want to operate when running on newer hardware.
> 
>> I introduced a new map implementation defined command because the per
>> page CMD_TLBI_NH_VA IOVA invalidation command was not efficient/usable
>> with use cases such as DPDK on guest. I understood the spec provisions
>> for such implementation defined commands.

Also if we were to use dual stage, command queue accesses still would be
trapped. So if a guest invalidates a hugepage it would send a storm of
granule sized invalidations and each would be trapped. So maybe it does
not happen often but I guess it would be pretty inefficient.

On intel I understand the IOTLB Invalidation Descriptor has an AM field
(address-mask) which specifies the number of contiguous second level
pages that needs to be invalidated. When invalidating a large-page
transaction, the driver can use the appropriate mask value (0 for 4KB, 9
for 2MB, 18 for 1GB).

Thanks

Eric
> 
> Whilst there is a space for IMP DEF commands, this doesn't generally mean
> that they can be repurposed by software. What if the underlying hardware
> has an IMP DEF command that you want to export? Besides, my main points
> here are that your command isn't well-specified and if you have to add
> a command, why not just add a "map" command (i.e. implement a PV interface
> instead)?
> 
>>>   3. The ACPI IORT spec is updated to recognise this implementation
>>>
>>>   4. There is an implementation that can use the guest page tables directly,
>>>      because that may well make all of this moot.
>> Most probably I will come back to you with questions on stage 1 + stage2
>> enablement and "4.8 Virtualisation" chapter of smmuv3 spec. Besides I
>> also need to get access to some HW with smmuv3 ;-)
> 
> Ok.
> 
> Will
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-10-05 15:15 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-11 13:45 [RFC v2 0/4] arm-smmu-v3 tlbi-on-map option Eric Auger
2017-08-11 13:45 ` [RFC v2 1/4] iommu/io-pgtable-arm: flush TLBs when IO_PGTABLE_QUIRK_TLBI_ON_MAP Eric Auger
2017-08-11 13:45 ` [RFC v2 2/4] iommu/arm-smmu-v3: Add tlbi_on_map option Eric Auger
2017-08-17 16:34   ` Will Deacon
2017-08-17 17:47     ` Auger Eric
2017-08-18  2:54       ` Michael S. Tsirkin
2017-08-18  6:50         ` Auger Eric
2017-08-18  2:49     ` Michael S. Tsirkin
2017-08-22 19:09       ` Michael S. Tsirkin
2017-08-23 10:25         ` Will Deacon
2017-08-23 12:36           ` Auger Eric
2017-08-23 14:12             ` Michael S. Tsirkin
2017-08-23 16:42             ` Will Deacon
2017-08-23 19:43               ` Michael S. Tsirkin
2017-08-24  7:09               ` Auger Eric
2017-10-05 15:14               ` Auger Eric
2017-08-23 14:05           ` Michael S. Tsirkin
2017-08-11 13:45 ` [RFC v2 3/4] iommu/arm-smmu-v3: Add hypothetical caching mode model Eric Auger
2017-08-11 13:45 ` [RFC v2 4/4] iommu/arm-smmu-v3: add CMD_TLBI_NH_VA_AM command for iova range invalidation Eric Auger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).