iommu.lists.linux-foundation.org archive mirror
* [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode
@ 2024-04-25 14:41 Tanmay Jagdale
  2024-04-25 14:41 ` [PATCH V3 1/2] " Tanmay Jagdale
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Tanmay Jagdale @ 2024-04-25 14:41 UTC (permalink / raw)
  To: will, robin.murphy, joro, nicolinc, mshavit, baolu.lu,
	thunder.leizhen, set_pte_at, smostafa
  Cc: sgoutham, gcherian, jcm, linux-arm-kernel, iommu, linux-kernel,
	Tanmay Jagdale

Resending the patches by Zhen Lei <thunder.leizhen@huawei.com> that add
support for the SMMU ECMDQ feature.

Tested this feature on a Marvell SoC by implementing an smmu-test driver.
This test driver spawns a thread per CPU, and each thread keeps sending
map, table-walk and unmap requests for a fixed duration.
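
(The smmu-test driver itself was not posted. Below is a minimal sketch of
what each per-CPU stress thread might look like, using only standard IOMMU
API calls; the function name and the TEST_* constants are hypothetical.)

  /* Hypothetical per-CPU stress loop; iommu_domain setup omitted. */
  static int smmu_test_thread(void *data)
  {
      struct iommu_domain *domain = data;
      unsigned long iova = TEST_IOVA_BASE;  /* hypothetical constant */
      phys_addr_t paddr = TEST_PHYS_BASE;   /* hypothetical constant */

      while (!kthread_should_stop()) {
          /* map: install a translation */
          if (iommu_map(domain, iova, paddr, SZ_4K,
                        IOMMU_READ | IOMMU_WRITE, GFP_KERNEL))
              break;
          /* table-walk: walk the page table back to the physical address */
          WARN_ON(iommu_iova_to_phys(domain, iova) != paddr);
          /* unmap: queues TLB invalidation commands on the CMDQ/ECMDQ */
          iommu_unmap(domain, iova, SZ_4K);
      }
      return 0;
  }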

Using this test driver, we compared ECMDQ against the regular CMDQ with
software command batching and saw a ~5% improvement with ECMDQ. The
performance numbers are below:

                   Total Requests  Average Requests  Difference
                                      Per CPU         wrt ECMDQ
-----------------------------------------------------------------
ECMDQ                 239286381       2991079
CMDQ Batch Size 1     228232187       2852902         -4.62%
CMDQ Batch Size 32    233465784       2918322         -2.43%
CMDQ Batch Size 64    231679588       2895994         -3.18%
CMDQ Batch Size 128   233189030       2914862         -2.55%
CMDQ Batch Size 256   230965773       2887072         -3.48%


v2 --> v3:
1. Rebased on linux 6.9-rc5

v1 --> v2:
1. Drop patch "iommu/arm-smmu-v3: Add arm_smmu_ecmdq_issue_cmdlist() for
non-shared ECMDQ" in v1
2. Drop patch "iommu/arm-smmu-v3: Add support for less than one ECMDQ
per core" in v1
3. Replace rwlock with IPI to provide lockless protection between the
   write to the 'ERRACK' bit during error handling and the read of the
   'ERRACK' bit during command insertion.
4. Standardize variable names.
-	struct arm_smmu_ecmdq *__percpu	*ecmdq;
+	struct arm_smmu_ecmdq *__percpu	*ecmdqs;

5. Add member 'iobase' to struct arm_smmu_device to record the start
   physical address of the SMMU, replacing the translation operation
   (vmalloc_to_pfn(smmu->base) << PAGE_SHIFT).
+	phys_addr_t			iobase;
-	smmu_dma_base = (vmalloc_to_pfn(smmu->base) << PAGE_SHIFT);

6. Remove the union below; whether ECMDQ is enabled is now determined
   solely by 'ecmdq_enabled'.
-	union {
-		u32			nr_ecmdq;
-		u32			ecmdq_enabled;
-	};
+	u32				nr_ecmdq;
+	bool				ecmdq_enabled;

7. Eliminate some sparse check warnings. For example:
-	struct arm_smmu_ecmdq *ecmdq;
+	struct arm_smmu_ecmdq __percpu *ecmdq;


Zhen Lei (2):
  iommu/arm-smmu-v3: Add support for ECMDQ register mode
  iommu/arm-smmu-v3: Ensure that a set of associated commands are
    inserted in the same ECMDQ

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 261 +++++++++++++++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  33 +++
 2 files changed, 286 insertions(+), 8 deletions(-)

-- 
2.34.1



* [PATCH V3 1/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode
  2024-04-25 14:41 [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode Tanmay Jagdale
@ 2024-04-25 14:41 ` Tanmay Jagdale
  2024-04-28  2:19   ` Leizhen (ThunderTown)
  2024-04-25 14:41 ` [PATCH V3 2/2] iommu/arm-smmu-v3: Ensure that a set of associated commands are inserted in the same ECMDQ Tanmay Jagdale
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: Tanmay Jagdale @ 2024-04-25 14:41 UTC (permalink / raw)
  To: will, robin.murphy, joro, nicolinc, mshavit, baolu.lu,
	thunder.leizhen, set_pte_at, smostafa
  Cc: sgoutham, gcherian, jcm, linux-arm-kernel, iommu, linux-kernel,
	Tanmay Jagdale

From: Zhen Lei <thunder.leizhen@huawei.com>

Ensure that each core exclusively occupies an ECMDQ and that all ECMDQs
are enabled during initialization. Any error during this initialization
results in a fallback to the normal CMDQ.

When GERROR is triggered by an ECMDQ, all ECMDQs need to be traversed:
the ECMDQs with errors are processed and the ECMDQs without errors are
skipped. Compared with register SMMU_CMDQ_PROD, register SMMU_ECMDQ_PROD
has an extra 'EN' bit and an extra 'ERRACK' bit. After the error indicated
by SMMU_GERROR.CMDQP_ERR is fixed, the 'ERRACK' bit must be toggled to
resume the corresponding ECMDQ. To provide lockless protection between
the write to the 'ERRACK' bit during error handling and the read of that
bit during command insertion, an IPI is sent to the faulty CPU and the
toggle is performed there. Command insertion runs under local_irq_save(),
so the two cannot race.
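
(Illustrative timeline of why the IPI is race-free; this is commentary,
not driver code:)

  /*
   * Insertion path (owning CPU):        Error-handling path (any CPU):
   *
   *   local_irq_save(flags);              GERROR irq observed
   *   prod = readl(ECMDQ_PROD);           smp_call_function_single(cpu,
   *     ... uses the ERRACK bit ...           arm_smmu_ecmdq_err_ack,
   *   writel(prod, ECMDQ_PROD);               q, true);
   *   local_irq_restore(flags);
   *
   * The ack runs as an IPI handler on the CPU that owns the ECMDQ, and an
   * IPI cannot interrupt a section that has IRQs disabled, so the ERRACK
   * read in the insertion path and the ERRACK write in the ack handler
   * never interleave.
   */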

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 219 +++++++++++++++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  33 +++
 2 files changed, 251 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 41f93c3ab160..8e088ca4e8e1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -352,6 +352,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 
 static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu)
 {
+	if (smmu->ecmdq_enabled) {
+		struct arm_smmu_ecmdq *ecmdq;
+
+		ecmdq = *this_cpu_ptr(smmu->ecmdqs);
+
+		return &ecmdq->cmdq;
+	}
+
 	return &smmu->cmdq;
 }
 
@@ -434,6 +442,43 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
 	__arm_smmu_cmdq_skip_err(smmu, &smmu->cmdq.q);
 }
 
+static void arm_smmu_ecmdq_err_ack(void *info)
+{
+	u32 prod, cons;
+	struct arm_smmu_queue *q = info;
+
+	prod = readl_relaxed(q->prod_reg);
+	cons = readl_relaxed(q->cons_reg);
+	prod &= ~ECMDQ_PROD_ERRACK;
+	prod |= cons & ECMDQ_CONS_ERR;
+	writel(prod, q->prod_reg);
+}
+
+static void arm_smmu_ecmdq_skip_err(struct arm_smmu_device *smmu)
+{
+	int i;
+	u32 prod, cons;
+	struct arm_smmu_queue *q;
+	struct arm_smmu_ecmdq *ecmdq;
+
+	if (!smmu->ecmdq_enabled)
+		return;
+
+	for (i = 0; i < smmu->nr_ecmdq; i++) {
+		ecmdq = *per_cpu_ptr(smmu->ecmdqs, i);
+		q = &ecmdq->cmdq.q;
+
+		prod = readl_relaxed(q->prod_reg);
+		cons = readl_relaxed(q->cons_reg);
+		if (((prod ^ cons) & ECMDQ_CONS_ERR) == 0)
+			continue;
+
+		__arm_smmu_cmdq_skip_err(smmu, q);
+
+		smp_call_function_single(i, arm_smmu_ecmdq_err_ack, q, true);
+	}
+}
+
 /*
  * Command queue locking.
  * This is a form of bastardised rwlock with the following major changes:
@@ -830,7 +875,10 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 		 * d. Advance the hardware prod pointer
 		 * Control dependency ordering from the entries becoming valid.
 		 */
-		writel_relaxed(prod, cmdq->q.prod_reg);
+		if (smmu->ecmdq_enabled)
+			writel_relaxed(prod | ECMDQ_PROD_EN, cmdq->q.prod_reg);
+		else
+			writel_relaxed(prod, cmdq->q.prod_reg);
 
 		/*
 		 * e. Tell the next owner we're done
@@ -1838,6 +1886,9 @@ static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
 	if (active & GERROR_CMDQ_ERR)
 		arm_smmu_cmdq_skip_err(smmu);
 
+	if (active & GERROR_CMDQP_ERR)
+		arm_smmu_ecmdq_skip_err(smmu);
+
 	writel(gerror, smmu->base + ARM_SMMU_GERRORN);
 	return IRQ_HANDLED;
 }
@@ -3154,6 +3205,20 @@ static int arm_smmu_cmdq_init(struct arm_smmu_device *smmu)
 	return 0;
 }
 
+static int arm_smmu_ecmdq_init(struct arm_smmu_cmdq *cmdq)
+{
+	unsigned int nents = 1 << cmdq->q.llq.max_n_shift;
+
+	atomic_set(&cmdq->owner_prod, 0);
+	atomic_set(&cmdq->lock, 0);
+
+	cmdq->valid_map = (atomic_long_t *)bitmap_zalloc(nents, GFP_KERNEL);
+	if (!cmdq->valid_map)
+		return -ENOMEM;
+
+	return 0;
+}
+
 static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
 {
 	int ret;
@@ -3503,6 +3568,36 @@ static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
 	return ret;
 }
 
+static void arm_smmu_ecmdq_reset(struct arm_smmu_device *smmu)
+{
+	u32 reg;
+	int i, ret;
+	struct arm_smmu_queue *q;
+	struct arm_smmu_ecmdq *ecmdq;
+
+	if (!smmu->ecmdq_enabled)
+		return;
+
+	for (i = 0; i < smmu->nr_ecmdq; i++) {
+		ecmdq = *per_cpu_ptr(smmu->ecmdqs, i);
+
+		q = &ecmdq->cmdq.q;
+		writeq_relaxed(q->q_base, ecmdq->base + ARM_SMMU_ECMDQ_BASE);
+		writel_relaxed(q->llq.prod, ecmdq->base + ARM_SMMU_ECMDQ_PROD);
+		writel_relaxed(q->llq.cons, ecmdq->base + ARM_SMMU_ECMDQ_CONS);
+
+		/* enable ecmdq */
+		writel(ECMDQ_PROD_EN, q->prod_reg);
+		ret = readl_relaxed_poll_timeout(q->cons_reg, reg, reg & ECMDQ_CONS_ENACK,
+					  1, ARM_SMMU_POLL_TIMEOUT_US);
+		if (ret) {
+			dev_err(smmu->dev, "ecmdq[%d] enable failed\n", i);
+			smmu->ecmdq_enabled = false;
+			break;
+		}
+	}
+}
+
 static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 {
 	int ret;
@@ -3557,6 +3652,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 		return ret;
 	}
 
+	arm_smmu_ecmdq_reset(smmu);
+
 	/* Invalidate any cached configuration */
 	cmd.opcode = CMDQ_OP_CFGI_ALL;
 	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
@@ -3674,6 +3771,112 @@ static void arm_smmu_device_iidr_probe(struct arm_smmu_device *smmu)
 		}
 		break;
 	}
+};
+
+static int arm_smmu_ecmdq_layout(struct arm_smmu_device *smmu)
+{
+	int cpu;
+	struct arm_smmu_ecmdq __percpu *ecmdq;
+
+	if (num_possible_cpus() <= smmu->nr_ecmdq) {
+		ecmdq = devm_alloc_percpu(smmu->dev, *ecmdq);
+		if (!ecmdq)
+			return -ENOMEM;
+
+		for_each_possible_cpu(cpu)
+			*per_cpu_ptr(smmu->ecmdqs, cpu) = per_cpu_ptr(ecmdq, cpu);
+
+		/* A core requires at most one ECMDQ */
+		smmu->nr_ecmdq = num_possible_cpus();
+
+		return 0;
+	}
+
+	return -ENOSPC;
+}
+
+static int arm_smmu_ecmdq_probe(struct arm_smmu_device *smmu)
+{
+	int ret, cpu;
+	u32 i, nump, numq, gap;
+	u32 reg, shift_increment;
+	u64 offset;
+	void __iomem *cp_regs, *cp_base;
+
+	/* IDR6 */
+	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR6);
+	nump = 1 << FIELD_GET(IDR6_LOG2NUMP, reg);
+	numq = 1 << FIELD_GET(IDR6_LOG2NUMQ, reg);
+	smmu->nr_ecmdq = nump * numq;
+	gap = ECMDQ_CP_RRESET_SIZE >> FIELD_GET(IDR6_LOG2NUMQ, reg);
+
+	cp_regs = ioremap(smmu->iobase + ARM_SMMU_ECMDQ_CP_BASE, PAGE_SIZE);
+	if (!cp_regs)
+		return -ENOMEM;
+
+	for (i = 0; i < nump; i++) {
+		u64 val, pre_addr = 0;
+
+		val = readq_relaxed(cp_regs + 32 * i);
+		if (!(val & ECMDQ_CP_PRESET)) {
+			iounmap(cp_regs);
+			dev_err(smmu->dev, "ecmdq control page %u is memory mode\n", i);
+			return -EFAULT;
+		}
+
+		if (i && ((val & ECMDQ_CP_ADDR) != (pre_addr + ECMDQ_CP_RRESET_SIZE))) {
+			iounmap(cp_regs);
+			dev_err(smmu->dev, "ecmdq_cp memory region is not contiguous\n");
+			return -EFAULT;
+		}
+
+		pre_addr = val & ECMDQ_CP_ADDR;
+	}
+
+	offset = readl_relaxed(cp_regs) & ECMDQ_CP_ADDR;
+	iounmap(cp_regs);
+
+	cp_base = devm_ioremap(smmu->dev, smmu->iobase + offset, ECMDQ_CP_RRESET_SIZE * nump);
+	if (!cp_base)
+		return -ENOMEM;
+
+	smmu->ecmdqs = devm_alloc_percpu(smmu->dev, struct arm_smmu_ecmdq *);
+	if (!smmu->ecmdqs)
+		return -ENOMEM;
+
+	ret = arm_smmu_ecmdq_layout(smmu);
+	if (ret)
+		return ret;
+
+	shift_increment = order_base_2(num_possible_cpus() / smmu->nr_ecmdq);
+
+	offset = 0;
+	for_each_possible_cpu(cpu) {
+		struct arm_smmu_ecmdq *ecmdq;
+		struct arm_smmu_queue *q;
+
+		ecmdq = *per_cpu_ptr(smmu->ecmdqs, cpu);
+		ecmdq->base = cp_base + offset;
+
+		q = &ecmdq->cmdq.q;
+
+		q->llq.max_n_shift = ECMDQ_MAX_SZ_SHIFT + shift_increment;
+		ret = arm_smmu_init_one_queue(smmu, q, ecmdq->base, ARM_SMMU_ECMDQ_PROD,
+				ARM_SMMU_ECMDQ_CONS, CMDQ_ENT_DWORDS, "ecmdq");
+		if (ret)
+			return ret;
+
+		ret = arm_smmu_ecmdq_init(&ecmdq->cmdq);
+		if (ret) {
+			dev_err(smmu->dev, "ecmdq[%d] init failed\n", i);
+			return ret;
+		}
+
+		offset += gap;
+	}
+	smmu->ecmdq_enabled = true;
+
+	return 0;
 }
 
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
@@ -3789,6 +3992,9 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 	if (reg & IDR1_ATTR_TYPES_OVR)
 		smmu->features |= ARM_SMMU_FEAT_ATTR_TYPES_OVR;
 
+	if (reg & IDR1_ECMDQ)
+		smmu->features |= ARM_SMMU_FEAT_ECMDQ;
+
 	/* Queue sizes, capped to ensure natural alignment */
 	smmu->cmdq.q.llq.max_n_shift = min_t(u32, CMDQ_MAX_SZ_SHIFT,
 					     FIELD_GET(IDR1_CMDQS, reg));
@@ -3896,6 +4102,16 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
 		 smmu->ias, smmu->oas, smmu->features);
+
+	if (smmu->features & ARM_SMMU_FEAT_ECMDQ) {
+		int err;
+
+		err = arm_smmu_ecmdq_probe(smmu);
+		if (err) {
+			dev_err(smmu->dev, "suppress ecmdq feature, errno=%d\n", err);
+			smmu->ecmdq_enabled = false;
+		}
+	}
 	return 0;
 }
 
@@ -4054,6 +4270,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
 	if (IS_ERR(smmu->base))
 		return PTR_ERR(smmu->base);
+	smmu->iobase = ioaddr;
 
 	if (arm_smmu_resource_size(smmu) > SZ_64K) {
 		smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 2a19bb63e5c6..335b9f975d74 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -41,6 +41,7 @@
 #define IDR0_S2P			(1 << 0)
 
 #define ARM_SMMU_IDR1			0x4
+#define IDR1_ECMDQ			(1 << 31)
 #define IDR1_TABLES_PRESET		(1 << 30)
 #define IDR1_QUEUES_PRESET		(1 << 29)
 #define IDR1_REL			(1 << 28)
@@ -114,6 +115,7 @@
 #define ARM_SMMU_IRQ_CTRLACK		0x54
 
 #define ARM_SMMU_GERROR			0x60
+#define GERROR_CMDQP_ERR		(1 << 9)
 #define GERROR_SFM_ERR			(1 << 8)
 #define GERROR_MSI_GERROR_ABT_ERR	(1 << 7)
 #define GERROR_MSI_PRIQ_ABT_ERR		(1 << 6)
@@ -159,6 +161,26 @@
 #define ARM_SMMU_PRIQ_IRQ_CFG1		0xd8
 #define ARM_SMMU_PRIQ_IRQ_CFG2		0xdc
 
+#define ARM_SMMU_IDR6			0x190
+#define IDR6_LOG2NUMP			GENMASK(27, 24)
+#define IDR6_LOG2NUMQ			GENMASK(19, 16)
+#define IDR6_BA_DOORBELLS		GENMASK(9, 0)
+
+#define ARM_SMMU_ECMDQ_BASE		0x00
+#define ARM_SMMU_ECMDQ_PROD		0x08
+#define ARM_SMMU_ECMDQ_CONS		0x0c
+#define ECMDQ_MAX_SZ_SHIFT		8
+#define ECMDQ_PROD_EN			(1 << 31)
+#define ECMDQ_CONS_ENACK		(1 << 31)
+#define ECMDQ_CONS_ERR			(1 << 23)
+#define ECMDQ_PROD_ERRACK		(1 << 23)
+
+#define ARM_SMMU_ECMDQ_CP_BASE		0x4000
+#define ECMDQ_CP_ADDR			GENMASK_ULL(51, 12)
+#define ECMDQ_CP_CMDQGS			GENMASK_ULL(2, 1)
+#define ECMDQ_CP_PRESET			(1UL << 0)
+#define ECMDQ_CP_RRESET_SIZE		0x10000
+
 #define ARM_SMMU_REG_SZ			0xe00
 
 /* Common MSI config fields */
@@ -558,6 +580,11 @@ struct arm_smmu_cmdq {
 	atomic_t			lock;
 };
 
+struct arm_smmu_ecmdq {
+	struct arm_smmu_cmdq		cmdq;
+	void __iomem			*base;
+};
+
 struct arm_smmu_cmdq_batch {
 	u64				cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
 	int				num;
@@ -627,6 +654,7 @@ struct arm_smmu_device {
 	struct device			*dev;
 	void __iomem			*base;
 	void __iomem			*page1;
+	phys_addr_t			iobase;
 
 #define ARM_SMMU_FEAT_2_LVL_STRTAB	(1 << 0)
 #define ARM_SMMU_FEAT_2_LVL_CDTAB	(1 << 1)
@@ -649,6 +677,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_E2H		(1 << 18)
 #define ARM_SMMU_FEAT_NESTING		(1 << 19)
 #define ARM_SMMU_FEAT_ATTR_TYPES_OVR	(1 << 20)
+#define ARM_SMMU_FEAT_ECMDQ		(1 << 21)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -657,6 +686,10 @@ struct arm_smmu_device {
 #define ARM_SMMU_OPT_CMDQ_FORCE_SYNC	(1 << 3)
 	u32				options;
 
+	struct arm_smmu_ecmdq *__percpu	*ecmdqs;
+	u32				nr_ecmdq;
+	bool				ecmdq_enabled;
+
 	struct arm_smmu_cmdq		cmdq;
 	struct arm_smmu_evtq		evtq;
 	struct arm_smmu_priq		priq;
-- 
2.34.1



* [PATCH V3 2/2] iommu/arm-smmu-v3: Ensure that a set of associated commands are inserted in the same ECMDQ
  2024-04-25 14:41 [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode Tanmay Jagdale
  2024-04-25 14:41 ` [PATCH V3 1/2] " Tanmay Jagdale
@ 2024-04-25 14:41 ` Tanmay Jagdale
  2024-04-28  2:08 ` [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode Leizhen (ThunderTown)
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Tanmay Jagdale @ 2024-04-25 14:41 UTC (permalink / raw)
  To: will, robin.murphy, joro, nicolinc, mshavit, baolu.lu,
	thunder.leizhen, set_pte_at, smostafa
  Cc: sgoutham, gcherian, jcm, linux-arm-kernel, iommu, linux-kernel,
	Tanmay Jagdale

From: Zhen Lei <thunder.leizhen@huawei.com>

The SYNC command only guarantees that the commands preceding it in the
same ECMDQ have been executed; it cannot synchronize commands in other
ECMDQs. If an unmap involves multiple commands and some of them are
inserted on one core while the rest are inserted on another, completion
of all preceding commands cannot be guaranteed once the SYNC completes.

Preventing the process that inserts a set of associated commands from
being migrated to another core ensures that all of those commands are
inserted into the same ECMDQ.
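
(Illustrative timeline of the hazard being closed; commentary, not code:)

  /*
   * Without pinning:                    With preemption disabled:
   *   CPU0: batch_add(TLBI a) -> ECMDQ0   CPU0: arm_smmu_preempt_disable()
   *   [task migrates to CPU1]             CPU0: batch_add(TLBI a) -> ECMDQ0
   *   CPU1: batch_add(TLBI b) -> ECMDQ1   CPU0: batch_add(TLBI b) -> ECMDQ0
   *   CPU1: submit + SYNC     -> ECMDQ1   CPU0: submit + SYNC     -> ECMDQ0
   *   SYNC orders only ECMDQ1; TLBI a     CPU0: arm_smmu_preempt_enable()
   *   may still be pending in ECMDQ0.     SYNC orders all prior commands.
   */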

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 42 +++++++++++++++++----
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8e088ca4e8e1..d53b808de03f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -248,6 +248,18 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
 	return 0;
 }
 
+static void arm_smmu_preempt_disable(struct arm_smmu_device *smmu)
+{
+	if (smmu->ecmdq_enabled)
+		preempt_disable();
+}
+
+static void arm_smmu_preempt_enable(struct arm_smmu_device *smmu)
+{
+	if (smmu->ecmdq_enabled)
+		preempt_enable();
+}
+
 /* High-level queue accessors */
 static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 {
@@ -1229,12 +1241,15 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
 	};
 
 	cmds.num = 0;
+
+	arm_smmu_preempt_disable(smmu);
 	for (i = 0; i < master->num_streams; i++) {
 		cmd.cfgi.sid = master->streams[i].id;
 		arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
 	}
 
 	arm_smmu_cmdq_batch_submit(smmu, &cmds);
+	arm_smmu_preempt_enable(smmu);
 }
 
 static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
@@ -1979,31 +1994,38 @@ arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
 
 static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
 {
-	int i;
+	int i, ret;
 	struct arm_smmu_cmdq_ent cmd;
 	struct arm_smmu_cmdq_batch cmds;
+	struct arm_smmu_device *smmu = master->smmu;
 
 	arm_smmu_atc_inv_to_cmd(IOMMU_NO_PASID, 0, 0, &cmd);
 
 	cmds.num = 0;
+
+	arm_smmu_preempt_disable(smmu);
 	for (i = 0; i < master->num_streams; i++) {
 		cmd.atc.sid = master->streams[i].id;
-		arm_smmu_cmdq_batch_add(master->smmu, &cmds, &cmd);
+		arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
 	}
 
-	return arm_smmu_cmdq_batch_submit(master->smmu, &cmds);
+	ret = arm_smmu_cmdq_batch_submit(smmu, &cmds);
+	arm_smmu_preempt_enable(smmu);
+
+	return ret;
 }
 
 int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 			    unsigned long iova, size_t size)
 {
-	int i;
+	int i, ret;
 	unsigned long flags;
 	struct arm_smmu_cmdq_ent cmd;
 	struct arm_smmu_master *master;
 	struct arm_smmu_cmdq_batch cmds;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
 
-	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
+	if (!(smmu->features & ARM_SMMU_FEAT_ATS))
 		return 0;
 
 	/*
@@ -2027,6 +2049,7 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 
 	cmds.num = 0;
 
+	arm_smmu_preempt_disable(smmu);
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
 		if (!master->ats_enabled)
@@ -2034,12 +2057,15 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 
 		for (i = 0; i < master->num_streams; i++) {
 			cmd.atc.sid = master->streams[i].id;
-			arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd);
+			arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
 		}
 	}
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
-	return arm_smmu_cmdq_batch_submit(smmu_domain->smmu, &cmds);
+	ret = arm_smmu_cmdq_batch_submit(smmu, &cmds);
+	arm_smmu_preempt_enable(smmu);
+
+	return ret;
 }
 
 /* IO_PGTABLE API */
@@ -2104,6 +2130,7 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
 
 	cmds.num = 0;
 
+	arm_smmu_preempt_disable(smmu);
 	while (iova < end) {
 		if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
 			/*
@@ -2135,6 +2162,7 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
 		iova += inv_range;
 	}
 	arm_smmu_cmdq_batch_submit(smmu, &cmds);
+	arm_smmu_preempt_enable(smmu);
 }
 
 static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
-- 
2.34.1



* Re: [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode
  2024-04-25 14:41 [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode Tanmay Jagdale
  2024-04-25 14:41 ` [PATCH V3 1/2] " Tanmay Jagdale
  2024-04-25 14:41 ` [PATCH V3 2/2] iommu/arm-smmu-v3: Ensure that a set of associated commands are inserted in the same ECMDQ Tanmay Jagdale
@ 2024-04-28  2:08 ` Leizhen (ThunderTown)
  2024-04-30 15:09 ` Will Deacon
  2024-05-02 16:25 ` Jason Gunthorpe
  4 siblings, 0 replies; 8+ messages in thread
From: Leizhen (ThunderTown) @ 2024-04-28  2:08 UTC (permalink / raw)
  To: Tanmay Jagdale, will, robin.murphy, joro, nicolinc, mshavit,
	baolu.lu, thunder.leizhen, set_pte_at, smostafa
  Cc: sgoutham, gcherian, jcm, linux-arm-kernel, iommu, linux-kernel



On 2024/4/25 22:41, Tanmay Jagdale wrote:
> Resending the patches by Zhen Lei <thunder.leizhen@huawei.com> that add
> support for SMMU ECMDQ feature.
> 
> Tested this feature on a Marvell SoC by implementing a smmu-test driver.
> This test driver spawns a thread per CPU and each thread keeps sending
> map, table-walk and unmap requests for a fixed duration.
> 
> Using this test driver, we compared ECMDQ vs SMMU with software batching
> support and saw ~5% improvement with ECMDQ. Performance numbers are

This data is similar to the performance we simulated on our EMU.

> mentioned below:
> 
>                    Total Requests  Average Requests  Difference
>                                       Per CPU         wrt ECMDQ
> -----------------------------------------------------------------
> ECMDQ                 239286381       2991079
> CMDQ Batch Size 1     228232187       2852902         -4.62%
> CMDQ Batch Size 32    233465784       2918322         -2.43%
> CMDQ Batch Size 64    231679588       2895994         -3.18%
> CMDQ Batch Size 128   233189030       2914862         -2.55%
> CMDQ Batch Size 256   230965773       2887072         -3.48%
> 
> 
> v2 --> v3:
> 1. Rebased on linux 6.9-rc5
> 
> v1 --> v2:

Thanks.

> 1. Drop patch "iommu/arm-smmu-v3: Add arm_smmu_ecmdq_issue_cmdlist() for
> non-shared ECMDQ" in v1
> 2. Drop patch "iommu/arm-smmu-v3: Add support for less than one ECMDQ
> per core" in v1
> 3. Replace rwlock with IPI to provide lockless protection between the
>    write to the 'ERRACK' bit during error handling and the read of the
>    'ERRACK' bit during command insertion.
> 4. Standardize variable names.
> -	struct arm_smmu_ecmdq *__percpu	*ecmdq;
> +	struct arm_smmu_ecmdq *__percpu	*ecmdqs;
> 
> 5. Add member 'iobase' to struct arm_smmu_device to record the start
>    physical address of the SMMU, replacing the translation operation
>    (vmalloc_to_pfn(smmu->base) << PAGE_SHIFT).
> +	phys_addr_t			iobase;
> -	smmu_dma_base = (vmalloc_to_pfn(smmu->base) << PAGE_SHIFT);
> 
> 6. Remove the union below; whether ECMDQ is enabled is now determined
>    solely by 'ecmdq_enabled'.
> -	union {
> -		u32			nr_ecmdq;
> -		u32			ecmdq_enabled;
> -	};
> +	u32				nr_ecmdq;
> +	bool				ecmdq_enabled;
> 
> 7. Eliminate some sparse check warnings. For example.
> -	struct arm_smmu_ecmdq *ecmdq;
> +	struct arm_smmu_ecmdq __percpu *ecmdq;
> 
> 
> Zhen Lei (2):
>   iommu/arm-smmu-v3: Add support for ECMDQ register mode
>   iommu/arm-smmu-v3: Ensure that a set of associated commands are
>     inserted in the same ECMDQ
> 
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 261 +++++++++++++++++++-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  33 +++
>  2 files changed, 286 insertions(+), 8 deletions(-)
> 

-- 
Regards,
  Zhen Lei



* Re: [PATCH V3 1/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode
  2024-04-25 14:41 ` [PATCH V3 1/2] " Tanmay Jagdale
@ 2024-04-28  2:19   ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 8+ messages in thread
From: Leizhen (ThunderTown) @ 2024-04-28  2:19 UTC (permalink / raw)
  To: Tanmay Jagdale, will, robin.murphy, joro, nicolinc, mshavit,
	baolu.lu, thunder.leizhen, set_pte_at, smostafa
  Cc: sgoutham, gcherian, jcm, linux-arm-kernel, iommu, linux-kernel



On 2024/4/25 22:41, Tanmay Jagdale wrote:
> +static int arm_smmu_ecmdq_probe(struct arm_smmu_device *smmu)
> +{
> +	int ret, cpu;
> +	u32 i, nump, numq, gap;
> +	u32 reg, shift_increment;
> +	u64 offset;
> +	void __iomem *cp_regs, *cp_base;
> +
> +	/* IDR6 */
> +	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR6);
> +	nump = 1 << FIELD_GET(IDR6_LOG2NUMP, reg);
> +	numq = 1 << FIELD_GET(IDR6_LOG2NUMQ, reg);
> +	smmu->nr_ecmdq = nump * numq;
> +	gap = ECMDQ_CP_RRESET_SIZE >> FIELD_GET(IDR6_LOG2NUMQ, reg);
> +
> +	cp_regs = ioremap(smmu->iobase + ARM_SMMU_ECMDQ_CP_BASE, PAGE_SIZE);
> +	if (!cp_regs)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < nump; i++) {
> +		u64 val, pre_addr = 0;

This is my mistake. The 'pre_addr' variable should be defined outside this
'for' loop. Alternatively, you can remove all 'pre_addr'-related statements;
commercially available chips that have been tested should not exhibit such
errors.
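
For reference, the corrected shape would be (sketch, with unrelated lines
elided):

	u64 val, pre_addr = 0;	/* hoisted out of the loop */

	for (i = 0; i < nump; i++) {
		val = readq_relaxed(cp_regs + 32 * i);
		...
		if (i && ((val & ECMDQ_CP_ADDR) != (pre_addr + ECMDQ_CP_RRESET_SIZE))) {
			...
		}
		pre_addr = val & ECMDQ_CP_ADDR;
	}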

> +
> +		val = readq_relaxed(cp_regs + 32 * i);
> +		if (!(val & ECMDQ_CP_PRESET)) {
> +			iounmap(cp_regs);
> +			dev_err(smmu->dev, "ecmdq control page %u is memory mode\n", i);
> +			return -EFAULT;
> +		}
> +

                                                     <---------------------------------------------
> +		if (i && ((val & ECMDQ_CP_ADDR) != (pre_addr + ECMDQ_CP_RRESET_SIZE))) {           |
> +			iounmap(cp_regs);                                                          |
> +			dev_err(smmu->dev, "ecmdq_cp memory region is not contiguous\n");          |
> +			return -EFAULT;                                                            |
> +		}                                                                                  |
> +                                                                                                |
> +		pre_addr = val & ECMDQ_CP_ADDR;                                                    |
                                                      <--------------------------------------------   feel free to remove these
> +	}

-- 
Regards,
  Zhen Lei



* Re: [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode
  2024-04-25 14:41 [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode Tanmay Jagdale
                   ` (2 preceding siblings ...)
  2024-04-28  2:08 ` [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode Leizhen (ThunderTown)
@ 2024-04-30 15:09 ` Will Deacon
  2024-05-02 16:25 ` Jason Gunthorpe
  4 siblings, 0 replies; 8+ messages in thread
From: Will Deacon @ 2024-04-30 15:09 UTC (permalink / raw)
  To: Tanmay Jagdale
  Cc: robin.murphy, joro, nicolinc, mshavit, baolu.lu, thunder.leizhen,
	set_pte_at, smostafa, sgoutham, gcherian, jcm, linux-arm-kernel,
	iommu, linux-kernel

On Thu, Apr 25, 2024 at 07:41:50AM -0700, Tanmay Jagdale wrote:
> Resending the patches by Zhen Lei <thunder.leizhen@huawei.com> that add
> support for SMMU ECMDQ feature.
> 
> Tested this feature on a Marvell SoC by implementing a smmu-test driver.
> This test driver spawns a thread per CPU and each thread keeps sending
> map, table-walk and unmap requests for a fixed duration.
> 
> Using this test driver, we compared ECMDQ vs SMMU with software batching
> support and saw ~5% improvement with ECMDQ. Performance numbers are
> mentioned below:
> 
>                    Total Requests  Average Requests  Difference
>                                       Per CPU         wrt ECMDQ
> -----------------------------------------------------------------
> ECMDQ                 239286381       2991079
> CMDQ Batch Size 1     228232187       2852902         -4.62%
> CMDQ Batch Size 32    233465784       2918322         -2.43%
> CMDQ Batch Size 64    231679588       2895994         -3.18%
> CMDQ Batch Size 128   233189030       2914862         -2.55%
> CMDQ Batch Size 256   230965773       2887072         -3.48%

These are pretty small improvements in a targeted micro-benchmark. Do
you have any real-world numbers showing that this is worthwhile? For
example, running something like netperf.

Will


* Re: [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode
  2024-04-25 14:41 [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode Tanmay Jagdale
                   ` (3 preceding siblings ...)
  2024-04-30 15:09 ` Will Deacon
@ 2024-05-02 16:25 ` Jason Gunthorpe
  4 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2024-05-02 16:25 UTC (permalink / raw)
  To: Tanmay Jagdale
  Cc: will, robin.murphy, joro, nicolinc, mshavit, baolu.lu,
	thunder.leizhen, set_pte_at, smostafa, sgoutham, gcherian, jcm,
	linux-arm-kernel, iommu, linux-kernel

On Thu, Apr 25, 2024 at 07:41:50AM -0700, Tanmay Jagdale wrote:
> Resending the patches by Zhen Lei <thunder.leizhen@huawei.com> that add
> support for SMMU ECMDQ feature.
> 
> Tested this feature on a Marvell SoC by implementing a smmu-test driver.
> This test driver spawns a thread per CPU and each thread keeps sending
> map, table-walk and unmap requests for a fixed duration.

So this is not just measuring invalidation performance, but essentially
the DMA API performance of map/unmap operations? What is "batch size"?

Does this HW support range invalidation? How many invalidation commands
does each test cycle generate?

>                    Total Requests  Average Requests  Difference
>                                       Per CPU         wrt ECMDQ
> -----------------------------------------------------------------
> ECMDQ                 239286381       2991079
> CMDQ Batch Size 1     228232187       2852902         -4.62%
> CMDQ Batch Size 32    233465784       2918322         -2.43%
> CMDQ Batch Size 64    231679588       2895994         -3.18%
> CMDQ Batch Size 128   233189030       2914862         -2.55%
> CMDQ Batch Size 256   230965773       2887072         -3.48%

If this is really 5% for a typical DMA API map/unmap cycle, then that
seems interesting to me.

If it is 5% for just the invalidation command, then it is harder to
say.

I'd suggest presenting your results in terms of the latency of a DMA
API map/unmap cycle, and showing how the results scale as you add more
threads. Does even 2 threads start to show a 4-5% gain?
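
(For illustration only, a minimal sketch of timing one DMA API map/unmap
cycle in a test module; 'dev' and 'buf' are assumed to already exist:)

	ktime_t t0 = ktime_get();
	dma_addr_t dma = dma_map_single(dev, buf, SZ_4K, DMA_TO_DEVICE);

	if (!dma_mapping_error(dev, dma))
		dma_unmap_single(dev, dma, SZ_4K, DMA_TO_DEVICE);
	pr_info("map/unmap cycle: %lld ns\n",
		ktime_to_ns(ktime_sub(ktime_get(), t0)));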

Also, I'm wondering how ATS would interact; I think we have a
head-of-line blocking issue with ATS. Allowing ATS to progress
concurrently with unrelated parallel invalidations may also be
interesting, especially for multi-SVA workloads.

Jason


* [PATCH V3 2/2] iommu/arm-smmu-v3: Ensure that a set of associated commands are inserted in the same ECMDQ
@ 2024-04-25 14:31 Tanmay Jagdale
  0 siblings, 0 replies; 8+ messages in thread
From: Tanmay Jagdale @ 2024-04-25 14:31 UTC (permalink / raw)
  To: will, robin.murphy, joro, nicolinc, mshavit, baolu.lu,
	thunder.leizhen, set_pte_at, smostafa
  Cc: sgoutham, gcherian, jcm, linux-arm-kernel, iommu, linux-kernel,
	Tanmay Jagdale

From: Zhen Lei <thunder.leizhen@huawei.com>

The SYNC command only guarantees that the commands preceding it in the
same ECMDQ have been executed; it cannot synchronize commands in other
ECMDQs. If an unmap involves multiple commands and some of them are
inserted on one core while the rest are inserted on another, completion
of all preceding commands cannot be guaranteed once the SYNC completes.

Preventing the process that inserts a set of associated commands from
being migrated to another core ensures that all of those commands are
inserted into the same ECMDQ.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 42 +++++++++++++++++----
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8e088ca4e8e1..d53b808de03f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -248,6 +248,18 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
 	return 0;
 }
 
+static void arm_smmu_preempt_disable(struct arm_smmu_device *smmu)
+{
+	if (smmu->ecmdq_enabled)
+		preempt_disable();
+}
+
+static void arm_smmu_preempt_enable(struct arm_smmu_device *smmu)
+{
+	if (smmu->ecmdq_enabled)
+		preempt_enable();
+}
+
 /* High-level queue accessors */
 static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 {
@@ -1229,12 +1241,15 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
 	};
 
 	cmds.num = 0;
+
+	arm_smmu_preempt_disable(smmu);
 	for (i = 0; i < master->num_streams; i++) {
 		cmd.cfgi.sid = master->streams[i].id;
 		arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
 	}
 
 	arm_smmu_cmdq_batch_submit(smmu, &cmds);
+	arm_smmu_preempt_enable(smmu);
 }
 
 static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
@@ -1979,31 +1994,38 @@ arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
 
 static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
 {
-	int i;
+	int i, ret;
 	struct arm_smmu_cmdq_ent cmd;
 	struct arm_smmu_cmdq_batch cmds;
+	struct arm_smmu_device *smmu = master->smmu;
 
 	arm_smmu_atc_inv_to_cmd(IOMMU_NO_PASID, 0, 0, &cmd);
 
 	cmds.num = 0;
+
+	arm_smmu_preempt_disable(smmu);
 	for (i = 0; i < master->num_streams; i++) {
 		cmd.atc.sid = master->streams[i].id;
-		arm_smmu_cmdq_batch_add(master->smmu, &cmds, &cmd);
+		arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
 	}
 
-	return arm_smmu_cmdq_batch_submit(master->smmu, &cmds);
+	ret = arm_smmu_cmdq_batch_submit(smmu, &cmds);
+	arm_smmu_preempt_enable(smmu);
+
+	return ret;
 }
 
 int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 			    unsigned long iova, size_t size)
 {
-	int i;
+	int i, ret;
 	unsigned long flags;
 	struct arm_smmu_cmdq_ent cmd;
 	struct arm_smmu_master *master;
 	struct arm_smmu_cmdq_batch cmds;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
 
-	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
+	if (!(smmu->features & ARM_SMMU_FEAT_ATS))
 		return 0;
 
 	/*
@@ -2027,6 +2049,7 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 
 	cmds.num = 0;
 
+	arm_smmu_preempt_disable(smmu);
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
 		if (!master->ats_enabled)
@@ -2034,12 +2057,15 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 
 		for (i = 0; i < master->num_streams; i++) {
 			cmd.atc.sid = master->streams[i].id;
-			arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd);
+			arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
 		}
 	}
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
-	return arm_smmu_cmdq_batch_submit(smmu_domain->smmu, &cmds);
+	ret = arm_smmu_cmdq_batch_submit(smmu, &cmds);
+	arm_smmu_preempt_enable(smmu);
+
+	return ret;
 }
 
 /* IO_PGTABLE API */
@@ -2104,6 +2130,7 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
 
 	cmds.num = 0;
 
+	arm_smmu_preempt_disable(smmu);
 	while (iova < end) {
 		if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
 			/*
@@ -2135,6 +2162,7 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
 		iova += inv_range;
 	}
 	arm_smmu_cmdq_batch_submit(smmu, &cmds);
+	arm_smmu_preempt_enable(smmu);
 }
 
 static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
-- 
2.34.1


