linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v4 0/2] iommu/arm-smmu: hard iova_to_phys
@ 2014-10-01  1:28 Mitchel Humpherys
  2014-10-01  1:28 ` [PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros Mitchel Humpherys
  2014-10-01  1:28 ` [PATCH v4 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR Mitchel Humpherys
  0 siblings, 2 replies; 12+ messages in thread
From: Mitchel Humpherys @ 2014-10-01  1:28 UTC (permalink / raw)
  To: linux-arm-kernel

This series introduces support for performing iova-to-phys translations via
the ARM SMMU hardware on supported implementations. We also make use of
some new generic macros for polling hardware registers.
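
For reference, a caller would use the new polling macro roughly like
this (a minimal sketch; STATUS_REG, STATUS_READY, base and dev are
made-up names for illustration):

    u32 status;
    int ret;

    /* poll STATUS_REG until READY is set, sleeping up to 10us between
     * reads and giving up after 1000us */
    ret = readl_poll_timeout(base + STATUS_REG, status,
                             status & STATUS_READY, 10, 1000);
    if (ret)
        dev_err(dev, "device did not become ready\n");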

v1..v2:

  - Renamed one of the iopoll macros to use the more standard `_atomic'
    suffix
  - Removed some convenience iopoll wrappers to encourage explicitness

v2..v3:

  - Removed unnecessary `dev_name's

v3..v4:

  - Updated the iopoll commit message to reflect the patch better
  - Added locking around address translation op
  - Return 0 on iova_to_phys failure


Matt Wagantall (1):
  iopoll: Introduce memory-mapped IO polling macros

Mitchel Humpherys (1):
  iommu/arm-smmu: add support for iova_to_phys through ATS1PR

 drivers/iommu/arm-smmu.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/iopoll.h   | 77 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 155 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/iopoll.h

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

* [PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros
  2014-10-01  1:28 [PATCH v4 0/2] iommu/arm-smmu: hard iova_to_phys Mitchel Humpherys
@ 2014-10-01  1:28 ` Mitchel Humpherys
  2014-10-01  8:25   ` Arnd Bergmann
  2014-10-01  1:28 ` [PATCH v4 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR Mitchel Humpherys
  1 sibling, 1 reply; 12+ messages in thread
From: Mitchel Humpherys @ 2014-10-01  1:28 UTC (permalink / raw)
  To: linux-arm-kernel

From: Matt Wagantall <mattw@codeaurora.org>

It is sometimes necessary to poll a memory-mapped register until its value
satisfies some condition. Introduce a family of convenience macros that do
this. Tight-looping, sleeping, and timing out can all be accomplished using
these macros.

Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
---
Changes since v3:

  - Updated commit message to better reflect the patch content
---
 include/linux/iopoll.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)
 create mode 100644 include/linux/iopoll.h

diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
new file mode 100644
index 0000000000..594b0d4f03
--- /dev/null
+++ b/include/linux/iopoll.h
@@ -0,0 +1,77 @@
+/*
+ * Copyright (c) 2012-2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_IOPOLL_H
+#define _LINUX_IOPOLL_H
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/hrtimer.h>
+#include <linux/delay.h>
+#include <linux/errno.h>
+#include <linux/io.h>
+
+/**
+ * readl_poll_timeout - Periodically poll an address until a condition is met or a timeout occurs
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val. Must not
+ * be called from atomic context if sleep_us or timeout_us are used.
+ */
+#define readl_poll_timeout(addr, val, cond, sleep_us, timeout_us) \
+({ \
+	ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+	might_sleep_if(timeout_us); \
+	for (;;) { \
+		(val) = readl(addr); \
+		if (cond) \
+			break; \
+		if (timeout_us && ktime_compare(ktime_get(), timeout) > 0) { \
+			(val) = readl(addr); \
+			break; \
+		} \
+		if (sleep_us) \
+			usleep_range(DIV_ROUND_UP(sleep_us, 4), sleep_us); \
+	} \
+	(cond) ? 0 : -ETIMEDOUT; \
+})
+
+/**
+ * readl_poll_timeout_atomic - Periodically poll an address until a condition is met or a timeout occurs
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @max_reads: Maximum number of reads before giving up
+ * @time_between_us: Time to udelay() between successive reads
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout.
+ */
+#define readl_poll_timeout_atomic(addr, val, cond, max_reads, time_between_us) \
+({ \
+	int count; \
+	for (count = (max_reads); count > 0; count--) { \
+		(val) = readl(addr); \
+		if (cond) \
+			break; \
+		udelay(time_between_us); \
+	} \
+	(cond) ? 0 : -ETIMEDOUT; \
+})
+
+#endif /* _LINUX_IOPOLL_H */
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

* [PATCH v4 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR
  2014-10-01  1:28 [PATCH v4 0/2] iommu/arm-smmu: hard iova_to_phys Mitchel Humpherys
  2014-10-01  1:28 ` [PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros Mitchel Humpherys
@ 2014-10-01  1:28 ` Mitchel Humpherys
  2014-10-01  8:27   ` Arnd Bergmann
  1 sibling, 1 reply; 12+ messages in thread
From: Mitchel Humpherys @ 2014-10-01  1:28 UTC (permalink / raw)
  To: linux-arm-kernel

Currently, we provide the iommu_ops.iova_to_phys service by doing a
table walk in software to translate IO virtual addresses to physical
addresses. On SMMUs that support it, it can be useful to ask the SMMU
itself to do the translation. This can be used to warm the TLBs for an
SMMU. It can also be useful for testing and hardware validation.

Since the address translation registers are optional on SMMUv2, only
enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.
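
For reference, consumers reach this path through the generic IOMMU API;
a hypothetical caller (with domain and iova set up elsewhere) would do:

    phys_addr_t phys = iommu_iova_to_phys(domain, iova);

    if (!phys)
        pr_err("no translation for iova %pad\n", &iova);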

Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
---
Changes since v3:

  - Added locking around address translation op
  - Return 0 on iova_to_phys failure
---
 drivers/iommu/arm-smmu.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 78 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 37dc3dd0df..c80c12a104 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -36,6 +36,7 @@
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/iommu.h>
+#include <linux/iopoll.h>
 #include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/of.h>
@@ -140,6 +141,7 @@
 #define ID0_S2TS			(1 << 29)
 #define ID0_NTS				(1 << 28)
 #define ID0_SMS				(1 << 27)
+#define ID0_ATOSNS			(1 << 26)
 #define ID0_PTFS_SHIFT			24
 #define ID0_PTFS_MASK			0x2
 #define ID0_PTFS_V8_ONLY		0x2
@@ -233,11 +235,16 @@
 #define ARM_SMMU_CB_TTBR0_HI		0x24
 #define ARM_SMMU_CB_TTBCR		0x30
 #define ARM_SMMU_CB_S1_MAIR0		0x38
+#define ARM_SMMU_CB_PAR_LO		0x50
+#define ARM_SMMU_CB_PAR_HI		0x54
 #define ARM_SMMU_CB_FSR			0x58
 #define ARM_SMMU_CB_FAR_LO		0x60
 #define ARM_SMMU_CB_FAR_HI		0x64
 #define ARM_SMMU_CB_FSYNR0		0x68
 #define ARM_SMMU_CB_S1_TLBIASID		0x610
+#define ARM_SMMU_CB_ATS1PR_LO		0x800
+#define ARM_SMMU_CB_ATS1PR_HI		0x804
+#define ARM_SMMU_CB_ATSR		0x8f0
 
 #define SCTLR_S1_ASIDPNE		(1 << 12)
 #define SCTLR_CFCFG			(1 << 7)
@@ -249,6 +256,10 @@
 #define SCTLR_M				(1 << 0)
 #define SCTLR_EAE_SBOP			(SCTLR_AFE | SCTLR_TRE)
 
+#define CB_PAR_F			(1 << 0)
+
+#define ATSR_ACTIVE			(1 << 0)
+
 #define RESUME_RETRY			(0 << 0)
 #define RESUME_TERMINATE		(1 << 0)
 
@@ -366,6 +377,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_TRANS_S1		(1 << 2)
 #define ARM_SMMU_FEAT_TRANS_S2		(1 << 3)
 #define ARM_SMMU_FEAT_TRANS_NESTED	(1 << 4)
+#define ARM_SMMU_FEAT_TRANS_OPS		(1 << 5)
 	u32				features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
@@ -1524,7 +1536,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 	return ret ? 0 : size;
 }
 
-static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
+static phys_addr_t arm_smmu_iova_to_phys_soft(struct iommu_domain *domain,
 					 dma_addr_t iova)
 {
 	pgd_t *pgdp, pgd;
@@ -1557,6 +1569,66 @@ static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
 	return __pfn_to_phys(pte_pfn(pte)) | (iova & ~PAGE_MASK);
 }
 
+static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
+					dma_addr_t iova)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	struct device *dev = smmu->dev;
+	void __iomem *cb_base;
+	u32 tmp;
+	u64 phys;
+	unsigned long flags;
+
+	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
+
+	spin_lock_irqsave(&smmu_domain->lock, flags);
+
+	if (smmu->version == 1) {
+		u32 reg = iova & ~0xfff;
+		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+	} else {
+		u32 reg = iova & ~0xfff;
+		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+		reg = (iova & ~0xfff) >> 32;
+		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
+	}
+
+	if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp,
+				!(tmp & ATSR_ACTIVE), 50, 100)) {
+		dev_err(dev,
+			"iova to phys timed out on 0x%pa. Falling back to software table walk.\n",
+			&iova);
+		return arm_smmu_iova_to_phys_soft(domain, iova);
+	}
+
+	phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
+	phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI)) << 32;
+
+	spin_unlock_irqrestore(&smmu_domain->lock, flags);
+
+	if (phys & CB_PAR_F) {
+		dev_err(dev, "translation fault!\n");
+		dev_err(dev, "PAR = 0x%llx\n", phys);
+		phys = 0;
+	} else {
+		phys = (phys & 0xfffffff000ULL) | (iova & 0x00000fff);
+	}
+
+	return phys;
+}
+
+static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
+					dma_addr_t iova)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+
+	if (smmu_domain->smmu->features & ARM_SMMU_FEAT_TRANS_OPS)
+		return arm_smmu_iova_to_phys_hard(domain, iova);
+	return arm_smmu_iova_to_phys_soft(domain, iova);
+}
+
 static int arm_smmu_domain_has_cap(struct iommu_domain *domain,
 				   unsigned long cap)
 {
@@ -1777,6 +1849,11 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 		return -ENODEV;
 	}
 
+	if (smmu->version == 1 || (!(id & ID0_ATOSNS) && (id & ID0_S1TS))) {
+		smmu->features |= ARM_SMMU_FEAT_TRANS_OPS;
+		dev_notice(smmu->dev, "\taddress translation ops\n");
+	}
+
 	if (id & ID0_CTTW) {
 		smmu->features |= ARM_SMMU_FEAT_COHERENT_WALK;
 		dev_notice(smmu->dev, "\tcoherent table walk\n");
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

* [PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros
  2014-10-01  1:28 ` [PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros Mitchel Humpherys
@ 2014-10-01  8:25   ` Arnd Bergmann
  2014-10-08  1:47     ` Mitchel Humpherys
  0 siblings, 1 reply; 12+ messages in thread
From: Arnd Bergmann @ 2014-10-01  8:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 30 September 2014 18:28:12 Mitchel Humpherys wrote:
> + */
> +#define readl_poll_timeout(addr, val, cond, sleep_us, timeout_us) \
> +({ \
> +       ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
> +       might_sleep_if(timeout_us); \

Does it make sense to call this with timeout_us = 0?

> +       for (;;) { \
> +               (val) = readl(addr); \
> +               if (cond) \
> +                       break; \
> +               if (timeout_us && ktime_compare(ktime_get(), timeout) > 0) { \
> +                       (val) = readl(addr); \
> +                       break; \
> +               } \
> +               if (sleep_us) \
> +                       usleep_range(DIV_ROUND_UP(sleep_us, 4), sleep_us); \
> +       } \
> +       (cond) ? 0 : -ETIMEDOUT; \
> +})

I think it would be better to tie the 'range' argument to the timeout. Also
doing a division seems expensive here.

> +/**
> + * readl_poll_timeout_atomic - Periodically poll an address until a condition is met or a timeout occurs
> + * @addr: Address to poll
> + * @val: Variable to read the value into
> + * @cond: Break condition (usually involving @val)
> + * @max_reads: Maximum number of reads before giving up
> + * @time_between_us: Time to udelay() between successive reads
> + *
> + * Returns 0 on success and -ETIMEDOUT upon a timeout.
> + */
> +#define readl_poll_timeout_atomic(addr, val, cond, max_reads, time_between_us) \
> +({ \
> +       int count; \
> +       for (count = (max_reads); count > 0; count--) { \
> +               (val) = readl(addr); \
> +               if (cond) \
> +                       break; \
> +               udelay(time_between_us); \
> +       } \
> +       (cond) ? 0 : -ETIMEDOUT; \
> +})

udelay has a large variability, I think it would be better to also use
ktime_compare here and make the interface the same as the other one.
You might want to add a warning if someone tries to pass more than a few
microseconds as the timeout.

More generally speaking, using 'readl' seems fairly specific. I suspect
that we'd have to add the entire range of accessors over time if this
catches on: readb, readw, readq, readb_relaxed, readw_relaxed, readl_relaxed,
readq_relaxed, ioread8, ioread16, ioread16be, ioread32, ioread32be,
inb, inb_p, inw, inw_p, inl, inl_p, and possibly more of those.

Would it make sense to pass that operation as an argument?
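
Something like a generic

	readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us)

whose body is the loop above with "(val) = op(addr);" in place of the
readl(), so that the individual accessors reduce to one-line wrappers
(a sketch of the idea, not a tested implementation):

	#define readl_poll_timeout(addr, val, cond, sleep_us, timeout_us) \
		readx_poll_timeout(readl, addr, val, cond, sleep_us, timeout_us)
	#define readl_relaxed_poll_timeout(addr, val, cond, sleep_us, timeout_us) \
		readx_poll_timeout(readl_relaxed, addr, val, cond, sleep_us, timeout_us)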

	Arnd

* [PATCH v4 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR
  2014-10-01  1:28 ` [PATCH v4 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR Mitchel Humpherys
@ 2014-10-01  8:27   ` Arnd Bergmann
  2014-10-01 19:52     ` Mitchel Humpherys
  0 siblings, 1 reply; 12+ messages in thread
From: Arnd Bergmann @ 2014-10-01  8:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 30 September 2014 18:28:13 Mitchel Humpherys wrote:
> +       if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp,
> +                               !(tmp & ATSR_ACTIVE), 50, 100)) {
> 

This looks really bad.

You are doing up to 50 delays of 100us each (5ms nominally), and each
udelay() can take much longer than requested, so you can easily end up
with 10ms of total delay with interrupts disabled.

Don't do that.

	Arnd

* [PATCH v4 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR
  2014-10-01  8:27   ` Arnd Bergmann
@ 2014-10-01 19:52     ` Mitchel Humpherys
  0 siblings, 0 replies; 12+ messages in thread
From: Mitchel Humpherys @ 2014-10-01 19:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Oct 01 2014 at 01:27:27 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tuesday 30 September 2014 18:28:13 Mitchel Humpherys wrote:
>> +       if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp,
>> +                               !(tmp & ATSR_ACTIVE), 50, 100)) {
>> 
>
> This looks really bad.
>
> You are doing up to 50 delays of 100us each (5ms nominally), and each
> udelay() can take much longer than requested, so you can easily end up
> with 10ms of total delay with interrupts disabled.
>
> Don't do that.

Oh wow, somehow I forgot I was in atomic context even though I was
explicitly moving to the `_atomic' polling function in this version.
Don't ask.

Let me ratchet that down to a maximum of 10 delays of 5 microseconds
each for v5.
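
I.e. the call would become something like (untested):

       if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp,
                                     !(tmp & ATSR_ACTIVE), 10, 5)) {

which bounds the spin to roughly 50us of delay with interrupts disabled.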


-Mitch

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

* [PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros
  2014-10-01  8:25   ` Arnd Bergmann
@ 2014-10-08  1:47     ` Mitchel Humpherys
  2014-10-08 13:40       ` Arnd Bergmann
  2014-10-09 22:45       ` Mitchel Humpherys
  0 siblings, 2 replies; 12+ messages in thread
From: Mitchel Humpherys @ 2014-10-08  1:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Oct 01 2014 at 01:25:33 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tuesday 30 September 2014 18:28:12 Mitchel Humpherys wrote:
>> + */
>> +#define readl_poll_timeout(addr, val, cond, sleep_us, timeout_us) \
>> +({ \
>> +       ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
>> +       might_sleep_if(timeout_us); \
>
> Does it make sense to call this with timeout_us = 0?

Yes, the idea there being to "never timeout".  That mode should, of
course, be used with extreme caution since never timing out is not
really "playing nice" with the system.

>
>> +       for (;;) { \
>> +               (val) = readl(addr); \
>> +               if (cond) \
>> +                       break; \
>> +               if (timeout_us && ktime_compare(ktime_get(), timeout) > 0) { \
>> +                       (val) = readl(addr); \
>> +                       break; \
>> +               } \
>> +               if (sleep_us) \
>> +                       usleep_range(DIV_ROUND_UP(sleep_us, 4), sleep_us); \
>> +       } \
>> +       (cond) ? 0 : -ETIMEDOUT; \
>> +})
>
> I think it would be better to tie the 'range' argument to the timeout. Also
> doing a division seems expensive here.

We may have cases where the HW spec says something like "the foo widget
response time is on average 5us, with a worst case of 100us."  In such a
case we may want to poll the bit very frequently to optimize for the
common case of a very fast lock time, but we may not want to error out
due to a timeout unless we've been waiting 100us.
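
Concretely, for such a (made-up) widget the call would look something
like:

    /* average case is ~5us, so poll often, but allow up to 100us */
    ret = readl_poll_timeout(base + FOO_STATUS, val,
                             val & FOO_DONE, 5, 100);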

Regarding the division, for the overwhelmingly common case where the
user of the API passes in a constant for sleep_us the compiler optimizes
out this calculation altogether and just sticks the final result in (I
verified this with gcc 4.9 and the kernel build system's built-in
support for generating .s files).  Conveying semantic meaning by using
`DIV_ROUND_UP' is nice but if you feel strongly about it we can make
this a shift instead.

>
>> +/**
>> + * readl_poll_timeout_atomic - Periodically poll an address until a condition is met or a timeout occurs
>> + * @addr: Address to poll
>> + * @val: Variable to read the value into
>> + * @cond: Break condition (usually involving @val)
>> + * @max_reads: Maximum number of reads before giving up
>> + * @time_between_us: Time to udelay() between successive reads
>> + *
>> + * Returns 0 on success and -ETIMEDOUT upon a timeout.
>> + */
>> +#define readl_poll_timeout_atomic(addr, val, cond, max_reads, time_between_us) \
>> +({ \
>> +       int count; \
>> +       for (count = (max_reads); count > 0; count--) { \
>> +               (val) = readl(addr); \
>> +               if (cond) \
>> +                       break; \
>> +               udelay(time_between_us); \
>> +       } \
>> +       (cond) ? 0 : -ETIMEDOUT; \
>> +})
>
> udelay has a large variability, I think it would be better to also use
> ktime_compare here and make the interface the same as the other one.
> You might want to add a warning if someone tries to pass more than a few
> microseconds as the timeout.

Sounds good, will update in v5.

>
> More generally speaking, using 'readl' seems fairly specific. I suspect
> that we'd have to add the entire range of accessors over time if this
> catches on: readb, readw, readq, readb_relaxed, readw_relaxed, readl_relaxed,
> readq_relaxed, ioread8, ioread16, ioread16be, ioread32, ioread32be,
> inb, inb_p, inw, inw_p, inl, inl_p, and possibly more of those.
>
> Would it make sense to pass that operation as an argument?

Sure, we'll do that in v5 as well.



-Mitch

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

* [PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros
  2014-10-08  1:47     ` Mitchel Humpherys
@ 2014-10-08 13:40       ` Arnd Bergmann
  2014-10-10 19:44         ` Mitchel Humpherys
  2014-10-09 22:45       ` Mitchel Humpherys
  1 sibling, 1 reply; 12+ messages in thread
From: Arnd Bergmann @ 2014-10-08 13:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 07 October 2014 18:47:59 Mitchel Humpherys wrote:
> On Wed, Oct 01 2014 at 01:25:33 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Tuesday 30 September 2014 18:28:12 Mitchel Humpherys wrote:
> >> + */
> >> +#define readl_poll_timeout(addr, val, cond, sleep_us, timeout_us) \
> >> +({ \
> >> +       ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
> >> +       might_sleep_if(timeout_us); \
> >
> > Does it make sense to call this with timeout_us = 0?
> 
> Yes, the idea there being to "never timeout".  That mode should, of
> course, be used with extreme caution since never timing out is not
> really "playing nice" with the system.

But then you certainly can still sleep here. The
might_sleep_if(timeout_us) line suggests that the macro won't sleep when
timeout_us is 0, but that isn't the case: it can still sleep whenever
sleep_us is non-zero.

> >
> >> +       for (;;) { \
> >> +               (val) = readl(addr); \
> >> +               if (cond) \
> >> +                       break; \
> >> +               if (timeout_us && ktime_compare(ktime_get(), timeout) > 0) { \
> >> +                       (val) = readl(addr); \
> >> +                       break; \
> >> +               } \
> >> +               if (sleep_us) \
> >> +                       usleep_range(DIV_ROUND_UP(sleep_us, 4), sleep_us); \
> >> +       } \
> >> +       (cond) ? 0 : -ETIMEDOUT; \
> >> +})
> >
> > I think it would be better to tie the 'range' argument to the timeout. Also
> > doing a division seems expensive here.
> 
> We may have cases where the HW spec says something like "the foo widget
> response time is on average 5us, with a worst case of 100us."  In such a
> case we may want to poll the bit very frequently to optimize for the
> common case of a very fast lock time, but we may not want to error out
> due to a timeout unless we've been waiting 100us.

Ok.

> Regarding the division, for the overwhelmingly common case where the
> user of the API passes in a constant for sleep_us the compiler optimizes
> out this calculation altogether and just sticks the final result in (I
> verified this with gcc 4.9 and the kernel build system's built-in
> support for generating .s files).  Conveying semantic meaning by using
> `DIV_ROUND_UP' is nice but if you feel strongly about it we can make
> this a shift instead.

The more important question is probably if you want to keep the _ROUND_UP
part. If that's not significant, I think a shift would be better.

	Arnd

* [PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros
  2014-10-08  1:47     ` Mitchel Humpherys
  2014-10-08 13:40       ` Arnd Bergmann
@ 2014-10-09 22:45       ` Mitchel Humpherys
  1 sibling, 0 replies; 12+ messages in thread
From: Mitchel Humpherys @ 2014-10-09 22:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Oct 07 2014 at 06:47:59 PM, Mitchel Humpherys <mitchelh@codeaurora.org> wrote:
>>> +#define readl_poll_timeout_atomic(addr, val, cond, max_reads, time_between_us) \
>>> +({ \
>>> +       int count; \
>>> +       for (count = (max_reads); count > 0; count--) { \
>>> +               (val) = readl(addr); \
>>> +               if (cond) \
>>> +                       break; \
>>> +               udelay(time_between_us); \
>>> +       } \
>>> +       (cond) ? 0 : -ETIMEDOUT; \
>>> +})
>>
>> udelay has a large variability, I think it would be better to also use
>> ktime_compare here and make the interface the same as the other one.
>> You might want to add a warning if someone tries to pass more than a few
>> microseconds as the timeout.
>
> Sounds good, will update in v5.

Except I'll probably hold off on adding a warning about udelay since
udelay already includes a "warning" (a compile error, actually) when
exceedingly large delays are requested.


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

* [PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros
  2014-10-08 13:40       ` Arnd Bergmann
@ 2014-10-10 19:44         ` Mitchel Humpherys
  2014-10-10 19:50           ` Arnd Bergmann
  0 siblings, 1 reply; 12+ messages in thread
From: Mitchel Humpherys @ 2014-10-10 19:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Oct 08 2014 at 06:40:46 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tuesday 07 October 2014 18:47:59 Mitchel Humpherys wrote:
>> On Wed, Oct 01 2014 at 01:25:33 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Tuesday 30 September 2014 18:28:12 Mitchel Humpherys wrote:
>> >> + */
>> >> +#define readl_poll_timeout(addr, val, cond, sleep_us, timeout_us) \
>> >> +({ \
>> >> +       ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
>> >> +       might_sleep_if(timeout_us); \
>> >
>> > Does it make sense to call this with timeout_us = 0?
>> 
>> Yes, the idea there being to "never timeout".  That mode should, of
>> course, be used with extreme caution since never timing out is not
>> really "playing nice" with the system.
>
> But then you certainly still 'might_sleep' here. The
> might_sleep_if(timeout_us) line suggests that it won't sleep, but
> that isn't the case.

Yes, it looks like that was actually a bug: we should have been
might_sleep_if()'ing on sleep_us.  This is fixed in the v5 I just sent
out.
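
I.e. the line becomes:

	might_sleep_if(sleep_us); \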


[...]

>> Regarding the division, for the overwhelmingly common case where the
>> user of the API passes in a constant for sleep_us the compiler optimizes
>> out this calculation altogether and just sticks the final result in (I
>> verified this with gcc 4.9 and the kernel build system's built-in
>> support for generating .s files).  Conveying semantic meaning by using
>> `DIV_ROUND_UP' is nice but if you feel strongly about it we can make
>> this a shift instead.
>
> The more important question is probably if you want to keep the _ROUND_UP
> part. If that's not significant, I think a shift would be better.

If we drop the _ROUND_UP then passing a sleep_us <= 4 would result in a
minimum sleep time of 0, so we'd be polling a lot faster than the user
had expected.
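
For example, with sleep_us = 3 the current code sleeps with
usleep_range(DIV_ROUND_UP(3, 4), 3) == usleep_range(1, 3), whereas a
plain 3 / 4 would give usleep_range(0, 3), i.e. a minimum sleep of 0.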



-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

* [PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros
  2014-10-10 19:44         ` Mitchel Humpherys
@ 2014-10-10 19:50           ` Arnd Bergmann
  2014-10-10 20:24             ` Mitchel Humpherys
  0 siblings, 1 reply; 12+ messages in thread
From: Arnd Bergmann @ 2014-10-10 19:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday 10 October 2014 12:44:45 Mitchel Humpherys wrote:
> >> Regarding the division, for the overwhelmingly common case where the
> >> user of the API passes in a constant for sleep_us the compiler optimizes
> >> out this calculation altogether and just sticks the final result in (I
> >> verified this with gcc 4.9 and the kernel build system's built-in
> >> support for generating .s files).  Conveying semantic meaning by using
> >> `DIV_ROUND_UP' is nice but if you feel strongly about it we can make
> >> this a shift instead.
> >
> > The more important question is probably if you want to keep the _ROUND_UP
> > part. If that's not significant, I think a shift would be better.
> 
> If we drop the _ROUND_UP then passing a sleep_us <= 4 would result in a
> minimum sleep time of 0, so we'd be polling a lot faster than the user
> had expected.

How about changing the semantics to sleep at least the sleep_us time,
and at most four times that? This would turn the expensive division into
a multiplication and avoid the need for rounding.

If there are important reasons to keep doing the division, you could
instead use '(sleep_us >> 4) + 1', which is also very cheap to compute
and avoids the problem you mention.

	Arnd

* [PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros
  2014-10-10 19:50           ` Arnd Bergmann
@ 2014-10-10 20:24             ` Mitchel Humpherys
  0 siblings, 0 replies; 12+ messages in thread
From: Mitchel Humpherys @ 2014-10-10 20:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Oct 10 2014 at 12:50:23 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Friday 10 October 2014 12:44:45 Mitchel Humpherys wrote:
>> >> Regarding the division, for the overwhelmingly common case where the
>> >> user of the API passes in a constant for sleep_us the compiler optimizes
>> >> out this calculation altogether and just sticks the final result in (I
>> >> verified this with gcc 4.9 and the kernel build system's built-in
>> >> support for generating .s files).  Conveying semantic meaning by using
>> >> `DIV_ROUND_UP' is nice but if you feel strongly about it we can make
>> >> this a shift instead.
>> >
>> > The more important question is probably if you want to keep the _ROUND_UP
>> > part. If that's not significant, I think a shift would be better.
>> 
>> If we drop the _ROUND_UP then passing a sleep_us <= 4 would result in a
>> minimum sleep time of 0, so we'd be polling a lot faster than the user
>> had expected.
>
> How about changing the semantics to sleep at least the sleep_us time,
> and at most four times that? This would turn the expensive division into
> a multiplication and avoid the need for rounding.
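
I.e. roughly:

	if (sleep_us) \
		usleep_range(sleep_us, (sleep_us) * 4); \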

We already have a bunch of code using this and I'm not sure what effect
changing the semantics would have on the system.  It would probably be
negligible, but saving a couple of instructions hardly seems like
justification for a change...  More importantly, I think users would
rather have their register polled a little more often than they asked
for than less often.

> If there are important reasons to keep doing the division, you could
> instead use '(sleep_us >> 4) + 1', which is also very cheap to compute
> and avoids the problem you mention.

But I think you meant `(sleep_us >> 2) + 1', right?  Incidentally, that
illustrates the benefit of the semantic clarity provided by explicitly
dividing :).  In any case, I'm happy to go with a shift here for v6.
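
I.e.:

	if (sleep_us) \
		usleep_range((sleep_us >> 2) + 1, sleep_us); \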


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

