* [PATCH V4 0/6] iommu/arm-smmu: Add runtime pm/sleep support
@ 2017-07-06  9:36 ` Vivek Gautam
  0 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-06  9:36 UTC (permalink / raw)
  To: joro, robin.murphy, robh+dt, mark.rutland, will.deacon,
	m.szyprowski, sboyd, robdclark, iommu, devicetree, linux-kernel,
	linux-clk
  Cc: linux-arm-msm, sricharan, stanimir.varbanov, architt,
	vivek.gautam, linux-arm-kernel

This series adds support for turning on the arm-smmu's
clocks/power domains using runtime pm. This is done using the
recently introduced device links patches, which let the smmu's
runtime pm follow the master's runtime pm, so the smmu remains
powered only when the masters use it.
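
To illustrate the idea of a supplier following its consumers' runtime pm,
here is a minimal userspace sketch. It is not kernel code: struct model_node,
model_get() and model_put() are hypothetical names that only model usage
counting across a link, assuming the supplier is resumed before the consumer
and released after it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a device with an optional supplier link. */
struct model_node {
	struct model_node *supplier;	/* NULL when there is no link */
	int usage;			/* usage counter */
	int powered;			/* 1 while the node is "on" */
};

/* Take a reference; the supplier (e.g. the smmu) is powered first. */
static void model_get(struct model_node *n)
{
	if (n->supplier)
		model_get(n->supplier);
	if (n->usage++ == 0)
		n->powered = 1;
}

/* Drop a reference; the supplier is released after the consumer. */
static void model_put(struct model_node *n)
{
	assert(n->usage > 0);
	if (--n->usage == 0)
		n->powered = 0;
	if (n->supplier)
		model_put(n->supplier);
}
```

In this model, two masters sharing one smmu keep it powered until the last
of them drops its reference.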

Took some reference from the exynos runtime patches [2].
Tested this with MDP, GPU, and VENUS devices on apq8096-db820c board.

Previous version of the patchset [1].

[V4]
   * Reworked the clock handling part. We now take clock names as data
     in the driver for supported compatible versions, and loop over them
     to get, enable, and disable the clocks.
   * Using qcom,msm8996 based compatibles for bindings instead of a generic
     qcom compatible.
   * Refactor MMU500 patch to just add the necessary clock names data and
     corresponding bindings.
   * Added the pm_runtime_get/put() calls in .unmap iommu op (fix added by
     Stanimir on top of previous patch version).
   * Added a patch to fix error path in arm_smmu_add_device()
   * Removed patch 3/5 of V3 patch series that added qcom,smmu-v2 bindings.

[V3]
   * Reworked the patches to keep the clocks init/enabling function
     separately for each compatible.

   * Added clocks bindings for MMU40x/500.

   * Added a new compatible for qcom,smmu-v2 implementation and
     the clock bindings for the same.

   * Rebased on top of 4.11-rc1

[V2]
   * Split the patches a little differently.

   * Addressed comments.

   * Removed the patch #4 [3] from previous post
     for arm-smmu context save restore. Planning to
     post this separately after reworking/addressing Robin's
     feedback.

   * Reversed the sequence to disable clocks relative to enabling them.
     This was required for those cases where the
     clocks are populated in a dependent order from DT.

[1] https://www.spinics.net/lists/arm-kernel/msg567488.html
[2] https://lkml.org/lkml/2016/10/20/70
[3] https://patchwork.kernel.org/patch/9389717/

Sricharan R (4):
  iommu/arm-smmu: Add pm_runtime/sleep ops
  iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  iommu/arm-smmu: Add the device_link between masters and smmu
  iommu/arm-smmu: Add support for MMU40x/500 clocks

Vivek Gautam (2):
  iommu/arm-smmu: Fix the error path in arm_smmu_add_device
  iommu/arm-smmu: Add support for qcom,msm8996-smmu-v2 clocks

 .../devicetree/bindings/iommu/arm,smmu.txt         |  42 +++++
 drivers/iommu/arm-smmu.c                           | 191 +++++++++++++++++++--
 2 files changed, 222 insertions(+), 11 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


* [PATCH V4 1/6] iommu/arm-smmu: Fix the error path in arm_smmu_add_device
  2017-07-06  9:36 ` Vivek Gautam
@ 2017-07-06  9:37   ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-06  9:37 UTC (permalink / raw)
  To: joro, robin.murphy, robh+dt, mark.rutland, will.deacon,
	m.szyprowski, sboyd, robdclark, iommu, devicetree, linux-kernel,
	linux-clk
  Cc: linux-arm-msm, sricharan, stanimir.varbanov, architt,
	vivek.gautam, linux-arm-kernel

fwspec->iommu_priv is available only after arm_smmu_master_cfg
instance has been allocated. We shouldn't free it before that.
Also it's logical to free the master cfg itself without
checking for fwspec.
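
The label ordering used by this fix can be sketched in plain userspace C.
This is only a model under stated assumptions: model_add_device(), the
fail_at parameter and the two counters are hypothetical and stand in for
the real allocation and free calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Counters let a caller check what each error path released. */
static int cfg_frees, fwspec_frees;

/* fail_at: 0 = no failure, 1 = fail before cfg exists, 2 = fail after. */
static int model_add_device(int fail_at)
{
	char *cfg;
	int ret = 0;

	if (fail_at == 1) {
		ret = -1;
		goto out_free;		/* cfg was never allocated */
	}

	cfg = malloc(16);
	if (!cfg) {
		ret = -1;
		goto out_free;
	}

	if (fail_at == 2) {
		ret = -1;
		goto out_cfg_free;	/* cfg exists and must be released */
	}

	free(cfg);	/* model only: a real driver keeps cfg on success */
	return 0;

out_cfg_free:
	free(cfg);
	cfg_frees++;
out_free:
	fwspec_frees++;			/* stands in for the fwspec cleanup */
	return ret;
}
```

Failures before the allocation jump past the cfg cleanup, while later
failures fall through both labels.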

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 9a45117d90de..61b1f8729a7c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1349,15 +1349,15 @@ static int arm_smmu_add_device(struct device *dev)
 
 	ret = arm_smmu_master_alloc_smes(dev);
 	if (ret)
-		goto out_free;
+		goto out_cfg_free;
 
 	iommu_device_link(&smmu->iommu, dev);
 
 	return 0;
 
+out_cfg_free:
+	kfree(cfg);
 out_free:
-	if (fwspec)
-		kfree(fwspec->iommu_priv);
 	iommu_fwspec_free(dev);
 	return ret;
 }
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



* [PATCH V4 2/6] iommu/arm-smmu: Add pm_runtime/sleep ops
  2017-07-06  9:36 ` Vivek Gautam
  (?)
@ 2017-07-06  9:37   ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-06  9:37 UTC (permalink / raw)
  To: joro, robin.murphy, robh+dt, mark.rutland, will.deacon,
	m.szyprowski, sboyd, robdclark, iommu, devicetree, linux-kernel,
	linux-clk
  Cc: linux-arm-msm, sricharan, stanimir.varbanov, architt,
	vivek.gautam, linux-arm-kernel

From: Sricharan R <sricharan@codeaurora.org>

The smmu needs to be functional only when the respective
masters using it are active. The device_link feature
helps to track such functional dependencies, so that the
iommu gets powered when the master device enables itself
using pm_runtime. So by adapting the smmu driver for
runtime pm, the above dependency can be addressed.

This patch adds the pm runtime/sleep callbacks to the
driver and also the functions to parse the smmu clocks
from DT and enable/disable them in resume/suspend.
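
The enable/disable ordering can be tried outside the kernel with a small
model. This is a sketch, not the driver code: struct fake_clk and the
fake_clk_*() helpers are hypothetical stand-ins for the clk API, and the
clock names in any test data are made up.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a clock handle; not the kernel's struct clk. */
struct fake_clk {
	const char *name;
	int enabled;		/* 1 while the clock is on */
	int fail_enable;	/* set to force an enable failure */
};

static int fake_clk_enable(struct fake_clk *c)
{
	if (c->fail_enable)
		return -1;
	c->enabled = 1;
	return 0;
}

static void fake_clk_disable(struct fake_clk *c)
{
	c->enabled = 0;
}

/* Enable in array order; on failure undo the clocks already enabled. */
static int enable_all(struct fake_clk *clks, int n)
{
	int i, ret = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		ret = fake_clk_enable(&clks[i]);
		if (ret) {
			fprintf(stderr, "couldn't enable %s\n", clks[i].name);
			while (i--)
				fake_clk_disable(&clks[i]);
			break;
		}
	}
	return ret;
}

/* Disable in the reverse of the enable order. */
static void disable_all(struct fake_clk *clks, int n)
{
	while (n--)
		fake_clk_disable(&clks[n]);
}

/* Helper for checking the model: how many clocks are currently on. */
static int count_enabled(const struct fake_clk *clks, int n)
{
	int i, on = 0;

	for (i = 0; i < n; i++)
		on += clks[i].enabled;
	return on;
}
```

A failure part-way through leaves no clock enabled, which is the property
the rollback loop is there to provide.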

Signed-off-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Archit Taneja <architt@codeaurora.org>
[vivek: Clock rework to loop over clock names data]
Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 94 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 61b1f8729a7c..bfe613f8939c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -48,6 +48,7 @@
 #include <linux/of_iommu.h>
 #include <linux/pci.h>
 #include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 
@@ -196,6 +197,9 @@ struct arm_smmu_device {
 	u32				num_global_irqs;
 	u32				num_context_irqs;
 	unsigned int			*irqs;
+	int                             num_clks;
+	struct clk                      **clocks;
+	const char * const		*clk_names;
 
 	u32				cavium_id_base; /* Specific to Cavium */
 
@@ -272,6 +276,32 @@ static void parse_driver_options(struct arm_smmu_device *smmu)
 	} while (arm_smmu_options[++i].opt);
 }
 
+static int arm_smmu_enable_clocks(struct arm_smmu_device *smmu)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < smmu->num_clks; ++i) {
+		ret = clk_prepare_enable(smmu->clocks[i]);
+		if (ret) {
+			dev_err(smmu->dev, "Couldn't enable %s clock\n",
+				smmu->clk_names[i]);
+			while (i--)
+				clk_disable_unprepare(smmu->clocks[i]);
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static void arm_smmu_disable_clocks(struct arm_smmu_device *smmu)
+{
+	int i = smmu->num_clks;
+
+	while (i--)
+		clk_disable_unprepare(smmu->clocks[i]);
+}
+
 static struct device_node *dev_get_dev_node(struct device *dev)
 {
 	if (dev_is_pci(dev)) {
@@ -1626,6 +1656,36 @@ static int arm_smmu_id_size_to_bits(int size)
 	}
 }
 
+static int arm_smmu_init_clocks(struct arm_smmu_device *smmu)
+{
+	int i, err;
+	struct device *dev = smmu->dev;
+
+	if (smmu->num_clks < 1)
+		return 0;
+
+	smmu->clocks = devm_kcalloc(dev, smmu->num_clks,
+				    sizeof(*smmu->clocks), GFP_KERNEL);
+	if (!smmu->clocks)
+		return -ENOMEM;
+
+	for (i = 0; i < smmu->num_clks; i++) {
+		const char *cname = smmu->clk_names[i];
+		struct clk *c = devm_clk_get(dev, cname);
+
+		if (IS_ERR(c)) {
+			err = PTR_ERR(c);
+			if (err != -EPROBE_DEFER)
+				dev_err(dev, "Couldn't get clock: %s", cname);
+
+			return err;
+		}
+		smmu->clocks[i] = c;
+	}
+
+	return 0;
+}
+
 static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 {
 	unsigned long size;
@@ -1833,10 +1893,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 struct arm_smmu_match_data {
 	enum arm_smmu_arch_version version;
 	enum arm_smmu_implementation model;
+	const char * const *clks;
+	int num_clks;
 };
 
 #define ARM_SMMU_MATCH_DATA(name, ver, imp)	\
-static struct arm_smmu_match_data name = { .version = ver, .model = imp }
+static const struct arm_smmu_match_data name = { .version = ver, .model = imp }
 
 ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
 ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
@@ -1937,6 +1999,8 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
 	data = of_device_get_match_data(dev);
 	smmu->version = data->version;
 	smmu->model = data->model;
+	smmu->clk_names = data->clks;
+	smmu->num_clks = data->num_clks;
 
 	parse_driver_options(smmu);
 
@@ -2035,6 +2099,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 		smmu->irqs[i] = irq;
 	}
 
+	err = arm_smmu_init_clocks(smmu);
+	if (err)
+		return err;
+
 	err = arm_smmu_device_cfg_probe(smmu);
 	if (err)
 		return err;
@@ -2120,10 +2188,35 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
 	return 0;
 }
 
+#ifdef CONFIG_PM
+static int arm_smmu_resume(struct device *dev)
+{
+	struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+	return arm_smmu_enable_clocks(smmu);
+}
+
+static int arm_smmu_suspend(struct device *dev)
+{
+	struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+	arm_smmu_disable_clocks(smmu);
+
+	return 0;
+}
+#endif
+
+static const struct dev_pm_ops arm_smmu_pm_ops = {
+	SET_RUNTIME_PM_OPS(arm_smmu_suspend, arm_smmu_resume, NULL)
+	SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
+				pm_runtime_force_resume)
+};
+
 static struct platform_driver arm_smmu_driver = {
 	.driver	= {
 		.name		= "arm-smmu",
 		.of_match_table	= of_match_ptr(arm_smmu_of_match),
+		.pm = &arm_smmu_pm_ops,
 	},
 	.probe	= arm_smmu_device_probe,
 	.remove	= arm_smmu_device_remove,
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


* [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-06  9:36 ` Vivek Gautam
@ 2017-07-06  9:37   ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-06  9:37 UTC (permalink / raw)
  To: joro, robin.murphy, robh+dt, mark.rutland, will.deacon,
	m.szyprowski, sboyd, robdclark, iommu, devicetree, linux-kernel,
	linux-clk
  Cc: linux-arm-msm, sricharan, stanimir.varbanov, architt,
	vivek.gautam, linux-arm-kernel

From: Sricharan R <sricharan@codeaurora.org>

The smmu device probe/remove and add/remove master device callbacks
get called when the smmu is not linked to its master, that is, without
the context of the master device. So call the runtime pm apis in those
places explicitly.
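
For reference, pm_runtime_get_sync() returns a negative value on failure,
0 when it resumed the device and 1 when the device was already active, and
it increments the usage counter even when the resume fails. The sketch
below models that convention in userspace. All model_*() names and struct
model_dev are hypothetical, and this is not the code added by this patch.

```c
#include <assert.h>

/* Hypothetical userspace model of a runtime-pm device; not kernel code. */
struct model_dev {
	int usage;	/* usage counter */
	int active;	/* 1 while "powered" */
	int resume_err;	/* negative value to simulate a failing resume */
};

/* Follows the return convention described above. */
static int model_get_sync(struct model_dev *d)
{
	d->usage++;		/* counter is bumped even if resume fails */
	if (d->active)
		return 1;	/* already active: positive, not an error */
	if (d->resume_err)
		return d->resume_err;
	d->active = 1;
	return 0;
}

/* Drop a reference without powering down (used on the error path). */
static void model_put_noidle(struct model_dev *d)
{
	d->usage--;
}

/* Drop a reference and power down when it was the last one. */
static void model_put_sync(struct model_dev *d)
{
	if (--d->usage == 0)
		d->active = 0;
}

/* Run fn (may be a null pointer) only while holding a reference. */
static int with_device_powered(struct model_dev *d,
			       void (*fn)(struct model_dev *))
{
	int ret = model_get_sync(d);

	if (ret < 0) {			/* only negative values are failures */
		model_put_noidle(d);	/* balance the increment */
		return ret;
	}
	if (fn)
		fn(d);
	model_put_sync(d);
	return 0;
}
```

Under these assumptions a caller that is entered while the device is
already active still proceeds, and a failed resume leaves the usage
counter balanced.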

Signed-off-by: Sricharan R <sricharan@codeaurora.org>
[stanimir: added runtime pm in .unmap iommu op]
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
[vivek: Cleanup pm runtime calls]
Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 54 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 48 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index bfe613f8939c..ddbfa8ab69e6 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -897,11 +897,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
 	void __iomem *cb_base;
-	int irq;
+	int ret, irq;
 
 	if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY)
 		return;
 
+	ret = pm_runtime_get_sync(smmu->dev);
+	if (ret)
+		return;
+
 	/*
 	 * Disable the context bank and free the page tables before freeing
 	 * it.
@@ -916,6 +920,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
 
 	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
 	__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
+
+	pm_runtime_put_sync(smmu->dev);
 }
 
 static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
@@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 			     size_t size)
 {
-	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+	size_t ret;
 
 	if (!ops)
 		return 0;
 
-	return ops->unmap(ops, iova, size);
+	pm_runtime_get_sync(smmu_domain->smmu->dev);
+	ret = ops->unmap(ops, iova, size);
+	pm_runtime_put_sync(smmu_domain->smmu->dev);
+
+	return ret;
 }
 
 static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
@@ -1377,12 +1389,20 @@ static int arm_smmu_add_device(struct device *dev)
 	while (i--)
 		cfg->smendx[i] = INVALID_SMENDX;
 
-	ret = arm_smmu_master_alloc_smes(dev);
+	ret = pm_runtime_get_sync(smmu->dev);
 	if (ret)
 		goto out_cfg_free;
 
+	ret = arm_smmu_master_alloc_smes(dev);
+	if (ret) {
+		pm_runtime_put_sync(smmu->dev);
+		goto out_cfg_free;
+	}
+
 	iommu_device_link(&smmu->iommu, dev);
 
+	pm_runtime_put_sync(smmu->dev);
+
 	return 0;
 
 out_cfg_free:
@@ -1397,7 +1417,7 @@ static void arm_smmu_remove_device(struct device *dev)
 	struct iommu_fwspec *fwspec = dev->iommu_fwspec;
 	struct arm_smmu_master_cfg *cfg;
 	struct arm_smmu_device *smmu;
-
+	int ret;
 
 	if (!fwspec || fwspec->ops != &arm_smmu_ops)
 		return;
@@ -1405,8 +1425,21 @@ static void arm_smmu_remove_device(struct device *dev)
 	cfg  = fwspec->iommu_priv;
 	smmu = cfg->smmu;
 
+	/*
+	 * The device link between the master device and
+	 * smmu is already purged at this point.
+	 * So enable the power to smmu explicitly.
+	 */
+
+	ret = pm_runtime_get_sync(smmu->dev);
+	if (ret)
+		return;
+
 	iommu_device_unlink(&smmu->iommu, dev);
 	arm_smmu_master_free_smes(fwspec);
+
+	pm_runtime_put_sync(smmu->dev);
+
 	iommu_group_remove_device(dev);
 	kfree(fwspec->iommu_priv);
 	iommu_fwspec_free(dev);
@@ -2103,6 +2136,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	if (err)
 		return err;
 
+	platform_set_drvdata(pdev, smmu);
+	pm_runtime_enable(dev);
+
+	err = pm_runtime_get_sync(dev);
+	if (err)
+		return err;
+
 	err = arm_smmu_device_cfg_probe(smmu);
 	if (err)
 		return err;
@@ -2144,9 +2184,9 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 		return err;
 	}
 
-	platform_set_drvdata(pdev, smmu);
 	arm_smmu_device_reset(smmu);
 	arm_smmu_test_smr_masks(smmu);
+	pm_runtime_put_sync(dev);
 
 	/*
 	 * For ACPI and generic DT bindings, an SMMU will be probed before
@@ -2185,6 +2225,8 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
 
 	/* Turn the thing off */
 	writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
+	pm_runtime_force_suspend(smmu->dev);
+
 	return 0;
 }
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related	[flat|nested] 168+ messages in thread

* [PATCH V4 4/6] iommu/arm-smmu: Add the device_link between masters and smmu
  2017-07-06  9:36 ` Vivek Gautam
@ 2017-07-06  9:37   ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-06  9:37 UTC (permalink / raw)
  To: joro, robin.murphy, robh+dt, mark.rutland, will.deacon,
	m.szyprowski, sboyd, robdclark, iommu, devicetree, linux-kernel,
	linux-clk
  Cc: linux-arm-msm, sricharan, stanimir.varbanov, architt,
	vivek.gautam, linux-arm-kernel

From: Sricharan R <sricharan@codeaurora.org>

Finally, add the device link between the master device and the
smmu, so that the smmu gets runtime enabled/disabled only when the
master needs it. This is done from the add_device callback, which is
called once when the master is added to the smmu.

Signed-off-by: Sricharan R <sricharan@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index ddbfa8ab69e6..75567d9698ab 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1348,6 +1348,7 @@ static int arm_smmu_add_device(struct device *dev)
 	struct arm_smmu_device *smmu;
 	struct arm_smmu_master_cfg *cfg;
 	struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+	struct device_link *link = NULL;
 	int i, ret;
 
 	if (using_legacy_binding) {
@@ -1403,6 +1404,16 @@ static int arm_smmu_add_device(struct device *dev)
 
 	pm_runtime_put_sync(smmu->dev);
 
+	/*
+	 * Establish the link between smmu and master, so that the
+	 * smmu gets runtime enabled/disabled as per the master's
+	 * needs.
+	 */
+	link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
+	if (!link)
+		dev_warn(smmu->dev, "Unable to create device link between %s and %s\n",
+			 dev_name(smmu->dev), dev_name(dev));
+
 	return 0;
 
 out_cfg_free:
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related	[flat|nested] 168+ messages in thread
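
The effect of a DL_FLAG_PM_RUNTIME link in patch 4/6, where the master is the
consumer and the smmu is the supplier, can be pictured with a small userspace
model. This is a sketch under simplifying assumptions: one supplier per
consumer, no error paths, and a link that is created while the consumer is
still suspended. struct model_dev and the model_*() helpers are hypothetical
names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device with a runtime-PM usage counter. */
struct model_dev {
	const char *name;
	int usage_count;
	bool active;
	struct model_dev *supplier; /* at most one link in this sketch */
};

/* Record that 'consumer' depends on 'supplier' for runtime PM. */
static void model_link_add(struct model_dev *consumer,
			   struct model_dev *supplier)
{
	consumer->supplier = supplier;
}

/* Take a reference; the first reference resumes the supplier first. */
static void model_get(struct model_dev *dev)
{
	if (dev->usage_count++ > 0)
		return;
	if (dev->supplier)
		model_get(dev->supplier);
	dev->active = true;
}

/* Drop a reference; the last one suspends, then releases the supplier. */
static void model_put(struct model_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count > 0)
		return;
	dev->active = false;
	if (dev->supplier)
		model_put(dev->supplier);
}
```

With two masters linked to one smmu in this model, the smmu becomes active
when the first master resumes and is suspended again only after the last
master has been suspended.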

* [PATCH V4 5/6] iommu/arm-smmu: Add support for MMU40x/500 clocks
  2017-07-06  9:36 ` Vivek Gautam
@ 2017-07-06  9:37   ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-06  9:37 UTC (permalink / raw)
  To: joro, robin.murphy, robh+dt, mark.rutland, will.deacon,
	m.szyprowski, sboyd, robdclark, iommu, devicetree, linux-kernel,
	linux-clk
  Cc: linux-arm-msm, sricharan, stanimir.varbanov, architt,
	vivek.gautam, linux-arm-kernel

From: Sricharan R <sricharan@codeaurora.org>

The MMU40x/500 are ARM's implementations of the SMMU
architecture specification (SMMUv1 for the MMU40x,
SMMUv2 for the MMU500). Each is split into two blocks,
the TBU and the TCU. A TBU caches page table entries
and is instantiated locally for each master, clocked
by TBUn_clk. The TCU performs the page table walks
(PTW) for address translation and also provides the
programming interface; it is clocked by TCU_CLK. A TBU
can also share the TCU's clock domain, in which case
both are clocked by TCU_CLK.

This patch defines the clock bindings for these
implementations and adds the clock names to the
compatible match data.

Signed-off-by: Sricharan R <sricharan@codeaurora.org>
[vivek: clock rework and cleanup]
Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---
 .../devicetree/bindings/iommu/arm,smmu.txt         | 24 ++++++++++++++++++++++
 drivers/iommu/arm-smmu.c                           | 12 ++++++++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 8a6ffce12af5..00331752d355 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -71,6 +71,26 @@ conditions.
                   or using stream matching with #iommu-cells = <2>, and
                   may be ignored if present in such cases.
 
+- clock-names:    Should be "tcu" and "iface" for "arm,mmu-400",
+                  "arm,mmu-401" and "arm,mmu-500"
+
+                  "tcu" clock is required for smmu's register access using the
+                  programming interface and ptw for downstream bus access. This
+                  clock is also used for access to the TBU connected to the
+                  master locally. Sometimes however, TBU is clocked along with
+                  the master.
+
+                  "iface" clock is required to access the TCU's programming
+                  interface, apart from the "tcu" clock.
+
+- clocks:         Phandles for respective clocks described by clock-names.
+
+- power-domains:  Phandles to SMMU's power domain specifier. This is
+                  required even if SMMU belongs to the master's power
+                  domain, as the SMMU will have to be enabled and
+                  accessed before master gets enabled and linked to its
+                  SMMU.
+
 ** Deprecated properties:
 
 - mmu-masters (deprecated in favour of the generic "iommus" binding) :
@@ -95,6 +115,10 @@ conditions.
                              <0 36 4>,
                              <0 37 4>;
                 #iommu-cells = <1>;
+                clocks = <&gcc GCC_SMMU_CFG_CLK>,
+                         <&gcc GCC_APSS_TCU_CLK>;
+
+		clock-names = "iface", "tcu";
         };
 
         /* device with two stream IDs, 0 and 7 */
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 75567d9698ab..7bb09280fa11 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1947,9 +1947,19 @@ struct arm_smmu_match_data {
 ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
 ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
 ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
-ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
 ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
 
+static const char * const arm_mmu500_clks[] = {
+	"tcu", "iface",
+};
+
+static const struct arm_smmu_match_data arm_mmu500 = {
+	.version = ARM_SMMU_V2,
+	.model = ARM_MMU500,
+	.clks = arm_mmu500_clks,
+	.num_clks = ARRAY_SIZE(arm_mmu500_clks),
+};
+
 static const struct of_device_id arm_smmu_of_match[] = {
 	{ .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
 	{ .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 },
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply related	[flat|nested] 168+ messages in thread
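
Patch 5/6 stores the clock names as per-compatible data (.clks and .num_clks)
so that the driver can loop over them. The loop itself lives in an earlier
patch of the series, so the following is only a userspace sketch of what such
a loop might look like, with unwinding on failure. struct clkdemo_clk and the
clkdemo_*() helpers are hypothetical; lookup by name is an assumption modelled
on clk_get()-style interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for a clock provided by the platform. */
struct clkdemo_clk {
	const char *name;
	int enable_count;
};

/* Only the two clock-related fields of the match data are modelled. */
struct clkdemo_match_data {
	const char * const *clks;
	int num_clks;
};

static const char * const clkdemo_mmu500_clks[] = { "tcu", "iface" };

static const struct clkdemo_match_data clkdemo_mmu500 = {
	.clks = clkdemo_mmu500_clks,
	.num_clks = ARRAY_SIZE(clkdemo_mmu500_clks),
};

/* Find a clock by name among those the platform describes. */
static struct clkdemo_clk *clkdemo_get(struct clkdemo_clk *avail, size_t n,
				       const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(avail[i].name, name))
			return &avail[i];
	return NULL;
}

/* Enable every named clock; undo the ones already enabled on failure. */
static int clkdemo_enable_all(const struct clkdemo_match_data *data,
			      struct clkdemo_clk *avail, size_t n)
{
	int i;

	for (i = 0; i < data->num_clks; i++) {
		struct clkdemo_clk *clk = clkdemo_get(avail, n, data->clks[i]);

		if (!clk)
			goto unwind;
		clk->enable_count++;
	}
	return 0;

unwind:
	while (i--)
		clkdemo_get(avail, n, data->clks[i])->enable_count--;
	return -ENOENT;
}

/* Disable the named clocks in reverse order. */
static void clkdemo_disable_all(const struct clkdemo_match_data *data,
				struct clkdemo_clk *avail, size_t n)
{
	int i = data->num_clks;

	while (i--) {
		struct clkdemo_clk *clk = clkdemo_get(avail, n, data->clks[i]);

		if (clk)
			clk->enable_count--;
	}
}
```

Because the sketch looks clocks up by name, the order in the device tree
("iface", "tcu" in the binding example) need not match the order in the
driver's table ("tcu", "iface").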

* [PATCH V4 6/6] iommu/arm-smmu: Add support for qcom,msm8996-smmu-v2 clocks
  2017-07-06  9:36 ` Vivek Gautam
  (?)
@ 2017-07-06  9:37     ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-06  9:37 UTC (permalink / raw)
  To: joro, robin.murphy, robh+dt, mark.rutland, will.deacon,
	m.szyprowski, sboyd, robdclark, iommu, devicetree, linux-kernel,
	linux-clk
  Cc: linux-arm-msm, sricharan, stanimir.varbanov, architt,
	vivek.gautam, linux-arm-kernel

qcom,msm8996-smmu-v2 is an arm,smmu-v2 implementation with
specific clock and power requirements. This smmu core is used
with multiple masters on msm8996, viz. mdss, video, etc.
Add bindings for the same.

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---
 Documentation/devicetree/bindings/iommu/arm,smmu.txt | 18 ++++++++++++++++++
 drivers/iommu/arm-smmu.c                             | 13 +++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 00331752d355..5d8e79775fae 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -17,6 +17,7 @@ conditions.
                         "arm,mmu-401"
                         "arm,mmu-500"
                         "cavium,smmu-v2"
+                        "qcom,msm8996-smmu-v2"
 
                   depending on the particular implementation and/or the
                   version of the architecture implemented.
@@ -74,11 +75,16 @@ conditions.
 - clock-names:    Should be "tcu" and "iface" for "arm,mmu-400",
                   "arm,mmu-401" and "arm,mmu-500"
 
+                  Should be "bus" and "iface" for "qcom,msm8996-smmu-v2"
+                  implementation.
+
                   "tcu" clock is required for smmu's register access using the
                   programming interface and ptw for downstream bus access. This
                   clock is also used for access to the TBU connected to the
                   master locally. Sometimes however, TBU is clocked along with
                   the master.
+                  "bus" clock for "qcom,msm8996-smmu-v2" is required for downstream
+                  bus access and for the smmu ptw.
 
                   "iface" clock is required to access the TCU's programming
                   interface, apart from the "tcu" clock.
@@ -161,3 +167,15 @@ conditions.
                 iommu-map = <0 &smmu3 0 0x400>;
                 ...
         };
+
+	/* Qcom's arm,smmu-v2 implementation for msm8996 */
+	smmu4: iommu {
+		compatible = "qcom,msm8996-smmu-v2";
+		...
+		#iommu-cells = <1>;
+		power-domains = <&mmcc MDSS_GDSC>;
+
+		clocks = <&mmcc SMMU_MDP_AXI_CLK>,
+			 <&mmcc SMMU_MDP_AHB_CLK>;
+		clock-names = "bus", "iface";
+	};
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 7bb09280fa11..fe8e7fd61282 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -110,6 +110,7 @@ enum arm_smmu_implementation {
 	GENERIC_SMMU,
 	ARM_MMU500,
 	CAVIUM_SMMUV2,
+	QCOM_MSM8996_SMMUV2,
 };
 
 /* Until ACPICA headers cover IORT rev. C */
@@ -1960,6 +1961,17 @@ struct arm_smmu_match_data {
 	.num_clks = ARRAY_SIZE(arm_mmu500_clks),
 };
 
+static const char * const qcom_msm8996_smmuv2_clks[] = {
+	"bus", "iface",
+};
+
+static const struct arm_smmu_match_data qcom_msm8996_smmuv2 = {
+	.version = ARM_SMMU_V2,
+	.model = QCOM_MSM8996_SMMUV2,
+	.clks = qcom_msm8996_smmuv2_clks,
+	.num_clks = ARRAY_SIZE(qcom_msm8996_smmuv2_clks),
+};
+
 static const struct of_device_id arm_smmu_of_match[] = {
 	{ .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
 	{ .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 },
@@ -1967,6 +1979,7 @@ struct arm_smmu_match_data {
 	{ .compatible = "arm,mmu-401", .data = &arm_mmu401 },
 	{ .compatible = "arm,mmu-500", .data = &arm_mmu500 },
 	{ .compatible = "cavium,smmu-v2", .data = &cavium_smmuv2 },
+	{ .compatible = "qcom,msm8996-smmu-v2", .data = &qcom_msm8996_smmuv2 },
 	{ },
 };
 MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related	[flat|nested] 168+ messages in thread
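
Patch 6/6 ties the new compatible string to its own match data through the
of_device_id table. As a rough userspace illustration of that selection step,
the sketch below maps a compatible string to a clock-name table. The
matchdemo_* names are hypothetical, and real OF matching does more than a
single string comparison (for example, it walks a node's whole compatible
list).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Only the clock-related fields of the match data are modelled. */
struct matchdemo_data {
	const char * const *clks;
	int num_clks;
};

/* Simplified stand-in for one row of an of_device_id table. */
struct matchdemo_id {
	const char *compatible;
	const struct matchdemo_data *data;
};

static const char * const matchdemo_mmu500_clks[] = { "tcu", "iface" };
static const char * const matchdemo_msm8996_clks[] = { "bus", "iface" };

/* A generic entry that names no clocks at all. */
static const struct matchdemo_data matchdemo_generic_v2 = { NULL, 0 };

static const struct matchdemo_data matchdemo_mmu500 = {
	.clks = matchdemo_mmu500_clks,
	.num_clks = ARRAY_SIZE(matchdemo_mmu500_clks),
};

static const struct matchdemo_data matchdemo_msm8996 = {
	.clks = matchdemo_msm8996_clks,
	.num_clks = ARRAY_SIZE(matchdemo_msm8996_clks),
};

static const struct matchdemo_id matchdemo_table[] = {
	{ "arm,smmu-v2", &matchdemo_generic_v2 },
	{ "arm,mmu-500", &matchdemo_mmu500 },
	{ "qcom,msm8996-smmu-v2", &matchdemo_msm8996 },
	{ NULL, NULL }, /* sentinel, like the empty entry in the patch */
};

/* Return the data for the first row whose compatible string matches. */
static const struct matchdemo_data *matchdemo_lookup(const char *compatible)
{
	const struct matchdemo_id *id;

	for (id = matchdemo_table; id->compatible; id++)
		if (!strcmp(id->compatible, compatible))
			return id->data;
	return NULL;
}
```

In this model a node that declares "qcom,msm8996-smmu-v2" picks up the "bus"
and "iface" clock names, while a node that declares only "arm,smmu-v2" gets an
empty list and so requests no clocks.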

* Re: [PATCH V4 5/6] iommu/arm-smmu: Add support for MMU40x/500 clocks
  2017-07-06  9:37   ` Vivek Gautam
  (?)
@ 2017-07-10  3:37       ` Rob Herring
  -1 siblings, 0 replies; 168+ messages in thread
From: Rob Herring @ 2017-07-10  3:37 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: mark.rutland, devicetree, sboyd, will.deacon, iommu,
	linux-kernel, stanimir.varbanov, linux-arm-msm, linux-clk,
	linux-arm-kernel

On Thu, Jul 06, 2017 at 03:07:04PM +0530, Vivek Gautam wrote:
> From: Sricharan R <sricharan-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
> 
> The MMU400x/500 is the implementation of the SMMUv2
> arch specification. It is split in to two blocks
> TBU, TCU. TBU caches the page table, instantiated
> for each master locally, clocked by the TBUn_clk.
> TCU manages the address translation with PTW and has
> the programming interface as well, clocked using the
> TCU_CLK. The TBU can also be sharing the same clock
> domain as TCU, in which case both are clocked using
> the TCU_CLK.

No TBU clock below. When is it shared or not? If that's an integration 
option then the binding should always have a TBU clock with the same 
parent as the TCU_CLK.

> This defines the clock bindings for the same and adds
> the clock names to compatible data.
> 
> Signed-off-by: Sricharan R <sricharan-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
> [vivek: clock rework and cleanup]
> Signed-off-by: Vivek Gautam <vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
> ---
>  .../devicetree/bindings/iommu/arm,smmu.txt         | 24 ++++++++++++++++++++++
>  drivers/iommu/arm-smmu.c                           | 12 ++++++++++-
>  2 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> index 8a6ffce12af5..00331752d355 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> @@ -71,6 +71,26 @@ conditions.
>                    or using stream matching with #iommu-cells = <2>, and
>                    may be ignored if present in such cases.
>  
> +- clock-names:    Should be "tcu" and "iface" for "arm,mmu-400",
> +                  "arm,mmu-401" and "arm,mmu-500"
> +
> +                  "tcu" clock is required for smmu's register access using the
> +                  programming interface and ptw for downstream bus access. This
> +                  clock is also used for access to the TBU connected to the
> +                  master locally. Sometimes however, TBU is clocked along with
> +                  the master.
> +
> +                  "iface" clock is required to access the TCU's programming
> +                  interface, apart from the "tcu" clock.
> +
> +- clocks:         Phandles for respective clocks described by clock-names.
> +
> +- power-domains:  Phandles to SMMU's power domain specifier. This is
> +                  required even if SMMU belongs to the master's power
> +                  domain, as the SMMU will have to be enabled and
> +                  accessed before master gets enabled and linked to its
> +                  SMMU.
> +
>  ** Deprecated properties:
>  
>  - mmu-masters (deprecated in favour of the generic "iommus" binding) :
> @@ -95,6 +115,10 @@ conditions.
>                               <0 36 4>,
>                               <0 37 4>;
>                  #iommu-cells = <1>;
> +                clocks = <&gcc GCC_SMMU_CFG_CLK>,
> +                         <&gcc GCC_APSS_TCU_CLK>;
> +
> +		clock-names = "iface", "tcu";
>          };
>  
>          /* device with two stream IDs, 0 and 7 */
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 75567d9698ab..7bb09280fa11 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1947,9 +1947,19 @@ struct arm_smmu_match_data {
>  ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
>  ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
>  ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
> -ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
>  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
>  
> +static const char * const arm_mmu500_clks[] = {
> +	"tcu", "iface",
> +};
> +
> +static const struct arm_smmu_match_data arm_mmu500 = {
> +	.version = ARM_SMMU_V2,
> +	.model = ARM_MMU500,
> +	.clks = arm_mmu500_clks,
> +	.num_clks = ARRAY_SIZE(arm_mmu500_clks),
> +};
> +
>  static const struct of_device_id arm_smmu_of_match[] = {
>  	{ .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
>  	{ .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 },
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 
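
For reference, the cover letter describes keeping the clock names as
per-compatible data and looping over them to get, enable, and disable the
clocks. A minimal userspace sketch of that loop follows. It is not the
driver code: struct fake_clk, fake_clk_enable(), fake_clk_disable(),
enable_all() and disable_all() are hypothetical stand-ins, and unwinding
the already-enabled clocks on failure is an assumption about how the error
path might look. Only the "tcu" and "iface" names are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a clock handle; not the kernel's struct clk. */
struct fake_clk {
	const char *name;
	int enabled;
};

/* Clock names kept as data, mirroring arm_mmu500_clks in the patch. */
static const char * const mmu500_clks[] = { "tcu", "iface" };

/* Stub enable: fails for the clock named fail_on so the error path can run. */
static int fake_clk_enable(struct fake_clk *clk, const char *fail_on)
{
	if (fail_on && strcmp(clk->name, fail_on) == 0)
		return -1;
	clk->enabled = 1;
	return 0;
}

static void fake_clk_disable(struct fake_clk *clk)
{
	clk->enabled = 0;
}

/* Enable clocks in table order; on failure, disable those already enabled. */
static int enable_all(struct fake_clk *clks, size_t n, const char *fail_on)
{
	size_t i;

	assert(clks != NULL);
	for (i = 0; i < n; i++) {
		if (fake_clk_enable(&clks[i], fail_on)) {
			while (i--)
				fake_clk_disable(&clks[i]);
			return -1;
		}
	}
	return 0;
}

/* Disable in reverse order of enabling. */
static void disable_all(struct fake_clk *clks, size_t n)
{
	while (n--)
		fake_clk_disable(&clks[n]);
}
```

With an array of two fake_clk entries named after mmu500_clks[],
enable_all() leaves both enabled on success, and leaves none enabled if the
second one fails.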

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 6/6] iommu/arm-smmu: Add support for qcom,msm8996-smmu-v2 clocks
  2017-07-06  9:37     ` [PATCH V4 6/6] iommu/arm-smmu: Add support for qcom,msm8996-smmu-v2 clocks Vivek Gautam
  (?)
@ 2017-07-10  3:40         ` Rob Herring
  -1 siblings, 0 replies; 168+ messages in thread
From: Rob Herring @ 2017-07-10  3:40 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: mark.rutland, devicetree, sboyd, will.deacon, iommu, linux-kernel,
	stanimir.varbanov, linux-arm-msm, linux-clk, linux-arm-kernel

On Thu, Jul 06, 2017 at 03:07:05PM +0530, Vivek Gautam wrote:
> qcom,msm8996-smmu-v2 is an arm,smmu-v2 implementation with
> specific clock and power requirements. This smmu core is used
> with multiple masters on msm8996, viz. mdss, video, etc.
> Add bindings for the same.
> 
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> ---
>  Documentation/devicetree/bindings/iommu/arm,smmu.txt | 18 ++++++++++++++++++
>  drivers/iommu/arm-smmu.c                             | 13 +++++++++++++
>  2 files changed, 31 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> index 00331752d355..5d8e79775fae 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> @@ -17,6 +17,7 @@ conditions.
>                          "arm,mmu-401"
>                          "arm,mmu-500"
>                          "cavium,smmu-v2"
> +                        "qcom,msm8996-smmu-v2"
>  
>                    depending on the particular implementation and/or the
>                    version of the architecture implemented.
> @@ -74,11 +75,16 @@ conditions.
>  - clock-names:    Should be "tcu" and "iface" for "arm,mmu-400",
>                    "arm,mmu-401" and "arm,mmu-500"
>  
> +                  Should be "bus", and "iface" for "qcom,msm8996-smmu-v2"
> +                  implementation.
> +
>                    "tcu" clock is required for smmu's register access using the
>                    programming interface and ptw for downstream bus access. This
>                    clock is also used for access to the TBU connected to the
>                    master locally. Sometimes however, TBU is clocked along with
>                    the master.
> +                  "bus" clock for "qcom,msm8996-smmu-v2" is requierd for downstream

s/requierd/required/

> +                  bus access and for the smmu ptw.
>  
>                    "iface" clock is required to access the TCU's programming
>                    interface, apart from the "tcu" clock.
> @@ -161,3 +167,15 @@ conditions.
>                  iommu-map = <0 &smmu3 0 0x400>;
>                  ...
>          };
> +
> +	/* Qcom's arm,smmu-v2 implementation for msm8996 */
> +	smmu4: iommu {
> +		compatible = "qcom,msm8996-smmu-v2";

No registers?

> +		...
> +		#iommu-cells = <1>;
> +		power-domains = <&mmcc MDSS_GDSC>;
> +
> +		clocks = <&mmcc SMMU_MDP_AXI_CLK>,
> +			 <&mmcc SMMU_MDP_AHB_CLK>;
> +		clock-names = "bus", "iface";
> +	};
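
As a side note on how the clock-names lists relate to the per-compatible
data: clocks are looked up by name, so the order in a node's clock-names
need not match the order in the driver's table (the mmu-500 example in the
previous patch lists "iface", "tcu" while the driver array has "tcu",
"iface"). A rough userspace sketch of such a name-based check is below.
struct smmu_clk_data, find_clk_data(), clock_index() and
node_has_required_clks() are hypothetical names made up for illustration;
only the compatible strings and clock names come from the bindings quoted
above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of per-compatible clock data. */
struct smmu_clk_data {
	const char *compatible;
	const char * const *clks;
	size_t num_clks;
};

static const char * const mmu500_names[] = { "tcu", "iface" };
static const char * const msm8996_names[] = { "bus", "iface" };

static const struct smmu_clk_data clk_table[] = {
	{ "arm,mmu-500", mmu500_names, 2 },
	{ "qcom,msm8996-smmu-v2", msm8996_names, 2 },
};

/* Return the clock data for a compatible string, or NULL if there is none. */
static const struct smmu_clk_data *find_clk_data(const char *compatible)
{
	size_t i;

	assert(compatible != NULL);
	for (i = 0; i < sizeof(clk_table) / sizeof(clk_table[0]); i++)
		if (strcmp(clk_table[i].compatible, compatible) == 0)
			return &clk_table[i];
	return NULL;
}

/* Position of name in a node's clock-names list, or -1 if it is absent. */
static int clock_index(const char * const *names, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(names[i], name) == 0)
			return (int)i;
	return -1;
}

/* 1 if the node names every clock its compatible needs, in any order. */
static int node_has_required_clks(const char *compatible,
				  const char * const *names, size_t n)
{
	const struct smmu_clk_data *data = find_clk_data(compatible);
	size_t i;

	if (!data)
		return 1; /* no clock data for this compatible: nothing to check */
	for (i = 0; i < data->num_clks; i++)
		if (clock_index(names, n, data->clks[i]) < 0)
			return 0;
	return 1;
}
```

A node with clock-names = "iface", "tcu" passes the check for
"arm,mmu-500", while a "qcom,msm8996-smmu-v2" node that only names "iface"
does not.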

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 6/6] iommu/arm-smmu: Add support for qcom,msm8996-smmu-v2 clocks
  2017-07-10  3:40         ` Rob Herring
  (?)
  (?)
@ 2017-07-10  6:42           ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-10  6:42 UTC (permalink / raw)
  To: Rob Herring
  Cc: Mark Rutland, devicetree, Stephen Boyd, Will Deacon, iommu,
	linux-kernel, Stanimir Varbanov, linux-arm-msm, linux-clk,
	linux-arm-kernel

Hi Rob,


On Mon, Jul 10, 2017 at 9:10 AM, Rob Herring <robh@kernel.org> wrote:
> On Thu, Jul 06, 2017 at 03:07:05PM +0530, Vivek Gautam wrote:
>> qcom,msm8996-smmu-v2 is an arm,smmu-v2 implementation with
>> specific clock and power requirements. This smmu core is used
>> with multiple masters on msm8996, viz. mdss, video, etc.
>> Add bindings for the same.
>>
>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
>> ---
>>  Documentation/devicetree/bindings/iommu/arm,smmu.txt | 18 ++++++++++++++++++
>>  drivers/iommu/arm-smmu.c                             | 13 +++++++++++++
>>  2 files changed, 31 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
>> index 00331752d355..5d8e79775fae 100644
>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
>> @@ -17,6 +17,7 @@ conditions.
>>                          "arm,mmu-401"
>>                          "arm,mmu-500"
>>                          "cavium,smmu-v2"
>> +                        "qcom,msm8996-smmu-v2"
>>
>>                    depending on the particular implementation and/or the
>>                    version of the architecture implemented.
>> @@ -74,11 +75,16 @@ conditions.
>>  - clock-names:    Should be "tcu" and "iface" for "arm,mmu-400",
>>                    "arm,mmu-401" and "arm,mmu-500"
>>
>> +                  Should be "bus", and "iface" for "qcom,msm8996-smmu-v2"
>> +                  implementation.
>> +
>>                    "tcu" clock is required for smmu's register access using the
>>                    programming interface and ptw for downstream bus access. This
>>                    clock is also used for access to the TBU connected to the
>>                    master locally. Sometimes however, TBU is clocked along with
>>                    the master.
>> +                  "bus" clock for "qcom,msm8996-smmu-v2" is requierd for downstream
>
> s/requierd/required/

Sure, will correct it.

>
>> +                  bus access and for the smmu ptw.
>>
>>                    "iface" clock is required to access the TCU's programming
>>                    interface, apart from the "tcu" clock.
>> @@ -161,3 +167,15 @@ conditions.
>>                  iommu-map = <0 &smmu3 0 0x400>;
>>                  ...
>>          };
>> +
>> +     /* Qcom's arm,smmu-v2 implementation for msm8996 */
>> +     smmu4: iommu {
>> +             compatible = "qcom,msm8996-smmu-v2";
>
> No registers?

It does have registers. Will add the complete binding example.

Thank you for the review.

Best Regards
Vivek

[snip]


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 5/6] iommu/arm-smmu: Add support for MMU40x/500 clocks
  2017-07-10  3:37       ` Rob Herring
@ 2017-07-11  5:18         ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-11  5:18 UTC (permalink / raw)
  To: Rob Herring
  Cc: joro, robin.murphy, mark.rutland, will.deacon, m.szyprowski,
	sboyd, robdclark, iommu, devicetree, linux-kernel, linux-clk,
	linux-arm-msm, sricharan, stanimir.varbanov, architt,
	linux-arm-kernel

Hi Rob,


On 07/10/2017 09:07 AM, Rob Herring wrote:
> On Thu, Jul 06, 2017 at 03:07:04PM +0530, Vivek Gautam wrote:
>> From: Sricharan R<sricharan@codeaurora.org>
>>
>> The MMU400x/500 is the implementation of the SMMUv2
>> arch specification. It is split in to two blocks
>> TBU, TCU. TBU caches the page table, instantiated
>> for each master locally, clocked by the TBUn_clk.
>> TCU manages the address translation with PTW and has
>> the programming interface as well, clocked using the
>> TCU_CLK. The TBU can also be sharing the same clock
>> domain as TCU, in which case both are clocked using
>> the TCU_CLK.
> No TBU clock below. When is it shared or not? If that's an integration
> option then the binding should always have a TBU clock with the same
> parent as the TCU_CLK.

Right. This is something that the ARM spec also says.
The TBU clock can either be in the same clock and power domain as
the TCU clock, or in a separate one.

As you said, we should have the TBU clock as well; depending on the
integration, the TBU clock can have either the same parent as the TCU
clock or a different one.

I will change these bindings to include the TBU clock as well.
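
For illustration only, here is a rough userspace sketch of how the
per-compatible clock-name data could carry a third entry once a TBU
clock is added. The name "tbu", the trimmed-down struct and the helper
smmu_find_clk() are assumptions made for this sketch; the actual clock
name is for the next revision of the binding to settle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Userspace stand-in for the driver's match data; only the clock
 * fields from the patch are modelled here.
 */
struct smmu_match_data {
	const char * const *clks;
	size_t num_clks;
};

/* "tcu" and "iface" come from the patch; "tbu" is a hypothetical name. */
static const char * const arm_mmu500_clks[] = {
	"tcu", "iface", "tbu",
};

static const struct smmu_match_data arm_mmu500 = {
	.clks = arm_mmu500_clks,
	.num_clks = ARRAY_SIZE(arm_mmu500_clks),
};

/* Return the index of @name in the clock list, or -1 if it is not listed. */
static int smmu_find_clk(const struct smmu_match_data *data, const char *name)
{
	size_t i;

	assert(data && name);

	for (i = 0; i < data->num_clks; i++)
		if (strcmp(data->clks[i], name) == 0)
			return (int)i;

	return -1;
}
```

With a table like this, the driver side would not need to change beyond
the extra name; the loop over the clock names picks it up automatically.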


Best Regards
Vivek

>> This defines the clock bindings for the same and adds
>> the clock names to compatible data.
>>
>> Signed-off-by: Sricharan R<sricharan@codeaurora.org>
>> [vivek: clock rework and cleanup]
>> Signed-off-by: Vivek Gautam<vivek.gautam@codeaurora.org>
>> ---
>>   .../devicetree/bindings/iommu/arm,smmu.txt         | 24 ++++++++++++++++++++++
>>   drivers/iommu/arm-smmu.c                           | 12 ++++++++++-
>>   2 files changed, 35 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
>> index 8a6ffce12af5..00331752d355 100644
>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
>> @@ -71,6 +71,26 @@ conditions.
>>                     or using stream matching with #iommu-cells = <2>, and
>>                     may be ignored if present in such cases.
>>   
>> +- clock-names:    Should be "tcu" and "iface" for "arm,mmu-400",
>> +                  "arm,mmu-401" and "arm,mmu-500"
>> +
>> +                  "tcu" clock is required for smmu's register access using the
>> +                  programming interface and ptw for downstream bus access. This
>> +                  clock is also used for access to the TBU connected to the
>> +                  master locally. Sometimes however, TBU is clocked along with
>> +                  the master.
>> +
>> +                  "iface" clock is required to access the TCU's programming
>> +                  interface, apart from the "tcu" clock.
>> +
>> +- clocks:         Phandles for respective clocks described by clock-names.
>> +
>> +- power-domains:  Phandles to SMMU's power domain specifier. This is
>> +                  required even if SMMU belongs to the master's power
>> +                  domain, as the SMMU will have to be enabled and
>> +                  accessed before master gets enabled and linked to its
>> +                  SMMU.
>> +
>>   ** Deprecated properties:
>>   
>>   - mmu-masters (deprecated in favour of the generic "iommus" binding) :
>> @@ -95,6 +115,10 @@ conditions.
>>                                <0 36 4>,
>>                                <0 37 4>;
>>                   #iommu-cells = <1>;
>> +                clocks = <&gcc GCC_SMMU_CFG_CLK>,
>> +                         <&gcc GCC_APSS_TCU_CLK>;
>> +
>> +		clock-names = "iface", "tcu";
>>           };
>>   
>>           /* device with two stream IDs, 0 and 7 */
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 75567d9698ab..7bb09280fa11 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -1947,9 +1947,19 @@ struct arm_smmu_match_data {
>>   ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
>>   ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
>>   ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
>> -ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
>>   ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
>>   
>> +static const char * const arm_mmu500_clks[] = {
>> +	"tcu", "iface",
>> +};
>> +
>> +static const struct arm_smmu_match_data arm_mmu500 = {
>> +	.version = ARM_SMMU_V2,
>> +	.model = ARM_MMU500,
>> +	.clks = arm_mmu500_clks,
>> +	.num_clks = ARRAY_SIZE(arm_mmu500_clks),
>> +};
>> +
>>   static const struct of_device_id arm_smmu_of_match[] = {
>>   	{ .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
>>   	{ .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 },
>> -- 
>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>> a Linux Foundation Collaborative Project
>>

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-06  9:37   ` Vivek Gautam
  (?)
@ 2017-07-12 22:54       ` Stephen Boyd
  -1 siblings, 0 replies; 168+ messages in thread
From: Stephen Boyd @ 2017-07-12 22:54 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: mark.rutland, devicetree, linux-arm-msm, will.deacon, iommu,
	linux-kernel, robh+dt, stanimir.varbanov, linux-clk,
	linux-arm-kernel

On 07/06, Vivek Gautam wrote:
> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>  static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>  			     size_t size)
>  {
> -	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> +	size_t ret;
>  
>  	if (!ops)
>  		return 0;
>  
> -	return ops->unmap(ops, iova, size);
> +	pm_runtime_get_sync(smmu_domain->smmu->dev);

Can these map/unmap ops be called from an atomic context? I seem
to recall that being a problem before.


> +	ret = ops->unmap(ops, iova, size);
> +	pm_runtime_put_sync(smmu_domain->smmu->dev);
> +
> +	return ret;
>  }
>  
>  static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 4/6] iommu/arm-smmu: Add the device_link between masters and smmu
  2017-07-06  9:37   ` Vivek Gautam
  (?)
@ 2017-07-12 22:55       ` Stephen Boyd
  -1 siblings, 0 replies; 168+ messages in thread
From: Stephen Boyd @ 2017-07-12 22:55 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: joro, robin.murphy, robh+dt, mark.rutland, will.deacon,
	m.szyprowski, robdclark, iommu, devicetree, linux-kernel,
	linux-clk, linux-arm-msm, sricharan, stanimir.varbanov, architt,
	linux-arm-kernel

On 07/06, Vivek Gautam wrote:
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index ddbfa8ab69e6..75567d9698ab 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1348,6 +1348,7 @@ static int arm_smmu_add_device(struct device *dev)
>  	struct arm_smmu_device *smmu;
>  	struct arm_smmu_master_cfg *cfg;
>  	struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> +	struct device_link *link = NULL;

Unnecessary initialization?

>  	int i, ret;
>  
>  	if (using_legacy_binding) {
> @@ -1403,6 +1404,16 @@ static int arm_smmu_add_device(struct device *dev)
>  
>  	pm_runtime_put_sync(smmu->dev);
>  
> +	/*
> +	 * Establish the link between smmu and master, so that the
> +	 * smmu gets runtime enabled/disabled as per the master's
> +	 * needs.
> +	 */
> +	link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
> +	if (!link)
> +		dev_warn(smmu->dev, "Unable to create device link between %s and %s\n",
> +			 dev_name(smmu->dev), dev_name(dev));
> +
>  	return 0;
>  

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 2/6] iommu/arm-smmu: Add pm_runtime/sleep ops
  2017-07-06  9:37   ` Vivek Gautam
  (?)
@ 2017-07-12 22:58       ` Stephen Boyd
  -1 siblings, 0 replies; 168+ messages in thread
From: Stephen Boyd @ 2017-07-12 22:58 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: mark.rutland, devicetree, linux-arm-msm, will.deacon, iommu,
	linux-kernel, robh+dt, stanimir.varbanov, linux-clk,
	linux-arm-kernel

On 07/06, Vivek Gautam wrote:
> From: Sricharan R <sricharan@codeaurora.org>
> 
> The smmu needs to be functional only when the respective
> master's using it are active. The device_link feature
> helps to track such functional dependencies, so that the
> iommu gets powered when the master device enables itself
> using pm_runtime. So by adapting the smmu driver for
> runtime pm, above said dependency can be addressed.
> 
> This patch adds the pm runtime/sleep callbacks to the
> driver and also the functions to parse the smmu clocks
> from DT and enable them in resume/suspend.
> 
> Signed-off-by: Sricharan R <sricharan@codeaurora.org>
> Signed-off-by: Archit Taneja <architt@codeaurora.org>
> [vivek: Clock rework to loop over clock names data]
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> ---

General comment, we have a bulk clk API now, but I guess we
failed to add the clk_bulk_prepare_enable() API that could be
used here. Perhaps you can add that API and then use it here to
reduce lines of code.
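
To make the suggestion concrete, a helper of that kind might look
roughly like the sketch below. It is a userspace model, not kernel
code: struct clk_bulk_data is reduced to a name and a few flags, and
the clk_bulk_prepare(), clk_bulk_enable() and clk_bulk_unprepare()
functions here are mock stand-ins that only borrow the names of the
bulk clk calls (with simplified signatures). The fail_enable field is a
test hook invented for this sketch. The point is the unwind: if
enabling fails, everything that was prepared gets unprepared again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock of a bulk clock entry; the real struct holds a struct clk pointer. */
struct clk_bulk_data {
	const char *id;
	bool prepared;
	bool enabled;
	bool fail_enable;	/* test hook: make enabling this clock fail */
};

/* Mock: preparing never fails in this model. */
static int clk_bulk_prepare(int num_clks, struct clk_bulk_data *clks)
{
	for (int i = 0; i < num_clks; i++)
		clks[i].prepared = true;
	return 0;
}

/* Mock: drop the prepared state of every clock. */
static void clk_bulk_unprepare(int num_clks, struct clk_bulk_data *clks)
{
	for (int i = 0; i < num_clks; i++)
		clks[i].prepared = false;
}

/* Mock: enable clocks in order; on failure, disable those already enabled. */
static int clk_bulk_enable(int num_clks, struct clk_bulk_data *clks)
{
	for (int i = 0; i < num_clks; i++) {
		if (clks[i].fail_enable) {
			while (--i >= 0)
				clks[i].enabled = false;
			return -1;
		}
		clks[i].enabled = true;
	}
	return 0;
}

/* Combined helper: prepare everything, enable everything, unwind on error. */
static int clk_bulk_prepare_enable(int num_clks, struct clk_bulk_data *clks)
{
	int ret;

	assert(clks != NULL || num_clks == 0);

	ret = clk_bulk_prepare(num_clks, clks);
	if (ret)
		return ret;

	ret = clk_bulk_enable(num_clks, clks);
	if (ret)
		clk_bulk_unprepare(num_clks, clks);

	return ret;
}
```

A driver's resume path would then make one call on its clock array
instead of open-coding the prepare/enable loop and its error handling.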

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 2/6] iommu/arm-smmu: Add pm_runtime/sleep ops
  2017-07-12 22:58       ` Stephen Boyd
  (?)
@ 2017-07-12 23:01           ` Stephen Boyd
  -1 siblings, 0 replies; 168+ messages in thread
From: Stephen Boyd @ 2017-07-12 23:01 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: mark.rutland, devicetree, linux-arm-msm, will.deacon, iommu,
	linux-kernel, robh+dt, stanimir.varbanov, Bjorn Andersson,
	linux-clk, linux-arm-kernel

On 07/12, Stephen Boyd wrote:
> On 07/06, Vivek Gautam wrote:
> > From: Sricharan R <sricharan@codeaurora.org>
> > 
> > The smmu needs to be functional only when the respective
> > master's using it are active. The device_link feature
> > helps to track such functional dependencies, so that the
> > iommu gets powered when the master device enables itself
> > using pm_runtime. So by adapting the smmu driver for
> > runtime pm, above said dependency can be addressed.
> > 
> > This patch adds the pm runtime/sleep callbacks to the
> > driver and also the functions to parse the smmu clocks
> > from DT and enable them in resume/suspend.
> > 
> > Signed-off-by: Sricharan R <sricharan@codeaurora.org>
> > Signed-off-by: Archit Taneja <architt@codeaurora.org>
> > [vivek: Clock rework to loop over clock names data]
> > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> > ---
> 
> General comment, we have a bulk clk API now, but I guess we
> failed to add the clk_bulk_prepare_enable() API that could be
> used here. Perhaps you can add that API and then use it here to
> reduce lines of code.
> 

Bjorn just sent a patch for that API an hour ago.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* [PATCH V4 2/6] iommu/arm-smmu: Add pm_runtime/sleep ops
@ 2017-07-12 23:01           ` Stephen Boyd
  0 siblings, 0 replies; 168+ messages in thread
From: Stephen Boyd @ 2017-07-12 23:01 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/12, Stephen Boyd wrote:
> On 07/06, Vivek Gautam wrote:
> > From: Sricharan R <sricharan@codeaurora.org>
> > 
> > The smmu needs to be functional only when the respective
> > master's using it are active. The device_link feature
> > helps to track such functional dependencies, so that the
> > iommu gets powered when the master device enables itself
> > using pm_runtime. So by adapting the smmu driver for
> > runtime pm, above said dependency can be addressed.
> > 
> > This patch adds the pm runtime/sleep callbacks to the
> > driver and also the functions to parse the smmu clocks
> > from DT and enable them in resume/suspend.
> > 
> > Signed-off-by: Sricharan R <sricharan@codeaurora.org>
> > Signed-off-by: Archit Taneja <architt@codeaurora.org>
> > [vivek: Clock rework to loop over clock names data]
> > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> > ---
> 
> General comment, we have a bulk clk API now, but I guess we
> failed to add the clk_bulk_prepare_enable() API that could be
> used here. Perhaps you can add that API and then use it here to
> reduce lines of code.
> 

Bjorn just sent a patch for that API an hour ago.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 2/6] iommu/arm-smmu: Add pm_runtime/sleep ops
  2017-07-12 23:01           ` Stephen Boyd
@ 2017-07-13  3:57             ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-13  3:57 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: joro, robin.murphy, robh+dt, mark.rutland, will.deacon,
	m.szyprowski, robdclark, iommu, devicetree, linux-kernel,
	linux-clk, linux-arm-msm, sricharan, stanimir.varbanov, architt,
	linux-arm-kernel, Bjorn Andersson



On 07/13/2017 04:31 AM, Stephen Boyd wrote:
> On 07/12, Stephen Boyd wrote:
>> On 07/06, Vivek Gautam wrote:
>>> From: Sricharan R <sricharan@codeaurora.org>
>>>
>>> The smmu needs to be functional only when the respective
>>> master's using it are active. The device_link feature
>>> helps to track such functional dependencies, so that the
>>> iommu gets powered when the master device enables itself
>>> using pm_runtime. So by adapting the smmu driver for
>>> runtime pm, above said dependency can be addressed.
>>>
>>> This patch adds the pm runtime/sleep callbacks to the
>>> driver and also the functions to parse the smmu clocks
>>> from DT and enable them in resume/suspend.
>>>
>>> Signed-off-by: Sricharan R <sricharan@codeaurora.org>
>>> Signed-off-by: Archit Taneja <architt@codeaurora.org>
>>> [vivek: Clock rework to loop over clock names data]
>>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
>>> ---
>> General comment, we have a bulk clk API now, but I guess we
>> failed to add the clk_bulk_prepare_enable() API that could be
>> used here. Perhaps you can add that API and then use it here to
>> reduce lines of code.

Sure, will use the bulk clock APIs to handle the clocks.

Best regards
Vivek

>>
> Bjorn just sent a patch for that API an hour ago.
>

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread
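
For readers following the bulk clock suggestion above, here is a minimal
userspace sketch of the all-or-nothing pattern such a helper is expected to
follow: enable each clock in order and, if one fails, undo the ones already
enabled. Everything named model_* is hypothetical and only stands in for the
kernel clk API, which is not used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a clock; none of these names are kernel API. */
struct model_clk {
	const char *name;
	bool fail_enable;	/* force an error to exercise the rollback */
	int enable_count;
};

int model_clk_prepare_enable(struct model_clk *clk)
{
	if (clk->fail_enable)
		return -1;
	clk->enable_count++;
	return 0;
}

void model_clk_disable_unprepare(struct model_clk *clk)
{
	clk->enable_count--;
}

/* Enable every clock or none: undo the ones already enabled on error. */
int model_clk_bulk_prepare_enable(struct model_clk *clks, size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = model_clk_prepare_enable(&clks[i]);
		if (ret) {
			while (i--)
				model_clk_disable_unprepare(&clks[i]);
			return ret;
		}
	}
	return 0;
}

/* Disable in reverse order of enabling. */
void model_clk_bulk_disable_unprepare(struct model_clk *clks, size_t n)
{
	while (n--)
		model_clk_disable_unprepare(&clks[n]);
}
```

With two clocks where the second is set to fail, the bulk enable returns an
error and leaves the first clock's enable_count at 0. That property is what
would let a runtime resume callback bail out without leaking an enabled clock.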

* Re: [PATCH V4 4/6] iommu/arm-smmu: Add the device_link between masters and smmu
  2017-07-12 22:55       ` Stephen Boyd
  (?)
@ 2017-07-13  3:59           ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-13  3:59 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: joro, robin.murphy, robh+dt, mark.rutland, will.deacon,
	m.szyprowski, robdclark, iommu, devicetree, linux-kernel,
	linux-clk, linux-arm-msm, sricharan, stanimir.varbanov, architt,
	linux-arm-kernel



On 07/13/2017 04:25 AM, Stephen Boyd wrote:
> On 07/06, Vivek Gautam wrote:
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index ddbfa8ab69e6..75567d9698ab 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -1348,6 +1348,7 @@ static int arm_smmu_add_device(struct device *dev)
>>   	struct arm_smmu_device *smmu;
>>   	struct arm_smmu_master_cfg *cfg;
>>   	struct iommu_fwspec *fwspec = dev->iommu_fwspec;
>> +	struct device_link *link = NULL;
> Unnecessary initialization?

Right, will drop this.
Thanks.

>
>>   	int i, ret;
>>   
>>   	if (using_legacy_binding) {
>> @@ -1403,6 +1404,16 @@ static int arm_smmu_add_device(struct device *dev)
>>   
>>   	pm_runtime_put_sync(smmu->dev);
>>   
>> +	/*
>> +	 * Establish the link between smmu and master, so that the
>> +	 * smmu gets runtime enabled/disabled as per the master's
>> +	 * needs.
>> +	 */
>> +	link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
>> +	if (!link)
>> +		dev_warn(smmu->dev, "Unable to create device link between %s and %s\n",
>> +			 dev_name(smmu->dev), dev_name(dev));
>> +
>>   	return 0;
>>   

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread
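
The effect the quoted hunk is after can be pictured with a small userspace
model: once a master (consumer) is linked to the smmu (supplier), taking a
runtime PM reference on the master first powers the smmu, and the smmu powers
down only after its last linked master is released. This is a sketch under
that assumption, not the driver core's implementation; the model_* names and
the device names in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct model_dev {
	const char *name;
	int usage_count;		/* runtime PM usage counter */
	int powered;			/* 1 while "resumed" */
	struct model_dev *supplier;	/* set by model_link_add(), may be NULL */
};

/* Record that the consumer depends on the supplier for runtime PM. */
void model_link_add(struct model_dev *consumer, struct model_dev *supplier)
{
	consumer->supplier = supplier;
}

/* Take a reference; power the supplier before the consumer. */
void model_runtime_get(struct model_dev *dev)
{
	if (dev->usage_count++ == 0) {
		if (dev->supplier)
			model_runtime_get(dev->supplier);
		dev->powered = 1;
	}
}

/* Drop a reference; power down the consumer, then release the supplier. */
void model_runtime_put(struct model_dev *dev)
{
	if (--dev->usage_count == 0) {
		dev->powered = 0;
		if (dev->supplier)
			model_runtime_put(dev->supplier);
	}
}
```

Linking two masters, say "gpu" and "mdp", to one "smmu" and then doing
get(gpu), get(mdp), put(gpu) leaves the smmu powered; only the final put(mdp)
powers it down.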

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-12 22:54       ` Stephen Boyd
  (?)
@ 2017-07-13  5:13           ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-13  5:13 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: joro, robin.murphy, robh+dt, mark.rutland, will.deacon,
	m.szyprowski, robdclark, iommu, devicetree, linux-kernel,
	linux-clk, linux-arm-msm, sricharan, stanimir.varbanov, architt,
	linux-arm-kernel

Hi Stephen,


On 07/13/2017 04:24 AM, Stephen Boyd wrote:
> On 07/06, Vivek Gautam wrote:
>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>   			     size_t size)
>>   {
>> -	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> +	size_t ret;
>>   
>>   	if (!ops)
>>   		return 0;
>>   
>> -	return ops->unmap(ops, iova, size);
>> +	pm_runtime_get_sync(smmu_domain->smmu->dev);
> Can these map/unmap ops be called from an atomic context? I seem
> to recall that being a problem before.

The spinlock around map/unmap was dropped by the following patch, now merged in master:
523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock

Looks like we don't need locks here anymore?

Best Regards
Vivek

>
>
>> +	ret = ops->unmap(ops, iova, size);
>> +	pm_runtime_put_sync(smmu_domain->smmu->dev);
>> +
>> +	return ret;
>>   }
>>   
>>   static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-13  5:13           ` Vivek Gautam
  (?)
@ 2017-07-13  5:35               ` Sricharan R
  -1 siblings, 0 replies; 168+ messages in thread
From: Sricharan R @ 2017-07-13  5:35 UTC (permalink / raw)
  To: Vivek Gautam, Stephen Boyd
  Cc: joro, robin.murphy, robh+dt, mark.rutland, will.deacon,
	m.szyprowski, robdclark, iommu, devicetree, linux-kernel,
	linux-clk, linux-arm-msm, stanimir.varbanov, architt,
	linux-arm-kernel

Hi Vivek,

On 7/13/2017 10:43 AM, Vivek Gautam wrote:
> Hi Stephen,
> 
> 
> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>> On 07/06, Vivek Gautam wrote:
>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>   			     size_t size)
>>>   {
>>> -	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>> +	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>> +	size_t ret;
>>>   
>>>   	if (!ops)
>>>   		return 0;
>>>   
>>> -	return ops->unmap(ops, iova, size);
>>> +	pm_runtime_get_sync(smmu_domain->smmu->dev);
>> Can these map/unmap ops be called from an atomic context? I seem
>> to recall that being a problem before.
> 
> That's something which was dropped in the following patch merged in master:
> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
> 
> Looks like we don't  need locks here anymore?

Apart from the locking, I wonder why an explicit pm_runtime call is needed
in unmap. It looks like some path in the master using the smmu should
already have enabled runtime PM?

Regards,
 Sricharan

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-13  5:13           ` Vivek Gautam
  (?)
@ 2017-07-13  6:48               ` Stephen Boyd
  -1 siblings, 0 replies; 168+ messages in thread
From: Stephen Boyd @ 2017-07-13  6:48 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: joro, robin.murphy, robh+dt, mark.rutland, will.deacon,
	m.szyprowski, robdclark, iommu, devicetree, linux-kernel,
	linux-clk, linux-arm-msm, sricharan, stanimir.varbanov, architt,
	linux-arm-kernel

On 07/13, Vivek Gautam wrote:
> Hi Stephen,
> 
> 
> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
> >On 07/06, Vivek Gautam wrote:
> >>@@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> >>  static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
> >>  			     size_t size)
> >>  {
> >>-	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> >>+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >>+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> >>+	size_t ret;
> >>  
> >>  	if (!ops)
> >>  		return 0;
> >>  
> >>-	return ops->unmap(ops, iova, size);
> >>+	pm_runtime_get_sync(smmu_domain->smmu->dev);
> >Can these map/unmap ops be called from an atomic context? I seem
> >to recall that being a problem before.
> 
> That's something which was dropped in the following patch merged in master:
> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
> 
> Looks like we don't  need locks here anymore?
> 

While removing the spinlock around the map/unmap path may be one
thing, I'm not sure that's all of them. Is there a path from an
atomic DMA allocation (GFP_ATOMIC sort of thing) mapped into an
IOMMU for a device that can eventually get down to here and
attempt to turn a clk on?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-13  6:48               ` Stephen Boyd
@ 2017-07-13  9:50                 ` Robin Murphy
  -1 siblings, 0 replies; 168+ messages in thread
From: Robin Murphy @ 2017-07-13  9:50 UTC (permalink / raw)
  To: Stephen Boyd, Vivek Gautam
  Cc: joro, robh+dt, mark.rutland, will.deacon, m.szyprowski,
	robdclark, iommu, devicetree, linux-kernel, linux-clk,
	linux-arm-msm, sricharan, stanimir.varbanov, architt,
	linux-arm-kernel

On 13/07/17 07:48, Stephen Boyd wrote:
> On 07/13, Vivek Gautam wrote:
>> Hi Stephen,
>>
>>
>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>> On 07/06, Vivek Gautam wrote:
>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>  static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>  			     size_t size)
>>>>  {
>>>> -	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>> +	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>> +	size_t ret;
>>>>  
>>>>  	if (!ops)
>>>>  		return 0;
>>>>  
>>>> -	return ops->unmap(ops, iova, size);
>>>> +	pm_runtime_get_sync(smmu_domain->smmu->dev);
>>> Can these map/unmap ops be called from an atomic context? I seem
>>> to recall that being a problem before.
>>
>> That's something which was dropped in the following patch merged in master:
>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>
>> Looks like we don't  need locks here anymore?
>>
> 
> While removing the spinlock around the map/unmap path may be one
> thing, I'm not sure that's all of them. Is there a path from an
> atomic DMA allocation (GFP_ATOMIC sort of thing) mapped into an
> IOMMU for a device that can eventually get down to here and
> attempt to turn a clk on?

Yes, in the DMA path map/unmap will frequently be called from IRQ
handlers (think e.g. network packets). The whole point of removing the
lock was to allow multiple maps/unmaps to execute in parallel (since we
know they will be safely operating on different areas of the pagetable).
AFAICS this change is going to largely reintroduce that bottleneck via
dev->power.lock, which is anything but what we want :(

Robin.

^ permalink raw reply	[flat|nested] 168+ messages in thread
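
The atomic-context concern raised in this subthread can be illustrated with a
small userspace model. It assumes, as a simplification, that a synchronous
runtime resume may sleep whenever the device is not already active and has
not been marked irq-safe, and it counts how often that would happen while
"in interrupt context". All model_* names are hypothetical; this is not the
PM core's code.

```c
#include <assert.h>
#include <stdbool.h>

static bool in_atomic_ctx;	/* models running inside an IRQ handler */
static int sleep_violations;	/* counts "sleeping while atomic" events */

struct model_pm_dev {
	bool active;	/* already runtime-resumed */
	bool irq_safe;	/* resume callbacks promised not to sleep */
};

/* Stand-in for a might_sleep()-style debug check. */
void model_might_sleep(void)
{
	if (in_atomic_ctx)
		sleep_violations++;
}

/* Assumption: a synchronous resume may sleep unless active or irq-safe. */
void model_runtime_get_sync(struct model_pm_dev *dev)
{
	if (!dev->active && !dev->irq_safe)
		model_might_sleep();
	dev->active = true;
}

/* An unmap issued from interrupt context, e.g. a network DMA path. */
int model_unmap_from_irq(struct model_pm_dev *smmu)
{
	in_atomic_ctx = true;
	model_runtime_get_sync(smmu);
	in_atomic_ctx = false;
	return sleep_violations;	/* running total across calls */
}
```

In this model only the suspended, non-irq-safe device trips the check. Note
that the model says nothing about the second objection above: even when no
sleep occurs, every get/put still serializes on the device's PM lock.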

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-13  5:35               ` Sricharan R
  (?)
  (?)
@ 2017-07-13 11:50                   ` Rob Clark
  -1 siblings, 0 replies; 168+ messages in thread
From: Rob Clark @ 2017-07-13 11:50 UTC (permalink / raw)
  To: Sricharan R
  Cc: Mark Rutland, devicetree, Will Deacon,
	Stephen Boyd, Linux Kernel Mailing List, Stanimir Varbanov,
	iommu, Rob Herring, Vivek Gautam, linux-arm-msm, linux-clk,
	linux-arm-kernel

On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
> Hi Vivek,
>
> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>> Hi Stephen,
>>
>>
>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>> On 07/06, Vivek Gautam wrote:
>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>                    size_t size)
>>>>   {
>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>> +    size_t ret;
>>>>         if (!ops)
>>>>           return 0;
>>>>   -    return ops->unmap(ops, iova, size);
>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>> Can these map/unmap ops be called from an atomic context? I seem
>>> to recall that being a problem before.
>>
>> That's something which was dropped in the following patch merged in master:
>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>
>> Looks like we don't  need locks here anymore?
>
>  Apart from the locking, wonder why a explicit pm_runtime is needed
>  from unmap. Somehow looks like some path in the master using that
>  should have enabled the pm ?
>

Yes, there are a bunch of scenarios where unmap can happen with
disabled master (but not in atomic context).  On the gpu side we
opportunistically keep a buffer mapping until the buffer is freed
(which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
an exported dmabuf while some other driver holds a reference to it
(which can be dropped when the v4l2 device is suspended).

Since unmap triggers a TLB flush which touches iommu regs, the iommu
driver *definitely* needs a pm_runtime_get_sync().

BR,
-R
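
A rough user-space model of the scenario above, not kernel code: it only
illustrates why an unmap that reaches the hardware with the master already
suspended wants its own runtime PM reference. Every name here (model_smmu,
model_pm_get_sync, model_unmap_guarded, ...) is hypothetical and merely
stands in for the driver state and the pm_runtime_get_sync()/put calls
discussed in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an IOMMU under runtime PM; not the real driver. */
struct model_smmu {
	int usage_count;   /* runtime PM reference count */
	bool powered;      /* true while clocks/power domain are on */
	int tlb_flushes;   /* invalidations that actually reached the hardware */
	int bus_faults;    /* register accesses attempted while powered off */
};

/* Stands in for pm_runtime_get_sync(): take a reference, resume if needed. */
static void model_pm_get_sync(struct model_smmu *smmu)
{
	if (smmu->usage_count++ == 0)
		smmu->powered = true;
}

/* Stands in for a runtime PM put: drop a reference, suspend on the last one. */
static void model_pm_put(struct model_smmu *smmu)
{
	if (--smmu->usage_count == 0)
		smmu->powered = false;
}

/* A register write is only counted as a flush while the block is powered. */
static void model_tlb_flush(struct model_smmu *smmu)
{
	if (smmu->powered)
		smmu->tlb_flushes++;
	else
		smmu->bus_faults++;
}

/* Unmap that relies on the master having kept the IOMMU powered. */
static size_t model_unmap_unguarded(struct model_smmu *smmu, size_t size)
{
	model_tlb_flush(smmu);
	return size;
}

/* Unmap bracketed by a get/put pair, in the spirit of the quoted hunk. */
static size_t model_unmap_guarded(struct model_smmu *smmu, size_t size)
{
	size_t ret;

	model_pm_get_sync(smmu);
	model_tlb_flush(smmu);
	ret = size;
	model_pm_put(smmu);
	return ret;
}
```

With the model starting out suspended, the unguarded variant records a
faulting register access, while the guarded one powers the block up,
flushes, and lets it suspend again. The model says nothing about atomic
context, which is the separate concern raised elsewhere in the thread.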

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-13  9:50                 ` Robin Murphy
  (?)
@ 2017-07-13 11:53                   ` Rob Clark
  -1 siblings, 0 replies; 168+ messages in thread
From: Rob Clark @ 2017-07-13 11:53 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Stephen Boyd, Vivek Gautam, Joerg Roedel, Rob Herring,
	Mark Rutland, Will Deacon, Marek Szyprowski, iommu, devicetree,
	Linux Kernel Mailing List, linux-clk, linux-arm-msm, Sricharan R,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

On Thu, Jul 13, 2017 at 5:50 AM, Robin Murphy <robin.murphy@arm.com> wrote:
> On 13/07/17 07:48, Stephen Boyd wrote:
>> On 07/13, Vivek Gautam wrote:
>>> Hi Stephen,
>>>
>>>
>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>> On 07/06, Vivek Gautam wrote:
>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>  static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>                         size_t size)
>>>>>  {
>>>>> -  struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>> +  struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>> +  struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>> +  size_t ret;
>>>>>    if (!ops)
>>>>>            return 0;
>>>>> -  return ops->unmap(ops, iova, size);
>>>>> +  pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>> to recall that being a problem before.
>>>
>>> That's something which was dropped in the following patch merged in master:
>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>
>>> Looks like we don't  need locks here anymore?
>>>
>>
>> While removing the spinlock around the map/unmap path may be one
>> thing, I'm not sure that's all of them. Is there a path from an
>> atomic DMA allocation (GFP_ATOMIC sort of thing) mapped into an
>> IOMMU for a device that can eventually get down to here and
>> attempt to turn a clk on?
>
> Yes, in the DMA path map/unmap will frequently be called from IRQ
> handlers (think e.g. network packets). The whole point of removing the
> lock was to allow multiple maps/unmaps to execute in parallel (since we
> know they will be safely operating on different areas of the pagetable).
> AFAICS this change is going to largely reintroduce that bottleneck via
> dev->power_lock, which is anything but what we want :(
>

Maybe __pm_runtime_resume() needs some sort of fast-path if already
enabled?  Or otherwise we need some sort of flag to tell the iommu
that it cannot rely on the unmapping device to be resumed?

BR,
-R
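
To make the first suggestion concrete, here is a small user-space sketch
of what a "fast path if already enabled" could look like. It is an
assumption-laden model, not the PM core: model_dev, model_get_if_in_use()
and model_resume_slow() are hypothetical names, the lock is reduced to a
counter, and the real core tracks the active state separately from the
usage count.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a device's runtime PM bookkeeping. */
struct model_dev {
	atomic_int usage_count;  /* > 0 is treated as "held active" in this model */
	int lock_acquisitions;   /* times the slow path took the (modelled) lock */
	bool active;
};

static void model_dev_init(struct model_dev *dev)
{
	atomic_init(&dev->usage_count, 0);
	dev->lock_acquisitions = 0;
	dev->active = false;
}

/* Fast path: bump the count only if it is already non-zero, lock-free. */
static bool model_get_if_in_use(struct model_dev *dev)
{
	int old = atomic_load(&dev->usage_count);

	while (old > 0) {
		/* On failure 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&dev->usage_count, &old, old + 1))
			return true;
	}
	return false;
}

/* Slow path: stands in for resuming under the per-device PM lock. */
static void model_resume_slow(struct model_dev *dev)
{
	dev->lock_acquisitions++;
	atomic_fetch_add(&dev->usage_count, 1);
	dev->active = true;
}

/* Callers try the lock-free path first and fall back to a full resume. */
static void model_get(struct model_dev *dev)
{
	if (!model_get_if_in_use(dev))
		model_resume_slow(dev);
}
```

In this model only the first caller pays for the lock; later callers that
find the device already in use just increment the counter, so concurrent
map/unmap calls would not serialize on it. A real implementation would
also have to handle a concurrent suspend racing with the fast path, which
is left out here.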

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-13 11:50                   ` Rob Clark
  (?)
  (?)
@ 2017-07-13 12:02                       ` Marek Szyprowski
  -1 siblings, 0 replies; 168+ messages in thread
From: Marek Szyprowski @ 2017-07-13 12:02 UTC (permalink / raw)
  To: Rob Clark, Sricharan R
  Cc: Mark Rutland, devicetree, Will Deacon,
	Stephen Boyd, Linux Kernel Mailing List, Stanimir Varbanov,
	iommu, Rob Herring, Vivek Gautam, linux-arm-msm, linux-clk,
	linux-arm-kernel

Hi All,

On 2017-07-13 13:50, Rob Clark wrote:
> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>> On 07/06, Vivek Gautam wrote:
>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>    static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>                     size_t size)
>>>>>    {
>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>> +    size_t ret;
>>>>>          if (!ops)
>>>>>            return 0;
>>>>>    -    return ops->unmap(ops, iova, size);
>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>> to recall that being a problem before.
>>> That's something which was dropped in the following patch merged in master:
>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>
>>> Looks like we don't  need locks here anymore?
>>   Apart from the locking, wonder why a explicit pm_runtime is needed
>>   from unmap. Somehow looks like some path in the master using that
>>   should have enabled the pm ?
>>
> Yes, there are a bunch of scenarios where unmap can happen with
> disabled master (but not in atomic context).  On the gpu side we
> opportunistically keep a buffer mapping until the buffer is freed
> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
> an exported dmabuf while some other driver holds a reference to it
> (which can be dropped when the v4l2 device is suspended).
>
> Since unmap triggers a TLB flush which touches iommu regs, the iommu
> driver *definitely* needs a pm_runtime_get_sync().

Afair unmap might be called from atomic context as well, for example as
a result of dma_unmap_page(). In exynos IOMMU I simply check the runtime
PM state of IOMMU device. TLB flush is performed only when IOMMU is in
active state. If it is suspended, I assume that the IOMMU controller's
context is already lost and its respective power domain might be already
turned off, so there is no point in touching IOMMU registers.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
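
The approach described above can be sketched as a tiny user-space model.
This is not the exynos driver: model_iommu, model_unmap() and
model_resume() are hypothetical names, and the full invalidate on resume
is an assumption made here to reflect "context is already lost", not
something stated in the mail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an IOMMU that skips TLB flushes while suspended. */
struct model_iommu {
	bool active;          /* runtime PM state: true = active */
	int tlb_flushes;      /* invalidations issued to the hardware */
	size_t mapped_pages;  /* pages currently present in the page table */
};

/*
 * The page table lives in ordinary memory, so it is always updated.
 * Registers are touched only while the IOMMU is active.
 */
static size_t model_unmap(struct model_iommu *iommu, size_t pages)
{
	if (pages > iommu->mapped_pages)
		pages = iommu->mapped_pages;
	iommu->mapped_pages -= pages;

	if (iommu->active)
		iommu->tlb_flushes++;

	return pages;
}

/* Assumption: resume rebuilds hardware state with one full invalidate. */
static void model_resume(struct model_iommu *iommu)
{
	iommu->active = true;
	iommu->tlb_flushes++;
}
```

An unmap while suspended changes only the in-memory page table and never
needs to wake the device, which is what makes the check usable from
atomic context. Whether a stale entry can be observed between resume and
the first invalidate depends on the hardware and is not modelled.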

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-13 12:02                       ` Marek Szyprowski
  (?)
@ 2017-07-13 12:10                         ` Rob Clark
  -1 siblings, 0 replies; 168+ messages in thread
From: Rob Clark @ 2017-07-13 12:10 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Sricharan R, Vivek Gautam, Stephen Boyd, Joerg Roedel,
	Robin Murphy, Rob Herring, Mark Rutland, Will Deacon, iommu,
	devicetree, Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

On Thu, Jul 13, 2017 at 8:02 AM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> Hi All,
>
> On 2017-07-13 13:50, Rob Clark wrote:
>>
>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>>>
>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>>
>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>>
>>>>> On 07/06, Vivek Gautam wrote:
>>>>>>
>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>>    static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>>                     size_t size)
>>>>>>    {
>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>> +    size_t ret;
>>>>>>          if (!ops)
>>>>>>            return 0;
>>>>>>    -    return ops->unmap(ops, iova, size);
>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>>
>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>> to recall that being a problem before.
>>>>
>>>> That's something which was dropped in the following patch merged in
>>>> master:
>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>
>>>> Looks like we don't  need locks here anymore?
>>>
>>>   Apart from the locking, wonder why a explicit pm_runtime is needed
>>>   from unmap. Somehow looks like some path in the master using that
>>>   should have enabled the pm ?
>>>
>> Yes, there are a bunch of scenarios where unmap can happen with
>> disabled master (but not in atomic context).  On the gpu side we
>> opportunistically keep a buffer mapping until the buffer is freed
>> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
>> an exported dmabuf while some other driver holds a reference to it
>> (which can be dropped when the v4l2 device is suspended).
>>
>> Since unmap triggers tbl flush which touches iommu regs, the iommu
>> driver *definitely* needs a pm_runtime_get_sync().
>
>
> Afair unmap might be called from atomic context as well, for example as
> a result of dma_unmap_page(). In exynos IOMMU I simply check the runtime
> PM state of IOMMU device. TLB flush is performed only when IOMMU is in
> active
> state. If it is suspended, I assume that the IOMMU controller's context
> is already lost and its respective power domain might be already turned off,
> so there is no point in touching IOMMU registers.
>

That seems like an interesting approach, although I wonder if there
can be some race with new device memory accesses once the clks are
enabled but before the TLB flush completes?  That would be rather bad,
since this approach lets the backing pages of memory be freed before
the TLB flush.

BR,
-R

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-13 12:10                         ` Rob Clark
  (?)
  (?)
@ 2017-07-13 12:23                             ` Marek Szyprowski
  -1 siblings, 0 replies; 168+ messages in thread
From: Marek Szyprowski @ 2017-07-13 12:23 UTC (permalink / raw)
  To: Rob Clark
  Cc: Mark Rutland, devicetree, Stephen Boyd,
	Linux Kernel Mailing List, Will Deacon, Stanimir Varbanov,
	iommu, Rob Herring,
	Vivek Gautam, linux-arm-msm, linux-clk,
	linux-arm-kernel

Hi Rob,

On 2017-07-13 14:10, Rob Clark wrote:
> On Thu, Jul 13, 2017 at 8:02 AM, Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
>> On 2017-07-13 13:50, Rob Clark wrote:
>>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org>
>>> wrote:
>>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>>> On 07/06, Vivek Gautam wrote:
>>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain
>>>>>>> *domain, unsigned long iova,
>>>>>>>     static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned
>>>>>>> long iova,
>>>>>>>                      size_t size)
>>>>>>>     {
>>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>>> +    size_t ret;
>>>>>>>           if (!ops)
>>>>>>>             return 0;
>>>>>>>     -    return ops->unmap(ops, iova, size);
>>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>>> to recall that being a problem before.
>>>>> That's something which was dropped in the following patch merged in
>>>>> master:
>>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>>
>>>>> Looks like we don't  need locks here anymore?
>>>>    Apart from the locking, wonder why a explicit pm_runtime is needed
>>>>    from unmap. Somehow looks like some path in the master using that
>>>>    should have enabled the pm ?
>>>>
>>> Yes, there are a bunch of scenarios where unmap can happen with
>>> disabled master (but not in atomic context).  On the gpu side we
>>> opportunistically keep a buffer mapping until the buffer is freed
>>> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
>>> an exported dmabuf while some other driver holds a reference to it
>>> (which can be dropped when the v4l2 device is suspended).
>>>
>>> Since unmap triggers tbl flush which touches iommu regs, the iommu
>>> driver *definitely* needs a pm_runtime_get_sync().
>>
>> Afair unmap might be called from atomic context as well, for example as
>> a result of dma_unmap_page(). In exynos IOMMU I simply check the runtime
>> PM state of IOMMU device. TLB flush is performed only when IOMMU is in
>> active
>> state. If it is suspended, I assume that the IOMMU controller's context
>> is already lost and its respective power domain might be already turned off,
>> so there is no point in touching IOMMU registers.
>>
> that seems like an interesting approach.. although I wonder if there
> can be some race w/ new device memory access once clks are enabled
> before tlb flush completes?  That would be rather bad, since this
> approach is letting the backing pages of memory be freed before tlb
> flush.

The exynos IOMMU driver has a spinlock to ensure that there is no race
between PM runtime suspend and unmap/TLB flush.
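
A minimal user-space sketch of that serialization, for illustration only.
It is not the exynos-iommu code: the lock is a small C11 atomic_flag
spinlock standing in for the kernel's spinlock_t, and every name here is
hypothetical. The state check and the flush happen under the same lock
that the suspend path takes, so a suspend cannot slip in between the two.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sysmmu_model {
	atomic_flag lock;	/* tiny spinlock, stand-in for spinlock_t */
	bool active;		/* runtime PM state as seen by the driver */
	unsigned int flushes;	/* counts simulated TLB flushes */
};

void model_init(struct sysmmu_model *m)
{
	assert(m != NULL);
	atomic_flag_clear(&m->lock);
	m->active = false;
	m->flushes = 0;
}

void model_lock(struct sysmmu_model *m)
{
	/* Spin until the flag was previously clear, i.e. we now own it. */
	while (atomic_flag_test_and_set_explicit(&m->lock,
						 memory_order_acquire))
		;
}

void model_unlock(struct sysmmu_model *m)
{
	atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

/* Runtime PM callbacks of the model: flip the state under the lock. */
void model_runtime_resume(struct sysmmu_model *m)
{
	model_lock(m);
	m->active = true;
	model_unlock(m);
}

void model_runtime_suspend(struct sysmmu_model *m)
{
	model_lock(m);
	m->active = false;
	model_unlock(m);
}

/*
 * Unmap-side flush: the state check and the flush are one critical
 * section. Returns true if a flush was issued.
 */
bool model_unmap_flush(struct sysmmu_model *m)
{
	bool flushed = false;

	model_lock(m);
	if (m->active) {
		m->flushes++;
		flushed = true;
	}
	model_unlock(m);

	return flushed;
}
```

Since nothing in the critical section sleeps, the same pattern can be
used from atomic context.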

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-13 11:50                   ` Rob Clark
  (?)
  (?)
@ 2017-07-13 13:53                       ` Sricharan R
  -1 siblings, 0 replies; 168+ messages in thread
From: Sricharan R @ 2017-07-13 13:53 UTC (permalink / raw)
  To: Rob Clark
  Cc: Mark Rutland, devicetree, Will Deacon,
	Stephen Boyd, Linux Kernel Mailing List, Stanimir Varbanov,
	iommu, Rob Herring,
	Vivek Gautam, linux-arm-msm, linux-clk,
	linux-arm-kernel

Hi,

On 7/13/2017 5:20 PM, Rob Clark wrote:
> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> Hi Vivek,
>>
>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>> Hi Stephen,
>>>
>>>
>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>> On 07/06, Vivek Gautam wrote:
>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>                    size_t size)
>>>>>   {
>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>> +    size_t ret;
>>>>>         if (!ops)
>>>>>           return 0;
>>>>>   -    return ops->unmap(ops, iova, size);
>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>> to recall that being a problem before.
>>>
>>> That's something which was dropped in the following patch merged in master:
>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>
>>> Looks like we don't  need locks here anymore?
>>
>>  Apart from the locking, wonder why a explicit pm_runtime is needed
>>  from unmap. Somehow looks like some path in the master using that
>>  should have enabled the pm ?
>>
> 
> Yes, there are a bunch of scenarios where unmap can happen with
> disabled master (but not in atomic context).  On the gpu side we
> opportunistically keep a buffer mapping until the buffer is freed
> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
> an exported dmabuf while some other driver holds a reference to it
> (which can be dropped when the v4l2 device is suspended).
> 
> Since unmap triggers tbl flush which touches iommu regs, the iommu
> driver *definitely* needs a pm_runtime_get_sync().

 Ok, with that being the case, there are two things here,

 1) If the device links are still intact at these places where unmap is called,
    then pm_runtime from the master would set up all the clocks. That would
    avoid reintroducing the locking indirectly here.

 2) If not, then doing it here is the only way. But in both cases, since
    unmap can be called from atomic context, the resume handler here should
    avoid doing clk_prepare_enable() and instead move the clk_prepare() to the init.
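
 A rough user-space model of that split, for illustration only. It relies
 on the clk API rule that clk_prepare() may sleep while clk_enable() may
 be called from atomic context. The model_* names, the clk_model struct
 and the in_atomic_context flag are hypothetical stand-ins and not the
 arm-smmu or clk framework code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Set by the caller to simulate running in atomic context. */
bool in_atomic_context;

struct clk_model {
	int prepare_count;
	int enable_count;
};

/* Models clk_prepare(): may sleep, so it must not run in atomic context. */
int model_clk_prepare(struct clk_model *c)
{
	assert(c != NULL);
	assert(!in_atomic_context);
	c->prepare_count++;
	return 0;
}

/* Models clk_enable(): atomic-safe, but only valid on a prepared clock. */
int model_clk_enable(struct clk_model *c)
{
	assert(c != NULL);
	if (c->prepare_count == 0)
		return -1;	/* the model rejects enable without prepare */
	c->enable_count++;
	return 0;
}

/* Models clk_disable(): atomic-safe counterpart of model_clk_enable(). */
void model_clk_disable(struct clk_model *c)
{
	assert(c != NULL);
	assert(c->enable_count > 0);
	c->enable_count--;
}

/* Init/probe: the sleeping part is done once, up front. */
int model_smmu_init(struct clk_model *c)
{
	return model_clk_prepare(c);
}

/* Runtime resume: only the atomic-safe enable remains. */
int model_smmu_runtime_resume(struct clk_model *c)
{
	return model_clk_enable(c);
}

/* Runtime suspend: disable only, the clock stays prepared. */
int model_smmu_runtime_suspend(struct clk_model *c)
{
	model_clk_disable(c);
	return 0;
}
```

 With the prepare done at init, the resume/suspend pair touches only the
 enable count and can therefore be reached from an atomic unmap path.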

Regards,
 Sricharan


-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
@ 2017-07-13 13:53                       ` Sricharan R
  0 siblings, 0 replies; 168+ messages in thread
From: Sricharan R @ 2017-07-13 13:53 UTC (permalink / raw)
  To: Rob Clark
  Cc: Vivek Gautam, Stephen Boyd, Joerg Roedel, Robin Murphy,
	Rob Herring, Mark Rutland, Will Deacon, Marek Szyprowski, iommu,
	devicetree, Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

Hi,

On 7/13/2017 5:20 PM, Rob Clark wrote:
> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> Hi Vivek,
>>
>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>> Hi Stephen,
>>>
>>>
>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>> On 07/06, Vivek Gautam wrote:
>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>                    size_t size)
>>>>>   {
>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>> +    size_t ret;
>>>>>         if (!ops)
>>>>>           return 0;
>>>>>   -    return ops->unmap(ops, iova, size);
>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>> to recall that being a problem before.
>>>
>>> That's something which was dropped in the following patch merged in master:
>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>
>>> Looks like we don't  need locks here anymore?
>>
>>  Apart from the locking, wonder why a explicit pm_runtime is needed
>>  from unmap. Somehow looks like some path in the master using that
>>  should have enabled the pm ?
>>
> 
> Yes, there are a bunch of scenarios where unmap can happen with
> disabled master (but not in atomic context).  On the gpu side we
> opportunistically keep a buffer mapping until the buffer is freed
> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
> an exported dmabuf while some other driver holds a reference to it
> (which can be dropped when the v4l2 device is suspended).
> 
> Since unmap triggers tbl flush which touches iommu regs, the iommu
> driver *definitely* needs a pm_runtime_get_sync().

 Ok, with that being the case, there are two things here:

 1) If the device links are still intact at the places where unmap is called,
    then pm_runtime from the master would set up all the clocks. That would
    avoid reintroducing the locking indirectly here.

 2) If not, then doing it here is the only way. But in both cases, since
    unmap can be called from atomic context, the resume handler here should
    avoid calling clk_prepare_enable() and instead move the clk_prepare() to init.
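
A rough illustration of point 2, written as a userspace toy model rather
than driver code. In the kernel clk API, clk_prepare() may sleep while
clk_enable() may be called from atomic context, so the sleepable half can
be done once at init and only the enable half left in the runtime resume
handler. All toy_* names below are hypothetical and only track state; they
are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct clk; tracks state only, no hardware. */
struct toy_clk {
	bool prepared;	/* set by the sleepable step */
	bool enabled;	/* set by the atomic-safe step */
};

/* Models clk_prepare(): may sleep, so call it from init only. */
static int toy_clk_prepare(struct toy_clk *clk)
{
	clk->prepared = true;
	return 0;
}

/* Models clk_enable(): does not sleep, but needs a prepared clock. */
static int toy_clk_enable(struct toy_clk *clk)
{
	if (!clk->prepared)
		return -1;
	clk->enabled = true;
	return 0;
}

/* Models clk_disable(): gates the clock, leaves it prepared. */
static void toy_clk_disable(struct toy_clk *clk)
{
	clk->enabled = false;
}

/* Init path (process context): do the sleepable prepare once. */
static int toy_smmu_init(struct toy_clk *clk)
{
	return toy_clk_prepare(clk);
}

/* Runtime resume: only the non-sleeping enable remains. */
static int toy_smmu_runtime_resume(struct toy_clk *clk)
{
	return toy_clk_enable(clk);
}

/* Runtime suspend: gate the clock but keep it prepared. */
static int toy_smmu_runtime_suspend(struct toy_clk *clk)
{
	toy_clk_disable(clk);
	return 0;
}
```

In this model a resume before init fails, and a suspend leaves the clock
prepared so that the next resume needs no sleepable work.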

Regards,
 Sricharan


-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-13  5:35               ` Sricharan R
  (?)
  (?)
@ 2017-07-13 13:57                   ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-13 13:57 UTC (permalink / raw)
  To: Sricharan R, Stephen Boyd, robin.murphy
  Cc: Mark Rutland, devicetree, linux-arm-msm, Will Deacon, iommu,
	linux-kernel, robh+dt, Stanimir Varbanov, linux-clk,
	linux-arm-kernel

On Thu, Jul 13, 2017 at 11:05 AM, Sricharan R <sricharan@codeaurora.org> wrote:
> Hi Vivek,
>
> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>> Hi Stephen,
>>
>>
>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>> On 07/06, Vivek Gautam wrote:
>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>                    size_t size)
>>>>   {
>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>> +    size_t ret;
>>>>         if (!ops)
>>>>           return 0;
>>>>   -    return ops->unmap(ops, iova, size);
>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>> Can these map/unmap ops be called from an atomic context? I seem
>>> to recall that being a problem before.
>>
>> That's something which was dropped in the following patch merged in master:
>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>
>> Looks like we don't  need locks here anymore?
>
>  Apart from the locking, wonder why a explicit pm_runtime is needed
>  from unmap. Somehow looks like some path in the master using that
>  should have enabled the pm ?

Right, the master should have done a runtime_get(), and with
device links the iommu will also resume.

The master will call unmap while it is attached to the iommu,
and therefore the iommu should be in the resumed state.
We shouldn't have an unmap without the master attached anyway.
I will investigate further whether we need the pm_runtime calls
around unmap.
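
The device-link behaviour described above can be sketched as a small
userspace model. This is a simplification under stated assumptions: every
get on the consumer (the master) is propagated to its supplier (the iommu),
whereas the real runtime PM core only resumes suppliers when the consumer
itself resumes. The toy_* names are hypothetical and not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device with a usage count and an optional supplier link. */
struct toy_dev {
	int usage;			/* runtime PM usage count */
	struct toy_dev *supplier;	/* e.g. the iommu for a master */
};

/* Models a get: take the supplier first, then the device. */
static void toy_pm_get(struct toy_dev *dev)
{
	if (dev->supplier)
		toy_pm_get(dev->supplier);
	dev->usage++;
}

/* Models a put: release the device, then its supplier. */
static void toy_pm_put(struct toy_dev *dev)
{
	dev->usage--;
	if (dev->supplier)
		toy_pm_put(dev->supplier);
}

/* A device counts as active while anyone holds a reference. */
static int toy_is_active(const struct toy_dev *dev)
{
	return dev->usage > 0;
}
```

With a master linked to an iommu, a get on the master makes the iommu
active, and the matching put lets it go idle again. Once the master has
been put, an unmap would find the iommu inactive in this model unless the
iommu driver takes its own reference.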

Best regards
Vivek


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-13 13:57                   ` Vivek Gautam
  (?)
  (?)
@ 2017-07-13 14:01                       ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-13 14:01 UTC (permalink / raw)
  To: Sricharan R, Stephen Boyd, robin.murphy, Rob Clark
  Cc: Mark Rutland, devicetree, linux-arm-msm, Will Deacon,
	linux-kernel, Stanimir Varbanov, iommu, robh+dt, linux-clk,
	linux-arm-kernel

On Thu, Jul 13, 2017 at 7:27 PM, Vivek Gautam
<vivek.gautam@codeaurora.org> wrote:
> On Thu, Jul 13, 2017 at 11:05 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> Hi Vivek,
>>
>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>> Hi Stephen,
>>>
>>>
>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>> On 07/06, Vivek Gautam wrote:
>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>                    size_t size)
>>>>>   {
>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>> +    size_t ret;
>>>>>         if (!ops)
>>>>>           return 0;
>>>>>   -    return ops->unmap(ops, iova, size);
>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>> to recall that being a problem before.
>>>
>>> That's something which was dropped in the following patch merged in master:
>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>
>>> Looks like we don't  need locks here anymore?
>>
>>  Apart from the locking, wonder why a explicit pm_runtime is needed
>>  from unmap. Somehow looks like some path in the master using that
>>  should have enabled the pm ?
>
> Right, the master should have done a runtime_get(), and with
> device links the iommu will also resume.
>
> The master will call the unmap when it is attached to the iommu
> and therefore the iommu should be in resume state.
> We shouldn't have an unmap without the master attached anyways.
> Will investigate this further if we need the pm_runtime() calls
> around unmap or not.

My apologies. My email client didn't update the thread. So please ignore
this comment.


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-13 13:53                       ` Sricharan R
  (?)
  (?)
@ 2017-07-13 14:55                           ` Rob Clark
  -1 siblings, 0 replies; 168+ messages in thread
From: Rob Clark @ 2017-07-13 14:55 UTC (permalink / raw)
  To: Sricharan R
  Cc: Mark Rutland, devicetree, Will Deacon, Stephen Boyd,
	Linux Kernel Mailing List, Stanimir Varbanov, iommu, Rob Herring,
	Vivek Gautam, linux-arm-msm, linux-clk, linux-arm-kernel

On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
> Hi,
>
> On 7/13/2017 5:20 PM, Rob Clark wrote:
>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>>> Hi Vivek,
>>>
>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>> Hi Stephen,
>>>>
>>>>
>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>> On 07/06, Vivek Gautam wrote:
>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>>                    size_t size)
>>>>>>   {
>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>> +    size_t ret;
>>>>>>         if (!ops)
>>>>>>           return 0;
>>>>>>   -    return ops->unmap(ops, iova, size);
>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>> to recall that being a problem before.
>>>>
>>>> That's something which was dropped in the following patch merged in master:
>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>
>>>> Looks like we don't  need locks here anymore?
>>>
>>>  Apart from the locking, wonder why a explicit pm_runtime is needed
>>>  from unmap. Somehow looks like some path in the master using that
>>>  should have enabled the pm ?
>>>
>>
>> Yes, there are a bunch of scenarios where unmap can happen with
>> disabled master (but not in atomic context).  On the gpu side we
>> opportunistically keep a buffer mapping until the buffer is freed
>> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
>> an exported dmabuf while some other driver holds a reference to it
>> (which can be dropped when the v4l2 device is suspended).
>>
>> Since unmap triggers tbl flush which touches iommu regs, the iommu
>> driver *definitely* needs a pm_runtime_get_sync().
>
>  Ok, with that being the case, there are two things here,
>
>  1) If the device links are still intact at these places where unmap is called,
>     then pm_runtime from the master would setup the all the clocks. That would
>     avoid reintroducing the locking indirectly here.
>
>  2) If not, then doing it here is the only way. But for both cases, since
>     the unmap can be called from atomic context, resume handler here should
>     avoid doing clk_prepare_enable , instead move the clk_prepare to the init.
>

I do kinda like the approach Marek suggested.. of deferring the tlb
flush until resume.  I'm wondering if we could combine that with
putting the mmu in a stalled state when we suspend (and not resume the
mmu until after the pending tlb flush)?
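
A rough userspace sketch of what that could look like, under stated
assumptions: the defer_* names are hypothetical and only model the
state machine (they are not in arm-smmu.c), and nothing here deals
with suspend/resume running concurrently with unmap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the driver's data structure. */
struct defer_smmu {
	bool powered;        /* clocks / power domain are on */
	bool stalled;        /* transactions are held off */
	bool flush_pending;  /* an unmap happened while powered off */
	int  hw_flushes;     /* how often the "hardware" TLB was flushed */
};

/* Touches registers in the real world, so it needs power. */
static void defer_tlb_flush(struct defer_smmu *s)
{
	assert(s->powered);
	s->hw_flushes++;
}

/* The page-table update itself is plain memory and needs no registers. */
static void defer_unmap(struct defer_smmu *s)
{
	if (s->powered)
		defer_tlb_flush(s);
	else
		s->flush_pending = true;   /* remember it for resume */
}

/* Stall first, then drop power. */
static void defer_suspend(struct defer_smmu *s)
{
	s->stalled = true;
	s->powered = false;
}

/* Power up, do the deferred flush, and only then lift the stall. */
static void defer_resume(struct defer_smmu *s)
{
	s->powered = true;
	if (s->flush_pending) {
		defer_tlb_flush(s);
		s->flush_pending = false;
	}
	s->stalled = false;
}
```

The ordering in defer_resume() is the point: no transaction is let
through until the pending flush has been done.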

BR,
-R

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-13 14:55                           ` Rob Clark
  (?)
@ 2017-07-14 17:07                             ` Will Deacon
  -1 siblings, 0 replies; 168+ messages in thread
From: Will Deacon @ 2017-07-14 17:07 UTC (permalink / raw)
  To: Rob Clark
  Cc: Sricharan R, Vivek Gautam, Stephen Boyd, Joerg Roedel,
	Robin Murphy, Rob Herring, Mark Rutland, Marek Szyprowski, iommu,
	devicetree, Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
> On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
> > Hi,
> >
> > On 7/13/2017 5:20 PM, Rob Clark wrote:
> >> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
> >>> Hi Vivek,
> >>>
> >>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
> >>>> Hi Stephen,
> >>>>
> >>>>
> >>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
> >>>>> On 07/06, Vivek Gautam wrote:
> >>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> >>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
> >>>>>>                    size_t size)
> >>>>>>   {
> >>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> >>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> >>>>>> +    size_t ret;
> >>>>>>         if (!ops)
> >>>>>>           return 0;
> >>>>>>   -    return ops->unmap(ops, iova, size);
> >>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
> >>>>> Can these map/unmap ops be called from an atomic context? I seem
> >>>>> to recall that being a problem before.
> >>>>
> >>>> That's something which was dropped in the following patch merged in master:
> >>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
> >>>>
> >>>> Looks like we don't  need locks here anymore?
> >>>
> >>>  Apart from the locking, I wonder why an explicit pm_runtime is needed
> >>>  from unmap. Somehow it looks like some path in the master using that
> >>>  should have enabled the pm?
> >>>
> >>
> >> Yes, there are a bunch of scenarios where unmap can happen with
> >> disabled master (but not in atomic context).  On the gpu side we
> >> opportunistically keep a buffer mapping until the buffer is freed
> >> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
> >> an exported dmabuf while some other driver holds a reference to it
> >> (which can be dropped when the v4l2 device is suspended).
> >>
> >> Since unmap triggers a tlb flush, which touches iommu regs, the iommu
> >> driver *definitely* needs a pm_runtime_get_sync().
> >
> >  Ok, with that being the case, there are two things here,
> >
> >  1) If the device links are still intact at these places where unmap is called,
> >     then pm_runtime from the master would set up all the clocks. That would
> >     avoid reintroducing the locking indirectly here.
> >
> >  2) If not, then doing it here is the only way. But for both cases, since
> >     the unmap can be called from atomic context, the resume handler here should
> >     avoid doing clk_prepare_enable; instead, move the clk_prepare to the init.
> >
> 
> I do kinda like the approach Marek suggested, of deferring the tlb
> flush until resume.  I'm wondering if we could combine that with
> putting the mmu in a stalled state when we suspend (and not resume the
> mmu until after the pending tlb flush)?

I'm not sure that a stalled state is what we're after here, because we need
to take care to prevent any table walks if we've freed the underlying pages.
What we could try to do is disable the SMMU (put it into global bypass) and
invalidate the TLB when performing a suspend operation, then we just ignore
invalidation whilst the clocks are stopped and, on resume, enable the SMMU
again.
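
As a rough userspace sketch of that sequence, under stated assumptions:
the bypass_* names and the four-entry TLB array are hypothetical and
only model the ordering; they are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BYPASS_TLB_ENTRIES 4

/* Hypothetical userspace model; not the driver's data structure. */
struct bypass_smmu {
	bool clocks_on;
	bool translating;                        /* false == global bypass */
	unsigned long tlb[BYPASS_TLB_ENTRIES];   /* cached IOVAs, 0 == empty */
};

/* Invalidate everything; a register access, so the clocks must be on. */
static void bypass_tlb_inv_all(struct bypass_smmu *s)
{
	assert(s->clocks_on);
	memset(s->tlb, 0, sizeof(s->tlb));
}

/* Suspend: stop translating, empty the TLB, then stop the clocks. */
static void bypass_suspend(struct bypass_smmu *s)
{
	s->translating = false;
	bypass_tlb_inv_all(s);
	s->clocks_on = false;
}

/*
 * Invalidation for one unmapped IOVA. Returns true if the "hardware"
 * was touched. While the clocks are stopped it is simply ignored,
 * because suspend already left the TLB empty.
 */
static bool bypass_unmap_inv(struct bypass_smmu *s, unsigned long iova)
{
	unsigned int i;

	if (!s->clocks_on)
		return false;

	for (i = 0; i < BYPASS_TLB_ENTRIES; i++)
		if (s->tlb[i] == iova)
			s->tlb[i] = 0;
	return true;
}

/* Resume: clocks back on, translation re-enabled with an empty TLB. */
static void bypass_resume(struct bypass_smmu *s)
{
	s->clocks_on = true;
	s->translating = true;
}
```

Since the TLB is empty when the clocks stop, there is nothing to walk
or hit on until translation is enabled again.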

That said, I don't think we can tolerate suspend/resume racing with
map/unmap, and it's not clear to me how we avoid that without penalising
the fastpath.

Will

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-14 17:07                             ` Will Deacon
  (?)
@ 2017-07-14 17:42                               ` Rob Clark
  -1 siblings, 0 replies; 168+ messages in thread
From: Rob Clark @ 2017-07-14 17:42 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sricharan R, Vivek Gautam, Stephen Boyd, Joerg Roedel,
	Robin Murphy, Rob Herring, Mark Rutland, Marek Szyprowski, iommu,
	devicetree, Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

On Fri, Jul 14, 2017 at 1:07 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
>> On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> > Hi,
>> >
>> > On 7/13/2017 5:20 PM, Rob Clark wrote:
>> >> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> >>> Hi Vivek,
>> >>>
>> >>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>> >>>> Hi Stephen,
>> >>>>
>> >>>>
>> >>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>> >>>>> On 07/06, Vivek Gautam wrote:
>> >>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>> >>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>> >>>>>>                    size_t size)
>> >>>>>>   {
>> >>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> >>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> >>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> >>>>>> +    size_t ret;
>> >>>>>>         if (!ops)
>> >>>>>>           return 0;
>> >>>>>>   -    return ops->unmap(ops, iova, size);
>> >>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>> >>>>> Can these map/unmap ops be called from an atomic context? I seem
>> >>>>> to recall that being a problem before.
>> >>>>
>> >>>> That's something which was dropped in the following patch merged in master:
>> >>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>> >>>>
>> >>>> Looks like we don't  need locks here anymore?
>> >>>
>> >>>  Apart from the locking, I wonder why an explicit pm_runtime is needed
>> >>>  from unmap. Somehow it looks like some path in the master using that
>> >>>  should have enabled the pm?
>> >>>
>> >>
>> >> Yes, there are a bunch of scenarios where unmap can happen with
>> >> disabled master (but not in atomic context).  On the gpu side we
>> >> opportunistically keep a buffer mapping until the buffer is freed
>> >> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
>> >> an exported dmabuf while some other driver holds a reference to it
>> >> (which can be dropped when the v4l2 device is suspended).
>> >>
>> >> Since unmap triggers a tlb flush, which touches iommu regs, the iommu
>> >> driver *definitely* needs a pm_runtime_get_sync().
>> >
>> >  Ok, with that being the case, there are two things here,
>> >
>> >  1) If the device links are still intact at these places where unmap is called,
>> >     then pm_runtime from the master would set up all the clocks. That would
>> >     avoid reintroducing the locking indirectly here.
>> >
>> >  2) If not, then doing it here is the only way. But for both cases, since
>> >     the unmap can be called from atomic context, the resume handler here should
>> >     avoid doing clk_prepare_enable; instead, move the clk_prepare to the init.
>> >
>>
>> I do kinda like the approach Marek suggested, of deferring the tlb
>> flush until resume.  I'm wondering if we could combine that with
>> putting the mmu in a stalled state when we suspend (and not resume the
>> mmu until after the pending tlb flush)?
>
> I'm not sure that a stalled state is what we're after here, because we need
> to take care to prevent any table walks if we've freed the underlying pages.
> What we could try to do is disable the SMMU (put it into global bypass) and
> invalidate the TLB when performing a suspend operation, then we just ignore
> invalidation whilst the clocks are stopped and, on resume, enable the SMMU
> again.

Wouldn't stalled just block any memory transactions by device(s) using
the context bank?  Putting it in bypass isn't really a good thing if
there is any chance the device can sneak in a memory access before
we've taken it back out of bypass (i.e. it makes the gpu a giant
userspace-controlled root hole).
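
To spell out the difference I mean, here is a deliberately idealized
userspace model. Assumptions: the model_* names are hypothetical, and
the three modes are simplified to the behaviour described in this
thread rather than to what the architecture actually specifies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified modes for the sketch. */
enum model_mode {
	MODEL_TRANSLATE,   /* normal operation */
	MODEL_BYPASS,      /* SMMU disabled, global bypass */
	MODEL_STALL        /* transactions held until resumed */
};

enum model_result {
	MODEL_TRANSLATED,    /* reached memory through the page tables */
	MODEL_UNTRANSLATED,  /* reached memory with the raw address */
	MODEL_HELD,          /* did not reach memory at all */
	MODEL_FAULTED        /* rejected: no mapping for this IOVA */
};

/* What happens to one device access in each mode (idealized). */
static enum model_result model_access(enum model_mode mode, bool iova_mapped)
{
	switch (mode) {
	case MODEL_TRANSLATE:
		return iova_mapped ? MODEL_TRANSLATED : MODEL_FAULTED;
	case MODEL_BYPASS:
		/* The page tables are not consulted: this is the hole. */
		return MODEL_UNTRANSLATED;
	case MODEL_STALL:
		return MODEL_HELD;
	}
	assert(!"unknown mode");
	return MODEL_FAULTED;
}
```

In this model only bypass lets an access with no mapping reach memory;
the other two modes either reject it or hold it.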

BR,
-R

> That said, I don't think we can tolerate suspend/resume racing with
> map/unmap, and it's not clear to me how we avoid that without penalising
> the fastpath.
>
> Will

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-14 17:42                               ` Rob Clark
  (?)
@ 2017-07-14 18:06                                 ` Will Deacon
  -1 siblings, 0 replies; 168+ messages in thread
From: Will Deacon @ 2017-07-14 18:06 UTC (permalink / raw)
  To: Rob Clark
  Cc: Sricharan R, Vivek Gautam, Stephen Boyd, Joerg Roedel,
	Robin Murphy, Rob Herring, Mark Rutland, Marek Szyprowski, iommu,
	devicetree, Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

On Fri, Jul 14, 2017 at 01:42:13PM -0400, Rob Clark wrote:
> On Fri, Jul 14, 2017 at 1:07 PM, Will Deacon <will.deacon@arm.com> wrote:
> > On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
> >> On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
> >> > Hi,
> >> >
> >> > On 7/13/2017 5:20 PM, Rob Clark wrote:
> >> >> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
> >> >>> Hi Vivek,
> >> >>>
> >> >>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
> >> >>>> Hi Stephen,
> >> >>>>
> >> >>>>
> >> >>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
> >> >>>>> On 07/06, Vivek Gautam wrote:
> >> >>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> >> >>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
> >> >>>>>>                    size_t size)
> >> >>>>>>   {
> >> >>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> >> >>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >> >>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> >> >>>>>> +    size_t ret;
> >> >>>>>>         if (!ops)
> >> >>>>>>           return 0;
> >> >>>>>>   -    return ops->unmap(ops, iova, size);
> >> >>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
> >> >>>>> Can these map/unmap ops be called from an atomic context? I seem
> >> >>>>> to recall that being a problem before.
> >> >>>>
> >> >>>> That's something which was dropped in the following patch merged in master:
> >> >>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
> >> >>>>
> >> >>>> Looks like we don't  need locks here anymore?
> >> >>>
> >> >>>  Apart from the locking, wonder why an explicit pm_runtime is needed
> >> >>>  from unmap. Somehow looks like some path in the master using that
> >> >>>  should have enabled the pm ?
> >> >>>
> >> >>
> >> >> Yes, there are a bunch of scenarios where unmap can happen with
> >> >> disabled master (but not in atomic context).  On the gpu side we
> >> >> opportunistically keep a buffer mapping until the buffer is freed
> >> >> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
> >> >> an exported dmabuf while some other driver holds a reference to it
> >> >> (which can be dropped when the v4l2 device is suspended).
> >> >>
> >> >> Since unmap triggers tlb flush which touches iommu regs, the iommu
> >> >> driver *definitely* needs a pm_runtime_get_sync().
> >> >
> >> >  Ok, with that being the case, there are two things here,
> >> >
> >> >  1) If the device links are still intact at these places where unmap is called,
> >> >     then pm_runtime from the master would set up all the clocks. That would
> >> >     avoid reintroducing the locking indirectly here.
> >> >
> >> >  2) If not, then doing it here is the only way. But for both cases, since
> >> >     the unmap can be called from atomic context, resume handler here should
> >> >     avoid doing clk_prepare_enable; instead, move the clk_prepare to the init.
> >> >
> >>
> >> I do kinda like the approach Marek suggested.. of deferring the tlb
> >> flush until resume.  I'm wondering if we could combine that with
> >> putting the mmu in a stalled state when we suspend (and not resume the
> >> mmu until after the pending tlb flush)?
> >
> > I'm not sure that a stalled state is what we're after here, because we need
> > to take care to prevent any table walks if we've freed the underlying pages.
> > What we could try to do is disable the SMMU (put into global bypass) and
> > invalidate the TLB when performing a suspend operation, then we just ignore
> > invalidation whilst the clocks are stopped and, on resume, enable the SMMU
> > again.
> 
> wouldn't stalled just block any memory transactions by device(s) using
> the context bank?  Putting it in bypass isn't really a good thing if
> there is any chance the device can sneak in a memory access before
> we've taken it back out of bypass (i.e. makes gpu a giant userspace
> controlled root hole).

If it doesn't deadlock, then yes, it will stall transactions. However, that
doesn't mean it necessarily prevents page table walks. Instead of bypass, we
could configure all the streams to terminate, but this race still worries me
somewhat. I thought that the SMMU would only be suspended if all of its
masters were suspended, so if the GPU wants to come out of suspend then the
SMMU should be resumed first.
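
To make that expectation concrete, here is a rough user-space model of the
ordering (not kernel code). The names struct model_dev, model_get(),
model_put() and model_unmap() are invented for illustration and are not part
of the arm-smmu driver or the runtime PM API. It assumes a simple usage count
per device, with the supplier (SMMU) referenced for as long as any consumer
(master) is active, and it is single-threaded, so it says nothing about the
race being discussed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_dev {
	const char *name;
	int usage;                  /* runtime-PM style usage count */
	bool active;                /* true while the "clocks" are on */
	struct model_dev *supplier; /* e.g. the SMMU behind a GPU; may be NULL */
};

/* Take a reference; the supplier is powered before the device itself. */
static void model_get(struct model_dev *d)
{
	if (d->usage++ > 0)
		return;
	if (d->supplier)
		model_get(d->supplier);
	d->active = true;
}

/* Drop a reference; the device powers down before releasing its supplier. */
static void model_put(struct model_dev *d)
{
	assert(d->usage > 0);
	if (--d->usage > 0)
		return;
	d->active = false;
	if (d->supplier)
		model_put(d->supplier);
}

/*
 * An unmap issued while every master is idle takes its own reference on
 * the SMMU, so the "register access" happens with the device active.
 */
static bool model_unmap(struct model_dev *smmu)
{
	bool regs_accessible;

	model_get(smmu);
	regs_accessible = smmu->active;
	model_put(smmu);
	return regs_accessible;
}
```

In this model the SMMU goes inactive only after the last master has dropped
its reference, and a master coming out of suspend always finds the SMMU
already active.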

It would be helpful if somebody could figure out exactly what can race with
the suspend/resume calls here.

Will

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-14 18:06                                 ` Will Deacon
  (?)
@ 2017-07-14 18:25                                   ` Rob Clark
  -1 siblings, 0 replies; 168+ messages in thread
From: Rob Clark @ 2017-07-14 18:25 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sricharan R, Vivek Gautam, Stephen Boyd, Joerg Roedel,
	Robin Murphy, Rob Herring, Mark Rutland, Marek Szyprowski, iommu,
	devicetree, Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

On Fri, Jul 14, 2017 at 2:06 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Jul 14, 2017 at 01:42:13PM -0400, Rob Clark wrote:
>> On Fri, Jul 14, 2017 at 1:07 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
>> >> On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> >> > Hi,
>> >> >
>> >> > On 7/13/2017 5:20 PM, Rob Clark wrote:
>> >> >> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> >> >>> Hi Vivek,
>> >> >>>
>> >> >>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>> >> >>>> Hi Stephen,
>> >> >>>>
>> >> >>>>
>> >> >>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>> >> >>>>> On 07/06, Vivek Gautam wrote:
>> >> >>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>> >> >>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>> >> >>>>>>                    size_t size)
>> >> >>>>>>   {
>> >> >>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> >> >>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> >> >>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> >> >>>>>> +    size_t ret;
>> >> >>>>>>         if (!ops)
>> >> >>>>>>           return 0;
>> >> >>>>>>   -    return ops->unmap(ops, iova, size);
>> >> >>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>> >> >>>>> Can these map/unmap ops be called from an atomic context? I seem
>> >> >>>>> to recall that being a problem before.
>> >> >>>>
>> >> >>>> That's something which was dropped in the following patch merged in master:
>> >> >>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>> >> >>>>
>> >> >>>> Looks like we don't  need locks here anymore?
>> >> >>>
>> >> >>>  Apart from the locking, wonder why an explicit pm_runtime is needed
>> >> >>>  from unmap. Somehow looks like some path in the master using that
>> >> >>>  should have enabled the pm ?
>> >> >>>
>> >> >>
>> >> >> Yes, there are a bunch of scenarios where unmap can happen with
>> >> >> disabled master (but not in atomic context).  On the gpu side we
>> >> >> opportunistically keep a buffer mapping until the buffer is freed
>> >> >> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
>> >> >> an exported dmabuf while some other driver holds a reference to it
>> >> >> (which can be dropped when the v4l2 device is suspended).
>> >> >>
>> >> >> Since unmap triggers tlb flush which touches iommu regs, the iommu
>> >> >> driver *definitely* needs a pm_runtime_get_sync().
>> >> >
>> >> >  Ok, with that being the case, there are two things here,
>> >> >
>> >> >  1) If the device links are still intact at these places where unmap is called,
>> >> >     then pm_runtime from the master would set up all the clocks. That would
>> >> >     avoid reintroducing the locking indirectly here.
>> >> >
>> >> >  2) If not, then doing it here is the only way. But for both cases, since
>> >> >     the unmap can be called from atomic context, resume handler here should
>> >> >     avoid doing clk_prepare_enable; instead, move the clk_prepare to the init.
>> >> >
>> >>
>> >> I do kinda like the approach Marek suggested.. of deferring the tlb
>> >> flush until resume.  I'm wondering if we could combine that with
>> >> putting the mmu in a stalled state when we suspend (and not resume the
>> >> mmu until after the pending tlb flush)?
>> >
>> > I'm not sure that a stalled state is what we're after here, because we need
>> > to take care to prevent any table walks if we've freed the underlying pages.
>> > What we could try to do is disable the SMMU (put into global bypass) and
>> > invalidate the TLB when performing a suspend operation, then we just ignore
>> > invalidation whilst the clocks are stopped and, on resume, enable the SMMU
>> > again.
>>
>> wouldn't stalled just block any memory transactions by device(s) using
>> the context bank?  Putting it in bypass isn't really a good thing if
>> there is any chance the device can sneak in a memory access before
>> we've taken it back out of bypass (i.e. makes gpu a giant userspace
>> controlled root hole).
>
> If it doesn't deadlock, then yes, it will stall transactions. However, that
> doesn't mean it necessarily prevents page table walks.

btw, I guess the concern about pagetable walk is that the unmap could
have removed some sub-level of the pt that the tlb walk would hit?
Would deferring freeing those pages help?
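
Roughly what I have in mind, as a user-space sketch only (not driver code).
struct model_smmu, struct pt_page and the model_*() helpers are made-up
names, and the model assumes a single thread, so it ignores the
suspend/resume vs. map/unmap race entirely. An unmap that arrives while the
clocks are off just queues the unhooked table page and marks a flush as
pending; resume does the invalidation first and only then frees the pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space model; not the arm-smmu driver's structures. */
struct pt_page {
	struct pt_page *next;
};

struct model_smmu {
	bool clocks_on;
	bool flush_pending;
	struct pt_page *deferred; /* table pages unhooked but not yet freed */
	int flushes;              /* TLB invalidations issued so far */
	int freed;                /* table pages actually freed so far */
};

static void model_tlb_flush(struct model_smmu *s)
{
	assert(s->clocks_on); /* register access needs the clocks running */
	s->flushes++;
	s->flush_pending = false;
}

static void model_free_deferred(struct model_smmu *s)
{
	while (s->deferred) {
		struct pt_page *p = s->deferred;

		s->deferred = p->next;
		free(p);
		s->freed++;
	}
}

/* Unmap that has already unhooked table page `p` from the page table. */
static void model_unmap(struct model_smmu *s, struct pt_page *p)
{
	p->next = s->deferred;
	s->deferred = p;

	if (!s->clocks_on) {
		s->flush_pending = true; /* cannot touch registers now */
		return;
	}
	model_tlb_flush(s);
	model_free_deferred(s);
}

static void model_suspend(struct model_smmu *s)
{
	s->clocks_on = false;
}

/* Resume: invalidate first, free the queued pages only afterwards. */
static void model_resume(struct model_smmu *s)
{
	s->clocks_on = true;
	if (s->flush_pending) {
		model_tlb_flush(s);
		model_free_deferred(s);
	}
}
```

The point of the ordering in model_resume() is that no table page goes back
to the allocator while a stale TLB entry could still lead a walk into it.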

> Instead of bypass, we
> could configure all the streams to terminate, but this race still worries me
> somewhat. I thought that the SMMU would only be suspended if all of its
> masters were suspended, so if the GPU wants to come out of suspend then the
> SMMU should be resumed first.

I believe this should be true.. on the gpu side, I'm mostly trying to
avoid having to power the gpu back on to free buffers.  (On the v4l2
side, somewhere in the core videobuf code would also need to be made
to wrap its dma_unmap_sg() with pm_runtime_get/put()..)

BR,
-R

> It would be helpful if somebody could figure out exactly what can race with
> the suspend/resume calls here.
>
> Will

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-14 18:25                                   ` Rob Clark
  (?)
@ 2017-07-14 19:01                                     ` Will Deacon
  -1 siblings, 0 replies; 168+ messages in thread
From: Will Deacon @ 2017-07-14 19:01 UTC (permalink / raw)
  To: Rob Clark
  Cc: Sricharan R, Vivek Gautam, Stephen Boyd, Joerg Roedel,
	Robin Murphy, Rob Herring, Mark Rutland, Marek Szyprowski, iommu,
	devicetree, Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

On Fri, Jul 14, 2017 at 02:25:45PM -0400, Rob Clark wrote:
> On Fri, Jul 14, 2017 at 2:06 PM, Will Deacon <will.deacon@arm.com> wrote:
> > On Fri, Jul 14, 2017 at 01:42:13PM -0400, Rob Clark wrote:
> >> On Fri, Jul 14, 2017 at 1:07 PM, Will Deacon <will.deacon@arm.com> wrote:
> >> > On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
> >> >> On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > On 7/13/2017 5:20 PM, Rob Clark wrote:
> >> >> >> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
> >> >> >>> Hi Vivek,
> >> >> >>>
> >> >> >>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
> >> >> >>>> Hi Stephen,
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
> >> >> >>>>> On 07/06, Vivek Gautam wrote:
> >> >> >>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> >> >> >>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
> >> >> >>>>>>                    size_t size)
> >> >> >>>>>>   {
> >> >> >>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> >> >> >>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >> >> >>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> >> >> >>>>>> +    size_t ret;
> >> >> >>>>>>         if (!ops)
> >> >> >>>>>>           return 0;
> >> >> >>>>>>   -    return ops->unmap(ops, iova, size);
> >> >> >>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
> >> >> >>>>> Can these map/unmap ops be called from an atomic context? I seem
> >> >> >>>>> to recall that being a problem before.
> >> >> >>>>
> >> >> >>>> That's something which was dropped in the following patch merged in master:
> >> >> >>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
> >> >> >>>>
> >> >> >>>> Looks like we don't  need locks here anymore?
> >> >> >>>
> >> >> >>>  Apart from the locking, wonder why a explicit pm_runtime is needed
> >> >> >>>  from unmap. Somehow looks like some path in the master using that
> >> >> >>>  should have enabled the pm ?
> >> >> >>>
> >> >> >>
> >> >> >> Yes, there are a bunch of scenarios where unmap can happen with
> >> >> >> disabled master (but not in atomic context).  On the gpu side we
> >> >> >> opportunistically keep a buffer mapping until the buffer is freed
> >> >> >> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
> >> >> >> an exported dmabuf while some other driver holds a reference to it
> >> >> >> (which can be dropped when the v4l2 device is suspended).
> >> >> >>
> >> >> >> Since unmap triggers tbl flush which touches iommu regs, the iommu
> >> >> >> driver *definitely* needs a pm_runtime_get_sync().
> >> >> >
> >> >> >  Ok, with that being the case, there are two things here,
> >> >> >
> >> >> >  1) If the device links are still intact at these places where unmap is called,
> >> >> >     then pm_runtime from the master would setup the all the clocks. That would
> >> >> >     avoid reintroducing the locking indirectly here.
> >> >> >
> >> >> >  2) If not, then doing it here is the only way. But for both cases, since
> >> >> >     the unmap can be called from atomic context, resume handler here should
> >> >> >     avoid doing clk_prepare_enable , instead move the clk_prepare to the init.
> >> >> >
> >> >>
> >> >> I do kinda like the approach Marek suggested.. of deferring the tlb
> >> >> flush until resume.  I'm wondering if we could combine that with
> >> >> putting the mmu in a stalled state when we suspend (and not resume the
> >> >> mmu until after the pending tlb flush)?
> >> >
> >> > I'm not sure that a stalled state is what we're after here, because we need
> >> > to take care to prevent any table walks if we've freed the underlying pages.
> >> > What we could try to do is disable the SMMU (put into global bypass) and
> >> > invalidate the TLB when performing a suspend operation, then we just ignore
> >> > invalidation whilst the clocks are stopped and, on resume, enable the SMMU
> >> > again.
> >>
> >> wouldn't stalled just block any memory transactions by device(s) using
> >> the context bank?  Putting it in bypass isn't really a good thing if
> >> there is any chance the device can sneak in a memory access before
> >> we've taking it back out of bypass (ie. makes gpu a giant userspace
> >> controlled root hole).
> >
> > If it doesn't deadlock, then yes, it will stall transactions. However, that
> > doesn't mean it necessarily prevents page table walks.
> 
> btw, I guess the concern about pagetable walk is that the unmap could
> have removed some sub-level of the pt that the tlb walk would hit?
> Would deferring freeing those pages help?

Could do, but it sounds like a lot of complication that I think we can fix
by making the suspend operation put the SMMU into a "clean" state.

> > Instead of bypass, we
> > could configure all the streams to terminate, but this race still worries me
> > somewhat. I thought that the SMMU would only be suspended if all of its
> > masters were suspended, so if the GPU wants to come out of suspend then the
> > SMMU should be resumed first.
> 
> I believe this should be true.. on the gpu side, I'm mostly trying to
> avoid having to power the gpu back on to free buffers.  (On the v4l2
> side, somewhere in the core videobuf code would also need to be made
> to wrap its dma_unmap_sg() with pm_runtime_get/put()..)

Right, and we shouldn't have to resume it if we suspend it in a clean state,
with the TLBs invalidated.

Will
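
The "suspend in a clean state" idea can be sketched as a small user-space
model. It is a sketch under assumptions, not driver code: struct
smmu_clean_model and the clean_*() functions are hypothetical names, and
the model leaves out the bypass-versus-terminate question raised earlier
in the thread. It only shows that if suspend invalidates the TLB before
the clocks stop, an invalidation requested while suspended has nothing
left to do and needs no register access.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an SMMU that is suspended with an empty TLB. */
struct smmu_clean_model {
	bool clocks_on;
	int tlb_entries;   /* cached translations currently held */
	int reg_writes;    /* register accesses that reached the hardware */
};

/* Suspend: invalidate everything while registers are still reachable. */
static void clean_suspend(struct smmu_clean_model *s)
{
	s->tlb_entries = 0;
	s->reg_writes++;
	s->clocks_on = false;
}

/* Resume: clocks come back; the TLB is still empty from suspend. */
static void clean_resume(struct smmu_clean_model *s)
{
	s->clocks_on = true;
}

/* Invalidate the TLB; returns true only if the hardware was touched. */
static bool clean_tlb_inv(struct smmu_clean_model *s)
{
	if (!s->clocks_on) {
		/* Suspended clean: nothing is cached, so skip the access. */
		assert(s->tlb_entries == 0);
		return false;
	}
	s->tlb_entries = 0;
	s->reg_writes++;
	return true;
}
```

In this model an unmap that arrives while the SMMU is suspended never has
to resume it, which is the property Will is after.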

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-14 19:01                                     ` Will Deacon
  (?)
  (?)
@ 2017-07-14 19:34                                         ` Rob Clark
  -1 siblings, 0 replies; 168+ messages in thread
From: Rob Clark @ 2017-07-14 19:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, devicetree, Stephen Boyd,
	Linux Kernel Mailing List, Stanimir Varbanov,
	iommu, Rob Herring,
	Vivek Gautam, linux-arm-msm, linux-clk,
	linux-arm-kernel

On Fri, Jul 14, 2017 at 3:01 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Jul 14, 2017 at 02:25:45PM -0400, Rob Clark wrote:
>> On Fri, Jul 14, 2017 at 2:06 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Fri, Jul 14, 2017 at 01:42:13PM -0400, Rob Clark wrote:
>> >> On Fri, Jul 14, 2017 at 1:07 PM, Will Deacon <will.deacon@arm.com> wrote:
>> >> > On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
>> >> >> On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> > On 7/13/2017 5:20 PM, Rob Clark wrote:
>> >> >> >> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> >> >> >>> Hi Vivek,
>> >> >> >>>
>> >> >> >>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>> >> >> >>>> Hi Stephen,
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>> >> >> >>>>> On 07/06, Vivek Gautam wrote:
>> >> >> >>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>> >> >> >>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>> >> >> >>>>>>                    size_t size)
>> >> >> >>>>>>   {
>> >> >> >>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> >> >> >>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> >> >> >>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> >> >> >>>>>> +    size_t ret;
>> >> >> >>>>>>         if (!ops)
>> >> >> >>>>>>           return 0;
>> >> >> >>>>>>   -    return ops->unmap(ops, iova, size);
>> >> >> >>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>> >> >> >>>>> Can these map/unmap ops be called from an atomic context? I seem
>> >> >> >>>>> to recall that being a problem before.
>> >> >> >>>>
>> >> >> >>>> That's something which was dropped in the following patch merged in master:
>> >> >> >>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>> >> >> >>>>
>> >> >> >>>> Looks like we don't  need locks here anymore?
>> >> >> >>>
>> >> >> >>>  Apart from the locking, wonder why a explicit pm_runtime is needed
>> >> >> >>>  from unmap. Somehow looks like some path in the master using that
>> >> >> >>>  should have enabled the pm ?
>> >> >> >>>
>> >> >> >>
>> >> >> >> Yes, there are a bunch of scenarios where unmap can happen with
>> >> >> >> disabled master (but not in atomic context).  On the gpu side we
>> >> >> >> opportunistically keep a buffer mapping until the buffer is freed
>> >> >> >> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
>> >> >> >> an exported dmabuf while some other driver holds a reference to it
>> >> >> >> (which can be dropped when the v4l2 device is suspended).
>> >> >> >>
>> >> >> >> Since unmap triggers tbl flush which touches iommu regs, the iommu
>> >> >> >> driver *definitely* needs a pm_runtime_get_sync().
>> >> >> >
>> >> >> >  Ok, with that being the case, there are two things here,
>> >> >> >
>> >> >> >  1) If the device links are still intact at these places where unmap is called,
>> >> >> >     then pm_runtime from the master would setup the all the clocks. That would
>> >> >> >     avoid reintroducing the locking indirectly here.
>> >> >> >
>> >> >> >  2) If not, then doing it here is the only way. But for both cases, since
>> >> >> >     the unmap can be called from atomic context, resume handler here should
>> >> >> >     avoid doing clk_prepare_enable , instead move the clk_prepare to the init.
>> >> >> >
>> >> >>
>> >> >> I do kinda like the approach Marek suggested.. of deferring the tlb
>> >> >> flush until resume.  I'm wondering if we could combine that with
>> >> >> putting the mmu in a stalled state when we suspend (and not resume the
>> >> >> mmu until after the pending tlb flush)?
>> >> >
>> >> > I'm not sure that a stalled state is what we're after here, because we need
>> >> > to take care to prevent any table walks if we've freed the underlying pages.
>> >> > What we could try to do is disable the SMMU (put into global bypass) and
>> >> > invalidate the TLB when performing a suspend operation, then we just ignore
>> >> > invalidation whilst the clocks are stopped and, on resume, enable the SMMU
>> >> > again.
>> >>
>> >> wouldn't stalled just block any memory transactions by device(s) using
>> >> the context bank?  Putting it in bypass isn't really a good thing if
>> >> there is any chance the device can sneak in a memory access before
>> >> we've taking it back out of bypass (ie. makes gpu a giant userspace
>> >> controlled root hole).
>> >
>> > If it doesn't deadlock, then yes, it will stall transactions. However, that
>> > doesn't mean it necessarily prevents page table walks.
>>
>> btw, I guess the concern about pagetable walk is that the unmap could
>> have removed some sub-level of the pt that the tlb walk would hit?
>> Would deferring freeing those pages help?
>
> Could do, but it sounds like a lot of complication that I think we can fix
> by making the suspend operation put the SMMU into a "clean" state.
>
>> > Instead of bypass, we
>> > could configure all the streams to terminate, but this race still worries me
>> > somewhat. I thought that the SMMU would only be suspended if all of its
>> > masters were suspended, so if the GPU wants to come out of suspend then the
>> > SMMU should be resumed first.
>>
>> I believe this should be true.. on the gpu side, I'm mostly trying to
>> avoid having to power the gpu back on to free buffers.  (On the v4l2
>> side, somewhere in the core videobuf code would also need to be made
>> to wrap its dma_unmap_sg() with pm_runtime_get/put()..)
>
> Right, and we shouldn't have to resume it if we suspend it in a clean state,
> with the TLBs invalidated.
>

I guess if the device_link() stuff ensures that the attached device
(gpu/etc) is suspended before the iommu is suspended, then I can't see
how temporarily putting the iommu in bypass would be a problem.  I
haven't looked at the device_link() stuff too closely, but the iommu
being resumed first and suspended last seems like the only thing that
would make sense.  I'm mostly just nervous about iommu in bypass vs gpu,
since userspace has so much control over which addresses the gpu writes
to / reads from, so getting it wrong w/ the iommu would be a rather bad
thing ;-)

BR,
-R
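
The ordering expected above (iommu resumed first, suspended last) can be
sketched as a small user-space model of a consumer/supplier link. This is
an illustration only: struct link_dev, link_get() and link_put() are
hypothetical names, and the model assumes a runtime-PM device link
propagates usage counts from consumer to supplier in this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with an optional supplier (e.g. gpu -> smmu). */
struct link_dev {
	struct link_dev *supplier;  /* NULL if the device has no supplier */
	int usage_count;
	bool active;
	int resumed_at;             /* sequence number of the last resume */
	int suspended_at;           /* sequence number of the last suspend */
};

/* Global event counter used to observe ordering in the model. */
static int link_seq;

/* Take a reference; on 0 -> 1 the supplier is powered up first. */
static void link_get(struct link_dev *d)
{
	if (d->usage_count++ > 0)
		return;
	if (d->supplier)
		link_get(d->supplier);
	d->active = true;
	d->resumed_at = ++link_seq;
}

/* Drop a reference; on 1 -> 0 the supplier is powered down last. */
static void link_put(struct link_dev *d)
{
	assert(d->usage_count > 0);
	if (--d->usage_count > 0)
		return;
	d->active = false;
	d->suspended_at = ++link_seq;
	if (d->supplier)
		link_put(d->supplier);
}
```

With a gpu whose supplier is the smmu, link_get(&gpu) activates the smmu
before the gpu and link_put(&gpu) suspends the gpu before the smmu, so in
this model the gpu is never active while the smmu is down.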

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
@ 2017-07-14 19:34                                         ` Rob Clark
  0 siblings, 0 replies; 168+ messages in thread
From: Rob Clark @ 2017-07-14 19:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sricharan R, Vivek Gautam, Stephen Boyd, Joerg Roedel,
	Robin Murphy, Rob Herring, Mark Rutland, Marek Szyprowski, iommu,
	devicetree, Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

On Fri, Jul 14, 2017 at 3:01 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Jul 14, 2017 at 02:25:45PM -0400, Rob Clark wrote:
>> On Fri, Jul 14, 2017 at 2:06 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Fri, Jul 14, 2017 at 01:42:13PM -0400, Rob Clark wrote:
>> >> On Fri, Jul 14, 2017 at 1:07 PM, Will Deacon <will.deacon@arm.com> wrote:
>> >> > On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
>> >> >> On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> > On 7/13/2017 5:20 PM, Rob Clark wrote:
>> >> >> >> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> >> >> >>> Hi Vivek,
>> >> >> >>>
>> >> >> >>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>> >> >> >>>> Hi Stephen,
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>> >> >> >>>>> On 07/06, Vivek Gautam wrote:
>> >> >> >>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>> >> >> >>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>> >> >> >>>>>>                    size_t size)
>> >> >> >>>>>>   {
>> >> >> >>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> >> >> >>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> >> >> >>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> >> >> >>>>>> +    size_t ret;
>> >> >> >>>>>>         if (!ops)
>> >> >> >>>>>>           return 0;
>> >> >> >>>>>>   -    return ops->unmap(ops, iova, size);
>> >> >> >>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>> >> >> >>>>> Can these map/unmap ops be called from an atomic context? I seem
>> >> >> >>>>> to recall that being a problem before.
>> >> >> >>>>
>> >> >> >>>> That's something which was dropped in the following patch merged in master:
>> >> >> >>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>> >> >> >>>>
>> >> >> >>>> Looks like we don't  need locks here anymore?
>> >> >> >>>
>> >> >> >>>  Apart from the locking, wonder why an explicit pm_runtime is needed
>> >> >> >>>  from unmap. Somehow looks like some path in the master using that
>> >> >> >>>  should have enabled the pm ?
>> >> >> >>>
>> >> >> >>
>> >> >> >> Yes, there are a bunch of scenarios where unmap can happen with
>> >> >> >> disabled master (but not in atomic context).  On the gpu side we
>> >> >> >> opportunistically keep a buffer mapping until the buffer is freed
>> >> >> >> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
>> >> >> >> an exported dmabuf while some other driver holds a reference to it
>> >> >> >> (which can be dropped when the v4l2 device is suspended).
>> >> >> >>
>> >> >> >> Since unmap triggers tlb flush which touches iommu regs, the iommu
>> >> >> >> driver *definitely* needs a pm_runtime_get_sync().
>> >> >> >
>> >> >> >  Ok, with that being the case, there are two things here,
>> >> >> >
>> >> >> >  1) If the device links are still intact at these places where unmap is called,
>> >> >> >     then pm_runtime from the master would set up all the clocks. That would
>> >> >> >     avoid reintroducing the locking indirectly here.
>> >> >> >
>> >> >> >  2) If not, then doing it here is the only way. But for both cases, since
>> >> >> >     the unmap can be called from atomic context, resume handler here should
>> >> >> >     avoid doing clk_prepare_enable , instead move the clk_prepare to the init.
>> >> >> >
>> >> >>
>> >> >> I do kinda like the approach Marek suggested.. of deferring the tlb
>> >> >> flush until resume.  I'm wondering if we could combine that with
>> >> >> putting the mmu in a stalled state when we suspend (and not resume the
>> >> >> mmu until after the pending tlb flush)?
>> >> >
>> >> > I'm not sure that a stalled state is what we're after here, because we need
>> >> > to take care to prevent any table walks if we've freed the underlying pages.
>> >> > What we could try to do is disable the SMMU (put into global bypass) and
>> >> > invalidate the TLB when performing a suspend operation, then we just ignore
>> >> > invalidation whilst the clocks are stopped and, on resume, enable the SMMU
>> >> > again.
>> >>
>> >> wouldn't stalled just block any memory transactions by device(s) using
>> >> the context bank?  Putting it in bypass isn't really a good thing if
>> >> there is any chance the device can sneak in a memory access before
>> >> we've taken it back out of bypass (ie. makes gpu a giant userspace
>> >> controlled root hole).
>> >
>> > If it doesn't deadlock, then yes, it will stall transactions. However, that
>> > doesn't mean it necessarily prevents page table walks.
>>
>> btw, I guess the concern about pagetable walk is that the unmap could
>> have removed some sub-level of the pt that the tlb walk would hit?
>> Would deferring freeing those pages help?
>
> Could do, but it sounds like a lot of complication that I think we can fix
> by making the suspend operation put the SMMU into a "clean" state.
>
>> > Instead of bypass, we
>> > could configure all the streams to terminate, but this race still worries me
>> > somewhat. I thought that the SMMU would only be suspended if all of its
>> > masters were suspended, so if the GPU wants to come out of suspend then the
>> > SMMU should be resumed first.
>>
>> I believe this should be true.. on the gpu side, I'm mostly trying to
>> avoid having to power the gpu back on to free buffers.  (On the v4l2
>> side, somewhere in the core videobuf code would also need to be made
>> to wrap its dma_unmap_sg() with pm_runtime_get/put()..)
>
> Right, and we shouldn't have to resume it if we suspend it in a clean state,
> with the TLBs invalidated.
>

I guess if the device_link() stuff ensured the attached device
(gpu/etc) was suspended before suspending the iommu, then I
can't see how temporarily putting the iommu in bypass would be a
problem.  I haven't looked at the device_link() stuff too closely, but
iommu being resumed first and suspended last seems like the only thing
that would make sense.  I'm mostly just nervous about iommu in bypass
vs gpu since userspace has so much control over what address gpu
writes to / reads from, so getting it wrong w/ the iommu would be a
rather bad thing ;-)
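
To make the ordering I have in mind concrete, here is a rough userspace
sketch (not kernel code, and not how device links are actually
implemented). All names below (model_dev, model_get, model_put,
model_log) are made up for illustration. A consumer takes a reference on
its supplier before it becomes active and drops it only after it has
gone idle, so the supplier is resumed first and suspended last:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace model only: none of these names are kernel APIs. */
struct model_dev {
	const char *name;
	int usage;                   /* outstanding "get" references */
	bool active;
	struct model_dev *supplier;  /* e.g. the iommu behind a gpu */
};

static char model_log[128];      /* records the order of state changes */

static void model_note(const char *what, const struct model_dev *d)
{
	strncat(model_log, what, sizeof(model_log) - strlen(model_log) - 1);
	strncat(model_log, d->name, sizeof(model_log) - strlen(model_log) - 1);
	strncat(model_log, " ", sizeof(model_log) - strlen(model_log) - 1);
}

/* First reference resumes the supplier, then the device itself. */
static void model_get(struct model_dev *d)
{
	if (d->usage++ > 0)
		return;
	if (d->supplier)
		model_get(d->supplier);
	d->active = true;
	model_note("+", d);
}

/* Last reference suspends the device, then releases the supplier. */
static void model_put(struct model_dev *d)
{
	assert(d->usage > 0);
	if (--d->usage > 0)
		return;
	d->active = false;
	model_note("-", d);
	if (d->supplier)
		model_put(d->supplier);
}
```

With a "gpu" whose supplier is an "smmu", a model_get() followed by a
model_put() on the gpu records smmu up, gpu up, gpu down, smmu down. If
the real ordering matches that, the window where the gpu is running
while the iommu is still in bypass should not exist.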

BR,
-R

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-14 19:34                                         ` Rob Clark
  (?)
@ 2017-07-14 19:36                                           ` Will Deacon
  -1 siblings, 0 replies; 168+ messages in thread
From: Will Deacon @ 2017-07-14 19:36 UTC (permalink / raw)
  To: Rob Clark
  Cc: Sricharan R, Vivek Gautam, Stephen Boyd, Joerg Roedel,
	Robin Murphy, Rob Herring, Mark Rutland, Marek Szyprowski, iommu,
	devicetree, Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

On Fri, Jul 14, 2017 at 03:34:42PM -0400, Rob Clark wrote:
> On Fri, Jul 14, 2017 at 3:01 PM, Will Deacon <will.deacon@arm.com> wrote:
> > On Fri, Jul 14, 2017 at 02:25:45PM -0400, Rob Clark wrote:
> >> On Fri, Jul 14, 2017 at 2:06 PM, Will Deacon <will.deacon@arm.com> wrote:
> >> > On Fri, Jul 14, 2017 at 01:42:13PM -0400, Rob Clark wrote:
> >> >> On Fri, Jul 14, 2017 at 1:07 PM, Will Deacon <will.deacon@arm.com> wrote:
> >> >> > On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
> >> >> >> On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> > On 7/13/2017 5:20 PM, Rob Clark wrote:
> >> >> >> >> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
> >> >> >> >>> Hi Vivek,
> >> >> >> >>>
> >> >> >> >>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
> >> >> >> >>>> Hi Stephen,
> >> >> >> >>>>
> >> >> >> >>>>
> >> >> >> >>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
> >> >> >> >>>>> On 07/06, Vivek Gautam wrote:
> >> >> >> >>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> >> >> >> >>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
> >> >> >> >>>>>>                    size_t size)
> >> >> >> >>>>>>   {
> >> >> >> >>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> >> >> >> >>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >> >> >> >>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> >> >> >> >>>>>> +    size_t ret;
> >> >> >> >>>>>>         if (!ops)
> >> >> >> >>>>>>           return 0;
> >> >> >> >>>>>>   -    return ops->unmap(ops, iova, size);
> >> >> >> >>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
> >> >> >> >>>>> Can these map/unmap ops be called from an atomic context? I seem
> >> >> >> >>>>> to recall that being a problem before.
> >> >> >> >>>>
> >> >> >> >>>> That's something which was dropped in the following patch merged in master:
> >> >> >> >>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
> >> >> >> >>>>
> >> >> >> >>>> Looks like we don't  need locks here anymore?
> >> >> >> >>>
> >> >> >> >>>  Apart from the locking, wonder why an explicit pm_runtime is needed
> >> >> >> >>>  from unmap. Somehow looks like some path in the master using that
> >> >> >> >>>  should have enabled the pm ?
> >> >> >> >>>
> >> >> >> >>
> >> >> >> >> Yes, there are a bunch of scenarios where unmap can happen with
> >> >> >> >> disabled master (but not in atomic context).  On the gpu side we
> >> >> >> >> opportunistically keep a buffer mapping until the buffer is freed
> >> >> >> >> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
> >> >> >> >> an exported dmabuf while some other driver holds a reference to it
> >> >> >> >> (which can be dropped when the v4l2 device is suspended).
> >> >> >> >>
> >> >> >> >> Since unmap triggers tlb flush which touches iommu regs, the iommu
> >> >> >> >> driver *definitely* needs a pm_runtime_get_sync().
> >> >> >> >
> >> >> >> >  Ok, with that being the case, there are two things here,
> >> >> >> >
> >> >> >> >  1) If the device links are still intact at these places where unmap is called,
> >> >> >> >     then pm_runtime from the master would set up all the clocks. That would
> >> >> >> >     avoid reintroducing the locking indirectly here.
> >> >> >> >
> >> >> >> >  2) If not, then doing it here is the only way. But for both cases, since
> >> >> >> >     the unmap can be called from atomic context, resume handler here should
> >> >> >> >     avoid doing clk_prepare_enable , instead move the clk_prepare to the init.
> >> >> >> >
> >> >> >>
> >> >> >> I do kinda like the approach Marek suggested.. of deferring the tlb
> >> >> >> flush until resume.  I'm wondering if we could combine that with
> >> >> >> putting the mmu in a stalled state when we suspend (and not resume the
> >> >> >> mmu until after the pending tlb flush)?
> >> >> >
> >> >> > I'm not sure that a stalled state is what we're after here, because we need
> >> >> > to take care to prevent any table walks if we've freed the underlying pages.
> >> >> > What we could try to do is disable the SMMU (put into global bypass) and
> >> >> > invalidate the TLB when performing a suspend operation, then we just ignore
> >> >> > invalidation whilst the clocks are stopped and, on resume, enable the SMMU
> >> >> > again.
> >> >>
> >> >> wouldn't stalled just block any memory transactions by device(s) using
> >> >> the context bank?  Putting it in bypass isn't really a good thing if
> >> >> there is any chance the device can sneak in a memory access before
> >> >> we've taken it back out of bypass (ie. makes gpu a giant userspace
> >> >> controlled root hole).
> >> >
> >> > If it doesn't deadlock, then yes, it will stall transactions. However, that
> >> > doesn't mean it necessarily prevents page table walks.
> >>
> >> btw, I guess the concern about pagetable walk is that the unmap could
> >> have removed some sub-level of the pt that the tlb walk would hit?
> >> Would deferring freeing those pages help?
> >
> > Could do, but it sounds like a lot of complication that I think we can fix
> > by making the suspend operation put the SMMU into a "clean" state.
> >
> >> > Instead of bypass, we
> >> > could configure all the streams to terminate, but this race still worries me
> >> > somewhat. I thought that the SMMU would only be suspended if all of its
> >> > masters were suspended, so if the GPU wants to come out of suspend then the
> >> > SMMU should be resumed first.
> >>
> >> I believe this should be true.. on the gpu side, I'm mostly trying to
> >> avoid having to power the gpu back on to free buffers.  (On the v4l2
> >> side, somewhere in the core videobuf code would also need to be made
> >> to wrap its dma_unmap_sg() with pm_runtime_get/put()..)
> >
> > Right, and we shouldn't have to resume it if we suspend it in a clean state,
> > with the TLBs invalidated.
> >
> 
> I guess if the device_link() stuff ensured the attached device
> (gpu/etc) was suspended before suspending the iommu, then I
> can't see how temporarily putting the iommu in bypass would be a
> problem.  I haven't looked at the device_link() stuff too closely, but
> iommu being resumed first and suspended last seems like the only thing
> that would make sense.  I'm mostly just nervous about iommu in bypass
> vs gpu since userspace has so much control over what address gpu
> writes to / reads from, so getting it wrong w/ the iommu would be a
> rather bad thing ;-)

Right, but we can also configure it to terminate if you don't want bypass.
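
As a rough illustration of the "clean state" idea, here is a userspace
sketch. It is only a model under my own assumptions, not the arm-smmu
driver: model_smmu, model_suspend, model_resume, model_unmap and the
mode names are all hypothetical. Suspend picks terminate (or bypass),
empties the TLB and stops the clocks; an unmap that arrives while
suspended touches no registers because there is nothing left to
invalidate; resume turns translation back on:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: these types and functions are hypothetical,
 * not the arm-smmu driver's API. */
enum model_mode { MODEL_TRANSLATE, MODEL_BYPASS, MODEL_TERMINATE };

struct model_smmu {
	bool clocked;             /* false while runtime-suspended */
	enum model_mode mode;     /* what happens to incoming transactions */
	unsigned int tlb_entries; /* cached translations */
	unsigned int reg_writes;  /* register accesses made while clocked */
};

/* Suspend in a "clean" state: set the idle mode, empty the TLB, stop clocks. */
static void model_suspend(struct model_smmu *s, enum model_mode idle_mode)
{
	s->mode = idle_mode;      /* MODEL_TERMINATE or MODEL_BYPASS */
	s->tlb_entries = 0;
	s->reg_writes += 2;       /* one write for the mode, one for the flush */
	s->clocked = false;
}

/* Resume: restart clocks and re-enable translation. */
static void model_resume(struct model_smmu *s)
{
	s->clocked = true;
	s->mode = MODEL_TRANSLATE;
	s->reg_writes++;
}

/* Unmap: skip the TLB flush when suspended, since the TLB is already empty. */
static void model_unmap(struct model_smmu *s)
{
	if (!s->clocked)
		return;
	s->tlb_entries = 0;
	s->reg_writes++;
}

/* Would an untranslated device access get through in the current mode? */
static bool model_untranslated_access_possible(const struct model_smmu *s)
{
	return s->mode == MODEL_BYPASS;
}
```

Suspending with MODEL_TERMINATE instead of MODEL_BYPASS is the variant
that addresses the gpu concern: in this model an access that sneaks in
before resume is blocked rather than passed through untranslated.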

Will

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
@ 2017-07-14 19:36                                           ` Will Deacon
  0 siblings, 0 replies; 168+ messages in thread
From: Will Deacon @ 2017-07-14 19:36 UTC (permalink / raw)
  To: Rob Clark
  Cc: Sricharan R, Vivek Gautam, Stephen Boyd, Joerg Roedel,
	Robin Murphy, Rob Herring, Mark Rutland, Marek Szyprowski, iommu,
	devicetree, Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

On Fri, Jul 14, 2017 at 03:34:42PM -0400, Rob Clark wrote:
> On Fri, Jul 14, 2017 at 3:01 PM, Will Deacon <will.deacon@arm.com> wrote:
> > On Fri, Jul 14, 2017 at 02:25:45PM -0400, Rob Clark wrote:
> >> On Fri, Jul 14, 2017 at 2:06 PM, Will Deacon <will.deacon@arm.com> wrote:
> >> > On Fri, Jul 14, 2017 at 01:42:13PM -0400, Rob Clark wrote:
> >> >> On Fri, Jul 14, 2017 at 1:07 PM, Will Deacon <will.deacon@arm.com> wrote:
> >> >> > On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
> >> >> >> On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> > On 7/13/2017 5:20 PM, Rob Clark wrote:
> >> >> >> >> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
> >> >> >> >>> Hi Vivek,
> >> >> >> >>>
> >> >> >> >>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
> >> >> >> >>>> Hi Stephen,
> >> >> >> >>>>
> >> >> >> >>>>
> >> >> >> >>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
> >> >> >> >>>>> On 07/06, Vivek Gautam wrote:
> >> >> >> >>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> >> >> >> >>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
> >> >> >> >>>>>>                    size_t size)
> >> >> >> >>>>>>   {
> >> >> >> >>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> >> >> >> >>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >> >> >> >>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> >> >> >> >>>>>> +    size_t ret;
> >> >> >> >>>>>>         if (!ops)
> >> >> >> >>>>>>           return 0;
> >> >> >> >>>>>>   -    return ops->unmap(ops, iova, size);
> >> >> >> >>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
> >> >> >> >>>>> Can these map/unmap ops be called from an atomic context? I seem
> >> >> >> >>>>> to recall that being a problem before.
> >> >> >> >>>>
> >> >> >> >>>> That's something which was dropped in the following patch merged in master:
> >> >> >> >>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
> >> >> >> >>>>
> >> >> >> >>>> Looks like we don't  need locks here anymore?
> >> >> >> >>>
> >> >> >> >>>  Apart from the locking, wonder why a explicit pm_runtime is needed
> >> >> >> >>>  from unmap. Somehow looks like some path in the master using that
> >> >> >> >>>  should have enabled the pm ?
> >> >> >> >>>
> >> >> >> >>
> >> >> >> >> Yes, there are a bunch of scenarios where unmap can happen with
> >> >> >> >> disabled master (but not in atomic context).  On the gpu side we
> >> >> >> >> opportunistically keep a buffer mapping until the buffer is freed
> >> >> >> >> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
> >> >> >> >> an exported dmabuf while some other driver holds a reference to it
> >> >> >> >> (which can be dropped when the v4l2 device is suspended).
> >> >> >> >>
> >> >> >> >> Since unmap triggers tbl flush which touches iommu regs, the iommu
> >> >> >> >> driver *definitely* needs a pm_runtime_get_sync().
> >> >> >> >
> >> >> >> >  Ok, with that being the case, there are two things here,
> >> >> >> >
> >> >> >> >  1) If the device links are still intact at these places where unmap is called,
> >> >> >> >     then pm_runtime from the master would setup the all the clocks. That would
> >> >> >> >     avoid reintroducing the locking indirectly here.
> >> >> >> >
> >> >> >> >  2) If not, then doing it here is the only way. But for both cases, since
> >> >> >> >     the unmap can be called from atomic context, resume handler here should
> >> >> >> >     avoid doing clk_prepare_enable , instead move the clk_prepare to the init.
> >> >> >> >
> >> >> >>
> >> >> >> I do kinda like the approach Marek suggested.. of deferring the tlb
> >> >> >> flush until resume.  I'm wondering if we could combine that with
> >> >> >> putting the mmu in a stalled state when we suspend (and not resume the
> >> >> >> mmu until after the pending tlb flush)?
> >> >> >
> >> >> > I'm not sure that a stalled state is what we're after here, because we need
> >> >> > to take care to prevent any table walks if we've freed the underlying pages.
> >> >> > What we could try to do is disable the SMMU (put into global bypass) and
> >> >> > invalidate the TLB when performing a suspend operation, then we just ignore
> >> >> > invalidation whilst the clocks are stopped and, on resume, enable the SMMU
> >> >> > again.
> >> >>
> >> >> wouldn't stalled just block any memory transactions by device(s) using
> >> >> the context bank?  Putting it in bypass isn't really a good thing if
> >> >> there is any chance the device can sneak in a memory access before
> >> >> we've taking it back out of bypass (ie. makes gpu a giant userspace
> >> >> controlled root hole).
> >> >
> >> > If it doesn't deadlock, then yes, it will stall transactions. However, that
> >> > doesn't mean it necessarily prevents page table walks.
> >>
> >> btw, I guess the concern about pagetable walk is that the unmap could
> >> have removed some sub-level of the pt that the tlb walk would hit?
> >> Would deferring freeing those pages help?
> >
> > Could do, but it sounds like a lot of complication that I think we can fix
> > by making the suspend operation put the SMMU into a "clean" state.
> >
> >> > Instead of bypass, we
> >> > could configure all the streams to terminate, but this race still worries me
> >> > somewhat. I thought that the SMMU would only be suspended if all of its
> >> > masters were suspended, so if the GPU wants to come out of suspend then the
> >> > SMMU should be resumed first.
> >>
> >> I believe this should be true.. on the gpu side, I'm mostly trying to
> >> avoid having to power the gpu back on to free buffers.  (On the v4l2
> >> side, somewhere in the core videobuf code would also need to be made
> >> to wrap its dma_unmap_sg() with pm_runtime_get/put()..)
> >
> > Right, and we shouldn't have to resume it if we suspend it in a clean state,
> > with the TLBs invalidated.
> >
> 
> I guess if the device_link() stuff ensured the attached device
> (gpu/etc) was suspended before suspending the iommu, then I guess I
> can't see how temporarily putting the iommu in bypass would be a
> problem.  I haven't looked at the device_link() stuff too closely, but
> iommu being resumed first and suspended last seems like the only thing
> that would make sense.  I'm mostly just nervous about iommu in bypass
> vs gpu since userspace has so much control over what address gpu
> writes to / reads from, so getting it wrong w/ the iommu would be a
> rather bad thing ;-)

Right, but we can also configure it to terminate if you don't want bypass.

Will

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-14 19:36                                           ` Will Deacon
  (?)
@ 2017-07-14 19:39                                             ` Rob Clark
  -1 siblings, 0 replies; 168+ messages in thread
From: Rob Clark @ 2017-07-14 19:39 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sricharan R, Vivek Gautam, Stephen Boyd, Joerg Roedel,
	Robin Murphy, Rob Herring, Mark Rutland, Marek Szyprowski, iommu,
	devicetree, Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

On Fri, Jul 14, 2017 at 3:36 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Jul 14, 2017 at 03:34:42PM -0400, Rob Clark wrote:
>> On Fri, Jul 14, 2017 at 3:01 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Fri, Jul 14, 2017 at 02:25:45PM -0400, Rob Clark wrote:
>> >> On Fri, Jul 14, 2017 at 2:06 PM, Will Deacon <will.deacon@arm.com> wrote:
>> >> > On Fri, Jul 14, 2017 at 01:42:13PM -0400, Rob Clark wrote:
>> >> >> On Fri, Jul 14, 2017 at 1:07 PM, Will Deacon <will.deacon@arm.com> wrote:
>> >> >> > On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
>> >> >> >> On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> >> >> >> > Hi,
>> >> >> >> >
>> >> >> >> > On 7/13/2017 5:20 PM, Rob Clark wrote:
>> >> >> >> >> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> >> >> >> >>> Hi Vivek,
>> >> >> >> >>>
>> >> >> >> >>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>> >> >> >> >>>> Hi Stephen,
>> >> >> >> >>>>
>> >> >> >> >>>>
>> >> >> >> >>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>> >> >> >> >>>>> On 07/06, Vivek Gautam wrote:
>> >> >> >> >>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>> >> >> >> >>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>> >> >> >> >>>>>>                    size_t size)
>> >> >> >> >>>>>>   {
>> >> >> >> >>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> >> >> >> >>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> >> >> >> >>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> >> >> >> >>>>>> +    size_t ret;
>> >> >> >> >>>>>>         if (!ops)
>> >> >> >> >>>>>>           return 0;
>> >> >> >> >>>>>>   -    return ops->unmap(ops, iova, size);
>> >> >> >> >>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>> >> >> >> >>>>> Can these map/unmap ops be called from an atomic context? I seem
>> >> >> >> >>>>> to recall that being a problem before.
>> >> >> >> >>>>
>> >> >> >> >>>> That's something which was dropped in the following patch merged in master:
>> >> >> >> >>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>> >> >> >> >>>>
>> >> >> >> >>>> Looks like we don't  need locks here anymore?
>> >> >> >> >>>
>> >> >> >> >>>  Apart from the locking, wonder why an explicit pm_runtime is needed
>> >> >> >> >>>  from unmap. Somehow looks like some path in the master using that
>> >> >> >> >>>  should have enabled the pm ?
>> >> >> >> >>>
>> >> >> >> >>
>> >> >> >> >> Yes, there are a bunch of scenarios where unmap can happen with
>> >> >> >> >> disabled master (but not in atomic context).  On the gpu side we
>> >> >> >> >> opportunistically keep a buffer mapping until the buffer is freed
>> >> >> >> >> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
>> >> >> >> >> an exported dmabuf while some other driver holds a reference to it
>> >> >> >> >> (which can be dropped when the v4l2 device is suspended).
>> >> >> >> >>
>> >> >> >> >> Since unmap triggers tlb flush which touches iommu regs, the iommu
>> >> >> >> >> driver *definitely* needs a pm_runtime_get_sync().
>> >> >> >> >
>> >> >> >> >  Ok, with that being the case, there are two things here,
>> >> >> >> >
>> >> >> >> >  1) If the device links are still intact at these places where unmap is called,
>> >> >> >> >     then pm_runtime from the master would set up all the clocks. That would
>> >> >> >> >     avoid reintroducing the locking indirectly here.
>> >> >> >> >
>> >> >> >> >  2) If not, then doing it here is the only way. But for both cases, since
>> >> >> >> >     the unmap can be called from atomic context, resume handler here should
>> >> >> >> >     avoid doing clk_prepare_enable; instead, move the clk_prepare to the init.
>> >> >> >> >
>> >> >> >>
>> >> >> >> I do kinda like the approach Marek suggested.. of deferring the tlb
>> >> >> >> flush until resume.  I'm wondering if we could combine that with
>> >> >> >> putting the mmu in a stalled state when we suspend (and not resume the
>> >> >> >> mmu until after the pending tlb flush)?
>> >> >> >
>> >> >> > I'm not sure that a stalled state is what we're after here, because we need
>> >> >> > to take care to prevent any table walks if we've freed the underlying pages.
>> >> >> > What we could try to do is disable the SMMU (put into global bypass) and
>> >> >> > invalidate the TLB when performing a suspend operation, then we just ignore
>> >> >> > invalidation whilst the clocks are stopped and, on resume, enable the SMMU
>> >> >> > again.
>> >> >>
>> >> >> wouldn't stalled just block any memory transactions by device(s) using
>> >> >> the context bank?  Putting it in bypass isn't really a good thing if
>> >> >> there is any chance the device can sneak in a memory access before
>> >> >> we've taken it back out of bypass (i.e. makes gpu a giant userspace
>> >> >> controlled root hole).
>> >> >
>> >> > If it doesn't deadlock, then yes, it will stall transactions. However, that
>> >> > doesn't mean it necessarily prevents page table walks.
>> >>
>> >> btw, I guess the concern about pagetable walk is that the unmap could
>> >> have removed some sub-level of the pt that the tlb walk would hit?
>> >> Would deferring freeing those pages help?
>> >
>> > Could do, but it sounds like a lot of complication that I think we can fix
>> > by making the suspend operation put the SMMU into a "clean" state.
>> >
>> >> > Instead of bypass, we
>> >> > could configure all the streams to terminate, but this race still worries me
>> >> > somewhat. I thought that the SMMU would only be suspended if all of its
>> >> > masters were suspended, so if the GPU wants to come out of suspend then the
>> >> > SMMU should be resumed first.
>> >>
>> >> I believe this should be true.. on the gpu side, I'm mostly trying to
>> >> avoid having to power the gpu back on to free buffers.  (On the v4l2
>> >> side, somewhere in the core videobuf code would also need to be made
>> >> to wrap its dma_unmap_sg() with pm_runtime_get/put()..)
>> >
>> > Right, and we shouldn't have to resume it if we suspend it in a clean state,
>> > with the TLBs invalidated.
>> >
>>
>> I guess if the device_link() stuff ensured the attached device
>> (gpu/etc) was suspended before suspending the iommu, then I guess I
>> can't see how temporarily putting the iommu in bypass would be a
>> problem.  I haven't looked at the device_link() stuff too closely, but
>> iommu being resumed first and suspended last seems like the only thing
>> that would make sense.  I'm mostly just nervous about iommu in bypass
>> vs gpu since userspace has so much control over what address gpu
>> writes to / reads from, so getting it wrong w/ the iommu would be a
>> rather bad thing ;-)
>
> Right, but we can also configure it to terminate if you don't want bypass.
>

ok, terminate wfm

BR,
-R

^ permalink raw reply	[flat|nested] 168+ messages in thread
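
A minimal sketch of the "suspend in a clean state" idea from the exchange
above, written as a standalone userspace C model rather than driver code.
Everything here is an assumption made for illustration: the names
(smmu_model, smmu_model_suspend, and so on) are hypothetical and do not
exist in the arm-smmu driver, and the model only captures what the thread
states, namely that suspend sets the streams to terminate and invalidates
the TLB, that unmap skips invalidation while the clocks are stopped, and
that resume re-enables translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical software model; not the arm-smmu driver's data structures. */
enum stream_mode { STREAM_TRANSLATE, STREAM_TERMINATE };

struct smmu_model {
	bool powered;            /* clocks running, registers reachable */
	enum stream_mode mode;   /* what happens to incoming transactions */
	size_t tlb_entries;      /* number of cached translations */
	unsigned int reg_writes; /* register accesses performed so far */
};

static void smmu_model_init(struct smmu_model *s)
{
	s->powered = true;
	s->mode = STREAM_TRANSLATE;
	s->tlb_entries = 0;
	s->reg_writes = 0;
}

/* Invalidate everything; only legal while the registers are reachable. */
static void smmu_model_tlb_inv_all(struct smmu_model *s)
{
	assert(s->powered);
	s->tlb_entries = 0;
	s->reg_writes++;
}

/* A master issues a transaction; returns true if it was translated. */
static bool smmu_model_translate(struct smmu_model *s)
{
	if (!s->powered || s->mode == STREAM_TERMINATE)
		return false;
	s->tlb_entries++;
	return true;
}

/* Suspend: terminate streams first, then invalidate, then stop clocks. */
static void smmu_model_suspend(struct smmu_model *s)
{
	if (!s->powered)
		return;
	s->mode = STREAM_TERMINATE;
	s->reg_writes++;
	smmu_model_tlb_inv_all(s);
	s->powered = false;
}

/* Resume: the TLB is already empty, so only translation is re-enabled. */
static void smmu_model_resume(struct smmu_model *s)
{
	if (s->powered)
		return;
	s->powered = true;
	assert(s->tlb_entries == 0);
	s->mode = STREAM_TRANSLATE;
	s->reg_writes++;
}

/*
 * Unmap: the page-table update itself lives in memory and always happens.
 * Returns true if a TLB invalidation was issued, false if it was skipped
 * because the model is suspended and therefore holds no cached entries.
 */
static bool smmu_model_unmap(struct smmu_model *s)
{
	if (!s->powered)
		return false;
	smmu_model_tlb_inv_all(s);
	return true;
}
```

In this model an unmap issued while suspended touches no registers, which
is the property the thread is after: buffers can be freed without powering
the SMMU (or its master) back up. Whether real hardware can guarantee that
no table walk is in flight at suspend time is not something the model
addresses.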

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-14 19:39                                             ` Rob Clark
  (?)
@ 2017-07-17 11:46                                               ` Sricharan R
  -1 siblings, 0 replies; 168+ messages in thread
From: Sricharan R @ 2017-07-17 11:46 UTC (permalink / raw)
  To: Rob Clark, Will Deacon
  Cc: Vivek Gautam, Stephen Boyd, Joerg Roedel, Robin Murphy,
	Rob Herring, Mark Rutland, Marek Szyprowski, iommu, devicetree,
	Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

Hi,

On 7/15/2017 1:09 AM, Rob Clark wrote:
> On Fri, Jul 14, 2017 at 3:36 PM, Will Deacon <will.deacon@arm.com> wrote:
>> On Fri, Jul 14, 2017 at 03:34:42PM -0400, Rob Clark wrote:
>>> On Fri, Jul 14, 2017 at 3:01 PM, Will Deacon <will.deacon@arm.com> wrote:
>>>> On Fri, Jul 14, 2017 at 02:25:45PM -0400, Rob Clark wrote:
>>>>> On Fri, Jul 14, 2017 at 2:06 PM, Will Deacon <will.deacon@arm.com> wrote:
>>>>>> On Fri, Jul 14, 2017 at 01:42:13PM -0400, Rob Clark wrote:
>>>>>>> On Fri, Jul 14, 2017 at 1:07 PM, Will Deacon <will.deacon@arm.com> wrote:
>>>>>>>> On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
>>>>>>>>> On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 7/13/2017 5:20 PM, Rob Clark wrote:
>>>>>>>>>>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>>>>>>>>>>>> Hi Vivek,
>>>>>>>>>>>>
>>>>>>>>>>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>>>>>>>>>>> Hi Stephen,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>>>>>>>>>>> On 07/06, Vivek Gautam wrote:
>>>>>>>>>>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>>>>>>>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>>>>>>>>>>>                    size_t size)
>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>>>>>>>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>>>>>>>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>>>>>>>>>>> +    size_t ret;
>>>>>>>>>>>>>>>         if (!ops)
>>>>>>>>>>>>>>>           return 0;
>>>>>>>>>>>>>>>   -    return ops->unmap(ops, iova, size);
>>>>>>>>>>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>>>>>>>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>>>>>>>>>>> to recall that being a problem before.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's something which was dropped in the following patch merged in master:
>>>>>>>>>>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looks like we don't  need locks here anymore?
>>>>>>>>>>>>
>>>>>>>>>>>>  Apart from the locking, wonder why a explicit pm_runtime is needed
>>>>>>>>>>>>  from unmap. Somehow looks like some path in the master using that
>>>>>>>>>>>>  should have enabled the pm ?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yes, there are a bunch of scenarios where unmap can happen with
>>>>>>>>>>> disabled master (but not in atomic context).  On the gpu side we
>>>>>>>>>>> opportunistically keep a buffer mapping until the buffer is freed
>>>>>>>>>>> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
>>>>>>>>>>> an exported dmabuf while some other driver holds a reference to it
>>>>>>>>>>> (which can be dropped when the v4l2 device is suspended).
>>>>>>>>>>>
>>>>>>>>>>> Since unmap triggers tbl flush which touches iommu regs, the iommu
>>>>>>>>>>> driver *definitely* needs a pm_runtime_get_sync().
>>>>>>>>>>
>>>>>>>>>>  Ok, with that being the case, there are two things here,
>>>>>>>>>>
>>>>>>>>>>  1) If the device links are still intact at these places where unmap is called,
>>>>>>>>>>     then pm_runtime from the master would setup the all the clocks. That would
>>>>>>>>>>     avoid reintroducing the locking indirectly here.
>>>>>>>>>>
>>>>>>>>>>  2) If not, then doing it here is the only way. But for both cases, since
>>>>>>>>>>     the unmap can be called from atomic context, resume handler here should
>>>>>>>>>>     avoid doing clk_prepare_enable , instead move the clk_prepare to the init.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I do kinda like the approach Marek suggested.. of deferring the tlb
>>>>>>>>> flush until resume.  I'm wondering if we could combine that with
>>>>>>>>> putting the mmu in a stalled state when we suspend (and not resume the
>>>>>>>>> mmu until after the pending tlb flush)?
>>>>>>>>
>>>>>>>> I'm not sure that a stalled state is what we're after here, because we need
>>>>>>>> to take care to prevent any table walks if we've freed the underlying pages.
>>>>>>>> What we could try to do is disable the SMMU (put into global bypass) and
>>>>>>>> invalidate the TLB when performing a suspend operation, then we just ignore
>>>>>>>> invalidation whilst the clocks are stopped and, on resume, enable the SMMU
>>>>>>>> again.
>>>>>>>
>>>>>>> wouldn't stalled just block any memory transactions by device(s) using
>>>>>>> the context bank?  Putting it in bypass isn't really a good thing if
>>>>>>> there is any chance the device can sneak in a memory access before
>>>>>>> we've taking it back out of bypass (ie. makes gpu a giant userspace
>>>>>>> controlled root hole).
>>>>>>
>>>>>> If it doesn't deadlock, then yes, it will stall transactions. However, that
>>>>>> doesn't mean it necessarily prevents page table walks.
>>>>>
>>>>> btw, I guess the concern about pagetable walk is that the unmap could
>>>>> have removed some sub-level of the pt that the tlb walk would hit?
>>>>> Would deferring freeing those pages help?
>>>>
>>>> Could do, but it sounds like a lot of complication that I think we can fix
>>>> by making the suspend operation put the SMMU into a "clean" state.
>>>>
>>>>>> Instead of bypass, we
>>>>>> could configure all the streams to terminate, but this race still worries me
>>>>>> somewhat. I thought that the SMMU would only be suspended if all of its
>>>>>> masters were suspended, so if the GPU wants to come out of suspend then the
>>>>>> SMMU should be resumed first.
>>>>>
>>>>> I believe this should be true.. on the gpu side, I'm mostly trying to
>>>>> avoid having to power the gpu back on to free buffers.  (On the v4l2
>>>>> side, somewhere in the core videobuf code would also need to be made
>>>>> to wrap it's dma_unmap_sg() with pm_runtime_get/put()..)
>>>>
>>>> Right, and we shouldn't have to resume it if we suspend it in a clean state,
>>>> with the TLBs invalidated.
>>>>
>>>
>>> I guess if the device_link() stuff ensured the attached device
>>> (gpu/etc) was suspended before suspending the iommu, then I guess I
>>> can't see how temporarily putting the iommu in bypass would be a
>>> problem.  I haven't looked at the device_link() stuff too closely, but
>>> iommu being resumed first and suspended last seems like the only thing
>>> that would make sense.  I'm mostly just nervous about iommu in bypass
>>> vs gpu since userspace has so much control over what address gpu
>>> writes to / reads from, so getting it wrong w/ the iommu would be a
>>> rather bad thing ;-)
>>
>> Right, but we can also configure it to terminate if you don't want bypass.
>>
> 

 But one thing here is that, with device links in the picture, iommu suspend/resume
 is called along with the master's. That means we can end up cleaning even
 active entries on the suspend path, if suspend is going to put the smmu
 into a clean state every time. So if the masters follow the pm_runtime
 sequence before a dma_map/unmap operation, that seems better.
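
 To make the concern concrete, here is a small userspace model of a master
 and its supplier SMMU tied together for runtime PM. It is only a sketch:
 model_dev, model_get() and model_put() are made-up names, not runtime PM or
 arm-smmu symbols, and the real PM core also deals with async requests,
 autosuspend and locking, all of which are ignored here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device taking part in runtime PM. */
struct model_dev {
	const char *name;
	struct model_dev *supplier;  /* e.g. the SMMU behind a GPU; NULL if none */
	int usage;                   /* runtime PM usage count */
	bool active;                 /* true while "resumed" */
	unsigned int suspend_count;  /* how often the suspend path has run */
};

/* Take a reference; on the 0 -> 1 transition the supplier resumes first. */
static void model_get(struct model_dev *dev)
{
	if (dev->usage++ == 0) {
		if (dev->supplier)
			model_get(dev->supplier);
		dev->active = true;
	}
}

/* Drop a reference; on the 1 -> 0 transition the consumer suspends first. */
static void model_put(struct model_dev *dev)
{
	assert(dev->usage > 0);
	if (--dev->usage == 0) {
		dev->active = false;
		dev->suspend_count++;  /* a "clean state" suspend would flush here */
		if (dev->supplier)
			model_put(dev->supplier);
	}
}
```

 With a single master, every get/put pair on the master drives one suspend of
 the SMMU in this model, so a suspend handler that always invalidates the TLB
 would do so each time the master goes idle.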

Regards,
 Sricharan


-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-17 11:46                                               ` Sricharan R
  (?)
  (?)
@ 2017-07-17 12:28                                                   ` Sricharan R
  -1 siblings, 0 replies; 168+ messages in thread
From: Sricharan R @ 2017-07-17 12:28 UTC (permalink / raw)
  To: Rob Clark, Will Deacon
  Cc: Vivek Gautam, Stephen Boyd, Joerg Roedel, Robin Murphy,
	Rob Herring, Mark Rutland, Marek Szyprowski,
	iommu, devicetree, Linux Kernel Mailing List,
	linux-clk, linux-arm-msm, Stanimir Varbanov, Archit Taneja,
	linux-arm-kernel

Hi,

On 7/17/2017 5:16 PM, Sricharan R wrote:
> Hi,
> 
> On 7/15/2017 1:09 AM, Rob Clark wrote:
>> On Fri, Jul 14, 2017 at 3:36 PM, Will Deacon <will.deacon@arm.com> wrote:
>>> On Fri, Jul 14, 2017 at 03:34:42PM -0400, Rob Clark wrote:
>>>> On Fri, Jul 14, 2017 at 3:01 PM, Will Deacon <will.deacon@arm.com> wrote:
>>>>> On Fri, Jul 14, 2017 at 02:25:45PM -0400, Rob Clark wrote:
>>>>>> On Fri, Jul 14, 2017 at 2:06 PM, Will Deacon <will.deacon@arm.com> wrote:
>>>>>>> On Fri, Jul 14, 2017 at 01:42:13PM -0400, Rob Clark wrote:
>>>>>>>> On Fri, Jul 14, 2017 at 1:07 PM, Will Deacon <will.deacon@arm.com> wrote:
>>>>>>>>> On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
>>>>>>>>>> On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> On 7/13/2017 5:20 PM, Rob Clark wrote:
>>>>>>>>>>>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>>>>>>>>>>>>> Hi Vivek,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>>>>>>>>>>>> Hi Stephen,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>>>>>>>>>>>> On 07/06, Vivek Gautam wrote:
>>>>>>>>>>>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>>>>>>>>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>>>>>>>>>>>>                    size_t size)
>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>>>>>>>>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>>>>>>>>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>>>>>>>>>>>> +    size_t ret;
>>>>>>>>>>>>>>>>         if (!ops)
>>>>>>>>>>>>>>>>           return 0;
>>>>>>>>>>>>>>>>   -    return ops->unmap(ops, iova, size);
>>>>>>>>>>>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>>>>>>>>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>>>>>>>>>>>> to recall that being a problem before.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's something which was dropped in the following patch merged in master:
>>>>>>>>>>>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Looks like we don't  need locks here anymore?
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Apart from the locking, wonder why a explicit pm_runtime is needed
>>>>>>>>>>>>>  from unmap. Somehow looks like some path in the master using that
>>>>>>>>>>>>>  should have enabled the pm ?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, there are a bunch of scenarios where unmap can happen with
>>>>>>>>>>>> disabled master (but not in atomic context).  On the gpu side we
>>>>>>>>>>>> opportunistically keep a buffer mapping until the buffer is freed
>>>>>>>>>>>> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
>>>>>>>>>>>> an exported dmabuf while some other driver holds a reference to it
>>>>>>>>>>>> (which can be dropped when the v4l2 device is suspended).
>>>>>>>>>>>>
>>>>>>>>>>>> Since unmap triggers tbl flush which touches iommu regs, the iommu
>>>>>>>>>>>> driver *definitely* needs a pm_runtime_get_sync().
>>>>>>>>>>>
>>>>>>>>>>>  Ok, with that being the case, there are two things here,
>>>>>>>>>>>
>>>>>>>>>>>  1) If the device links are still intact at these places where unmap is called,
>>>>>>>>>>>     then pm_runtime from the master would setup the all the clocks. That would
>>>>>>>>>>>     avoid reintroducing the locking indirectly here.
>>>>>>>>>>>
>>>>>>>>>>>  2) If not, then doing it here is the only way. But for both cases, since
>>>>>>>>>>>     the unmap can be called from atomic context, resume handler here should
>>>>>>>>>>>     avoid doing clk_prepare_enable , instead move the clk_prepare to the init.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I do kinda like the approach Marek suggested.. of deferring the tlb
>>>>>>>>>> flush until resume.  I'm wondering if we could combine that with
>>>>>>>>>> putting the mmu in a stalled state when we suspend (and not resume the
>>>>>>>>>> mmu until after the pending tlb flush)?
>>>>>>>>>
>>>>>>>>> I'm not sure that a stalled state is what we're after here, because we need
>>>>>>>>> to take care to prevent any table walks if we've freed the underlying pages.
>>>>>>>>> What we could try to do is disable the SMMU (put into global bypass) and
>>>>>>>>> invalidate the TLB when performing a suspend operation, then we just ignore
>>>>>>>>> invalidation whilst the clocks are stopped and, on resume, enable the SMMU
>>>>>>>>> again.
>>>>>>>>
>>>>>>>> wouldn't stalled just block any memory transactions by device(s) using
>>>>>>>> the context bank?  Putting it in bypass isn't really a good thing if
>>>>>>>> there is any chance the device can sneak in a memory access before
>>>>>>>> we've taking it back out of bypass (ie. makes gpu a giant userspace
>>>>>>>> controlled root hole).
>>>>>>>
>>>>>>> If it doesn't deadlock, then yes, it will stall transactions. However, that
>>>>>>> doesn't mean it necessarily prevents page table walks.
>>>>>>
>>>>>> btw, I guess the concern about pagetable walk is that the unmap could
>>>>>> have removed some sub-level of the pt that the tlb walk would hit?
>>>>>> Would deferring freeing those pages help?
>>>>>
>>>>> Could do, but it sounds like a lot of complication that I think we can fix
>>>>> by making the suspend operation put the SMMU into a "clean" state.
>>>>>
>>>>>>> Instead of bypass, we
>>>>>>> could configure all the streams to terminate, but this race still worries me
>>>>>>> somewhat. I thought that the SMMU would only be suspended if all of its
>>>>>>> masters were suspended, so if the GPU wants to come out of suspend then the
>>>>>>> SMMU should be resumed first.
>>>>>>
>>>>>> I believe this should be true.. on the gpu side, I'm mostly trying to
>>>>>> avoid having to power the gpu back on to free buffers.  (On the v4l2
>>>>>> side, somewhere in the core videobuf code would also need to be made
>>>>>> to wrap it's dma_unmap_sg() with pm_runtime_get/put()..)
>>>>>
>>>>> Right, and we shouldn't have to resume it if we suspend it in a clean state,
>>>>> with the TLBs invalidated.
>>>>>
>>>>
>>>> I guess if the device_link() stuff ensured the attached device
>>>> (gpu/etc) was suspended before suspending the iommu, then I guess I
>>>> can't see how temporarily putting the iommu in bypass would be a
>>>> problem.  I haven't looked at the device_link() stuff too closely, but
>>>> iommu being resumed first and suspended last seems like the only thing
>>>> that would make sense.  I'm mostly just nervous about iommu in bypass
>>>> vs gpu since userspace has so much control over what address gpu
>>>> writes to / reads from, so getting it wrong w/ the iommu would be a
>>>> rather bad thing ;-)
>>>
>>> Right, but we can also configure it to terminate if you don't want bypass.
>>>
>>
> 
>  But one thing here is, with devicelinks in picture, iommu suspend/resume
>  is called along with the master. That means, we can end up cleaning even
>  active entries on the suspend path ?, if suspend is going to
>  put the smmu in to a clean state every time. So if the master's are following
>  the pm_runtime sequence before a dma_map/unmap operation, that seems better.
> 

 Also, for the use case of unmap being done from masters like the GPU while the
 master is already suspended, following Marek's approach of checking the smmu
 state in unmap and deferring the TLB flush till resume seems the correct way.
 All of the above holds if we want to use device_link.
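
 The deferred-flush idea can be sketched as a small userspace model. This is
 an illustration under assumptions, not the arm-smmu driver: model_smmu and
 the model_*() helpers are hypothetical names, the register access is reduced
 to a counter, and the races between unmap and suspend/resume (as well as the
 question of page-table pages being freed before the flush) are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the SMMU state relevant to unmap. */
struct model_smmu {
	bool powered;             /* clocks and power domain are on */
	bool flush_pending;       /* an unmap happened while powered off */
	unsigned int hw_flushes;  /* times the "hardware" TLB was invalidated */
};

/* Stands in for the register writes that invalidate the TLB. */
static void model_hw_tlb_flush(struct model_smmu *smmu)
{
	assert(smmu->powered);  /* touching registers with clocks off is the bug */
	smmu->hw_flushes++;
}

/* Unmap path: flush now if powered, otherwise remember to do it on resume. */
static void model_unmap(struct model_smmu *smmu)
{
	/* The page-table update itself only touches memory, not registers. */
	if (smmu->powered)
		model_hw_tlb_flush(smmu);
	else
		smmu->flush_pending = true;
}

static void model_runtime_suspend(struct model_smmu *smmu)
{
	smmu->powered = false;
}

/* Resume path: replay any invalidation that was skipped while suspended. */
static void model_runtime_resume(struct model_smmu *smmu)
{
	smmu->powered = true;
	if (smmu->flush_pending) {
		model_hw_tlb_flush(smmu);
		smmu->flush_pending = false;
	}
}
```

 In this model any number of unmaps issued while suspended collapse into a
 single invalidation on the next resume, and the unmap path never has to power
 the SMMU back up.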

Regards,
 Sricharan

-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
@ 2017-07-17 12:28                                                   ` Sricharan R
  0 siblings, 0 replies; 168+ messages in thread
From: Sricharan R @ 2017-07-17 12:28 UTC (permalink / raw)
  To: Rob Clark, Will Deacon
  Cc: Vivek Gautam, Stephen Boyd, Joerg Roedel, Robin Murphy,
	Rob Herring, Mark Rutland, Marek Szyprowski, iommu, devicetree,
	Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

Hi,

On 7/17/2017 5:16 PM, Sricharan R wrote:
> Hi,
> 
> On 7/15/2017 1:09 AM, Rob Clark wrote:
>> On Fri, Jul 14, 2017 at 3:36 PM, Will Deacon <will.deacon@arm.com> wrote:
>>> On Fri, Jul 14, 2017 at 03:34:42PM -0400, Rob Clark wrote:
>>>> On Fri, Jul 14, 2017 at 3:01 PM, Will Deacon <will.deacon@arm.com> wrote:
>>>>> On Fri, Jul 14, 2017 at 02:25:45PM -0400, Rob Clark wrote:
>>>>>> On Fri, Jul 14, 2017 at 2:06 PM, Will Deacon <will.deacon@arm.com> wrote:
>>>>>>> On Fri, Jul 14, 2017 at 01:42:13PM -0400, Rob Clark wrote:
>>>>>>>> On Fri, Jul 14, 2017 at 1:07 PM, Will Deacon <will.deacon@arm.com> wrote:
>>>>>>>>> On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
>>>>>>>>>> On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> On 7/13/2017 5:20 PM, Rob Clark wrote:
>>>>>>>>>>>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>>>>>>>>>>>>> Hi Vivek,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>>>>>>>>>>>> Hi Stephen,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>>>>>>>>>>>> On 07/06, Vivek Gautam wrote:
>>>>>>>>>>>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>>>>>>>>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>>>>>>>>>>>>                    size_t size)
>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>>>>>>>>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>>>>>>>>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>>>>>>>>>>>> +    size_t ret;
>>>>>>>>>>>>>>>>         if (!ops)
>>>>>>>>>>>>>>>>           return 0;
>>>>>>>>>>>>>>>>   -    return ops->unmap(ops, iova, size);
>>>>>>>>>>>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>>>>>>>>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>>>>>>>>>>>> to recall that being a problem before.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's something which was dropped in the following patch merged in master:
>>>>>>>>>>>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Looks like we don't  need locks here anymore?
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Apart from the locking, I wonder why an explicit pm_runtime call is
>>>>>>>>>>>>>  needed from unmap. It looks like some path in the master using that
>>>>>>>>>>>>>  should have enabled the pm?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, there are a bunch of scenarios where unmap can happen with
>>>>>>>>>>>> disabled master (but not in atomic context).  On the gpu side we
>>>>>>>>>>>> opportunistically keep a buffer mapping until the buffer is freed
>>>>>>>>>>>> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
>>>>>>>>>>>> an exported dmabuf while some other driver holds a reference to it
>>>>>>>>>>>> (which can be dropped when the v4l2 device is suspended).
>>>>>>>>>>>>
>>>>>>>>>>>> Since unmap triggers a tlb flush, which touches iommu regs, the iommu
>>>>>>>>>>>> driver *definitely* needs a pm_runtime_get_sync().
>>>>>>>>>>>
>>>>>>>>>>>  Ok, with that being the case, there are two things here:
>>>>>>>>>>>
>>>>>>>>>>>  1) If the device links are still intact at the places where unmap is called,
>>>>>>>>>>>     then pm_runtime from the master would set up all the clocks. That would
>>>>>>>>>>>     avoid reintroducing the locking indirectly here.
>>>>>>>>>>>
>>>>>>>>>>>  2) If not, then doing it here is the only way. But in both cases, since
>>>>>>>>>>>     the unmap can be called from atomic context, the resume handler here should
>>>>>>>>>>>     avoid doing clk_prepare_enable; instead, move the clk_prepare to init.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I do kinda like the approach Marek suggested.. of deferring the tlb
>>>>>>>>>> flush until resume.  I'm wondering if we could combine that with
>>>>>>>>>> putting the mmu in a stalled state when we suspend (and not resume the
>>>>>>>>>> mmu until after the pending tlb flush)?
>>>>>>>>>
>>>>>>>>> I'm not sure that a stalled state is what we're after here, because we need
>>>>>>>>> to take care to prevent any table walks if we've freed the underlying pages.
>>>>>>>>> What we could try to do is disable the SMMU (put into global bypass) and
>>>>>>>>> invalidate the TLB when performing a suspend operation, then we just ignore
>>>>>>>>> invalidation whilst the clocks are stopped and, on resume, enable the SMMU
>>>>>>>>> again.
>>>>>>>>
>>>>>>>> wouldn't stalled just block any memory transactions by device(s) using
>>>>>>>> the context bank?  Putting it in bypass isn't really a good thing if
>>>>>>>> there is any chance the device can sneak in a memory access before
>>>>>>>> we've taken it back out of bypass (i.e. makes gpu a giant userspace
>>>>>>>> controlled root hole).
>>>>>>>
>>>>>>> If it doesn't deadlock, then yes, it will stall transactions. However, that
>>>>>>> doesn't mean it necessarily prevents page table walks.
>>>>>>
>>>>>> btw, I guess the concern about pagetable walk is that the unmap could
>>>>>> have removed some sub-level of the pt that the tlb walk would hit?
>>>>>> Would deferring freeing those pages help?
>>>>>
>>>>> Could do, but it sounds like a lot of complication that I think we can fix
>>>>> by making the suspend operation put the SMMU into a "clean" state.
>>>>>
>>>>>>> Instead of bypass, we
>>>>>>> could configure all the streams to terminate, but this race still worries me
>>>>>>> somewhat. I thought that the SMMU would only be suspended if all of its
>>>>>>> masters were suspended, so if the GPU wants to come out of suspend then the
>>>>>>> SMMU should be resumed first.
>>>>>>
>>>>>> I believe this should be true.. on the gpu side, I'm mostly trying to
>>>>>> avoid having to power the gpu back on to free buffers.  (On the v4l2
>>>>>> side, somewhere in the core videobuf code would also need to be made
>>>>>> to wrap its dma_unmap_sg() with pm_runtime_get/put()..)
>>>>>
>>>>> Right, and we shouldn't have to resume it if we suspend it in a clean state,
>>>>> with the TLBs invalidated.
>>>>>
>>>>
>>>> I guess if the device_link() stuff ensured the attached device
>>>> (gpu/etc) was suspended before suspending the iommu, then I guess I
>>>> can't see how temporarily putting the iommu in bypass would be a
>>>> problem.  I haven't looked at the device_link() stuff too closely, but
>>>> iommu being resumed first and suspended last seems like the only thing
>>>> that would make sense.  I'm mostly just nervous about iommu in bypass
>>>> vs gpu since userspace has so much control over what address gpu
>>>> writes to / reads from, so getting it wrong w/ the iommu would be a
>>>> rather bad thing ;-)
>>>
>>> Right, but we can also configure it to terminate if you don't want bypass.
>>>
>>
> 
>  But one thing here is that, with device links in the picture, iommu
>  suspend/resume is called along with the master. Does that mean we can end
>  up cleaning even active entries on the suspend path, if suspend is going to
>  put the smmu into a clean state every time? So if the masters follow the
>  pm_runtime sequence before a dma_map/unmap operation, that seems better.
> 

 Also, for the use case where unmap is done from a master like the GPU while it
 is already suspended, following Marek's approach of checking the smmu state in
 unmap and deferring the TLB flush until resume seems the correct way. All of
 the above is true if we want to use device_link.
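
 To illustrate the ordering that device_link is expected to give here
 (supplier resumed first, suspended last), below is a minimal userspace C
 model, not kernel code. The names model_dev, model_pm_get(), model_pm_put()
 and model_unmap_with_pm() are hypothetical stand-ins for struct device and
 the pm_runtime get/put calls; the real reference counting lives in the
 runtime PM core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a runtime-PM device (not struct device). */
struct model_dev {
	const char *name;
	int usage_count;            /* outstanding get() references */
	bool active;                /* true while "powered" */
	struct model_dev *supplier; /* e.g. the SMMU for a GPU; NULL if none */
};

/* Take a reference: resume the supplier first, then the device itself. */
void model_pm_get(struct model_dev *dev)
{
	if (dev->usage_count++ > 0)
		return;             /* already active */
	if (dev->supplier)
		model_pm_get(dev->supplier);
	dev->active = true;
}

/* Drop a reference: suspend the device first, then release the supplier. */
void model_pm_put(struct model_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count > 0)
		return;
	dev->active = false;
	if (dev->supplier)
		model_pm_put(dev->supplier);
}

/*
 * An unmap issued on behalf of a master: hold the master's reference around
 * it so the supplier is powered while its registers would be touched.
 * Returns true if the supplier was active at that point.
 */
bool model_unmap_with_pm(struct model_dev *master)
{
	bool smmu_was_powered;

	model_pm_get(master);
	smmu_was_powered = master->supplier && master->supplier->active;
	/* ...the TLB invalidation would be issued here... */
	model_pm_put(master);
	return smmu_was_powered;
}
```

 With a gpu whose supplier is the smmu, model_unmap_with_pm(&gpu) returns
 true, and both devices are inactive again once the reference is dropped.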

Regards,
 Sricharan

-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-17 12:28                                                   ` Sricharan R
  (?)
@ 2017-07-24 15:31                                                     ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-07-24 15:31 UTC (permalink / raw)
  To: Sricharan R
  Cc: Rob Clark, Will Deacon, Stephen Boyd, Joerg Roedel, Robin Murphy,
	Rob Herring, Mark Rutland, Marek Szyprowski, iommu, devicetree,
	Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

Hi,

On Mon, Jul 17, 2017 at 5:58 PM, Sricharan R <sricharan@codeaurora.org> wrote:
> Hi,
>
> On 7/17/2017 5:16 PM, Sricharan R wrote:
>> Hi,
>>
>> On 7/15/2017 1:09 AM, Rob Clark wrote:
>>> On Fri, Jul 14, 2017 at 3:36 PM, Will Deacon <will.deacon@arm.com> wrote:
>>>> On Fri, Jul 14, 2017 at 03:34:42PM -0400, Rob Clark wrote:
>>>>> On Fri, Jul 14, 2017 at 3:01 PM, Will Deacon <will.deacon@arm.com> wrote:
>>>>>> On Fri, Jul 14, 2017 at 02:25:45PM -0400, Rob Clark wrote:
>>>>>>> On Fri, Jul 14, 2017 at 2:06 PM, Will Deacon <will.deacon@arm.com> wrote:
>>>>>>>> On Fri, Jul 14, 2017 at 01:42:13PM -0400, Rob Clark wrote:
>>>>>>>>> On Fri, Jul 14, 2017 at 1:07 PM, Will Deacon <will.deacon@arm.com> wrote:
>>>>>>>>>> On Thu, Jul 13, 2017 at 10:55:10AM -0400, Rob Clark wrote:
>>>>>>>>>>> On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> On 7/13/2017 5:20 PM, Rob Clark wrote:
>>>>>>>>>>>>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>>>>>>>>>>>>>> Hi Vivek,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>>>>>>>>>>>>> Hi Stephen,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>>>>>>>>>>>>> On 07/06, Vivek Gautam wrote:
>>>>>>>>>>>>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>>>>>>>>>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>>>>>>>>>>>>>                    size_t size)
>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>>>>>>>>>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>>>>>>>>>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>>>>>>>>>>>>> +    size_t ret;
>>>>>>>>>>>>>>>>>         if (!ops)
>>>>>>>>>>>>>>>>>           return 0;
>>>>>>>>>>>>>>>>>   -    return ops->unmap(ops, iova, size);
>>>>>>>>>>>>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>>>>>>>>>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>>>>>>>>>>>>> to recall that being a problem before.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's something which was dropped in the following patch merged in master:
>>>>>>>>>>>>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looks like we don't  need locks here anymore?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  Apart from the locking, I wonder why an explicit pm_runtime call is
>>>>>>>>>>>>>>  needed from unmap. It looks like some path in the master using that
>>>>>>>>>>>>>>  should have enabled the pm?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, there are a bunch of scenarios where unmap can happen with
>>>>>>>>>>>>> disabled master (but not in atomic context).  On the gpu side we
>>>>>>>>>>>>> opportunistically keep a buffer mapping until the buffer is freed
>>>>>>>>>>>>> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
>>>>>>>>>>>>> an exported dmabuf while some other driver holds a reference to it
>>>>>>>>>>>>> (which can be dropped when the v4l2 device is suspended).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Since unmap triggers a tlb flush, which touches iommu regs, the iommu
>>>>>>>>>>>>> driver *definitely* needs a pm_runtime_get_sync().
>>>>>>>>>>>>
>>>>>>>>>>>>  Ok, with that being the case, there are two things here:
>>>>>>>>>>>>
>>>>>>>>>>>>  1) If the device links are still intact at the places where unmap is called,
>>>>>>>>>>>>     then pm_runtime from the master would set up all the clocks. That would
>>>>>>>>>>>>     avoid reintroducing the locking indirectly here.
>>>>>>>>>>>>
>>>>>>>>>>>>  2) If not, then doing it here is the only way. But in both cases, since
>>>>>>>>>>>>     the unmap can be called from atomic context, the resume handler here should
>>>>>>>>>>>>     avoid doing clk_prepare_enable; instead, move the clk_prepare to init.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I do kinda like the approach Marek suggested.. of deferring the tlb
>>>>>>>>>>> flush until resume.  I'm wondering if we could combine that with
>>>>>>>>>>> putting the mmu in a stalled state when we suspend (and not resume the
>>>>>>>>>>> mmu until after the pending tlb flush)?
>>>>>>>>>>
>>>>>>>>>> I'm not sure that a stalled state is what we're after here, because we need
>>>>>>>>>> to take care to prevent any table walks if we've freed the underlying pages.
>>>>>>>>>> What we could try to do is disable the SMMU (put into global bypass) and
>>>>>>>>>> invalidate the TLB when performing a suspend operation, then we just ignore
>>>>>>>>>> invalidation whilst the clocks are stopped and, on resume, enable the SMMU
>>>>>>>>>> again.
>>>>>>>>>
>>>>>>>>> wouldn't stalled just block any memory transactions by device(s) using
>>>>>>>>> the context bank?  Putting it in bypass isn't really a good thing if
>>>>>>>>> there is any chance the device can sneak in a memory access before
>>>>>>>>> we've taken it back out of bypass (i.e. makes gpu a giant userspace
>>>>>>>>> controlled root hole).
>>>>>>>>
>>>>>>>> If it doesn't deadlock, then yes, it will stall transactions. However, that
>>>>>>>> doesn't mean it necessarily prevents page table walks.
>>>>>>>
>>>>>>> btw, I guess the concern about pagetable walk is that the unmap could
>>>>>>> have removed some sub-level of the pt that the tlb walk would hit?
>>>>>>> Would deferring freeing those pages help?
>>>>>>
>>>>>> Could do, but it sounds like a lot of complication that I think we can fix
>>>>>> by making the suspend operation put the SMMU into a "clean" state.
>>>>>>
>>>>>>>> Instead of bypass, we
>>>>>>>> could configure all the streams to terminate, but this race still worries me
>>>>>>>> somewhat. I thought that the SMMU would only be suspended if all of its
>>>>>>>> masters were suspended, so if the GPU wants to come out of suspend then the
>>>>>>>> SMMU should be resumed first.
>>>>>>>
>>>>>>> I believe this should be true.. on the gpu side, I'm mostly trying to
>>>>>>> avoid having to power the gpu back on to free buffers.  (On the v4l2
>>>>>>> side, somewhere in the core videobuf code would also need to be made
>>>>>>> to wrap its dma_unmap_sg() with pm_runtime_get/put()..)
>>>>>>
>>>>>> Right, and we shouldn't have to resume it if we suspend it in a clean state,
>>>>>> with the TLBs invalidated.
>>>>>>
>>>>>
>>>>> I guess if the device_link() stuff ensured the attached device
>>>>> (gpu/etc) was suspended before suspending the iommu, then I guess I
>>>>> can't see how temporarily putting the iommu in bypass would be a
>>>>> problem.  I haven't looked at the device_link() stuff too closely, but
>>>>> iommu being resumed first and suspended last seems like the only thing
>>>>> that would make sense.  I'm mostly just nervous about iommu in bypass
>>>>> vs gpu since userspace has so much control over what address gpu
>>>>> writes to / reads from, so getting it wrong w/ the iommu would be a
>>>>> rather bad thing ;-)
>>>>
>>>> Right, but we can also configure it to terminate if you don't want bypass.
>>>>
>>>
>>
>>  But one thing here is, with device links in the picture, iommu suspend/resume
>>  is called along with the master. That means we could end up cleaning even
>>  active entries on the suspend path, if suspend is going to
>>  put the smmu into a clean state every time. So if the masters are following
>>  the pm_runtime sequence before a dma_map/unmap operation, that seems better.
>>
>
>  Also, for the use case of unmap being done from masters like the GPU while it is already
>  suspended, following Marek's approach of checking the smmu state in
>  unmap and deferring the TLB flush till resume seems the correct way. All of the above
>  holds if we want to use device_link.

This sounds like a plan.
I have a WIP version in which 'unmap' just checks whether the smmu is
already suspended. If it is, we save the unmap request (domain, iova, size)
and defer it.
On resume, we walk the pending TLB flush requests and call unmap for each.
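
The flow described above can be modelled roughly as follows. This is only a
userspace sketch, not driver code: struct model_smmu, model_unmap(),
model_resume(), model_tlb_flush() and the MAX_PENDING bound are all
hypothetical names chosen for illustration, and the "TLB flush" is just a
counter standing in for the register writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8	/* arbitrary bound, an assumption of this sketch */

struct unmap_req {
	unsigned long iova;
	size_t size;
};

/* Hypothetical stand-in for the SMMU state the driver would track. */
struct model_smmu {
	bool suspended;
	struct unmap_req pending[MAX_PENDING];	/* deferred requests, in order */
	size_t npending;
	size_t flushes;				/* simulated TLB invalidations */
};

/* Stand-in for the register writes; must never run while suspended. */
static void model_tlb_flush(struct model_smmu *s, struct unmap_req r)
{
	(void)r;
	assert(!s->suspended);
	s->flushes++;
}

/* Returns the number of bytes unmapped, 0 if the request could not be taken. */
static size_t model_unmap(struct model_smmu *s, unsigned long iova, size_t size)
{
	struct unmap_req r = { iova, size };

	if (!s->suspended) {
		model_tlb_flush(s, r);
		return size;
	}

	/* Suspended: remember the request instead of touching the hardware. */
	if (s->npending == MAX_PENDING)
		return 0;
	s->pending[s->npending++] = r;
	return size;
}

/* Clocks are assumed to be back on here; replay what was deferred. */
static void model_resume(struct model_smmu *s)
{
	size_t i;

	s->suspended = false;
	for (i = 0; i < s->npending; i++)
		model_tlb_flush(s, s->pending[i]);
	s->npending = 0;
}
```

In this model an unmap issued while suspended returns the full size right
away and the flush count only moves once model_resume() runs. A real
implementation would also have to decide what happens when the queue cannot
take another request, which the sketch sidesteps by returning 0.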

I will give it a try with the venus use-case.

Regards
Vivek

>
> Regards,
>  Sricharan
>
> --
> "QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
>



-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* [PATCH] iommu/arm-smmu: Defer TLB flush in case of unmap op
  2017-07-24 15:31                                                     ` Vivek Gautam
  (?)
@ 2017-08-02  9:53                                                         ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-08-02  9:53 UTC (permalink / raw)
  To: iommu, linux-arm-msm
  Cc: mark.rutland, will.deacon, linux-kernel, stanimir.varbanov,
	robh+dt, sboyd, linux-arm-kernel

We don't want to touch the TLB when the smmu is suspended.
Defer the TLB flush until resume.

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---

Hi all,

Here's a small patch in response to the suggestion to defer TLB operations
when the smmu is suspended.
The patch stores the TLB requests in 'unmap' when the smmu device is
suspended. On resume, it walks all the pending TLB requests and
performs the unmap for each of them.
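
On the locking question raised by the XXX comment in the diff, one option
that avoids a lock on the unmap path is a lock-less singly linked list:
producers push with a compare-and-swap, and resume detaches the whole list
in one atomic exchange. The kernel's <linux/llist.h> (llist_add(),
llist_del_all(), llist_reverse_order()) follows the same pattern. The code
here is a userspace model using C11 atomics, not a tested driver change;
struct deferred_req, struct deferred_queue, deferred_push(),
deferred_drain() and demo_flush() are hypothetical names. Unlike the
devm_kzalloc() entries in the patch, which stay allocated until the device
is unbound, each entry here is freed once it has been replayed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* One deferred unmap request (hypothetical layout for this sketch). */
struct deferred_req {
	unsigned long iova;
	size_t size;
	struct deferred_req *next;
};

struct deferred_queue {
	_Atomic(struct deferred_req *) head;
};

/* Lock-free push: safe against concurrent pushers and a concurrent drain. */
static int deferred_push(struct deferred_queue *q, unsigned long iova,
			 size_t size)
{
	struct deferred_req *req = malloc(sizeof(*req));

	if (!req)
		return -1;
	req->iova = iova;
	req->size = size;
	req->next = atomic_load(&q->head);
	/* On failure the CAS reloads the current head into req->next. */
	while (!atomic_compare_exchange_weak(&q->head, &req->next, req))
		;
	return 0;
}

/* Detach everything at once, then replay oldest-first outside any lock. */
static size_t deferred_drain(struct deferred_queue *q,
			     void (*flush)(unsigned long iova, size_t size))
{
	struct deferred_req *req =
		atomic_exchange(&q->head, (struct deferred_req *)NULL);
	struct deferred_req *fifo = NULL;
	size_t n = 0;

	/* Pushes build the list newest-first; reverse it to get FIFO order. */
	while (req) {
		struct deferred_req *next = req->next;

		req->next = fifo;
		fifo = req;
		req = next;
	}

	while (fifo) {
		struct deferred_req *next = fifo->next;

		flush(fifo->iova, fifo->size);
		free(fifo);	/* release each request once it is replayed */
		fifo = next;
		n++;
	}
	return n;
}

/* Demo callback: records the order in which requests are replayed. */
static unsigned long demo_seen[4];
static size_t demo_count;

static void demo_flush(unsigned long iova, size_t size)
{
	(void)size;
	if (demo_count < 4)
		demo_seen[demo_count++] = iova;
}
```

The drain returns how many requests were replayed, so a caller can tell an
empty queue from a busy one. Whether this is actually cheaper than a
spinlock on the unmap path would need measuring on real hardware.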

Right now, I have applied the patch on top of the pm runtime series.
Let me know what you think of the change. It would also be helpful if
somebody could test a valid use case with this.

regards
Vivek

 drivers/iommu/arm-smmu.c | 59 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 53 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index fe8e7fd61282..1f9c2b16aabb 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -51,6 +51,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/list.h>
 
 #include <linux/amba/bus.h>
 
@@ -151,6 +152,14 @@ struct arm_smmu_master_cfg {
 #define for_each_cfg_sme(fw, i, idx) \
 	for (i = 0; idx = fwspec_smendx(fw, i), i < fw->num_ids; ++i)
 
+struct arm_smmu_tlb_req_info {
+	struct iommu_domain *domain;
+	unsigned long iova;
+	size_t size;
+	bool tlb_flush_pending;
+	struct list_head list;
+};
+
 struct arm_smmu_device {
 	struct device			*dev;
 
@@ -182,6 +191,7 @@ struct arm_smmu_device {
 	u32				num_s2_context_banks;
 	DECLARE_BITMAP(context_map, ARM_SMMU_MAX_CBS);
 	atomic_t			irptndx;
+	struct list_head		domain_list;
 
 	u32				num_mapping_groups;
 	u16				streamid_mask;
@@ -1239,17 +1249,32 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 			     size_t size)
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
-	size_t ret;
+	struct arm_smmu_tlb_req_info *tlb_info;
 
 	if (!ops)
 		return 0;
 
-	pm_runtime_get_sync(smmu_domain->smmu->dev);
-	ret = ops->unmap(ops, iova, size);
-	pm_runtime_put_sync(smmu_domain->smmu->dev);
+	/* if the device is suspended; we can't unmap, defer any tlb operations */
+	if (pm_runtime_suspended(smmu->dev)) {
+		tlb_info = devm_kzalloc(smmu->dev, sizeof(*tlb_info), GFP_ATOMIC);
+		if (!tlb_info)
+			return -ENOMEM;
 
-	return ret;
+		tlb_info->domain = domain;
+		tlb_info->iova = iova;
+		tlb_info->size = size;
+		tlb_info->tlb_flush_pending = true;
+		INIT_LIST_HEAD(&tlb_info->list);
+
+		/* XXX: We need locks here, but that again introduce the slowpath ? */
+		list_add_tail(&tlb_info->list, &smmu->domain_list);
+
+		return size;
+	}
+
+	return ops->unmap(ops, iova, size);
 }
 
 static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
@@ -2166,6 +2191,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 		smmu->irqs[i] = irq;
 	}
 
+	INIT_LIST_HEAD(&smmu->domain_list);
+
 	err = arm_smmu_init_clocks(smmu);
 	if (err)
 		return err;
@@ -2268,8 +2295,28 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
 static int arm_smmu_resume(struct device *dev)
 {
 	struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+	struct arm_smmu_tlb_req_info  *tlb_info, *temp;
+	int ret;
+
+	ret = arm_smmu_enable_clocks(smmu);
+	if (ret)
+		return ret;
+
+	list_for_each_entry_safe(tlb_info, temp, &smmu->domain_list, list) {
+		printk("\n\n %s %d :: iterating over pending tlb request\n\n", __func__, __LINE__);
+		if (tlb_info->tlb_flush_pending) {
+			ret = arm_smmu_unmap(tlb_info->domain, tlb_info->iova, tlb_info->size);
+			if (!ret)
+				return -EINVAL;
 
-	return arm_smmu_enable_clocks(smmu);
+			tlb_info->tlb_flush_pending = false;
+
+			/* we are done with this request; delete it */
+			list_del(&tlb_info->list);
+		}
+	}
+
+	return 0;
 }
 
 static int arm_smmu_suspend(struct device *dev)
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply related	[flat|nested] 168+ messages in thread

* [PATCH] iommu/arm-smmu: Defer TLB flush in case of unmap op
@ 2017-08-02  9:53                                                         ` Vivek Gautam
  0 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-08-02  9:53 UTC (permalink / raw)
  To: linux-arm-kernel

We don't want to touch the TLB when smmu is suspended.
Defer it until resume.

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---

Hi all,

Here's the small patch in response of suggestion to defer tlb operations
when smmu is in suspend state.
The patch stores the TLB requests in 'unmap' when the smmu device is
suspended. On resume, it checks all the pending TLB requests, and
performs the unmap over those.

Right now, I have applied the patch on top of the pm runtime series.
Let me know what you think of the change. It will also be helpful if
somebody can please test a valid use case with this.

regards
Vivek

 drivers/iommu/arm-smmu.c | 59 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 53 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index fe8e7fd61282..1f9c2b16aabb 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -51,6 +51,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/list.h>
 
 #include <linux/amba/bus.h>
 
@@ -151,6 +152,14 @@ struct arm_smmu_master_cfg {
 #define for_each_cfg_sme(fw, i, idx) \
 	for (i = 0; idx = fwspec_smendx(fw, i), i < fw->num_ids; ++i)
 
+struct arm_smmu_tlb_req_info {
+	struct iommu_domain *domain;
+	unsigned long iova;
+	size_t size;
+	bool tlb_flush_pending;
+	struct list_head list;
+};
+
 struct arm_smmu_device {
 	struct device			*dev;
 
@@ -182,6 +191,7 @@ struct arm_smmu_device {
 	u32				num_s2_context_banks;
 	DECLARE_BITMAP(context_map, ARM_SMMU_MAX_CBS);
 	atomic_t			irptndx;
+	struct list_head		domain_list;
 
 	u32				num_mapping_groups;
 	u16				streamid_mask;
@@ -1239,17 +1249,32 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 			     size_t size)
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
-	size_t ret;
+	struct arm_smmu_tlb_req_info *tlb_info;
 
 	if (!ops)
 		return 0;
 
-	pm_runtime_get_sync(smmu_domain->smmu->dev);
-	ret = ops->unmap(ops, iova, size);
-	pm_runtime_put_sync(smmu_domain->smmu->dev);
+	/* if the device is suspended; we can't unmap, defer any tlb operations */
+	if (pm_runtime_suspended(smmu->dev)) {
+		tlb_info = devm_kzalloc(smmu->dev, sizeof(*tlb_info), GFP_ATOMIC);
+		if (!tlb_info)
+			return -ENOMEM;
 
-	return ret;
+		tlb_info->domain = domain;
+		tlb_info->iova = iova;
+		tlb_info->size = size;
+		tlb_info->tlb_flush_pending = true;
+		INIT_LIST_HEAD(&tlb_info->list);
+
+		/* XXX: We need locks here, but that again introduce the slowpath ? */
+		list_add_tail(&tlb_info->list, &smmu->domain_list);
+
+		return size;
+	}
+
+	return ops->unmap(ops, iova, size);
 }
 
 static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
@@ -2166,6 +2191,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 		smmu->irqs[i] = irq;
 	}
 
+	INIT_LIST_HEAD(&smmu->domain_list);
+
 	err = arm_smmu_init_clocks(smmu);
 	if (err)
 		return err;
@@ -2268,8 +2295,28 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
 static int arm_smmu_resume(struct device *dev)
 {
 	struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+	struct arm_smmu_tlb_req_info  *tlb_info, *temp;
+	int ret;
+
+	ret = arm_smmu_enable_clocks(smmu);
+	if (ret)
+		return ret;
+
+	list_for_each_entry_safe(tlb_info, temp, &smmu->domain_list, list) {
+		printk("\n\n %s %d :: iterating over pending tlb request\n\n", __func__, __LINE__);
+		if (tlb_info->tlb_flush_pending) {
+			ret = arm_smmu_unmap(tlb_info->domain, tlb_info->iova, tlb_info->size);
+			if (!ret)
+				return -EINVAL;
 
-	return arm_smmu_enable_clocks(smmu);
+			tlb_info->tlb_flush_pending = false;
+
+			/* we are done with this request; delete it */
+			list_del(&tlb_info->list);
+		}
+	}
+
+	return 0;
 }
 
 static int arm_smmu_suspend(struct device *dev)
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

* Re: [PATCH] iommu/arm-smmu: Defer TLB flush in case of unmap op
  2017-08-02  9:53                                                         ` Vivek Gautam
@ 2017-08-02 12:17                                                           ` Robin Murphy
  -1 siblings, 0 replies; 168+ messages in thread
From: Robin Murphy @ 2017-08-02 12:17 UTC (permalink / raw)
  To: Vivek Gautam, iommu, linux-arm-msm
  Cc: robdclark, will.deacon, joro, robh+dt, mark.rutland,
	m.szyprowski, linux-kernel, stanimir.varbanov, sricharan, sboyd,
	linux-arm-kernel

On 02/08/17 10:53, Vivek Gautam wrote:
> We don't want to touch the TLB when smmu is suspended.
> Defer it until resume.
> 
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> ---
> 
> Hi all,
> 
> Here's the small patch in response of suggestion to defer tlb operations
> when smmu is in suspend state.
> The patch stores the TLB requests in 'unmap' when the smmu device is
> suspended. On resume, it checks all the pending TLB requests, and
> performs the unmap over those.
> 
> Right now, I have applied the patch on top of the pm runtime series.
> Let me know what you think of the change. It will also be helpful if
> somebody can please test a valid use case with this.

The patch itself doesn't make much sense to me, but more crucially it's
definitely broken in concept. We can't return from arm_smmu_unmap()
without having actually unmapped anything, because that leaves the page
tables out of sync with what the caller expects - they may immediately
reuse that IOVA to map something else for a different device and hit an
unexpected failure from io-pgtable when the PTE turns out to be non-empty.
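
The failure mode described above can be sketched in userspace with a toy single-level page table. This is only an illustration under stated assumptions, not driver code: every toy_* name is made up here, and the -EEXIST return merely stands in for whatever error the real page-table code reports when it finds a live PTE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT	12
#define TOY_NUM_PTES	16

/* Toy single-level page table: pte[i] == 0 means "empty". */
struct toy_pgtable {
	unsigned long pte[TOY_NUM_PTES];
};

/* Refuse to overwrite a live PTE, as a page-table walker would. */
static int toy_map(struct toy_pgtable *pt, unsigned long iova,
		   unsigned long paddr)
{
	size_t idx = iova >> TOY_PAGE_SHIFT;

	if (idx >= TOY_NUM_PTES)
		return -EINVAL;
	if (pt->pte[idx])
		return -EEXIST;
	pt->pte[idx] = paddr | 1UL;	/* bit 0 marks the entry valid */
	return 0;
}

/* Clear the PTE right away and report how many bytes were unmapped. */
static size_t toy_unmap(struct toy_pgtable *pt, unsigned long iova)
{
	size_t idx = iova >> TOY_PAGE_SHIFT;

	if (idx >= TOY_NUM_PTES || !pt->pte[idx])
		return 0;
	pt->pte[idx] = 0;
	return (size_t)1 << TOY_PAGE_SHIFT;
}

/* Deferred variant: claims success but leaves the PTE in place. */
static size_t toy_unmap_deferred(struct toy_pgtable *pt, unsigned long iova)
{
	(void)pt;
	(void)iova;
	return (size_t)1 << TOY_PAGE_SHIFT;
}
```

With this model, a map of some IOVA followed by toy_unmap_deferred() and a second toy_map() of the same IOVA fails, because the caller believes the range is free while the table still holds the old entry. Using toy_unmap() instead lets the second map succeed.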

However, if in general suspend *might* power-gate any part of the SMMU,
then I don't think we have any guarantee of what state any TLBs could be
in upon resume. Therefore any individual invalidations we skip while
suspended are probably moot, since resume would almost certainly have to
invalidate everything to get back to a safe state anyway.
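
A rough sketch of that alternative, again as a userspace toy rather than driver code: resume drops every cached entry instead of replaying a list of per-range requests. The toy_* names and the fixed-size TLB are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_TLB_ENTRIES	8

struct toy_tlb_entry {
	unsigned long iova;
	bool valid;
};

struct toy_smmu {
	struct toy_tlb_entry tlb[TOY_TLB_ENTRIES];
	bool powered;
};

/* Cache a translation in the first free slot (toy replacement policy). */
static void toy_tlb_fill(struct toy_smmu *smmu, unsigned long iova)
{
	for (size_t i = 0; i < TOY_TLB_ENTRIES; i++) {
		if (!smmu->tlb[i].valid) {
			smmu->tlb[i].iova = iova;
			smmu->tlb[i].valid = true;
			return;
		}
	}
}

/* After suspend the TLB contents are no longer trusted. */
static void toy_suspend(struct toy_smmu *smmu)
{
	smmu->powered = false;
}

/* Resume: invalidate everything, no per-request bookkeeping needed. */
static void toy_resume(struct toy_smmu *smmu)
{
	smmu->powered = true;
	for (size_t i = 0; i < TOY_TLB_ENTRIES; i++)
		smmu->tlb[i].valid = false;
}

/* Count the entries still marked valid. */
static size_t toy_tlb_valid_count(const struct toy_smmu *smmu)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_TLB_ENTRIES; i++)
		if (smmu->tlb[i].valid)
			n++;
	return n;
}
```

The point of the sketch is that a full invalidate on resume needs no list, no allocation in the unmap path and no locking around deferred requests.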

Conversely though, the situation that still concerns me is whether this
can work at all for a distributed SMMU if things *don't* lose state. Say
the GPU and its local TBU are in the same clock domain - if the GPU has
just gone idle and we've clock-gated it, but "the SMMU" (i.e. the TCU)
is still active servicing other devices, we will assume we can happily
unmap GPU buffers and issue TLBIs, but what happens with entries held in
the unclocked TBU's micro-TLB?

Robin.


* Re: [PATCH] iommu/arm-smmu: Defer TLB flush in case of unmap op
  2017-08-02 12:17                                                           ` Robin Murphy
  (?)
@ 2017-08-03  5:35                                                               ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-08-03  5:35 UTC (permalink / raw)
  To: Robin Murphy, iommu, linux-arm-msm
  Cc: mark.rutland, sboyd, will.deacon, linux-kernel, stanimir.varbanov,
	robh+dt, linux-arm-kernel

Hi Robin,



On 08/02/2017 05:47 PM, Robin Murphy wrote:
> On 02/08/17 10:53, Vivek Gautam wrote:
>> We don't want to touch the TLB when smmu is suspended.
>> Defer it until resume.
>>
>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
>> ---
>>
>> Hi all,
>>
>> Here's the small patch in response of suggestion to defer tlb operations
>> when smmu is in suspend state.
>> The patch stores the TLB requests in 'unmap' when the smmu device is
>> suspended. On resume, it checks all the pending TLB requests, and
>> performs the unmap over those.
>>
>> Right now, I have applied the patch on top of the pm runtime series.
>> Let me know what you think of the change. It will also be helpful if
>> somebody can please test a valid use case with this.
> The patch itself doesn't make much sense to me, but more crucially it's
> definitely broken in concept. We can't return from arm_smmu_unmap()
> without having actually unmapped anything, because that leaves the page
> tables out of sync with what the caller expects - they may immediately
> reuse that IOVA to map something else for a different device and hit an
> unexpected failure from io-pgtable when the PTE turns out to be non-empty.

To understand things a bit better:
if we don't *unmap* in arm_smmu_unmap() and leave the TLBs as they are,
the next mapping can happen only with the *knowledge* of the smmu, i.e.,
the smmu should be active at that time.
If that's true, then the _runtime_resume() method will take care of
invalidating the TLBs when we call arm_smmu_unmap() from _runtime_resume().
Is my understanding correct here?

>
> However, if in general suspend *might* power-gate any part of the SMMU,
> then I don't think we have any guarantee of what state any TLBs could be
> in upon resume. Therefore any individual invalidations we skip while
> suspended are probably moot, since resume would almost certainly have to
> invalidate everything to get back to a safe state anyway.

Right, when suspend power-gates the SMMU, the TLB context is lost
anyway, so the resume path can start afresh.
This is something that exynos does at present.

>
> Conversely though, the situation that still concerns me is whether this
> can work at all for a distributed SMMU if things *don't* lose state. Say
> the GPU and its local TBU are in the same clock domain - if the GPU has
> just gone idle and we've clock-gated it, but "the SMMU" (i.e. the TCU)
> is still active servicing other devices, we will assume we can happily
> unmap GPU buffers and issue TLBIs, but what happens with entries held in
> the unclocked TBU's micro-TLB?

We have platforms with a shared TCU and multiple TBUs.
Each TBU is in its own power domain, not in the master's power domain.
In such cases we may want to runtime_get() the TBUs, so that an unmap()
call made while the master is clock-gated gets through.
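
As a minimal userspace sketch of that idea: bracket the invalidation with a get/put on the TBU, in the same spirit as the pm_runtime_get_sync()/pm_runtime_put_sync() pair the patch removes from arm_smmu_unmap(). The toy_tbu type and its helpers are hypothetical and only model a usage count that gates a clock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy TBU: clocked only while someone holds a usage reference. */
struct toy_tbu {
	int usage_count;
	bool clocked;
	unsigned int tlbi_seen;
};

/* First reference turns the clock on. */
static void toy_tbu_get(struct toy_tbu *tbu)
{
	if (tbu->usage_count++ == 0)
		tbu->clocked = true;
}

/* Last reference turns the clock off again. */
static void toy_tbu_put(struct toy_tbu *tbu)
{
	assert(tbu->usage_count > 0);
	if (--tbu->usage_count == 0)
		tbu->clocked = false;
}

/* An invalidation only lands if the TBU is clocked. */
static bool toy_tbu_tlbi(struct toy_tbu *tbu)
{
	if (!tbu->clocked)
		return false;
	tbu->tlbi_seen++;
	return true;
}

/* Unmap path: hold the TBU up for the duration of the invalidation. */
static bool toy_unmap_with_tbu(struct toy_tbu *tbu)
{
	bool ok;

	toy_tbu_get(tbu);
	ok = toy_tbu_tlbi(tbu);
	toy_tbu_put(tbu);
	return ok;
}
```

In this model an invalidation sent to an idle TBU is silently lost, while toy_unmap_with_tbu() always reaches it and leaves the usage count balanced afterwards. It only covers a TBU with its own power domain, not the shared-domain case asked about next.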

Can we have a situation where the TBU and master are in the same power
domain, and the unmap is called when the master is not runtime active?
How will such a situation be handled?

Best regards
Vivek


-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

* [PATCH] iommu/arm-smmu: Defer TLB flush in case of unmap op
@ 2017-08-03  5:35                                                               ` Vivek Gautam
  0 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-08-03  5:35 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robin,



On 08/02/2017 05:47 PM, Robin Murphy wrote:
> On 02/08/17 10:53, Vivek Gautam wrote:
>> We don't want to touch the TLB when smmu is suspended.
>> Defer it until resume.
>>
>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
>> ---
>>
>> Hi all,
>>
>> Here's the small patch in response of suggestion to defer tlb operations
>> when smmu is in suspend state.
>> The patch stores the TLB requests in 'unmap' when the smmu device is
>> suspended. On resume, it checks all the pending TLB requests, and
>> performs the unmap over those.
>>
>> Right now, I have applied the patch on top of the pm runtime series.
>> Let me know what you think of the change. It will also be helpful if
>> somebody can please test a valid use case with this.
> The patch itself doesn't make much sense to me, but more crucially it's
> definitely broken in concept. We can't return from arm_smmu_unmap()
> without having actually unmapped anything, because that leaves the page
> tables out of sync with what the caller expects - they may immediately
> reuse that IOVA to map something else for a different device and hit an
> unexpected failure from io-pgtable when the PTE turns out to be non-empty.

To understand things a bit better:
if we don't *unmap* in arm_smmu_unmap() and leave the TLBs as they are,
the next mapping can happen only with the *knowledge* of the smmu, i.e.,
the smmu should be active at that time.
If that's true, then the _runtime_resume() method will take care of
invalidating the TLBs when we call arm_smmu_unmap() from _runtime_resume().
Is my understanding correct here?

>
> However, if in general suspend *might* power-gate any part of the SMMU,
> then I don't think we have any guarantee of what state any TLBs could be
> in upon resume. Therefore any individual invalidations we skip while
> suspended are probably moot, since resume would almost certainly have to
> invalidate everything to get back to a safe state anyway.

Right, when suspend power-gates the SMMU, the TLB context is lost
anyway, so the resume path can start afresh.
This is something that exynos does at present.

>
> Conversely though, the situation that still concerns me is whether this
> can work at all for a distributed SMMU if things *don't* lose state. Say
> the GPU and its local TBU are in the same clock domain - if the GPU has
> just gone idle and we've clock-gated it, but "the SMMU" (i.e. the TCU)
> is still active servicing other devices, we will assume we can happily
> unmap GPU buffers and issue TLBIs, but what happens with entries held in
> the unclocked TBU's micro-TLB?

We know of platforms that have a shared TCU and multiple TBUs.
Each TBU sits in its own power domain, not in the master's power domain.
In such cases we may want to runtime_get() the TBUs, so that an unmap()
call made while the master is clock-gated gets through.

Can we have a situation where the TBU and master are in the same power
domain, and the unmap is called when the master is not runtime active?
How will such a situation be handled?

Best regards
Vivek

>
> Robin.
>
>> regards
>> Vivek
>>
>>   drivers/iommu/arm-smmu.c | 59 +++++++++++++++++++++++++++++++++++++++++++-----
>>   1 file changed, 53 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index fe8e7fd61282..1f9c2b16aabb 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -51,6 +51,7 @@
>>   #include <linux/pm_runtime.h>
>>   #include <linux/slab.h>
>>   #include <linux/spinlock.h>
>> +#include <linux/list.h>
>>   
>>   #include <linux/amba/bus.h>
>>   
>> @@ -151,6 +152,14 @@ struct arm_smmu_master_cfg {
>>   #define for_each_cfg_sme(fw, i, idx) \
>>   	for (i = 0; idx = fwspec_smendx(fw, i), i < fw->num_ids; ++i)
>>   
>> +struct arm_smmu_tlb_req_info {
>> +	struct iommu_domain *domain;
>> +	unsigned long iova;
>> +	size_t size;
>> +	bool tlb_flush_pending;
>> +	struct list_head list;
>> +};
>> +
>>   struct arm_smmu_device {
>>   	struct device			*dev;
>>   
>> @@ -182,6 +191,7 @@ struct arm_smmu_device {
>>   	u32				num_s2_context_banks;
>>   	DECLARE_BITMAP(context_map, ARM_SMMU_MAX_CBS);
>>   	atomic_t			irptndx;
>> +	struct list_head		domain_list;
>>   
>>   	u32				num_mapping_groups;
>>   	u16				streamid_mask;
>> @@ -1239,17 +1249,32 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>   			     size_t size)
>>   {
>>   	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>>   	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> -	size_t ret;
>> +	struct arm_smmu_tlb_req_info *tlb_info;
>>   
>>   	if (!ops)
>>   		return 0;
>>   
>> -	pm_runtime_get_sync(smmu_domain->smmu->dev);
>> -	ret = ops->unmap(ops, iova, size);
>> -	pm_runtime_put_sync(smmu_domain->smmu->dev);
>> +	/* if the device is suspended; we can't unmap, defer any tlb operations */
>> +	if (pm_runtime_suspended(smmu->dev)) {
>> +		tlb_info = devm_kzalloc(smmu->dev, sizeof(*tlb_info), GFP_ATOMIC);
>> +		if (!tlb_info)
>> +			return -ENOMEM;
>>   
>> -	return ret;
>> +		tlb_info->domain = domain;
>> +		tlb_info->iova = iova;
>> +		tlb_info->size = size;
>> +		tlb_info->tlb_flush_pending = true;
>> +		INIT_LIST_HEAD(&tlb_info->list);
>> +
>> +		/* XXX: We need locks here, but that again introduce the slowpath ? */
>> +		list_add_tail(&tlb_info->list, &smmu->domain_list);
>> +
>> +		return size;
>> +	}
>> +
>> +	return ops->unmap(ops, iova, size);
>>   }
>>   
>>   static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
>> @@ -2166,6 +2191,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>   		smmu->irqs[i] = irq;
>>   	}
>>   
>> +	INIT_LIST_HEAD(&smmu->domain_list);
>> +
>>   	err = arm_smmu_init_clocks(smmu);
>>   	if (err)
>>   		return err;
>> @@ -2268,8 +2295,28 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
>>   static int arm_smmu_resume(struct device *dev)
>>   {
>>   	struct arm_smmu_device *smmu = dev_get_drvdata(dev);
>> +	struct arm_smmu_tlb_req_info  *tlb_info, *temp;
>> +	int ret;
>> +
>> +	ret = arm_smmu_enable_clocks(smmu);
>> +	if (ret)
>> +		return ret;
>> +
>> +	list_for_each_entry_safe(tlb_info, temp, &smmu->domain_list, list) {
>> +		printk("\n\n %s %d :: iterating over pending tlb request\n\n", __func__, __LINE__);
>> +		if (tlb_info->tlb_flush_pending) {
>> +			ret = arm_smmu_unmap(tlb_info->domain, tlb_info->iova, tlb_info->size);
>> +			if (!ret)
>> +				return -EINVAL;
>>   
>> -	return arm_smmu_enable_clocks(smmu);
>> +			tlb_info->tlb_flush_pending = false;
>> +
>> +			/* we are done with this request; delete it */
>> +			list_del(&tlb_info->list);
>> +		}
>> +	}
>> +
>> +	return 0;
>>   }
>>   
>>   static int arm_smmu_suspend(struct device *dev)
>>

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] iommu/arm-smmu: Defer TLB flush in case of unmap op
  2017-08-03  5:35                                                               ` Vivek Gautam
@ 2017-08-04 17:04                                                                 ` Robin Murphy
  -1 siblings, 0 replies; 168+ messages in thread
From: Robin Murphy @ 2017-08-04 17:04 UTC (permalink / raw)
  To: Vivek Gautam, iommu, linux-arm-msm
  Cc: robdclark, will.deacon, joro, robh+dt, mark.rutland,
	m.szyprowski, linux-kernel, stanimir.varbanov, sricharan, sboyd,
	linux-arm-kernel

On 03/08/17 06:35, Vivek Gautam wrote:
> Hi Robin,
> 
> 
> 
> On 08/02/2017 05:47 PM, Robin Murphy wrote:
>> On 02/08/17 10:53, Vivek Gautam wrote:
>>> We don't want to touch the TLB when smmu is suspended.
>>> Defer it until resume.
>>>
>>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
>>> ---
>>>
>>> Hi all,
>>>
>>> Here's the small patch in response of suggestion to defer tlb operations
>>> when smmu is in suspend state.
>>> The patch stores the TLB requests in 'unmap' when the smmu device is
>>> suspended. On resume, it checks all the pending TLB requests, and
>>> performs the unmap over those.
>>>
>>> Right now, I have applied the patch on top of the pm runtime series.
>>> Let me know what you think of the change. It will also be helpful if
>>> somebody can please test a valid use case with this.
>> The patch itself doesn't make much sense to me, but more crucially it's
>> definitely broken in concept. We can't return from arm_smmu_unmap()
>> without having actually unmapped anything, because that leaves the page
>> tables out of sync with what the caller expects - they may immediately
>> reuse that IOVA to map something else for a different device and hit an
>> unexpected failure from io-pgtable when the PTE turns out to be
>> non-empty.
> 
> To understand this a bit better:
> if we don't actually *unmap* in arm_smmu_unmap() and leave the TLBs as is,
> the next mapping can happen only with the *knowledge* of the smmu, i.e.,
> the smmu should be active at that time.
> If that's true, then the _runtime_resume() method will take care of
> invalidating the TLBs when we call arm_smmu_unmap() from _runtime_resume().
> Is my understanding correct here?

What I mean is that it's OK for arm_smmu_unmap() to defer the physical
TLB maintenance for an unmap request if the SMMU is suspended, but it
*must* still update the pagetable so that the given address is logically
unmapped before returning. In other words, the place to make decisions
based on the SMMU PM state would be in the .tlb_add_flush and .tlb_sync
callbacks, rather than at the top level.
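
The split described above can be sketched as a userspace model. Everything
here is hypothetical (model_smmu and the model_* helpers are not driver
code, and model_tlb_inv_page() merely plays the role of the .tlb_add_flush
and .tlb_sync callbacks): the page table is always updated, and only the
hardware TLB maintenance depends on the PM state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PAGES 16

/* Hypothetical stand-in for the SMMU; not the driver's real structures. */
struct model_smmu {
	bool suspended;			/* runtime PM state */
	bool tlb_stale;			/* an invalidation was skipped */
	bool pte[MODEL_NR_PAGES];	/* page table: true means mapped */
	bool tlb[MODEL_NR_PAGES];	/* cached translations */
};

/* A translation by an active SMMU caches a live PTE in the TLB. */
bool model_translate(struct model_smmu *smmu, size_t page)
{
	assert(page < MODEL_NR_PAGES && !smmu->suspended);
	if (smmu->pte[page])
		smmu->tlb[page] = true;
	return smmu->tlb[page];
}

/* The PM-state decision lives here, not in the top-level unmap. */
void model_tlb_inv_page(struct model_smmu *smmu, size_t page)
{
	if (smmu->suspended) {
		smmu->tlb_stale = true;	/* hardware untouched, work is owed */
		return;
	}
	smmu->tlb[page] = false;
}

int model_map(struct model_smmu *smmu, size_t page)
{
	if (page >= MODEL_NR_PAGES || smmu->pte[page])
		return -1;		/* out of range or PTE not empty */
	smmu->pte[page] = true;
	return 0;
}

size_t model_unmap(struct model_smmu *smmu, size_t page)
{
	if (page >= MODEL_NR_PAGES || !smmu->pte[page])
		return 0;
	smmu->pte[page] = false;	/* logical unmap always happens */
	model_tlb_inv_page(smmu, page);
	return 1;
}

/* Resume invalidates everything rather than replaying single requests. */
void model_resume(struct model_smmu *smmu)
{
	smmu->suspended = false;
	for (size_t i = 0; i < MODEL_NR_PAGES; i++)
		smmu->tlb[i] = false;
	smmu->tlb_stale = false;
}
```

With this arrangement, an unmap issued while suspended leaves the PTE
empty, so the same IOVA can be mapped again straight away, and the stale
TLB entry is dropped by the invalidate-all in model_resume().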

>> However, if in general suspend *might* power-gate any part of the SMMU,
>> then I don't think we have any guarantee of what state any TLBs could be
>> in upon resume. Therefore any individual invalidations we skip while
>> suspended are probably moot, since resume would almost certainly have to
>> invalidate everything to get back to a safe state anyway.
> 
> Right, when suspend power-gates the SMMU, the TLB context is lost
> anyway, so the resume path can start afresh.
> This is what exynos does at present.

Yes, in general I don't think we can assume any SMMU state is preserved,
so the only safe option would be for .runtime_resume to do the same
thing as .resume, which does at least make things nice and simple.
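
As a rough sketch of that idea, assuming the system-sleep resume path
reprograms the hardware and invalidates all TLBs: pm_model_smmu and the
pm_model_* functions below are hypothetical names for a userspace model,
not the driver's callbacks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state that matters across suspend/resume. */
struct pm_model_smmu {
	bool clocks_on;
	bool tlbs_clean;	/* true once all TLBs have been invalidated */
	unsigned int resets;	/* times the hardware was reprogrammed */
};

/* Stand-in for reprogramming the SMMU and invalidating every TLB. */
void pm_model_reset(struct pm_model_smmu *smmu)
{
	assert(smmu->clocks_on);	/* register access needs clocks */
	smmu->tlbs_clean = true;
	smmu->resets++;
}

/* System-sleep resume: clocks on, then a full reset. */
int pm_model_resume(struct pm_model_smmu *smmu)
{
	smmu->clocks_on = true;
	pm_model_reset(smmu);
	return 0;
}

/* Runtime resume assumes nothing survived and takes the same path. */
int pm_model_runtime_resume(struct pm_model_smmu *smmu)
{
	return pm_model_resume(smmu);
}

int pm_model_runtime_suspend(struct pm_model_smmu *smmu)
{
	smmu->clocks_on = false;
	smmu->tlbs_clean = false;	/* contents unknown from here on */
	return 0;
}
```

The cost of this choice is a full reset on every runtime resume; the
benefit is that no per-request bookkeeping is needed while suspended.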

>> Conversely though, the situation that still concerns me is whether this
>> can work at all for a distributed SMMU if things *don't* lose state. Say
>> the GPU and its local TBU are in the same clock domain - if the GPU has
>> just gone idle and we've clock-gated it, but "the SMMU" (i.e. the TCU)
>> is still active servicing other devices, we will assume we can happily
>> unmap GPU buffers and issue TLBIs, but what happens with entries held in
>> the unclocked TBU's micro-TLB?
> 
> We know of platforms that have a shared TCU and multiple TBUs.
> Each TBU sits in its own power domain, not in the master's power domain.
> In such cases we may want to runtime_get() the TBUs, so that an unmap()
> call made while the master is clock-gated gets through.
> 
> Can we have a situation where the TBU and master are in the same power
> domain, and the unmap is called when the master is not runtime active?
> How will such a situation be handled?

Having thought about it a bit more, I think the
unmap-after-master-suspended case is only one facet of the problem - if
we can power down individual TBUs/micro-TLBs without suspending the rest
of the SMMU, do we also have any guarantee that such TLBs don't power
back on full of valid-looking random junk?

I'm starting to think the only way to be generally safe would be to
globally invalidate all TLBs after any *master* is resumed, and I'm not
even sure that's feasible :/

Robin.

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] iommu/arm-smmu: Defer TLB flush in case of unmap op
  2017-08-04 17:04                                                                 ` Robin Murphy
  (?)
@ 2017-08-07  7:44                                                                   ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-08-07  7:44 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu, linux-arm-msm, Rob Clark, Will Deacon, Joerg Roedel,
	robh+dt, Mark Rutland, Marek Szyprowski, linux-kernel,
	Stanimir Varbanov, Sricharan R, Stephen Boyd, linux-arm-kernel

Hi Robin,


On Fri, Aug 4, 2017 at 10:34 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> On 03/08/17 06:35, Vivek Gautam wrote:
>> Hi Robin,
>>
>>
>>
>> On 08/02/2017 05:47 PM, Robin Murphy wrote:
>>> On 02/08/17 10:53, Vivek Gautam wrote:
>>>> We don't want to touch the TLB when smmu is suspended.
>>>> Defer it until resume.
>>>>
>>>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
>>>> ---
>>>>
>>>> Hi all,
>>>>
>>>> Here's the small patch in response of suggestion to defer tlb operations
>>>> when smmu is in suspend state.
>>>> The patch stores the TLB requests in 'unmap' when the smmu device is
>>>> suspended. On resume, it checks all the pending TLB requests, and
>>>> performs the unmap over those.
>>>>
>>>> Right now, I have applied the patch on top of the pm runtime series.
>>>> Let me know what you think of the change. It will also be helpful if
>>>> somebody can please test a valid use case with this.
>>> The patch itself doesn't make much sense to me, but more crucially it's
>>> definitely broken in concept. We can't return from arm_smmu_unmap()
>>> without having actually unmapped anything, because that leaves the page
>>> tables out of sync with what the caller expects - they may immediately
>>> reuse that IOVA to map something else for a different device and hit an
>>> unexpected failure from io-pgtable when the PTE turns out to be
>>> non-empty.
>>
>> To understand this a bit better:
>> if we don't actually *unmap* in arm_smmu_unmap() and leave the TLBs as is,
>> the next mapping can happen only with the *knowledge* of the smmu, i.e.,
>> the smmu should be active at that time.
>> If that's true, then the _runtime_resume() method will take care of
>> invalidating the TLBs when we call arm_smmu_unmap() from _runtime_resume().
>> Is my understanding correct here?
>
> What I mean is that it's OK for arm_smmu_unmap() to defer the physical
> TLB maintenance for an unmap request if the SMMU is suspended, but it
> *must* still update the pagetable so that the given address is logically
> unmapped before returning. In other words, the place to make decisions
> based on the SMMU PM state would be in the .tlb_add_flush and .tlb_sync
> callbacks, rather than at the top level.

Okay, I understand it better now.
The .tlb_add_flush and .tlb_sync callbacks should be the right place.

>
>>> However, if in general suspend *might* power-gate any part of the SMMU,
>>> then I don't think we have any guarantee of what state any TLBs could be
>>> in upon resume. Therefore any individual invalidations we skip while
>>> suspended are probably moot, since resume would almost certainly have to
>>> invalidate everything to get back to a safe state anyway.
>>
>> Right, when suspend power-gates the SMMU, the TLB context is lost
>> anyway, so the resume path can start afresh.
>> This is what exynos does at present.
>
> Yes, in general I don't think we can assume any SMMU state is preserved,
> so the only safe option would be for .runtime_resume to do the same
> thing as .resume, which does at least make things nice and simple.

Let me try to find out more about the state of TLBs. As far as the
programmable registers are concerned, qcom platforms have retention
enabled for them, so they don't lose state after SMMU power down.

>
>>> Conversely though, the situation that still concerns me is whether this
>>> can work at all for a distributed SMMU if things *don't* lose state. Say
>>> the GPU and its local TBU are in the same clock domain - if the GPU has
>>> just gone idle and we've clock-gated it, but "the SMMU" (i.e. the TCU)
>>> is still active servicing other devices, we will assume we can happily
>>> unmap GPU buffers and issue TLBIs, but what happens with entries held in
>>> the unclocked TBU's micro-TLB?
>>
>> We know of platforms that have a shared TCU and multiple TBUs.
>> Each TBU sits in its own power domain, not in the master's power domain.
>> In such cases we may want to runtime_get() the TBUs, so that an unmap()
>> call made while the master is clock-gated gets through.
>>
>> Can we have a situation where the TBU and master are in the same power
>> domain, and the unmap is called when the master is not runtime active?
>> How will such a situation be handled?
>
> Having thought about it a bit more, I think the
> unmap-after-master-suspended case is only one facet of the problem - if
> we can power down individual TBUs/micro-TLBs without suspending the rest
> of the SMMU, do we also have any guarantee that such TLBs don't power
> back on full of valid-looking random junk?
>
> I'm starting to think the only way to be generally safe would be to
> globally invalidate all TLBs after any *master* is resumed, and I'm not
> even sure that's feasible :/
>
> Robin.


regards
Vivek

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-07-13 11:50                   ` Rob Clark
  (?)
  (?)
@ 2017-08-07  8:27                       ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-08-07  8:27 UTC (permalink / raw)
  To: Rob Clark, Robin Murphy
  Cc: Sricharan R, Stephen Boyd, Joerg Roedel, Rob Herring,
	Mark Rutland, Will Deacon, Marek Szyprowski,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Linux Kernel Mailing List,
	linux-clk, linux-arm-msm, Stanimir Varbanov, Archit Taneja,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark <robdclark-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org> wrote:
>> Hi Vivek,
>>
>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>> Hi Stephen,
>>>
>>>
>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>> On 07/06, Vivek Gautam wrote:
>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>                    size_t size)
>>>>>   {
>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>> +    size_t ret;
>>>>>         if (!ops)
>>>>>           return 0;
>>>>>   -    return ops->unmap(ops, iova, size);
>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>> to recall that being a problem before.
>>>
>>> That's something which was dropped in the following patch merged in master:
>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>
>>> Looks like we don't  need locks here anymore?
>>
>>  Apart from the locking, wonder why a explicit pm_runtime is needed
>>  from unmap. Somehow looks like some path in the master using that
>>  should have enabled the pm ?
>>
>
> Yes, there are a bunch of scenarios where unmap can happen with
> disabled master (but not in atomic context).

I would like to understand whether there is a situation where an unmap is
called in atomic context without an enabled master?

Let's say all the unmap calls in atomic context happen only from the
master's context (in which case the device link should take care of the
pm state of the smmu), and the only unmap that happens in non-atomic
context is the one with the master disabled. In such a case, doesn't it
make sense to distinguish the atomic and non-atomic contexts and add
pm_runtime_get_sync()/put_sync() only for the non-atomic context, since
that would be the one with the master disabled?


Thanks
Vivek

> On the gpu side we
> opportunistically keep a buffer mapping until the buffer is freed
> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
> an exported dmabuf while some other driver holds a reference to it
> (which can be dropped when the v4l2 device is suspended).
>
> Since unmap triggers tbl flush which touches iommu regs, the iommu
> driver *definitely* needs a pm_runtime_get_sync().
>
> BR,
> -R
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-08-07  8:27                       ` Vivek Gautam
  (?)
@ 2017-08-07 12:29                         ` Rob Clark
  -1 siblings, 0 replies; 168+ messages in thread
From: Rob Clark @ 2017-08-07 12:29 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: Robin Murphy, Sricharan R, Stephen Boyd, Joerg Roedel,
	Rob Herring, Mark Rutland, Will Deacon, Marek Szyprowski, iommu,
	devicetree, Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel

On Mon, Aug 7, 2017 at 4:27 AM, Vivek Gautam
<vivek.gautam@codeaurora.org> wrote:
> On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark <robdclark@gmail.com> wrote:
>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>>> Hi Vivek,
>>>
>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>> Hi Stephen,
>>>>
>>>>
>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>> On 07/06, Vivek Gautam wrote:
>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>>                    size_t size)
>>>>>>   {
>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>> +    size_t ret;
>>>>>>         if (!ops)
>>>>>>           return 0;
>>>>>>   -    return ops->unmap(ops, iova, size);
>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>> to recall that being a problem before.
>>>>
>>>> That's something which was dropped in the following patch merged in master:
>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>
>>>> Looks like we don't  need locks here anymore?
>>>
>>>  Apart from the locking, wonder why a explicit pm_runtime is needed
>>>  from unmap. Somehow looks like some path in the master using that
>>>  should have enabled the pm ?
>>>
>>
>> Yes, there are a bunch of scenarios where unmap can happen with
>> disabled master (but not in atomic context).
>
> I would like to understand whether there is a situation where an unmap is
> called in atomic context without an enabled master?
>
> Let's say all the unmap calls in atomic context happen only from the
> master's context (in which case the device link should take care of the
> pm state of the smmu), and the only unmap that happens in non-atomic
> context is the one with the master disabled. In such a case, doesn't it
> make sense to distinguish the atomic and non-atomic contexts and add
> pm_runtime_get_sync()/put_sync() only for the non-atomic context, since
> that would be the one with the master disabled?
>

At least drm/msm needs to hold obj->lock (a mutex) in unmap, so it
won't unmap anything in atomic ctx (but it can unmap w/ master
disabled).  I can't really comment about other non-gpu drivers.  It
seems like a reasonable constraint that either master is enabled or
not in atomic ctx.

Currently we actually wrap unmap w/ pm_runtime_get/put_sync(), but I'd
like to drop that to avoid powering up the gpu.
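
For illustration only, the reference-counting behaviour being discussed can
be modelled in user space. This is a minimal sketch, not the arm-smmu driver
code: struct model_dev and the model_*() helpers are hypothetical stand-ins
for a runtime-PM managed device and for pm_runtime_get_sync() /
pm_runtime_put_sync(), and the model assumes the device is powered down as
soon as the last reference is dropped (no autosuspend delay).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a runtime-PM managed SMMU. */
struct model_dev {
	int usage_count;   /* outstanding get references */
	bool powered;      /* true while clocks/power are on */
	int resume_calls;  /* number of times the device was powered up */
};

/* Models pm_runtime_get_sync(): the first reference powers the device up. */
static void model_pm_get_sync(struct model_dev *dev)
{
	if (dev->usage_count++ == 0) {
		dev->powered = true;
		dev->resume_calls++;
	}
}

/* Models pm_runtime_put_sync(): dropping the last reference powers it down. */
static void model_pm_put_sync(struct model_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count == 0)
		dev->powered = false;
}

/*
 * Models an unmap whose TLB flush touches SMMU registers. The get/put pair
 * makes the register access safe even if no master holds the SMMU active.
 * Returns the number of bytes unmapped, or 0 if the registers would have
 * been touched while unpowered.
 */
static size_t model_unmap(struct model_dev *smmu, size_t size)
{
	size_t ret;

	model_pm_get_sync(smmu);
	ret = smmu->powered ? size : 0;
	model_pm_put_sync(smmu);

	return ret;
}
```

With no master active, each model_unmap() call powers the model SMMU up and
down again. With a master already holding a reference, the call only bumps
the usage count and does not trigger another power-up.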

BR,
-R

^ permalink raw reply	[flat|nested] 168+ messages in thread

* [PATCH v2 1/1] iommu/arm-smmu: Defer TLB flush in case of unmap op
  2017-08-07  7:44                                                                   ` Vivek Gautam
  (?)
  (?)
@ 2017-09-06  5:37                                                                   ` Vivek Gautam
       [not found]                                                                     ` <1504676255-15980-1-git-send-email-vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2017-10-13 19:08                                                                     ` Will Deacon
  -1 siblings, 2 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-09-06  5:37 UTC (permalink / raw)
  To: robin.murphy, robdclark
  Cc: will.deacon, stanimir.varbanov, sboyd, sricharan, iommu,
	linux-arm-msm, Vivek Gautam

We don't want to touch the TLB when smmu is suspended, so
defer the TLB maintenance until smmu is resumed.
On resume, we issue arm_smmu_device_reset() to restore the
configuration and flush the TLBs.

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---

Hi Robin,

This patch comes after the discussion[1] we had about deferring the
physical TLB maintenance for an unmap request if the SMMU is
suspended. Sorry for the delay in sending the updated version
of the patch.

As discussed, this patch now checks the PM state of the smmu in the
.tlb_add_flush and .tlb_sync page-table ops and returns early if the
smmu is suspended. On resume, without assuming that the TLB state is
preserved, we issue an arm_smmu_device_reset(), which is the safest
thing to do.
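
The deferral scheme described above can be sketched as a small user-space
model. This is an illustration under stated assumptions, not the driver
code: struct model_smmu and the model_*() functions are hypothetical, a
"stale entry" stands for a TLB entry whose mapping has already been removed
from the page table, and the resume path is assumed to invalidate
everything, as arm_smmu_device_reset() is expected to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an SMMU whose TLB maintenance can be deferred. */
struct model_smmu {
	bool suspended;     /* runtime-suspended, registers not accessible */
	int stale_entries;  /* TLB entries for mappings that no longer exist */
	int reg_writes;     /* register accesses actually performed */
};

/* Models a TLB invalidate op: skip register access while suspended. */
static void model_tlb_inv_range(struct model_smmu *smmu)
{
	if (smmu->suspended)
		return;                 /* deferred until resume */

	smmu->reg_writes++;
	if (smmu->stale_entries > 0)
		smmu->stale_entries--;
}

/* Models unmap: the page table always changes, the TLB op may be skipped. */
static void model_unmap(struct model_smmu *smmu)
{
	smmu->stale_entries++;
	model_tlb_inv_range(smmu);
}

static void model_suspend(struct model_smmu *smmu)
{
	smmu->suspended = true;
}

/* Models resume: one full reset that flushes every TLB entry. */
static void model_resume(struct model_smmu *smmu)
{
	assert(smmu->suspended);
	smmu->suspended = false;
	smmu->reg_writes++;
	smmu->stale_entries = 0;
}
```

The point of the model is that stale entries accumulate only while the
device is suspended, and no translation can use them before the reset in
the resume path has cleared them.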

Alternatively, without going into the TLB defer thing, we can simply
avoid calling pm_runtime_get/put() in case of atomic context, as we
discussed in the other thread[3]. This will look something like this:

      static size_t arm_smmu_unmap(struct iommu_domain *domain, ....)
      {
      <snip>

     -       return ops->unmap(ops, iova, size);
     +       if (!in_atomic())
     +              pm_runtime_get_sync(smmu_domain->smmu->dev);
     +       ret = ops->unmap(ops, iova, size);
     +       if (!in_atomic())
     +              pm_runtime_put_sync(smmu_domain->smmu->dev);
     +
     +       return ret;
      }

Let me know which approach should work.

One other concern that we were discussing was of distributed SMMU
configuration.
"Say the GPU and its local TBU are in the same clock domain - if the GPU has
just gone idle and we've clock-gated it, but "the SMMU" (i.e. the TCU)
is still active servicing other devices, we will assume we can happily
unmap GPU buffers and issue TLBIs, but what happens with entries held in
the unclocked TBU's micro-TLB?"

In such a scenario, when the master is clock gated and the TCU is still running:
 -> If TCU and TBU are in the same clock/power domain, then we can still
    issue TLBIs as long as the smmu is clocked.
 -> If TCU and TBU are in separate clock/power domains, then we had better
    check the power state of the TBUs and defer TLB maintenance if the TBUs
    are clock gated.
    In such a scenario, would it make sense to represent a distributed smmu
    as a TCU device with multiple TBU child devices?
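
One possible shape for the TCU/TBU idea, sketched as a user-space model.
Everything here is hypothetical (struct model_tcu, struct model_tbu, the
fixed MODEL_MAX_TBUS limit, and the per-TBU flush_pending flag); it only
illustrates tracking the power state per TBU and deferring maintenance for
the gated ones, and says nothing about how real hardware or a device tree
binding would express this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_TBUS 4

/* Hypothetical per-TBU state: its own clock and a deferred-flush marker. */
struct model_tbu {
	bool clocked;
	bool flush_pending;  /* invalidation missed while clock gated */
	int invalidations;   /* invalidations this TBU has processed */
};

/* Hypothetical TCU with TBU children. */
struct model_tcu {
	bool clocked;
	size_t num_tbus;
	struct model_tbu tbus[MODEL_MAX_TBUS];
};

/*
 * Broadcast an invalidation. Clocked TBUs process it immediately; gated
 * ones record that a flush is owed. Returns the number of TBUs deferred.
 */
static size_t model_tlbi_broadcast(struct model_tcu *tcu)
{
	size_t i, deferred = 0;

	assert(tcu->num_tbus <= MODEL_MAX_TBUS);
	for (i = 0; i < tcu->num_tbus; i++) {
		struct model_tbu *tbu = &tcu->tbus[i];

		if (tcu->clocked && tbu->clocked) {
			tbu->invalidations++;
		} else {
			tbu->flush_pending = true;
			deferred++;
		}
	}

	return deferred;
}

/* Re-enable a TBU clock and perform any invalidation it missed. */
static void model_tbu_resume(struct model_tbu *tbu)
{
	tbu->clocked = true;
	if (tbu->flush_pending) {
		tbu->invalidations++;
		tbu->flush_pending = false;
	}
}
```

In this model a gated TBU never serves translations from entries that were
invalidated while it was off, because the owed flush runs before the TBU is
considered resumed.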

This patch is based on the pm runtime series for arm-smmu[2], the next
version of which I will post after we conclude this discussion.

[1] https://patchwork.kernel.org/patch/9876489/
[2] https://lkml.org/lkml/2017/7/6/230
[3] https://lkml.org/lkml/2017/8/7/386

 drivers/iommu/arm-smmu.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 8384d5fad388..c6d904733166 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -439,6 +439,10 @@ static void arm_smmu_tlb_sync_context(void *cookie)
 	void __iomem *base = ARM_SMMU_CB(smmu, smmu_domain->cfg.cbndx);
 	unsigned long flags;
 
+	/* smmu suspended? we can't perform TLB operations */
+	if (pm_runtime_suspended(smmu->dev))
+		return;
+
 	spin_lock_irqsave(&smmu_domain->cb_lock, flags);
 	__arm_smmu_tlb_sync(smmu, base + ARM_SMMU_CB_TLBSYNC,
 			    base + ARM_SMMU_CB_TLBSTATUS);
@@ -449,6 +453,9 @@ static void arm_smmu_tlb_sync_vmid(void *cookie)
 {
 	struct arm_smmu_domain *smmu_domain = cookie;
 
+	if (pm_runtime_suspended(smmu_domain->smmu->dev))
+		return;
+
 	arm_smmu_tlb_sync_global(smmu_domain->smmu);
 }
 
@@ -458,6 +465,9 @@ static void arm_smmu_tlb_inv_context_s1(void *cookie)
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
 	void __iomem *base = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
 
+	if (pm_runtime_suspended(smmu_domain->smmu->dev))
+		return;
+
 	writel_relaxed(cfg->asid, base + ARM_SMMU_CB_S1_TLBIASID);
 	arm_smmu_tlb_sync_context(cookie);
 }
@@ -468,6 +478,9 @@ static void arm_smmu_tlb_inv_context_s2(void *cookie)
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	void __iomem *base = ARM_SMMU_GR0(smmu);
 
+	if (pm_runtime_suspended(smmu_domain->smmu->dev))
+		return;
+
 	writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID);
 	arm_smmu_tlb_sync_global(smmu);
 }
@@ -480,6 +493,9 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 	bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
 	void __iomem *reg = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
 
+	if (pm_runtime_suspended(smmu_domain->smmu->dev))
+		return;
+
 	if (stage1) {
 		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
 
@@ -521,6 +537,9 @@ static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size,
 	struct arm_smmu_domain *smmu_domain = cookie;
 	void __iomem *base = ARM_SMMU_GR0(smmu_domain->smmu);
 
+	if (pm_runtime_suspended(smmu_domain->smmu->dev))
+		return;
+
 	writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID);
 }
 
@@ -2299,8 +2318,13 @@ static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
 static int __maybe_unused arm_smmu_resume(struct device *dev)
 {
 	struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+	int ret;
+
+	ret = clk_bulk_prepare_enable(smmu->num_clks, smmu->clocks);
+	if (ret)
+		return ret;
 
-	return clk_bulk_prepare_enable(smmu->num_clks, smmu->clocks);
+	return arm_smmu_device_reset(smmu);
 }
 
 static int __maybe_unused arm_smmu_suspend(struct device *dev)
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply related	[flat|nested] 168+ messages in thread

* Re: [PATCH v2 1/1] iommu/arm-smmu: Defer TLB flush in case of unmap op
       [not found]                                                                     ` <1504676255-15980-1-git-send-email-vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2017-09-13 11:04                                                                       ` Vivek Gautam
  0 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-09-13 11:04 UTC (permalink / raw)
  To: robin.murphy, robdclark
  Cc: will.deacon, sboyd, stanimir.varbanov, iommu, linux-arm-msm

Hi,


On 09/06/2017 11:07 AM, Vivek Gautam wrote:
> We don't want to touch the TLB when smmu is suspended, so
> defer the TLB maintenance until smmu is resumed.
> On resume, we issue arm_smmu_device_reset() to restore the
> configuration and flush the TLBs.
>
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> ---

Gentle ping. Any thoughts on this patch?

>
> Hi Robin,
>
> This patch comes after the discussion[1] we had about deferring the
> physical TLB maintenance for an unmap request if the SMMU is
> suspended. Sorry for the delay in sending the updated version
> of the patch.
>
> As discussed, this patch now checks the PM state of the smmu in the
> .tlb_add_flush and .tlb_sync page-table ops and returns early if the
> smmu is suspended. On resume, without assuming that the TLB state is
> preserved, we issue an arm_smmu_device_reset(), which is the safest
> thing to do.
>
> Alternatively, without going into the TLB defer thing, we can simply
> avoid calling pm_runtime_get/put() in case of atomic context, as we
> discussed in the other thread[3]. This will look something like this:
>
>        static size_t arm_smmu_unmap(struct iommu_domain *domain, ....)
>        {
>        <snip>
>
>       -       return ops->unmap(ops, iova, size);
>       +       if (!in_atomic())
>       +              pm_runtime_get_sync(smmu_domain->smmu->dev);
>       +       ret = ops->unmap(ops, iova, size);
>       +       if (!in_atomic())
>       +              pm_runtime_put_sync(smmu_domain->smmu->dev);
>       +
>       +       return ret;
>        }
>
> Let me know which approach should work.
>
> One other concern that we were discussing was of distributed SMMU
> configuration.
> "Say the GPU and its local TBU are in the same clock domain - if the GPU has
> just gone idle and we've clock-gated it, but "the SMMU" (i.e. the TCU)
> is still active servicing other devices, we will assume we can happily
> unmap GPU buffers and issue TLBIs, but what happens with entries held in
> the unclocked TBU's micro-TLB?"
>
> In such a scenario, when the master is clock gated and the TCU is still running:
>   -> If TCU and TBU are in the same clock/power domain, then we can still
>      issue TLBIs as long as the smmu is clocked.
>   -> If TCU and TBU are in separate clock/power domains, then we had better
>      check the power state of the TBUs and defer TLB maintenance if the TBUs
>      are clock gated.
>      In such a scenario, would it make sense to represent a distributed smmu
>      as a TCU device with multiple TBU child devices?
>
> This patch is based on the pm runtime series for arm-smmu[2], the next
> version of which I will post after we conclude this discussion.
>
> [1] https://patchwork.kernel.org/patch/9876489/
> [2] https://lkml.org/lkml/2017/7/6/230
> [3] https://lkml.org/lkml/2017/8/7/386
>
>   drivers/iommu/arm-smmu.c | 26 +++++++++++++++++++++++++-
>   1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 8384d5fad388..c6d904733166 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -439,6 +439,10 @@ static void arm_smmu_tlb_sync_context(void *cookie)
>   	void __iomem *base = ARM_SMMU_CB(smmu, smmu_domain->cfg.cbndx);
>   	unsigned long flags;
>   
> +	/* smmu suspended? we can't perform TLB operations */
> +	if (pm_runtime_suspended(smmu->dev))
> +		return;
> +
>   	spin_lock_irqsave(&smmu_domain->cb_lock, flags);
>   	__arm_smmu_tlb_sync(smmu, base + ARM_SMMU_CB_TLBSYNC,
>   			    base + ARM_SMMU_CB_TLBSTATUS);
> @@ -449,6 +453,9 @@ static void arm_smmu_tlb_sync_vmid(void *cookie)
>   {
>   	struct arm_smmu_domain *smmu_domain = cookie;
>   
> +	if (pm_runtime_suspended(smmu_domain->smmu->dev))
> +		return;
> +
>   	arm_smmu_tlb_sync_global(smmu_domain->smmu);
>   }
>   
> @@ -458,6 +465,9 @@ static void arm_smmu_tlb_inv_context_s1(void *cookie)
>   	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>   	void __iomem *base = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
>   
> +	if (pm_runtime_suspended(smmu_domain->smmu->dev))
> +		return;
> +
>   	writel_relaxed(cfg->asid, base + ARM_SMMU_CB_S1_TLBIASID);
>   	arm_smmu_tlb_sync_context(cookie);
>   }
> @@ -468,6 +478,9 @@ static void arm_smmu_tlb_inv_context_s2(void *cookie)
>   	struct arm_smmu_device *smmu = smmu_domain->smmu;
>   	void __iomem *base = ARM_SMMU_GR0(smmu);
>   
> +	if (pm_runtime_suspended(smmu_domain->smmu->dev))
> +		return;
> +
>   	writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID);
>   	arm_smmu_tlb_sync_global(smmu);
>   }
> @@ -480,6 +493,9 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
>   	bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
>   	void __iomem *reg = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
>   
> +	if (pm_runtime_suspended(smmu_domain->smmu->dev))
> +		return;
> +
>   	if (stage1) {
>   		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
>   
> @@ -521,6 +537,9 @@ static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size,
>   	struct arm_smmu_domain *smmu_domain = cookie;
>   	void __iomem *base = ARM_SMMU_GR0(smmu_domain->smmu);
>   
> +	if (pm_runtime_suspended(smmu_domain->smmu->dev))
> +		return;
> +
>   	writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID);
>   }
>   
> @@ -2299,8 +2318,13 @@ static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
>   static int __maybe_unused arm_smmu_resume(struct device *dev)
>   {
>   	struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> +	int ret;
> +
> +	ret = clk_bulk_prepare_enable(smmu->num_clks, smmu->clocks);
> +	if (ret)
> +		return ret;
>   
> -	return clk_bulk_prepare_enable(smmu->num_clks, smmu->clocks);
> +	return arm_smmu_device_reset(smmu);
>   }
>   
>   static int __maybe_unused arm_smmu_suspend(struct device *dev)

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH v2 1/1] iommu/arm-smmu: Defer TLB flush in case of unmap op
  2017-09-06  5:37                                                                   ` [PATCH v2 1/1] " Vivek Gautam
       [not found]                                                                     ` <1504676255-15980-1-git-send-email-vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2017-10-13 19:08                                                                     ` Will Deacon
  2017-11-20 17:17                                                                       ` Vivek Gautam
  1 sibling, 1 reply; 168+ messages in thread
From: Will Deacon @ 2017-10-13 19:08 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: robin.murphy, robdclark, stanimir.varbanov, sboyd, sricharan,
	iommu, linux-arm-msm

On Wed, Sep 06, 2017 at 11:07:35AM +0530, Vivek Gautam wrote:
> We don't want to touch the TLB when smmu is suspended, so
> defer the TLB maintenance until smmu is resumed.
> On resume, we issue arm_smmu_device_reset() to restore the
> configuration and flush the TLBs.
> 
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> ---
> 
> Hi Robin,
> 
> This patch comes after the discussion[1] we had about deferring the
> physical TLB maintenance for an unmap request if the SMMU is
> suspended. Sorry for the delay in sending the updated version
> of the patch.
> 
> As discussed, this patch now checks the PM state of the smmu in the
> .tlb_add_flush and .tlb_sync page-table ops and returns early if the
> smmu is suspended. On resume, without assuming that the TLB state is
> preserved, we issue an arm_smmu_device_reset(), which is the safest
> thing to do.
> 
> Alternatively, without going into the TLB defer thing, we can simply
> avoid calling pm_runtime_get/put() in case of atomic context, as we
> discussed in the other thread[3]. This will look something like this:
> 
>       static size_t arm_smmu_unmap(struct iommu_domain *domain, ....)
>       {
>       <snip>
> 
>      -       return ops->unmap(ops, iova, size);
>      +       if (!in_atomic())
>      +              pm_runtime_get_sync(smmu_domain->smmu->dev);
>      +       ret = ops->unmap(ops, iova, size);
>      +       if (!in_atomic())
>      +              pm_runtime_put_sync(smmu_domain->smmu->dev);
>      +
>      +       return ret;
>       }
> 
> Let me know which approach should work.
> 
> One other concern that we discussed was the distributed SMMU
> configuration:
> "Say the GPU and its local TBU are in the same clock domain - if the GPU has
> just gone idle and we've clock-gated it, but "the SMMU" (i.e. the TCU)
> is still active servicing other devices, we will assume we can happily
> unmap GPU buffers and issue TLBIs, but what happens with entries held in
> the unclocked TBU's micro-TLB?"
> 
> In such a scenario, when the master is clock gated and the TCU is still running:
>  -> If the TCU and TBU are in the same clock/power domain, then we can still
>     issue TLBIs as long as the smmu is clocked.
>  -> If the TCU and TBU are in separate clock/power domains, then we had
>     better check the power state of the TBUs and defer TLB maintenance if
>     the TBUs are clock gated.
>     In such a scenario, would it make sense to represent a distributed smmu
>     as a TCU device with multiple TBU child devices?

This is one of the cases that *really* worries me, particularly if we can
end up freeing parts of the page table before the TLB maintenance has
been completed. Speculative table walks from the TCU could lead to all
sorts of horrible system behaviour, such as deadlock and/or data
corruption, so I'm really not happy with this approach.
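
To make the hazard concrete, here is a minimal userspace sketch. It is
not kernel code: struct toy_smmu and every toy_* helper are hypothetical
names, and it assumes a cached translation survives while the
invalidation is skipped. It only shows that an early return on
"suspended" leaves a window where the TLB holds an entry the page table
no longer backs, and that the window closes only at resume/reset.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: one page-table entry and one cached TLB entry. */
struct toy_smmu {
	bool suspended;   /* runtime-PM state of the (hypothetical) SMMU */
	bool pte_valid;   /* mapping still present in the in-memory page table */
	bool tlb_valid;   /* translation still cached in the TLB */
};

/* Mirrors the early return added by the patch: no TLB op while suspended. */
static void toy_tlb_invalidate(struct toy_smmu *s)
{
	if (s->suspended)
		return;
	s->tlb_valid = false;
}

/* Unmap clears the PTE first, then requests TLB maintenance. */
static void toy_unmap(struct toy_smmu *s)
{
	s->pte_valid = false;
	toy_tlb_invalidate(s);
}

/* Resume followed by a reset that drops every cached translation. */
static void toy_resume(struct toy_smmu *s)
{
	assert(s->suspended);   /* only meaningful from the suspended state */
	s->suspended = false;
	s->tlb_valid = false;
}

/* True while the TLB caches a translation the page table no longer backs. */
static bool toy_stale_window(const struct toy_smmu *s)
{
	return s->tlb_valid && !s->pte_valid;
}
```

Calling toy_unmap() on a suspended toy_smmu leaves toy_stale_window()
true until toy_resume() runs; on an active one the window never opens.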

Will

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-08-07 12:29                         ` Rob Clark
  (?)
  (?)
@ 2017-11-14 18:30                             ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-11-14 18:30 UTC (permalink / raw)
  To: Rob Clark, Robin Murphy, Will Deacon, Rafael J. Wysocki
  Cc: Mark Rutland, devicetree, linux-arm-msm,
	Stephen Boyd, Linux Kernel Mailing List, Stanimir Varbanov,
	iommu, Rob Herring,
	linux-clk, linux-arm-kernel

Hi,


On Mon, Aug 7, 2017 at 5:59 PM, Rob Clark <robdclark@gmail.com> wrote:
> On Mon, Aug 7, 2017 at 4:27 AM, Vivek Gautam
> <vivek.gautam@codeaurora.org> wrote:
>> On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark <robdclark@gmail.com> wrote:
>>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>>>> Hi Vivek,
>>>>
>>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>>> Hi Stephen,
>>>>>
>>>>>
>>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>>> On 07/06, Vivek Gautam wrote:
>>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>>>                    size_t size)
>>>>>>>   {
>>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>>> +    size_t ret;
>>>>>>>         if (!ops)
>>>>>>>           return 0;
>>>>>>>   -    return ops->unmap(ops, iova, size);
>>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>>> to recall that being a problem before.
>>>>>
>>>>> That's something which was dropped in the following patch merged in master:
>>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>>
>>>>> Looks like we don't need locks here anymore?
>>>>
>>>>  Apart from the locking, I wonder why an explicit pm_runtime call is
>>>>  needed from unmap. It looks like some path in the master using that
>>>>  should already have enabled the pm?
>>>>
>>>
>>> Yes, there are a bunch of scenarios where unmap can happen with
>>> disabled master (but not in atomic context).
>>
>> I would like to understand whether there is a situation where an unmap is
>> called in atomic context without an enabled master?
>>
>> Let's say we have the case where all the unmap calls in atomic context happen
>> only from the master's context (in which case the device link should take
>> care of the pm state of smmu), and the only unmap that happens in non-atomic
>> context is the one with master disabled. In such a case, doesn't it make
>> sense to distinguish the atomic/non-atomic context and add
>> pm_runtime_get_sync()/put_sync() only for the non-atomic context, since that
>> would be the one with master disabled?
>>
>
> At least drm/msm needs to hold obj->lock (a mutex) in unmap, so it
> won't unmap anything in atomic ctx (but it can unmap w/ master
> disabled).  I can't really comment about other non-gpu drivers.  It
> seems like a reasonable constraint that either master is enabled or
> not in atomic ctx.
>
> Currently we actually wrap unmap w/ pm_runtime_get/put_sync(), but I'd
> like to drop that to avoid powering up the gpu.

Since deferring the TLB maintenance doesn't look like the best approach [1],
how about we try to power up only the smmu from the different client
devices, such as the GPU, in the unmap path? Then we won't need to add
pm_runtime_get/put() calls in arm_smmu_unmap().

The client device can use something like pm_runtime_get_suppliers(), since
we already have the device link in place with this patch series. This should
power on the supplier (which is the smmu) without turning on the consumer
(such as the GPU).

pm_runtime_get_suppliers(), however, is not exported at this moment.
Would it be useful to export this API and use it in the drivers?

Adding Rafael J. Wysocki for suggestions on the pm_runtime_get_suppliers() API.
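
The intended semantics can be sketched with a small userspace model. It
is only an illustration under simplifying assumptions: struct toy_dev
and the toy_* helpers are hypothetical, each consumer has a single
supplier, and the usage count stands in for the real runtime-PM state.
It does not reproduce how the kernel walks device links.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a runtime-PM managed device; not the kernel's struct device. */
struct toy_dev {
	int usage;                 /* runtime-PM usage count */
	struct toy_dev *supplier;  /* single device-link supplier, or NULL */
};

static bool toy_powered(const struct toy_dev *d)
{
	return d->usage > 0;
}

/* Models a get on the consumer itself: the link powers the supplier first. */
static void toy_get(struct toy_dev *d)
{
	if (d->supplier)
		toy_get(d->supplier);
	d->usage++;
}

/* Drops the consumer's reference, then the one it held on its supplier. */
static void toy_put(struct toy_dev *d)
{
	assert(d->usage > 0);
	d->usage--;
	if (d->supplier)
		toy_put(d->supplier);
}

/* Models the proposal: take a reference on the supplier only. */
static void toy_get_suppliers(struct toy_dev *consumer)
{
	if (consumer->supplier)
		toy_get(consumer->supplier);
}

static void toy_put_suppliers(struct toy_dev *consumer)
{
	if (consumer->supplier)
		toy_put(consumer->supplier);
}
```

With a gpu whose supplier is smmu, toy_get_suppliers(&gpu) leaves the
smmu powered and the gpu off, which is the behaviour wanted in the unmap
path; toy_get(&gpu) powers both.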


[1] https://patchwork.kernel.org/patch/9876489/


Best regards
Vivek

>
> BR,
> -R



-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH v2 1/1] iommu/arm-smmu: Defer TLB flush in case of unmap op
  2017-10-13 19:08                                                                     ` Will Deacon
@ 2017-11-20 17:17                                                                       ` Vivek Gautam
  0 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-11-20 17:17 UTC (permalink / raw)
  To: Will Deacon
  Cc: robin.murphy, robdclark, stanimir.varbanov, sboyd, sricharan,
	iommu, linux-arm-msm

Hi Will,


On 10/14/2017 12:38 AM, Will Deacon wrote:
> On Wed, Sep 06, 2017 at 11:07:35AM +0530, Vivek Gautam wrote:
>> We don't want to touch the TLB when smmu is suspended, so
>> defer the TLB maintenance until smmu is resumed.
>> On resume, we issue arm_smmu_device_reset() to restore the
>> configuration and flush the TLBs.
>>
>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
>> ---
>>
>> Hi Robin,
>>
>> This patch comes after the discussion[1] we had about deferring the
>> physical TLB maintenance for an unmap request if the SMMU is
>> suspended. Sorry for the delay in sending the updated version
>> of the patch.
>>
>> As discussed, this patch now checks the PM state of the smmu in the
>> .tlb_add_flush and .tlb_sync page-table ops and returns early if the
>> smmu is suspended. On resume, without assuming that the TLB state is
>> preserved, we issue an arm_smmu_device_reset(), which is the safest
>> thing to do.
>>
>> Alternatively, without going into the TLB defer thing, we can simply
>> avoid calling pm_runtime_get/put() in case of atomic context, as we
>> discussed in the other thread[3]. This will look something like this:
>>
>>        static size_t arm_smmu_unmap(struct iommu_domain *domain, ....)
>>        {
>>        <snip>
>>
>>       -       return ops->unmap(ops, iova, size);
>>       +       if (!in_atomic())
>>       +              pm_runtime_get_sync(smmu_domain->smmu->dev);
>>       +       ret = ops->unmap(ops, iova, size);
>>       +       if (!in_atomic())
>>       +              pm_runtime_put_sync(smmu_domain->smmu->dev);
>>       +
>>       +       return ret;
>>        }
>>
>> Let me know which approach should work.
>>
>> One other concern that we discussed was the distributed SMMU
>> configuration:
>> "Say the GPU and its local TBU are in the same clock domain - if the GPU has
>> just gone idle and we've clock-gated it, but "the SMMU" (i.e. the TCU)
>> is still active servicing other devices, we will assume we can happily
>> unmap GPU buffers and issue TLBIs, but what happens with entries held in
>> the unclocked TBU's micro-TLB?"
>>
>> In such a scenario, when the master is clock gated and the TCU is still running:
>>   -> If the TCU and TBU are in the same clock/power domain, then we can still
>>      issue TLBIs as long as the smmu is clocked.
>>   -> If the TCU and TBU are in separate clock/power domains, then we had
>>      better check the power state of the TBUs and defer TLB maintenance if
>>      the TBUs are clock gated.
>>      In such a scenario, would it make sense to represent a distributed smmu
>>      as a TCU device with multiple TBU child devices?
> This is one of the cases that *really* worries me, particularly if we can
> end up freeing parts of the page table before the TLB maintenance has
> been completed. Speculative table walks from the TCU could lead to all
> sorts of horrible system behaviour, such as deadlock and/or data
> corruption, so I'm really not happy with this approach.

Right. I am dropping this approach.

To handle the unmap path more gracefully, how about having the master
devices, such as the GPU or video, power up the smmu with the help of the
device link?
Since we already have the device link set up, the master device can
call a runtime_get() on its suppliers (which is the smmu in our case, as
we set up the device link between a master such as the GPU and the smmu).
This way we don't insert the pm_runtime_get/put() calls in
arm_smmu_unmap(), and we make sure through the masters that the smmu is
powered on for any TLB maintenance.
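
As a rough illustration of the master-side bracket, here is a userspace
sketch. It is not driver code: struct toy_state and the toy_* functions
are hypothetical, and the plain counter increments stand in for whatever
supplier get/put call the master would end up using.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state, not kernel code: an SMMU usage count plus one cached translation. */
struct toy_state {
	int smmu_usage;    /* references holding the SMMU powered */
	bool gpu_powered;  /* the master itself; never touched by unmap here */
	bool tlb_valid;    /* a translation cached in the SMMU TLB */
	int tlb_ops_done;  /* invalidations that actually reached the hardware */
};

/* Stand-in for the SMMU unmap op: it takes no PM reference of its own. */
static void toy_smmu_unmap(struct toy_state *s)
{
	if (s->smmu_usage > 0) {
		s->tlb_valid = false;
		s->tlb_ops_done++;
	}
}

/* Master-side unmap: hold the supplier (SMMU) powered around the call. */
static void toy_master_unmap(struct toy_state *s)
{
	s->smmu_usage++;                /* hypothetical supplier get */
	toy_smmu_unmap(s);
	s->smmu_usage--;                /* hypothetical supplier put */
	assert(s->smmu_usage >= 0);     /* gets and puts must stay balanced */
}
```

Calling toy_smmu_unmap() directly with the SMMU unpowered performs no
invalidation, whereas toy_master_unmap() performs exactly one and leaves
both the usage count and the master's power state as they were.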

I have mentioned this in the other thread as well [1].
Let me know your comments.

[1] https://patchwork.kernel.org/patch/9827835/

Best regards
Vivek

>
> Will

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-11-14 18:30                             ` Vivek Gautam
  (?)
  (?)
@ 2017-11-27 22:22                                 ` Stephen Boyd
  -1 siblings, 0 replies; 168+ messages in thread
From: Stephen Boyd @ 2017-11-27 22:22 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: Mark Rutland, devicetree,
	Rafael J. Wysocki, Will Deacon,
	iommu,
	Linux Kernel Mailing List, Rob Herring, Stanimir Varbanov,
	linux-arm-msm, linux-clk,
	linux-arm-kernel

On 11/15, Vivek Gautam wrote:
> Hi,
> 
> 
> On Mon, Aug 7, 2017 at 5:59 PM, Rob Clark <robdclark@gmail.com> wrote:
> > On Mon, Aug 7, 2017 at 4:27 AM, Vivek Gautam
> > <vivek.gautam@codeaurora.org> wrote:
> >> On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark <robdclark@gmail.com> wrote:
> >>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
> >>>> Hi Vivek,
> >>>>
> >>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
> >>>>> Hi Stephen,
> >>>>>
> >>>>>
> >>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
> >>>>>> On 07/06, Vivek Gautam wrote:
> >>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> >>>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
> >>>>>>>                    size_t size)
> >>>>>>>   {
> >>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> >>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> >>>>>>> +    size_t ret;
> >>>>>>>         if (!ops)
> >>>>>>>           return 0;
> >>>>>>>   -    return ops->unmap(ops, iova, size);
> >>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
> >>>>>> Can these map/unmap ops be called from an atomic context? I seem
> >>>>>> to recall that being a problem before.
> >>>>>
> >>>>> That's something which was dropped in the following patch merged in master:
> >>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
> >>>>>
> >>>>> Looks like we don't  need locks here anymore?
> >>>>
> >>>>  Apart from the locking, wonder why a explicit pm_runtime is needed
> >>>>  from unmap. Somehow looks like some path in the master using that
> >>>>  should have enabled the pm ?
> >>>>
> >>>
> >>> Yes, there are a bunch of scenarios where unmap can happen with
> >>> disabled master (but not in atomic context).
> >>
> >> I would like to understand whether there is a situation where an unmap is
> >> called in atomic context without an enabled master?
> >>
> >> Let's say we have the case where all the unmap calls in atomic context happen
> >> only from the master's context (in which case the device link should
> >> take care of
> >> the pm state of smmu), and the only unmap that happen in non-atomic context
> >> is the one with master disabled. In such a case doesn it make sense to
> >> distinguish
> >> the atomic/non-atomic context and add pm_runtime_get_sync()/put_sync() only
> >> for the non-atomic context since that would be the one with master disabled.
> >>
> >
> > At least drm/msm needs to hold obj->lock (a mutex) in unmap, so it
> > won't unmap anything in atomic ctx (but it can unmap w/ master
> > disabled).  I can't really comment about other non-gpu drivers.  It
> > seems like a reasonable constraint that either master is enabled or
> > not in atomic ctx.
> >
> > Currently we actually wrap unmap w/ pm_runtime_get/put_sync(), but I'd
> > like to drop that to avoid powering up the gpu.
> 
> Since the deferring the TLB maintenance doesn't look like the best approach [1],
> how about if we try to power-up only the smmu from different client
> devices such as,
> GPU in the unmap path. Then we won't need to add pm_runtime_get/put() calls in
> arm_smmu_unmap().
> 
> The client device can use something like - pm_runtime_get_supplier() since
> we already have the device link in place with this patch series. This should
> power-on the supplier (which is smmu) without turning on the consumer
> (such as GPU).
> 
> pm_runtime_get_supplier() however is not exported at this moment.
> Will it be useful to export this API and use it in the drivers.
> 

I'm not sure pm_runtime_get_supplier() is correct either. That
feels like we're relying on the GPU driver knowing the internal
details of how the device links are configured.

Is there some way to have the GPU driver know in its runtime PM
resume hook that it doesn't need to be powered on because it
isn't actively drawing anything or processing commands? I'm
thinking of the code calling pm_runtime_get() as proposed around
the IOMMU unmap path in the GPU driver and then having the
runtime PM resume hook in the GPU driver return some special
value to indicate that it didn't really resume because it didn't
need to and to treat the device as runtime suspended but not
return an error. Then the runtime PM core can keep track of that
and try to power the GPU on again when another pm_runtime_get()
is called on the GPU device.

This keeps the consumer API the same, always pm_runtime_get(),
but leaves the device driver logic of what to do when the GPU
doesn't need to power on to the runtime PM hook where the driver
has all the information.
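
The idea above can be sketched as a small userspace model. This is only an
illustration under stated assumptions: TOY_RESUME_SKIPPED, struct toy_device
and the toy_* helpers are hypothetical names invented here, not existing
runtime PM API, and the real core does considerably more (locking, suspend
callbacks, autosuspend).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return value: "resume skipped, treat device as suspended". */
#define TOY_RESUME_SKIPPED 1

enum toy_rpm_status { TOY_SUSPENDED, TOY_ACTIVE };

struct toy_device {
	enum toy_rpm_status status;
	int usage_count;
	bool has_work;       /* driver-private: anything to draw or process? */
	int power_on_count;  /* how often the hardware was really powered on */
};

/* Driver resume hook: power the hardware only when there is work to do. */
static int toy_gpu_runtime_resume(struct toy_device *dev)
{
	if (!dev->has_work)
		return TOY_RESUME_SKIPPED;
	dev->power_on_count++;
	return 0;
}

/* Toy core: always takes a reference; retries the hook while suspended. */
static int toy_pm_runtime_get(struct toy_device *dev)
{
	int ret;

	dev->usage_count++;
	if (dev->status == TOY_ACTIVE)
		return 0;

	ret = toy_gpu_runtime_resume(dev);
	if (ret < 0) {
		dev->usage_count--;
		return ret;
	}
	if (ret == 0)
		dev->status = TOY_ACTIVE;
	/* TOY_RESUME_SKIPPED: stay TOY_SUSPENDED, but report no error. */
	return 0;
}

/* Drop a reference; a real core would invoke the suspend hook at zero. */
static void toy_pm_runtime_put(struct toy_device *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count == 0)
		dev->status = TOY_SUSPENDED;
}
```

In this model a get from the unmap path (has_work false) leaves the device
marked suspended and powers nothing on, while a later get issued once there
is work runs the hook again and really resumes.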

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-11-27 22:22                                 ` Stephen Boyd
  (?)
  (?)
@ 2017-11-27 23:43                                     ` Rob Clark
  -1 siblings, 0 replies; 168+ messages in thread
From: Rob Clark @ 2017-11-27 23:43 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Mark Rutland, devicetree, Rafael J. Wysocki, Will Deacon,
	Linux Kernel Mailing List, Stanimir Varbanov, iommu, Rob Herring,
	linux-arm-msm, linux-clk, linux-arm-kernel

On Mon, Nov 27, 2017 at 5:22 PM, Stephen Boyd <sboyd@codeaurora.org> wrote:
> On 11/15, Vivek Gautam wrote:
>> Hi,
>>
>>
>> On Mon, Aug 7, 2017 at 5:59 PM, Rob Clark <robdclark@gmail.com> wrote:
>> > On Mon, Aug 7, 2017 at 4:27 AM, Vivek Gautam
>> > <vivek.gautam@codeaurora.org> wrote:
>> >> On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark <robdclark@gmail.com> wrote:
>> >>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> >>>> Hi Vivek,
>> >>>>
>> >>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>> >>>>> Hi Stephen,
>> >>>>>
>> >>>>>
>> >>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>> >>>>>> On 07/06, Vivek Gautam wrote:
>> >>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>> >>>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>> >>>>>>>                    size_t size)
>> >>>>>>>   {
>> >>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> >>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> >>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> >>>>>>> +    size_t ret;
>> >>>>>>>         if (!ops)
>> >>>>>>>           return 0;
>> >>>>>>>   -    return ops->unmap(ops, iova, size);
>> >>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>> >>>>>> Can these map/unmap ops be called from an atomic context? I seem
>> >>>>>> to recall that being a problem before.
>> >>>>>
>> >>>>> That's something which was dropped in the following patch merged in master:
>> >>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>> >>>>>
>> >>>>> Looks like we don't  need locks here anymore?
>> >>>>
>> >>>>  Apart from the locking, wonder why a explicit pm_runtime is needed
>> >>>>  from unmap. Somehow looks like some path in the master using that
>> >>>>  should have enabled the pm ?
>> >>>>
>> >>>
>> >>> Yes, there are a bunch of scenarios where unmap can happen with
>> >>> disabled master (but not in atomic context).
>> >>
>> >> I would like to understand whether there is a situation where an unmap is
>> >> called in atomic context without an enabled master?
>> >>
>> >> Let's say we have the case where all the unmap calls in atomic context happen
>> >> only from the master's context (in which case the device link should
>> >> take care of
>> >> the pm state of smmu), and the only unmap that happen in non-atomic context
>> >> is the one with master disabled. In such a case doesn it make sense to
>> >> distinguish
>> >> the atomic/non-atomic context and add pm_runtime_get_sync()/put_sync() only
>> >> for the non-atomic context since that would be the one with master disabled.
>> >>
>> >
>> > At least drm/msm needs to hold obj->lock (a mutex) in unmap, so it
>> > won't unmap anything in atomic ctx (but it can unmap w/ master
>> > disabled).  I can't really comment about other non-gpu drivers.  It
>> > seems like a reasonable constraint that either master is enabled or
>> > not in atomic ctx.
>> >
>> > Currently we actually wrap unmap w/ pm_runtime_get/put_sync(), but I'd
>> > like to drop that to avoid powering up the gpu.
>>
>> Since the deferring the TLB maintenance doesn't look like the best approach [1],
>> how about if we try to power-up only the smmu from different client
>> devices such as,
>> GPU in the unmap path. Then we won't need to add pm_runtime_get/put() calls in
>> arm_smmu_unmap().
>>
>> The client device can use something like - pm_runtime_get_supplier() since
>> we already have the device link in place with this patch series. This should
>> power-on the supplier (which is smmu) without turning on the consumer
>> (such as GPU).
>>
>> pm_runtime_get_supplier() however is not exported at this moment.
>> Will it be useful to export this API and use it in the drivers.
>>
>
> I'm not sure pm_runtime_get_supplier() is correct either. That
> feels like we're relying on the GPU driver knowing the internal
> details of how the device links are configured.
>

What does pm_runtime_get_supplier() do if the IOMMU driver hasn't set up
a device link?  If it is a no-op, then I guess the GPU driver calling
pm_runtime_get_supplier() seems reasonable, and less annoying than
having special cases in the pm_resume path.  I don't feel too bad about
having "just in case" get/put_supplier() calls in the unmap path.

Also, presumably we still want to avoid powering up the GPU even if we
short-circuit the firmware loading and the rest of "booting up the GPU",
since presumably the GPU draws somewhat more power than the IOMMU.
Having the pm_resume/suspend path know about the difference between
waking up / suspending the IOMMU and itself doesn't really feel less bad
than just doing "just in case" get/put_supplier() calls.
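
A rough userspace sketch of what "just in case" calls would mean, assuming
the no-op behaviour asked about above. The names (struct toy_dev,
toy_get_supplier(), toy_put_supplier(), toy_unmap()) are made up for
illustration and say nothing about how the kernel helper actually behaves.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev {
	const char *name;
	int usage_count;          /* > 0 means "powered" in this model */
	struct toy_dev *supplier; /* NULL when no device link was set up */
};

/* Take a reference on the supplier only; a no-op without a link. */
static void toy_get_supplier(struct toy_dev *consumer)
{
	if (consumer->supplier)
		consumer->supplier->usage_count++;
}

/* Drop the supplier reference; also a no-op without a link. */
static void toy_put_supplier(struct toy_dev *consumer)
{
	if (consumer->supplier) {
		assert(consumer->supplier->usage_count > 0);
		consumer->supplier->usage_count--;
	}
}

/*
 * Hypothetical unmap path: the SMMU is powered for the TLB maintenance
 * while the GPU's own usage count is never touched.
 * Returns 1 if the supplier was powered during the unmap, else 0.
 */
static int toy_unmap(struct toy_dev *gpu)
{
	int smmu_was_powered;

	toy_get_supplier(gpu);
	smmu_was_powered = gpu->supplier ? gpu->supplier->usage_count > 0 : 0;
	/* ... page-table update and TLB invalidation would happen here ... */
	toy_put_supplier(gpu);
	return smmu_was_powered;
}
```

With a link in place the supplier is held across the unmap and released
afterwards; without one, both calls fall through and nothing changes.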

BR,
-R

> Is there some way to have the GPU driver know in its runtime PM
> resume hook that it doesn't need to be powered on because it
> isn't actively drawing anything or processing commands? I'm
> thinking of the code calling pm_runtime_get() as proposed around
> the IOMMU unmap path in the GPU driver and then having the
> runtime PM resume hook in the GPU driver return some special
> value to indicate that it didn't really resume because it didn't
> need to and to treat the device as runtime suspended but not
> return an error. Then the runtime PM core can keep track of that
> and try to power the GPU on again when another pm_runtime_get()
> is called on the GPU device.
>
> This keeps the consumer API the same, always pm_runtime_get(),
> but leaves the device driver logic of what to do when the GPU
> doesn't need to power on to the runtime PM hook where the driver
> has all the information.
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
@ 2017-11-27 23:43                                     ` Rob Clark
  0 siblings, 0 replies; 168+ messages in thread
From: Rob Clark @ 2017-11-27 23:43 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Vivek Gautam, Robin Murphy, Will Deacon, Rafael J. Wysocki,
	Sricharan R, Joerg Roedel, Rob Herring, Mark Rutland,
	Marek Szyprowski, iommu, devicetree, Linux Kernel Mailing List,
	linux-clk, linux-arm-msm, Stanimir Varbanov, Archit Taneja,
	linux-arm-kernel

On Mon, Nov 27, 2017 at 5:22 PM, Stephen Boyd <sboyd@codeaurora.org> wrote:
> On 11/15, Vivek Gautam wrote:
>> Hi,
>>
>>
>> On Mon, Aug 7, 2017 at 5:59 PM, Rob Clark <robdclark@gmail.com> wrote:
>> > On Mon, Aug 7, 2017 at 4:27 AM, Vivek Gautam
>> > <vivek.gautam@codeaurora.org> wrote:
>> >> On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark <robdclark@gmail.com> wrote:
>> >>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>> >>>> Hi Vivek,
>> >>>>
>> >>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>> >>>>> Hi Stephen,
>> >>>>>
>> >>>>>
>> >>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>> >>>>>> On 07/06, Vivek Gautam wrote:
>> >>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>> >>>>>>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>> >>>>>>>                    size_t size)
>> >>>>>>>   {
>> >>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> >>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> >>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> >>>>>>> +    size_t ret;
>> >>>>>>>         if (!ops)
>> >>>>>>>           return 0;
>> >>>>>>>   -    return ops->unmap(ops, iova, size);
>> >>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>> >>>>>> Can these map/unmap ops be called from an atomic context? I seem
>> >>>>>> to recall that being a problem before.
>> >>>>>
>> >>>>> That's something which was dropped in the following patch merged in master:
>> >>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>> >>>>>
>> >>>>> Looks like we don't  need locks here anymore?
>> >>>>
>> >>>>  Apart from the locking, wonder why a explicit pm_runtime is needed
>> >>>>  from unmap. Somehow looks like some path in the master using that
>> >>>>  should have enabled the pm ?
>> >>>>
>> >>>
>> >>> Yes, there are a bunch of scenarios where unmap can happen with
>> >>> disabled master (but not in atomic context).
>> >>
>> >> I would like to understand whether there is a situation where an unmap is
>> >> called in atomic context without an enabled master?
>> >>
>> >> Let's say we have the case where all the unmap calls in atomic context happen
>> >> only from the master's context (in which case the device link should
>> >> take care of
>> >> the pm state of smmu), and the only unmap that happen in non-atomic context
>> >> is the one with master disabled. In such a case doesn it make sense to
>> >> distinguish
>> >> the atomic/non-atomic context and add pm_runtime_get_sync()/put_sync() only
>> >> for the non-atomic context since that would be the one with master disabled.
>> >>
>> >
>> > At least drm/msm needs to hold obj->lock (a mutex) in unmap, so it
>> > won't unmap anything in atomic ctx (but it can unmap w/ master
>> > disabled).  I can't really comment about other non-gpu drivers.  It
>> > seems like a reasonable constraint that either master is enabled or
>> > not in atomic ctx.
>> >
>> > Currently we actually wrap unmap w/ pm_runtime_get/put_sync(), but I'd
>> > like to drop that to avoid powering up the gpu.
>>
>> Since the deferring the TLB maintenance doesn't look like the best approach [1],
>> how about if we try to power-up only the smmu from different client
>> devices such as,
>> GPU in the unmap path. Then we won't need to add pm_runtime_get/put() calls in
>> arm_smmu_unmap().
>>
>> The client device can use something like - pm_runtime_get_supplier() since
>> we already have the device link in place with this patch series. This should
>> power-on the supplier (which is smmu) without turning on the consumer
>> (such as GPU).
>>
>> pm_runtime_get_supplier() however is not exported at this moment.
>> Will it be useful to export this API and use it in the drivers.
>>
>
> I'm not sure pm_runtime_get_supplier() is correct either. That
> feels like we're relying on the GPU driver knowing the internal
> details of how the device links are configured.
>

what does pm_runtime_get_supplier() do if IOMMU driver hasn't setup
device-link?  If it is a no-op, then I guess the GPU driver calling
pm_runtime_get_supplier() seems reasonable, and less annoying than
having special cases in pm_resume path.. I don't feel too bad about
having "just in case" get/put_supplier() calls in the unmap path.

Also, presumably we still want to avoid powering up GPU even if we
short circuit the firmware loading and rest of "booting up the GPU"..
since presumably the GPU draws somewhat more power than the IOMMU..
having the pm_resume/suspend path know about the diff between waking
up / suspending the iommu and itself doesn't really feel less-bad than
just doing "just in case" get/put_supplier() calls.

BR,
-R

> Is there some way to have the GPU driver know in its runtime PM
> resume hook that it doesn't need to be powered on because it
> isn't actively drawing anything or processing commands? I'm
> thinking of the code calling pm_runtime_get() as proposed around
> the IOMMU unmap path in the GPU driver and then having the
> runtime PM resume hook in the GPU driver return some special
> value to indicate that it didn't really resume because it didn't
> need to and to treat the device as runtime suspended but not
> return an error. Then the runtime PM core can keep track of that
> and try to power the GPU on again when another pm_runtime_get()
> is called on the GPU device.
>
> This keeps the consumer API the same, always pm_runtime_get(),
> but leaves the device driver logic of what to do when the GPU
> doesn't need to power on to the runtime PM hook where the driver
> has all the information.
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
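
Stephen's suggestion quoted above (a resume hook that reports "did not really
resume") can be sketched as a userspace toy model. This is a sketch of the
proposal only: no such return value is known to exist in the runtime PM core,
and TOY_RESUME_SKIPPED and every toy_* name here are hypothetical.

```c
/* Userspace toy model of the proposal. NOT kernel code; the sentinel value
 * and all toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum toy_status { TOY_SUSPENDED, TOY_ACTIVE };

#define TOY_RESUME_SKIPPED 1 /* hypothetical "nothing to do, stay suspended" */

struct toy_gpu {
	enum toy_status status;
	int usage;
	bool has_work; /* true if commands are queued for the GPU */
};

/* Driver-side resume hook: only the driver knows whether power is needed. */
static int toy_gpu_runtime_resume(struct toy_gpu *gpu)
{
	if (!gpu->has_work)
		return TOY_RESUME_SKIPPED;
	/* clocks, power rails and firmware load would be handled here */
	return 0;
}

/* Core-side get: a skipped resume is not an error, and the device remains
 * marked suspended so that the next get calls the hook again. */
static int toy_pm_runtime_get(struct toy_gpu *gpu)
{
	int ret;

	gpu->usage++;
	if (gpu->status == TOY_ACTIVE)
		return 0;

	ret = toy_gpu_runtime_resume(gpu);
	if (ret == 0)
		gpu->status = TOY_ACTIVE;
	return ret < 0 ? ret : 0;
}

/* Core-side put: the last reference marks the device suspended. */
static void toy_pm_runtime_put(struct toy_gpu *gpu)
{
	assert(gpu->usage > 0);
	if (--gpu->usage == 0)
		gpu->status = TOY_SUSPENDED;
}
```

The model leaves out the supplier side: in the proposal, the get on the GPU
would still bring up the SMMU through the device link even when the GPU's own
resume is skipped.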

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-11-27 23:43                                     ` Rob Clark
  (?)
@ 2017-11-28 13:43                                       ` Vivek Gautam
  -1 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-11-28 13:43 UTC (permalink / raw)
  To: Rob Clark, Stephen Boyd
  Cc: Robin Murphy, Will Deacon, Rafael J. Wysocki, Sricharan R,
	Joerg Roedel, Rob Herring, Mark Rutland, Marek Szyprowski, iommu,
	devicetree, Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel



On 11/28/2017 05:13 AM, Rob Clark wrote:
> On Mon, Nov 27, 2017 at 5:22 PM, Stephen Boyd <sboyd@codeaurora.org> wrote:
>> On 11/15, Vivek Gautam wrote:
>>> Hi,
>>>
>>>
>>> On Mon, Aug 7, 2017 at 5:59 PM, Rob Clark <robdclark@gmail.com> wrote:
>>>> On Mon, Aug 7, 2017 at 4:27 AM, Vivek Gautam
>>>> <vivek.gautam@codeaurora.org> wrote:
>>>>> On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark <robdclark@gmail.com> wrote:
>>>>>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@codeaurora.org> wrote:
>>>>>>> Hi Vivek,
>>>>>>>
>>>>>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>>>>>> Hi Stephen,
>>>>>>>>
>>>>>>>>
>>>>>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>>>>>> On 07/06, Vivek Gautam wrote:
>>>>>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>>>>>>    static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>>>>>>                     size_t size)
>>>>>>>>>>    {
>>>>>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>>>>>> +    size_t ret;
>>>>>>>>>>          if (!ops)
>>>>>>>>>>            return 0;
>>>>>>>>>>    -    return ops->unmap(ops, iova, size);
>>>>>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>>>>>> to recall that being a problem before.
>>>>>>>> That's something which was dropped in the following patch merged in master:
>>>>>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>>>>>
>>>>>>>> Looks like we don't  need locks here anymore?
>>>>>>>   Apart from the locking, wonder why a explicit pm_runtime is needed
>>>>>>>   from unmap. Somehow looks like some path in the master using that
>>>>>>>   should have enabled the pm ?
>>>>>>>
>>>>>> Yes, there are a bunch of scenarios where unmap can happen with
>>>>>> disabled master (but not in atomic context).
>>>>> I would like to understand whether there is a situation where an unmap is
>>>>> called in atomic context without an enabled master?
>>>>>
>>>>> Let's say we have the case where all the unmap calls in atomic context happen
>>>>> only from the master's context (in which case the device link should
>>>>> take care of
>>>>> the pm state of smmu), and the only unmap that happen in non-atomic context
>>>>> is the one with master disabled. In such a case doesn it make sense to
>>>>> distinguish
>>>>> the atomic/non-atomic context and add pm_runtime_get_sync()/put_sync() only
>>>>> for the non-atomic context since that would be the one with master disabled.
>>>>>
>>>> At least drm/msm needs to hold obj->lock (a mutex) in unmap, so it
>>>> won't unmap anything in atomic ctx (but it can unmap w/ master
>>>> disabled).  I can't really comment about other non-gpu drivers.  It
>>>> seems like a reasonable constraint that either master is enabled or
>>>> not in atomic ctx.
>>>>
>>>> Currently we actually wrap unmap w/ pm_runtime_get/put_sync(), but I'd
>>>> like to drop that to avoid powering up the gpu.
>>> Since the deferring the TLB maintenance doesn't look like the best approach [1],
>>> how about if we try to power-up only the smmu from different client
>>> devices such as,
>>> GPU in the unmap path. Then we won't need to add pm_runtime_get/put() calls in
>>> arm_smmu_unmap().
>>>
>>> The client device can use something like - pm_runtime_get_supplier() since
>>> we already have the device link in place with this patch series. This should
>>> power-on the supplier (which is smmu) without turning on the consumer
>>> (such as GPU).
>>>
>>> pm_runtime_get_supplier() however is not exported at this moment.
>>> Will it be useful to export this API and use it in the drivers.
>>>
>> I'm not sure pm_runtime_get_supplier() is correct either. That
>> feels like we're relying on the GPU driver knowing the internal
>> details of how the device links are configured.
>>
> what does pm_runtime_get_supplier() do if IOMMU driver hasn't setup
> device-link?

It will be a no-op.
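
A rough way to see why the call degenerates to a no-op: as far as I
understand it, the helper walks the consumer's list of supplier links and
takes a reference only on those managed by runtime PM, so an empty list means
nothing happens. The following is a userspace toy model of that loop, not the
kernel implementation; the toy_* names are hypothetical and
TOY_LINK_PM_RUNTIME merely stands in for the kernel's DL_FLAG_PM_RUNTIME.

```c
/* Userspace toy model of walking supplier links. NOT kernel code; all
 * toy_* names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define TOY_LINK_PM_RUNTIME 0x1u /* stands in for DL_FLAG_PM_RUNTIME */

struct toy_supplier {
	int usage; /* runtime-PM style usage count */
};

struct toy_link {
	struct toy_supplier *supplier;
	unsigned int flags;
};

/* Take a reference on every runtime-PM-managed supplier and return how many
 * references were taken. With no links (n == 0) the loop body never runs. */
static size_t toy_get_suppliers(const struct toy_link *links, size_t n)
{
	size_t taken = 0;

	assert(links != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (links[i].flags & TOY_LINK_PM_RUNTIME) {
			links[i].supplier->usage++;
			taken++;
		}
	}
	return taken;
}
```

A GPU driver running on a platform whose IOMMU driver never created a link
would correspond to the n == 0 case here, so the "just in case" call costs a
list walk and nothing else.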

> If it is a no-op, then I guess the GPU driver calling
> pm_runtime_get_supplier() seems reasonable, and less annoying than
> having special cases in pm_resume path.. I don't feel too bad about
> having "just in case" get/put_supplier() calls in the unmap path.
>
> Also, presumably we still want to avoid powering up GPU even if we
> short circuit the firmware loading and rest of "booting up the GPU"..
> since presumably the GPU draws somewhat more power than the IOMMU..
> having the pm_resume/suspend path know about the diff between waking
> up / suspending the iommu and itself doesn't really feel less-bad than
> just doing "just in case" get/put_supplier() calls.

If that sounds okay, then I can send a patch that exports the
pm_runtime_get/put_suppliers() APIs.


Best regards
Vivek

> BR,
> -R
>
>> Is there some way to have the GPU driver know in its runtime PM
>> resume hook that it doesn't need to be powered on because it
>> isn't actively drawing anything or processing commands? I'm
>> thinking of the code calling pm_runtime_get() as proposed around
>> the IOMMU unmap path in the GPU driver and then having the
>> runtime PM resume hook in the GPU driver return some special
>> value to indicate that it didn't really resume because it didn't
>> need to and to treat the device as runtime suspended but not
>> return an error. Then the runtime PM core can keep track of that
>> and try to power the GPU on again when another pm_runtime_get()
>> is called on the GPU device.
>>
>> This keeps the consumer API the same, always pm_runtime_get(),
>> but leaves the device driver logic of what to do when the GPU
>> doesn't need to power on to the runtime PM hook where the driver
>> has all the information.
>>
>> --
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>> a Linux Foundation Collaborative Project

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
@ 2017-11-28 13:43                                       ` Vivek Gautam
  0 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-11-28 13:43 UTC (permalink / raw)
  To: Rob Clark, Stephen Boyd
  Cc: Robin Murphy, Will Deacon, Rafael J. Wysocki, Sricharan R,
	Joerg Roedel, Rob Herring, Mark Rutland, Marek Szyprowski, iommu,
	devicetree, Linux Kernel Mailing List, linux-clk, linux-arm-msm,
	Stanimir Varbanov, Archit Taneja, linux-arm-kernel



On 11/28/2017 05:13 AM, Rob Clark wrote:
> On Mon, Nov 27, 2017 at 5:22 PM, Stephen Boyd<sboyd@codeaurora.org>  wrote:
>> On 11/15, Vivek Gautam wrote:
>>> Hi,
>>>
>>>
>>> On Mon, Aug 7, 2017 at 5:59 PM, Rob Clark<robdclark@gmail.com>  wrote:
>>>> On Mon, Aug 7, 2017 at 4:27 AM, Vivek Gautam
>>>> <vivek.gautam@codeaurora.org>  wrote:
>>>>> On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark<robdclark@gmail.com>  wrote:
>>>>>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R<sricharan@codeaurora.org>  wrote:
>>>>>>> Hi Vivek,
>>>>>>>
>>>>>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>>>>>> Hi Stephen,
>>>>>>>>
>>>>>>>>
>>>>>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>>>>>> On 07/06, Vivek Gautam wrote:
>>>>>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>>>>>>    static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>>>>>>                     size_t size)
>>>>>>>>>>    {
>>>>>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>>>>>> +    size_t ret;
>>>>>>>>>>          if (!ops)
>>>>>>>>>>            return 0;
>>>>>>>>>>    -    return ops->unmap(ops, iova, size);
>>>>>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>>>>>> to recall that being a problem before.
>>>>>>>> That's something which was dropped in the following patch merged in master:
>>>>>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>>>>>
>>>>>>>> Looks like we don't  need locks here anymore?
>>>>>>>   Apart from the locking, wonder why a explicit pm_runtime is needed
>>>>>>>   from unmap. Somehow looks like some path in the master using that
>>>>>>>   should have enabled the pm ?
>>>>>>>
>>>>>> Yes, there are a bunch of scenarios where unmap can happen with
>>>>>> disabled master (but not in atomic context).
>>>>> I would like to understand whether there is a situation where an unmap is
>>>>> called in atomic context without an enabled master?
>>>>>
>>>>> Let's say we have the case where all the unmap calls in atomic context happen
>>>>> only from the master's context (in which case the device link should
>>>>> take care of
>>>>> the pm state of smmu), and the only unmap that happen in non-atomic context
>>>>> is the one with master disabled. In such a case doesn it make sense to
>>>>> distinguish
>>>>> the atomic/non-atomic context and add pm_runtime_get_sync()/put_sync() only
>>>>> for the non-atomic context since that would be the one with master disabled.
>>>>>
>>>> At least drm/msm needs to hold obj->lock (a mutex) in unmap, so it
>>>> won't unmap anything in atomic ctx (but it can unmap w/ master
>>>> disabled).  I can't really comment about other non-gpu drivers.  It
>>>> seems like a reasonable constraint that either master is enabled or
>>>> not in atomic ctx.
>>>>
>>>> Currently we actually wrap unmap w/ pm_runtime_get/put_sync(), but I'd
>>>> like to drop that to avoid powering up the gpu.
>>> Since the deferring the TLB maintenance doesn't look like the best approach [1],
>>> how about if we try to power-up only the smmu from different client
>>> devices such as,
>>> GPU in the unmap path. Then we won't need to add pm_runtime_get/put() calls in
>>> arm_smmu_unmap().
>>>
>>> The client device can use something like - pm_runtime_get_supplier() since
>>> we already have the device link in place with this patch series. This should
>>> power-on the supplier (which is smmu) without turning on the consumer
>>> (such as GPU).
>>>
>>> pm_runtime_get_supplier() however is not exported at this moment.
>>> Will it be useful to export this API and use it in the drivers.
>>>
>> I'm not sure pm_runtime_get_supplier() is correct either. That
>> feels like we're relying on the GPU driver knowing the internal
>> details of how the device links are configured.
>>
> what does pm_runtime_get_supplier() do if IOMMU driver hasn't setup
> device-link?

It will be a no-op.

> If it is a no-op, then I guess the GPU driver calling
> pm_runtime_get_supplier() seems reasonable, and less annoying than
> having special cases in pm_resume path.. I don't feel too bad about
> having "just in case" get/put_supplier() calls in the unmap path.
>
> Also, presumably we still want to avoid powering up GPU even if we
> short circuit the firmware loading and rest of "booting up the GPU"..
> since presumably the GPU draws somewhat more power than the IOMMU..
> having the pm_resume/suspend path know about the diff between waking
> up / suspending the iommu and itself doesn't really feel less-bad than
> just doing "just in case" get/put_supplier() calls.

If it sounds okay, then i can send a patch that exports the
pm_runtime_get/put_suppliers() APIs.


Best regards
Vivek

> BR,
> -R
>
>> Is there some way to have the GPU driver know in its runtime PM
>> resume hook that it doesn't need to be powered on because it
>> isn't actively drawing anything or processing commands? I'm
>> thinking of the code calling pm_runtime_get() as proposed around
>> the IOMMU unmap path in the GPU driver and then having the
>> runtime PM resume hook in the GPU driver return some special
>> value to indicate that it didn't really resume because it didn't
>> need to and to treat the device as runtime suspended but not
>> return an error. Then the runtime PM core can keep track of that
>> and try to power the GPU on again when another pm_runtime_get()
>> is called on the GPU device.
>>
>> This keeps the consumer API the same, always pm_runtime_get(),
>> but leaves the device driver logic of what to do when the GPU
>> doesn't need to power on to the runtime PM hook where the driver
>> has all the information.
>>
>> --
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>> a Linux Foundation Collaborative Project
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
> the body of a message tomajordomo@vger.kernel.org
> More majordomo info athttp://vger.kernel.org/majordomo-info.html

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
@ 2017-11-28 13:43                                       ` Vivek Gautam
  0 siblings, 0 replies; 168+ messages in thread
From: Vivek Gautam @ 2017-11-28 13:43 UTC (permalink / raw)
  To: linux-arm-kernel



On 11/28/2017 05:13 AM, Rob Clark wrote:
> On Mon, Nov 27, 2017 at 5:22 PM, Stephen Boyd<sboyd@codeaurora.org>  wrote:
>> On 11/15, Vivek Gautam wrote:
>>> Hi,
>>>
>>>
>>> On Mon, Aug 7, 2017 at 5:59 PM, Rob Clark<robdclark@gmail.com>  wrote:
>>>> On Mon, Aug 7, 2017 at 4:27 AM, Vivek Gautam
>>>> <vivek.gautam@codeaurora.org>  wrote:
>>>>> On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark<robdclark@gmail.com>  wrote:
>>>>>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R<sricharan@codeaurora.org>  wrote:
>>>>>>> Hi Vivek,
>>>>>>>
>>>>>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>>>>>> Hi Stephen,
>>>>>>>>
>>>>>>>>
>>>>>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>>>>>> On 07/06, Vivek Gautam wrote:
>>>>>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>>>>>>    static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>>>>>>                     size_t size)
>>>>>>>>>>    {
>>>>>>>>>> -    struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>>>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>>>>>> +    size_t ret;
>>>>>>>>>>          if (!ops)
>>>>>>>>>>            return 0;
>>>>>>>>>>    -    return ops->unmap(ops, iova, size);
>>>>>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>>>>>> to recall that being a problem before.
>>>>>>>> That's something which was dropped in the following patch merged in master:
>>>>>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>>>>>
>>>>>>>> Looks like we don't  need locks here anymore?
>>>>>>>   Apart from the locking, wonder why a explicit pm_runtime is needed
>>>>>>>   from unmap. Somehow looks like some path in the master using that
>>>>>>>   should have enabled the pm ?
>>>>>>>
>>>>>> Yes, there are a bunch of scenarios where unmap can happen with
>>>>>> disabled master (but not in atomic context).
>>>>> I would like to understand whether there is a situation where an unmap is
>>>>> called in atomic context without an enabled master?
>>>>>
>>>>> Let's say we have the case where all the unmap calls in atomic context happen
>>>>> only from the master's context (in which case the device link should
>>>>> take care of
>>>>> the pm state of smmu), and the only unmap that happen in non-atomic context
>>>>> is the one with master disabled. In such a case doesn it make sense to
>>>>> distinguish
>>>>> the atomic/non-atomic context and add pm_runtime_get_sync()/put_sync() only
>>>>> for the non-atomic context since that would be the one with master disabled.
>>>>>
>>>> At least drm/msm needs to hold obj->lock (a mutex) in unmap, so it
>>>> won't unmap anything in atomic ctx (but it can unmap w/ master
>>>> disabled).  I can't really comment about other non-gpu drivers.  It
>>>> seems like a reasonable constraint that either master is enabled or
>>>> not in atomic ctx.
>>>>
>>>> Currently we actually wrap unmap w/ pm_runtime_get/put_sync(), but I'd
>>>> like to drop that to avoid powering up the gpu.
>>> Since the deferring the TLB maintenance doesn't look like the best approach [1],
>>> how about if we try to power-up only the smmu from different client
>>> devices such as,
>>> GPU in the unmap path. Then we won't need to add pm_runtime_get/put() calls in
>>> arm_smmu_unmap().
>>>
>>> The client device can use something like - pm_runtime_get_supplier() since
>>> we already have the device link in place with this patch series. This should
>>> power-on the supplier (which is smmu) without turning on the consumer
>>> (such as GPU).
>>>
>>> pm_runtime_get_supplier() however is not exported at this moment.
>>> Will it be useful to export this API and use it in the drivers.
>>>
>> I'm not sure pm_runtime_get_supplier() is correct either. That
>> feels like we're relying on the GPU driver knowing the internal
>> details of how the device links are configured.
>>
> what does pm_runtime_get_supplier() do if IOMMU driver hasn't setup
> device-link?

It will be a no-op.

> If it is a no-op, then I guess the GPU driver calling
> pm_runtime_get_supplier() seems reasonable, and less annoying than
> having special cases in pm_resume path.. I don't feel too bad about
> having "just in case" get/put_supplier() calls in the unmap path.
>
> Also, presumably we still want to avoid powering up GPU even if we
> short circuit the firmware loading and rest of "booting up the GPU"..
> since presumably the GPU draws somewhat more power than the IOMMU..
> having the pm_resume/suspend path know about the diff between waking
> up / suspending the iommu and itself doesn't really feel less-bad than
> just doing "just in case" get/put_supplier() calls.

If it sounds okay, then I can send a patch that exports the
pm_runtime_get/put_suppliers() APIs.
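
To make the idea concrete, here is a rough user-space model of the
behaviour being discussed. It is only a sketch, not kernel code: the
toy_* names and struct toy_dev are made up for illustration, and the
real pm_runtime_get_suppliers()/pm_runtime_put_suppliers() helpers
operate on struct device and its device links, with details that are
not modelled here. It shows the two properties that matter for the
unmap path: the supplier (smmu) is powered for the duration of the
unmap while the consumer (GPU) stays suspended, and the calls do
nothing when no link has been set up.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LINKS 4

/* Toy stand-in for struct device: only what this sketch needs. */
struct toy_dev {
	const char *name;
	int usage;				/* runtime PM usage count */
	int powered;				/* 1 while "resumed" */
	struct toy_dev *suppliers[TOY_MAX_LINKS];	/* device links */
	size_t nr_suppliers;
};

/* Take a reference; the first reference powers the device on. */
static void toy_get(struct toy_dev *dev)
{
	if (dev->usage++ == 0)
		dev->powered = 1;
}

/* Drop a reference; the last reference powers the device off. */
static void toy_put(struct toy_dev *dev)
{
	assert(dev->usage > 0);
	if (--dev->usage == 0)
		dev->powered = 0;
}

/*
 * Models the proposed use of pm_runtime_get_suppliers(): resume every
 * linked supplier but leave the consumer itself alone. With no links
 * the loop body never runs, so the call is a no-op.
 */
static void toy_get_suppliers(struct toy_dev *consumer)
{
	for (size_t i = 0; i < consumer->nr_suppliers; i++)
		toy_get(consumer->suppliers[i]);
}

static void toy_put_suppliers(struct toy_dev *consumer)
{
	for (size_t i = 0; i < consumer->nr_suppliers; i++)
		toy_put(consumer->suppliers[i]);
}

/*
 * Hypothetical GPU-side unmap wrapper. Returns whether the smmu was
 * powered at the point where the TLB invalidation would be issued.
 */
static int toy_gpu_unmap(struct toy_dev *gpu, struct toy_dev *smmu)
{
	int smmu_was_powered;

	toy_get_suppliers(gpu);
	smmu_was_powered = smmu->powered;	/* unmap + TLB sync go here */
	toy_put_suppliers(gpu);

	return smmu_was_powered;
}
```

With a link from gpu to smmu, toy_gpu_unmap() finds the smmu powered
and never touches the GPU's own usage count; without the link it
returns 0, which corresponds to the "just in case" no-op case.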


Best regards
Vivek

> BR,
> -R
>
>> Is there some way to have the GPU driver know in its runtime PM
>> resume hook that it doesn't need to be powered on because it
>> isn't actively drawing anything or processing commands? I'm
>> thinking of the code calling pm_runtime_get() as proposed around
>> the IOMMU unmap path in the GPU driver and then having the
>> runtime PM resume hook in the GPU driver return some special
>> value to indicate that it didn't really resume because it didn't
>> need to and to treat the device as runtime suspended but not
>> return an error. Then the runtime PM core can keep track of that
>> and try to power the GPU on again when another pm_runtime_get()
>> is called on the GPU device.
>>
>> This keeps the consumer API the same, always pm_runtime_get(),
>> but leaves the device driver logic of what to do when the GPU
>> doesn't need to power on to the runtime PM hook where the driver
>> has all the information.
>>
>> --
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>> a Linux Foundation Collaborative Project

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  2017-11-28 13:43                                       ` Vivek Gautam
  (?)
  (?)
@ 2017-11-28 20:05                                           ` Rob Clark
  -1 siblings, 0 replies; 168+ messages in thread
From: Rob Clark @ 2017-11-28 20:05 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: Stephen Boyd, Robin Murphy, Will Deacon, Rafael J. Wysocki,
	Sricharan R, Joerg Roedel, Rob Herring, Mark Rutland,
	Marek Szyprowski, iommu, devicetree, Linux Kernel Mailing List,
	linux-clk, linux-arm-msm, Stanimir Varbanov, Archit Taneja

On Tue, Nov 28, 2017 at 8:43 AM, Vivek Gautam
<vivek.gautam@codeaurora.org> wrote:
>
>
> On 11/28/2017 05:13 AM, Rob Clark wrote:
>>
>> On Mon, Nov 27, 2017 at 5:22 PM, Stephen Boyd <sboyd@codeaurora.org>
>> wrote:
>>>
>>> On 11/15, Vivek Gautam wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> On Mon, Aug 7, 2017 at 5:59 PM, Rob Clark <robdclark@gmail.com> wrote:
>>>>>
>>>>> On Mon, Aug 7, 2017 at 4:27 AM, Vivek Gautam
>>>>> <vivek.gautam@codeaurora.org> wrote:
>>>>>>
>>>>>> On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark <robdclark@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan
>>>>>>> R <sricharan@codeaurora.org> wrote:
>>>>>>>>
>>>>>>>> Hi Vivek,
>>>>>>>>
>>>>>>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>>>>>>>
>>>>>>>>> Hi Stephen,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>>>>>>>
>>>>>>>>>> On 07/06, Vivek Gautam wrote:
>>>>>>>>>>>
>>>>>>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct
>>>>>>>>>>> iommu_domain *domain, unsigned long iova,
>>>>>>>>>>>    static size_t arm_smmu_unmap(struct iommu_domain *domain,
>>>>>>>>>>> unsigned long iova,
>>>>>>>>>>>                     size_t size)
>>>>>>>>>>>    {
>>>>>>>>>>> -    struct io_pgtable_ops *ops =
>>>>>>>>>>> to_smmu_domain(domain)->pgtbl_ops;
>>>>>>>>>>> +    struct arm_smmu_domain *smmu_domain =
>>>>>>>>>>> to_smmu_domain(domain);
>>>>>>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>>>>>>> +    size_t ret;
>>>>>>>>>>>          if (!ops)
>>>>>>>>>>>            return 0;
>>>>>>>>>>>    -    return ops->unmap(ops, iova, size);
>>>>>>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>>>>>>>
>>>>>>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>>>>>>> to recall that being a problem before.
>>>>>>>>>
>>>>>>>>> That's something which was dropped in the following patch merged in
>>>>>>>>> master:
>>>>>>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>>>>>>
>>>>>>>>> Looks like we don't  need locks here anymore?
>>>>>>>>
>>>>>>>>   Apart from the locking, I wonder why an explicit pm_runtime call is
>>>>>>>>   needed from unmap. It looks like some path in the master using that
>>>>>>>>   should already have enabled the pm?
>>>>>>>>
>>>>>>> Yes, there are a bunch of scenarios where unmap can happen with
>>>>>>> disabled master (but not in atomic context).
>>>>>>
>>>>>> I would like to understand whether there is a situation where an unmap
>>>>>> is
>>>>>> called in atomic context without an enabled master?
>>>>>>
>>>>>> Let's say we have the case where all the unmap calls in atomic context
>>>>>> happen
>>>>>> only from the master's context (in which case the device link should
>>>>>> take care of
>>>>>> the pm state of smmu), and the only unmap that happens in non-atomic
>>>>>> context
>>>>>> is the one with master disabled. In such a case, doesn't it make sense to
>>>>>> distinguish
>>>>>> the atomic/non-atomic context and add pm_runtime_get_sync()/put_sync()
>>>>>> only
>>>>>> for the non-atomic context, since that would be the one with master
>>>>>> disabled?
>>>>>>
>>>>> At least drm/msm needs to hold obj->lock (a mutex) in unmap, so it
>>>>> won't unmap anything in atomic ctx (but it can unmap w/ master
>>>>> disabled).  I can't really comment about other non-gpu drivers.  It
>>>>> seems like a reasonable constraint that either master is enabled or
>>>>> not in atomic ctx.
>>>>>
>>>>> Currently we actually wrap unmap w/ pm_runtime_get/put_sync(), but I'd
>>>>> like to drop that to avoid powering up the gpu.
>>>>
>>>> Since deferring the TLB maintenance doesn't look like the best
>>>> approach [1],
>>>> how about we try to power up only the smmu from the different client
>>>> devices, such as the
>>>> GPU, in the unmap path? Then we won't need to add pm_runtime_get/put()
>>>> calls in
>>>> arm_smmu_unmap().
>>>>
>>>> The client device can use something like - pm_runtime_get_supplier()
>>>> since
>>>> we already have the device link in place with this patch series. This
>>>> should
>>>> power-on the supplier (which is smmu) without turning on the consumer
>>>> (such as GPU).
>>>>
>>>> pm_runtime_get_supplier(), however, is not exported at this moment.
>>>> Would it be useful to export this API and use it in the drivers?
>>>>
>>> I'm not sure pm_runtime_get_supplier() is correct either. That
>>> feels like we're relying on the GPU driver knowing the internal
>>> details of how the device links are configured.
>>>
>> what does pm_runtime_get_supplier() do if IOMMU driver hasn't setup
>> device-link?
>
>
> It will be a no-op.
>
>> If it is a no-op, then I guess the GPU driver calling
>> pm_runtime_get_supplier() seems reasonable, and less annoying than
>> having special cases in pm_resume path.. I don't feel too bad about
>> having "just in case" get/put_supplier() calls in the unmap path.
>>
>> Also, presumably we still want to avoid powering up GPU even if we
>> short circuit the firmware loading and rest of "booting up the GPU"..
>> since presumably the GPU draws somewhat more power than the IOMMU..
>> having the pm_resume/suspend path know about the diff between waking
>> up / suspending the iommu and itself doesn't really feel less-bad than
>> just doing "just in case" get/put_supplier() calls.
>
>
> If it sounds okay, then I can send a patch that exports the
> pm_runtime_get/put_suppliers() APIs.
>

sounds good to me

BR,
-R


>
> Best regards
> Vivek
>
>> BR,
>> -R
>>
>>> Is there some way to have the GPU driver know in its runtime PM
>>> resume hook that it doesn't need to be powered on because it
>>> isn't actively drawing anything or processing commands? I'm
>>> thinking of the code calling pm_runtime_get() as proposed around
>>> the IOMMU unmap path in the GPU driver and then having the
>>> runtime PM resume hook in the GPU driver return some special
>>> value to indicate that it didn't really resume because it didn't
>>> need to and to treat the device as runtime suspended but not
>>> return an error. Then the runtime PM core can keep track of that
>>> and try to power the GPU on again when another pm_runtime_get()
>>> is called on the GPU device.
>>>
>>> This keeps the consumer API the same, always pm_runtime_get(),
>>> but leaves the device driver logic of what to do when the GPU
>>> doesn't need to power on to the runtime PM hook where the driver
>>> has all the information.
>>>
>>> --
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>> a Linux Foundation Collaborative Project
>>
>
>
> --
> The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>
> a Linux Foundation Collaborative Project
>

^ permalink raw reply	[flat|nested] 168+ messages in thread

* [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
@ 2017-11-28 20:05                                           ` Rob Clark
  0 siblings, 0 replies; 168+ messages in thread
From: Rob Clark @ 2017-11-28 20:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 28, 2017 at 8:43 AM, Vivek Gautam
<vivek.gautam@codeaurora.org> wrote:
>
>
> On 11/28/2017 05:13 AM, Rob Clark wrote:
>>
>> On Mon, Nov 27, 2017 at 5:22 PM, Stephen Boyd<sboyd@codeaurora.org>
>> wrote:
>>>
>>> On 11/15, Vivek Gautam wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> On Mon, Aug 7, 2017 at 5:59 PM, Rob Clark<robdclark@gmail.com>  wrote:
>>>>>
>>>>> On Mon, Aug 7, 2017 at 4:27 AM, Vivek Gautam
>>>>> <vivek.gautam@codeaurora.org>  wrote:
>>>>>>
>>>>>> On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark<robdclark@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan
>>>>>>> R<sricharan@codeaurora.org>  wrote:
>>>>>>>>
>>>>>>>> Hi Vivek,
>>>>>>>>
>>>>>>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>>>>>>>
>>>>>>>>> Hi Stephen,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>>>>>>>
>>>>>>>>>> On 07/06, Vivek Gautam wrote:
>>>>>>>>>>>
>>>>>>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct
>>>>>>>>>>> iommu_domain *domain, unsigned long iova,
>>>>>>>>>>>    static size_t arm_smmu_unmap(struct iommu_domain *domain,
>>>>>>>>>>> unsigned long iova,
>>>>>>>>>>>                     size_t size)
>>>>>>>>>>>    {
>>>>>>>>>>> -    struct io_pgtable_ops *ops =
>>>>>>>>>>> to_smmu_domain(domain)->pgtbl_ops;
>>>>>>>>>>> +    struct arm_smmu_domain *smmu_domain =
>>>>>>>>>>> to_smmu_domain(domain);
>>>>>>>>>>> +    struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>>>>>>> +    size_t ret;
>>>>>>>>>>>          if (!ops)
>>>>>>>>>>>            return 0;
>>>>>>>>>>>    -    return ops->unmap(ops, iova, size);
>>>>>>>>>>> +    pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>>>>>>>
>>>>>>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>>>>>>> to recall that being a problem before.
>>>>>>>>>
>>>>>>>>> That's something which was dropped in the following patch merged in
>>>>>>>>> master:
>>>>>>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>>>>>>
>>>>>>>>> Looks like we don't  need locks here anymore?
>>>>>>>>
>>>>>>>>   Apart from the locking, wonder why a explicit pm_runtime is needed
>>>>>>>>   from unmap. Somehow looks like some path in the master using that
>>>>>>>>   should have enabled the pm ?
>>>>>>>>
>>>>>>> Yes, there are a bunch of scenarios where unmap can happen with
>>>>>>> disabled master (but not in atomic context).
>>>>>>
>>>>>> I would like to understand whether there is a situation where an unmap
>>>>>> is
>>>>>> called in atomic context without an enabled master?
>>>>>>
>>>>>> Let's say we have the case where all the unmap calls in atomic context
>>>>>> happen
>>>>>> only from the master's context (in which case the device link should
>>>>>> take care of
>>>>>> the pm state of smmu), and the only unmap that happen in non-atomic
>>>>>> context
>>>>>> is the one with master disabled. In such a case doesn it make sense to
>>>>>> distinguish
>>>>>> the atomic/non-atomic context and add pm_runtime_get_sync()/put_sync()
>>>>>> only
>>>>>> for the non-atomic context since that would be the one with master
>>>>>> disabled.
>>>>>>
>>>>> At least drm/msm needs to hold obj->lock (a mutex) in unmap, so it
>>>>> won't unmap anything in atomic ctx (but it can unmap w/ master
>>>>> disabled).  I can't really comment about other non-gpu drivers.  It
>>>>> seems like a reasonable constraint that either master is enabled or
>>>>> not in atomic ctx.
>>>>>
>>>>> Currently we actually wrap unmap w/ pm_runtime_get/put_sync(), but I'd
>>>>> like to drop that to avoid powering up the gpu.
>>>>
>>>> Since the deferring the TLB maintenance doesn't look like the best
>>>> approach [1],
>>>> how about if we try to power-up only the smmu from different client
>>>> devices such as,
>>>> GPU in the unmap path. Then we won't need to add pm_runtime_get/put()
>>>> calls in
>>>> arm_smmu_unmap().
>>>>
>>>> The client device can use something like - pm_runtime_get_supplier()
>>>> since
>>>> we already have the device link in place with this patch series. This
>>>> should
>>>> power-on the supplier (which is smmu) without turning on the consumer
>>>> (such as GPU).
>>>>
>>>> pm_runtime_get_supplier() however is not exported at this moment.
>>>> Will it be useful to export this API and use it in the drivers.
>>>>
>>> I'm not sure pm_runtime_get_supplier() is correct either. That
>>> feels like we're relying on the GPU driver knowing the internal
>>> details of how the device links are configured.
>>>
>> what does pm_runtime_get_supplier() do if IOMMU driver hasn't setup
>> device-link?
>
>
> It will be a no-op.
>
>> If it is a no-op, then I guess the GPU driver calling
>> pm_runtime_get_supplier() seems reasonable, and less annoying than
>> having special cases in pm_resume path.. I don't feel too bad about
>> having "just in case" get/put_supplier() calls in the unmap path.
>>
>> Also, presumably we still want to avoid powering up GPU even if we
>> short circuit the firmware loading and rest of "booting up the GPU"..
>> since presumably the GPU draws somewhat more power than the IOMMU..
>> having the pm_resume/suspend path know about the diff between waking
>> up / suspending the iommu and itself doesn't really feel less-bad than
>> just doing "just in case" get/put_supplier() calls.
>
>
> If it sounds okay, then i can send a patch that exports the
> pm_runtime_get/put_suppliers() APIs.
>

sounds good to me

BR,
-R


>
> Best regards
> Vivek
>
>> BR,
>> -R
>>
>>> Is there some way to have the GPU driver know in its runtime PM
>>> resume hook that it doesn't need to be powered on because it
>>> isn't actively drawing anything or processing commands? I'm
>>> thinking of the code calling pm_runtime_get() as proposed around
>>> the IOMMU unmap path in the GPU driver and then having the
>>> runtime PM resume hook in the GPU driver return some special
>>> value to indicate that it didn't really resume because it didn't
>>> need to and to treat the device as runtime suspended but not
>>> return an error. Then the runtime PM core can keep track of that
>>> and try to power the GPU on again when another pm_runtime_get()
>>> is called on the GPU device.
>>>
>>> This keeps the consumer API the same, always pm_runtime_get(),
>>> but leaves the device driver logic of what to do when the GPU
>>> doesn't need to power on to the runtime PM hook where the driver
>>> has all the information.
>>>
>>> --
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>> a Linux Foundation Collaborative Project
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm"
>> in
>> the body of a message tomajordomo at vger.kernel.org
>> More majordomo info athttp://vger.kernel.org/majordomo-info.html
>
>
> --
> The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
>


end of thread, other threads:[~2017-11-28 20:05 UTC | newest]

Thread overview: 168+ messages
2017-07-06  9:36 [PATCH V4 0/6] iommu/arm-smmu: Add runtime pm/sleep support Vivek Gautam
2017-07-06  9:36 ` Vivek Gautam
2017-07-06  9:37 ` [PATCH V4 1/6] iommu/arm-smmu: Fix the error path in arm_smmu_add_device Vivek Gautam
2017-07-06  9:37   ` Vivek Gautam
2017-07-06  9:37 ` [PATCH V4 2/6] iommu/arm-smmu: Add pm_runtime/sleep ops Vivek Gautam
2017-07-06  9:37   ` Vivek Gautam
2017-07-06  9:37   ` Vivek Gautam
     [not found]   ` <1499333825-7658-3-git-send-email-vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-12 22:58     ` Stephen Boyd
2017-07-12 22:58       ` Stephen Boyd
2017-07-12 22:58       ` Stephen Boyd
     [not found]       ` <20170712225821.GB22780-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-12 23:01         ` Stephen Boyd
2017-07-12 23:01           ` Stephen Boyd
2017-07-12 23:01           ` Stephen Boyd
2017-07-13  3:57           ` Vivek Gautam
2017-07-13  3:57             ` Vivek Gautam
2017-07-06  9:37 ` [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device Vivek Gautam
2017-07-06  9:37   ` Vivek Gautam
     [not found]   ` <1499333825-7658-4-git-send-email-vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-12 22:54     ` Stephen Boyd
2017-07-12 22:54       ` Stephen Boyd
2017-07-12 22:54       ` Stephen Boyd
     [not found]       ` <20170712225459.GZ22780-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-13  5:13         ` Vivek Gautam
2017-07-13  5:13           ` Vivek Gautam
2017-07-13  5:13           ` Vivek Gautam
     [not found]           ` <5ee0bacd-e557-a6c4-a897-844fb12ea6ae-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-13  5:35             ` Sricharan R
2017-07-13  5:35               ` Sricharan R
2017-07-13  5:35               ` Sricharan R
     [not found]               ` <4dbc938c-ac88-9bd4-cf00-458008ae24c1-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-13 11:50                 ` Rob Clark
2017-07-13 11:50                   ` Rob Clark
2017-07-13 11:50                   ` Rob Clark
2017-07-13 11:50                   ` Rob Clark
     [not found]                   ` <CAF6AEGsFOtsOjt1sLNPSFLEcu-7d1zxCOhTeC+P8e0TDbb1dSA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-07-13 12:02                     ` Marek Szyprowski
2017-07-13 12:02                       ` Marek Szyprowski
2017-07-13 12:02                       ` Marek Szyprowski
2017-07-13 12:02                       ` Marek Szyprowski
2017-07-13 12:10                       ` Rob Clark
2017-07-13 12:10                         ` Rob Clark
2017-07-13 12:10                         ` Rob Clark
     [not found]                         ` <CAF6AEGsfDewRUHLUbFKT1Q+8U2BkmFMHo4ZBSwSGspU3ktUY8g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-07-13 12:23                           ` Marek Szyprowski
2017-07-13 12:23                             ` Marek Szyprowski
2017-07-13 12:23                             ` Marek Szyprowski
2017-07-13 12:23                             ` Marek Szyprowski
2017-07-13 13:53                     ` Sricharan R
2017-07-13 13:53                       ` Sricharan R
2017-07-13 13:53                       ` Sricharan R
2017-07-13 13:53                       ` Sricharan R
     [not found]                       ` <60a56ae6-ed9d-57cd-130f-5bd9d32d4d58-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-13 14:55                         ` Rob Clark
2017-07-13 14:55                           ` Rob Clark
2017-07-13 14:55                           ` Rob Clark
2017-07-13 14:55                           ` Rob Clark
2017-07-14 17:07                           ` Will Deacon
2017-07-14 17:07                             ` Will Deacon
2017-07-14 17:07                             ` Will Deacon
2017-07-14 17:42                             ` Rob Clark
2017-07-14 17:42                               ` Rob Clark
2017-07-14 17:42                               ` Rob Clark
2017-07-14 18:06                               ` Will Deacon
2017-07-14 18:06                                 ` Will Deacon
2017-07-14 18:06                                 ` Will Deacon
2017-07-14 18:25                                 ` Rob Clark
2017-07-14 18:25                                   ` Rob Clark
2017-07-14 18:25                                   ` Rob Clark
2017-07-14 19:01                                   ` Will Deacon
2017-07-14 19:01                                     ` Will Deacon
2017-07-14 19:01                                     ` Will Deacon
     [not found]                                     ` <20170714190113.GE26488-5wv7dgnIgG8@public.gmane.org>
2017-07-14 19:34                                       ` Rob Clark
2017-07-14 19:34                                         ` Rob Clark
2017-07-14 19:34                                         ` Rob Clark
2017-07-14 19:34                                         ` Rob Clark
2017-07-14 19:36                                         ` Will Deacon
2017-07-14 19:36                                           ` Will Deacon
2017-07-14 19:36                                           ` Will Deacon
2017-07-14 19:39                                           ` Rob Clark
2017-07-14 19:39                                             ` Rob Clark
2017-07-14 19:39                                             ` Rob Clark
2017-07-17 11:46                                             ` Sricharan R
2017-07-17 11:46                                               ` Sricharan R
2017-07-17 11:46                                               ` Sricharan R
     [not found]                                               ` <6cd287bb-25c0-a7bd-8d3c-a63b9da0fd25-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-17 12:28                                                 ` Sricharan R
2017-07-17 12:28                                                   ` Sricharan R
2017-07-17 12:28                                                   ` Sricharan R
2017-07-17 12:28                                                   ` Sricharan R
2017-07-24 15:31                                                   ` Vivek Gautam
2017-07-24 15:31                                                     ` Vivek Gautam
2017-07-24 15:31                                                     ` Vivek Gautam
     [not found]                                                     ` <CAFp+6iFfu2-qrDDim7fzKKLqMcSVMmOr7esqBZ-xEeLTOOTNLA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-08-02  9:53                                                       ` [PATCH] iommu/arm-smmu: Defer TLB flush in case of unmap op Vivek Gautam
2017-08-02  9:53                                                         ` Vivek Gautam
2017-08-02  9:53                                                         ` Vivek Gautam
2017-08-02 12:17                                                         ` Robin Murphy
2017-08-02 12:17                                                           ` Robin Murphy
     [not found]                                                           ` <35aeb7dd-4fe6-3175-2252-41c3c54873a9-5wv7dgnIgG8@public.gmane.org>
2017-08-03  5:35                                                             ` Vivek Gautam
2017-08-03  5:35                                                               ` Vivek Gautam
2017-08-03  5:35                                                               ` Vivek Gautam
2017-08-04 17:04                                                               ` Robin Murphy
2017-08-04 17:04                                                                 ` Robin Murphy
2017-08-07  7:44                                                                 ` Vivek Gautam
2017-08-07  7:44                                                                   ` Vivek Gautam
2017-08-07  7:44                                                                   ` Vivek Gautam
2017-09-06  5:37                                                                   ` [PATCH v2 1/1] " Vivek Gautam
     [not found]                                                                     ` <1504676255-15980-1-git-send-email-vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-09-13 11:04                                                                       ` Vivek Gautam
2017-10-13 19:08                                                                     ` Will Deacon
2017-11-20 17:17                                                                       ` Vivek Gautam
2017-08-07  8:27                     ` [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device Vivek Gautam
2017-08-07  8:27                       ` Vivek Gautam
2017-08-07  8:27                       ` Vivek Gautam
2017-08-07  8:27                       ` Vivek Gautam
2017-08-07 12:29                       ` Rob Clark
2017-08-07 12:29                         ` Rob Clark
2017-08-07 12:29                         ` Rob Clark
     [not found]                         ` <CAF6AEGsw2=nERuJ8UCBr_kTBS0TigaA9LL1Hxw3JmNiu4oycOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-11-14 18:30                           ` Vivek Gautam
2017-11-14 18:30                             ` Vivek Gautam
2017-11-14 18:30                             ` Vivek Gautam
2017-11-14 18:30                             ` Vivek Gautam
     [not found]                             ` <CAFp+6iGyB-iVb+vyDr6Dzk1FG6baiNy_kZWjB3sm_GViDh6rnQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-11-27 22:22                               ` Stephen Boyd
2017-11-27 22:22                                 ` Stephen Boyd
2017-11-27 22:22                                 ` Stephen Boyd
2017-11-27 22:22                                 ` Stephen Boyd
     [not found]                                 ` <20171127222238.GF18379-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-11-27 23:43                                   ` Rob Clark
2017-11-27 23:43                                     ` Rob Clark
2017-11-27 23:43                                     ` Rob Clark
2017-11-27 23:43                                     ` Rob Clark
2017-11-28 13:43                                     ` Vivek Gautam
2017-11-28 13:43                                       ` Vivek Gautam
2017-11-28 13:43                                       ` Vivek Gautam
     [not found]                                       ` <3a2f74e9-90cf-d843-d801-15eb614d7abe-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-11-28 20:05                                         ` Rob Clark
2017-11-28 20:05                                           ` Rob Clark
2017-11-28 20:05                                           ` Rob Clark
2017-11-28 20:05                                           ` Rob Clark
2017-07-13 13:57                 ` Vivek Gautam
2017-07-13 13:57                   ` Vivek Gautam
2017-07-13 13:57                   ` Vivek Gautam
2017-07-13 13:57                   ` Vivek Gautam
     [not found]                   ` <CAFp+6iFdogDfKbwWta3AMGu2GuZ9NaR+Dv373N7LwwrF5cFYwQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-07-13 14:01                     ` Vivek Gautam
2017-07-13 14:01                       ` Vivek Gautam
2017-07-13 14:01                       ` Vivek Gautam
2017-07-13 14:01                       ` Vivek Gautam
2017-07-13  6:48             ` Stephen Boyd
2017-07-13  6:48               ` Stephen Boyd
2017-07-13  6:48               ` Stephen Boyd
2017-07-13  9:50               ` Robin Murphy
2017-07-13  9:50                 ` Robin Murphy
2017-07-13 11:53                 ` Rob Clark
2017-07-13 11:53                   ` Rob Clark
2017-07-13 11:53                   ` Rob Clark
2017-07-06  9:37 ` [PATCH V4 4/6] iommu/arm-smmu: Add the device_link between masters and smmu Vivek Gautam
2017-07-06  9:37   ` Vivek Gautam
     [not found]   ` <1499333825-7658-5-git-send-email-vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-12 22:55     ` Stephen Boyd
2017-07-12 22:55       ` Stephen Boyd
2017-07-12 22:55       ` Stephen Boyd
     [not found]       ` <20170712225547.GA22780-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-13  3:59         ` Vivek Gautam
2017-07-13  3:59           ` Vivek Gautam
2017-07-13  3:59           ` Vivek Gautam
2017-07-06  9:37 ` [PATCH V4 5/6] iommu/arm-smmu: Add support for MMU40x/500 clocks Vivek Gautam
2017-07-06  9:37   ` Vivek Gautam
     [not found]   ` <1499333825-7658-6-git-send-email-vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-10  3:37     ` Rob Herring
2017-07-10  3:37       ` Rob Herring
2017-07-10  3:37       ` Rob Herring
2017-07-11  5:18       ` Vivek Gautam
2017-07-11  5:18         ` Vivek Gautam
     [not found] ` <1499333825-7658-1-git-send-email-vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-06  9:37   ` [PATCH V4 6/6] iommu/arm-smmu: Add support for qcom,msm8996-smmu-v2 clocks Vivek Gautam
2017-07-06  9:37     ` [PATCH V4 6/6] iommu/arm-smmu: Add support for qcom,msm8996-smmu-v2 clocks Vivek Gautam
2017-07-06  9:37     ` [PATCH V4 6/6] iommu/arm-smmu: Add support for qcom,msm8996-smmu-v2 clocks Vivek Gautam
     [not found]     ` <1499333825-7658-7-git-send-email-vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-10  3:40       ` Rob Herring
2017-07-10  3:40         ` Rob Herring
2017-07-10  3:40         ` Rob Herring
2017-07-10  6:42         ` Vivek Gautam
2017-07-10  6:42           ` Vivek Gautam
2017-07-10  6:42           ` Vivek Gautam
2017-07-10  6:42           ` Vivek Gautam
