linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs
@ 2020-12-03 19:24 Dmitry Osipenko
  2020-12-03 19:24 ` [PATCH v11 01/10] dt-bindings: memory: tegra20: emc: Document opp-supported-hw property Dmitry Osipenko
                   ` (10 more replies)
  0 siblings, 11 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2020-12-03 19:24 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Georgi Djakov, Rob Herring,
	Michael Turquette, Stephen Boyd, Peter De Schrijver,
	MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Mikko Perttunen,
	Viresh Kumar, Peter Geis, Nicolas Chauvet, Krzysztof Kozlowski
  Cc: linux-tegra, linux-pm, linux-kernel, dri-devel, devicetree

This series brings initial support for memory interconnect to Tegra20,
Tegra30 and Tegra124 SoCs.

For the starter only display controllers and devfreq devices are getting
interconnect API support, others could be supported later on. The display
controllers have the biggest demand for interconnect API right now because
dynamic memory frequency scaling can't be done safely without taking into
account bandwidth requirement from the displays. In particular this series
fixes distorted display output on T30 Ouya and T124 TK1 devices.

Changelog:

v11: - Added acks from Georgi Djakov.

     - CC'ed devicetree@vger.kernel.org in order to make the dt-bindings
       patch ("Document opp-supported-hw property") visible to the DT
       maintainers.

     - Rebased on a recent linux-next, note that some of v10 DT patches
       were already applied by Thierry into tegra tree, as well as one
       devfreq patch which removed the older tegra20-devfreq driver was
       applied by Chanwoo Choi to the devfreq tree.

     - Note that patch "memory: tegra20: Support hardware versioning and
       clean up OPP table initialization" fixes the noisy (but harmless)
       warning which happens because DT patches were applied before the
       code changes. This problem was spotted in current linux-next by
       Jonathan Hunter, hence could be worthwhile to pick up this patch
       for the v5.11.

v10 - In a longer run it will be much nicer if we could support EMC
      hardware versioning on Tegra20 and it's not late to support it now.
      Hence I added these new patches:

        dt-bindings: memory: tegra20: emc: Document opp-supported-hw property
        memory: tegra20: Support hardware versioning and clean up OPP table initialization

    - Removed error message from tegra30-devfreq driver about missing OPP
      properties in a device-tree because EMC driver already prints that
      message and it uses OPP API error code instead of checking DT directly,
      which is a more correct way of doing that.

v9: - Squashed "memory: tegra30-emc: Factor out clk initialization" into
      patch "tegra30: Support interconnect framework".
      Suggested by Krzysztof Kozlowski.

    - Improved Kconfig in the patch "memory: tegra124-emc: Make driver modular"
      by adding CONFIG_TEGRA124_CLK_EMC entry, which makes clk-driver changes
      to look a bit more cleaner. Suggested by Krzysztof Kozlowski.

    - Dropped voltage regulator support from ICC and DT patches for now
      because there is a new discussion about using a power domain abstraction
      for controlling the regulator, which is likely to happen.

    - Replaced direct "operating-points-v2" property checking in EMC drivers
      with checking of a returned error code from dev_pm_opp_of_add_table().
      Note that I haven't touched T20 EMC driver because it's very likely
      that we'll replace that code with a common helper soon anyways.
      Suggested by Viresh Kumar.

    - The T30 DT patches now include EMC OPP changes for Ouya board, which
      is available now in linux-next.

Dmitry Osipenko (10):
  dt-bindings: memory: tegra20: emc: Document opp-supported-hw property
  memory: tegra20: Support hardware versioning and clean up OPP table
    initialization
  memory: tegra30: Support interconnect framework
  memory: tegra124-emc: Make driver modular
  memory: tegra124-emc: Continue probing if timings are missing in
    device-tree
  memory: tegra124: Support interconnect framework
  drm/tegra: dc: Support memory bandwidth management
  drm/tegra: dc: Extend debug stats with total number of events
  PM / devfreq: tegra30: Support interconnect and OPPs from device-tree
  PM / devfreq: tegra30: Separate configurations per-SoC generation

 .../memory-controllers/nvidia,tegra20-emc.txt |   6 +
 drivers/clk/tegra/Kconfig                     |   3 +
 drivers/clk/tegra/Makefile                    |   2 +-
 drivers/clk/tegra/clk-tegra124-emc.c          |  41 +-
 drivers/clk/tegra/clk-tegra124.c              |  26 +-
 drivers/clk/tegra/clk.h                       |  18 +-
 drivers/devfreq/tegra30-devfreq.c             | 147 ++++---
 drivers/gpu/drm/tegra/Kconfig                 |   1 +
 drivers/gpu/drm/tegra/dc.c                    | 359 +++++++++++++++++
 drivers/gpu/drm/tegra/dc.h                    |  19 +
 drivers/gpu/drm/tegra/drm.c                   |  14 +
 drivers/gpu/drm/tegra/hub.c                   |   3 +
 drivers/gpu/drm/tegra/plane.c                 | 121 ++++++
 drivers/gpu/drm/tegra/plane.h                 |  15 +
 drivers/memory/tegra/Kconfig                  |   5 +-
 drivers/memory/tegra/tegra124-emc.c           | 377 ++++++++++++++++--
 drivers/memory/tegra/tegra124.c               |  82 +++-
 drivers/memory/tegra/tegra20-emc.c            |  48 +--
 drivers/memory/tegra/tegra30-emc.c            | 344 +++++++++++++++-
 drivers/memory/tegra/tegra30.c                | 173 +++++++-
 include/linux/clk/tegra.h                     |   8 +
 include/soc/tegra/emc.h                       |  16 -
 22 files changed, 1651 insertions(+), 177 deletions(-)
 delete mode 100644 include/soc/tegra/emc.h

-- 
2.29.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v11 01/10] dt-bindings: memory: tegra20: emc: Document opp-supported-hw property
  2020-12-03 19:24 [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Dmitry Osipenko
@ 2020-12-03 19:24 ` Dmitry Osipenko
  2020-12-04 15:46   ` Thierry Reding
  2020-12-09 20:17   ` Rob Herring
  2020-12-03 19:24 ` [PATCH v11 02/10] memory: tegra20: Support hardware versioning and clean up OPP table initialization Dmitry Osipenko
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2020-12-03 19:24 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Georgi Djakov, Rob Herring,
	Michael Turquette, Stephen Boyd, Peter De Schrijver,
	MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Mikko Perttunen,
	Viresh Kumar, Peter Geis, Nicolas Chauvet, Krzysztof Kozlowski
  Cc: linux-tegra, linux-pm, linux-kernel, dri-devel, devicetree

Document opp-supported-hw property, which is not strictly necessary to
have on Tegra20, but it's very convenient to have because all other SoC
core devices will use hardware versioning, and thus, it's good to maintain
the consistency.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 .../bindings/memory-controllers/nvidia,tegra20-emc.txt      | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-emc.txt b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-emc.txt
index 67ac8d1297da..cc443fcf4bec 100644
--- a/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-emc.txt
+++ b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-emc.txt
@@ -16,6 +16,12 @@ Properties:
 - #interconnect-cells : Should be 0.
 - operating-points-v2: See ../bindings/opp/opp.txt for details.
 
+For each opp entry in 'operating-points-v2' table:
+- opp-supported-hw: One bitfield indicating SoC process ID mask
+
+	A bitwise AND is performed against this value and if any bit
+	matches, the OPP gets enabled.
+
 Optional properties:
 - core-supply: Phandle of voltage regulator of the SoC "core" power domain.
 
-- 
2.29.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v11 02/10] memory: tegra20: Support hardware versioning and clean up OPP table initialization
  2020-12-03 19:24 [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Dmitry Osipenko
  2020-12-03 19:24 ` [PATCH v11 01/10] dt-bindings: memory: tegra20: emc: Document opp-supported-hw property Dmitry Osipenko
@ 2020-12-03 19:24 ` Dmitry Osipenko
  2020-12-04 16:20   ` Thierry Reding
  2020-12-03 19:24 ` [PATCH v11 03/10] memory: tegra30: Support interconnect framework Dmitry Osipenko
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Dmitry Osipenko @ 2020-12-03 19:24 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Georgi Djakov, Rob Herring,
	Michael Turquette, Stephen Boyd, Peter De Schrijver,
	MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Mikko Perttunen,
	Viresh Kumar, Peter Geis, Nicolas Chauvet, Krzysztof Kozlowski
  Cc: linux-tegra, linux-pm, linux-kernel, dri-devel, devicetree

Support hardware versioning, which is now required for Tegra20 EMC OPP.
Clean up OPP table initialization by using a error code returned by OPP
API for judging about the OPP table presence in a device-tree and remove
OPP regulator initialization because we're now going to use power domain
instead of a raw regulator. This puts Tegra20 EMC OPP preparation on par
with the Tegra30/124 EMC drivers.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/memory/tegra/tegra20-emc.c | 48 +++++++++++++-----------------
 1 file changed, 20 insertions(+), 28 deletions(-)

diff --git a/drivers/memory/tegra/tegra20-emc.c b/drivers/memory/tegra/tegra20-emc.c
index 0320d9df4a20..686aaf477d8a 100644
--- a/drivers/memory/tegra/tegra20-emc.c
+++ b/drivers/memory/tegra/tegra20-emc.c
@@ -910,43 +910,36 @@ static int tegra_emc_interconnect_init(struct tegra_emc *emc)
 
 static int tegra_emc_opp_table_init(struct tegra_emc *emc)
 {
-	struct opp_table *reg_opp_table = NULL, *clk_opp_table;
-	const char *rname = "core";
+	u32 hw_version = BIT(tegra_sku_info.soc_process_id);
+	struct opp_table *clk_opp_table, *hw_opp_table;
 	int err;
 
-	/*
-	 * Legacy device-trees don't have OPP table and EMC driver isn't
-	 * useful in this case.
-	 */
-	if (!device_property_present(emc->dev, "operating-points-v2")) {
-		dev_err(emc->dev,
-			"OPP table not found, please update your device tree\n");
-		return -ENODEV;
-	}
-
-	/* voltage scaling is optional */
-	if (device_property_present(emc->dev, "core-supply")) {
-		reg_opp_table = dev_pm_opp_set_regulators(emc->dev, &rname, 1);
-		if (IS_ERR(reg_opp_table))
-			return dev_err_probe(emc->dev, PTR_ERR(reg_opp_table),
-					     "failed to set OPP regulator\n");
-	}
-
 	clk_opp_table = dev_pm_opp_set_clkname(emc->dev, NULL);
 	err = PTR_ERR_OR_ZERO(clk_opp_table);
 	if (err) {
 		dev_err(emc->dev, "failed to set OPP clk: %d\n", err);
-		goto put_reg_table;
+		return err;
 	}
 
-	err = dev_pm_opp_of_add_table(emc->dev);
+	hw_opp_table = dev_pm_opp_set_supported_hw(emc->dev, &hw_version, 1);
+	err = PTR_ERR_OR_ZERO(hw_opp_table);
 	if (err) {
-		dev_err(emc->dev, "failed to add OPP table: %d\n", err);
+		dev_err(emc->dev, "failed to set OPP supported HW: %d\n", err);
 		goto put_clk_table;
 	}
 
-	dev_info(emc->dev, "current clock rate %lu MHz\n",
-		 clk_get_rate(emc->clk) / 1000000);
+	err = dev_pm_opp_of_add_table(emc->dev);
+	if (err) {
+		if (err == -ENODEV)
+			dev_err(emc->dev, "OPP table not found, please update your device tree\n");
+		else
+			dev_err(emc->dev, "failed to add OPP table: %d\n", err);
+
+		goto put_hw_table;
+	}
+
+	dev_info(emc->dev, "OPP HW ver. 0x%x, current clock rate %lu MHz\n",
+		 hw_version, clk_get_rate(emc->clk) / 1000000);
 
 	/* first dummy rate-set initializes voltage state */
 	err = dev_pm_opp_set_rate(emc->dev, clk_get_rate(emc->clk));
@@ -959,11 +952,10 @@ static int tegra_emc_opp_table_init(struct tegra_emc *emc)
 
 remove_table:
 	dev_pm_opp_of_remove_table(emc->dev);
+put_hw_table:
+	dev_pm_opp_put_supported_hw(hw_opp_table);
 put_clk_table:
 	dev_pm_opp_put_clkname(clk_opp_table);
-put_reg_table:
-	if (reg_opp_table)
-		dev_pm_opp_put_regulators(reg_opp_table);
 
 	return err;
 }
-- 
2.29.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v11 03/10] memory: tegra30: Support interconnect framework
  2020-12-03 19:24 [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Dmitry Osipenko
  2020-12-03 19:24 ` [PATCH v11 01/10] dt-bindings: memory: tegra20: emc: Document opp-supported-hw property Dmitry Osipenko
  2020-12-03 19:24 ` [PATCH v11 02/10] memory: tegra20: Support hardware versioning and clean up OPP table initialization Dmitry Osipenko
@ 2020-12-03 19:24 ` Dmitry Osipenko
  2020-12-04 16:27   ` Thierry Reding
  2020-12-03 19:24 ` [PATCH v11 04/10] memory: tegra124-emc: Make driver modular Dmitry Osipenko
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Dmitry Osipenko @ 2020-12-03 19:24 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Georgi Djakov, Rob Herring,
	Michael Turquette, Stephen Boyd, Peter De Schrijver,
	MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Mikko Perttunen,
	Viresh Kumar, Peter Geis, Nicolas Chauvet, Krzysztof Kozlowski
  Cc: linux-tegra, linux-pm, linux-kernel, dri-devel, devicetree

Now Internal and External memory controllers are memory interconnection
providers. This allows us to use interconnect API for tuning of memory
configuration. EMC driver now supports OPPs and DVFS. MC driver now
supports tuning of memory arbitration latency, which needs to be done
for ISO memory clients, like a Display client for example.

Tested-by: Peter Geis <pgwipeout@gmail.com>
Acked-by: Georgi Djakov <georgi.djakov@linaro.org>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/memory/tegra/Kconfig       |   1 +
 drivers/memory/tegra/tegra30-emc.c | 344 +++++++++++++++++++++++++++--
 drivers/memory/tegra/tegra30.c     | 173 ++++++++++++++-
 3 files changed, 496 insertions(+), 22 deletions(-)

diff --git a/drivers/memory/tegra/Kconfig b/drivers/memory/tegra/Kconfig
index 2a4a16bcf91c..ca7077a06f4c 100644
--- a/drivers/memory/tegra/Kconfig
+++ b/drivers/memory/tegra/Kconfig
@@ -24,6 +24,7 @@ config TEGRA30_EMC
 	tristate "NVIDIA Tegra30 External Memory Controller driver"
 	default y
 	depends on TEGRA_MC && ARCH_TEGRA_3x_SOC
+	select PM_OPP
 	help
 	  This driver is for the External Memory Controller (EMC) found on
 	  Tegra30 chips. The EMC controls the external DRAM on the board.
diff --git a/drivers/memory/tegra/tegra30-emc.c b/drivers/memory/tegra/tegra30-emc.c
index 93f9002d32ad..44ac155936aa 100644
--- a/drivers/memory/tegra/tegra30-emc.c
+++ b/drivers/memory/tegra/tegra30-emc.c
@@ -14,16 +14,21 @@
 #include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/err.h>
+#include <linux/interconnect-provider.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/iopoll.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/mutex.h>
 #include <linux/of_platform.h>
 #include <linux/platform_device.h>
+#include <linux/pm_opp.h>
+#include <linux/slab.h>
 #include <linux/sort.h>
 #include <linux/types.h>
 
+#include <soc/tegra/common.h>
 #include <soc/tegra/fuse.h>
 
 #include "mc.h"
@@ -323,9 +328,21 @@ struct emc_timing {
 	bool emc_cfg_dyn_self_ref;
 };
 
+enum emc_rate_request_type {
+	EMC_RATE_DEBUG,
+	EMC_RATE_ICC,
+	EMC_RATE_TYPE_MAX,
+};
+
+struct emc_rate_request {
+	unsigned long min_rate;
+	unsigned long max_rate;
+};
+
 struct tegra_emc {
 	struct device *dev;
 	struct tegra_mc *mc;
+	struct icc_provider provider;
 	struct notifier_block clk_nb;
 	struct clk *clk;
 	void __iomem *regs;
@@ -352,6 +369,15 @@ struct tegra_emc {
 		unsigned long min_rate;
 		unsigned long max_rate;
 	} debugfs;
+
+	/*
+	 * There are multiple sources in the EMC driver which could request
+	 * a min/max clock rate, these rates are contained in this array.
+	 */
+	struct emc_rate_request requested_rate[EMC_RATE_TYPE_MAX];
+
+	/* protect shared rate-change code path */
+	struct mutex rate_lock;
 };
 
 static int emc_seq_update_timing(struct tegra_emc *emc)
@@ -1094,6 +1120,83 @@ static long emc_round_rate(unsigned long rate,
 	return timing->rate;
 }
 
+static void tegra_emc_rate_requests_init(struct tegra_emc *emc)
+{
+	unsigned int i;
+
+	for (i = 0; i < EMC_RATE_TYPE_MAX; i++) {
+		emc->requested_rate[i].min_rate = 0;
+		emc->requested_rate[i].max_rate = ULONG_MAX;
+	}
+}
+
+static int emc_request_rate(struct tegra_emc *emc,
+			    unsigned long new_min_rate,
+			    unsigned long new_max_rate,
+			    enum emc_rate_request_type type)
+{
+	struct emc_rate_request *req = emc->requested_rate;
+	unsigned long min_rate = 0, max_rate = ULONG_MAX;
+	unsigned int i;
+	int err;
+
+	/* select minimum and maximum rates among the requested rates */
+	for (i = 0; i < EMC_RATE_TYPE_MAX; i++, req++) {
+		if (i == type) {
+			min_rate = max(new_min_rate, min_rate);
+			max_rate = min(new_max_rate, max_rate);
+		} else {
+			min_rate = max(req->min_rate, min_rate);
+			max_rate = min(req->max_rate, max_rate);
+		}
+	}
+
+	if (min_rate > max_rate) {
+		dev_err_ratelimited(emc->dev, "%s: type %u: out of range: %lu %lu\n",
+				    __func__, type, min_rate, max_rate);
+		return -ERANGE;
+	}
+
+	/*
+	 * EMC rate-changes should go via OPP API because it manages voltage
+	 * changes.
+	 */
+	err = dev_pm_opp_set_rate(emc->dev, min_rate);
+	if (err)
+		return err;
+
+	emc->requested_rate[type].min_rate = new_min_rate;
+	emc->requested_rate[type].max_rate = new_max_rate;
+
+	return 0;
+}
+
+static int emc_set_min_rate(struct tegra_emc *emc, unsigned long rate,
+			    enum emc_rate_request_type type)
+{
+	struct emc_rate_request *req = &emc->requested_rate[type];
+	int ret;
+
+	mutex_lock(&emc->rate_lock);
+	ret = emc_request_rate(emc, rate, req->max_rate, type);
+	mutex_unlock(&emc->rate_lock);
+
+	return ret;
+}
+
+static int emc_set_max_rate(struct tegra_emc *emc, unsigned long rate,
+			    enum emc_rate_request_type type)
+{
+	struct emc_rate_request *req = &emc->requested_rate[type];
+	int ret;
+
+	mutex_lock(&emc->rate_lock);
+	ret = emc_request_rate(emc, req->min_rate, rate, type);
+	mutex_unlock(&emc->rate_lock);
+
+	return ret;
+}
+
 /*
  * debugfs interface
  *
@@ -1177,7 +1280,7 @@ static int tegra_emc_debug_min_rate_set(void *data, u64 rate)
 	if (!tegra_emc_validate_rate(emc, rate))
 		return -EINVAL;
 
-	err = clk_set_min_rate(emc->clk, rate);
+	err = emc_set_min_rate(emc, rate, EMC_RATE_DEBUG);
 	if (err < 0)
 		return err;
 
@@ -1207,7 +1310,7 @@ static int tegra_emc_debug_max_rate_set(void *data, u64 rate)
 	if (!tegra_emc_validate_rate(emc, rate))
 		return -EINVAL;
 
-	err = clk_set_max_rate(emc->clk, rate);
+	err = emc_set_max_rate(emc, rate, EMC_RATE_DEBUG);
 	if (err < 0)
 		return err;
 
@@ -1264,6 +1367,214 @@ static void tegra_emc_debugfs_init(struct tegra_emc *emc)
 			    emc, &tegra_emc_debug_max_rate_fops);
 }
 
+static inline struct tegra_emc *
+to_tegra_emc_provider(struct icc_provider *provider)
+{
+	return container_of(provider, struct tegra_emc, provider);
+}
+
+static struct icc_node_data *
+emc_of_icc_xlate_extended(struct of_phandle_args *spec, void *data)
+{
+	struct icc_provider *provider = data;
+	struct icc_node_data *ndata;
+	struct icc_node *node;
+
+	/* External Memory is the only possible ICC route */
+	list_for_each_entry(node, &provider->nodes, node_list) {
+		if (node->id != TEGRA_ICC_EMEM)
+			continue;
+
+		ndata = kzalloc(sizeof(*ndata), GFP_KERNEL);
+		if (!ndata)
+			return ERR_PTR(-ENOMEM);
+
+		/*
+		 * SRC and DST nodes should have matching TAG in order to have
+		 * it set by default for a requested path.
+		 */
+		ndata->tag = TEGRA_MC_ICC_TAG_ISO;
+		ndata->node = node;
+
+		return ndata;
+	}
+
+	return ERR_PTR(-EPROBE_DEFER);
+}
+
+static int emc_icc_set(struct icc_node *src, struct icc_node *dst)
+{
+	struct tegra_emc *emc = to_tegra_emc_provider(dst->provider);
+	unsigned long long peak_bw = icc_units_to_bps(dst->peak_bw);
+	unsigned long long avg_bw = icc_units_to_bps(dst->avg_bw);
+	unsigned long long rate = max(avg_bw, peak_bw);
+	const unsigned int dram_data_bus_width_bytes = 4;
+	const unsigned int ddr = 2;
+	int err;
+
+	/*
+	 * Tegra30 EMC runs on a clock rate of SDRAM bus.  This means that
+	 * EMC clock rate is twice smaller than the peak data rate because
+	 * data is sampled on both EMC clock edges.
+	 */
+	do_div(rate, ddr * dram_data_bus_width_bytes);
+	rate = min_t(u64, rate, U32_MAX);
+
+	err = emc_set_min_rate(emc, rate, EMC_RATE_ICC);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int tegra_emc_interconnect_init(struct tegra_emc *emc)
+{
+	const struct tegra_mc_soc *soc = emc->mc->soc;
+	struct icc_node *node;
+	int err;
+
+	emc->provider.dev = emc->dev;
+	emc->provider.set = emc_icc_set;
+	emc->provider.data = &emc->provider;
+	emc->provider.aggregate = soc->icc_ops->aggregate;
+	emc->provider.xlate_extended = emc_of_icc_xlate_extended;
+
+	err = icc_provider_add(&emc->provider);
+	if (err)
+		goto err_msg;
+
+	/* create External Memory Controller node */
+	node = icc_node_create(TEGRA_ICC_EMC);
+	if (IS_ERR(node)) {
+		err = PTR_ERR(node);
+		goto del_provider;
+	}
+
+	node->name = "External Memory Controller";
+	icc_node_add(node, &emc->provider);
+
+	/* link External Memory Controller to External Memory (DRAM) */
+	err = icc_link_create(node, TEGRA_ICC_EMEM);
+	if (err)
+		goto remove_nodes;
+
+	/* create External Memory node */
+	node = icc_node_create(TEGRA_ICC_EMEM);
+	if (IS_ERR(node)) {
+		err = PTR_ERR(node);
+		goto remove_nodes;
+	}
+
+	node->name = "External Memory (DRAM)";
+	icc_node_add(node, &emc->provider);
+
+	return 0;
+
+remove_nodes:
+	icc_nodes_remove(&emc->provider);
+del_provider:
+	icc_provider_del(&emc->provider);
+err_msg:
+	dev_err(emc->dev, "failed to initialize ICC: %d\n", err);
+
+	return err;
+}
+
+static int tegra_emc_opp_table_init(struct tegra_emc *emc)
+{
+	u32 hw_version = BIT(tegra_sku_info.soc_speedo_id);
+	struct opp_table *clk_opp_table, *hw_opp_table;
+	int err;
+
+	clk_opp_table = dev_pm_opp_set_clkname(emc->dev, NULL);
+	err = PTR_ERR_OR_ZERO(clk_opp_table);
+	if (err) {
+		dev_err(emc->dev, "failed to set OPP clk: %d\n", err);
+		return err;
+	}
+
+	hw_opp_table = dev_pm_opp_set_supported_hw(emc->dev, &hw_version, 1);
+	err = PTR_ERR_OR_ZERO(hw_opp_table);
+	if (err) {
+		dev_err(emc->dev, "failed to set OPP supported HW: %d\n", err);
+		goto put_clk_table;
+	}
+
+	err = dev_pm_opp_of_add_table(emc->dev);
+	if (err) {
+		if (err == -ENODEV)
+			dev_err(emc->dev, "OPP table not found, please update your device tree\n");
+		else
+			dev_err(emc->dev, "failed to add OPP table: %d\n", err);
+
+		goto put_hw_table;
+	}
+
+	dev_info(emc->dev, "OPP HW ver. 0x%x, current clock rate %lu MHz\n",
+		 hw_version, clk_get_rate(emc->clk) / 1000000);
+
+	/* first dummy rate-set initializes voltage state */
+	err = dev_pm_opp_set_rate(emc->dev, clk_get_rate(emc->clk));
+	if (err) {
+		dev_err(emc->dev, "failed to initialize OPP clock: %d\n", err);
+		goto remove_table;
+	}
+
+	return 0;
+
+remove_table:
+	dev_pm_opp_of_remove_table(emc->dev);
+put_hw_table:
+	dev_pm_opp_put_supported_hw(hw_opp_table);
+put_clk_table:
+	dev_pm_opp_put_clkname(clk_opp_table);
+
+	return err;
+}
+
+static void devm_tegra_emc_unset_callback(void *data)
+{
+	tegra20_clk_set_emc_round_callback(NULL, NULL);
+}
+
+static void devm_tegra_emc_unreg_clk_notifier(void *data)
+{
+	struct tegra_emc *emc = data;
+
+	clk_notifier_unregister(emc->clk, &emc->clk_nb);
+}
+
+static int tegra_emc_init_clk(struct tegra_emc *emc)
+{
+	int err;
+
+	tegra20_clk_set_emc_round_callback(emc_round_rate, emc);
+
+	err = devm_add_action_or_reset(emc->dev, devm_tegra_emc_unset_callback,
+				       NULL);
+	if (err)
+		return err;
+
+	emc->clk = devm_clk_get(emc->dev, NULL);
+	if (IS_ERR(emc->clk)) {
+		dev_err(emc->dev, "failed to get EMC clock: %pe\n", emc->clk);
+		return PTR_ERR(emc->clk);
+	}
+
+	err = clk_notifier_register(emc->clk, &emc->clk_nb);
+	if (err) {
+		dev_err(emc->dev, "failed to register clk notifier: %d\n", err);
+		return err;
+	}
+
+	err = devm_add_action_or_reset(emc->dev,
+				       devm_tegra_emc_unreg_clk_notifier, emc);
+	if (err)
+		return err;
+
+	return 0;
+}
+
 static int tegra_emc_probe(struct platform_device *pdev)
 {
 	struct device_node *np;
@@ -1278,6 +1589,7 @@ static int tegra_emc_probe(struct platform_device *pdev)
 	if (IS_ERR(emc->mc))
 		return PTR_ERR(emc->mc);
 
+	mutex_init(&emc->rate_lock);
 	emc->clk_nb.notifier_call = emc_clk_change_notify;
 	emc->dev = &pdev->dev;
 
@@ -1310,24 +1622,18 @@ static int tegra_emc_probe(struct platform_device *pdev)
 		return err;
 	}
 
-	tegra20_clk_set_emc_round_callback(emc_round_rate, emc);
-
-	emc->clk = devm_clk_get(&pdev->dev, "emc");
-	if (IS_ERR(emc->clk)) {
-		err = PTR_ERR(emc->clk);
-		dev_err(&pdev->dev, "failed to get emc clock: %d\n", err);
-		goto unset_cb;
-	}
+	err = tegra_emc_init_clk(emc);
+	if (err)
+		return err;
 
-	err = clk_notifier_register(emc->clk, &emc->clk_nb);
-	if (err) {
-		dev_err(&pdev->dev, "failed to register clk notifier: %d\n",
-			err);
-		goto unset_cb;
-	}
+	err = tegra_emc_opp_table_init(emc);
+	if (err)
+		return err;
 
 	platform_set_drvdata(pdev, emc);
+	tegra_emc_rate_requests_init(emc);
 	tegra_emc_debugfs_init(emc);
+	tegra_emc_interconnect_init(emc);
 
 	/*
 	 * Don't allow the kernel module to be unloaded. Unloading adds some
@@ -1337,11 +1643,6 @@ static int tegra_emc_probe(struct platform_device *pdev)
 	try_module_get(THIS_MODULE);
 
 	return 0;
-
-unset_cb:
-	tegra20_clk_set_emc_round_callback(NULL, NULL);
-
-	return err;
 }
 
 static int tegra_emc_suspend(struct device *dev)
@@ -1395,6 +1696,7 @@ static struct platform_driver tegra_emc_driver = {
 		.of_match_table = tegra_emc_of_match,
 		.pm = &tegra_emc_pm_ops,
 		.suppress_bind_attrs = true,
+		.sync_state = icc_sync_state,
 	},
 };
 module_platform_driver(tegra_emc_driver);
diff --git a/drivers/memory/tegra/tegra30.c b/drivers/memory/tegra/tegra30.c
index d0314f29608d..ea849003014b 100644
--- a/drivers/memory/tegra/tegra30.c
+++ b/drivers/memory/tegra/tegra30.c
@@ -4,7 +4,8 @@
  */
 
 #include <linux/of.h>
-#include <linux/mm.h>
+#include <linux/of_device.h>
+#include <linux/slab.h>
 
 #include <dt-bindings/memory/tegra30-mc.h>
 
@@ -1083,6 +1084,175 @@ static const struct tegra_mc_reset tegra30_mc_resets[] = {
 	TEGRA30_MC_RESET(VI,       0x200, 0x204, 17),
 };
 
+static void tegra30_mc_tune_client_latency(struct tegra_mc *mc,
+					   const struct tegra_mc_client *client,
+					   unsigned int bandwidth_mbytes_sec)
+{
+	u32 arb_tolerance_compensation_nsec, arb_tolerance_compensation_div;
+	const struct tegra_mc_la *la = &client->la;
+	unsigned int fifo_size = client->fifo_size;
+	u32 arb_nsec, la_ticks, value;
+
+	/* see 18.4.1 Client Configuration in Tegra3 TRM v03p */
+	if (bandwidth_mbytes_sec)
+		arb_nsec = fifo_size * NSEC_PER_USEC / bandwidth_mbytes_sec;
+	else
+		arb_nsec = U32_MAX;
+
+	/*
+	 * Latency allowness should be set with consideration for the module's
+	 * latency tolerance and internal buffering capabilities.
+	 *
+	 * Display memory clients use isochronous transfers and have very low
+	 * tolerance to a belated transfers. Hence we need to compensate the
+	 * memory arbitration imperfection for them in order to prevent FIFO
+	 * underflow condition when memory bus is busy.
+	 *
+	 * VI clients also need a stronger compensation.
+	 */
+	switch (client->swgroup) {
+	case TEGRA_SWGROUP_MPCORE:
+	case TEGRA_SWGROUP_PTC:
+		/*
+		 * We always want lower latency for these clients, hence
+		 * don't touch them.
+		 */
+		return;
+
+	case TEGRA_SWGROUP_DC:
+	case TEGRA_SWGROUP_DCB:
+		arb_tolerance_compensation_nsec = 1050;
+		arb_tolerance_compensation_div = 2;
+		break;
+
+	case TEGRA_SWGROUP_VI:
+		arb_tolerance_compensation_nsec = 1050;
+		arb_tolerance_compensation_div = 1;
+		break;
+
+	default:
+		arb_tolerance_compensation_nsec = 150;
+		arb_tolerance_compensation_div = 1;
+		break;
+	}
+
+	if (arb_nsec > arb_tolerance_compensation_nsec)
+		arb_nsec -= arb_tolerance_compensation_nsec;
+	else
+		arb_nsec = 0;
+
+	arb_nsec /= arb_tolerance_compensation_div;
+
+	/*
+	 * Latency allowance is a number of ticks a request from a particular
+	 * client may wait in the EMEM arbiter before it becomes a high-priority
+	 * request.
+	 */
+	la_ticks = arb_nsec / mc->tick;
+	la_ticks = min(la_ticks, la->mask);
+
+	value = mc_readl(mc, la->reg);
+	value &= ~(la->mask << la->shift);
+	value |= la_ticks << la->shift;
+	mc_writel(mc, value, la->reg);
+}
+
+static int tegra30_mc_icc_set(struct icc_node *src, struct icc_node *dst)
+{
+	struct tegra_mc *mc = icc_provider_to_tegra_mc(src->provider);
+	const struct tegra_mc_client *client = &mc->soc->clients[src->id];
+	u64 peak_bandwidth = icc_units_to_bps(src->peak_bw);
+
+	/*
+	 * Skip pre-initialization that is done by icc_node_add(), which sets
+	 * bandwidth to maximum for all clients before drivers are loaded.
+	 *
+	 * This doesn't make sense for us because we don't have drivers for all
+	 * clients and it's okay to keep configuration left from bootloader
+	 * during boot, at least for today.
+	 */
+	if (src == dst)
+		return 0;
+
+	/* convert bytes/sec to megabytes/sec */
+	do_div(peak_bandwidth, 1000000);
+
+	tegra30_mc_tune_client_latency(mc, client, peak_bandwidth);
+
+	return 0;
+}
+
+static int tegra30_mc_icc_aggreate(struct icc_node *node, u32 tag, u32 avg_bw,
+				   u32 peak_bw, u32 *agg_avg, u32 *agg_peak)
+{
+	/*
+	 * ISO clients need to reserve extra bandwidth up-front because
+	 * there could be high bandwidth pressure during initial filling
+	 * of the client's FIFO buffers.  Secondly, we need to take into
+	 * account impurities of the memory subsystem.
+	 */
+	if (tag & TEGRA_MC_ICC_TAG_ISO)
+		peak_bw = tegra_mc_scale_percents(peak_bw, 400);
+
+	*agg_avg += avg_bw;
+	*agg_peak = max(*agg_peak, peak_bw);
+
+	return 0;
+}
+
+static struct icc_node_data *
+tegra30_mc_of_icc_xlate_extended(struct of_phandle_args *spec, void *data)
+{
+	struct tegra_mc *mc = icc_provider_to_tegra_mc(data);
+	const struct tegra_mc_client *client;
+	unsigned int i, idx = spec->args[0];
+	struct icc_node_data *ndata;
+	struct icc_node *node;
+
+	list_for_each_entry(node, &mc->provider.nodes, node_list) {
+		if (node->id != idx)
+			continue;
+
+		ndata = kzalloc(sizeof(*ndata), GFP_KERNEL);
+		if (!ndata)
+			return ERR_PTR(-ENOMEM);
+
+		client = &mc->soc->clients[idx];
+		ndata->node = node;
+
+		switch (client->swgroup) {
+		case TEGRA_SWGROUP_DC:
+		case TEGRA_SWGROUP_DCB:
+		case TEGRA_SWGROUP_PTC:
+		case TEGRA_SWGROUP_VI:
+			/* these clients are isochronous by default */
+			ndata->tag = TEGRA_MC_ICC_TAG_ISO;
+			break;
+
+		default:
+			ndata->tag = TEGRA_MC_ICC_TAG_DEFAULT;
+			break;
+		}
+
+		return ndata;
+	}
+
+	for (i = 0; i < mc->soc->num_clients; i++) {
+		if (mc->soc->clients[i].id == idx)
+			return ERR_PTR(-EPROBE_DEFER);
+	}
+
+	dev_err(mc->dev, "invalid ICC client ID %u\n", idx);
+
+	return ERR_PTR(-EINVAL);
+}
+
+static const struct tegra_mc_icc_ops tegra30_mc_icc_ops = {
+	.xlate_extended = tegra30_mc_of_icc_xlate_extended,
+	.aggregate = tegra30_mc_icc_aggreate,
+	.set = tegra30_mc_icc_set,
+};
+
 const struct tegra_mc_soc tegra30_mc_soc = {
 	.clients = tegra30_mc_clients,
 	.num_clients = ARRAY_SIZE(tegra30_mc_clients),
@@ -1097,4 +1267,5 @@ const struct tegra_mc_soc tegra30_mc_soc = {
 	.reset_ops = &tegra_mc_reset_ops_common,
 	.resets = tegra30_mc_resets,
 	.num_resets = ARRAY_SIZE(tegra30_mc_resets),
+	.icc_ops = &tegra30_mc_icc_ops,
 };
-- 
2.29.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v11 04/10] memory: tegra124-emc: Make driver modular
  2020-12-03 19:24 [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Dmitry Osipenko
                   ` (2 preceding siblings ...)
  2020-12-03 19:24 ` [PATCH v11 03/10] memory: tegra30: Support interconnect framework Dmitry Osipenko
@ 2020-12-03 19:24 ` Dmitry Osipenko
  2020-12-04 16:41   ` Thierry Reding
  2020-12-03 19:24 ` [PATCH v11 05/10] memory: tegra124-emc: Continue probing if timings are missing in device-tree Dmitry Osipenko
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Dmitry Osipenko @ 2020-12-03 19:24 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Georgi Djakov, Rob Herring,
	Michael Turquette, Stephen Boyd, Peter De Schrijver,
	MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Mikko Perttunen,
	Viresh Kumar, Peter Geis, Nicolas Chauvet, Krzysztof Kozlowski
  Cc: linux-tegra, linux-pm, linux-kernel, dri-devel, devicetree

Add modularization support to the Tegra124 EMC driver, which now can be
compiled as a loadable kernel module.

Note that EMC clock must be registered at clk-init time, otherwise PLLM
will be disabled as unused clock at boot time if EMC driver is compiled
as a module. Hence add a prepare/complete callbacks. similarly to what is
done for the Tegra20/30 EMC drivers.

Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/clk/tegra/Kconfig            |  3 ++
 drivers/clk/tegra/Makefile           |  2 +-
 drivers/clk/tegra/clk-tegra124-emc.c | 41 ++++++++++++++++++++++++----
 drivers/clk/tegra/clk-tegra124.c     | 26 ++++++++++++++++--
 drivers/clk/tegra/clk.h              | 18 ++++++++----
 drivers/memory/tegra/Kconfig         |  3 +-
 drivers/memory/tegra/tegra124-emc.c  | 31 ++++++++++++++-------
 include/linux/clk/tegra.h            |  8 ++++++
 include/soc/tegra/emc.h              | 16 -----------
 9 files changed, 106 insertions(+), 42 deletions(-)
 delete mode 100644 include/soc/tegra/emc.h

diff --git a/drivers/clk/tegra/Kconfig b/drivers/clk/tegra/Kconfig
index deaa4605824c..90df619dc087 100644
--- a/drivers/clk/tegra/Kconfig
+++ b/drivers/clk/tegra/Kconfig
@@ -7,3 +7,6 @@ config TEGRA_CLK_DFLL
 	depends on ARCH_TEGRA_124_SOC || ARCH_TEGRA_210_SOC
 	select PM_OPP
 	def_bool y
+
+config TEGRA124_CLK_EMC
+	bool
diff --git a/drivers/clk/tegra/Makefile b/drivers/clk/tegra/Makefile
index eec2313fd37e..7b1816856eb5 100644
--- a/drivers/clk/tegra/Makefile
+++ b/drivers/clk/tegra/Makefile
@@ -22,7 +22,7 @@ obj-$(CONFIG_ARCH_TEGRA_3x_SOC)		+= clk-tegra20-emc.o
 obj-$(CONFIG_ARCH_TEGRA_114_SOC)	+= clk-tegra114.o
 obj-$(CONFIG_ARCH_TEGRA_124_SOC)	+= clk-tegra124.o
 obj-$(CONFIG_TEGRA_CLK_DFLL)		+= clk-tegra124-dfll-fcpu.o
-obj-$(CONFIG_TEGRA124_EMC)		+= clk-tegra124-emc.o
+obj-$(CONFIG_TEGRA124_CLK_EMC)		+= clk-tegra124-emc.o
 obj-$(CONFIG_ARCH_TEGRA_132_SOC)	+= clk-tegra124.o
 obj-y					+= cvb.o
 obj-$(CONFIG_ARCH_TEGRA_210_SOC)	+= clk-tegra210.o
diff --git a/drivers/clk/tegra/clk-tegra124-emc.c b/drivers/clk/tegra/clk-tegra124-emc.c
index 745f9faa98d8..bdf6f4a51617 100644
--- a/drivers/clk/tegra/clk-tegra124-emc.c
+++ b/drivers/clk/tegra/clk-tegra124-emc.c
@@ -11,7 +11,9 @@
 #include <linux/clk-provider.h>
 #include <linux/clk.h>
 #include <linux/clkdev.h>
+#include <linux/clk/tegra.h>
 #include <linux/delay.h>
+#include <linux/export.h>
 #include <linux/io.h>
 #include <linux/module.h>
 #include <linux/of_address.h>
@@ -21,7 +23,6 @@
 #include <linux/string.h>
 
 #include <soc/tegra/fuse.h>
-#include <soc/tegra/emc.h>
 
 #include "clk.h"
 
@@ -80,6 +81,9 @@ struct tegra_clk_emc {
 	int num_timings;
 	struct emc_timing *timings;
 	spinlock_t *lock;
+
+	tegra124_emc_prepare_timing_change_cb *prepare_timing_change;
+	tegra124_emc_complete_timing_change_cb *complete_timing_change;
 };
 
 /* Common clock framework callback implementations */
@@ -176,6 +180,9 @@ static struct tegra_emc *emc_ensure_emc_driver(struct tegra_clk_emc *tegra)
 	if (tegra->emc)
 		return tegra->emc;
 
+	if (!tegra->prepare_timing_change || !tegra->complete_timing_change)
+		return NULL;
+
 	if (!tegra->emc_node)
 		return NULL;
 
@@ -241,7 +248,7 @@ static int emc_set_timing(struct tegra_clk_emc *tegra,
 
 	div = timing->parent_rate / (timing->rate / 2) - 2;
 
-	err = tegra_emc_prepare_timing_change(emc, timing->rate);
+	err = tegra->prepare_timing_change(emc, timing->rate);
 	if (err)
 		return err;
 
@@ -259,7 +266,7 @@ static int emc_set_timing(struct tegra_clk_emc *tegra,
 
 	spin_unlock_irqrestore(tegra->lock, flags);
 
-	tegra_emc_complete_timing_change(emc, timing->rate);
+	tegra->complete_timing_change(emc, timing->rate);
 
 	clk_hw_reparent(&tegra->hw, __clk_get_hw(timing->parent));
 	clk_disable_unprepare(tegra->prev_parent);
@@ -473,8 +480,8 @@ static const struct clk_ops tegra_clk_emc_ops = {
 	.get_parent = emc_get_parent,
 };
 
-struct clk *tegra_clk_register_emc(void __iomem *base, struct device_node *np,
-				   spinlock_t *lock)
+struct clk *tegra124_clk_register_emc(void __iomem *base, struct device_node *np,
+				      spinlock_t *lock)
 {
 	struct tegra_clk_emc *tegra;
 	struct clk_init_data init;
@@ -538,3 +545,27 @@ struct clk *tegra_clk_register_emc(void __iomem *base, struct device_node *np,
 
 	return clk;
 };
+
+void tegra124_clk_set_emc_callbacks(tegra124_emc_prepare_timing_change_cb *prep_cb,
+				    tegra124_emc_complete_timing_change_cb *complete_cb)
+{
+	struct clk *clk = __clk_lookup("emc");
+	struct tegra_clk_emc *tegra;
+	struct clk_hw *hw;
+
+	if (clk) {
+		hw = __clk_get_hw(clk);
+		tegra = container_of(hw, struct tegra_clk_emc, hw);
+
+		tegra->prepare_timing_change = prep_cb;
+		tegra->complete_timing_change = complete_cb;
+	}
+}
+EXPORT_SYMBOL_GPL(tegra124_clk_set_emc_callbacks);
+
+bool tegra124_clk_emc_driver_available(struct clk_hw *hw)
+{
+	struct tegra_clk_emc *tegra = container_of(hw, struct tegra_clk_emc, hw);
+
+	return tegra->prepare_timing_change && tegra->complete_timing_change;
+}
diff --git a/drivers/clk/tegra/clk-tegra124.c b/drivers/clk/tegra/clk-tegra124.c
index e931319dcc9d..934520aab6e3 100644
--- a/drivers/clk/tegra/clk-tegra124.c
+++ b/drivers/clk/tegra/clk-tegra124.c
@@ -1500,6 +1500,26 @@ static void __init tegra124_132_clock_init_pre(struct device_node *np)
 	writel(plld_base, clk_base + PLLD_BASE);
 }
 
+static struct clk *tegra124_clk_src_onecell_get(struct of_phandle_args *clkspec,
+						void *data)
+{
+	struct clk_hw *hw;
+	struct clk *clk;
+
+	clk = of_clk_src_onecell_get(clkspec, data);
+	if (IS_ERR(clk))
+		return clk;
+
+	hw = __clk_get_hw(clk);
+
+	if (clkspec->args[0] == TEGRA124_CLK_EMC) {
+		if (!tegra124_clk_emc_driver_available(hw))
+			return ERR_PTR(-EPROBE_DEFER);
+	}
+
+	return clk;
+}
+
 /**
  * tegra124_132_clock_init_post - clock initialization postamble for T124/T132
  * @np: struct device_node * of the DT node for the SoC CAR IP block
@@ -1516,10 +1536,10 @@ static void __init tegra124_132_clock_init_post(struct device_node *np)
 				  &pll_x_params);
 	tegra_init_special_resets(1, tegra124_reset_assert,
 				  tegra124_reset_deassert);
-	tegra_add_of_provider(np, of_clk_src_onecell_get);
+	tegra_add_of_provider(np, tegra124_clk_src_onecell_get);
 
-	clks[TEGRA124_CLK_EMC] = tegra_clk_register_emc(clk_base, np,
-							&emc_lock);
+	clks[TEGRA124_CLK_EMC] = tegra124_clk_register_emc(clk_base, np,
+							   &emc_lock);
 
 	tegra_register_devclks(devclks, ARRAY_SIZE(devclks));
 
diff --git a/drivers/clk/tegra/clk.h b/drivers/clk/tegra/clk.h
index 6b565f6b5f66..c3e36b5dcc75 100644
--- a/drivers/clk/tegra/clk.h
+++ b/drivers/clk/tegra/clk.h
@@ -881,16 +881,22 @@ void tegra_super_clk_gen5_init(void __iomem *clk_base,
 			void __iomem *pmc_base, struct tegra_clk *tegra_clks,
 			struct tegra_clk_pll_params *pll_params);
 
-#ifdef CONFIG_TEGRA124_EMC
-struct clk *tegra_clk_register_emc(void __iomem *base, struct device_node *np,
-				   spinlock_t *lock);
+#ifdef CONFIG_TEGRA124_CLK_EMC
+struct clk *tegra124_clk_register_emc(void __iomem *base, struct device_node *np,
+				      spinlock_t *lock);
+bool tegra124_clk_emc_driver_available(struct clk_hw *emc_hw);
 #else
-static inline struct clk *tegra_clk_register_emc(void __iomem *base,
-						 struct device_node *np,
-						 spinlock_t *lock)
+static inline struct clk *
+tegra124_clk_register_emc(void __iomem *base, struct device_node *np,
+			  spinlock_t *lock)
 {
 	return NULL;
 }
+
+static inline bool tegra124_clk_emc_driver_available(struct clk_hw *emc_hw)
+{
+	return false;
+}
 #endif
 
 void tegra114_clock_tune_cpu_trimmers_high(void);
diff --git a/drivers/memory/tegra/Kconfig b/drivers/memory/tegra/Kconfig
index ca7077a06f4c..f5b451403c58 100644
--- a/drivers/memory/tegra/Kconfig
+++ b/drivers/memory/tegra/Kconfig
@@ -32,9 +32,10 @@ config TEGRA30_EMC
 	  external memory.
 
 config TEGRA124_EMC
-	bool "NVIDIA Tegra124 External Memory Controller driver"
+	tristate "NVIDIA Tegra124 External Memory Controller driver"
 	default y
 	depends on TEGRA_MC && ARCH_TEGRA_124_SOC
+	select TEGRA124_CLK_EMC
 	help
 	  This driver is for the External Memory Controller (EMC) found on
 	  Tegra124 chips. The EMC controls the external DRAM on the board.
diff --git a/drivers/memory/tegra/tegra124-emc.c b/drivers/memory/tegra/tegra124-emc.c
index ee8ee39e98ed..edfbf6d6d357 100644
--- a/drivers/memory/tegra/tegra124-emc.c
+++ b/drivers/memory/tegra/tegra124-emc.c
@@ -9,16 +9,17 @@
 #include <linux/clk-provider.h>
 #include <linux/clk.h>
 #include <linux/clkdev.h>
+#include <linux/clk/tegra.h>
 #include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/io.h>
+#include <linux/module.h>
 #include <linux/of_address.h>
 #include <linux/of_platform.h>
 #include <linux/platform_device.h>
 #include <linux/sort.h>
 #include <linux/string.h>
 
-#include <soc/tegra/emc.h>
 #include <soc/tegra/fuse.h>
 #include <soc/tegra/mc.h>
 
@@ -562,8 +563,8 @@ static struct emc_timing *tegra_emc_find_timing(struct tegra_emc *emc,
 	return timing;
 }
 
-int tegra_emc_prepare_timing_change(struct tegra_emc *emc,
-				    unsigned long rate)
+static int tegra_emc_prepare_timing_change(struct tegra_emc *emc,
+					   unsigned long rate)
 {
 	struct emc_timing *timing = tegra_emc_find_timing(emc, rate);
 	struct emc_timing *last = &emc->last_timing;
@@ -790,8 +791,8 @@ int tegra_emc_prepare_timing_change(struct tegra_emc *emc,
 	return 0;
 }
 
-void tegra_emc_complete_timing_change(struct tegra_emc *emc,
-				      unsigned long rate)
+static void tegra_emc_complete_timing_change(struct tegra_emc *emc,
+					     unsigned long rate)
 {
 	struct emc_timing *timing = tegra_emc_find_timing(emc, rate);
 	struct emc_timing *last = &emc->last_timing;
@@ -987,6 +988,7 @@ static const struct of_device_id tegra_emc_of_match[] = {
 	{ .compatible = "nvidia,tegra132-emc" },
 	{}
 };
+MODULE_DEVICE_TABLE(of, tegra_emc_of_match);
 
 static struct device_node *
 tegra_emc_find_node_by_ram_code(struct device_node *node, u32 ram_code)
@@ -1226,9 +1228,19 @@ static int tegra_emc_probe(struct platform_device *pdev)
 
 	platform_set_drvdata(pdev, emc);
 
+	tegra124_clk_set_emc_callbacks(tegra_emc_prepare_timing_change,
+				       tegra_emc_complete_timing_change);
+
 	if (IS_ENABLED(CONFIG_DEBUG_FS))
 		emc_debugfs_init(&pdev->dev, emc);
 
+	/*
+	 * Don't allow the kernel module to be unloaded. Unloading adds some
+	 * extra complexity which doesn't really worth the effort in a case of
+	 * this driver.
+	 */
+	try_module_get(THIS_MODULE);
+
 	return 0;
 };
 
@@ -1240,9 +1252,8 @@ static struct platform_driver tegra_emc_driver = {
 		.suppress_bind_attrs = true,
 	},
 };
+module_platform_driver(tegra_emc_driver);
 
-static int tegra_emc_init(void)
-{
-	return platform_driver_register(&tegra_emc_driver);
-}
-subsys_initcall(tegra_emc_init);
+MODULE_AUTHOR("Mikko Perttunen <mperttunen@nvidia.com>");
+MODULE_DESCRIPTION("NVIDIA Tegra124 EMC driver");
+MODULE_LICENSE("GPL v2");
diff --git a/include/linux/clk/tegra.h b/include/linux/clk/tegra.h
index 3f01d43f0598..eb016fc9cc0b 100644
--- a/include/linux/clk/tegra.h
+++ b/include/linux/clk/tegra.h
@@ -136,6 +136,7 @@ extern void tegra210_clk_emc_dll_update_setting(u32 emc_dll_src_value);
 extern void tegra210_clk_emc_update_setting(u32 emc_src_value);
 
 struct clk;
+struct tegra_emc;
 
 typedef long (tegra20_clk_emc_round_cb)(unsigned long rate,
 					unsigned long min_rate,
@@ -146,6 +147,13 @@ void tegra20_clk_set_emc_round_callback(tegra20_clk_emc_round_cb *round_cb,
 					void *cb_arg);
 int tegra20_clk_prepare_emc_mc_same_freq(struct clk *emc_clk, bool same);
 
+typedef int (tegra124_emc_prepare_timing_change_cb)(struct tegra_emc *emc,
+						    unsigned long rate);
+typedef void (tegra124_emc_complete_timing_change_cb)(struct tegra_emc *emc,
+						      unsigned long rate);
+void tegra124_clk_set_emc_callbacks(tegra124_emc_prepare_timing_change_cb *prep_cb,
+				    tegra124_emc_complete_timing_change_cb *complete_cb);
+
 struct tegra210_clk_emc_config {
 	unsigned long rate;
 	bool same_freq;
diff --git a/include/soc/tegra/emc.h b/include/soc/tegra/emc.h
deleted file mode 100644
index 05199a97ccf4..000000000000
--- a/include/soc/tegra/emc.h
+++ /dev/null
@@ -1,16 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (c) 2014 NVIDIA Corporation. All rights reserved.
- */
-
-#ifndef __SOC_TEGRA_EMC_H__
-#define __SOC_TEGRA_EMC_H__
-
-struct tegra_emc;
-
-int tegra_emc_prepare_timing_change(struct tegra_emc *emc,
-				    unsigned long rate);
-void tegra_emc_complete_timing_change(struct tegra_emc *emc,
-				      unsigned long rate);
-
-#endif /* __SOC_TEGRA_EMC_H__ */
-- 
2.29.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v11 05/10] memory: tegra124-emc: Continue probing if timings are missing in device-tree
  2020-12-03 19:24 [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Dmitry Osipenko
                   ` (3 preceding siblings ...)
  2020-12-03 19:24 ` [PATCH v11 04/10] memory: tegra124-emc: Make driver modular Dmitry Osipenko
@ 2020-12-03 19:24 ` Dmitry Osipenko
  2020-12-03 19:24 ` [PATCH v11 06/10] memory: tegra124: Support interconnect framework Dmitry Osipenko
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2020-12-03 19:24 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Georgi Djakov, Rob Herring,
	Michael Turquette, Stephen Boyd, Peter De Schrijver,
	MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Mikko Perttunen,
	Viresh Kumar, Peter Geis, Nicolas Chauvet, Krzysztof Kozlowski
  Cc: linux-tegra, linux-pm, linux-kernel, dri-devel, devicetree

EMC driver will become mandatory after turning it into interconnect
provider because interconnect users, like display controller driver, will
fail to probe using newer device-trees that have interconnect properties.
Thus make EMC driver to probe even if timings are missing in device-tree.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/memory/tegra/tegra124-emc.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/drivers/memory/tegra/tegra124-emc.c b/drivers/memory/tegra/tegra124-emc.c
index edfbf6d6d357..8fb8c1af25c9 100644
--- a/drivers/memory/tegra/tegra124-emc.c
+++ b/drivers/memory/tegra/tegra124-emc.c
@@ -1201,23 +1201,15 @@ static int tegra_emc_probe(struct platform_device *pdev)
 	ram_code = tegra_read_ram_code();
 
 	np = tegra_emc_find_node_by_ram_code(pdev->dev.of_node, ram_code);
-	if (!np) {
-		dev_err(&pdev->dev,
-			"no memory timings for RAM code %u found in DT\n",
-			ram_code);
-		return -ENOENT;
-	}
-
-	err = tegra_emc_load_timings_from_dt(emc, np);
-	of_node_put(np);
-	if (err)
-		return err;
-
-	if (emc->num_timings == 0) {
-		dev_err(&pdev->dev,
-			"no memory timings for RAM code %u registered\n",
-			ram_code);
-		return -ENOENT;
+	if (np) {
+		err = tegra_emc_load_timings_from_dt(emc, np);
+		of_node_put(np);
+		if (err)
+			return err;
+	} else {
+		dev_info(&pdev->dev,
+			 "no memory timings for RAM code %u found in DT\n",
+			 ram_code);
 	}
 
 	err = emc_init(emc);
-- 
2.29.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v11 06/10] memory: tegra124: Support interconnect framework
  2020-12-03 19:24 [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Dmitry Osipenko
                   ` (4 preceding siblings ...)
  2020-12-03 19:24 ` [PATCH v11 05/10] memory: tegra124-emc: Continue probing if timings are missing in device-tree Dmitry Osipenko
@ 2020-12-03 19:24 ` Dmitry Osipenko
  2020-12-03 19:24 ` [PATCH v11 07/10] drm/tegra: dc: Support memory bandwidth management Dmitry Osipenko
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2020-12-03 19:24 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Georgi Djakov, Rob Herring,
	Michael Turquette, Stephen Boyd, Peter De Schrijver,
	MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Mikko Perttunen,
	Viresh Kumar, Peter Geis, Nicolas Chauvet, Krzysztof Kozlowski
  Cc: linux-tegra, linux-pm, linux-kernel, dri-devel, devicetree

Now Internal and External memory controllers are memory interconnection
providers. This allows us to use interconnect API for tuning of memory
configuration. EMC driver now supports OPPs and DVFS.

Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Acked-by: Georgi Djakov <georgi.djakov@linaro.org>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/memory/tegra/Kconfig        |   1 +
 drivers/memory/tegra/tegra124-emc.c | 320 +++++++++++++++++++++++++++-
 drivers/memory/tegra/tegra124.c     |  82 ++++++-
 3 files changed, 391 insertions(+), 12 deletions(-)

diff --git a/drivers/memory/tegra/Kconfig b/drivers/memory/tegra/Kconfig
index f5b451403c58..a70967a56e52 100644
--- a/drivers/memory/tegra/Kconfig
+++ b/drivers/memory/tegra/Kconfig
@@ -36,6 +36,7 @@ config TEGRA124_EMC
 	default y
 	depends on TEGRA_MC && ARCH_TEGRA_124_SOC
 	select TEGRA124_CLK_EMC
+	select PM_OPP
 	help
 	  This driver is for the External Memory Controller (EMC) found on
 	  Tegra124 chips. The EMC controls the external DRAM on the board.
diff --git a/drivers/memory/tegra/tegra124-emc.c b/drivers/memory/tegra/tegra124-emc.c
index 8fb8c1af25c9..9706550bdd5d 100644
--- a/drivers/memory/tegra/tegra124-emc.c
+++ b/drivers/memory/tegra/tegra124-emc.c
@@ -12,20 +12,26 @@
 #include <linux/clk/tegra.h>
 #include <linux/debugfs.h>
 #include <linux/delay.h>
+#include <linux/interconnect-provider.h>
 #include <linux/io.h>
 #include <linux/module.h>
+#include <linux/mutex.h>
 #include <linux/of_address.h>
 #include <linux/of_platform.h>
 #include <linux/platform_device.h>
+#include <linux/pm_opp.h>
 #include <linux/sort.h>
 #include <linux/string.h>
 
 #include <soc/tegra/fuse.h>
 #include <soc/tegra/mc.h>
 
+#include "mc.h"
+
 #define EMC_FBIO_CFG5				0x104
 #define	EMC_FBIO_CFG5_DRAM_TYPE_MASK		0x3
 #define	EMC_FBIO_CFG5_DRAM_TYPE_SHIFT		0
+#define EMC_FBIO_CFG5_DRAM_WIDTH_X64		BIT(4)
 
 #define EMC_INTSTATUS				0x0
 #define EMC_INTSTATUS_CLKCHANGE_COMPLETE	BIT(4)
@@ -461,6 +467,17 @@ struct emc_timing {
 	u32 emc_zcal_interval;
 };
 
+enum emc_rate_request_type {
+	EMC_RATE_DEBUG,
+	EMC_RATE_ICC,
+	EMC_RATE_TYPE_MAX,
+};
+
+struct emc_rate_request {
+	unsigned long min_rate;
+	unsigned long max_rate;
+};
+
 struct tegra_emc {
 	struct device *dev;
 
@@ -471,6 +488,7 @@ struct tegra_emc {
 	struct clk *clk;
 
 	enum emc_dram_type dram_type;
+	unsigned int dram_bus_width;
 	unsigned int dram_num;
 
 	struct emc_timing last_timing;
@@ -482,6 +500,17 @@ struct tegra_emc {
 		unsigned long min_rate;
 		unsigned long max_rate;
 	} debugfs;
+
+	struct icc_provider provider;
+
+	/*
+	 * There are multiple sources in the EMC driver which could request
+	 * a min/max clock rate, these rates are contained in this array.
+	 */
+	struct emc_rate_request requested_rate[EMC_RATE_TYPE_MAX];
+
+	/* protect shared rate-change code path */
+	struct mutex rate_lock;
 };
 
 /* Timing change sequence functions */
@@ -870,6 +899,14 @@ static void emc_read_current_timing(struct tegra_emc *emc,
 static int emc_init(struct tegra_emc *emc)
 {
 	emc->dram_type = readl(emc->regs + EMC_FBIO_CFG5);
+
+	if (emc->dram_type & EMC_FBIO_CFG5_DRAM_WIDTH_X64)
+		emc->dram_bus_width = 64;
+	else
+		emc->dram_bus_width = 32;
+
+	dev_info(emc->dev, "%ubit DRAM bus\n", emc->dram_bus_width);
+
 	emc->dram_type &= EMC_FBIO_CFG5_DRAM_TYPE_MASK;
 	emc->dram_type >>= EMC_FBIO_CFG5_DRAM_TYPE_SHIFT;
 
@@ -1009,6 +1046,83 @@ tegra_emc_find_node_by_ram_code(struct device_node *node, u32 ram_code)
 	return NULL;
 }
 
+static void tegra_emc_rate_requests_init(struct tegra_emc *emc)
+{
+	unsigned int i;
+
+	for (i = 0; i < EMC_RATE_TYPE_MAX; i++) {
+		emc->requested_rate[i].min_rate = 0;
+		emc->requested_rate[i].max_rate = ULONG_MAX;
+	}
+}
+
+static int emc_request_rate(struct tegra_emc *emc,
+			    unsigned long new_min_rate,
+			    unsigned long new_max_rate,
+			    enum emc_rate_request_type type)
+{
+	struct emc_rate_request *req = emc->requested_rate;
+	unsigned long min_rate = 0, max_rate = ULONG_MAX;
+	unsigned int i;
+	int err;
+
+	/* select minimum and maximum rates among the requested rates */
+	for (i = 0; i < EMC_RATE_TYPE_MAX; i++, req++) {
+		if (i == type) {
+			min_rate = max(new_min_rate, min_rate);
+			max_rate = min(new_max_rate, max_rate);
+		} else {
+			min_rate = max(req->min_rate, min_rate);
+			max_rate = min(req->max_rate, max_rate);
+		}
+	}
+
+	if (min_rate > max_rate) {
+		dev_err_ratelimited(emc->dev, "%s: type %u: out of range: %lu %lu\n",
+				    __func__, type, min_rate, max_rate);
+		return -ERANGE;
+	}
+
+	/*
+	 * EMC rate-changes should go via OPP API because it manages voltage
+	 * changes.
+	 */
+	err = dev_pm_opp_set_rate(emc->dev, min_rate);
+	if (err)
+		return err;
+
+	emc->requested_rate[type].min_rate = new_min_rate;
+	emc->requested_rate[type].max_rate = new_max_rate;
+
+	return 0;
+}
+
+static int emc_set_min_rate(struct tegra_emc *emc, unsigned long rate,
+			    enum emc_rate_request_type type)
+{
+	struct emc_rate_request *req = &emc->requested_rate[type];
+	int ret;
+
+	mutex_lock(&emc->rate_lock);
+	ret = emc_request_rate(emc, rate, req->max_rate, type);
+	mutex_unlock(&emc->rate_lock);
+
+	return ret;
+}
+
+static int emc_set_max_rate(struct tegra_emc *emc, unsigned long rate,
+			    enum emc_rate_request_type type)
+{
+	struct emc_rate_request *req = &emc->requested_rate[type];
+	int ret;
+
+	mutex_lock(&emc->rate_lock);
+	ret = emc_request_rate(emc, req->min_rate, rate, type);
+	mutex_unlock(&emc->rate_lock);
+
+	return ret;
+}
+
 /*
  * debugfs interface
  *
@@ -1081,7 +1195,7 @@ static int tegra_emc_debug_min_rate_set(void *data, u64 rate)
 	if (!tegra_emc_validate_rate(emc, rate))
 		return -EINVAL;
 
-	err = clk_set_min_rate(emc->clk, rate);
+	err = emc_set_min_rate(emc, rate, EMC_RATE_DEBUG);
 	if (err < 0)
 		return err;
 
@@ -1111,7 +1225,7 @@ static int tegra_emc_debug_max_rate_set(void *data, u64 rate)
 	if (!tegra_emc_validate_rate(emc, rate))
 		return -EINVAL;
 
-	err = clk_set_max_rate(emc->clk, rate);
+	err = emc_set_max_rate(emc, rate, EMC_RATE_DEBUG);
 	if (err < 0)
 		return err;
 
@@ -1129,15 +1243,6 @@ static void emc_debugfs_init(struct device *dev, struct tegra_emc *emc)
 	unsigned int i;
 	int err;
 
-	emc->clk = devm_clk_get(dev, "emc");
-	if (IS_ERR(emc->clk)) {
-		if (PTR_ERR(emc->clk) != -ENODEV) {
-			dev_err(dev, "failed to get EMC clock: %ld\n",
-				PTR_ERR(emc->clk));
-			return;
-		}
-	}
-
 	emc->debugfs.min_rate = ULONG_MAX;
 	emc->debugfs.max_rate = 0;
 
@@ -1177,6 +1282,177 @@ static void emc_debugfs_init(struct device *dev, struct tegra_emc *emc)
 			    emc, &tegra_emc_debug_max_rate_fops);
 }
 
+static inline struct tegra_emc *
+to_tegra_emc_provider(struct icc_provider *provider)
+{
+	return container_of(provider, struct tegra_emc, provider);
+}
+
+static struct icc_node_data *
+emc_of_icc_xlate_extended(struct of_phandle_args *spec, void *data)
+{
+	struct icc_provider *provider = data;
+	struct icc_node_data *ndata;
+	struct icc_node *node;
+
+	/* External Memory is the only possible ICC route */
+	list_for_each_entry(node, &provider->nodes, node_list) {
+		if (node->id != TEGRA_ICC_EMEM)
+			continue;
+
+		ndata = kzalloc(sizeof(*ndata), GFP_KERNEL);
+		if (!ndata)
+			return ERR_PTR(-ENOMEM);
+
+		/*
+		 * SRC and DST nodes should have matching TAG in order to have
+		 * it set by default for a requested path.
+		 */
+		ndata->tag = TEGRA_MC_ICC_TAG_ISO;
+		ndata->node = node;
+
+		return ndata;
+	}
+
+	return ERR_PTR(-EPROBE_DEFER);
+}
+
+static int emc_icc_set(struct icc_node *src, struct icc_node *dst)
+{
+	struct tegra_emc *emc = to_tegra_emc_provider(dst->provider);
+	unsigned long long peak_bw = icc_units_to_bps(dst->peak_bw);
+	unsigned long long avg_bw = icc_units_to_bps(dst->avg_bw);
+	unsigned long long rate = max(avg_bw, peak_bw);
+	unsigned int dram_data_bus_width_bytes;
+	const unsigned int ddr = 2;
+	int err;
+
+	/*
+	 * Tegra124 EMC runs on a clock rate of SDRAM bus. This means that
+	 * EMC clock rate is twice smaller than the peak data rate because
+	 * data is sampled on both EMC clock edges.
+	 */
+	dram_data_bus_width_bytes = emc->dram_bus_width / 8;
+	do_div(rate, ddr * dram_data_bus_width_bytes);
+	rate = min_t(u64, rate, U32_MAX);
+
+	err = emc_set_min_rate(emc, rate, EMC_RATE_ICC);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int tegra_emc_interconnect_init(struct tegra_emc *emc)
+{
+	const struct tegra_mc_soc *soc = emc->mc->soc;
+	struct icc_node *node;
+	int err;
+
+	emc->provider.dev = emc->dev;
+	emc->provider.set = emc_icc_set;
+	emc->provider.data = &emc->provider;
+	emc->provider.aggregate = soc->icc_ops->aggregate;
+	emc->provider.xlate_extended = emc_of_icc_xlate_extended;
+
+	err = icc_provider_add(&emc->provider);
+	if (err)
+		goto err_msg;
+
+	/* create External Memory Controller node */
+	node = icc_node_create(TEGRA_ICC_EMC);
+	if (IS_ERR(node)) {
+		err = PTR_ERR(node);
+		goto del_provider;
+	}
+
+	node->name = "External Memory Controller";
+	icc_node_add(node, &emc->provider);
+
+	/* link External Memory Controller to External Memory (DRAM) */
+	err = icc_link_create(node, TEGRA_ICC_EMEM);
+	if (err)
+		goto remove_nodes;
+
+	/* create External Memory node */
+	node = icc_node_create(TEGRA_ICC_EMEM);
+	if (IS_ERR(node)) {
+		err = PTR_ERR(node);
+		goto remove_nodes;
+	}
+
+	node->name = "External Memory (DRAM)";
+	icc_node_add(node, &emc->provider);
+
+	return 0;
+
+remove_nodes:
+	icc_nodes_remove(&emc->provider);
+del_provider:
+	icc_provider_del(&emc->provider);
+err_msg:
+	dev_err(emc->dev, "failed to initialize ICC: %d\n", err);
+
+	return err;
+}
+
+static int tegra_emc_opp_table_init(struct tegra_emc *emc)
+{
+	u32 hw_version = BIT(tegra_sku_info.soc_speedo_id);
+	struct opp_table *clk_opp_table, *hw_opp_table;
+	int err;
+
+	clk_opp_table = dev_pm_opp_set_clkname(emc->dev, NULL);
+	err = PTR_ERR_OR_ZERO(clk_opp_table);
+	if (err) {
+		dev_err(emc->dev, "failed to set OPP clk: %d\n", err);
+		return err;
+	}
+
+	hw_opp_table = dev_pm_opp_set_supported_hw(emc->dev, &hw_version, 1);
+	err = PTR_ERR_OR_ZERO(hw_opp_table);
+	if (err) {
+		dev_err(emc->dev, "failed to set OPP supported HW: %d\n", err);
+		goto put_clk_table;
+	}
+
+	err = dev_pm_opp_of_add_table(emc->dev);
+	if (err) {
+		if (err == -ENODEV)
+			dev_err(emc->dev, "OPP table not found, please update your device tree\n");
+		else
+			dev_err(emc->dev, "failed to add OPP table: %d\n", err);
+
+		goto put_hw_table;
+	}
+
+	dev_info(emc->dev, "OPP HW ver. 0x%x, current clock rate %lu MHz\n",
+		 hw_version, clk_get_rate(emc->clk) / 1000000);
+
+	/* first dummy rate-set initializes voltage state */
+	err = dev_pm_opp_set_rate(emc->dev, clk_get_rate(emc->clk));
+	if (err) {
+		dev_err(emc->dev, "failed to initialize OPP clock: %d\n", err);
+		goto remove_table;
+	}
+
+	return 0;
+
+remove_table:
+	dev_pm_opp_of_remove_table(emc->dev);
+put_hw_table:
+	dev_pm_opp_put_supported_hw(hw_opp_table);
+put_clk_table:
+	dev_pm_opp_put_clkname(clk_opp_table);
+
+	return err;
+}
+
+static void devm_tegra_emc_unset_callback(void *data)
+{
+	tegra124_clk_set_emc_callbacks(NULL, NULL);
+}
+
 static int tegra_emc_probe(struct platform_device *pdev)
 {
 	struct device_node *np;
@@ -1188,6 +1464,7 @@ static int tegra_emc_probe(struct platform_device *pdev)
 	if (!emc)
 		return -ENOMEM;
 
+	mutex_init(&emc->rate_lock);
 	emc->dev = &pdev->dev;
 
 	emc->regs = devm_platform_ioremap_resource(pdev, 0);
@@ -1223,9 +1500,29 @@ static int tegra_emc_probe(struct platform_device *pdev)
 	tegra124_clk_set_emc_callbacks(tegra_emc_prepare_timing_change,
 				       tegra_emc_complete_timing_change);
 
+	err = devm_add_action_or_reset(&pdev->dev, devm_tegra_emc_unset_callback,
+				       NULL);
+	if (err)
+		return err;
+
+	emc->clk = devm_clk_get(&pdev->dev, "emc");
+	if (IS_ERR(emc->clk)) {
+		err = PTR_ERR(emc->clk);
+		dev_err(&pdev->dev, "failed to get EMC clock: %d\n", err);
+		return err;
+	}
+
+	err = tegra_emc_opp_table_init(emc);
+	if (err)
+		return err;
+
+	tegra_emc_rate_requests_init(emc);
+
 	if (IS_ENABLED(CONFIG_DEBUG_FS))
 		emc_debugfs_init(&pdev->dev, emc);
 
+	tegra_emc_interconnect_init(emc);
+
 	/*
 	 * Don't allow the kernel module to be unloaded. Unloading adds some
 	 * extra complexity which doesn't really worth the effort in a case of
@@ -1242,6 +1539,7 @@ static struct platform_driver tegra_emc_driver = {
 		.name = "tegra-emc",
 		.of_match_table = tegra_emc_of_match,
 		.suppress_bind_attrs = true,
+		.sync_state = icc_sync_state,
 	},
 };
 module_platform_driver(tegra_emc_driver);
diff --git a/drivers/memory/tegra/tegra124.c b/drivers/memory/tegra/tegra124.c
index e2389573d3c0..459211f50c08 100644
--- a/drivers/memory/tegra/tegra124.c
+++ b/drivers/memory/tegra/tegra124.c
@@ -4,7 +4,8 @@
  */
 
 #include <linux/of.h>
-#include <linux/mm.h>
+#include <linux/of_device.h>
+#include <linux/slab.h>
 
 #include <dt-bindings/memory/tegra124-mc.h>
 
@@ -1010,6 +1011,83 @@ static const struct tegra_mc_reset tegra124_mc_resets[] = {
 	TEGRA124_MC_RESET(GPU,       0x970, 0x974,  2),
 };
 
+static int tegra124_mc_icc_set(struct icc_node *src, struct icc_node *dst)
+{
+	/* TODO: program PTSA */
+	return 0;
+}
+
+static int tegra124_mc_icc_aggreate(struct icc_node *node, u32 tag, u32 avg_bw,
+				    u32 peak_bw, u32 *agg_avg, u32 *agg_peak)
+{
+	/*
+	 * ISO clients need to reserve extra bandwidth up-front because
+	 * there could be high bandwidth pressure during initial filling
+	 * of the client's FIFO buffers.  Secondly, we need to take into
+	 * account impurities of the memory subsystem.
+	 */
+	if (tag & TEGRA_MC_ICC_TAG_ISO)
+		peak_bw = tegra_mc_scale_percents(peak_bw, 400);
+
+	*agg_avg += avg_bw;
+	*agg_peak = max(*agg_peak, peak_bw);
+
+	return 0;
+}
+
+static struct icc_node_data *
+tegra124_mc_of_icc_xlate_extended(struct of_phandle_args *spec, void *data)
+{
+	struct tegra_mc *mc = icc_provider_to_tegra_mc(data);
+	const struct tegra_mc_client *client;
+	unsigned int i, idx = spec->args[0];
+	struct icc_node_data *ndata;
+	struct icc_node *node;
+
+	list_for_each_entry(node, &mc->provider.nodes, node_list) {
+		if (node->id != idx)
+			continue;
+
+		ndata = kzalloc(sizeof(*ndata), GFP_KERNEL);
+		if (!ndata)
+			return ERR_PTR(-ENOMEM);
+
+		client = &mc->soc->clients[idx];
+		ndata->node = node;
+
+		switch (client->swgroup) {
+		case TEGRA_SWGROUP_DC:
+		case TEGRA_SWGROUP_DCB:
+		case TEGRA_SWGROUP_PTC:
+		case TEGRA_SWGROUP_VI:
+			/* these clients are isochronous by default */
+			ndata->tag = TEGRA_MC_ICC_TAG_ISO;
+			break;
+
+		default:
+			ndata->tag = TEGRA_MC_ICC_TAG_DEFAULT;
+			break;
+		}
+
+		return ndata;
+	}
+
+	for (i = 0; i < mc->soc->num_clients; i++) {
+		if (mc->soc->clients[i].id == idx)
+			return ERR_PTR(-EPROBE_DEFER);
+	}
+
+	dev_err(mc->dev, "invalid ICC client ID %u\n", idx);
+
+	return ERR_PTR(-EINVAL);
+}
+
+static const struct tegra_mc_icc_ops tegra124_mc_icc_ops = {
+	.xlate_extended = tegra124_mc_of_icc_xlate_extended,
+	.aggregate = tegra124_mc_icc_aggreate,
+	.set = tegra124_mc_icc_set,
+};
+
 #ifdef CONFIG_ARCH_TEGRA_124_SOC
 static const unsigned long tegra124_mc_emem_regs[] = {
 	MC_EMEM_ARB_CFG,
@@ -1061,6 +1139,7 @@ const struct tegra_mc_soc tegra124_mc_soc = {
 	.reset_ops = &tegra_mc_reset_ops_common,
 	.resets = tegra124_mc_resets,
 	.num_resets = ARRAY_SIZE(tegra124_mc_resets),
+	.icc_ops = &tegra124_mc_icc_ops,
 };
 #endif /* CONFIG_ARCH_TEGRA_124_SOC */
 
@@ -1091,5 +1170,6 @@ const struct tegra_mc_soc tegra132_mc_soc = {
 	.reset_ops = &tegra_mc_reset_ops_common,
 	.resets = tegra124_mc_resets,
 	.num_resets = ARRAY_SIZE(tegra124_mc_resets),
+	.icc_ops = &tegra124_mc_icc_ops,
 };
 #endif /* CONFIG_ARCH_TEGRA_132_SOC */
-- 
2.29.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v11 07/10] drm/tegra: dc: Support memory bandwidth management
  2020-12-03 19:24 [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Dmitry Osipenko
                   ` (5 preceding siblings ...)
  2020-12-03 19:24 ` [PATCH v11 06/10] memory: tegra124: Support interconnect framework Dmitry Osipenko
@ 2020-12-03 19:24 ` Dmitry Osipenko
  2020-12-03 19:24 ` [PATCH v11 08/10] drm/tegra: dc: Extend debug stats with total number of events Dmitry Osipenko
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2020-12-03 19:24 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Georgi Djakov, Rob Herring,
	Michael Turquette, Stephen Boyd, Peter De Schrijver,
	MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Mikko Perttunen,
	Viresh Kumar, Peter Geis, Nicolas Chauvet, Krzysztof Kozlowski
  Cc: linux-tegra, linux-pm, linux-kernel, dri-devel, devicetree

Display controller (DC) performs isochronous memory transfers, and thus,
has a requirement for a minimum memory bandwidth that shall be fulfilled,
otherwise framebuffer data can't be fetched fast enough and this results
in a DC's data-FIFO underflow that follows by a visual corruption.

The Memory Controller drivers provide facility for memory bandwidth
management via interconnect API. Let's wire up the interconnect API
support to the DC driver in order to fix the distorted display output
on T30 Ouya, T124 TK1 and other Tegra devices.

Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/gpu/drm/tegra/Kconfig |   1 +
 drivers/gpu/drm/tegra/dc.c    | 349 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/tegra/dc.h    |  14 ++
 drivers/gpu/drm/tegra/drm.c   |  14 ++
 drivers/gpu/drm/tegra/hub.c   |   3 +
 drivers/gpu/drm/tegra/plane.c | 121 ++++++++++++
 drivers/gpu/drm/tegra/plane.h |  15 ++
 7 files changed, 517 insertions(+)

diff --git a/drivers/gpu/drm/tegra/Kconfig b/drivers/gpu/drm/tegra/Kconfig
index 5043dcaf1cf9..1650a448eabd 100644
--- a/drivers/gpu/drm/tegra/Kconfig
+++ b/drivers/gpu/drm/tegra/Kconfig
@@ -9,6 +9,7 @@ config DRM_TEGRA
 	select DRM_MIPI_DSI
 	select DRM_PANEL
 	select TEGRA_HOST1X
+	select INTERCONNECT
 	select IOMMU_IOVA
 	select CEC_CORE if CEC_NOTIFIER
 	help
diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index 85dd7131553a..5c587cfd1bb2 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -8,6 +8,7 @@
 #include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/iommu.h>
+#include <linux/interconnect.h>
 #include <linux/module.h>
 #include <linux/of_device.h>
 #include <linux/pm_runtime.h>
@@ -616,6 +617,9 @@ static int tegra_plane_atomic_check(struct drm_plane *plane,
 	struct tegra_dc *dc = to_tegra_dc(state->crtc);
 	int err;
 
+	plane_state->peak_memory_bandwidth = 0;
+	plane_state->avg_memory_bandwidth = 0;
+
 	/* no need for further checks if the plane is being disabled */
 	if (!state->crtc)
 		return 0;
@@ -802,6 +806,12 @@ static struct drm_plane *tegra_primary_plane_create(struct drm_device *drm,
 	formats = dc->soc->primary_formats;
 	modifiers = dc->soc->modifiers;
 
+	err = tegra_plane_interconnect_init(plane);
+	if (err) {
+		kfree(plane);
+		return ERR_PTR(err);
+	}
+
 	err = drm_universal_plane_init(drm, &plane->base, possible_crtcs,
 				       &tegra_plane_funcs, formats,
 				       num_formats, modifiers, type, NULL);
@@ -833,9 +843,13 @@ static const u32 tegra_cursor_plane_formats[] = {
 static int tegra_cursor_atomic_check(struct drm_plane *plane,
 				     struct drm_plane_state *state)
 {
+	struct tegra_plane_state *plane_state = to_tegra_plane_state(state);
 	struct tegra_plane *tegra = to_tegra_plane(plane);
 	int err;
 
+	plane_state->peak_memory_bandwidth = 0;
+	plane_state->avg_memory_bandwidth = 0;
+
 	/* no need for further checks if the plane is being disabled */
 	if (!state->crtc)
 		return 0;
@@ -973,6 +987,12 @@ static struct drm_plane *tegra_dc_cursor_plane_create(struct drm_device *drm,
 	num_formats = ARRAY_SIZE(tegra_cursor_plane_formats);
 	formats = tegra_cursor_plane_formats;
 
+	err = tegra_plane_interconnect_init(plane);
+	if (err) {
+		kfree(plane);
+		return ERR_PTR(err);
+	}
+
 	err = drm_universal_plane_init(drm, &plane->base, possible_crtcs,
 				       &tegra_plane_funcs, formats,
 				       num_formats, NULL,
@@ -1087,6 +1107,12 @@ static struct drm_plane *tegra_dc_overlay_plane_create(struct drm_device *drm,
 	num_formats = dc->soc->num_overlay_formats;
 	formats = dc->soc->overlay_formats;
 
+	err = tegra_plane_interconnect_init(plane);
+	if (err) {
+		kfree(plane);
+		return ERR_PTR(err);
+	}
+
 	if (!cursor)
 		type = DRM_PLANE_TYPE_OVERLAY;
 	else
@@ -1204,6 +1230,7 @@ tegra_crtc_atomic_duplicate_state(struct drm_crtc *crtc)
 {
 	struct tegra_dc_state *state = to_dc_state(crtc->state);
 	struct tegra_dc_state *copy;
+	unsigned int i;
 
 	copy = kmalloc(sizeof(*copy), GFP_KERNEL);
 	if (!copy)
@@ -1215,6 +1242,9 @@ tegra_crtc_atomic_duplicate_state(struct drm_crtc *crtc)
 	copy->div = state->div;
 	copy->planes = state->planes;
 
+	for (i = 0; i < ARRAY_SIZE(state->plane_peak_bw); i++)
+		copy->plane_peak_bw[i] = state->plane_peak_bw[i];
+
 	return &copy->base;
 }
 
@@ -1741,6 +1771,106 @@ static int tegra_dc_wait_idle(struct tegra_dc *dc, unsigned long timeout)
 	return -ETIMEDOUT;
 }
 
+static void
+tegra_crtc_update_memory_bandwidth(struct drm_crtc *crtc,
+				   struct drm_atomic_state *state,
+				   bool prepare_bandwidth_transition)
+{
+	const struct tegra_plane_state *old_tegra_state, *new_tegra_state;
+	const struct tegra_dc_state *old_dc_state, *new_dc_state;
+	u32 i, new_avg_bw, old_avg_bw, new_peak_bw, old_peak_bw;
+	const struct drm_plane_state *old_plane_state;
+	const struct drm_crtc_state *old_crtc_state;
+	struct tegra_dc_window window, old_window;
+	struct tegra_dc *dc = to_tegra_dc(crtc);
+	struct tegra_plane *tegra;
+	struct drm_plane *plane;
+
+	if (dc->soc->has_nvdisplay)
+		return;
+
+	old_crtc_state = drm_atomic_get_old_crtc_state(state, crtc);
+	old_dc_state = to_const_dc_state(old_crtc_state);
+	new_dc_state = to_const_dc_state(crtc->state);
+
+	if (!crtc->state->active) {
+		if (!old_crtc_state->active)
+			return;
+
+		/*
+		 * When CRTC is disabled on DPMS, the state of attached planes
+		 * is kept unchanged. Hence we need to enforce removal of the
+		 * bandwidths from the ICC paths.
+		 */
+		drm_atomic_crtc_for_each_plane(plane, crtc) {
+			tegra = to_tegra_plane(plane);
+
+			icc_set_bw(tegra->icc_mem, 0, 0);
+			icc_set_bw(tegra->icc_mem_vfilter, 0, 0);
+		}
+
+		return;
+	}
+
+	for_each_old_plane_in_state(old_crtc_state->state, plane,
+				    old_plane_state, i) {
+		old_tegra_state = to_const_tegra_plane_state(old_plane_state);
+		new_tegra_state = to_const_tegra_plane_state(plane->state);
+		tegra = to_tegra_plane(plane);
+
+		/*
+		 * We're iterating over the global atomic state and it contains
+		 * planes from another CRTC, hence we need to filter out the
+		 * planes unrelated to this CRTC.
+		 */
+		if (tegra->dc != dc)
+			continue;
+
+		new_avg_bw = new_tegra_state->avg_memory_bandwidth;
+		old_avg_bw = old_tegra_state->avg_memory_bandwidth;
+
+		new_peak_bw = new_dc_state->plane_peak_bw[tegra->index];
+		old_peak_bw = old_dc_state->plane_peak_bw[tegra->index];
+
+		/*
+		 * See the comment related to !crtc->state->active above,
+		 * which explains why bandwidths need to be updated when
+		 * CRTC is turning ON.
+		 */
+		if (new_avg_bw == old_avg_bw && new_peak_bw == old_peak_bw &&
+		    old_crtc_state->active)
+			continue;
+
+		window.src.h = drm_rect_height(&plane->state->src) >> 16;
+		window.dst.h = drm_rect_height(&plane->state->dst);
+
+		old_window.src.h = drm_rect_height(&old_plane_state->src) >> 16;
+		old_window.dst.h = drm_rect_height(&old_plane_state->dst);
+
+		/*
+		 * During the preparation phase (atomic_begin), the memory
+		 * freq should go high before the DC changes are committed
+		 * if bandwidth requirement goes up, otherwise memory freq
+		 * should to stay high if BW requirement goes down.  The
+		 * opposite applies to the completion phase (post_commit).
+		 */
+		if (prepare_bandwidth_transition) {
+			new_avg_bw = max(old_avg_bw, new_avg_bw);
+			new_peak_bw = max(old_peak_bw, new_peak_bw);
+
+			if (tegra_plane_use_vertical_filtering(tegra, &old_window))
+				window = old_window;
+		}
+
+		icc_set_bw(tegra->icc_mem, new_avg_bw, new_peak_bw);
+
+		if (tegra_plane_use_vertical_filtering(tegra, &window))
+			icc_set_bw(tegra->icc_mem_vfilter, new_avg_bw, new_peak_bw);
+		else
+			icc_set_bw(tegra->icc_mem_vfilter, 0, 0);
+	}
+}
+
 static void tegra_crtc_atomic_disable(struct drm_crtc *crtc,
 				      struct drm_atomic_state *state)
 {
@@ -1922,6 +2052,8 @@ static void tegra_crtc_atomic_begin(struct drm_crtc *crtc,
 {
 	unsigned long flags;
 
+	tegra_crtc_update_memory_bandwidth(crtc, state, true);
+
 	if (crtc->state->event) {
 		spin_lock_irqsave(&crtc->dev->event_lock, flags);
 
@@ -1954,7 +2086,212 @@ static void tegra_crtc_atomic_flush(struct drm_crtc *crtc,
 	value = tegra_dc_readl(dc, DC_CMD_STATE_CONTROL);
 }
 
+static bool tegra_plane_is_cursor(const struct drm_plane_state *state)
+{
+	const struct tegra_dc_soc_info *soc = to_tegra_dc(state->crtc)->soc;
+	const struct drm_format_info *fmt = state->fb->format;
+	unsigned int src_w = drm_rect_width(&state->src) >> 16;
+	unsigned int dst_w = drm_rect_width(&state->dst);
+
+	if (state->plane->type != DRM_PLANE_TYPE_CURSOR)
+		return false;
+
+	if (soc->supports_cursor)
+		return true;
+
+	if (src_w != dst_w || fmt->num_planes != 1 || src_w * fmt->cpp[0] > 256)
+		return false;
+
+	return true;
+}
+
+static unsigned long
+tegra_plane_overlap_mask(struct drm_crtc_state *state,
+			 const struct drm_plane_state *plane_state)
+{
+	const struct drm_plane_state *other_state;
+	const struct tegra_plane *tegra;
+	unsigned long overlap_mask = 0;
+	struct drm_plane *plane;
+	struct drm_rect rect;
+
+	if (!plane_state->visible || !plane_state->fb)
+		return 0;
+
+	drm_atomic_crtc_state_for_each_plane_state(plane, other_state, state) {
+		rect = plane_state->dst;
+
+		tegra = to_tegra_plane(other_state->plane);
+
+		if (!other_state->visible || !other_state->fb)
+			continue;
+
+		/*
+		 * Ignore cursor plane overlaps because it's not practical to
+		 * assume that it contributes to the bandwidth in overlapping
+		 * area if window width is small.
+		 */
+		if (tegra_plane_is_cursor(other_state))
+			continue;
+
+		if (drm_rect_intersect(&rect, &other_state->dst))
+			overlap_mask |= BIT(tegra->index);
+	}
+
+	/*
+	 * Data prefetch FIFO will easily help to overcome temporal memory
+	 * pressure if other plane overlaps with the cursor plane.
+	 */
+	if (tegra_plane_is_cursor(plane_state) && overlap_mask)
+		return 0;
+
+	return overlap_mask;
+}
+
+static struct drm_plane *
+tegra_crtc_get_plane_by_index(struct drm_crtc *crtc, unsigned int index)
+{
+	struct drm_plane *plane;
+
+	drm_atomic_crtc_for_each_plane(plane, crtc) {
+		if (to_tegra_plane(plane)->index == index)
+			return plane;
+	}
+
+	return NULL;
+}
+
+static int tegra_crtc_check_bandwidth_state(struct drm_crtc *crtc,
+					    struct drm_atomic_state *state)
+{
+	ulong overlap_mask[TEGRA_DC_LEGACY_PLANES_NUM] = {}, mask;
+	u32 plane_peak_bw[TEGRA_DC_LEGACY_PLANES_NUM] = {};
+	bool all_planes_overlap_simultaneously = true;
+	const struct tegra_plane_state *tegra_state;
+	const struct drm_plane_state *plane_state;
+	const struct tegra_dc_state *old_dc_state;
+	struct tegra_dc *dc = to_tegra_dc(crtc);
+	const struct drm_crtc_state *old_state;
+	struct tegra_dc_state *new_dc_state;
+	struct drm_crtc_state *new_state;
+	struct tegra_plane *tegra;
+	struct drm_plane *plane;
+	u32 i, k, overlap_bw;
+
+	/*
+	 * The nv-display uses shared planes.  The algorithm below assumes
+	 * maximum 3 planes per-CRTC, this assumption isn't applicable to
+	 * the nv-display.  Note that T124 support has additional windows,
+	 * but currently they aren't supported by the driver.
+	 */
+	if (dc->soc->has_nvdisplay)
+		return 0;
+
+	new_state = drm_atomic_get_new_crtc_state(state, crtc);
+	new_dc_state = to_dc_state(new_state);
+
+	/*
+	 * For overlapping planes pixel's data is fetched for each plane at
+	 * the same time, hence bandwidths are accumulated in this case.
+	 * This needs to be taken into account for calculating total bandwidth
+	 * consumed by all planes.
+	 *
+	 * Here we get the overlapping state of each plane, which is a
+	 * bitmask of plane indices telling with what planes there is an
+	 * overlap. Note that bitmask[plane] includes BIT(plane) in order
+	 * to make further code nicer and simpler.
+	 */
+	drm_atomic_crtc_state_for_each_plane_state(plane, plane_state, new_state) {
+		tegra_state = to_const_tegra_plane_state(plane_state);
+		tegra = to_tegra_plane(plane);
+
+		plane_peak_bw[tegra->index] = tegra_state->peak_memory_bandwidth;
+		mask = tegra_plane_overlap_mask(new_state, plane_state);
+		overlap_mask[tegra->index] = mask;
+
+		if (hweight_long(mask) != 3)
+			all_planes_overlap_simultaneously = false;
+	}
+
+	old_state = drm_atomic_get_old_crtc_state(state, crtc);
+	old_dc_state = to_const_dc_state(old_state);
+
+	/*
+	 * Then we calculate maximum bandwidth of each plane state.
+	 * The bandwidth includes the plane BW + BW of the "simultaneously"
+	 * overlapping planes, where "simultaneously" means areas where DC
+	 * fetches from the planes simultaneously during of scan-out process.
+	 *
+	 * For example, if plane A overlaps with planes B and C, but B and C
+	 * don't overlap, then the peak bandwidth will be either in area where
+	 * A-and-B or A-and-C planes overlap.
+	 *
+	 * The plane_peak_bw[] contains peak memory bandwidth values of
+	 * each plane, this information is needed by interconnect provider
+	 * in order to set up latency allowness based on the peak BW, see
+	 * tegra_crtc_update_memory_bandwidth().
+	 */
+	for (i = 0; i < ARRAY_SIZE(plane_peak_bw); i++) {
+		overlap_bw = 0;
+
+		for_each_set_bit(k, &overlap_mask[i], 3) {
+			if (k == i)
+				continue;
+
+			if (all_planes_overlap_simultaneously)
+				overlap_bw += plane_peak_bw[k];
+			else
+				overlap_bw = max(overlap_bw, plane_peak_bw[k]);
+		}
+
+		new_dc_state->plane_peak_bw[i] = plane_peak_bw[i] + overlap_bw;
+
+		/*
+		 * If plane's peak bandwidth changed (for example plane isn't
+		 * overlapped anymore) and plane isn't in the atomic state,
+		 * then add plane to the state in order to have the bandwidth
+		 * updated.
+		 */
+		if (old_dc_state->plane_peak_bw[i] !=
+		    new_dc_state->plane_peak_bw[i]) {
+			plane = tegra_crtc_get_plane_by_index(crtc, i);
+			if (!plane)
+				continue;
+
+			plane_state = drm_atomic_get_plane_state(state, plane);
+			if (IS_ERR(plane_state))
+				return PTR_ERR(plane_state);
+		}
+	}
+
+	return 0;
+}
+
+static int tegra_crtc_atomic_check(struct drm_crtc *crtc,
+				   struct drm_atomic_state *state)
+{
+	int err;
+
+	err = tegra_crtc_check_bandwidth_state(crtc, state);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+void tegra_crtc_atomic_post_commit(struct drm_crtc *crtc,
+				   struct drm_atomic_state *state)
+{
+	/*
+	 * Display bandwidth is allowed to go down only once hardware state
+	 * is known to be armed, i.e. state was committed and VBLANK event
+	 * received.
+	 */
+	tegra_crtc_update_memory_bandwidth(crtc, state, false);
+}
+
 static const struct drm_crtc_helper_funcs tegra_crtc_helper_funcs = {
+	.atomic_check = tegra_crtc_atomic_check,
 	.atomic_begin = tegra_crtc_atomic_begin,
 	.atomic_flush = tegra_crtc_atomic_flush,
 	.atomic_enable = tegra_crtc_atomic_enable,
@@ -2245,7 +2582,9 @@ static const struct tegra_dc_soc_info tegra20_dc_soc_info = {
 	.overlay_formats = tegra20_overlay_formats,
 	.modifiers = tegra20_modifiers,
 	.has_win_a_without_filters = true,
+	.has_win_b_vfilter_mem_client = true,
 	.has_win_c_without_vert_filter = true,
+	.plane_tiled_memory_bandwidth_x2 = false,
 };
 
 static const struct tegra_dc_soc_info tegra30_dc_soc_info = {
@@ -2264,7 +2603,9 @@ static const struct tegra_dc_soc_info tegra30_dc_soc_info = {
 	.overlay_formats = tegra20_overlay_formats,
 	.modifiers = tegra20_modifiers,
 	.has_win_a_without_filters = false,
+	.has_win_b_vfilter_mem_client = true,
 	.has_win_c_without_vert_filter = false,
+	.plane_tiled_memory_bandwidth_x2 = true,
 };
 
 static const struct tegra_dc_soc_info tegra114_dc_soc_info = {
@@ -2283,7 +2624,9 @@ static const struct tegra_dc_soc_info tegra114_dc_soc_info = {
 	.overlay_formats = tegra114_overlay_formats,
 	.modifiers = tegra20_modifiers,
 	.has_win_a_without_filters = false,
+	.has_win_b_vfilter_mem_client = false,
 	.has_win_c_without_vert_filter = false,
+	.plane_tiled_memory_bandwidth_x2 = true,
 };
 
 static const struct tegra_dc_soc_info tegra124_dc_soc_info = {
@@ -2302,7 +2645,9 @@ static const struct tegra_dc_soc_info tegra124_dc_soc_info = {
 	.overlay_formats = tegra124_overlay_formats,
 	.modifiers = tegra124_modifiers,
 	.has_win_a_without_filters = false,
+	.has_win_b_vfilter_mem_client = false,
 	.has_win_c_without_vert_filter = false,
+	.plane_tiled_memory_bandwidth_x2 = false,
 };
 
 static const struct tegra_dc_soc_info tegra210_dc_soc_info = {
@@ -2321,7 +2666,9 @@ static const struct tegra_dc_soc_info tegra210_dc_soc_info = {
 	.overlay_formats = tegra114_overlay_formats,
 	.modifiers = tegra124_modifiers,
 	.has_win_a_without_filters = false,
+	.has_win_b_vfilter_mem_client = false,
 	.has_win_c_without_vert_filter = false,
+	.plane_tiled_memory_bandwidth_x2 = false,
 };
 
 static const struct tegra_windowgroup_soc tegra186_dc_wgrps[] = {
@@ -2370,6 +2717,7 @@ static const struct tegra_dc_soc_info tegra186_dc_soc_info = {
 	.has_nvdisplay = true,
 	.wgrps = tegra186_dc_wgrps,
 	.num_wgrps = ARRAY_SIZE(tegra186_dc_wgrps),
+	.plane_tiled_memory_bandwidth_x2 = false,
 };
 
 static const struct tegra_windowgroup_soc tegra194_dc_wgrps[] = {
@@ -2418,6 +2766,7 @@ static const struct tegra_dc_soc_info tegra194_dc_soc_info = {
 	.has_nvdisplay = true,
 	.wgrps = tegra194_dc_wgrps,
 	.num_wgrps = ARRAY_SIZE(tegra194_dc_wgrps),
+	.plane_tiled_memory_bandwidth_x2 = false,
 };
 
 static const struct of_device_id tegra_dc_of_match[] = {
diff --git a/drivers/gpu/drm/tegra/dc.h b/drivers/gpu/drm/tegra/dc.h
index 051d03dcb9b0..0d7bdf66a1ec 100644
--- a/drivers/gpu/drm/tegra/dc.h
+++ b/drivers/gpu/drm/tegra/dc.h
@@ -15,6 +15,8 @@
 
 struct tegra_output;
 
+#define TEGRA_DC_LEGACY_PLANES_NUM	6
+
 struct tegra_dc_state {
 	struct drm_crtc_state base;
 
@@ -23,6 +25,8 @@ struct tegra_dc_state {
 	unsigned int div;
 
 	u32 planes;
+
+	unsigned long plane_peak_bw[TEGRA_DC_LEGACY_PLANES_NUM];
 };
 
 static inline struct tegra_dc_state *to_dc_state(struct drm_crtc_state *state)
@@ -33,6 +37,12 @@ static inline struct tegra_dc_state *to_dc_state(struct drm_crtc_state *state)
 	return NULL;
 }
 
+static inline const struct tegra_dc_state *
+to_const_dc_state(const struct drm_crtc_state *state)
+{
+	return to_dc_state((struct drm_crtc_state *)state);
+}
+
 struct tegra_dc_stats {
 	unsigned long frames;
 	unsigned long vblank;
@@ -65,7 +75,9 @@ struct tegra_dc_soc_info {
 	unsigned int num_overlay_formats;
 	const u64 *modifiers;
 	bool has_win_a_without_filters;
+	bool has_win_b_vfilter_mem_client;
 	bool has_win_c_without_vert_filter;
+	unsigned int plane_tiled_memory_bandwidth_x2;
 };
 
 struct tegra_dc {
@@ -151,6 +163,8 @@ int tegra_dc_state_setup_clock(struct tegra_dc *dc,
 			       struct drm_crtc_state *crtc_state,
 			       struct clk *clk, unsigned long pclk,
 			       unsigned int div);
+void tegra_crtc_atomic_post_commit(struct drm_crtc *crtc,
+				   struct drm_atomic_state *state);
 
 /* from rgb.c */
 int tegra_dc_rgb_probe(struct tegra_dc *dc);
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index e45c8414e2a3..1472042da2a7 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -20,6 +20,7 @@
 #include <drm/drm_prime.h>
 #include <drm/drm_vblank.h>
 
+#include "dc.h"
 #include "drm.h"
 #include "gem.h"
 
@@ -59,6 +60,17 @@ static const struct drm_mode_config_funcs tegra_drm_mode_config_funcs = {
 	.atomic_commit = drm_atomic_helper_commit,
 };
 
+static void tegra_atomic_post_commit(struct drm_device *drm,
+				     struct drm_atomic_state *old_state)
+{
+	struct drm_crtc_state *old_crtc_state __maybe_unused;
+	struct drm_crtc *crtc;
+	unsigned int i;
+
+	for_each_old_crtc_in_state(old_state, crtc, old_crtc_state, i)
+		tegra_crtc_atomic_post_commit(crtc, old_state);
+}
+
 static void tegra_atomic_commit_tail(struct drm_atomic_state *old_state)
 {
 	struct drm_device *drm = old_state->dev;
@@ -75,6 +87,8 @@ static void tegra_atomic_commit_tail(struct drm_atomic_state *old_state)
 	} else {
 		drm_atomic_helper_commit_tail_rpm(old_state);
 	}
+
+	tegra_atomic_post_commit(drm, old_state);
 }
 
 static const struct drm_mode_config_helper_funcs
diff --git a/drivers/gpu/drm/tegra/hub.c b/drivers/gpu/drm/tegra/hub.c
index 22a03f7ffdc1..4fa338dc7eb2 100644
--- a/drivers/gpu/drm/tegra/hub.c
+++ b/drivers/gpu/drm/tegra/hub.c
@@ -344,6 +344,9 @@ static int tegra_shared_plane_atomic_check(struct drm_plane *plane,
 	struct tegra_dc *dc = to_tegra_dc(state->crtc);
 	int err;
 
+	plane_state->peak_memory_bandwidth = 0;
+	plane_state->avg_memory_bandwidth = 0;
+
 	/* no need for further checks if the plane is being disabled */
 	if (!state->crtc || !state->fb)
 		return 0;
diff --git a/drivers/gpu/drm/tegra/plane.c b/drivers/gpu/drm/tegra/plane.c
index 539d14935728..1e589c5af143 100644
--- a/drivers/gpu/drm/tegra/plane.c
+++ b/drivers/gpu/drm/tegra/plane.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/iommu.h>
+#include <linux/interconnect.h>
 
 #include <drm/drm_atomic.h>
 #include <drm/drm_atomic_helper.h>
@@ -64,6 +65,8 @@ tegra_plane_atomic_duplicate_state(struct drm_plane *plane)
 	copy->reflect_x = state->reflect_x;
 	copy->reflect_y = state->reflect_y;
 	copy->opaque = state->opaque;
+	copy->peak_memory_bandwidth = state->peak_memory_bandwidth;
+	copy->avg_memory_bandwidth = state->avg_memory_bandwidth;
 
 	for (i = 0; i < 2; i++)
 		copy->blending[i] = state->blending[i];
@@ -212,6 +215,87 @@ void tegra_plane_cleanup_fb(struct drm_plane *plane,
 		tegra_dc_unpin(dc, to_tegra_plane_state(state));
 }
 
+static int tegra_plane_check_memory_bandwidth(struct drm_plane_state *state)
+{
+	struct tegra_plane_state *tegra_state = to_tegra_plane_state(state);
+	unsigned int i, bpp, bpp_plane, dst_w, src_w, src_h, mul;
+	const struct tegra_dc_soc_info *soc;
+	const struct drm_format_info *fmt;
+	struct drm_crtc_state *crtc_state;
+	u32 avg_bandwidth, peak_bandwidth;
+
+	if (!state->visible)
+		return 0;
+
+	crtc_state = drm_atomic_get_new_crtc_state(state->state, state->crtc);
+	if (!crtc_state)
+		return -EINVAL;
+
+	src_w = drm_rect_width(&state->src) >> 16;
+	src_h = drm_rect_height(&state->src) >> 16;
+	dst_w = drm_rect_width(&state->dst);
+
+	fmt = state->fb->format;
+	soc = to_tegra_dc(state->crtc)->soc;
+
+	/*
+	 * Note that real memory bandwidth vary depending on format and
+	 * memory layout, we are not taking that into account because small
+	 * estimation error isn't important since bandwidth is rounded up
+	 * anyway.
+	 */
+	for (i = 0, bpp = 0; i < fmt->num_planes; i++) {
+		bpp_plane = fmt->cpp[i] * 8;
+
+		/*
+		 * Sub-sampling is relevant for chroma planes only and vertical
+		 * readouts are not cached, hence only horizontal sub-sampling
+		 * matters.
+		 */
+		if (i > 0)
+			bpp_plane /= fmt->hsub;
+
+		bpp += bpp_plane;
+	}
+
+	/*
+	 * Horizontal downscale takes extra bandwidth which roughly depends
+	 * on the scaled width.
+	 */
+	if (src_w > dst_w)
+		mul = (src_w - dst_w) * bpp / 2048 + 1;
+	else
+		mul = 1;
+
+	/* average bandwidth in bytes/s */
+	avg_bandwidth  = src_w * src_h * bpp / 8 * mul;
+	avg_bandwidth *= drm_mode_vrefresh(&crtc_state->mode);
+
+	/* mode.clock in kHz, peak bandwidth in kbit/s */
+	peak_bandwidth = crtc_state->mode.clock * bpp * mul;
+
+	/* ICC bandwidth in kbyte/s */
+	peak_bandwidth = kbps_to_icc(peak_bandwidth);
+	avg_bandwidth  = Bps_to_icc(avg_bandwidth);
+
+	/*
+	 * Tegra30/114 Memory Controller can't interleave DC memory requests
+	 * and DC uses 16-bytes atom for the tiled windows, while DDR3 uses 32
+	 * bytes atom. Hence there is x2 memory overfetch for tiled framebuffer
+	 * and DDR3 on older SoCs.
+	 */
+	if (soc->plane_tiled_memory_bandwidth_x2 &&
+	    tegra_state->tiling.mode == TEGRA_BO_TILING_MODE_TILED) {
+		peak_bandwidth *= 2;
+		avg_bandwidth *= 2;
+	}
+
+	tegra_state->peak_memory_bandwidth = peak_bandwidth;
+	tegra_state->avg_memory_bandwidth = avg_bandwidth;
+
+	return 0;
+}
+
 int tegra_plane_state_add(struct tegra_plane *plane,
 			  struct drm_plane_state *state)
 {
@@ -230,6 +314,10 @@ int tegra_plane_state_add(struct tegra_plane *plane,
 	if (err < 0)
 		return err;
 
+	err = tegra_plane_check_memory_bandwidth(state);
+	if (err < 0)
+		return err;
+
 	tegra = to_dc_state(crtc_state);
 
 	tegra->planes |= WIN_A_ACT_REQ << plane->index;
@@ -595,3 +683,36 @@ int tegra_plane_setup_legacy_state(struct tegra_plane *tegra,
 
 	return 0;
 }
+
+static const char * const tegra_plane_icc_names[] = {
+	"wina", "winb", "winc", "", "", "", "cursor",
+};
+
+int tegra_plane_interconnect_init(struct tegra_plane *plane)
+{
+	const char *icc_name = tegra_plane_icc_names[plane->index];
+	struct device *dev = plane->dc->dev;
+	struct tegra_dc *dc = plane->dc;
+	int err;
+
+	plane->icc_mem = devm_of_icc_get(dev, icc_name);
+	err = PTR_ERR_OR_ZERO(plane->icc_mem);
+	if (err) {
+		dev_err_probe(dev, err, "failed to get %s interconnect\n",
+			      icc_name);
+		return err;
+	}
+
+	/* plane B on T20/30 has a dedicated memory client for a 6-tap vertical filter */
+	if (plane->index == 1 && dc->soc->has_win_b_vfilter_mem_client) {
+		plane->icc_mem_vfilter = devm_of_icc_get(dev, "winb-vfilter");
+		err = PTR_ERR_OR_ZERO(plane->icc_mem_vfilter);
+		if (err) {
+			dev_err_probe(dev, err, "failed to get %s interconnect\n",
+				      "winb-vfilter");
+			return err;
+		}
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/tegra/plane.h b/drivers/gpu/drm/tegra/plane.h
index c691dd79b27b..f2731aae7d01 100644
--- a/drivers/gpu/drm/tegra/plane.h
+++ b/drivers/gpu/drm/tegra/plane.h
@@ -8,6 +8,7 @@
 
 #include <drm/drm_plane.h>
 
+struct icc_path;
 struct tegra_bo;
 struct tegra_dc;
 
@@ -16,6 +17,9 @@ struct tegra_plane {
 	struct tegra_dc *dc;
 	unsigned int offset;
 	unsigned int index;
+
+	struct icc_path *icc_mem;
+	struct icc_path *icc_mem_vfilter;
 };
 
 struct tegra_cursor {
@@ -52,6 +56,10 @@ struct tegra_plane_state {
 	/* used for legacy blending support only */
 	struct tegra_plane_legacy_blending_state blending[2];
 	bool opaque;
+
+	/* bandwidths are in ICC units, i.e. kbytes/sec */
+	u32 peak_memory_bandwidth;
+	u32 avg_memory_bandwidth;
 };
 
 static inline struct tegra_plane_state *
@@ -63,6 +71,12 @@ to_tegra_plane_state(struct drm_plane_state *state)
 	return NULL;
 }
 
+static inline const struct tegra_plane_state *
+to_const_tegra_plane_state(const struct drm_plane_state *state)
+{
+	return to_tegra_plane_state((struct drm_plane_state *)state);
+}
+
 extern const struct drm_plane_funcs tegra_plane_funcs;
 
 int tegra_plane_prepare_fb(struct drm_plane *plane,
@@ -77,5 +91,6 @@ int tegra_plane_format(u32 fourcc, u32 *format, u32 *swap);
 bool tegra_plane_format_is_yuv(unsigned int format, bool *planar);
 int tegra_plane_setup_legacy_state(struct tegra_plane *tegra,
 				   struct tegra_plane_state *state);
+int tegra_plane_interconnect_init(struct tegra_plane *plane);
 
 #endif /* TEGRA_PLANE_H */
-- 
2.29.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v11 08/10] drm/tegra: dc: Extend debug stats with total number of events
  2020-12-03 19:24 [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Dmitry Osipenko
                   ` (6 preceding siblings ...)
  2020-12-03 19:24 ` [PATCH v11 07/10] drm/tegra: dc: Support memory bandwidth management Dmitry Osipenko
@ 2020-12-03 19:24 ` Dmitry Osipenko
  2020-12-03 19:24 ` [PATCH v11 09/10] PM / devfreq: tegra30: Support interconnect and OPPs from device-tree Dmitry Osipenko
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2020-12-03 19:24 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Georgi Djakov, Rob Herring,
	Michael Turquette, Stephen Boyd, Peter De Schrijver,
	MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Mikko Perttunen,
	Viresh Kumar, Peter Geis, Nicolas Chauvet, Krzysztof Kozlowski
  Cc: linux-tegra, linux-pm, linux-kernel, dri-devel, devicetree

It's useful to know the total number of underflow events and currently
the debug stats are getting reset each time CRTC is being disabled. Let's
account the overall number of events that doesn't get a reset.

Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/gpu/drm/tegra/dc.c | 10 ++++++++++
 drivers/gpu/drm/tegra/dc.h |  5 +++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index 5c587cfd1bb2..b6676f1fe358 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -1539,6 +1539,11 @@ static int tegra_dc_show_stats(struct seq_file *s, void *data)
 	seq_printf(s, "underflow: %lu\n", dc->stats.underflow);
 	seq_printf(s, "overflow: %lu\n", dc->stats.overflow);
 
+	seq_printf(s, "frames total: %lu\n", dc->stats.frames_total);
+	seq_printf(s, "vblank total: %lu\n", dc->stats.vblank_total);
+	seq_printf(s, "underflow total: %lu\n", dc->stats.underflow_total);
+	seq_printf(s, "overflow total: %lu\n", dc->stats.overflow_total);
+
 	return 0;
 }
 
@@ -2310,6 +2315,7 @@ static irqreturn_t tegra_dc_irq(int irq, void *data)
 		/*
 		dev_dbg(dc->dev, "%s(): frame end\n", __func__);
 		*/
+		dc->stats.frames_total++;
 		dc->stats.frames++;
 	}
 
@@ -2318,6 +2324,7 @@ static irqreturn_t tegra_dc_irq(int irq, void *data)
 		dev_dbg(dc->dev, "%s(): vertical blank\n", __func__);
 		*/
 		drm_crtc_handle_vblank(&dc->base);
+		dc->stats.vblank_total++;
 		dc->stats.vblank++;
 	}
 
@@ -2325,6 +2332,7 @@ static irqreturn_t tegra_dc_irq(int irq, void *data)
 		/*
 		dev_dbg(dc->dev, "%s(): underflow\n", __func__);
 		*/
+		dc->stats.underflow_total++;
 		dc->stats.underflow++;
 	}
 
@@ -2332,11 +2340,13 @@ static irqreturn_t tegra_dc_irq(int irq, void *data)
 		/*
 		dev_dbg(dc->dev, "%s(): overflow\n", __func__);
 		*/
+		dc->stats.overflow_total++;
 		dc->stats.overflow++;
 	}
 
 	if (status & HEAD_UF_INT) {
 		dev_dbg_ratelimited(dc->dev, "%s(): head underflow\n", __func__);
+		dc->stats.underflow_total++;
 		dc->stats.underflow++;
 	}
 
diff --git a/drivers/gpu/drm/tegra/dc.h b/drivers/gpu/drm/tegra/dc.h
index 0d7bdf66a1ec..ba4ed35139fb 100644
--- a/drivers/gpu/drm/tegra/dc.h
+++ b/drivers/gpu/drm/tegra/dc.h
@@ -48,6 +48,11 @@ struct tegra_dc_stats {
 	unsigned long vblank;
 	unsigned long underflow;
 	unsigned long overflow;
+
+	unsigned long frames_total;
+	unsigned long vblank_total;
+	unsigned long underflow_total;
+	unsigned long overflow_total;
 };
 
 struct tegra_windowgroup_soc {
-- 
2.29.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v11 09/10] PM / devfreq: tegra30: Support interconnect and OPPs from device-tree
  2020-12-03 19:24 [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Dmitry Osipenko
                   ` (7 preceding siblings ...)
  2020-12-03 19:24 ` [PATCH v11 08/10] drm/tegra: dc: Extend debug stats with total number of events Dmitry Osipenko
@ 2020-12-03 19:24 ` Dmitry Osipenko
  2020-12-07  1:32   ` Chanwoo Choi
  2020-12-03 19:24 ` [PATCH v11 10/10] PM / devfreq: tegra30: Separate configurations per-SoC generation Dmitry Osipenko
  2020-12-05 14:09 ` [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Krzysztof Kozlowski
  10 siblings, 1 reply; 22+ messages in thread
From: Dmitry Osipenko @ 2020-12-03 19:24 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Georgi Djakov, Rob Herring,
	Michael Turquette, Stephen Boyd, Peter De Schrijver,
	MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Mikko Perttunen,
	Viresh Kumar, Peter Geis, Nicolas Chauvet, Krzysztof Kozlowski
  Cc: linux-tegra, linux-pm, linux-kernel, dri-devel, devicetree

This patch moves ACTMON driver away from generating OPP table by itself,
transitioning it to use the table which comes from device-tree. This
change breaks compatibility with older device-trees and brings support
for the interconnect framework to the driver. This is a mandatory change
which needs to be done in order to implement interconnect-based memory
DVFS, i.e. device-trees need to be updated. Now ACTMON issues a memory
bandwidth requests using dev_pm_opp_set_bw() instead of driving EMC clock
rate directly.

Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/devfreq/tegra30-devfreq.c | 79 +++++++++++++++----------------
 1 file changed, 37 insertions(+), 42 deletions(-)

diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c
index 38cc0d014738..145ef91ae092 100644
--- a/drivers/devfreq/tegra30-devfreq.c
+++ b/drivers/devfreq/tegra30-devfreq.c
@@ -19,6 +19,8 @@
 #include <linux/reset.h>
 #include <linux/workqueue.h>
 
+#include <soc/tegra/fuse.h>
+
 #include "governor.h"
 
 #define ACTMON_GLB_STATUS					0x0
@@ -155,6 +157,7 @@ struct tegra_devfreq_device {
 
 struct tegra_devfreq {
 	struct devfreq		*devfreq;
+	struct opp_table	*opp_table;
 
 	struct reset_control	*reset;
 	struct clk		*clock;
@@ -612,34 +615,19 @@ static void tegra_actmon_stop(struct tegra_devfreq *tegra)
 static int tegra_devfreq_target(struct device *dev, unsigned long *freq,
 				u32 flags)
 {
-	struct tegra_devfreq *tegra = dev_get_drvdata(dev);
-	struct devfreq *devfreq = tegra->devfreq;
 	struct dev_pm_opp *opp;
-	unsigned long rate;
-	int err;
+	int ret;
 
 	opp = devfreq_recommended_opp(dev, freq, flags);
 	if (IS_ERR(opp)) {
 		dev_err(dev, "Failed to find opp for %lu Hz\n", *freq);
 		return PTR_ERR(opp);
 	}
-	rate = dev_pm_opp_get_freq(opp);
-	dev_pm_opp_put(opp);
-
-	err = clk_set_min_rate(tegra->emc_clock, rate * KHZ);
-	if (err)
-		return err;
-
-	err = clk_set_rate(tegra->emc_clock, 0);
-	if (err)
-		goto restore_min_rate;
 
-	return 0;
-
-restore_min_rate:
-	clk_set_min_rate(tegra->emc_clock, devfreq->previous_freq);
+	ret = dev_pm_opp_set_bw(dev, opp);
+	dev_pm_opp_put(opp);
 
-	return err;
+	return ret;
 }
 
 static int tegra_devfreq_get_dev_status(struct device *dev,
@@ -655,7 +643,7 @@ static int tegra_devfreq_get_dev_status(struct device *dev,
 	stat->private_data = tegra;
 
 	/* The below are to be used by the other governors */
-	stat->current_frequency = cur_freq;
+	stat->current_frequency = cur_freq * KHZ;
 
 	actmon_dev = &tegra->devices[MCALL];
 
@@ -705,7 +693,12 @@ static int tegra_governor_get_target(struct devfreq *devfreq,
 		target_freq = max(target_freq, dev->target_freq);
 	}
 
-	*freq = target_freq;
+	/*
+	 * tegra-devfreq driver operates with KHz units, while OPP table
+	 * entries use Hz units. Hence we need to convert the units for the
+	 * devfreq core.
+	 */
+	*freq = target_freq * KHZ;
 
 	return 0;
 }
@@ -774,6 +767,7 @@ static struct devfreq_governor tegra_devfreq_governor = {
 
 static int tegra_devfreq_probe(struct platform_device *pdev)
 {
+	u32 hw_version = BIT(tegra_sku_info.soc_speedo_id);
 	struct tegra_devfreq_device *dev;
 	struct tegra_devfreq *tegra;
 	struct devfreq *devfreq;
@@ -822,11 +816,25 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
 		return err;
 	}
 
+	tegra->opp_table = dev_pm_opp_set_supported_hw(&pdev->dev,
+						       &hw_version, 1);
+	err = PTR_ERR_OR_ZERO(tegra->opp_table);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to set supported HW: %d\n", err);
+		return err;
+	}
+
+	err = dev_pm_opp_of_add_table(&pdev->dev);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to add OPP table: %d\n", err);
+		goto put_hw;
+	}
+
 	err = clk_prepare_enable(tegra->clock);
 	if (err) {
 		dev_err(&pdev->dev,
 			"Failed to prepare and enable ACTMON clock\n");
-		return err;
+		goto remove_table;
 	}
 
 	err = reset_control_reset(tegra->reset);
@@ -850,23 +858,6 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
 		dev->regs = tegra->regs + dev->config->offset;
 	}
 
-	for (rate = 0; rate <= tegra->max_freq * KHZ; rate++) {
-		rate = clk_round_rate(tegra->emc_clock, rate);
-
-		if (rate < 0) {
-			dev_err(&pdev->dev,
-				"Failed to round clock rate: %ld\n", rate);
-			err = rate;
-			goto remove_opps;
-		}
-
-		err = dev_pm_opp_add(&pdev->dev, rate / KHZ, 0);
-		if (err) {
-			dev_err(&pdev->dev, "Failed to add OPP: %d\n", err);
-			goto remove_opps;
-		}
-	}
-
 	platform_set_drvdata(pdev, tegra);
 
 	tegra->clk_rate_change_nb.notifier_call = tegra_actmon_clk_notify_cb;
@@ -882,7 +873,6 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
 	}
 
 	tegra_devfreq_profile.initial_freq = clk_get_rate(tegra->emc_clock);
-	tegra_devfreq_profile.initial_freq /= KHZ;
 
 	devfreq = devfreq_add_device(&pdev->dev, &tegra_devfreq_profile,
 				     "tegra_actmon", NULL);
@@ -902,6 +892,10 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
 	reset_control_reset(tegra->reset);
 disable_clk:
 	clk_disable_unprepare(tegra->clock);
+remove_table:
+	dev_pm_opp_of_remove_table(&pdev->dev);
+put_hw:
+	dev_pm_opp_put_supported_hw(tegra->opp_table);
 
 	return err;
 }
@@ -913,11 +907,12 @@ static int tegra_devfreq_remove(struct platform_device *pdev)
 	devfreq_remove_device(tegra->devfreq);
 	devfreq_remove_governor(&tegra_devfreq_governor);
 
-	dev_pm_opp_remove_all_dynamic(&pdev->dev);
-
 	reset_control_reset(tegra->reset);
 	clk_disable_unprepare(tegra->clock);
 
+	dev_pm_opp_of_remove_table(&pdev->dev);
+	dev_pm_opp_put_supported_hw(tegra->opp_table);
+
 	return 0;
 }
 
-- 
2.29.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v11 10/10] PM / devfreq: tegra30: Separate configurations per-SoC generation
  2020-12-03 19:24 [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Dmitry Osipenko
                   ` (8 preceding siblings ...)
  2020-12-03 19:24 ` [PATCH v11 09/10] PM / devfreq: tegra30: Support interconnect and OPPs from device-tree Dmitry Osipenko
@ 2020-12-03 19:24 ` Dmitry Osipenko
  2020-12-07  1:33   ` Chanwoo Choi
  2020-12-05 14:09 ` [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Krzysztof Kozlowski
  10 siblings, 1 reply; 22+ messages in thread
From: Dmitry Osipenko @ 2020-12-03 19:24 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Georgi Djakov, Rob Herring,
	Michael Turquette, Stephen Boyd, Peter De Schrijver,
	MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Mikko Perttunen,
	Viresh Kumar, Peter Geis, Nicolas Chauvet, Krzysztof Kozlowski
  Cc: linux-tegra, linux-pm, linux-kernel, dri-devel, devicetree

Previously we were using count-weight of the T124 for T30 in order to
get EMC clock rate that was reasonable for T30. In fact the count-weight
should be x2 times smaller on T30, but then devfreq was producing a bit
too low EMC clock rate for ISO memory clients, like display controller
for example.

Now both Tegra ACTMON and Tegra DRM display drivers support interconnect
framework and display driver tells to ICC what a minimum memory bandwidth
is needed, preventing FIFO underflows. Thus, now we can use a proper
count-weight value for Tegra30 and MC_ALL device config needs a bit more
aggressive boosting.

Add a separate ACTMON driver configuration that is specific to Tegra30.

Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/devfreq/tegra30-devfreq.c | 68 ++++++++++++++++++++++++-------
 1 file changed, 54 insertions(+), 14 deletions(-)

diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c
index 145ef91ae092..117cad7968ab 100644
--- a/drivers/devfreq/tegra30-devfreq.c
+++ b/drivers/devfreq/tegra30-devfreq.c
@@ -57,13 +57,6 @@
 #define ACTMON_BELOW_WMARK_WINDOW				3
 #define ACTMON_BOOST_FREQ_STEP					16000
 
-/*
- * Activity counter is incremented every 256 memory transactions, and each
- * transaction takes 4 EMC clocks for Tegra124; So the COUNT_WEIGHT is
- * 4 * 256 = 1024.
- */
-#define ACTMON_COUNT_WEIGHT					0x400
-
 /*
  * ACTMON_AVERAGE_WINDOW_LOG2: default value for @DEV_CTRL_K_VAL, which
  * translates to 2 ^ (K_VAL + 1). ex: 2 ^ (6 + 1) = 128
@@ -111,7 +104,7 @@ enum tegra_actmon_device {
 	MCCPU,
 };
 
-static const struct tegra_devfreq_device_config actmon_device_configs[] = {
+static const struct tegra_devfreq_device_config tegra124_device_configs[] = {
 	{
 		/* MCALL: All memory accesses (including from the CPUs) */
 		.offset = 0x1c0,
@@ -133,6 +126,28 @@ static const struct tegra_devfreq_device_config actmon_device_configs[] = {
 	},
 };
 
+static const struct tegra_devfreq_device_config tegra30_device_configs[] = {
+	{
+		/* MCALL: All memory accesses (including from the CPUs) */
+		.offset = 0x1c0,
+		.irq_mask = 1 << 26,
+		.boost_up_coeff = 200,
+		.boost_down_coeff = 50,
+		.boost_up_threshold = 20,
+		.boost_down_threshold = 10,
+	},
+	{
+		/* MCCPU: memory accesses from the CPUs */
+		.offset = 0x200,
+		.irq_mask = 1 << 25,
+		.boost_up_coeff = 800,
+		.boost_down_coeff = 40,
+		.boost_up_threshold = 27,
+		.boost_down_threshold = 10,
+		.avg_dependency_threshold = 16000, /* 16MHz in kHz units */
+	},
+};
+
 /**
  * struct tegra_devfreq_device - state specific to an ACTMON device
  *
@@ -155,6 +170,12 @@ struct tegra_devfreq_device {
 	unsigned long target_freq;
 };
 
+struct tegra_devfreq_soc_data {
+	const struct tegra_devfreq_device_config *configs;
+	/* Weight value for count measurements */
+	unsigned int count_weight;
+};
+
 struct tegra_devfreq {
 	struct devfreq		*devfreq;
 	struct opp_table	*opp_table;
@@ -171,11 +192,13 @@ struct tegra_devfreq {
 	struct delayed_work	cpufreq_update_work;
 	struct notifier_block	cpu_rate_change_nb;
 
-	struct tegra_devfreq_device devices[ARRAY_SIZE(actmon_device_configs)];
+	struct tegra_devfreq_device devices[2];
 
 	unsigned int		irq;
 
 	bool			started;
+
+	const struct tegra_devfreq_soc_data *soc;
 };
 
 struct tegra_actmon_emc_ratio {
@@ -488,7 +511,7 @@ static void tegra_actmon_configure_device(struct tegra_devfreq *tegra,
 	tegra_devfreq_update_avg_wmark(tegra, dev);
 	tegra_devfreq_update_wmark(tegra, dev);
 
-	device_writel(dev, ACTMON_COUNT_WEIGHT, ACTMON_DEV_COUNT_WEIGHT);
+	device_writel(dev, tegra->soc->count_weight, ACTMON_DEV_COUNT_WEIGHT);
 	device_writel(dev, ACTMON_INTR_STATUS_CLEAR, ACTMON_DEV_INTR_STATUS);
 
 	val |= ACTMON_DEV_CTRL_ENB_PERIODIC;
@@ -779,6 +802,8 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
 	if (!tegra)
 		return -ENOMEM;
 
+	tegra->soc = of_device_get_match_data(&pdev->dev);
+
 	tegra->regs = devm_platform_ioremap_resource(pdev, 0);
 	if (IS_ERR(tegra->regs))
 		return PTR_ERR(tegra->regs);
@@ -852,9 +877,9 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
 
 	tegra->max_freq = rate / KHZ;
 
-	for (i = 0; i < ARRAY_SIZE(actmon_device_configs); i++) {
+	for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) {
 		dev = tegra->devices + i;
-		dev->config = actmon_device_configs + i;
+		dev->config = tegra->soc->configs + i;
 		dev->regs = tegra->regs + dev->config->offset;
 	}
 
@@ -916,9 +941,24 @@ static int tegra_devfreq_remove(struct platform_device *pdev)
 	return 0;
 }
 
+static const struct tegra_devfreq_soc_data tegra124_soc = {
+	.configs = tegra124_device_configs,
+
+	/*
+	 * Activity counter is incremented every 256 memory transactions,
+	 * and each transaction takes 4 EMC clocks.
+	 */
+	.count_weight = 4 * 256,
+};
+
+static const struct tegra_devfreq_soc_data tegra30_soc = {
+	.configs = tegra30_device_configs,
+	.count_weight = 2 * 256,
+};
+
 static const struct of_device_id tegra_devfreq_of_match[] = {
-	{ .compatible = "nvidia,tegra30-actmon" },
-	{ .compatible = "nvidia,tegra124-actmon" },
+	{ .compatible = "nvidia,tegra30-actmon",  .data = &tegra30_soc, },
+	{ .compatible = "nvidia,tegra124-actmon", .data = &tegra124_soc, },
 	{ },
 };
 
-- 
2.29.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v11 01/10] dt-bindings: memory: tegra20: emc: Document opp-supported-hw property
  2020-12-03 19:24 ` [PATCH v11 01/10] dt-bindings: memory: tegra20: emc: Document opp-supported-hw property Dmitry Osipenko
@ 2020-12-04 15:46   ` Thierry Reding
  2020-12-09 20:17   ` Rob Herring
  1 sibling, 0 replies; 22+ messages in thread
From: Thierry Reding @ 2020-12-04 15:46 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Jonathan Hunter, Georgi Djakov, Rob Herring, Michael Turquette,
	Stephen Boyd, Peter De Schrijver, MyungJoo Ham, Kyungmin Park,
	Chanwoo Choi, Mikko Perttunen, Viresh Kumar, Peter Geis,
	Nicolas Chauvet, Krzysztof Kozlowski, linux-tegra, linux-pm,
	linux-kernel, dri-devel, devicetree

[-- Attachment #1: Type: text/plain, Size: 537 bytes --]

On Thu, Dec 03, 2020 at 10:24:30PM +0300, Dmitry Osipenko wrote:
> Document opp-supported-hw property, which is not strictly necessary to
> have on Tegra20, but it's very convenient to have because all other SoC
> core devices will use hardware versioning, and thus, it's good to maintain
> the consistency.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  .../bindings/memory-controllers/nvidia,tegra20-emc.txt      | 6 ++++++
>  1 file changed, 6 insertions(+)

Acked-by: Thierry Reding <treding@nvidia.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v11 02/10] memory: tegra20: Support hardware versioning and clean up OPP table initialization
  2020-12-03 19:24 ` [PATCH v11 02/10] memory: tegra20: Support hardware versioning and clean up OPP table initialization Dmitry Osipenko
@ 2020-12-04 16:20   ` Thierry Reding
  0 siblings, 0 replies; 22+ messages in thread
From: Thierry Reding @ 2020-12-04 16:20 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Jonathan Hunter, Georgi Djakov, Rob Herring, Michael Turquette,
	Stephen Boyd, Peter De Schrijver, MyungJoo Ham, Kyungmin Park,
	Chanwoo Choi, Mikko Perttunen, Viresh Kumar, Peter Geis,
	Nicolas Chauvet, Krzysztof Kozlowski, linux-tegra, linux-pm,
	linux-kernel, dri-devel, devicetree

[-- Attachment #1: Type: text/plain, Size: 723 bytes --]

On Thu, Dec 03, 2020 at 10:24:31PM +0300, Dmitry Osipenko wrote:
> Support hardware versioning, which is now required for Tegra20 EMC OPP.
> Clean up OPP table initialization by using a error code returned by OPP
> API for judging about the OPP table presence in a device-tree and remove
> OPP regulator initialization because we're now going to use power domain
> instead of a raw regulator. This puts Tegra20 EMC OPP preparation on par
> with the Tegra30/124 EMC drivers.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/memory/tegra/tegra20-emc.c | 48 +++++++++++++-----------------
>  1 file changed, 20 insertions(+), 28 deletions(-)

Acked-by: Thierry Reding <treding@nvidia.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v11 03/10] memory: tegra30: Support interconnect framework
  2020-12-03 19:24 ` [PATCH v11 03/10] memory: tegra30: Support interconnect framework Dmitry Osipenko
@ 2020-12-04 16:27   ` Thierry Reding
  0 siblings, 0 replies; 22+ messages in thread
From: Thierry Reding @ 2020-12-04 16:27 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Jonathan Hunter, Georgi Djakov, Rob Herring, Michael Turquette,
	Stephen Boyd, Peter De Schrijver, MyungJoo Ham, Kyungmin Park,
	Chanwoo Choi, Mikko Perttunen, Viresh Kumar, Peter Geis,
	Nicolas Chauvet, Krzysztof Kozlowski, linux-tegra, linux-pm,
	linux-kernel, dri-devel, devicetree

[-- Attachment #1: Type: text/plain, Size: 874 bytes --]

On Thu, Dec 03, 2020 at 10:24:32PM +0300, Dmitry Osipenko wrote:
> Now Internal and External memory controllers are memory interconnection
> providers. This allows us to use interconnect API for tuning of memory
> configuration. EMC driver now supports OPPs and DVFS. MC driver now
> supports tuning of memory arbitration latency, which needs to be done
> for ISO memory clients, like a Display client for example.
> 
> Tested-by: Peter Geis <pgwipeout@gmail.com>
> Acked-by: Georgi Djakov <georgi.djakov@linaro.org>
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/memory/tegra/Kconfig       |   1 +
>  drivers/memory/tegra/tegra30-emc.c | 344 +++++++++++++++++++++++++++--
>  drivers/memory/tegra/tegra30.c     | 173 ++++++++++++++-
>  3 files changed, 496 insertions(+), 22 deletions(-)

Acked-by: Thierry Reding <treding@nvidia.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v11 04/10] memory: tegra124-emc: Make driver modular
  2020-12-03 19:24 ` [PATCH v11 04/10] memory: tegra124-emc: Make driver modular Dmitry Osipenko
@ 2020-12-04 16:41   ` Thierry Reding
  2020-12-05 19:52     ` Dmitry Osipenko
  0 siblings, 1 reply; 22+ messages in thread
From: Thierry Reding @ 2020-12-04 16:41 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Jonathan Hunter, Georgi Djakov, Rob Herring, Michael Turquette,
	Stephen Boyd, Peter De Schrijver, MyungJoo Ham, Kyungmin Park,
	Chanwoo Choi, Mikko Perttunen, Viresh Kumar, Peter Geis,
	Nicolas Chauvet, Krzysztof Kozlowski, linux-tegra, linux-pm,
	linux-kernel, dri-devel, devicetree

[-- Attachment #1: Type: text/plain, Size: 6920 bytes --]

On Thu, Dec 03, 2020 at 10:24:33PM +0300, Dmitry Osipenko wrote:
> Add modularization support to the Tegra124 EMC driver, which now can be
> compiled as a loadable kernel module.
> 
> Note that EMC clock must be registered at clk-init time, otherwise PLLM
> will be disabled as unused clock at boot time if EMC driver is compiled
> as a module. Hence add a prepare/complete callbacks. similarly to what is
> done for the Tegra20/30 EMC drivers.
> 
> Tested-by: Nicolas Chauvet <kwizart@gmail.com>
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/clk/tegra/Kconfig            |  3 ++
>  drivers/clk/tegra/Makefile           |  2 +-
>  drivers/clk/tegra/clk-tegra124-emc.c | 41 ++++++++++++++++++++++++----
>  drivers/clk/tegra/clk-tegra124.c     | 26 ++++++++++++++++--
>  drivers/clk/tegra/clk.h              | 18 ++++++++----
>  drivers/memory/tegra/Kconfig         |  3 +-
>  drivers/memory/tegra/tegra124-emc.c  | 31 ++++++++++++++-------
>  include/linux/clk/tegra.h            |  8 ++++++
>  include/soc/tegra/emc.h              | 16 -----------
>  9 files changed, 106 insertions(+), 42 deletions(-)
>  delete mode 100644 include/soc/tegra/emc.h
> 
> diff --git a/drivers/clk/tegra/Kconfig b/drivers/clk/tegra/Kconfig
> index deaa4605824c..90df619dc087 100644
> --- a/drivers/clk/tegra/Kconfig
> +++ b/drivers/clk/tegra/Kconfig
> @@ -7,3 +7,6 @@ config TEGRA_CLK_DFLL
>  	depends on ARCH_TEGRA_124_SOC || ARCH_TEGRA_210_SOC
>  	select PM_OPP
>  	def_bool y
> +
> +config TEGRA124_CLK_EMC
> +	bool
> diff --git a/drivers/clk/tegra/Makefile b/drivers/clk/tegra/Makefile
> index eec2313fd37e..7b1816856eb5 100644
> --- a/drivers/clk/tegra/Makefile
> +++ b/drivers/clk/tegra/Makefile
> @@ -22,7 +22,7 @@ obj-$(CONFIG_ARCH_TEGRA_3x_SOC)		+= clk-tegra20-emc.o
>  obj-$(CONFIG_ARCH_TEGRA_114_SOC)	+= clk-tegra114.o
>  obj-$(CONFIG_ARCH_TEGRA_124_SOC)	+= clk-tegra124.o
>  obj-$(CONFIG_TEGRA_CLK_DFLL)		+= clk-tegra124-dfll-fcpu.o
> -obj-$(CONFIG_TEGRA124_EMC)		+= clk-tegra124-emc.o
> +obj-$(CONFIG_TEGRA124_CLK_EMC)		+= clk-tegra124-emc.o
>  obj-$(CONFIG_ARCH_TEGRA_132_SOC)	+= clk-tegra124.o
>  obj-y					+= cvb.o
>  obj-$(CONFIG_ARCH_TEGRA_210_SOC)	+= clk-tegra210.o
> diff --git a/drivers/clk/tegra/clk-tegra124-emc.c b/drivers/clk/tegra/clk-tegra124-emc.c
> index 745f9faa98d8..bdf6f4a51617 100644
> --- a/drivers/clk/tegra/clk-tegra124-emc.c
> +++ b/drivers/clk/tegra/clk-tegra124-emc.c
> @@ -11,7 +11,9 @@
>  #include <linux/clk-provider.h>
>  #include <linux/clk.h>
>  #include <linux/clkdev.h>
> +#include <linux/clk/tegra.h>
>  #include <linux/delay.h>
> +#include <linux/export.h>
>  #include <linux/io.h>
>  #include <linux/module.h>
>  #include <linux/of_address.h>
> @@ -21,7 +23,6 @@
>  #include <linux/string.h>
>  
>  #include <soc/tegra/fuse.h>
> -#include <soc/tegra/emc.h>
>  
>  #include "clk.h"
>  
> @@ -80,6 +81,9 @@ struct tegra_clk_emc {
>  	int num_timings;
>  	struct emc_timing *timings;
>  	spinlock_t *lock;
> +
> +	tegra124_emc_prepare_timing_change_cb *prepare_timing_change;
> +	tegra124_emc_complete_timing_change_cb *complete_timing_change;
>  };
>  
>  /* Common clock framework callback implementations */
> @@ -176,6 +180,9 @@ static struct tegra_emc *emc_ensure_emc_driver(struct tegra_clk_emc *tegra)
>  	if (tegra->emc)
>  		return tegra->emc;
>  
> +	if (!tegra->prepare_timing_change || !tegra->complete_timing_change)
> +		return NULL;
> +
>  	if (!tegra->emc_node)
>  		return NULL;
>  
> @@ -241,7 +248,7 @@ static int emc_set_timing(struct tegra_clk_emc *tegra,
>  
>  	div = timing->parent_rate / (timing->rate / 2) - 2;
>  
> -	err = tegra_emc_prepare_timing_change(emc, timing->rate);
> +	err = tegra->prepare_timing_change(emc, timing->rate);
>  	if (err)
>  		return err;
>  
> @@ -259,7 +266,7 @@ static int emc_set_timing(struct tegra_clk_emc *tegra,
>  
>  	spin_unlock_irqrestore(tegra->lock, flags);
>  
> -	tegra_emc_complete_timing_change(emc, timing->rate);
> +	tegra->complete_timing_change(emc, timing->rate);
>  
>  	clk_hw_reparent(&tegra->hw, __clk_get_hw(timing->parent));
>  	clk_disable_unprepare(tegra->prev_parent);
> @@ -473,8 +480,8 @@ static const struct clk_ops tegra_clk_emc_ops = {
>  	.get_parent = emc_get_parent,
>  };
>  
> -struct clk *tegra_clk_register_emc(void __iomem *base, struct device_node *np,
> -				   spinlock_t *lock)
> +struct clk *tegra124_clk_register_emc(void __iomem *base, struct device_node *np,
> +				      spinlock_t *lock)
>  {
>  	struct tegra_clk_emc *tegra;
>  	struct clk_init_data init;
> @@ -538,3 +545,27 @@ struct clk *tegra_clk_register_emc(void __iomem *base, struct device_node *np,
>  
>  	return clk;
>  };
> +
> +void tegra124_clk_set_emc_callbacks(tegra124_emc_prepare_timing_change_cb *prep_cb,
> +				    tegra124_emc_complete_timing_change_cb *complete_cb)
> +{
> +	struct clk *clk = __clk_lookup("emc");
> +	struct tegra_clk_emc *tegra;
> +	struct clk_hw *hw;
> +
> +	if (clk) {
> +		hw = __clk_get_hw(clk);
> +		tegra = container_of(hw, struct tegra_clk_emc, hw);
> +
> +		tegra->prepare_timing_change = prep_cb;
> +		tegra->complete_timing_change = complete_cb;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(tegra124_clk_set_emc_callbacks);
> +
> +bool tegra124_clk_emc_driver_available(struct clk_hw *hw)
> +{
> +	struct tegra_clk_emc *tegra = container_of(hw, struct tegra_clk_emc, hw);
> +
> +	return tegra->prepare_timing_change && tegra->complete_timing_change;
> +}

This looks a bit hackish and I prefer the way this was done for
Tegra210. But that's mostly an implementation detail and we can always
restructure this if we want to.

> diff --git a/drivers/clk/tegra/clk-tegra124.c b/drivers/clk/tegra/clk-tegra124.c
> index e931319dcc9d..934520aab6e3 100644
> --- a/drivers/clk/tegra/clk-tegra124.c
> +++ b/drivers/clk/tegra/clk-tegra124.c
> @@ -1500,6 +1500,26 @@ static void __init tegra124_132_clock_init_pre(struct device_node *np)
>  	writel(plld_base, clk_base + PLLD_BASE);
>  }
>  
> +static struct clk *tegra124_clk_src_onecell_get(struct of_phandle_args *clkspec,
> +						void *data)
> +{
> +	struct clk_hw *hw;
> +	struct clk *clk;
> +
> +	clk = of_clk_src_onecell_get(clkspec, data);
> +	if (IS_ERR(clk))
> +		return clk;
> +
> +	hw = __clk_get_hw(clk);
> +
> +	if (clkspec->args[0] == TEGRA124_CLK_EMC) {
> +		if (!tegra124_clk_emc_driver_available(hw))
> +			return ERR_PTR(-EPROBE_DEFER);
> +	}
> +
> +	return clk;
> +}

Hm... why exactly do we need this? On Tegra210 and later, the EMC driver
is the only consumer of the EMC clock and since it also provides some of
the necessary parts to scale the EMC clock, that's a chicken and egg
problem. I'm not sure I fully understand how this is supposed to work
here and why we can't do this in a similar way than Tegra210.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs
  2020-12-03 19:24 [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Dmitry Osipenko
                   ` (9 preceding siblings ...)
  2020-12-03 19:24 ` [PATCH v11 10/10] PM / devfreq: tegra30: Separate configurations per-SoC generation Dmitry Osipenko
@ 2020-12-05 14:09 ` Krzysztof Kozlowski
  2020-12-07 22:11   ` Dmitry Osipenko
  10 siblings, 1 reply; 22+ messages in thread
From: Krzysztof Kozlowski @ 2020-12-05 14:09 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, Nicolas Chauvet, Georgi Djakov,
	Peter Geis, MyungJoo Ham, Viresh Kumar, Thierry Reding,
	Michael Turquette, Kyungmin Park, Stephen Boyd, Rob Herring,
	Jonathan Hunter, Peter De Schrijver, Chanwoo Choi
  Cc: Krzysztof Kozlowski, linux-tegra, devicetree, dri-devel,
	linux-kernel, linux-pm

On Thu, 3 Dec 2020 22:24:29 +0300, Dmitry Osipenko wrote:
> This series brings initial support for memory interconnect to Tegra20,
> Tegra30 and Tegra124 SoCs.
> 
> For the starter only display controllers and devfreq devices are getting
> interconnect API support, others could be supported later on. The display
> controllers have the biggest demand for interconnect API right now because
> dynamic memory frequency scaling can't be done safely without taking into
> account bandwidth requirement from the displays. In particular this series
> fixes distorted display output on T30 Ouya and T124 TK1 devices.
> 
> [...]

Applied, thanks!

[01/10] dt-bindings: memory: tegra20: emc: Document opp-supported-hw property
[02/10] memory: tegra20: Support hardware versioning and clean up OPP table initialization
[03/10] memory: tegra30: Support interconnect framework
        commit: 01a51facb74fb337ff9fe734caa85dd6e246ef48

Best regards,
-- 
Krzysztof Kozlowski <krzk@kernel.org>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v11 04/10] memory: tegra124-emc: Make driver modular
  2020-12-04 16:41   ` Thierry Reding
@ 2020-12-05 19:52     ` Dmitry Osipenko
  0 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2020-12-05 19:52 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Jonathan Hunter, Georgi Djakov, Rob Herring, Michael Turquette,
	Stephen Boyd, Peter De Schrijver, MyungJoo Ham, Kyungmin Park,
	Chanwoo Choi, Mikko Perttunen, Viresh Kumar, Peter Geis,
	Nicolas Chauvet, Krzysztof Kozlowski, linux-tegra, linux-pm,
	linux-kernel, dri-devel, devicetree

04.12.2020 19:41, Thierry Reding пишет:
...
>> +bool tegra124_clk_emc_driver_available(struct clk_hw *hw)
>> +{
>> +	struct tegra_clk_emc *tegra = container_of(hw, struct tegra_clk_emc, hw);
>> +
>> +	return tegra->prepare_timing_change && tegra->complete_timing_change;
>> +}
> 
> This looks a bit hackish and I prefer the way this was done for
> Tegra210.

I may have an opposite opinion :)

> But that's mostly an implementation detail and we can always
> restructure this if we want to.

This is true. I'm not saying that the current v11 variant is absolutely
ideal, but it should be good enough for the starter (IMO) and actually I
don't have any ideas right about what could be done better.

>> diff --git a/drivers/clk/tegra/clk-tegra124.c b/drivers/clk/tegra/clk-tegra124.c
>> index e931319dcc9d..934520aab6e3 100644
>> --- a/drivers/clk/tegra/clk-tegra124.c
>> +++ b/drivers/clk/tegra/clk-tegra124.c
>> @@ -1500,6 +1500,26 @@ static void __init tegra124_132_clock_init_pre(struct device_node *np)
>>  	writel(plld_base, clk_base + PLLD_BASE);
>>  }
>>  
>> +static struct clk *tegra124_clk_src_onecell_get(struct of_phandle_args *clkspec,
>> +						void *data)
>> +{
>> +	struct clk_hw *hw;
>> +	struct clk *clk;
>> +
>> +	clk = of_clk_src_onecell_get(clkspec, data);
>> +	if (IS_ERR(clk))
>> +		return clk;
>> +
>> +	hw = __clk_get_hw(clk);
>> +
>> +	if (clkspec->args[0] == TEGRA124_CLK_EMC) {
>> +		if (!tegra124_clk_emc_driver_available(hw))
>> +			return ERR_PTR(-EPROBE_DEFER);
>> +	}
>> +
>> +	return clk;
>> +}
> 
> Hm... why exactly do we need this? On Tegra210 and later, the EMC driver
> is the only consumer of the EMC clock and since it also provides some of
> the necessary parts to scale the EMC clock, that's a chicken and egg
> problem.

The T124 EMC driver has an existing active user for the EMC clock, the
devfreq/actmon driver which watches and drives the EMC clock rate. The
EMC clock shan't be requested by the devfreq driver until EMC driver is
ready, the only sensible way to achieve this is implemented by this patch.

The devfreq driver doesn't support T210 (yet?) and you should witness
the problem if you'll try to implement the T210 support.

> I'm not sure I fully understand how this is supposed to work
> here and why we can't do this in a similar way than Tegra210.

The CCF returns -EPROBE_DEFER for clk_get() only until clock provider is
registered, otherwise it returns a dummy/stub clock once provider is
available and clk (of the provider) isn't registered. The CCF provider
for the EMC clock is the tegra-clk driver, not the EMC driver.

Once clk_get() is invoked by a clk user, the CCF performs the clk lookup
using the DT specifier and this lookup is aborted with a -EPROBE_DEFER
from the clk_src_onecell_get() callback if EMC driver isn't loaded yet.
I don't think that there are any other variants to achieve this behaviour.

I also prefer to have a clean separation of the clk and EMC drivers
because this is a much more expressive variant than mixing drivers
together in obscure way. The pre-T210 EMC drivers don't need to touch
clk registers for programming of the memory timings, hence those EMC
drivers are in a bit better position than the T210 driver.

The T210 EMC driver also could have a cleaner separation by using a
special tegra-clk API for the clk/EMC functions, instead of shoving a
raw clk IO pointer to the EMC driver. It feels like I was already
suggesting this about a half-year ago, before the T210 driver was merged.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v11 09/10] PM / devfreq: tegra30: Support interconnect and OPPs from device-tree
  2020-12-03 19:24 ` [PATCH v11 09/10] PM / devfreq: tegra30: Support interconnect and OPPs from device-tree Dmitry Osipenko
@ 2020-12-07  1:32   ` Chanwoo Choi
  2020-12-07 22:11     ` Dmitry Osipenko
  0 siblings, 1 reply; 22+ messages in thread
From: Chanwoo Choi @ 2020-12-07  1:32 UTC (permalink / raw)
  To: Dmitry Osipenko, Thierry Reding, Jonathan Hunter, Georgi Djakov,
	Rob Herring, Michael Turquette, Stephen Boyd, Peter De Schrijver,
	MyungJoo Ham, Kyungmin Park, Mikko Perttunen, Viresh Kumar,
	Peter Geis, Nicolas Chauvet, Krzysztof Kozlowski
  Cc: linux-tegra, linux-pm, linux-kernel, dri-devel, devicetree

On 12/4/20 4:24 AM, Dmitry Osipenko wrote:
> This patch moves ACTMON driver away from generating OPP table by itself,
> transitioning it to use the table which comes from device-tree. This
> change breaks compatibility with older device-trees and brings support
> for the interconnect framework to the driver. This is a mandatory change
> which needs to be done in order to implement interconnect-based memory
> DVFS, i.e. device-trees need to be updated. Now ACTMON issues a memory
> bandwidth requests using dev_pm_opp_set_bw() instead of driving EMC clock
> rate directly.
> 
> Tested-by: Peter Geis <pgwipeout@gmail.com>
> Tested-by: Nicolas Chauvet <kwizart@gmail.com>
> Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/devfreq/tegra30-devfreq.c | 79 +++++++++++++++----------------
>  1 file changed, 37 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c
> index 38cc0d014738..145ef91ae092 100644
> --- a/drivers/devfreq/tegra30-devfreq.c
> +++ b/drivers/devfreq/tegra30-devfreq.c
> @@ -19,6 +19,8 @@
>  #include <linux/reset.h>
>  #include <linux/workqueue.h>
>  
> +#include <soc/tegra/fuse.h>
> +
>  #include "governor.h"
>  
>  #define ACTMON_GLB_STATUS					0x0
> @@ -155,6 +157,7 @@ struct tegra_devfreq_device {
>  
>  struct tegra_devfreq {
>  	struct devfreq		*devfreq;
> +	struct opp_table	*opp_table;
>  
>  	struct reset_control	*reset;
>  	struct clk		*clock;
> @@ -612,34 +615,19 @@ static void tegra_actmon_stop(struct tegra_devfreq *tegra)
>  static int tegra_devfreq_target(struct device *dev, unsigned long *freq,
>  				u32 flags)
>  {
> -	struct tegra_devfreq *tegra = dev_get_drvdata(dev);
> -	struct devfreq *devfreq = tegra->devfreq;
>  	struct dev_pm_opp *opp;
> -	unsigned long rate;
> -	int err;
> +	int ret;
>  
>  	opp = devfreq_recommended_opp(dev, freq, flags);
>  	if (IS_ERR(opp)) {
>  		dev_err(dev, "Failed to find opp for %lu Hz\n", *freq);
>  		return PTR_ERR(opp);
>  	}
> -	rate = dev_pm_opp_get_freq(opp);
> -	dev_pm_opp_put(opp);
> -
> -	err = clk_set_min_rate(tegra->emc_clock, rate * KHZ);
> -	if (err)
> -		return err;
> -
> -	err = clk_set_rate(tegra->emc_clock, 0);
> -	if (err)
> -		goto restore_min_rate;
>  
> -	return 0;
> -
> -restore_min_rate:
> -	clk_set_min_rate(tegra->emc_clock, devfreq->previous_freq);
> +	ret = dev_pm_opp_set_bw(dev, opp);
> +	dev_pm_opp_put(opp);
>  
> -	return err;
> +	return ret;
>  }
>  
>  static int tegra_devfreq_get_dev_status(struct device *dev,
> @@ -655,7 +643,7 @@ static int tegra_devfreq_get_dev_status(struct device *dev,
>  	stat->private_data = tegra;
>  
>  	/* The below are to be used by the other governors */
> -	stat->current_frequency = cur_freq;
> +	stat->current_frequency = cur_freq * KHZ;
>  
>  	actmon_dev = &tegra->devices[MCALL];
>  
> @@ -705,7 +693,12 @@ static int tegra_governor_get_target(struct devfreq *devfreq,
>  		target_freq = max(target_freq, dev->target_freq);
>  	}
>  
> -	*freq = target_freq;
> +	/*
> +	 * tegra-devfreq driver operates with KHz units, while OPP table
> +	 * entries use Hz units. Hence we need to convert the units for the
> +	 * devfreq core.
> +	 */
> +	*freq = target_freq * KHZ;
>  
>  	return 0;
>  }
> @@ -774,6 +767,7 @@ static struct devfreq_governor tegra_devfreq_governor = {
>  
>  static int tegra_devfreq_probe(struct platform_device *pdev)
>  {
> +	u32 hw_version = BIT(tegra_sku_info.soc_speedo_id);
>  	struct tegra_devfreq_device *dev;
>  	struct tegra_devfreq *tegra;
>  	struct devfreq *devfreq;
> @@ -822,11 +816,25 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
>  		return err;
>  	}
>  
> +	tegra->opp_table = dev_pm_opp_set_supported_hw(&pdev->dev,
> +						       &hw_version, 1);
> +	err = PTR_ERR_OR_ZERO(tegra->opp_table);
> +	if (err) {
> +		dev_err(&pdev->dev, "Failed to set supported HW: %d\n", err);
> +		return err;
> +	}
> +
> +	err = dev_pm_opp_of_add_table(&pdev->dev);
> +	if (err) {
> +		dev_err(&pdev->dev, "Failed to add OPP table: %d\n", err);
> +		goto put_hw;
> +	}
> +
>  	err = clk_prepare_enable(tegra->clock);
>  	if (err) {
>  		dev_err(&pdev->dev,
>  			"Failed to prepare and enable ACTMON clock\n");
> -		return err;
> +		goto remove_table;
>  	}
>  
>  	err = reset_control_reset(tegra->reset);
> @@ -850,23 +858,6 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
>  		dev->regs = tegra->regs + dev->config->offset;
>  	}
>  
> -	for (rate = 0; rate <= tegra->max_freq * KHZ; rate++) {
> -		rate = clk_round_rate(tegra->emc_clock, rate);
> -
> -		if (rate < 0) {
> -			dev_err(&pdev->dev,
> -				"Failed to round clock rate: %ld\n", rate);
> -			err = rate;
> -			goto remove_opps;
> -		}
> -
> -		err = dev_pm_opp_add(&pdev->dev, rate / KHZ, 0);
> -		if (err) {
> -			dev_err(&pdev->dev, "Failed to add OPP: %d\n", err);
> -			goto remove_opps;
> -		}
> -	}
> -
>  	platform_set_drvdata(pdev, tegra);
>  
>  	tegra->clk_rate_change_nb.notifier_call = tegra_actmon_clk_notify_cb;
> @@ -882,7 +873,6 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
>  	}
>  
>  	tegra_devfreq_profile.initial_freq = clk_get_rate(tegra->emc_clock);
> -	tegra_devfreq_profile.initial_freq /= KHZ;
>  
>  	devfreq = devfreq_add_device(&pdev->dev, &tegra_devfreq_profile,
>  				     "tegra_actmon", NULL);
> @@ -902,6 +892,10 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
>  	reset_control_reset(tegra->reset);
>  disable_clk:
>  	clk_disable_unprepare(tegra->clock);
> +remove_table:
> +	dev_pm_opp_of_remove_table(&pdev->dev);
> +put_hw:
> +	dev_pm_opp_put_supported_hw(tegra->opp_table);
>  
>  	return err;
>  }
> @@ -913,11 +907,12 @@ static int tegra_devfreq_remove(struct platform_device *pdev)
>  	devfreq_remove_device(tegra->devfreq);
>  	devfreq_remove_governor(&tegra_devfreq_governor);
>  
> -	dev_pm_opp_remove_all_dynamic(&pdev->dev);
> -
>  	reset_control_reset(tegra->reset);
>  	clk_disable_unprepare(tegra->clock);
>  
> +	dev_pm_opp_of_remove_table(&pdev->dev);
> +	dev_pm_opp_put_supported_hw(tegra->opp_table);
> +
>  	return 0;
>  }
>  
> 

Applied it. Thanks for your work for a long time.

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v11 10/10] PM / devfreq: tegra30: Separate configurations per-SoC generation
  2020-12-03 19:24 ` [PATCH v11 10/10] PM / devfreq: tegra30: Separate configurations per-SoC generation Dmitry Osipenko
@ 2020-12-07  1:33   ` Chanwoo Choi
  0 siblings, 0 replies; 22+ messages in thread
From: Chanwoo Choi @ 2020-12-07  1:33 UTC (permalink / raw)
  To: Dmitry Osipenko, Thierry Reding, Jonathan Hunter, Georgi Djakov,
	Rob Herring, Michael Turquette, Stephen Boyd, Peter De Schrijver,
	MyungJoo Ham, Kyungmin Park, Mikko Perttunen, Viresh Kumar,
	Peter Geis, Nicolas Chauvet, Krzysztof Kozlowski
  Cc: linux-tegra, linux-pm, linux-kernel, dri-devel, devicetree

On 12/4/20 4:24 AM, Dmitry Osipenko wrote:
> Previously we were using count-weight of the T124 for T30 in order to
> get EMC clock rate that was reasonable for T30. In fact the count-weight
> should be x2 times smaller on T30, but then devfreq was producing a bit
> too low EMC clock rate for ISO memory clients, like display controller
> for example.
> 
> Now both Tegra ACTMON and Tegra DRM display drivers support interconnect
> framework and display driver tells to ICC what a minimum memory bandwidth
> is needed, preventing FIFO underflows. Thus, now we can use a proper
> count-weight value for Tegra30 and MC_ALL device config needs a bit more
> aggressive boosting.
> 
> Add a separate ACTMON driver configuration that is specific to Tegra30.
> 
> Tested-by: Peter Geis <pgwipeout@gmail.com>
> Tested-by: Nicolas Chauvet <kwizart@gmail.com>
> Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/devfreq/tegra30-devfreq.c | 68 ++++++++++++++++++++++++-------
>  1 file changed, 54 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c
> index 145ef91ae092..117cad7968ab 100644
> --- a/drivers/devfreq/tegra30-devfreq.c
> +++ b/drivers/devfreq/tegra30-devfreq.c
> @@ -57,13 +57,6 @@
>  #define ACTMON_BELOW_WMARK_WINDOW				3
>  #define ACTMON_BOOST_FREQ_STEP					16000
>  
> -/*
> - * Activity counter is incremented every 256 memory transactions, and each
> - * transaction takes 4 EMC clocks for Tegra124; So the COUNT_WEIGHT is
> - * 4 * 256 = 1024.
> - */
> -#define ACTMON_COUNT_WEIGHT					0x400
> -
>  /*
>   * ACTMON_AVERAGE_WINDOW_LOG2: default value for @DEV_CTRL_K_VAL, which
>   * translates to 2 ^ (K_VAL + 1). ex: 2 ^ (6 + 1) = 128
> @@ -111,7 +104,7 @@ enum tegra_actmon_device {
>  	MCCPU,
>  };
>  
> -static const struct tegra_devfreq_device_config actmon_device_configs[] = {
> +static const struct tegra_devfreq_device_config tegra124_device_configs[] = {
>  	{
>  		/* MCALL: All memory accesses (including from the CPUs) */
>  		.offset = 0x1c0,
> @@ -133,6 +126,28 @@ static const struct tegra_devfreq_device_config actmon_device_configs[] = {
>  	},
>  };
>  
> +static const struct tegra_devfreq_device_config tegra30_device_configs[] = {
> +	{
> +		/* MCALL: All memory accesses (including from the CPUs) */
> +		.offset = 0x1c0,
> +		.irq_mask = 1 << 26,
> +		.boost_up_coeff = 200,
> +		.boost_down_coeff = 50,
> +		.boost_up_threshold = 20,
> +		.boost_down_threshold = 10,
> +	},
> +	{
> +		/* MCCPU: memory accesses from the CPUs */
> +		.offset = 0x200,
> +		.irq_mask = 1 << 25,
> +		.boost_up_coeff = 800,
> +		.boost_down_coeff = 40,
> +		.boost_up_threshold = 27,
> +		.boost_down_threshold = 10,
> +		.avg_dependency_threshold = 16000, /* 16MHz in kHz units */
> +	},
> +};
> +
>  /**
>   * struct tegra_devfreq_device - state specific to an ACTMON device
>   *
> @@ -155,6 +170,12 @@ struct tegra_devfreq_device {
>  	unsigned long target_freq;
>  };
>  
> +struct tegra_devfreq_soc_data {
> +	const struct tegra_devfreq_device_config *configs;
> +	/* Weight value for count measurements */
> +	unsigned int count_weight;
> +};
> +
>  struct tegra_devfreq {
>  	struct devfreq		*devfreq;
>  	struct opp_table	*opp_table;
> @@ -171,11 +192,13 @@ struct tegra_devfreq {
>  	struct delayed_work	cpufreq_update_work;
>  	struct notifier_block	cpu_rate_change_nb;
>  
> -	struct tegra_devfreq_device devices[ARRAY_SIZE(actmon_device_configs)];
> +	struct tegra_devfreq_device devices[2];
>  
>  	unsigned int		irq;
>  
>  	bool			started;
> +
> +	const struct tegra_devfreq_soc_data *soc;
>  };
>  
>  struct tegra_actmon_emc_ratio {
> @@ -488,7 +511,7 @@ static void tegra_actmon_configure_device(struct tegra_devfreq *tegra,
>  	tegra_devfreq_update_avg_wmark(tegra, dev);
>  	tegra_devfreq_update_wmark(tegra, dev);
>  
> -	device_writel(dev, ACTMON_COUNT_WEIGHT, ACTMON_DEV_COUNT_WEIGHT);
> +	device_writel(dev, tegra->soc->count_weight, ACTMON_DEV_COUNT_WEIGHT);
>  	device_writel(dev, ACTMON_INTR_STATUS_CLEAR, ACTMON_DEV_INTR_STATUS);
>  
>  	val |= ACTMON_DEV_CTRL_ENB_PERIODIC;
> @@ -779,6 +802,8 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
>  	if (!tegra)
>  		return -ENOMEM;
>  
> +	tegra->soc = of_device_get_match_data(&pdev->dev);
> +
>  	tegra->regs = devm_platform_ioremap_resource(pdev, 0);
>  	if (IS_ERR(tegra->regs))
>  		return PTR_ERR(tegra->regs);
> @@ -852,9 +877,9 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
>  
>  	tegra->max_freq = rate / KHZ;
>  
> -	for (i = 0; i < ARRAY_SIZE(actmon_device_configs); i++) {
> +	for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) {
>  		dev = tegra->devices + i;
> -		dev->config = actmon_device_configs + i;
> +		dev->config = tegra->soc->configs + i;
>  		dev->regs = tegra->regs + dev->config->offset;
>  	}
>  
> @@ -916,9 +941,24 @@ static int tegra_devfreq_remove(struct platform_device *pdev)
>  	return 0;
>  }
>  
> +static const struct tegra_devfreq_soc_data tegra124_soc = {
> +	.configs = tegra124_device_configs,
> +
> +	/*
> +	 * Activity counter is incremented every 256 memory transactions,
> +	 * and each transaction takes 4 EMC clocks.
> +	 */
> +	.count_weight = 4 * 256,
> +};
> +
> +static const struct tegra_devfreq_soc_data tegra30_soc = {
> +	.configs = tegra30_device_configs,
> +	.count_weight = 2 * 256,
> +};
> +
>  static const struct of_device_id tegra_devfreq_of_match[] = {
> -	{ .compatible = "nvidia,tegra30-actmon" },
> -	{ .compatible = "nvidia,tegra124-actmon" },
> +	{ .compatible = "nvidia,tegra30-actmon",  .data = &tegra30_soc, },
> +	{ .compatible = "nvidia,tegra124-actmon", .data = &tegra124_soc, },
>  	{ },
>  };
>  
> 

Applied it. Thanks.

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs
  2020-12-05 14:09 ` [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Krzysztof Kozlowski
@ 2020-12-07 22:11   ` Dmitry Osipenko
  0 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2020-12-07 22:11 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Mikko Perttunen, Nicolas Chauvet,
	Georgi Djakov, Peter Geis, MyungJoo Ham, Viresh Kumar,
	Thierry Reding, Michael Turquette, Kyungmin Park, Stephen Boyd,
	Rob Herring, Jonathan Hunter, Peter De Schrijver, Chanwoo Choi
  Cc: linux-tegra, devicetree, dri-devel, linux-kernel, linux-pm

05.12.2020 17:09, Krzysztof Kozlowski пишет:
> On Thu, 3 Dec 2020 22:24:29 +0300, Dmitry Osipenko wrote:
>> This series brings initial support for memory interconnect to Tegra20,
>> Tegra30 and Tegra124 SoCs.
>>
>> For the starter only display controllers and devfreq devices are getting
>> interconnect API support, others could be supported later on. The display
>> controllers have the biggest demand for interconnect API right now because
>> dynamic memory frequency scaling can't be done safely without taking into
>> account bandwidth requirement from the displays. In particular this series
>> fixes distorted display output on T30 Ouya and T124 TK1 devices.
>>
>> [...]
> 
> Applied, thanks!
> 
> [01/10] dt-bindings: memory: tegra20: emc: Document opp-supported-hw property
> [02/10] memory: tegra20: Support hardware versioning and clean up OPP table initialization
> [03/10] memory: tegra30: Support interconnect framework
>         commit: 01a51facb74fb337ff9fe734caa85dd6e246ef48
> 
> Best regards,
> 

Awesome, thanks! Good to have the warning splat silenced.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v11 09/10] PM / devfreq: tegra30: Support interconnect and OPPs from device-tree
  2020-12-07  1:32   ` Chanwoo Choi
@ 2020-12-07 22:11     ` Dmitry Osipenko
  0 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2020-12-07 22:11 UTC (permalink / raw)
  To: Chanwoo Choi, Thierry Reding, Jonathan Hunter, Georgi Djakov,
	Rob Herring, Michael Turquette, Stephen Boyd, Peter De Schrijver,
	MyungJoo Ham, Kyungmin Park, Mikko Perttunen, Viresh Kumar,
	Peter Geis, Nicolas Chauvet, Krzysztof Kozlowski
  Cc: linux-tegra, linux-pm, linux-kernel, dri-devel, devicetree

07.12.2020 04:32, Chanwoo Choi пишет:
> On 12/4/20 4:24 AM, Dmitry Osipenko wrote:
>> This patch moves ACTMON driver away from generating OPP table by itself,
>> transitioning it to use the table which comes from device-tree. This
>> change breaks compatibility with older device-trees and brings support
>> for the interconnect framework to the driver. This is a mandatory change
>> which needs to be done in order to implement interconnect-based memory
>> DVFS, i.e. device-trees need to be updated. Now ACTMON issues a memory
>> bandwidth requests using dev_pm_opp_set_bw() instead of driving EMC clock
>> rate directly.
>>
>> Tested-by: Peter Geis <pgwipeout@gmail.com>
>> Tested-by: Nicolas Chauvet <kwizart@gmail.com>
>> Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>> ---
...
> 
> Applied it. Thanks for your work for a long time.
> 

Thank you!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v11 01/10] dt-bindings: memory: tegra20: emc: Document opp-supported-hw property
  2020-12-03 19:24 ` [PATCH v11 01/10] dt-bindings: memory: tegra20: emc: Document opp-supported-hw property Dmitry Osipenko
  2020-12-04 15:46   ` Thierry Reding
@ 2020-12-09 20:17   ` Rob Herring
  1 sibling, 0 replies; 22+ messages in thread
From: Rob Herring @ 2020-12-09 20:17 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: MyungJoo Ham, devicetree, Viresh Kumar, Stephen Boyd,
	linux-tegra, Jonathan Hunter, Michael Turquette, linux-pm,
	Krzysztof Kozlowski, Chanwoo Choi, Rob Herring,
	Peter De Schrijver, dri-devel, Kyungmin Park, Peter Geis,
	linux-kernel, Georgi Djakov, Mikko Perttunen, Nicolas Chauvet,
	Thierry Reding

On Thu, 03 Dec 2020 22:24:30 +0300, Dmitry Osipenko wrote:
> Document opp-supported-hw property, which is not strictly necessary to
> have on Tegra20, but it's very convenient to have because all other SoC
> core devices will use hardware versioning, and thus, it's good to maintain
> the consistency.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  .../bindings/memory-controllers/nvidia,tegra20-emc.txt      | 6 ++++++
>  1 file changed, 6 insertions(+)
> 

Acked-by: Rob Herring <robh@kernel.org>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-12-09 20:18 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-03 19:24 [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Dmitry Osipenko
2020-12-03 19:24 ` [PATCH v11 01/10] dt-bindings: memory: tegra20: emc: Document opp-supported-hw property Dmitry Osipenko
2020-12-04 15:46   ` Thierry Reding
2020-12-09 20:17   ` Rob Herring
2020-12-03 19:24 ` [PATCH v11 02/10] memory: tegra20: Support hardware versioning and clean up OPP table initialization Dmitry Osipenko
2020-12-04 16:20   ` Thierry Reding
2020-12-03 19:24 ` [PATCH v11 03/10] memory: tegra30: Support interconnect framework Dmitry Osipenko
2020-12-04 16:27   ` Thierry Reding
2020-12-03 19:24 ` [PATCH v11 04/10] memory: tegra124-emc: Make driver modular Dmitry Osipenko
2020-12-04 16:41   ` Thierry Reding
2020-12-05 19:52     ` Dmitry Osipenko
2020-12-03 19:24 ` [PATCH v11 05/10] memory: tegra124-emc: Continue probing if timings are missing in device-tree Dmitry Osipenko
2020-12-03 19:24 ` [PATCH v11 06/10] memory: tegra124: Support interconnect framework Dmitry Osipenko
2020-12-03 19:24 ` [PATCH v11 07/10] drm/tegra: dc: Support memory bandwidth management Dmitry Osipenko
2020-12-03 19:24 ` [PATCH v11 08/10] drm/tegra: dc: Extend debug stats with total number of events Dmitry Osipenko
2020-12-03 19:24 ` [PATCH v11 09/10] PM / devfreq: tegra30: Support interconnect and OPPs from device-tree Dmitry Osipenko
2020-12-07  1:32   ` Chanwoo Choi
2020-12-07 22:11     ` Dmitry Osipenko
2020-12-03 19:24 ` [PATCH v11 10/10] PM / devfreq: tegra30: Separate configurations per-SoC generation Dmitry Osipenko
2020-12-07  1:33   ` Chanwoo Choi
2020-12-05 14:09 ` [PATCH v11 00/10] Introduce memory interconnect for NVIDIA Tegra SoCs Krzysztof Kozlowski
2020-12-07 22:11   ` Dmitry Osipenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).