linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/3] Introduce shared and cbus clocks
@ 2014-05-13 14:06 Peter De Schrijver
  2014-05-13 14:06 ` [RFC PATCH 1/3] clk: Implement cbus and shared clocks Peter De Schrijver
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Peter De Schrijver @ 2014-05-13 14:06 UTC (permalink / raw)
  To: Peter De Schrijver
  Cc: Mike Turquette, Prashant Gaikwad, Stephen Warren, Thierry Reding,
	Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Arnd Bergmann, linux-kernel, linux-tegra, devicetree

Cbus clocks are virtual clocks which implement coordinated changes to
clock rates. They are used to change PLL rates while child clocks are
active. On some SoCs (e.g. Tegra) this requires moving the active child
clocks to a different parent during the rate change.

Shared clocks are virtual clocks which implement clock policies such as
minimum required rate or maximum allowed rate.

Peter De Schrijver (3):
  clk: Implement cbus and shared clocks
  clk: tegra: Implement common shared clks
  clk: tegra: Implement Tegra124 shared/cbus clks

 drivers/clk/Makefile                     |    1 +
 drivers/clk/clk-shared-cbus.c            |  448 ++++++++++++++++++++++++++++++
 drivers/clk/tegra/Makefile               |    1 +
 drivers/clk/tegra/clk-id.h               |   58 ++++
 drivers/clk/tegra/clk-tegra-shared.c     |  128 +++++++++
 drivers/clk/tegra/clk-tegra124.c         |   50 ++++
 drivers/clk/tegra/clk.h                  |    1 +
 include/dt-bindings/clock/tegra124-car.h |   70 +++++-
 include/linux/clk-provider.h             |   56 ++++
 9 files changed, 812 insertions(+), 1 deletions(-)
 create mode 100644 drivers/clk/clk-shared-cbus.c
 create mode 100644 drivers/clk/tegra/clk-tegra-shared.c

-- 
1.7.7.rc0.72.g4b5ea.dirty



* [RFC PATCH 1/3] clk: Implement cbus and shared clocks
  2014-05-13 14:06 [RFC PATCH 0/3] Introduce shared and cbus clocks Peter De Schrijver
@ 2014-05-13 14:06 ` Peter De Schrijver
  2014-05-13 14:06 ` [RFC PATCH 2/3] clk: tegra: Implement common shared clks Peter De Schrijver
  2014-05-13 14:06 ` [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks Peter De Schrijver
  2 siblings, 0 replies; 18+ messages in thread
From: Peter De Schrijver @ 2014-05-13 14:06 UTC (permalink / raw)
  To: Peter De Schrijver
  Cc: Mike Turquette, Prashant Gaikwad, Stephen Warren, Thierry Reding,
	Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Arnd Bergmann, linux-kernel, linux-tegra, devicetree

cbus clocks are virtual clocks which implement coordinated changes to
clock rates. These are needed because a PLL's rate cannot be changed
while it has active child clocks. The child clocks are therefore moved
to a backup PLL for the duration of the rate change.

shared clocks are virtual clocks which implement clock policies (e.g.
minimum or maximum rates).

Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com>
---
 drivers/clk/Makefile          |    1 +
 drivers/clk/clk-shared-cbus.c |  448 +++++++++++++++++++++++++++++++++++++++++
 include/linux/clk-provider.h  |   56 +++++
 3 files changed, 505 insertions(+), 0 deletions(-)
 create mode 100644 drivers/clk/clk-shared-cbus.c

diff --git a/drivers/clk/Makefile b/drivers/clk/Makefile
index 5f8a287..8d9255b5 100644
--- a/drivers/clk/Makefile
+++ b/drivers/clk/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_COMMON_CLK)	+= clk-fixed-rate.o
 obj-$(CONFIG_COMMON_CLK)	+= clk-gate.o
 obj-$(CONFIG_COMMON_CLK)	+= clk-mux.o
 obj-$(CONFIG_COMMON_CLK)	+= clk-composite.o
+obj-y				+= clk-shared-cbus.o
 
 # hardware specific clock types
 # please keep this section sorted lexicographically by file/directory path name
diff --git a/drivers/clk/clk-shared-cbus.c b/drivers/clk/clk-shared-cbus.c
new file mode 100644
index 0000000..402b4c3
--- /dev/null
+++ b/drivers/clk/clk-shared-cbus.c
@@ -0,0 +1,448 @@
+/*
+ * Copyright (c) 2012 - 2014, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/slab.h>
+#include <linux/io.h>
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/clk-provider.h>
+#include <linux/clk.h>
+
+static unsigned long _clk_cap_shared_bus(struct clk *c, unsigned long rate,
+						unsigned long ceiling)
+{
+	ceiling = clk_round_rate(c, ceiling);
+
+	rate = min(rate, ceiling);
+
+	return rate;
+}
+
+static int _clk_shared_bus_update(struct clk *bus)
+{
+	struct clk_cbus_shared *cbus =
+			to_clk_cbus_shared(__clk_get_hw(bus));
+	struct clk_cbus_shared *c;
+	unsigned long override_rate = 0;
+	unsigned long top_rate = 0;
+	unsigned long rate = cbus->min_rate;
+	unsigned long bw = 0;
+	unsigned long ceiling = cbus->max_rate;
+
+	list_for_each_entry(c, &cbus->shared_bus_list,
+			u.shared_bus_user.node) {
+		/*
+		 * Ignore requests from disabled floor, bw users, and
+		 * auto-users riding the bus. Always check the ceiling users
+		 * so they don't need to be enabled to cap the bus rate.
+		 */
+		if (c->u.shared_bus_user.enabled ||
+		    (c->u.shared_bus_user.mode == SHARED_CEILING)) {
+			unsigned long request_rate = c->u.shared_bus_user.rate;
+
+			switch (c->u.shared_bus_user.mode) {
+			case SHARED_BW:
+				bw += request_rate;
+				if (bw > cbus->max_rate)
+					bw = cbus->max_rate;
+				break;
+			case SHARED_CEILING:
+				if (request_rate)
+					ceiling = min(request_rate, ceiling);
+				break;
+			case SHARED_OVERRIDE:
+				if (override_rate == 0)
+					override_rate = request_rate;
+				break;
+			case SHARED_AUTO:
+				break;
+			case SHARED_FLOOR:
+			default:
+				top_rate = max(request_rate, top_rate);
+				rate = max(top_rate, rate);
+			}
+		}
+	}
+	/*
+	 * Keep the bus rate at its default when no SHARED_FLOOR users are
+	 * enabled so we won't underrun the bus.
+	 */
+	if (!top_rate)
+		rate = clk_get_rate(bus);
+	rate = override_rate ? : max(rate, bw);
+	ceiling = override_rate ? cbus->max_rate : ceiling;
+
+	rate = _clk_cap_shared_bus(bus, rate, ceiling);
+
+	return clk_set_rate(bus, rate);
+}
+
+static int clk_shared_prepare(struct clk_hw *hw)
+{
+	int err = 0;
+	struct clk_cbus_shared *shared = to_clk_cbus_shared(hw);
+
+	shared->u.shared_bus_user.enabled = true;
+	err = _clk_shared_bus_update(clk_get_parent(hw->clk));
+	if (!err && shared->u.shared_bus_user.client)
+		err = clk_prepare_enable(shared->u.shared_bus_user.client);
+
+	return err;
+}
+
+static void clk_shared_unprepare(struct clk_hw *hw)
+{
+	struct clk_cbus_shared *shared = to_clk_cbus_shared(hw);
+
+	if (shared->u.shared_bus_user.client)
+		clk_disable_unprepare(shared->u.shared_bus_user.client);
+
+	shared->u.shared_bus_user.enabled = false;
+	_clk_shared_bus_update(clk_get_parent(hw->clk));
+}
+
+static bool shared_clk_set_rate;
+
+static int clk_shared_set_rate(struct clk_hw *hw, unsigned long rate,
+				unsigned long parent_rate)
+{
+	struct clk_cbus_shared *shared = to_clk_cbus_shared(hw);
+	int err;
+
+	if (shared_clk_set_rate)
+		return 0;
+
+	shared_clk_set_rate = true;
+	shared->u.shared_bus_user.rate = rate;
+	err = _clk_shared_bus_update(clk_get_parent(hw->clk));
+	shared_clk_set_rate = false;
+
+	return err;
+}
+
+static long clk_shared_round_rate(struct clk_hw *hw, unsigned long rate,
+				unsigned long *parent_rate)
+{
+	struct clk_cbus_shared *shared = to_clk_cbus_shared(hw);
+	struct clk_cbus_shared *parent_cbus;
+	struct clk *parent;
+	int ret;
+
+	parent = clk_get_parent(hw->clk);
+	parent_cbus = to_clk_cbus_shared(__clk_get_hw(parent));
+
+	/*
+	 * Defer rounding requests until aggregated. BW users must not be
+	 * rounded at all; others are just clipped to the bus range (some
+	 * clients may use the round API to find limits).
+	 */
+	if (shared->u.shared_bus_user.mode != SHARED_BW) {
+		if (!parent_cbus->max_rate) {
+			ret = clk_round_rate(parent, ULONG_MAX);
+			if (!IS_ERR_VALUE(ret))
+				parent_cbus->max_rate = ret;
+		}
+
+		if (rate > parent_cbus->max_rate)
+			rate = parent_cbus->max_rate;
+		else if (rate < parent_cbus->min_rate)
+			rate = parent_cbus->min_rate;
+	}
+	return rate;
+}
+
+static unsigned long clk_shared_recalc_rate(struct clk_hw *hw,
+				unsigned long parent_rate)
+{
+	struct clk_cbus_shared *shared = to_clk_cbus_shared(hw);
+
+	return shared->u.shared_bus_user.rate;
+}
+
+static int cbus_switch_one(struct clk *client, struct clk *p)
+{
+	int ret = 0;
+	unsigned long old_parent_rate, new_parent_rate, current_rate;
+
+	current_rate = clk_get_rate(client);
+	old_parent_rate = clk_get_rate(clk_get_parent(client));
+	new_parent_rate = clk_get_rate(p);
+
+	if (new_parent_rate > old_parent_rate) {
+		u64 temp_rate;
+
+		/*
+		 * To avoid overclocking the IP block when changing the
+		 * parent, set the divider to a value which yields an
+		 * allowed rate once the new parent is selected.
+		 */
+		temp_rate = DIV_ROUND_UP_ULL((u64)clk_get_rate(client) *
+			(u64)old_parent_rate, new_parent_rate);
+		ret = clk_set_rate(client, temp_rate);
+		if (ret) {
+			pr_err("failed to set %s rate to %llu: %d\n",
+				__clk_get_name(client), temp_rate, ret);
+			return ret;
+		}
+	}
+
+	ret = clk_set_parent(client, p);
+	if (ret) {
+		pr_err("failed to set %s parent to %s: %d\n",
+			__clk_get_name(client),
+			__clk_get_name(p), ret);
+		return ret;
+	}
+
+	clk_set_rate(client, current_rate);
+
+	return ret;
+}
+
+static int cbus_backup(struct clk_hw *hw)
+{
+	int ret = 0;
+	struct clk_cbus_shared *cbus = to_clk_cbus_shared(hw);
+	struct clk_cbus_shared *user;
+
+	list_for_each_entry(user, &cbus->shared_bus_list,
+				 u.shared_bus_user.node) {
+		struct clk *client = user->u.shared_bus_user.client;
+
+		if (client && __clk_is_enabled(client) &&
+		 (clk_get_parent(client) == clk_get_parent(hw->clk))) {
+			ret = cbus_switch_one(client, cbus->shared_bus_backup);
+			if (ret)
+				break;
+		}
+	}
+
+	return ret;
+}
+
+static void cbus_restore(struct clk_hw *hw)
+{
+	struct clk_cbus_shared *user;
+	struct clk_cbus_shared *cbus = to_clk_cbus_shared(hw);
+
+	list_for_each_entry(user, &cbus->shared_bus_list,
+				 u.shared_bus_user.node) {
+		struct clk *client = user->u.shared_bus_user.client;
+
+		if (client) {
+			cbus_switch_one(client, clk_get_parent(hw->clk));
+			clk_set_rate(client, user->u.shared_bus_user.rate);
+		}
+	}
+}
+
+static int clk_cbus_set_rate(struct clk_hw *hw, unsigned long rate,
+			unsigned long parent_rate)
+{
+	int ret;
+	struct clk *parent;
+	struct clk_cbus_shared *cbus = to_clk_cbus_shared(hw);
+
+	if (cbus->rate_updating)
+		return 0;
+
+	if (rate == 0)
+		return 0;
+
+	cbus->rate_updating = true;
+
+	parent = clk_get_parent(hw->clk);
+	if (!parent) {
+		cbus->rate_updating = false;
+		return -EINVAL;
+	}
+
+	ret = clk_prepare_enable(parent);
+	if (ret) {
+		cbus->rate_updating = false;
+		pr_err("%s: failed to enable %s clock: %d\n",
+		       __func__, __clk_get_name(hw->clk), ret);
+		return ret;
+	}
+
+	ret = cbus_backup(hw);
+	if (ret)
+		goto out;
+
+	ret = clk_set_rate(parent, rate);
+	if (ret) {
+		pr_err("%s: failed to set %s clock rate %lu: %d\n",
+		       __func__, __clk_get_name(hw->clk), rate, ret);
+		goto out;
+	}
+
+	cbus_restore(hw);
+
+out:
+	cbus->rate_updating = false;
+	clk_disable_unprepare(parent);
+	return ret;
+}
+
+static long clk_cbus_round_rate(struct clk_hw *hw, unsigned long rate,
+			unsigned long *parent_rate)
+{
+	struct clk *parent;
+	long new_rate;
+
+	parent = clk_get_parent(hw->clk);
+	if (IS_ERR(parent)) {
+		pr_err("no parent for %s\n", __clk_get_name(hw->clk));
+		return *parent_rate;
+	} else {
+		new_rate = clk_round_rate(parent, rate);
+		if (new_rate < 0)
+			return *parent_rate;
+		else
+			return new_rate;
+	}
+}
+
+static unsigned long clk_cbus_recalc_rate(struct clk_hw *hw,
+				unsigned long parent_rate)
+{
+	return clk_get_rate(clk_get_parent(hw->clk));
+}
+
+static const struct clk_ops clk_shared_ops = {
+	.prepare = clk_shared_prepare,
+	.unprepare = clk_shared_unprepare,
+	.set_rate = clk_shared_set_rate,
+	.round_rate = clk_shared_round_rate,
+	.recalc_rate = clk_shared_recalc_rate,
+};
+
+static const struct clk_ops tegra_clk_cbus_ops = {
+	.recalc_rate = clk_cbus_recalc_rate,
+	.round_rate = clk_cbus_round_rate,
+	.set_rate = clk_cbus_set_rate,
+};
+
+struct clk *clk_register_shared(const char *name,
+		const char **parent, u8 num_parents, unsigned long flags,
+		enum shared_bus_users_mode mode, const char *client)
+{
+	struct clk_cbus_shared *shared;
+	struct clk_init_data init;
+	struct clk_cbus_shared *parent_cbus;
+	struct clk *client_clk, *parent_clk;
+
+pr_err("%s:%d name: %s\n", __FILE__, __LINE__, name);
+
+	if (num_parents > 2)
+		return ERR_PTR(-EINVAL);
+
+	parent_clk = __clk_lookup(parent[0]);
+	if (IS_ERR(parent_clk))
+		return parent_clk;
+
+	parent_cbus = to_clk_cbus_shared(__clk_get_hw(parent_clk));
+
+	shared = kzalloc(sizeof(*shared), GFP_KERNEL);
+	if (!shared)
+		return ERR_PTR(-ENOMEM);
+
+	if (client) {
+		client_clk = __clk_lookup(client);
+		if (IS_ERR(client_clk)) {
+			kfree(shared);
+			return client_clk;
+		}
+		shared->u.shared_bus_user.client = client_clk;
+		shared->magic = CLK_SHARED_MAGIC;
+	}
+
+	shared->u.shared_bus_user.mode = mode;
+	if (mode == SHARED_CEILING)
+		shared->u.shared_bus_user.rate = parent_cbus->max_rate;
+	else
+		shared->u.shared_bus_user.rate = clk_get_rate(parent_clk);
+
+	shared->flags = flags;
+
+	if (num_parents > 1) {
+		struct clk *c = __clk_lookup(parent[1]);
+
+		if (!IS_ERR(c)) {
+			shared->u.shared_bus_user.inputs[0] = parent_clk;
+			shared->u.shared_bus_user.inputs[1] = c;
+		}
+	}
+	shared->max_rate = parent_cbus->max_rate;
+
+	INIT_LIST_HEAD(&shared->u.shared_bus_user.node);
+
+	list_add_tail(&shared->u.shared_bus_user.node,
+			&parent_cbus->shared_bus_list);
+
+	flags |= CLK_GET_RATE_NOCACHE;
+
+	init.name = name;
+	init.ops = &clk_shared_ops;
+	init.flags = flags;
+	init.parent_names = parent;
+	init.num_parents = 1;
+
+	/* Data in .init is copied by clk_register(), so stack variable OK */
+	shared->hw.init = &init;
+
+	return clk_register(NULL, &shared->hw);
+}
+
+struct clk *clk_register_cbus(const char *name,
+		const char *parent, unsigned long flags,
+		const char *backup, unsigned long min_rate,
+		unsigned long max_rate)
+{
+	struct clk_cbus_shared *cbus;
+	struct clk_init_data init;
+	struct clk *backup_clk;
+
+	cbus = kzalloc(sizeof(*cbus), GFP_KERNEL);
+	if (!cbus)
+		return ERR_PTR(-ENOMEM);
+
+	backup_clk = __clk_lookup(backup);
+	if (IS_ERR(backup_clk)) {
+		kfree(cbus);
+		return backup_clk;
+	}
+
+	cbus->shared_bus_backup = backup_clk;
+	cbus->min_rate = min_rate;
+	cbus->max_rate = max_rate;
+
+	INIT_LIST_HEAD(&cbus->shared_bus_list);
+
+	flags |= CLK_GET_RATE_NOCACHE;
+
+	init.name = name;
+	init.ops = &tegra_clk_cbus_ops;
+	init.flags = flags;
+	init.parent_names = &parent;
+	init.num_parents = 1;
+
+	/* Data in .init is copied by clk_register(), so stack variable OK */
+	cbus->hw.init = &init;
+
+	return clk_register(NULL, &cbus->hw);
+}
+
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index 5119174..0fd195d 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -444,6 +444,62 @@ struct clk *clk_register_composite(struct device *dev, const char *name,
 		struct clk_hw *gate_hw, const struct clk_ops *gate_ops,
 		unsigned long flags);
 
+enum shared_bus_users_mode {
+	SHARED_FLOOR = 0,
+	SHARED_BW,
+	SHARED_CEILING,
+	SHARED_AUTO,
+	SHARED_OVERRIDE,
+};
+
+#define CLK_SHARED_MAGIC	0x18ce213d
+
+struct clk_cbus_shared {
+	u32			magic;
+	struct clk_hw		hw;
+	struct list_head	shared_bus_list;
+	struct clk		*shared_bus_backup;
+	u32			flags;
+	unsigned long		min_rate;
+	unsigned long		max_rate;
+	bool			rate_updating;
+	bool			prepared;
+	union {
+		struct {
+			struct clk_hw	*top_user;
+			struct clk_hw	*slow_user;
+		} cbus;
+		struct {
+			struct clk_hw	*pclk;
+			struct clk_hw	*hclk;
+			struct clk_hw	*sclk_low;
+			struct clk_hw	*sclk_high;
+			unsigned long	threshold;
+		} system;
+		struct {
+			struct list_head	node;
+			bool			enabled;
+			unsigned long		rate;
+			struct clk		*client;
+			u32			client_div;
+			enum shared_bus_users_mode mode;
+			struct clk		*inputs[2];
+		} shared_bus_user;
+	} u;
+};
+
+#define to_clk_cbus_shared(_hw) \
+			container_of(_hw, struct clk_cbus_shared, hw)
+
+struct clk *clk_register_shared(const char *name,
+		const char **parent, u8 num_parents, unsigned long flags,
+		enum shared_bus_users_mode mode, const char *client);
+
+struct clk *clk_register_cbus(const char *name,
+		const char *parent, unsigned long flags,
+		const char *backup, unsigned long min_rate,
+		unsigned long max_rate);
+
 /**
  * clk_register - allocate a new clock, register it and return an opaque cookie
  * @dev: device that is registering this clock
-- 
1.7.7.rc0.72.g4b5ea.dirty



* [RFC PATCH 2/3] clk: tegra: Implement common shared clks
  2014-05-13 14:06 [RFC PATCH 0/3] Introduce shared and cbus clocks Peter De Schrijver
  2014-05-13 14:06 ` [RFC PATCH 1/3] clk: Implement cbus and shared clocks Peter De Schrijver
@ 2014-05-13 14:06 ` Peter De Schrijver
  2014-05-13 14:06 ` [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks Peter De Schrijver
  2 siblings, 0 replies; 18+ messages in thread
From: Peter De Schrijver @ 2014-05-13 14:06 UTC (permalink / raw)
  To: Peter De Schrijver
  Cc: Mike Turquette, Prashant Gaikwad, Stephen Warren, Thierry Reding,
	Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Arnd Bergmann, linux-kernel, linux-tegra, devicetree

Implement the shared clks which are common across different Tegra SoCs.

Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com>
---
 drivers/clk/tegra/Makefile           |    1 +
 drivers/clk/tegra/clk-id.h           |   58 +++++++++++++++
 drivers/clk/tegra/clk-tegra-shared.c |  128 ++++++++++++++++++++++++++++++++++
 drivers/clk/tegra/clk.h              |    1 +
 4 files changed, 188 insertions(+), 0 deletions(-)
 create mode 100644 drivers/clk/tegra/clk-tegra-shared.c

diff --git a/drivers/clk/tegra/Makefile b/drivers/clk/tegra/Makefile
index f7dfb72..e2d11639 100644
--- a/drivers/clk/tegra/Makefile
+++ b/drivers/clk/tegra/Makefile
@@ -11,6 +11,7 @@ obj-y					+= clk-tegra-periph.o
 obj-y					+= clk-tegra-pmc.o
 obj-y					+= clk-tegra-fixed.o
 obj-y					+= clk-tegra-super-gen4.o
+obj-y					+= clk-tegra-shared.o
 obj-$(CONFIG_ARCH_TEGRA_2x_SOC)         += clk-tegra20.o
 obj-$(CONFIG_ARCH_TEGRA_3x_SOC)         += clk-tegra30.o
 obj-$(CONFIG_ARCH_TEGRA_114_SOC)	+= clk-tegra114.o
diff --git a/drivers/clk/tegra/clk-id.h b/drivers/clk/tegra/clk-id.h
index c39613c..26fea1e 100644
--- a/drivers/clk/tegra/clk-id.h
+++ b/drivers/clk/tegra/clk-id.h
@@ -233,6 +233,64 @@ enum clk_id {
 	tegra_clk_xusb_hs_src,
 	tegra_clk_xusb_ss,
 	tegra_clk_xusb_ss_src,
+	tegra_clk_avp_emc,
+	tegra_clk_avp_sclk,
+	tegra_clk_battery_c2bus,
+	tegra_clk_battery_c3bus,
+	tegra_clk_battery_emc,
+	tegra_clk_bsea_sclk,
+	tegra_clk_camera_emc,
+	tegra_clk_cap_c2bus,
+	tegra_clk_cap_c3bus,
+	tegra_clk_cap_emc,
+	tegra_clk_cap_profile_c2bus,
+	tegra_clk_cap_profile_c3bus,
+	tegra_clk_cap_sclk,
+	tegra_clk_cap_throttle_c2bus,
+	tegra_clk_cap_throttle_c3bus,
+	tegra_clk_cap_throttle_emc,
+	tegra_clk_cap_throttle_sclk,
+	tegra_clk_cpu_emc,
+	tegra_clk_disp1_emc,
+	tegra_clk_disp2_emc,
+	tegra_clk_edp_c2bus,
+	tegra_clk_edp_c3bus,
+	tegra_clk_edp_emc,
+	tegra_clk_floor_c2bus,
+	tegra_clk_floor_c3bus,
+	tegra_clk_floor_emc,
+	tegra_clk_floor_sclk,
+	tegra_clk_hdmi_emc,
+	tegra_clk_iso_emc,
+	tegra_clk_mon_avp,
+	tegra_clk_mon_emc,
+	tegra_clk_msenc_emc,
+	tegra_clk_override_c2bus,
+	tegra_clk_override_c3bus,
+	tegra_clk_override_emc,
+	tegra_clk_override_sclk,
+	tegra_clk_sbc1_sclk,
+	tegra_clk_sbc2_sclk,
+	tegra_clk_sbc3_sclk,
+	tegra_clk_sbc4_sclk,
+	tegra_clk_sbc5_sclk,
+	tegra_clk_sbc6_sclk,
+	tegra_clk_sdmmc4_emc,
+	tegra_clk_tsec_emc,
+	tegra_clk_usb1_emc,
+	tegra_clk_usb1_sclk,
+	tegra_clk_usb2_emc,
+	tegra_clk_usb2_sclk,
+	tegra_clk_usb3_emc,
+	tegra_clk_usb3_sclk,
+	tegra_clk_usbd_emc,
+	tegra_clk_usbd_sclk,
+	tegra_clk_wake_sclk,
+	tegra_clk_gk20a_emc,
+	tegra_clk_vic03_emc,
+	tegra_clk_ispa_emc,
+	tegra_clk_ispb_emc,
+	tegra_clk_xusb_emc,
 	tegra_clk_max,
 };
 
diff --git a/drivers/clk/tegra/clk-tegra-shared.c b/drivers/clk/tegra/clk-tegra-shared.c
new file mode 100644
index 0000000..c84080e
--- /dev/null
+++ b/drivers/clk/tegra/clk-tegra-shared.c
@@ -0,0 +1,128 @@
+/*
+ * Copyright (c) 2012, 2013, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/clk.h>
+#include <linux/clk-provider.h>
+
+#include "clk.h"
+#include "clk-id.h"
+
+struct tegra_shared_clk {
+	char *name;
+	char *client;
+	union {
+		const char **parents;
+		const char *parent;
+	} p;
+	int num_parents;
+	enum shared_bus_users_mode mode;
+	int flags;
+	int clk_id;
+};
+
+#define SHARED_CLK(_name, _parent, _mode, _flags, _client, _id)\
+	{\
+		.name = _name,\
+		.p.parent = _parent,\
+		.num_parents = 1,\
+		.mode = _mode, \
+		.flags = _flags, \
+		.client = _client,\
+		.clk_id = _id,\
+	}
+
+static struct tegra_shared_clk shared_clks[] = {
+	SHARED_CLK("cap.c2bus", "c2bus", SHARED_CEILING, 0, NULL, tegra_clk_cap_c2bus),
+	SHARED_CLK("cap.throttle.c2bus", "c2bus", SHARED_CEILING, 0, NULL, tegra_clk_cap_throttle_c2bus),
+	SHARED_CLK("floor.c2bus", "c2bus", 0, 0, NULL, tegra_clk_floor_c2bus),
+	SHARED_CLK("override.c2bus", "c2bus", SHARED_OVERRIDE, 0, NULL, tegra_clk_override_c2bus),
+	SHARED_CLK("edp.c2bus", "c2bus", SHARED_CEILING, 0, NULL, tegra_clk_edp_c2bus),
+	SHARED_CLK("battery.c2bus", "c2bus", SHARED_CEILING, 0, NULL, tegra_clk_battery_c2bus),
+	SHARED_CLK("cap.profile.c2bus", "c2bus", SHARED_CEILING, 0, NULL, tegra_clk_cap_profile_c2bus),
+	SHARED_CLK("cap.c3bus", "c3bus", SHARED_CEILING, 0, NULL, tegra_clk_cap_c3bus),
+	SHARED_CLK("cap.throttle.c3bus", "c3bus", SHARED_CEILING, 0, NULL, tegra_clk_cap_throttle_c3bus),
+	SHARED_CLK("override.c3bus", "c3bus", SHARED_OVERRIDE, 0, NULL, tegra_clk_override_c3bus),
+	SHARED_CLK("cap.sclk", "sbus", SHARED_CEILING, 0, NULL, tegra_clk_cap_sclk),
+	SHARED_CLK("cap.throttle.sclk", "sbus", SHARED_CEILING, 0, NULL, tegra_clk_cap_throttle_sclk),
+	SHARED_CLK("floor.sclk", "sbus", 0, 0, NULL, tegra_clk_floor_sclk),
+	SHARED_CLK("override.sclk", "sbus", SHARED_OVERRIDE, 0, NULL, tegra_clk_override_sclk),
+	SHARED_CLK("avp.sclk", "sbus", 0, 0, NULL, tegra_clk_avp_sclk),
+	SHARED_CLK("bsea.sclk", "sbus", 0, 0, NULL, tegra_clk_bsea_sclk),
+	SHARED_CLK("usbd.sclk", "sbus", 0, 0, NULL, tegra_clk_usbd_sclk),
+	SHARED_CLK("usb1.sclk", "sbus", 0, 0, NULL, tegra_clk_usb1_sclk),
+	SHARED_CLK("usb2.sclk", "sbus", 0, 0, NULL, tegra_clk_usb2_sclk),
+	SHARED_CLK("usb3.sclk", "sbus", 0, 0, NULL, tegra_clk_usb3_sclk),
+	SHARED_CLK("wake.sclk", "sbus", 0, 0, NULL, tegra_clk_wake_sclk),
+	SHARED_CLK("sbc1.sclk", "sbus", 0, 0, NULL, tegra_clk_sbc1_sclk),
+	SHARED_CLK("sbc2.sclk", "sbus", 0, 0, NULL, tegra_clk_sbc2_sclk),
+	SHARED_CLK("sbc3.sclk", "sbus", 0, 0, NULL, tegra_clk_sbc3_sclk),
+	SHARED_CLK("sbc4.sclk", "sbus", 0, 0, NULL, tegra_clk_sbc4_sclk),
+	SHARED_CLK("sbc5.sclk", "sbus", 0, 0, NULL, tegra_clk_sbc5_sclk),
+	SHARED_CLK("sbc6.sclk", "sbus", 0, 0, NULL, tegra_clk_sbc6_sclk),
+	SHARED_CLK("mon.avp", "sbus", 0, 0, NULL, tegra_clk_mon_avp),
+	SHARED_CLK("avp.emc", "emc_master", 0, 0, NULL, tegra_clk_avp_emc),
+	SHARED_CLK("cpu.emc", "emc_master", 0, 0, NULL, tegra_clk_cpu_emc),
+	SHARED_CLK("disp1.emc", "emc_master", SHARED_BW, 0, NULL, tegra_clk_disp1_emc),
+	SHARED_CLK("disp2.emc", "emc_master", SHARED_BW, 0, NULL, tegra_clk_disp2_emc),
+	SHARED_CLK("hdmi.emc", "emc_master", 0, 0, NULL, tegra_clk_hdmi_emc),
+	SHARED_CLK("usbd.emc", "emc_master", 0, 0, NULL, tegra_clk_usbd_emc),
+	SHARED_CLK("usb1.emc", "emc_master", 0, 0, NULL, tegra_clk_usb1_emc),
+	SHARED_CLK("usb2.emc", "emc_master", 0, 0, NULL, tegra_clk_usb2_emc),
+	SHARED_CLK("usb3.emc", "emc_master", 0, 0, NULL, tegra_clk_usb3_emc),
+	SHARED_CLK("mon.emc", "emc_master", 0, 0, NULL, tegra_clk_mon_emc),
+	SHARED_CLK("msenc.emc", "emc_master", SHARED_BW, 0, NULL, tegra_clk_msenc_emc),
+	SHARED_CLK("tsec.emc", "emc_master", 0, 0, NULL, tegra_clk_tsec_emc),
+	SHARED_CLK("sdmmc4.emc", "emc_master", 0, 0, NULL, tegra_clk_sdmmc4_emc),
+	SHARED_CLK("camera.emc", "emc_master", SHARED_BW, 0, NULL, tegra_clk_camera_emc),
+	SHARED_CLK("iso.emc", "emc_master", SHARED_BW, 0, NULL, tegra_clk_iso_emc),
+	SHARED_CLK("cap.emc", "emc_master", SHARED_CEILING, 0, NULL, tegra_clk_cap_emc),
+	SHARED_CLK("cap.throttle.emc", "emc_master", SHARED_CEILING, 0, NULL, tegra_clk_cap_throttle_emc),
+	SHARED_CLK("floor.emc", "emc_master", 0, 0, NULL, tegra_clk_floor_emc),
+	SHARED_CLK("override.emc", "emc_master", SHARED_OVERRIDE, 0, NULL, tegra_clk_override_emc),
+	SHARED_CLK("edp.emc", "emc_master", SHARED_CEILING, 0, NULL, tegra_clk_edp_emc),
+	SHARED_CLK("battery.emc", "emc_master", SHARED_CEILING, 0, NULL, tegra_clk_battery_emc),
+	SHARED_CLK("gk20a.emc", "emc_master", 0, 0, NULL, tegra_clk_gk20a_emc),
+	SHARED_CLK("vic03.emc", "emc_master", 0, 0, NULL, tegra_clk_vic03_emc),
+	SHARED_CLK("ispa.emc", "emc_master", 0, 0, NULL, tegra_clk_ispa_emc),
+	SHARED_CLK("ispb.emc", "emc_master", 0, 0, NULL, tegra_clk_ispb_emc),
+	SHARED_CLK("xusb.emc", "emc_master", 0, 0, NULL, tegra_clk_xusb_emc),
+};
+
+void __init tegra_shared_clk_init(struct tegra_clk *tegra_clks)
+{
+	int i;
+	const char **parents;
+	struct tegra_shared_clk *data;
+	struct clk *clk;
+	struct clk **dt_clk;
+
+	for (i = 0; i < ARRAY_SIZE(shared_clks); i++) {
+		data = &shared_clks[i];
+		if (data->num_parents == 1)
+			parents = &data->p.parent;
+		else
+			parents = data->p.parents;
+
+		dt_clk = tegra_lookup_dt_id(data->clk_id, tegra_clks);
+		if (!dt_clk)
+			continue;
+
+		clk = clk_register_shared(data->name, parents,
+				data->num_parents, data->flags, data->mode,
+				data->client);
+		*dt_clk = clk;
+	}
+}
diff --git a/drivers/clk/tegra/clk.h b/drivers/clk/tegra/clk.h
index 16ec8d6..e0fd180 100644
--- a/drivers/clk/tegra/clk.h
+++ b/drivers/clk/tegra/clk.h
@@ -620,6 +620,7 @@ int tegra_osc_clk_init(void __iomem *clk_base, struct tegra_clk *tegra_clks,
 void tegra_super_clk_gen4_init(void __iomem *clk_base,
 			void __iomem *pmc_base, struct tegra_clk *tegra_clks,
 			struct tegra_clk_pll_params *pll_params);
+void tegra_shared_clk_init(struct tegra_clk *tegra_clks);
 
 void tegra114_clock_tune_cpu_trimmers_high(void);
 void tegra114_clock_tune_cpu_trimmers_low(void);
-- 
1.7.7.rc0.72.g4b5ea.dirty



* [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-13 14:06 [RFC PATCH 0/3] Introduce shared and cbus clocks Peter De Schrijver
  2014-05-13 14:06 ` [RFC PATCH 1/3] clk: Implement cbus and shared clocks Peter De Schrijver
  2014-05-13 14:06 ` [RFC PATCH 2/3] clk: tegra: Implement common shared clks Peter De Schrijver
@ 2014-05-13 14:06 ` Peter De Schrijver
  2014-05-13 18:09   ` Stephen Warren
  2 siblings, 1 reply; 18+ messages in thread
From: Peter De Schrijver @ 2014-05-13 14:06 UTC (permalink / raw)
  To: Peter De Schrijver
  Cc: Mike Turquette, Prashant Gaikwad, Stephen Warren, Thierry Reding,
	Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Arnd Bergmann, linux-kernel, linux-tegra, devicetree

Add shared and cbus clocks to the Tegra124 clock implementation.

Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com>
---
 drivers/clk/tegra/clk-tegra124.c         |   50 +++++++++++++++++++++
 include/dt-bindings/clock/tegra124-car.h |   70 +++++++++++++++++++++++++++++-
 2 files changed, 119 insertions(+), 1 deletions(-)

diff --git a/drivers/clk/tegra/clk-tegra124.c b/drivers/clk/tegra/clk-tegra124.c
index cc37c34..4a160e5 100644
--- a/drivers/clk/tegra/clk-tegra124.c
+++ b/drivers/clk/tegra/clk-tegra124.c
@@ -950,6 +950,16 @@ static struct tegra_clk tegra124_clks[tegra_clk_max] __initdata = {
 	[tegra_clk_clk_out_3_mux] = { .dt_id = TEGRA124_CLK_CLK_OUT_3_MUX, .present = true },
 	[tegra_clk_dsia_mux] = { .dt_id = TEGRA124_CLK_DSIA_MUX, .present = true },
 	[tegra_clk_dsib_mux] = { .dt_id = TEGRA124_CLK_DSIB_MUX, .present = true },
+	[tegra_clk_cap_c2bus] = { .dt_id = TEGRA124_CLK_CAP_C2BUS, .present = true },
+	[tegra_clk_cap_throttle_c2bus] = { .dt_id = TEGRA124_CLK_CAP_THROTTLE_C2BUS, .present = true },
+	[tegra_clk_floor_c2bus] = { .dt_id = TEGRA124_CLK_FLOOR_C2BUS, .present = true },
+	[tegra_clk_override_c2bus] = { .dt_id = TEGRA124_CLK_OVERRIDE_C2BUS, .present = true },
+	[tegra_clk_edp_c2bus] = { .dt_id = TEGRA124_CLK_EDP_C2BUS, .present = true },
+	[tegra_clk_battery_c2bus] = { .dt_id = TEGRA124_CLK_BATTERY_C2BUS, .present = true },
+	[tegra_clk_cap_profile_c2bus] = { .dt_id = TEGRA124_CLK_CAP_PROFILE_C2BUS, .present = true },
+	[tegra_clk_cap_c3bus] = { .dt_id = TEGRA124_CLK_CAP_C3BUS, .present = true },
+	[tegra_clk_cap_throttle_c3bus] = { .dt_id = TEGRA124_CLK_CAP_THROTTLE_C3BUS, .present = true },
+	[tegra_clk_override_c3bus] = { .dt_id = TEGRA124_CLK_OVERRIDE_C3BUS, .present = true },
 };
 
 static struct tegra_devclk devclks[] __initdata = {
@@ -1297,6 +1307,45 @@ static void __init tegra124_pll_init(void __iomem *clk_base,
 
 }
 
+static const char *cbus_parents[] = { "c2bus", "c3bus" };
+
+static __init void tegra124_shared_clk_init(void)
+{
+	struct clk *clk;
+
+	clk = clk_register_cbus("c2bus", "pll_c2", 0, "pll_p", 0,
+					700000000);
+	clk_register_clkdev(clk, "c2bus", NULL);
+	clks[TEGRA124_CLK_C2BUS] = clk;
+
+	clk = clk_register_cbus("c3bus", "pll_c3", 0, "pll_p", 0,
+					700000000);
+	clk_register_clkdev(clk, "c3bus", NULL);
+	clks[TEGRA124_CLK_C3BUS] = clk;
+
+	clk = clk_register_shared("msenc.cbus", &cbus_parents[0], 1, 0, 0,
+					"msenc");
+	clks[TEGRA124_CLK_MSENC_CBUS] = clk;
+
+	clk = clk_register_shared("vde.cbus", &cbus_parents[0], 1, 0, 0,
+					"vde");
+	clks[TEGRA124_CLK_VDE_CBUS] = clk;
+
+	clk = clk_register_shared("se.cbus", &cbus_parents[0], 1, 0, 0,
+					"se");
+	clks[TEGRA124_CLK_SE_CBUS] = clk;
+
+	clk = clk_register_shared("tsec.cbus", &cbus_parents[1], 1, 0, 0,
+					"tsec");
+	clks[TEGRA124_CLK_TSEC_CBUS] = clk;
+
+	clk = clk_register_shared("vic03.cbus", &cbus_parents[1], 1, 0, 0,
+					"vic03");
+	clks[TEGRA124_CLK_VIC03_CBUS] = clk;
+
+	tegra_shared_clk_init(tegra124_clks);
+}
+
 /* Tegra124 CPU clock and reset control functions */
 static void tegra124_wait_cpu_in_reset(u32 cpu)
 {
@@ -1414,6 +1463,7 @@ static void __init tegra124_clock_init(struct device_node *np)
 	tegra124_periph_clk_init(clk_base, pmc_base);
 	tegra_audio_clk_init(clk_base, pmc_base, tegra124_clks, &pll_a_params);
 	tegra_pmc_clk_init(pmc_base, tegra124_clks);
+	tegra124_shared_clk_init();
 
 	tegra_super_clk_gen4_init(clk_base, pmc_base, tegra124_clks,
 					&pll_x_params);
diff --git a/include/dt-bindings/clock/tegra124-car.h b/include/dt-bindings/clock/tegra124-car.h
index 433528a..943d485 100644
--- a/include/dt-bindings/clock/tegra124-car.h
+++ b/include/dt-bindings/clock/tegra124-car.h
@@ -336,6 +336,74 @@
 #define TEGRA124_CLK_DSIA_MUX 309
 #define TEGRA124_CLK_DSIB_MUX 310
 #define TEGRA124_CLK_SOR0_LVDS 311
-#define TEGRA124_CLK_CLK_MAX 312
+#define TEGRA124_CLK_C2BUS 401
+#define TEGRA124_CLK_C3BUS 402
+#define TEGRA124_CLK_GR3D_CBUS 403
+#define TEGRA124_CLK_GR2D_CBUS 404
+#define TEGRA124_CLK_EPP_CBUS 405
+#define TEGRA124_CLK_CAP_C2BUS 406
+#define TEGRA124_CLK_CAP_THROTTLE_C2BUS 407
+#define TEGRA124_CLK_FLOOR_C2BUS 408
+#define TEGRA124_CLK_OVERRIDE_C2BUS 409
+#define TEGRA124_CLK_EDP_C2BUS 410
+#define TEGRA124_CLK_BATTERY_C2BUS 411
+#define TEGRA124_CLK_CAP_PROFILE_C2BUS 412
+#define TEGRA124_CLK_MSENC_CBUS 413
+#define TEGRA124_CLK_TSEC_CBUS 414
+#define TEGRA124_CLK_VDE_CBUS 415
+#define TEGRA124_CLK_SE_CBUS 416
+#define TEGRA124_CLK_CAP_C3BUS 417
+#define TEGRA124_CLK_CAP_THROTTLE_C3BUS 418
+#define TEGRA124_CLK_OVERRIDE_C3BUS 419
+#define TEGRA124_CLK_AVP_SCLK 420
+#define TEGRA124_CLK_BSEA_SCLK 421
+#define TEGRA124_CLK_USBD_SCLK 422
+#define TEGRA124_CLK_USB1_SCLK 423
+#define TEGRA124_CLK_USB2_SCLK 424
+#define TEGRA124_CLK_USB3_SCLK 425
+#define TEGRA124_CLK_WAKE_SCLK 426
+#define TEGRA124_CLK_MON_AVP 427
+#define TEGRA124_CLK_CAP_SCLK 428
+#define TEGRA124_CLK_CAP_THROTTLE_SCLK 429
+#define TEGRA124_CLK_FLOOR_SCLK 430
+#define TEGRA124_CLK_OVERRIDE_SCLK 431
+#define TEGRA124_CLK_SBC1_SCLK 432
+#define TEGRA124_CLK_SBC2_SCLK 433
+#define TEGRA124_CLK_SBC3_SCLK 434
+#define TEGRA124_CLK_SBC4_SCLK 435
+#define TEGRA124_CLK_SBC5_SCLK 436
+#define TEGRA124_CLK_SBC6_SCLK 437
+#define TEGRA124_CLK_SBUS 438
+#define TEGRA124_CLK_EMC_MASTER 439
+#define TEGRA124_CLK_AVP_EMC 440
+#define TEGRA124_CLK_CPU_EMC 441
+#define TEGRA124_CLK_DISP1_EMC 442
+#define TEGRA124_CLK_DISP2_EMC 443
+#define TEGRA124_CLK_HDMI_EMC 444
+#define TEGRA124_CLK_USBD_EMC 445
+#define TEGRA124_CLK_USB1_EMC 446
+#define TEGRA124_CLK_USB2_EMC 447
+#define TEGRA124_CLK_USB3_EMC 448
+#define TEGRA124_CLK_MON_EMC 449
+#define TEGRA124_CLK_GR3D_EMC 450
+#define TEGRA124_CLK_GR2D_EMC 451
+#define TEGRA124_CLK_MSENC_EMC 452
+#define TEGRA124_CLK_TSEC_EMC 453
+#define TEGRA124_CLK_SDMMC4_EMC 454
+#define TEGRA124_CLK_CAMERA_EMC 455
+#define TEGRA124_CLK_ISO_EMC 456
+#define TEGRA124_CLK_FLOOR_EMC 457
+#define TEGRA124_CLK_CAP_EMC 458
+#define TEGRA124_CLK_CAP_THROTTLE_EMC 459
+#define TEGRA124_CLK_EDP_EMC 460
+#define TEGRA124_CLK_BATTERY_EMC 461
+#define TEGRA124_CLK_OVERRIDE_EMC 462
+#define TEGRA124_CLK_GK20A_EMC 463
+#define TEGRA124_CLK_VIC03_EMC 464
+#define TEGRA124_CLK_VIC03_CBUS 465
+#define TEGRA124_CLK_ISPA_EMC 466
+#define TEGRA124_CLK_ISPB_EMC 467
+#define TEGRA124_CLK_XUSB_EMC 468
+#define TEGRA124_CLK_CLK_MAX 469
 
 #endif	/* _DT_BINDINGS_CLOCK_TEGRA124_CAR_H */
-- 
1.7.7.rc0.72.g4b5ea.dirty


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-13 14:06 ` [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks Peter De Schrijver
@ 2014-05-13 18:09   ` Stephen Warren
  2014-05-13 21:52     ` Andrew Bresticker
  2014-05-14 14:27     ` Thierry Reding
  0 siblings, 2 replies; 18+ messages in thread
From: Stephen Warren @ 2014-05-13 18:09 UTC (permalink / raw)
  To: Peter De Schrijver
  Cc: Mike Turquette, Prashant Gaikwad, Thierry Reding, Rob Herring,
	Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Arnd Bergmann, linux-kernel, linux-tegra, devicetree

On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> Add shared and cbus clocks to the Tegra124 clock implementation.

> diff --git a/include/dt-bindings/clock/tegra124-car.h b/include/dt-bindings/clock/tegra124-car.h

> +#define TEGRA124_CLK_C2BUS 401
> +#define TEGRA124_CLK_C3BUS 402
> +#define TEGRA124_CLK_GR3D_CBUS 403
> +#define TEGRA124_CLK_GR2D_CBUS 404
...

I worry about this a bit. IIUC, these clocks don't actually exist in HW,
but are more a way of SW applying policy to the clocks that do exist in
HW. As such, I'm not convinced it's a good idea to expose these clock
IDs to DT, since DT is supposed to represent the HW, and not be
influenced by internal SW implementation details.

Do any DTs actually need to use these new clock IDs? I don't think we
could actually use these values in e.g. tegra124.dtsi's clocks
properties, since these clocks don't exist in HW. Was it your intent to
do that? If not, can't we just define these SW-internal clock IDs in a
header inside the Tegra clock driver, so the values are invisible to DT?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-13 18:09   ` Stephen Warren
@ 2014-05-13 21:52     ` Andrew Bresticker
  2014-05-14 14:27     ` Thierry Reding
  1 sibling, 0 replies; 18+ messages in thread
From: Andrew Bresticker @ 2014-05-13 21:52 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Peter De Schrijver, Mike Turquette, Prashant Gaikwad,
	Thierry Reding, Rob Herring, Pawel Moll, Mark Rutland,
	Ian Campbell, Kumar Gala, Arnd Bergmann, linux-kernel,
	linux-tegra, devicetree

On Tue, May 13, 2014 at 11:09 AM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
>> Add shared and cbus clocks to the Tegra124 clock implementation.
>
>> diff --git a/include/dt-bindings/clock/tegra124-car.h b/include/dt-bindings/clock/tegra124-car.h
>
>> +#define TEGRA124_CLK_C2BUS 401
>> +#define TEGRA124_CLK_C3BUS 402
>> +#define TEGRA124_CLK_GR3D_CBUS 403
>> +#define TEGRA124_CLK_GR2D_CBUS 404
> ...
>
> I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> but are more a way of SW applying policy to the clock that do exist in
> HW. As such, I'm not convinced it's a good idea to expose these clock
> IDS to DT, since DT is supposed to represent the HW, and not be
> influenced by internal SW implementation details.

The purpose of these IDs, I believe, is to be used in bindings for consumer
devices so that the driver can get() the shared clock.

> Do any DTs actually need to used these new clock IDs?

AFAIK no driver currently supports the use of these shared clocks, but
they could. In the chromium kernel, for example, we have devices which
require an "emc" clock in their bindings that the driver uses to vote
on SDRAM frequency. I guess we could avoid specifying these clocks in
the DT by creating aliases for them, but I'm not sure that's any better.

> I don't think we could actually use these value in e.g. tegra124.dtsi's clocks
> properties, since these clocks don't exist in HW.

The DT IDs are simply used to look up the struct clk corresponding to the ID
in the per-SoC clks array.  They aren't used for indexing into a hardware
register or anything like that, though they happen to currently be assigned so
that they match the corresponding clock's hardware ID.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-13 18:09   ` Stephen Warren
  2014-05-13 21:52     ` Andrew Bresticker
@ 2014-05-14 14:27     ` Thierry Reding
  2014-05-14 17:58       ` Andrew Bresticker
                         ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Thierry Reding @ 2014-05-14 14:27 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Peter De Schrijver, Mike Turquette, Prashant Gaikwad,
	Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Arnd Bergmann, linux-kernel, linux-tegra, devicetree

On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
> On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> > Add shared and cbus clocks to the Tegra124 clock implementation.
> 
> > diff --git a/include/dt-bindings/clock/tegra124-car.h b/include/dt-bindings/clock/tegra124-car.h
> 
> > +#define TEGRA124_CLK_C2BUS 401
> > +#define TEGRA124_CLK_C3BUS 402
> > +#define TEGRA124_CLK_GR3D_CBUS 403
> > +#define TEGRA124_CLK_GR2D_CBUS 404
> ...
> 
> I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> but are more a way of SW applying policy to the clock that do exist in
> HW. As such, I'm not convinced it's a good idea to expose these clock
> IDS to DT, since DT is supposed to represent the HW, and not be
> influenced by internal SW implementation details.
> 
> Do any DTs actually need to used these new clock IDs? I don't think we
> could actually use these value in e.g. tegra124.dtsi's clocks
> properties, since these clocks don't exist in HW. Was it your intent to
> do that? If not, can't we just define these SW-internal clock IDs in the
> header inside the Tegra clock driver, so the values are invisible to DT?

I'm beginning to wonder if abusing clocks in this way is really the best
solution. From what I understand there are two problems here that are
mostly orthogonal though they're implemented using similar techniques.

The reason for introducing cbus clocks is still unclear to me. From the
cover letter of this patch series it seems like these should be
completely hidden from drivers, and as such they don't belong in the
device tree. Also, if they are an implementation detail, why are they
even implemented as clocks? Perhaps an example use-case would help
illustrate the need for this.

As for shared clocks I'm only aware of one use-case, namely EMC scaling.
Using clocks for that doesn't seem like the best option to me. While it
can probably fix the immediate issue of choosing an appropriate
frequency for the EMC clock, it isn't a complete solution to the problem
that we're trying to solve. From what I understand, EMC scaling is one
part of ensuring quality of service. The current implementations of that
seem to abuse clocks (essentially one X.emc clock per X clock) to
signal the amount of memory bandwidth required by any given device. But
there are other parts to the puzzle. Latency allowance is one. The value
programmed to the latency allowance registers, for example, depends on
the EMC frequency.

Has anyone ever looked into using a different framework to model all of
these requirements? PM QoS looks like it might fit, but if none of the
existing frameworks have what we need, perhaps something new can be
created.

Thierry

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-14 14:27     ` Thierry Reding
@ 2014-05-14 17:58       ` Andrew Bresticker
  2014-05-15 10:17         ` Peter De Schrijver
  2014-05-14 19:35       ` Mike Turquette
  2014-05-15 10:52       ` Peter De Schrijver
  2 siblings, 1 reply; 18+ messages in thread
From: Andrew Bresticker @ 2014-05-14 17:58 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Stephen Warren, Peter De Schrijver, Mike Turquette,
	Prashant Gaikwad, Rob Herring, Pawel Moll, Mark Rutland,
	Ian Campbell, Kumar Gala, Arnd Bergmann, linux-kernel,
	linux-tegra, devicetree

On Wed, May 14, 2014 at 7:27 AM, Thierry Reding
<thierry.reding@gmail.com> wrote:
> As for shared clocks I'm only aware of one use-case, namely EMC scaling.
> Using clocks for that doesn't seem like the best option to me. While it
> can probably fix the immediate issue of choosing an appropriate
> frequency for the EMC clock it isn't a complete solution for the problem
> that we're trying to solve. From what I understand EMC scaling is one
> part of ensuring quality of service. The current implementations of that
> seems to abuse clocks (essentially one X.emc clock per X clock) to
> signal the amount of memory bandwidth required by any given device. But
> there are other parts to the puzzle. Latency allowance is one. The value
> programmed to the latency allowance registers for example depends on the
> EMC frequency.
>
> Has anyone ever looked into using a different framework to model all of
> these requirements? PM QoS looks like it might fit, but if none of the
> existing frameworks have what we need, perhaps something new can be
> created.

On Exynos we use devfreq, though in that case we monitor performance
counters to determine how internal buses should be scaled - not sure
if Tegra SoCs have similar counters that could be used for this
purpose.  It seems like EMC scaling would fit nicely within the PM QoS
framework, perhaps with a new PM_QOS_MEMORY_THROUGHPUT class.

-Andrew

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-14 14:27     ` Thierry Reding
  2014-05-14 17:58       ` Andrew Bresticker
@ 2014-05-14 19:35       ` Mike Turquette
  2014-05-15 10:59         ` Peter De Schrijver
  2014-05-26 13:07         ` Thierry Reding
  2014-05-15 10:52       ` Peter De Schrijver
  2 siblings, 2 replies; 18+ messages in thread
From: Mike Turquette @ 2014-05-14 19:35 UTC (permalink / raw)
  To: Thierry Reding, Stephen Warren
  Cc: Peter De Schrijver, Prashant Gaikwad, Rob Herring, Pawel Moll,
	Mark Rutland, Ian Campbell, Kumar Gala, Arnd Bergmann,
	linux-kernel, linux-tegra, devicetree

Quoting Thierry Reding (2014-05-14 07:27:40)
> On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
> > On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> > > Add shared and cbus clocks to the Tegra124 clock implementation.
> > 
> > > diff --git a/include/dt-bindings/clock/tegra124-car.h b/include/dt-bindings/clock/tegra124-car.h
> > 
> > > +#define TEGRA124_CLK_C2BUS 401
> > > +#define TEGRA124_CLK_C3BUS 402
> > > +#define TEGRA124_CLK_GR3D_CBUS 403
> > > +#define TEGRA124_CLK_GR2D_CBUS 404
> > ...
> > 
> > I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> > but are more a way of SW applying policy to the clock that do exist in
> > HW. As such, I'm not convinced it's a good idea to expose these clock
> > IDS to DT, since DT is supposed to represent the HW, and not be
> > influenced by internal SW implementation details.
> > 
> > Do any DTs actually need to used these new clock IDs? I don't think we
> > could actually use these value in e.g. tegra124.dtsi's clocks
> > properties, since these clocks don't exist in HW. Was it your intent to
> > do that? If not, can't we just define these SW-internal clock IDs in the
> > header inside the Tegra clock driver, so the values are invisible to DT?
> 
> I'm beginning to wonder if abusing clocks in this way is really the best
> solution. From what I understand there are two problems here that are
> mostly orthogonal though they're implemented using similar techniques.

Ack. "Virtual clocks" have been implemented by vendors before as a way
to manage complicated clock rate changes. I do not think we should
support such a method upstream.

I'm working with another engineer in Linaro on a "coordinated clock rate
change" series that might help solve some of the problems that this
patch series is trying to achieve.

> 
> The reason for introducing cbus clocks are still unclear to me. From the
> cover letter of this patch series it seems like these should be
> completely hidden from drivers and as such they don't belong in device
> tree. Also if they are an implementation detail, why are they even
> implemented as clocks? Perhaps an example use-case would help illustrate
> the need for this.

I also have this question. Does "cbus" come from your TRM or data sheet?
Or is it purely a software solution to coordinating rate changes within
known limits and for validated combinations?

> 
> As for shared clocks I'm only aware of one use-case, namely EMC scaling.
> Using clocks for that doesn't seem like the best option to me. While it
> can probably fix the immediate issue of choosing an appropriate
> frequency for the EMC clock it isn't a complete solution for the problem
> that we're trying to solve. From what I understand EMC scaling is one
> part of ensuring quality of service. The current implementations of that
> seems to abuse clocks (essentially one X.emc clock per X clock) to
> signal the amount of memory bandwidth required by any given device. But
> there are other parts to the puzzle. Latency allowance is one. The value
> programmed to the latency allowance registers for example depends on the
> EMC frequency.
> 
> Has anyone ever looked into using a different framework to model all of
> these requirements? PM QoS looks like it might fit, but if none of the
> existing frameworks have what we need, perhaps something new can be
> created.

It has been discussed. Using a QoS throughput constraint could help
scale frequency. But this deserves a wider discussion and starts to
stray into both PM QoS territory and also into "should we have a DVFS
framework" territory.

Regards,
Mike

> 
> Thierry

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-14 17:58       ` Andrew Bresticker
@ 2014-05-15 10:17         ` Peter De Schrijver
  0 siblings, 0 replies; 18+ messages in thread
From: Peter De Schrijver @ 2014-05-15 10:17 UTC (permalink / raw)
  To: Andrew Bresticker
  Cc: Thierry Reding, Stephen Warren, Mike Turquette, Prashant Gaikwad,
	Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Arnd Bergmann, linux-kernel, linux-tegra, devicetree

On Wed, May 14, 2014 at 07:58:26PM +0200, Andrew Bresticker wrote:
> On Wed, May 14, 2014 at 7:27 AM, Thierry Reding
> <thierry.reding@gmail.com> wrote:
> > As for shared clocks I'm only aware of one use-case, namely EMC scaling.
> > Using clocks for that doesn't seem like the best option to me. While it
> > can probably fix the immediate issue of choosing an appropriate
> > frequency for the EMC clock it isn't a complete solution for the problem
> > that we're trying to solve. From what I understand EMC scaling is one
> > part of ensuring quality of service. The current implementations of that
> > seems to abuse clocks (essentially one X.emc clock per X clock) to
> > signal the amount of memory bandwidth required by any given device. But
> > there are other parts to the puzzle. Latency allowance is one. The value
> > programmed to the latency allowance registers for example depends on the
> > EMC frequency.
> >
> > Has anyone ever looked into using a different framework to model all of
> > these requirements? PM QoS looks like it might fit, but if none of the
> > existing frameworks have what we need, perhaps something new can be
> > created.
> 
> On Exynos we use devfreq, though in that case we monitor performance
> counters to determine how internal buses should be scaled - not sure
> if Tegra SoCs have similar counters that could be used for this
> purpose.  It seems like EMC scaling would fit nicely within the PM QoS
> framework, perhaps with a new PM_QOS_MEMORY_THROUGHPUT class.

We do have counters; however, counters are reactive, which is a problem
for some isochronous clients (e.g. display). Counters also don't solve
the latency problem.

Cheers,

Peter.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-14 14:27     ` Thierry Reding
  2014-05-14 17:58       ` Andrew Bresticker
  2014-05-14 19:35       ` Mike Turquette
@ 2014-05-15 10:52       ` Peter De Schrijver
  2014-05-15 20:20         ` Stephen Warren
  2 siblings, 1 reply; 18+ messages in thread
From: Peter De Schrijver @ 2014-05-15 10:52 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Stephen Warren, Mike Turquette, Prashant Gaikwad, Rob Herring,
	Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Arnd Bergmann, linux-kernel, linux-tegra, devicetree

On Wed, May 14, 2014 at 04:27:40PM +0200, Thierry Reding wrote:
> On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
> > On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> > > Add shared and cbus clocks to the Tegra124 clock implementation.
> > 
> > > diff --git a/include/dt-bindings/clock/tegra124-car.h b/include/dt-bindings/clock/tegra124-car.h
> > 
> > > +#define TEGRA124_CLK_C2BUS 401
> > > +#define TEGRA124_CLK_C3BUS 402
> > > +#define TEGRA124_CLK_GR3D_CBUS 403
> > > +#define TEGRA124_CLK_GR2D_CBUS 404
> > ...
> > 
> > I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> > but are more a way of SW applying policy to the clock that do exist in
> > HW. As such, I'm not convinced it's a good idea to expose these clock
> > IDS to DT, since DT is supposed to represent the HW, and not be
> > influenced by internal SW implementation details.
> > 
> > Do any DTs actually need to used these new clock IDs? I don't think we
> > could actually use these value in e.g. tegra124.dtsi's clocks
> > properties, since these clocks don't exist in HW. Was it your intent to
> > do that? If not, can't we just define these SW-internal clock IDs in the
> > header inside the Tegra clock driver, so the values are invisible to DT?
> 
> I'm beginning to wonder if abusing clocks in this way is really the best
> solution. From what I understand there are two problems here that are
> mostly orthogonal though they're implemented using similar techniques.
> 
> The reason for introducing cbus clocks are still unclear to me. From the
> cover letter of this patch series it seems like these should be
> completely hidden from drivers and as such they don't belong in device
> tree. Also if they are an implementation detail, why are they even
> implemented as clocks? Perhaps an example use-case would help illustrate
> the need for this.

We don't have a PLL per engine, hence we have to use one PLL as the
parent for several module clocks. However, you can't change a PLL's rate
while it has active clients. So to scale the PLL clocking e.g. VIC or
MSENC, you need to change their parent to a different PLL, change the
original PLL's rate and change the parent back to the original PLL, all
while ensuring you never exceed the maximum allowed clock rate at the
current voltage. You also want to take into account whether a module is
clocked, so you don't bother handling clocks which are disabled (e.g. if
only the VIC clock is enabled, there is no point in changing the MSENC
parent). All this is handled by the 'cbus' clock.
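
For illustration, that sequence can be sketched as a small userspace
simulation (struct layout and names are invented for this sketch; this
is not the Tegra driver code, and the real implementation also has to
respect the voltage-dependent rate cap):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a PLL and of module clocks parented to it. */
struct pll {
	const char *name;
	unsigned long rate;
};

struct child_clk {
	const char *name;
	struct pll *parent;
	bool enabled;
};

/* The cbus set_rate sequence: park active children on a backup PLL,
 * retune the now-unused main PLL, then move the children back.
 * Disabled children are skipped entirely. */
static void cbus_set_rate(struct pll *main_pll, struct pll *backup,
			  struct child_clk *children, int n,
			  unsigned long new_rate)
{
	int i;

	/* 1) Reparent only the *enabled* children to the backup PLL. */
	for (i = 0; i < n; i++)
		if (children[i].enabled)
			children[i].parent = backup;

	/* 2) The main PLL has no active users left, so change its rate. */
	main_pll->rate = new_rate;

	/* 3) Move the enabled children back to the retuned PLL. */
	for (i = 0; i < n; i++)
		if (children[i].enabled)
			children[i].parent = main_pll;
}
```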

> 
> As for shared clocks I'm only aware of one use-case, namely EMC scaling.
> Using clocks for that doesn't seem like the best option to me. While it
> can probably fix the immediate issue of choosing an appropriate
> frequency for the EMC clock it isn't a complete solution for the problem
> that we're trying to solve. From what I understand EMC scaling is one
> part of ensuring quality of service. The current implementations of that
> seems to abuse clocks (essentially one X.emc clock per X clock) to
> signal the amount of memory bandwidth required by any given device. But
> there are other parts to the puzzle. Latency allowance is one. The value
> programmed to the latency allowance registers for example depends on the
> EMC frequency.
> 

There are more usecases:

1) sclk scaling, which is similar to emc in that many modules want to
influence this clock. The problem here is that sclk clocks the AVP and
is used as a parent for the AHB and APB clocks, so there are many
drivers that want to vote on this.

2) thermal capping. We want to limit e.g. the GPU clock rate due to
thermals.

3) 'cbus' scaling. We want to scale the PLLs clocking several modules,
and therefore several drivers need to be able to 'vote' on the rate.
Here too, we don't want to take disabled clocks into account for the
final rate calculation.

Cases 1 and 2 could presumably be handled by PM QoS, although for case 2
we need to enforce an upper bound rather than a minimum rate. The PM QoS
maintainer has up to now rejected any patches which add PM QoS
constraints to limit a clock or another variable. For case 1 we could
add a 'bus throughput' QoS parameter to control sclk. The units would
then be MiB/s or something similar. However, sclk also clocks the AVP,
and it would be rather strange to require a driver to set a certain bus
throughput requirement just to ensure the AVP runs fast enough.

For case 3, I don't see any existing mechanism to handle this. I don't
think PM QoS helps here, because PM QoS is supposed to deal with higher
level units (e.g. MiB/s), but in this case the only relationship between
the modules is that they run from the same PLL. So there is no higher
level unit which makes sense here.


Cheers,

Peter.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-14 19:35       ` Mike Turquette
@ 2014-05-15 10:59         ` Peter De Schrijver
  2014-05-26 13:07         ` Thierry Reding
  1 sibling, 0 replies; 18+ messages in thread
From: Peter De Schrijver @ 2014-05-15 10:59 UTC (permalink / raw)
  To: Mike Turquette
  Cc: Thierry Reding, Stephen Warren, Prashant Gaikwad, Rob Herring,
	Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Arnd Bergmann, linux-kernel, linux-tegra, devicetree

On Wed, May 14, 2014 at 09:35:18PM +0200, Mike Turquette wrote:
> Quoting Thierry Reding (2014-05-14 07:27:40)
> > On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
> > > On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> > > > Add shared and cbus clocks to the Tegra124 clock implementation.
> > > 
> > > > diff --git a/include/dt-bindings/clock/tegra124-car.h b/include/dt-bindings/clock/tegra124-car.h
> > > 
> > > > +#define TEGRA124_CLK_C2BUS 401
> > > > +#define TEGRA124_CLK_C3BUS 402
> > > > +#define TEGRA124_CLK_GR3D_CBUS 403
> > > > +#define TEGRA124_CLK_GR2D_CBUS 404
> > > ...
> > > 
> > > I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> > > but are more a way of SW applying policy to the clock that do exist in
> > > HW. As such, I'm not convinced it's a good idea to expose these clock
> > > IDS to DT, since DT is supposed to represent the HW, and not be
> > > influenced by internal SW implementation details.
> > > 
> > > Do any DTs actually need to used these new clock IDs? I don't think we
> > > could actually use these value in e.g. tegra124.dtsi's clocks
> > > properties, since these clocks don't exist in HW. Was it your intent to
> > > do that? If not, can't we just define these SW-internal clock IDs in the
> > > header inside the Tegra clock driver, so the values are invisible to DT?
> > 
> > I'm beginning to wonder if abusing clocks in this way is really the best
> > solution. From what I understand there are two problems here that are
> > mostly orthogonal though they're implemented using similar techniques.
> 
> Ack. "Virtual clocks" have been implemented by vendors before as a way
> to manage complicated clock rate changes. I do not think we should
> support such a method upstream.
> 
> I'm working with another engineer in Linaro on a "coordinated clock rate
> change" series that might help solve some of the problems that this
> patch series is trying to achieve.
> 

Any preview? :)

For this to be useful to us, it needs to be possible to:

1) change to a different parent during a rate change
2) adjust a clock's divider when changing parents
3) ignore disabled child clocks
4) have notifiers to hook voltage scaling into
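
Requirement 4 is essentially the usual DVFS ordering rule: raise the
voltage before a rate increase, lower it after a rate decrease. A
minimal sketch of a rate-change path with such hooks (the struct, the
voltage lookup and all values are invented for illustration, not an
actual or proposed kernel API):

```c
#include <assert.h>

/* Toy clock with a voltage requirement that depends on its rate. */
struct dvfs_clk {
	unsigned long rate;			/* Hz */
	int voltage_uv;				/* current supply voltage */
	int (*volt_for_rate)(unsigned long rate);	/* OPP-style lookup */
};

static int set_rate_with_dvfs(struct dvfs_clk *c, unsigned long new_rate)
{
	int new_uv = c->volt_for_rate(new_rate);

	/* PRE-rate-change notifier: going faster needs more voltage
	 * *before* the clock speeds up. */
	if (new_uv > c->voltage_uv)
		c->voltage_uv = new_uv;

	c->rate = new_rate;	/* the actual rate change */

	/* POST-rate-change notifier: going slower lets us drop the
	 * voltage *after* the clock has slowed down. */
	if (new_uv < c->voltage_uv)
		c->voltage_uv = new_uv;

	return 0;
}
```

At no point in either direction does the voltage drop below what the
current rate requires, which is the whole point of splitting the
notifier into pre and post halves.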

> > 
> > The reason for introducing cbus clocks are still unclear to me. From the
> > cover letter of this patch series it seems like these should be
> > completely hidden from drivers and as such they don't belong in device
> > tree. Also if they are an implementation detail, why are they even
> > implemented as clocks? Perhaps an example use-case would help illustrate
> > the need for this.
> 
> I also have this question. Does "cbus" come from your TRM or data sheet?
> Or is it purely a software solution to coordinating rate changes within
> known limits and for validated combinations?
> 

cbus is a software solution. It's not mentioned in any TRM or hardware
document.

Cheers,

Peter.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-15 10:52       ` Peter De Schrijver
@ 2014-05-15 20:20         ` Stephen Warren
  2014-05-16 19:58           ` Mike Turquette
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen Warren @ 2014-05-15 20:20 UTC (permalink / raw)
  To: Peter De Schrijver, Thierry Reding
  Cc: Mike Turquette, Prashant Gaikwad, Rob Herring, Pawel Moll,
	Mark Rutland, Ian Campbell, Kumar Gala, Arnd Bergmann,
	linux-kernel, linux-tegra, devicetree

On 05/15/2014 04:52 AM, Peter De Schrijver wrote:
> On Wed, May 14, 2014 at 04:27:40PM +0200, Thierry Reding wrote:
>> On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
>>> On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
>>>> Add shared and cbus clocks to the Tegra124 clock implementation.
>>>
>>>> diff --git a/include/dt-bindings/clock/tegra124-car.h b/include/dt-bindings/clock/tegra124-car.h
>>>
>>>> +#define TEGRA124_CLK_C2BUS 401
>>>> +#define TEGRA124_CLK_C3BUS 402
>>>> +#define TEGRA124_CLK_GR3D_CBUS 403
>>>> +#define TEGRA124_CLK_GR2D_CBUS 404
>>> ...
>>>
>>> I worry about this a bit. IIUC, these clocks don't actually exist in HW,
>>> but are more a way of SW applying policy to the clock that do exist in
>>> HW. As such, I'm not convinced it's a good idea to expose these clock
>>> IDS to DT, since DT is supposed to represent the HW, and not be
>>> influenced by internal SW implementation details.
>>>
>>> Do any DTs actually need to used these new clock IDs? I don't think we
>>> could actually use these value in e.g. tegra124.dtsi's clocks
>>> properties, since these clocks don't exist in HW. Was it your intent to
>>> do that? If not, can't we just define these SW-internal clock IDs in the
>>> header inside the Tegra clock driver, so the values are invisible to DT?
>>
>> I'm beginning to wonder if abusing clocks in this way is really the best
>> solution. From what I understand there are two problems here that are
>> mostly orthogonal though they're implemented using similar techniques.
>>
>> The reason for introducing cbus clocks are still unclear to me. From the
>> cover letter of this patch series it seems like these should be
>> completely hidden from drivers and as such they don't belong in device
>> tree. Also if they are an implementation detail, why are they even
>> implemented as clocks? Perhaps an example use-case would help illustrate
>> the need for this.
> 
> We don't have a PLL per engine, hence we have to use one PLL as the
> parent for several module clocks. However, you can't change a PLL's rate
> while it has active clients. So to scale the PLL clocking e.g. VIC or
> MSENC, you need to change their parent to a different PLL, change the
> original PLL's rate and change the parent back to the original PLL, all
> while ensuring you never exceed the maximum allowed clock rate at the
> current voltage. You also want to take into account whether a module is
> clocked, so you don't bother handling clocks which are disabled (e.g. if
> only the VIC clock is enabled, there is no point in changing the MSENC
> parent). All this is handled by the 'cbus' clock.

Presumably though we can handle this "cbus" concept entirely inside the
clock driver.

What happens right now is that when a DT node references a clock, the
driver gets a clock and then manipulates it directly. What if the clock
core were reworked a bit such that every single clock was a "cbus"
clock? clk_get() wouldn't return the raw clock object itself, but rather
a "clock client" object, which would forward requests on to the
underlying clk. If there's only one clk_get(), there's only one client,
so all requests get forwarded automatically. If there are n clk_get()
requests, the clock object gets to implement the appropriate voting/...
algorithm to mediate the requests.

That way, we don't have to expose any of this logic in the device tree,
or hopefully/mostly even outside the HW clock's implementation.
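
As a rough sketch of what that could look like (hypothetical names, not
the existing clk API; "highest request wins" stands in here for whatever
mediation policy a given clock would implement):

```c
#include <assert.h>

/* Hypothetical sketch only: clk_get() hands out a per-consumer
 * "client" object, each client records its requested rate, and the
 * clock core mediates across all clients. */
#define MAX_CLIENTS 8

struct clk_core {
	unsigned long rate;                  /* mediated result */
	unsigned long requests[MAX_CLIENTS]; /* one slot per client */
	int nr_clients;
};

struct clk_client {
	struct clk_core *core;
	int id;
};

/* "clk_get": return a client handle instead of the raw clock */
static struct clk_client clk_client_get(struct clk_core *core)
{
	struct clk_client c = { core, core->nr_clients++ };
	return c;
}

/* "clk_set_rate" on a client records a request; the last caller no
 * longer wins unconditionally, the core arbitrates across everyone. */
static void clk_client_set_rate(struct clk_client *c, unsigned long rate)
{
	unsigned long max = 0;
	int i;

	c->core->requests[c->id] = rate;
	for (i = 0; i < c->core->nr_clients; i++)
		if (c->core->requests[i] > max)
			max = c->core->requests[i];
	c->core->rate = max;
}
```

Whether the policy is max, sum, or something clock-specific would be up
to each clock implementation.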

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-15 20:20         ` Stephen Warren
@ 2014-05-16 19:58           ` Mike Turquette
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Turquette @ 2014-05-16 19:58 UTC (permalink / raw)
  To: Stephen Warren, Peter De Schrijver, Thierry Reding
  Cc: Prashant Gaikwad, Rob Herring, Pawel Moll, Mark Rutland,
	Ian Campbell, Kumar Gala, Arnd Bergmann, linux-kernel,
	linux-tegra, devicetree

Quoting Stephen Warren (2014-05-15 13:20:21)
> On 05/15/2014 04:52 AM, Peter De Schrijver wrote:
> > On Wed, May 14, 2014 at 04:27:40PM +0200, Thierry Reding wrote:
> >> On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
> >>> On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> >>>> Add shared and cbus clocks to the Tegra124 clock implementation.
> >>>
> >>>> diff --git a/include/dt-bindings/clock/tegra124-car.h b/include/dt-bindings/clock/tegra124-car.h
> >>>
> >>>> +#define TEGRA124_CLK_C2BUS 401
> >>>> +#define TEGRA124_CLK_C3BUS 402
> >>>> +#define TEGRA124_CLK_GR3D_CBUS 403
> >>>> +#define TEGRA124_CLK_GR2D_CBUS 404
> >>> ...
> >>>
> >>> I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> >>> but are more a way of SW applying policy to the clock that do exist in
> >>> HW. As such, I'm not convinced it's a good idea to expose these clock
> >>> IDS to DT, since DT is supposed to represent the HW, and not be
> >>> influenced by internal SW implementation details.
> >>>
> >>> Do any DTs actually need to used these new clock IDs? I don't think we
> >>> could actually use these value in e.g. tegra124.dtsi's clocks
> >>> properties, since these clocks don't exist in HW. Was it your intent to
> >>> do that? If not, can't we just define these SW-internal clock IDs in the
> >>> header inside the Tegra clock driver, so the values are invisible to DT?
> >>
> >> I'm beginning to wonder if abusing clocks in this way is really the best
> >> solution. From what I understand there are two problems here that are
> >> mostly orthogonal though they're implemented using similar techniques.
> >>
> >> The reason for introducing cbus clocks is still unclear to me. From the
> >> cover letter of this patch series it seems like these should be
> >> completely hidden from drivers and as such they don't belong in device
> >> tree. Also if they are an implementation detail, why are they even
> >> implemented as clocks? Perhaps an example use-case would help illustrate
> >> the need for this.
> > 
> > We don't have a PLL per engine, hence we have to use a PLL as parent for
> > several module clocks. However, you can't change a PLL's rate with
> > active clients. So to scale the PLL clocking e.g. VIC or MSENC, you need
> > to change their parent to a different PLL, change the original PLL's
> > rate and change the parent back to the original PLL, all while ensuring
> > you never exceed the maximum allowed clock rate at the current voltage.
> > You also want to take into account whether a module is clocked, so you
> > don't bother handling clocks which are disabled (e.g. if only the VIC
> > clock is enabled, there is no point in changing the MSENC parent). All
> > this is handled by the 'cbus' clock.
> 
> Presumably though we can handle this "cbus" concept entirely inside the
> clock driver.
> 
> What happens right now is that when a DT node references a clock, the
> driver gets a clock and then manipulates it directly. What if the clock
> core was reworked a bit such that every single clock was a "cbus" clock.
> clk_get() wouldn't return the raw clock object itself, but rather a
> "clock client" object, which would forward requests on to the underlying
> clk. If there's only 1 clk_get(), there's only 1 client, so all requests
> get forwarded automatically. If there are n clk_get() requests, the
> clock object gets to implement the appropriate voting/... algorithm to
> mediate the requests.

This was proposed before[1][2] and is something that would be great to
have. The scary thing is to start introducing policy into the clock
framework, which I'd like to avoid as much as possible. But arbitration
of resources (with requisite reference counting) is pretty much
non-existent for clock rates (last call to clk_set_rate always wins),
and is very rudimentary for prepare/enable (we have use counting, but it
does not track individual clients/clock consumers).

Revisiting Rabin's patches has been at the bottom of my todo list for a
while now. I'm happy for someone else to take a crack at it.

Regards,
Mike

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-November/135290.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-November/135574.html

> 
> That way, we don't have to expose any of this logic in the device tree,
> or hopefully/mostly even outside the HW clock's implementation.


* Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-14 19:35       ` Mike Turquette
  2014-05-15 10:59         ` Peter De Schrijver
@ 2014-05-26 13:07         ` Thierry Reding
  2014-05-29 23:22           ` Nishanth Menon
  1 sibling, 1 reply; 18+ messages in thread
From: Thierry Reding @ 2014-05-26 13:07 UTC (permalink / raw)
  To: Mike Turquette
  Cc: Stephen Warren, Peter De Schrijver, Prashant Gaikwad,
	Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Arnd Bergmann, linux-kernel, linux-tegra, devicetree, linux-pm


On Wed, May 14, 2014 at 12:35:18PM -0700, Mike Turquette wrote:
> Quoting Thierry Reding (2014-05-14 07:27:40)
[...]
> > As for shared clocks I'm only aware of one use-case, namely EMC scaling.
> > Using clocks for that doesn't seem like the best option to me. While it
> > can probably fix the immediate issue of choosing an appropriate
> > frequency for the EMC clock it isn't a complete solution for the problem
> > that we're trying to solve. From what I understand EMC scaling is one
> > part of ensuring quality of service. The current implementations of that
> > seem to abuse clocks (essentially one X.emc clock per X clock) to
> > signal the amount of memory bandwidth required by any given device. But
> > there are other parts to the puzzle. Latency allowance is one. The value
> > programmed to the latency allowance registers for example depends on the
> > EMC frequency.
> > 
> > Has anyone ever looked into using a different framework to model all of
> > these requirements? PM QoS looks like it might fit, but if none of the
> > existing frameworks have what we need, perhaps something new can be
> > created.
> 
> It has been discussed. Using a QoS throughput constraint could help
> scale frequency. But this deserves a wider discussion and starts to
> stray into both PM QoS territory and also into "should we have a DVFS
> framework" territory.

I've looked into this for a bit and it doesn't look like PM QoS is going
to be a good match after all. One of the issues I found was that PM QoS
deals with individual devices and there's no built-in way to collect the
requests from multiple devices to produce a global constraint. So if we
want to add something like that either the API would need to be extended
or it would need to be tacked on using the notifier mechanism and some
way of tracking (and filtering) the individual devices.

Looking at devfreq it seems to be the DVFS framework that you mentioned,
but from what I can tell it suffers from mostly the same problems. The
governor applies some frequency scaling policy to a single device and
does not allow multiple devices to register constraints against a single
(global) constraint so that the result can be accumulated.

For Tegra EMC scaling what we need is something more along the lines of
this: we have a resource (external memory) that is shared by multiple
devices in the system. Each of those devices requires a certain amount
of that resource (memory bandwidth). The resource driver will need to
accumulate all requests for the resource and apply the resulting
constraint so that all requests can be satisfied.

One solution I could imagine to make this work with PM QoS would be to
add the concept of a pm_qos_group to manage a set of pm_qos_requests,
but that will require a bunch of extra checks to make sure that requests
are of the correct type and so on. In other words it would still be
tacked on.
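
As an illustration of what such a group could do (hypothetical types and
made-up rates, not an actual PM QoS extension): each device files a
bandwidth request, the group sums them into one global constraint, and
the lowest EMC rate that satisfies the total is picked.

```c
#include <assert.h>

/* Hypothetical "pm_qos_group"-style aggregation for EMC scaling. The
 * OPP table and the bytes-per-clock factor are made-up values. */
#define MAX_REQS 8
#define NR_RATES 3
#define EMC_BYTES_PER_CLOCK 8 /* assumed effective bus width */

static const unsigned long emc_rates_khz[NR_RATES] =
	{ 204000, 408000, 792000 };

struct bw_group {
	unsigned long reqs_kbps[MAX_REQS]; /* one slot per device, 0 = none */
};

/* a device updates (or drops, with 0) its bandwidth request */
static void bw_group_request(struct bw_group *g, int dev, unsigned long kbps)
{
	g->reqs_kbps[dev] = kbps;
}

/* accumulate all requests; return the lowest rate covering the total */
static unsigned long bw_group_target_khz(const struct bw_group *g)
{
	unsigned long total = 0;
	int i;

	for (i = 0; i < MAX_REQS; i++)
		total += g->reqs_kbps[i];
	for (i = 0; i < NR_RATES; i++)
		if (emc_rates_khz[i] * EMC_BYTES_PER_CLOCK >= total)
			return emc_rates_khz[i];
	return emc_rates_khz[NR_RATES - 1]; /* saturate at the top OPP */
}
```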

Adding the linux-pm mailing list for more visibility. Perhaps somebody
has some ideas on how to extend any of the existing frameworks to make
it work for Tegra's EMC scaling (or how to implement the requirements of
Tegra's EMC scaling within the existing frameworks).

Thierry



* Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-26 13:07         ` Thierry Reding
@ 2014-05-29 23:22           ` Nishanth Menon
  2014-05-30  4:47             ` Mike Turquette
  0 siblings, 1 reply; 18+ messages in thread
From: Nishanth Menon @ 2014-05-29 23:22 UTC (permalink / raw)
  To: Thierry Reding, Mike Turquette
  Cc: Stephen Warren, Peter De Schrijver, Prashant Gaikwad,
	Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Arnd Bergmann, linux-kernel, linux-tegra, devicetree, linux-pm

On 05/26/2014 08:07 AM, Thierry Reding wrote:
> On Wed, May 14, 2014 at 12:35:18PM -0700, Mike Turquette wrote:
>> Quoting Thierry Reding (2014-05-14 07:27:40)
> [...]
>>> As for shared clocks I'm only aware of one use-case, namely EMC scaling.
>>> Using clocks for that doesn't seem like the best option to me. While it
>>> can probably fix the immediate issue of choosing an appropriate
>>> frequency for the EMC clock it isn't a complete solution for the problem
>>> that we're trying to solve. From what I understand EMC scaling is one
>>> part of ensuring quality of service. The current implementations of that
>>> seem to abuse clocks (essentially one X.emc clock per X clock) to
>>> signal the amount of memory bandwidth required by any given device. But
>>> there are other parts to the puzzle. Latency allowance is one. The value
>>> programmed to the latency allowance registers for example depends on the
>>> EMC frequency.
>>>
>>> Has anyone ever looked into using a different framework to model all of
>>> these requirements? PM QoS looks like it might fit, but if none of the
>>> existing frameworks have what we need, perhaps something new can be
>>> created.
>>
>> It has been discussed. Using a QoS throughput constraint could help
>> scale frequency. But this deserves a wider discussion and starts to
>> stray into both PM QoS territory and also into "should we have a DVFS
>> framework" territory.
> 
> I've looked into this for a bit and it doesn't look like PM QoS is going
> to be a good match after all. One of the issues I found was that PM QoS
> deals with individual devices and there's no builtin way to collect the
> requests from multiple devices to produce a global constraint. So if we
> want to add something like that either the API would need to be extended
> or it would need to be tacked on using the notifier mechanism and some
> way of tracking (and filtering) the individual devices.
> 
> Looking at devfreq it seems to be the DVFS framework that you mentioned,
> but from what I can tell it suffers from mostly the same problems. The
> governor applies some frequency scaling policy to a single device and
> does not allow multiple devices to register constraints against a single
> (global) constraint so that the result can be accumulated.
> 
> For Tegra EMC scaling what we need is something more along the lines of
> this: we have a resource (external memory) that is shared by multiple
> devices in the system. Each of those devices requires a certain amount
> of that resource (memory bandwidth). The resource driver will need to
> accumulate all requests for the resource and apply the resulting
> constraint so that all requests can be satisfied.
> 
> One solution I could imagine to make this work with PM QoS would be to
> add the concept of a pm_qos_group to manage a set of pm_qos_requests,
> but that will require a bunch of extra checks to make sure that requests
> are of the correct type and so on. In other words it would still be
> tacked on.

Just a minor note from previous experience: we (at TI) attempted in
our product kernel[1] to use QoS constraints on certain SoCs, with
rather unspectacular results.

Our use case was similar: devices -> L3 (core bus) -> memory. We had
the following intent:
a) We wanted to scale L3 based on QoS requests coming in from various
device drivers; the intent was to scale to either 133MHz or 266MHz (the
two OPPs we supported on our devices) based on performance needs. So we
asked drivers to report QoS requirements using a standard function -
except drivers cannot always report them satisfactorily (e.g. bursty
transfer devices), so we ended up with consolidated requests greater
than the total bandwidth possible on the bus (and never, in practice,
hitting the lower frequency).
b) Timing closure issues on certain devices such as USB, which can
only function based on async bridge closure requirements on the core
bus etc.; these would require the bus to be at a higher frequency, so
the QoS model was "misused" for such requirements.
b.1) A variation: interdependent constraints. If the MPU is above a
frequency X, timing closure required L3 to be at 266MHz. Again, this
is not a QoS requirement per se, just a dependency requirement that
cannot easily be addressed by a pure QoS-like framework solution.

Even though EMC does sound like (a), I suspect you want to be 100%
sure that you don't have variations of (b) in the SoC as well; betting
completely on a QoS approach might not actually work in practice.

> 
> Adding the linux-pm mailing list for more visibility. Perhaps somebody
For folks new on the discussion: complete thread:
http://thread.gmane.org/gmane.linux.drivers.devicetree/73967

> has some ideas on how to extend any of the existing frameworks to make
> it work for Tegra's EMC scaling (or how to implement the requirements of
> Tegra's EMC scaling within the existing frameworks).
> 


[1]
https://android.googlesource.com/kernel/omap.git/+/android-omap-panda-3.0/arch/arm/plat-omap/omap-pm-helper.c

-- 
Regards,
Nishanth Menon


* Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-29 23:22           ` Nishanth Menon
@ 2014-05-30  4:47             ` Mike Turquette
  2014-05-30 13:24               ` Nishanth Menon
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Turquette @ 2014-05-30  4:47 UTC (permalink / raw)
  To: Nishanth Menon, Thierry Reding
  Cc: Stephen Warren, Peter De Schrijver, Prashant Gaikwad,
	Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Arnd Bergmann, linux-kernel, linux-tegra, devicetree, linux-pm

Quoting Nishanth Menon (2014-05-29 16:22:45)
> On 05/26/2014 08:07 AM, Thierry Reding wrote:
> > On Wed, May 14, 2014 at 12:35:18PM -0700, Mike Turquette wrote:
> >> Quoting Thierry Reding (2014-05-14 07:27:40)
> > [...]
> >>> As for shared clocks I'm only aware of one use-case, namely EMC scaling.
> >>> Using clocks for that doesn't seem like the best option to me. While it
> >>> can probably fix the immediate issue of choosing an appropriate
> >>> frequency for the EMC clock it isn't a complete solution for the problem
> >>> that we're trying to solve. From what I understand EMC scaling is one
> >>> part of ensuring quality of service. The current implementations of that
> >>> seem to abuse clocks (essentially one X.emc clock per X clock) to
> >>> signal the amount of memory bandwidth required by any given device. But
> >>> there are other parts to the puzzle. Latency allowance is one. The value
> >>> programmed to the latency allowance registers for example depends on the
> >>> EMC frequency.
> >>>
> >>> Has anyone ever looked into using a different framework to model all of
> >>> these requirements? PM QoS looks like it might fit, but if none of the
> >>> existing frameworks have what we need, perhaps something new can be
> >>> created.
> >>
> >> It has been discussed. Using a QoS throughput constraint could help
> >> scale frequency. But this deserves a wider discussion and starts to
> >> stray into both PM QoS territory and also into "should we have a DVFS
> >> framework" territory.
> > 
> > I've looked into this for a bit and it doesn't look like PM QoS is going
> > to be a good match after all. One of the issues I found was that PM QoS
> > deals with individual devices and there's no builtin way to collect the
> > requests from multiple devices to produce a global constraint. So if we
> > want to add something like that either the API would need to be extended
> > or it would need to be tacked on using the notifier mechanism and some
> > way of tracking (and filtering) the individual devices.
> > 
> > Looking at devfreq it seems to be the DVFS framework that you mentioned,
> > but from what I can tell it suffers from mostly the same problems. The
> > governor applies some frequency scaling policy to a single device and
> > does not allow multiple devices to register constraints against a single
> > (global) constraint so that the result can be accumulated.
> > 
> > For Tegra EMC scaling what we need is something more along the lines of
> > this: we have a resource (external memory) that is shared by multiple
> > devices in the system. Each of those devices requires a certain amount
> > of that resource (memory bandwidth). The resource driver will need to
> > accumulate all requests for the resource and apply the resulting
> > constraint so that all requests can be satisfied.
> > 
> > One solution I could imagine to make this work with PM QoS would be to
> > add the concept of a pm_qos_group to manage a set of pm_qos_requests,
> > but that will require a bunch of extra checks to make sure that requests
> > are of the correct type and so on. In other words it would still be
> > tacked on.
> 
> Just a minor note from previous experience: we (at TI) attempted in
> our product kernel[1] to use QoS constraints on certain SoCs, with
> rather unspectacular results.
> 
> Our use case was similar: devices -> L3(core bus)->memory. We had the
> following intent:
> a) wanted to scale L3 based on QoS requests coming in from various
> device drivers. intent was to scale either to 133MHz or 266MHz (two
> OPPs we supported on our devices) based on performance needs -> So we
> asked drivers to report QoS requirements using a standard function -
> except drivers cannot always report it satisfactorily - example bursty
> transfer devices - ended up with consolidated requests > total
> bandwidth possible on the bus -> (and never in practice hitting the
> lower frequency).

My opinion on why L3 QoS failed on OMAP is that we only used it for
DVFS. The voltage domain corresponding to the L3 interconnect had only
two OPPs, which meant that drivers submitting their constraints only had
two options: slow vs fast in the best case, or non-functional vs
functional in the worst case (some IPs simply did not work at the lower
OPP).

But the big failure in my opinion was that we did not use any of the
traffic-shaping or priority handling features of the L3 NoC. OMAP4 used
the Arteris NoC[1] which has plenty of capabilities for setting
initiator priorities and bandwidth-throttling on a point-to-point basis,
which is exactly what QoS is for. If these had been exposed a bit more
in software then I imagine some of the constraints that always resulted
in running at the fast OPP might have instead resulted in running at the
slow OPP, but with a fine-grained mixture of varying bandwidth
allocations and access priorities.

All of the hardware in the world doesn't do any good if we don't have
software that uses it. Anyways, just my $0.02.

Regards,
Mike

[1] http://www.arteris.com/OMAP4_QoS_security_NoC

> b) timing closure issues on certain devices such as USB - which can
> only function based on async bridge closure requirements on the core
> bus etc.. these would require bus to be at higher frequency - QoS
> model was "misused" in such requirements.
> b.1) a variation: interdependent constraints -> if MPU is > freq X,
> timing closure required L3 to be at 266MHz. again - it is not a QoS
> requirement per se, just a dependency requirement that cannot easily be
> addressed doing a pure QoS like framework solution.
> 
> Even though EMC does sound like (a) - I suspect you might want to be
> 100% sure that you don't have variations of (b) in the SoC as well and
> betting completely on QoS approach might not actually work in practice.
> 
> > 
> > Adding the linux-pm mailing list for more visibility. Perhaps somebody
> For folks new on the discussion: complete thread:
> http://thread.gmane.org/gmane.linux.drivers.devicetree/73967
> 
> > has some ideas on how to extend any of the existing frameworks to make
> > it work for Tegra's EMC scaling (or how to implement the requirements of
> > Tegra's EMC scaling within the existing frameworks).
> > 
> 
> 
> [1]
> https://android.googlesource.com/kernel/omap.git/+/android-omap-panda-3.0/arch/arm/plat-omap/omap-pm-helper.c
> 
> -- 
> Regards,
> Nishanth Menon


* Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks
  2014-05-30  4:47             ` Mike Turquette
@ 2014-05-30 13:24               ` Nishanth Menon
  0 siblings, 0 replies; 18+ messages in thread
From: Nishanth Menon @ 2014-05-30 13:24 UTC (permalink / raw)
  To: Mike Turquette, Thierry Reding
  Cc: Stephen Warren, Peter De Schrijver, Prashant Gaikwad,
	Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Arnd Bergmann, linux-kernel, linux-tegra, devicetree, linux-pm

On 05/29/2014 11:47 PM, Mike Turquette wrote:
> Quoting Nishanth Menon (2014-05-29 16:22:45)
>> On 05/26/2014 08:07 AM, Thierry Reding wrote:
>>> On Wed, May 14, 2014 at 12:35:18PM -0700, Mike Turquette wrote:
>>>> Quoting Thierry Reding (2014-05-14 07:27:40)
>>> [...]
>>>>> As for shared clocks I'm only aware of one use-case, namely EMC scaling.
>>>>> Using clocks for that doesn't seem like the best option to me. While it
>>>>> can probably fix the immediate issue of choosing an appropriate
>>>>> frequency for the EMC clock it isn't a complete solution for the problem
>>>>> that we're trying to solve. From what I understand EMC scaling is one
>>>>> part of ensuring quality of service. The current implementations of that
>>>>> seem to abuse clocks (essentially one X.emc clock per X clock) to
>>>>> signal the amount of memory bandwidth required by any given device. But
>>>>> there are other parts to the puzzle. Latency allowance is one. The value
>>>>> programmed to the latency allowance registers for example depends on the
>>>>> EMC frequency.
>>>>>
>>>>> Has anyone ever looked into using a different framework to model all of
>>>>> these requirements? PM QoS looks like it might fit, but if none of the
>>>>> existing frameworks have what we need, perhaps something new can be
>>>>> created.
>>>>
>>>> It has been discussed. Using a QoS throughput constraint could help
>>>> scale frequency. But this deserves a wider discussion and starts to
>>>> stray into both PM QoS territory and also into "should we have a DVFS
>>>> framework" territory.
>>>
>>> I've looked into this for a bit and it doesn't look like PM QoS is going
>>> to be a good match after all. One of the issues I found was that PM QoS
>>> deals with individual devices and there's no builtin way to collect the
>>> requests from multiple devices to produce a global constraint. So if we
>>> want to add something like that either the API would need to be extended
>>> or it would need to be tacked on using the notifier mechanism and some
>>> way of tracking (and filtering) the individual devices.
>>>
>>> Looking at devfreq it seems to be the DVFS framework that you mentioned,
>>> but from what I can tell it suffers from mostly the same problems. The
>>> governor applies some frequency scaling policy to a single device and
>>> does not allow multiple devices to register constraints against a single
>>> (global) constraint so that the result can be accumulated.
>>>
>>> For Tegra EMC scaling what we need is something more along the lines of
>>> this: we have a resource (external memory) that is shared by multiple
>>> devices in the system. Each of those devices requires a certain amount
>>> of that resource (memory bandwidth). The resource driver will need to
>>> accumulate all requests for the resource and apply the resulting
>>> constraint so that all requests can be satisfied.
>>>
>>> One solution I could imagine to make this work with PM QoS would be to
>>> add the concept of a pm_qos_group to manage a set of pm_qos_requests,
>>> but that will require a bunch of extra checks to make sure that requests
>>> are of the correct type and so on. In other words it would still be
>>> tacked on.
>>
>> Just a minor note from previous experience: we (at TI) attempted in
>> our product kernel[1] to use QoS constraints on certain SoCs, with
>> rather unspectacular results.
>>
>> Our use case was similar: devices -> L3(core bus)->memory. We had the
>> following intent:
>> a) wanted to scale L3 based on QoS requests coming in from various
>> device drivers. intent was to scale either to 133MHz or 266MHz (two
>> OPPs we supported on our devices) based on performance needs -> So we
>> asked drivers to report QoS requirements using a standard function -
>> except drivers cannot always report it satisfactorily - example bursty
>> transfer devices - ended up with consolidated requests > total
>> bandwidth possible on the bus -> (and never in practice hitting the
>> lower frequency).
> 
> My opinion on why L3 QoS failed on OMAP is that we only used it for
> DVFS. The voltage domain corresponding to the L3 interconnect had only
> two OPPs, which meant that drivers submitting their constraints only had
> two options: slow vs fast in the best case, or non-functional vs
> functional in the worst case (some IPs simply did not work at the lower
> OPP).
> 
> But the big failure in my opinion was that we did not use any of the
> traffic-shaping or priority handling features of the L3 NoC. OMAP4 used
> the Arteris NoC[1] which has plenty of capabilities for setting
> initiator priorities and bandwidth-throttling on a point-to-point basis,
> which is exactly what QoS is for. If these had been exposed a bit more
> in software then I imagine some of the constraints that always resulted
> in running at the fast OPP might have instead resulted in running at the
> slow OPP, but with a fine-grained mixture of varying bandwidth
> allocations and access priorities.

Thanks Mike for reminding me about it. I agree that modelling of the
complete bus capability was never complete or accurate; that is just
one factor, not to mention that some of those abilities are not
modifiable on secure devices without adequate handlers inside the
secure-side code. But, for the context of this discussion, those are
just details of implementation.


> 
> All of the hardware in the world doesn't do any good if we don't have
> software that uses it. Anyways, just my $0.02.

Yep, the key issue, as far as I recollect (it's been over a couple of
years now :)), from productization efforts of the approach has been
the users. SoCs are rarely a single ARM-centric world - they have
GPUs, co-processors etc. No matter how accurately we model the bus
handling, it all boils down to ensuring that every single user is
appropriately using the defined functions and that this central
book-keeping authority has exact knowledge of the device requirements.
That is always the hard part.

> 
> Regards,
> Mike
> 
> [1] http://www.arteris.com/OMAP4_QoS_security_NoC
> 
>> b) timing closure issues on certain devices such as USB - which can
>> only function based on async bridge closure requirements on the core
>> bus etc.. these would require bus to be at higher frequency - QoS
>> model was "misused" in such requirements.
>> b.1) a variation: interdependent constraints -> if MPU is > freq X,
>> timing closure required L3 to be at 266MHz. again - it is not a QoS
>> requirement per se, just a dependency requirement that cannot easily be
>> addressed doing a pure QoS like framework solution.
>>
>> Even though EMC does sound like (a) - I suspect you might want to be
>> 100% sure that you don't have variations of (b) in the SoC as well and
>> betting completely on QoS approach might not actually work in practice.
>>
>>>
>>> Adding the linux-pm mailing list for more visibility. Perhaps somebody
>> For folks new on the discussion: complete thread:
>> http://thread.gmane.org/gmane.linux.drivers.devicetree/73967
>>
>>> has some ideas on how to extend any of the existing frameworks to make
>>> it work for Tegra's EMC scaling (or how to implement the requirements of
>>> Tegra's EMC scaling within the existing frameworks).
>>>
>>
>>
>> [1]
>> https://android.googlesource.com/kernel/omap.git/+/android-omap-panda-3.0/arch/arm/plat-omap/omap-pm-helper.c
>>
>> -- 
>> Regards,
>> Nishanth Menon


-- 
Regards,
Nishanth Menon


end of thread, other threads:[~2014-05-30 13:24 UTC | newest]

Thread overview: 18+ messages
2014-05-13 14:06 [RFC PATCH 0/3] Introduce shared and cbus clocks Peter De Schrijver
2014-05-13 14:06 ` [RFC PATCH 1/3] clk: Implement cbus and shared clocks Peter De Schrijver
2014-05-13 14:06 ` [RFC PATCH 2/3] clk: tegra: Implement common shared clks Peter De Schrijver
2014-05-13 14:06 ` [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks Peter De Schrijver
2014-05-13 18:09   ` Stephen Warren
2014-05-13 21:52     ` Andrew Bresticker
2014-05-14 14:27     ` Thierry Reding
2014-05-14 17:58       ` Andrew Bresticker
2014-05-15 10:17         ` Peter De Schrijver
2014-05-14 19:35       ` Mike Turquette
2014-05-15 10:59         ` Peter De Schrijver
2014-05-26 13:07         ` Thierry Reding
2014-05-29 23:22           ` Nishanth Menon
2014-05-30  4:47             ` Mike Turquette
2014-05-30 13:24               ` Nishanth Menon
2014-05-15 10:52       ` Peter De Schrijver
2014-05-15 20:20         ` Stephen Warren
2014-05-16 19:58           ` Mike Turquette
