* [PATCH 0/2] clk: fix the need of booking clk_ignore_unused=true on embedded devs
@ 2021-10-07 12:06 Mauro Carvalho Chehab
  2021-10-07 12:06 ` [PATCH 1/2] clk: wait for extra time before disabling unused clocks Mauro Carvalho Chehab
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Mauro Carvalho Chehab @ 2021-10-07 12:06 UTC (permalink / raw)
  To: Michael Turquette, Stephen Boyd
  Cc: linuxarm, mauro.chehab, Mauro Carvalho Chehab,
	Manivannan Sadhasivam, linux-clk, linux-kernel

Currently, the only way to boot a kernel with drivers built as modules on embedded
devices like HiKey 970 is to pass clk_ignore_unused=true as a kernel boot parameter.

There are two separate issues:

1. the clk core calls clk_disable_unused() too early. By the time this
   function is called, only the built-in drivers have been probed/initialized.
   Drivers built as modules will only be probed afterwards.

   This causes a race condition and boot instability, as the clk core will try
   to disable clocks while the drivers built as modules are still being
   probed and initialized.

   I suspect that the same problem used to happen in the regulator core,
   as there is code there that waits for 30 seconds before disabling unused
   regulators;

2. there are some gate clocks defined for HiKey 970 that should always be on,
   as otherwise the system will hang, or the filesystem I/O will stop.

PS:
  I have already submitted 3 or 4 versions of patches for the HiKey 970
  clocks, but they were all unreliable because of the race condition in the
  clk core described in (1).
   
Patch 1 solves the issue with the clk core.
Patch 2 solves the HiKey 970 specific issues.
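
To make issue (1) concrete, here is a minimal userspace sketch of the race.
It is not kernel code: the toy_* names and the two-clock setup are invented
for illustration, and the real sweep walks clk_root_list and clk_orphan_list
under the prepare lock. The sketch only models the rule "gate anything that
is running but unclaimed" and shows that the result depends on whether the
sweep runs before or after the modular driver probes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a clock; NOT the kernel's struct clk_core. */
struct toy_clk {
	const char *name;
	bool hw_enabled;   /* gate state left behind by the bootloader */
	int enable_count;  /* references taken by drivers that have probed */
};

/* What a driver probe does in this model: claim its clock. */
void toy_clk_enable(struct toy_clk *clk)
{
	clk->enable_count++;
	clk->hw_enabled = true;
}

/* Gate every clock that is running but unclaimed; return how many were gated. */
size_t toy_disable_unused(struct toy_clk *clks, size_t n)
{
	size_t gated = 0;

	for (size_t i = 0; i < n; i++) {
		if (clks[i].hw_enabled && clks[i].enable_count == 0) {
			clks[i].hw_enabled = false;
			gated++;
		}
	}
	return gated;
}

/*
 * Simulate one boot. Both clocks start enabled by the bootloader.
 * Returns the number of clocks the sweep gated.
 */
size_t toy_boot(bool sweep_after_module_probe)
{
	struct toy_clk clks[] = {
		{ "builtin_clk", true, 0 },
		{ "module_clk",  true, 0 },
	};
	size_t gated = 0;

	toy_clk_enable(&clks[0]);            /* built-in driver probes */

	if (!sweep_after_module_probe)
		gated = toy_disable_unused(clks, 2);

	toy_clk_enable(&clks[1]);            /* modular driver probes */

	if (sweep_after_module_probe)
		gated = toy_disable_unused(clks, 2);

	return gated;
}
```

In this model toy_boot(false) gates module_clk before its driver has had a
chance to claim it, while toy_boot(true) gates nothing. On real hardware the
gating itself can be fatal, so re-enabling the clock at probe time does not
undo the damage.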

Mauro Carvalho Chehab (2):
  clk: wait for extra time before disabling unused clocks
  clk: clk-hi3670: mark some clocks as CLK_IS_CRITICAL

 drivers/clk/clk.c                  | 51 +++++++++++++++++++-----------
 drivers/clk/hisilicon/clk-hi3670.c | 24 +++++++-------
 2 files changed, 44 insertions(+), 31 deletions(-)

-- 
2.31.1




* [PATCH 1/2] clk: wait for extra time before disabling unused clocks
  2021-10-07 12:06 [PATCH 0/2] clk: fix the need of booking clk_ignore_unused=true on embedded devs Mauro Carvalho Chehab
@ 2021-10-07 12:06 ` Mauro Carvalho Chehab
  2021-10-07 12:06 ` [PATCH 2/2] clk: clk-hi3670: mark some clocks as CLK_IS_CRITICAL Mauro Carvalho Chehab
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Mauro Carvalho Chehab @ 2021-10-07 12:06 UTC (permalink / raw)
  To: Michael Turquette, Stephen Boyd
  Cc: linuxarm, mauro.chehab, Mauro Carvalho Chehab,
	Manivannan Sadhasivam, linux-clk, linux-kernel

In some tests on HiKey 970, with several drivers compiled as
modules, clk_disable_unused() was called too early, before the
init code of the drivers built as modules had been called.

If the system is left to disable unused clocks, these are the last
messages on the console:

	[   22.348592] initcall acpi_gpio_handle_deferred_request_irqs+0x0/0xa8 returned 0 after 1 usecs, irqs_disabled() 0
	[   22.366973] calling  fb_logo_late_init+0x0/0x20 @ 1 irqs_disabled() 0
	[   22.373432] initcall fb_logo_late_init+0x0/0x20 returned 0 after 1 usecs, irqs_disabled() 0

	[   22.381800] calling  clk_disable_unused+0x0/0xe8 @ 1 irqs_disabled() 0
				==================

	<SoC dies here... no other messages>

When clk_disable_unused() is prevented from running, several other initcall
logs appear after it:

	[   22.340305] calling  acpi_gpio_handle_deferred_request_irqs+0x0/0xa8 @ 1 irqs_disabled() 0
	[   22.348594] dwmmc_k3 fc183000.dwmmc2: card claims to support voltages below defined range
	[   22.348592] initcall acpi_gpio_handle_deferred_request_irqs+0x0/0xa8 returned 0 after 1 usecs, irqs_disabled() 0
	[   22.366973] calling  fb_logo_late_init+0x0/0x20 @ 1 irqs_disabled() 0
	[   22.373432] initcall fb_logo_late_init+0x0/0x20 returned 0 after 1 usecs, irqs_disabled() 0

	[   22.356984] initcall clk_disable_unused+0x0/0xe8 returned 0 after 117 usecs, irqs_disabled() 0
				==================

	[   22.372335] initcall imx_clk_disable_uart+0x0/0x88 returned 0 after 1 usecs, irqs_disabled() 0
	[   22.387946] initcall regulator_init_complete+0x0/0x58 returned 0 after 2 usecs, irqs_disabled() 0
	[   22.404163] initcall of_platform_sync_state_init+0x0/0x20 returned 0 after 1 usecs, irqs_disabled() 0
	[   22.426508] initcall alsa_sound_last_init+0x0/0x90 returned 0 after 6239 usecs, irqs_disabled() 0
	[   22.703071] initcall inet6_init+0x0/0x358 [ipv6] returned 0 after 13341 usecs, irqs_disabled() 0
	[   22.723861] initcall xt_init+0x0/0x1000 [x_tables] returned 0 after 8 usecs, irqs_disabled() 0
	[   22.744405] initcall ip_tables_init+0x0/0x1000 [ip_tables] returned 0 after 23 usecs, irqs_disabled() 0
	[   23.467003] initcall fuse_init+0x0/0x154 [fuse] returned 0 after 392 usecs, irqs_disabled() 0
	[   23.537742] initcall drm_core_init+0x0/0x1000 [drm] returned 0 after 122 usecs, irqs_disabled() 0
	[   24.519076] initcall rfkill_init+0x0/0x12c [rfkill] returned 0 after 15654 usecs, irqs_disabled() 0
	[   24.622168] initcall hi3670_pcie_phy_driver_init+0x0/0x1000 [phy_hi3670_pcie] returned 0 after 836 usecs, irqs_disabled() 0
	[   24.665100] initcall hi3670_phy_driver_init+0x0/0x1000 [phy_hi3670_usb3] returned 0 after 1888 usecs, irqs_disabled() 0
	[   24.694668] initcall typec_init+0x0/0x1000 [typec] returned 0 after 89 usecs, irqs_disabled() 0
	[   24.732557] initcall cpu_feature_match_ASIMD_init+0x0/0x1000 [crct10dif_ce] returned 0 after 8838 usecs, irqs_disabled() 0
	[   24.746636] initcall tcpci_i2c_driver_init+0x0/0x1000 [tcpci] returned 0 after 8607 usecs, irqs_disabled() 0
	[   24.774541] initcall hisi_hikey_usb_driver_init+0x0/0x1000 [hisi_hikey_usb] returned 0 after 35860 usecs, irqs_disabled() 0
	[   24.892957] initcall rt1711h_i2c_driver_init+0x0/0x1000 [tcpci_rt1711h] returned 0 after 21500 usecs, irqs_disabled() 0
	[   24.956528] initcall wl1271_init+0x0/0x1000 [wlcore_sdio] returned 0 after 83582 usecs, irqs_disabled() 0
	[   25.039853] initcall cfg80211_init+0x0/0xdc [cfg80211] returned 0 after 26291 usecs, irqs_disabled() 0
	[   25.118288] initcall ieee80211_init+0x0/0x40 [mac80211] returned 0 after 15 usecs, irqs_disabled() 0
	[   25.335203] initcall wl18xx_driver_init+0x0/0x1000 [wl18xx] returned 0 after 134423 usecs, irqs_disabled() 0
	[   26.277300] initcall ecdh_init+0x0/0xd0 [ecdh_generic] returned 0 after 302 usecs, irqs_disabled() 0
	[   26.435409] initcall bt_init+0x0/0xcc [bluetooth] returned 0 after 63051 usecs, irqs_disabled() 0
	[   26.508033] initcall btusb_driver_init+0x0/0x1000 [btusb] returned 0 after 305 usecs, irqs_disabled() 0
	[   27.333049] initcall kirin_pcie_driver_init+0x0/0x1000 [pcie_kirin] returned 0 after 805983 usecs, irqs_disabled() 0

So, just like the regulator_init_complete() code in drivers/regulator/core.c
does, we also need to delay the call to the actual logic that
disables the unused clocks.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---

To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover letter.
See [PATCH 0/2] at: https://lore.kernel.org/all/cover.1633607765.git.mchehab+huawei@kernel.org/
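
As a rough sanity check on the 15000 ms delay, the timestamps in the logs
above can be plugged into a few lines of C. This is only back-of-the-envelope
arithmetic, with assumptions: the two logs come from different boots and are
treated here as one timeline, the printed time of the last module initcall
(kirin_pcie_driver_init) is taken as "module init finished", and the
function and macro names are made up for the example.

```c
#include <stdbool.h>

/* Timestamps copied from the logs above, in microseconds since boot. */
#define SWEEP_INITCALL_US   22381800LL  /* "calling clk_disable_unused" */
#define LAST_MODULE_INIT_US 27333049LL  /* kirin_pcie_driver_init returned */

/* Time at which a sweep deferred by delay_ms would run. */
long long deferred_sweep_time_us(long long delay_ms)
{
	return SWEEP_INITCALL_US + delay_ms * 1000LL;
}

/* Would this delay push the sweep past the last module init seen in the log? */
bool delay_covers_module_init(long long delay_ms)
{
	return deferred_sweep_time_us(delay_ms) > LAST_MODULE_INIT_US;
}

/* Smallest whole-millisecond delay that covers the log. */
long long min_delay_ms(void)
{
	long long gap_us = LAST_MODULE_INIT_US - SWEEP_INITCALL_US;

	return gap_us / 1000LL + 1;
}
```

With these numbers the gap is 4951249 us, so anything from 4952 ms upwards
would have covered this particular boot, and 15000 ms moves the sweep to
about 37.38 s. That says nothing about slower storage or larger module
sets, which is why the delay is described as arbitrary in the code comment.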

 drivers/clk/clk.c | 51 +++++++++++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 19 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 65508eb89ec9..d2e192e243b2 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -21,6 +21,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/sched.h>
 #include <linux/clkdev.h>
+#include <linux/workqueue.h>
 
 #include "clk.h"
 
@@ -1206,7 +1207,7 @@ static void clk_core_disable_unprepare(struct clk_core *core)
 	clk_core_unprepare_lock(core);
 }
 
-static void __init clk_unprepare_unused_subtree(struct clk_core *core)
+static void clk_unprepare_unused_subtree(struct clk_core *core)
 {
 	struct clk_core *child;
 
@@ -1236,7 +1237,7 @@ static void __init clk_unprepare_unused_subtree(struct clk_core *core)
 	clk_pm_runtime_put(core);
 }
 
-static void __init clk_disable_unused_subtree(struct clk_core *core)
+static void clk_disable_unused_subtree(struct clk_core *core)
 {
 	struct clk_core *child;
 	unsigned long flags;
@@ -1290,30 +1291,42 @@ static int __init clk_ignore_unused_setup(char *__unused)
 }
 __setup("clk_ignore_unused", clk_ignore_unused_setup);
 
+static void __clk_disable_unused(struct work_struct *w)
+{
+	struct clk_core *core;
+
+	clk_prepare_lock();
+
+	hlist_for_each_entry(core, &clk_root_list, child_node)
+		clk_disable_unused_subtree(core);
+
+	hlist_for_each_entry(core, &clk_orphan_list, child_node)
+		clk_disable_unused_subtree(core);
+
+	hlist_for_each_entry(core, &clk_root_list, child_node)
+		clk_unprepare_unused_subtree(core);
+
+	hlist_for_each_entry(core, &clk_orphan_list, child_node)
+		clk_unprepare_unused_subtree(core);
+
+	clk_prepare_unlock();
+}
+DECLARE_DELAYED_WORK(disable_unused, __clk_disable_unused);
+
 static int __init clk_disable_unused(void)
 {
-	struct clk_core *core;
-
 	if (clk_ignore_unused) {
 		pr_warn("clk: Not disabling unused clocks\n");
 		return 0;
 	}
 
-	clk_prepare_lock();
-
-	hlist_for_each_entry(core, &clk_root_list, child_node)
-		clk_disable_unused_subtree(core);
-
-	hlist_for_each_entry(core, &clk_orphan_list, child_node)
-		clk_disable_unused_subtree(core);
-
-	hlist_for_each_entry(core, &clk_root_list, child_node)
-		clk_unprepare_unused_subtree(core);
-
-	hlist_for_each_entry(core, &clk_orphan_list, child_node)
-		clk_unprepare_unused_subtree(core);
-
-	clk_prepare_unlock();
+	/*
+	 * We punt the actual disabling for an arbitrary amount of time,
+	 * since drivers built as modules are only probed and initialized
+	 * after late_initcall_sync(), and they may still need clocks
+	 * that look unused at this point.
+	 */
+	schedule_delayed_work(&disable_unused, msecs_to_jiffies(15000));
 
 	return 0;
 }
-- 
2.31.1



* [PATCH 2/2] clk: clk-hi3670: mark some clocks as CLK_IS_CRITICAL
  2021-10-07 12:06 [PATCH 0/2] clk: fix the need of booking clk_ignore_unused=true on embedded devs Mauro Carvalho Chehab
  2021-10-07 12:06 ` [PATCH 1/2] clk: wait for extra time before disabling unused clocks Mauro Carvalho Chehab
@ 2021-10-07 12:06 ` Mauro Carvalho Chehab
  2021-10-11  6:17 ` [PATCH 0/2] clk: fix the need of booking clk_ignore_unused=true on embedded devs Manivannan Sadhasivam
  2021-10-27  7:21 ` Mauro Carvalho Chehab
  3 siblings, 0 replies; 6+ messages in thread
From: Mauro Carvalho Chehab @ 2021-10-07 12:06 UTC (permalink / raw)
  To: Michael Turquette, Stephen Boyd
  Cc: linuxarm, mauro.chehab, Mauro Carvalho Chehab,
	Manivannan Sadhasivam, linux-clk, linux-kernel

Some clocks can't be disabled or the device stops working.

Mark those with CLK_IS_CRITICAL.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---

To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover letter.
See [PATCH 0/2] at: https://lore.kernel.org/all/cover.1633607765.git.mchehab+huawei@kernel.org/
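
For readers unfamiliar with the flag, the effect of CLK_IS_CRITICAL on the
unused-clock sweep can be sketched in userspace. This is a simplified model,
not the clk core: the toy_* names and the flag value are invented, and it
assumes (as I understand the core's behaviour) that a critical clock is
enabled once at registration and that this reference is never dropped, so
the sweep always sees it as in use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel flag; the numeric value here is arbitrary. */
#define TOY_CLK_IS_CRITICAL 0x1u

/* Toy gate clock; NOT struct hisi_gate_clock or struct clk_core. */
struct toy_gate {
	const char *name;
	unsigned int flags;
	bool hw_enabled;   /* state left by the bootloader */
	int enable_count;  /* references held on the clock */
};

/* Registration: a critical gate gets a permanent reference from the core. */
void toy_register(struct toy_gate *g)
{
	if (g->flags & TOY_CLK_IS_CRITICAL) {
		g->enable_count++;
		g->hw_enabled = true;
	}
}

/* Gate every clock that is running but has no references. */
void toy_sweep(struct toy_gate *gates, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (gates[i].hw_enabled && gates[i].enable_count == 0)
			gates[i].hw_enabled = false;
	}
}

/* Register one bootloader-enabled gate with the given flags, sweep, report. */
bool toy_survives_sweep(unsigned int flags)
{
	struct toy_gate g = { "example_gate", flags, true, 0 };

	toy_register(&g);
	toy_sweep(&g, 1);
	return g.hw_enabled;
}
```

In this model a gate registered with TOY_CLK_IS_CRITICAL stays on after the
sweep even though no driver ever claimed it, while the same gate without
the flag is turned off. That is the behaviour the hunks below rely on for
clocks such as clk_gate_ufs_subsys.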

 drivers/clk/hisilicon/clk-hi3670.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/clk/hisilicon/clk-hi3670.c b/drivers/clk/hisilicon/clk-hi3670.c
index 4d05a71683a5..d5813132df9c 100644
--- a/drivers/clk/hisilicon/clk-hi3670.c
+++ b/drivers/clk/hisilicon/clk-hi3670.c
@@ -82,13 +82,13 @@ static const struct hisi_gate_clock hi3670_crgctrl_gate_sep_clks[] = {
 	{ HI3670_PPLL2_EN_ACPU, "ppll2_en_acpu", "clk_ppll2",
 	  CLK_SET_RATE_PARENT, 0x0, 3, 0, },
 	{ HI3670_PPLL3_EN_ACPU, "ppll3_en_acpu", "clk_ppll3",
-	  CLK_SET_RATE_PARENT, 0x0, 27, 0, },
+	  CLK_SET_RATE_PARENT | CLK_IS_CRITICAL, 0x0, 27, 0, },
 	{ HI3670_PPLL1_GT_CPU, "ppll1_gt_cpu", "clk_ppll1",
 	  CLK_SET_RATE_PARENT, 0x460, 16, 0, },
 	{ HI3670_PPLL2_GT_CPU, "ppll2_gt_cpu", "clk_ppll2",
 	  CLK_SET_RATE_PARENT, 0x460, 18, 0, },
 	{ HI3670_PPLL3_GT_CPU, "ppll3_gt_cpu", "clk_ppll3",
-	  CLK_SET_RATE_PARENT, 0x460, 20, 0, },
+	  CLK_SET_RATE_PARENT | CLK_IS_CRITICAL, 0x460, 20, 0, },
 	{ HI3670_CLK_GATE_PPLL2_MEDIA, "clk_gate_ppll2_media", "clk_ppll2",
 	  CLK_SET_RATE_PARENT, 0x410, 27, 0, },
 	{ HI3670_CLK_GATE_PPLL3_MEDIA, "clk_gate_ppll3_media", "clk_ppll3",
@@ -166,7 +166,7 @@ static const struct hisi_gate_clock hi3670_crgctrl_gate_sep_clks[] = {
 	{ HI3670_CLK_CCI400_BYPASS, "clk_cci400_bypass", "clk_ddrc_freq",
 	  CLK_SET_RATE_PARENT, 0x22C, 28, 0, },
 	{ HI3670_CLK_GATE_CCI400, "clk_gate_cci400", "clk_ddrc_freq",
-	  CLK_SET_RATE_PARENT, 0x50, 14, 0, },
+	  CLK_SET_RATE_PARENT | CLK_IS_CRITICAL, 0x50, 14, 0, },
 	{ HI3670_CLK_GATE_SD, "clk_gate_sd", "clk_mux_sd_sys",
 	  CLK_SET_RATE_PARENT, 0x40, 17, 0, },
 	{ HI3670_HCLK_GATE_SD, "hclk_gate_sd", "clk_div_sysbus",
@@ -248,15 +248,15 @@ static const struct hisi_gate_clock hi3670_crgctrl_gate_sep_clks[] = {
 	{ HI3670_CLK_GATE_AO_ASP, "clk_gate_ao_asp", "clk_div_ao_asp",
 	  CLK_SET_RATE_PARENT, 0x0, 26, 0, },
 	{ HI3670_PCLK_GATE_PCTRL, "pclk_gate_pctrl", "clk_div_ptp",
-	  CLK_SET_RATE_PARENT, 0x20, 31, 0, },
+	  CLK_SET_RATE_PARENT | CLK_IS_CRITICAL, 0x20, 31, 0, },
 	{ HI3670_CLK_CSI_TRANS_GT, "clk_csi_trans_gt", "clk_div_csi_trans",
-	  CLK_SET_RATE_PARENT, 0x30, 24, 0, },
+	  CLK_SET_RATE_PARENT | CLK_IS_CRITICAL, 0x30, 24, 0, },
 	{ HI3670_CLK_DSI_TRANS_GT, "clk_dsi_trans_gt", "clk_div_dsi_trans",
 	  CLK_SET_RATE_PARENT, 0x30, 25, 0, },
 	{ HI3670_CLK_GATE_PWM, "clk_gate_pwm", "clk_div_ptp",
 	  CLK_SET_RATE_PARENT, 0x20, 0, 0, },
 	{ HI3670_ABB_AUDIO_EN0, "abb_audio_en0", "clk_gate_abb_192",
-	  CLK_SET_RATE_PARENT, 0x30, 8, 0, },
+	  CLK_SET_RATE_PARENT | CLK_IS_CRITICAL, 0x30, 8, 0, },
 	{ HI3670_ABB_AUDIO_EN1, "abb_audio_en1", "clk_gate_abb_192",
 	  CLK_SET_RATE_PARENT, 0x30, 9, 0, },
 	{ HI3670_ABB_AUDIO_GT_EN0, "abb_audio_gt_en0", "abb_audio_en0",
@@ -331,9 +331,9 @@ static const struct hisi_gate_clock hi3670_crgctrl_gate_clks[] = {
 	{ HI3670_CLK_GATE_DSI_TRANS, "clk_gate_dsi_trans", "clk_ppll2",
 	  CLK_SET_RATE_PARENT, 0xF4, 1, CLK_GATE_HIWORD_MASK, },
 	{ HI3670_CLK_ANDGT_PTP, "clk_andgt_ptp", "clk_div_320m",
-	  CLK_SET_RATE_PARENT, 0xF8, 5, CLK_GATE_HIWORD_MASK, },
+	  CLK_SET_RATE_PARENT | CLK_IS_CRITICAL, 0xF8, 5, CLK_GATE_HIWORD_MASK, },
 	{ HI3670_CLK_ANDGT_OUT0, "clk_andgt_out0", "clk_ppll0",
-	  CLK_SET_RATE_PARENT, 0xF0, 10, CLK_GATE_HIWORD_MASK, },
+	  CLK_SET_RATE_PARENT | CLK_IS_CRITICAL, 0xF0, 10, CLK_GATE_HIWORD_MASK, },
 	{ HI3670_CLK_ANDGT_OUT1, "clk_andgt_out1", "clk_ppll0",
 	  CLK_SET_RATE_PARENT, 0xF0, 11, CLK_GATE_HIWORD_MASK, },
 	{ HI3670_CLKGT_DP_AUDIO_PLL_AO, "clkgt_dp_audio_pll_ao", "clk_ppll6",
@@ -569,9 +569,9 @@ static const struct hisi_gate_clock hi3670_sctrl_gate_sep_clks[] = {
 	{ HI3670_PCLK_GATE_SPI, "pclk_gate_spi", "clk_div_ioperi",
 	  CLK_SET_RATE_PARENT, 0x1B0, 10, 0, },
 	{ HI3670_CLK_GATE_UFS_SUBSYS, "clk_gate_ufs_subsys", "clk_div_ufs_subsys",
-	  CLK_SET_RATE_PARENT, 0x1B0, 14, 0, },
+	  CLK_SET_RATE_PARENT | CLK_IS_CRITICAL, 0x1B0, 14, 0, },
 	{ HI3670_CLK_GATE_UFSIO_REF, "clk_gate_ufsio_ref", "clkin_sys",
-	  CLK_SET_RATE_PARENT, 0x1b0, 12, 0, },
+	  CLK_SET_RATE_PARENT | CLK_IS_CRITICAL, 0x1b0, 12, 0, },
 	{ HI3670_PCLK_AO_GPIO0, "pclk_ao_gpio0", "clk_div_aobus",
 	  CLK_SET_RATE_PARENT, 0x160, 11, 0, },
 	{ HI3670_PCLK_AO_GPIO1, "pclk_ao_gpio1", "clk_div_aobus",
@@ -593,7 +593,7 @@ static const struct hisi_gate_clock hi3670_sctrl_gate_sep_clks[] = {
 	{ HI3670_PCLK_GATE_SYSCNT, "pclk_gate_syscnt", "clk_div_aobus",
 	  CLK_SET_RATE_PARENT, 0x160, 19, 0, },
 	{ HI3670_CLK_GATE_SYSCNT, "clk_gate_syscnt", "clkin_sys",
-	  CLK_SET_RATE_PARENT, 0x160, 20, 0, },
+	  CLK_SET_RATE_PARENT | CLK_IS_CRITICAL, 0x160, 20, 0, },
 	{ HI3670_CLK_GATE_ASP_SUBSYS_PERI, "clk_gate_asp_subsys_peri",
 	  "clk_mux_asp_subsys_peri",
 	  CLK_SET_RATE_PARENT, 0x170, 6, 0, },
@@ -703,7 +703,7 @@ static const struct hisi_gate_clock hi3670_media1_gate_sep_clks[] = {
 	{ HI3670_PCLK_GATE_DISP_NOC_SUBSYS, "pclk_gate_disp_noc_subsys", "clk_div_sysbus",
 	  CLK_SET_RATE_PARENT, 0x10, 18, 0, },
 	{ HI3670_ACLK_GATE_DISP_NOC_SUBSYS, "aclk_gate_disp_noc_subsys", "clk_gate_vivobusfreq",
-	  CLK_SET_RATE_PARENT, 0x10, 17, 0, },
+	  CLK_SET_RATE_PARENT | CLK_IS_CRITICAL, 0x10, 17, 0, },
 	{ HI3670_PCLK_GATE_DSS, "pclk_gate_dss", "pclk_gate_disp_noc_subsys",
 	  CLK_SET_RATE_PARENT, 0x00, 14, 0, },
 	{ HI3670_ACLK_GATE_DSS, "aclk_gate_dss", "aclk_gate_disp_noc_subsys",
-- 
2.31.1



* Re: [PATCH 0/2] clk: fix the need of booking clk_ignore_unused=true on embedded devs
  2021-10-07 12:06 [PATCH 0/2] clk: fix the need of booking clk_ignore_unused=true on embedded devs Mauro Carvalho Chehab
  2021-10-07 12:06 ` [PATCH 1/2] clk: wait for extra time before disabling unused clocks Mauro Carvalho Chehab
  2021-10-07 12:06 ` [PATCH 2/2] clk: clk-hi3670: mark some clocks as CLK_IS_CRITICAL Mauro Carvalho Chehab
@ 2021-10-11  6:17 ` Manivannan Sadhasivam
  2021-10-14  6:44   ` Mauro Carvalho Chehab
  2021-10-27  7:21 ` Mauro Carvalho Chehab
  3 siblings, 1 reply; 6+ messages in thread
From: Manivannan Sadhasivam @ 2021-10-11  6:17 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Michael Turquette, Stephen Boyd, linuxarm, mauro.chehab,
	linux-clk, linux-kernel

Hi Mauro,

On Thu, Oct 07, 2021 at 02:06:53PM +0200, Mauro Carvalho Chehab wrote:
> Currently, the only way to boot a Kernel with drivers built as modules on embedded 
> devices like HiKey 970 is to pass clk_ignore_unused=true as a modprobe parameter.
> 
> There are two separate issues:
> 
> 1. the clk's core calls clk_disable_unused() too early. By the time this
>    function is called, only the builtin drivers were already probed/initialized.
>    Drivers built as modules will only be probed afterwards.
> 
>    This cause a race condition and boot instability, as the clk core will try
>    to disable clocks while the drivers built as modules are still being
>    probed and initialized.

So you are mentioning a "race" condition here but it is not mentioned in the
actual patch. If the issue you are seeing is because the clocks used by the
modules are disabled before they are probed, why can't they just enable the
clocks during the probe time?

Am I missing something?

Thanks,
Mani



* Re: [PATCH 0/2] clk: fix the need of booking clk_ignore_unused=true on embedded devs
  2021-10-11  6:17 ` [PATCH 0/2] clk: fix the need of booking clk_ignore_unused=true on embedded devs Manivannan Sadhasivam
@ 2021-10-14  6:44   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 6+ messages in thread
From: Mauro Carvalho Chehab @ 2021-10-14  6:44 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Michael Turquette, Stephen Boyd, linuxarm, mauro.chehab,
	linux-clk, linux-kernel

On Mon, 11 Oct 2021 11:47:18 +0530
Manivannan Sadhasivam <mani@kernel.org> wrote:

> Hi Mauro,
> 
> On Thu, Oct 07, 2021 at 02:06:53PM +0200, Mauro Carvalho Chehab wrote:
> > Currently, the only way to boot a Kernel with drivers built as modules on embedded 
> > devices like HiKey 970 is to pass clk_ignore_unused=true as a modprobe parameter.
> > 
> > There are two separate issues:
> > 
> > 1. the clk's core calls clk_disable_unused() too early. By the time this
> >    function is called, only the builtin drivers were already probed/initialized.
> >    Drivers built as modules will only be probed afterwards.
> > 
> >    This cause a race condition and boot instability, as the clk core will try
> >    to disable clocks while the drivers built as modules are still being
> >    probed and initialized.  
> 
> So you are mentioning a "race" condition here but it is not mentioned in the
> actual patch. 

Patch 1 explains it...

> If the issue you are seeing is because the clocks used by the
> modules are disabled before they are probed, why can't they just enable the
> clocks during the probe time?
> 
> Am I missing something?

What happens is that such clocks are enabled when the system boots,
and, when they are disabled, very bad things happen, as that
interrupts clocks used by several parts of the system.

Most of the problems happen because the ARM SoC produces SError NMI
interrupts when some of those clocks are disabled, which leads to panic().

Disabling other clocks shuts down key components of the system that
aren't directly related to a driver but instead control some core
part of the device, making the SoC wait forever for an I/O event
that will never happen.

A small set of clocks makes the system unreliable, causing drivers
to fail probing. Those can either lead to panic() or break support
for a peripheral, like WiFi, USB and/or PCI.

The core issue is that clk_disable_unused() happens too early.
It is called at late_initcall_sync() time, which is triggered
before the probe/init code of the drivers compiled as modules
is called. So, what happens is:


 BIOS enables clocks that are needed for the device to boot             
 |                                
 +-> Linux start booting
 |
 +-> builtin drivers are probed 
 |
 +--------------------------------\
 |                                |
 +-> late_initcall_sync() calls   +-> Modules start probing
 |   clk_disable_unused()         |
 |                                +-> Some drivers are probed
 |                                |   before their needed clks
 |                                |   got disabled
 |                                |
 +-> Clocks are disabled          |
 |                                |
 +-> SError -> panic()            |
                                  \ (several drivers weren't
                                     probed/initialized)

The only fix for that is to postpone clk_disable_unused() until
after all driver probe/init routines have been called, or to disable
it completely.

The current distributions recommended at:
	https://www.96boards.org/product/hikey970/

pass clk_ignore_unused as a boot parameter, which disables the call
to clk_disable_unused().

The only sane way to get rid of that is to fix the core to let the
drivers finish probe/init before disabling clocks.

See, the regulator logic that disables unused power lines does
the same: it waits for 30 seconds after late_initcall_sync()
before disabling the unused regulators.

Regards,
Mauro


* Re: [PATCH 0/2] clk: fix the need of booking clk_ignore_unused=true on embedded devs
  2021-10-07 12:06 [PATCH 0/2] clk: fix the need of booking clk_ignore_unused=true on embedded devs Mauro Carvalho Chehab
                   ` (2 preceding siblings ...)
  2021-10-11  6:17 ` [PATCH 0/2] clk: fix the need of booking clk_ignore_unused=true on embedded devs Manivannan Sadhasivam
@ 2021-10-27  7:21 ` Mauro Carvalho Chehab
  3 siblings, 0 replies; 6+ messages in thread
From: Mauro Carvalho Chehab @ 2021-10-27  7:21 UTC (permalink / raw)
  To: Michael Turquette, Stephen Boyd
  Cc: linuxarm, mauro.chehab, Manivannan Sadhasivam, linux-clk, linux-kernel

Stephen/Michael,

Gentle ping.

Regards,
Mauro

