All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dominique MARTINET <dominique.martinet@atmark-techno.com>
To: Alice Guo <alice.guo@nxp.com>, Shawn Guo <shawnguo@kernel.org>,
	Krzysztof Kozlowski <krzk@kernel.org>
Cc: robh+dt@kernel.org, devicetree@vger.kernel.org, peng.fan@nxp.com,
	linux-imx@nxp.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: regression due to soc_device_match not handling defer (Was: [PATCH v4 4/4] soc: imx8m: change to use platform driver)
Date: Mon, 29 Mar 2021 18:08:54 +0900	[thread overview]
Message-ID: <YGGZJjAxA1IO+/VU@atmark-techno.com> (raw)
In-Reply-To: <20201120101112.31819-4-alice.guo@nxp.com>

Hi,

First thanks for the patch, it fixes the kexec hang I was looking at...

Unfortunately it means the soc is now init much later and other piece of
drivers that depend on querying the soc will fail, I am getting a BUG in
the caam crypto driver from 7d981405d0fd ("soc: imx8m: change to use
platform driver") + ce58459d8c7f ("arm64: dts: imx8m: add SoC ID
compatible") on the imx8mp evk as follow:

[    2.575505] caam 30900000.crypto: caam pkc algorithms registered in /proc/crypto
[    2.588986] caam 30900000.crypto: registering rng-caam
[    2.594168] caam_jr 30901000.jr: job ring error: irqstate: 00000103
[    2.600492] ------------[ cut here ]------------
[    2.605109] kernel BUG at drivers/crypto/caam/jr.c:187!
[    2.610338] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[    2.615829] Modules linked in:
[    2.618895] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc5 #8
[    2.625168] Hardware name: NXP i.MX8MPlus EVK board (DT)
[    2.630482] pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
[    2.636493] pc : caam_jr_interrupt+0x150/0x154
[    2.640944] lr : caam_jr_interrupt+0x150/0x154
[    2.645396] sp : ffff800010003e90
[    2.648709] x29: ffff800010003e90 x28: ffff800011f82e80
[    2.654035] x27: ffff800011cd2000 x26: ffff0000c1988400
[    2.659353] x25: ffff800011cd2000 x24: ffff800011f789e0
[    2.664674] x23: 000000000000002e x22: ffff800012261000
[    2.669994] x21: ffff0000c1979410 x20: ffff0000c1988a80
[    2.675315] x19: 0000000000000103 x18: 0000000000000000
[    2.680637] x17: 0000000000000007 x16: 000000000000000e
[    2.685958] x15: 0000000000000030 x14: ffffffffffffffff
[    2.691279] x13: ffff800090003aa7 x12: ffff800010003aaf
[    2.696601] x11: 0000000000000003 x10: 0101010101010101
[    2.701921] x9 : ffff8000100039d0 x8 : ffff800011fa3830
[    2.707242] x7 : ffff800011ffb830 x6 : ffff800011ffb830
[    2.712560] x5 : 0000000000000000 x4 : 0000000000000000
[    2.717881] x3 : 0000000000000000 x2 : 0000000000000000
[    2.723203] x1 : 0000000000000000 x0 : ffff800011f82e80
[    2.728528] Call trace:
[    2.730976]  caam_jr_interrupt+0x150/0x154
[    2.735080]  __handle_irq_event_percpu+0x6c/0x280
[    2.739791]  handle_irq_event+0x70/0x160
[    2.743716]  handle_fasteoi_irq+0xb0/0x200
[    2.747822]  __handle_domain_irq+0x8c/0xf0
[    2.751924]  gic_handle_irq+0xd8/0x160
[    2.755683]  el1_irq+0xb4/0x180
[    2.758829]  arch_cpu_idle+0x18/0x30
[    2.762412]  default_idle_call+0x4c/0x1d0
[    2.766431]  do_idle+0x238/0x2b0
[    2.769664]  cpu_startup_entry+0x30/0x7c
[    2.773595]  rest_init+0xe4/0xf4
[    2.776832]  arch_call_rest_init+0x1c/0x28
[    2.780937]  start_kernel+0x570/0x5a8
[    2.784602]  0x0
[    2.786452] Code: aa1503e0 90005c41 91108021 940da95d (d4210000)
[    2.792560] ---[ end trace 968b8515172abc2e ]---
[    2.797181] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
[    2.804580] SMP: stopping secondary CPUs
[    2.808508] Kernel Offset: disabled
[    2.811998] CPU features: 0x00240002,2000200c
[    2.816360] Memory Limit: none
[    2.819419] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]---


This particular crash can be fixed by making the caam driver delay as
well if the device isn't inited yet, e.g. this works:
--------
diff --git a/drivers/base/soc.c b/drivers/base/soc.c
index 0af5363a582c..f179f9e55b49 100644
--- a/drivers/base/soc.c
+++ b/drivers/base/soc.c
@@ -36,6 +36,7 @@ static DEVICE_ATTR(family,		0444, soc_info_show,  NULL);
 static DEVICE_ATTR(serial_number,	0444, soc_info_show,  NULL);
 static DEVICE_ATTR(soc_id,		0444, soc_info_show,  NULL);
 static DEVICE_ATTR(revision,		0444, soc_info_show,  NULL);
+static int init_done;
 
 struct device *soc_device_to_device(struct soc_device *soc_dev)
 {
@@ -157,6 +158,7 @@ struct soc_device *soc_device_register(struct soc_device_attribute *soc_dev_attr
 		return ERR_PTR(ret);
 	}
 
+	init_done = true;
 	return soc_dev;
 
 out3:
@@ -243,6 +245,9 @@ const struct soc_device_attribute *soc_device_match(
 {
 	int ret = 0;
 
+	if (!init_done)
+		return ERR_PTR(-EPROBE_DEFER);
+
 	if (!matches)
 		return NULL;
 
diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c
index ca0361b2dbb0..d08f8cc4131f 100644
--- a/drivers/crypto/caam/ctrl.c
+++ b/drivers/crypto/caam/ctrl.c
@@ -635,6 +635,9 @@ static int caam_probe(struct platform_device *pdev)
 	nprop = pdev->dev.of_node;
 
 	imx_soc_match = soc_device_match(caam_imx_soc_table);
+	if (IS_ERR(imx_soc_match))
+		return PTR_ERR(imx_soc_match);
+
 	caam_imx = (bool)imx_soc_match;
 
 	if (imx_soc_match) {
-------

But it obviously doesn't cover the ~50 (!) other soc_device_match users
in the code base which would all need to start handling potential error
return code.

(I also had problems with other drivers when trying to backport the
patch to the 5.4.70_2.3.0 imx kernel but I just gave up on that one)


I think having all drivers handle potential EPROBE_DEFER errors is the
best way forward, how do you propose to move on?
I can do some but have no way of testing most of these so am a bit
reluctant to...

Thanks,
-- 
Dominique

WARNING: multiple messages have this Message-ID (diff)
From: Dominique MARTINET <dominique.martinet@atmark-techno.com>
To: Alice Guo <alice.guo@nxp.com>, Shawn Guo <shawnguo@kernel.org>,
	Krzysztof Kozlowski <krzk@kernel.org>
Cc: robh+dt@kernel.org, devicetree@vger.kernel.org, peng.fan@nxp.com,
	linux-imx@nxp.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: regression due to soc_device_match not handling defer (Was: [PATCH v4 4/4] soc: imx8m: change to use platform driver)
Date: Mon, 29 Mar 2021 18:08:54 +0900	[thread overview]
Message-ID: <YGGZJjAxA1IO+/VU@atmark-techno.com> (raw)
In-Reply-To: <20201120101112.31819-4-alice.guo@nxp.com>

Hi,

First thanks for the patch, it fixes the kexec hang I was looking at...

Unfortunately it means the soc is now init much later and other piece of
drivers that depend on querying the soc will fail, I am getting a BUG in
the caam crypto driver from 7d981405d0fd ("soc: imx8m: change to use
platform driver") + ce58459d8c7f ("arm64: dts: imx8m: add SoC ID
compatible") on the imx8mp evk as follow:

[    2.575505] caam 30900000.crypto: caam pkc algorithms registered in /proc/crypto
[    2.588986] caam 30900000.crypto: registering rng-caam
[    2.594168] caam_jr 30901000.jr: job ring error: irqstate: 00000103
[    2.600492] ------------[ cut here ]------------
[    2.605109] kernel BUG at drivers/crypto/caam/jr.c:187!
[    2.610338] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[    2.615829] Modules linked in:
[    2.618895] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc5 #8
[    2.625168] Hardware name: NXP i.MX8MPlus EVK board (DT)
[    2.630482] pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
[    2.636493] pc : caam_jr_interrupt+0x150/0x154
[    2.640944] lr : caam_jr_interrupt+0x150/0x154
[    2.645396] sp : ffff800010003e90
[    2.648709] x29: ffff800010003e90 x28: ffff800011f82e80
[    2.654035] x27: ffff800011cd2000 x26: ffff0000c1988400
[    2.659353] x25: ffff800011cd2000 x24: ffff800011f789e0
[    2.664674] x23: 000000000000002e x22: ffff800012261000
[    2.669994] x21: ffff0000c1979410 x20: ffff0000c1988a80
[    2.675315] x19: 0000000000000103 x18: 0000000000000000
[    2.680637] x17: 0000000000000007 x16: 000000000000000e
[    2.685958] x15: 0000000000000030 x14: ffffffffffffffff
[    2.691279] x13: ffff800090003aa7 x12: ffff800010003aaf
[    2.696601] x11: 0000000000000003 x10: 0101010101010101
[    2.701921] x9 : ffff8000100039d0 x8 : ffff800011fa3830
[    2.707242] x7 : ffff800011ffb830 x6 : ffff800011ffb830
[    2.712560] x5 : 0000000000000000 x4 : 0000000000000000
[    2.717881] x3 : 0000000000000000 x2 : 0000000000000000
[    2.723203] x1 : 0000000000000000 x0 : ffff800011f82e80
[    2.728528] Call trace:
[    2.730976]  caam_jr_interrupt+0x150/0x154
[    2.735080]  __handle_irq_event_percpu+0x6c/0x280
[    2.739791]  handle_irq_event+0x70/0x160
[    2.743716]  handle_fasteoi_irq+0xb0/0x200
[    2.747822]  __handle_domain_irq+0x8c/0xf0
[    2.751924]  gic_handle_irq+0xd8/0x160
[    2.755683]  el1_irq+0xb4/0x180
[    2.758829]  arch_cpu_idle+0x18/0x30
[    2.762412]  default_idle_call+0x4c/0x1d0
[    2.766431]  do_idle+0x238/0x2b0
[    2.769664]  cpu_startup_entry+0x30/0x7c
[    2.773595]  rest_init+0xe4/0xf4
[    2.776832]  arch_call_rest_init+0x1c/0x28
[    2.780937]  start_kernel+0x570/0x5a8
[    2.784602]  0x0
[    2.786452] Code: aa1503e0 90005c41 91108021 940da95d (d4210000)
[    2.792560] ---[ end trace 968b8515172abc2e ]---
[    2.797181] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
[    2.804580] SMP: stopping secondary CPUs
[    2.808508] Kernel Offset: disabled
[    2.811998] CPU features: 0x00240002,2000200c
[    2.816360] Memory Limit: none
[    2.819419] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]---


This particular crash can be fixed by making the caam driver delay as
well if the device isn't inited yet, e.g. this works:
--------
diff --git a/drivers/base/soc.c b/drivers/base/soc.c
index 0af5363a582c..f179f9e55b49 100644
--- a/drivers/base/soc.c
+++ b/drivers/base/soc.c
@@ -36,6 +36,7 @@ static DEVICE_ATTR(family,		0444, soc_info_show,  NULL);
 static DEVICE_ATTR(serial_number,	0444, soc_info_show,  NULL);
 static DEVICE_ATTR(soc_id,		0444, soc_info_show,  NULL);
 static DEVICE_ATTR(revision,		0444, soc_info_show,  NULL);
+static int init_done;
 
 struct device *soc_device_to_device(struct soc_device *soc_dev)
 {
@@ -157,6 +158,7 @@ struct soc_device *soc_device_register(struct soc_device_attribute *soc_dev_attr
 		return ERR_PTR(ret);
 	}
 
+	init_done = true;
 	return soc_dev;
 
 out3:
@@ -243,6 +245,9 @@ const struct soc_device_attribute *soc_device_match(
 {
 	int ret = 0;
 
+	if (!init_done)
+		return ERR_PTR(-EPROBE_DEFER);
+
 	if (!matches)
 		return NULL;
 
diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c
index ca0361b2dbb0..d08f8cc4131f 100644
--- a/drivers/crypto/caam/ctrl.c
+++ b/drivers/crypto/caam/ctrl.c
@@ -635,6 +635,9 @@ static int caam_probe(struct platform_device *pdev)
 	nprop = pdev->dev.of_node;
 
 	imx_soc_match = soc_device_match(caam_imx_soc_table);
+	if (IS_ERR(imx_soc_match))
+		return PTR_ERR(imx_soc_match);
+
 	caam_imx = (bool)imx_soc_match;
 
 	if (imx_soc_match) {
-------

But it obviously doesn't cover the ~50 (!) other soc_device_match users
in the code base which would all need to start handling potential error
return code.

(I also had problems with other drivers when trying to backport the
patch to the 5.4.70_2.3.0 imx kernel but I just gave up on that one)


I think having all drivers handle potential EPROBE_DEFER errors is the
best way forward, how do you propose to move on?
I can do some but have no way of testing most of these so am a bit
reluctant to...

Thanks,
-- 
Dominique

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2021-03-29  9:10 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-20 10:11 [PATCH v4 1/4] dt-bindings: soc: imx8m: add DT Binding doc for soc unique ID Alice Guo
2020-11-20 10:11 ` Alice Guo
2020-11-20 10:11 ` [PATCH v4 2/4] arm64: dts: imx8m: add SoC ID compatible Alice Guo
2020-11-20 10:11   ` Alice Guo
2020-11-20 10:11 ` [PATCH v4 3/4] arm64: dts: imx8m: add NVMEM provider and consumer to read soc unique ID Alice Guo
2020-11-20 10:11   ` Alice Guo
2020-11-20 10:53   ` Krzysztof Kozlowski
2020-11-20 10:53     ` Krzysztof Kozlowski
2020-11-20 10:11 ` [PATCH v4 4/4] soc: imx8m: change to use platform driver Alice Guo
2020-11-20 10:11   ` Alice Guo
2020-11-20 10:46   ` Krzysztof Kozlowski
2020-11-20 10:46     ` Krzysztof Kozlowski
2021-03-29  9:08   ` Dominique MARTINET [this message]
2021-03-29  9:08     ` regression due to soc_device_match not handling defer (Was: [PATCH v4 4/4] soc: imx8m: change to use platform driver) Dominique MARTINET
2021-03-30  2:41     ` [EXT] " Alice Guo (OSS)
2021-03-30  2:41       ` Alice Guo (OSS)
2021-04-15  1:27       ` Dominique MARTINET
2021-04-15  1:27         ` Dominique MARTINET
2021-04-15  1:33         ` Peng Fan
2021-04-15  1:33           ` Peng Fan
2021-06-24 10:36           ` Lucas Stach
2021-06-24 10:36             ` Lucas Stach
2021-06-29  2:39             ` Peng Fan (OSS)
2021-06-29  2:39               ` Peng Fan (OSS)
2020-11-20 10:50 ` [PATCH v4 1/4] dt-bindings: soc: imx8m: add DT Binding doc for soc unique ID Krzysztof Kozlowski
2020-11-20 10:50   ` Krzysztof Kozlowski
2020-11-23  4:45   ` [EXT] " Alice Guo
2020-11-23  4:45     ` Alice Guo
2020-11-23  9:04     ` Krzysztof Kozlowski
2020-11-23  9:04       ` Krzysztof Kozlowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YGGZJjAxA1IO+/VU@atmark-techno.com \
    --to=dominique.martinet@atmark-techno.com \
    --cc=alice.guo@nxp.com \
    --cc=devicetree@vger.kernel.org \
    --cc=krzk@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-imx@nxp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peng.fan@nxp.com \
    --cc=robh+dt@kernel.org \
    --cc=shawnguo@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.