linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
@ 2020-02-17 20:48 Corentin Labbe
  2020-02-18 16:35 ` Daniel Jordan
  0 siblings, 1 reply; 19+ messages in thread
From: Corentin Labbe @ 2020-02-17 20:48 UTC (permalink / raw)
  To: tj, jiangshanlai, will, mark.rutland; +Cc: linux-kernel, linux-arm-kernel

Hello

When running some CI test jobs (targeting crypto tests), I always get the following WARNING:
[    7.886361] ------------[ cut here ]------------
[    7.886388] WARNING: CPU: 2 PID: 147 at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
[    7.886394] Modules linked in: ghash_generic
[    7.886409] CPU: 2 PID: 147 Comm: modprobe Not tainted 5.6.0-rc1-next-20200214-00068-g166c9264f0b1-dirty #545
[    7.886414] Hardware name: Pine H64 model A (DT)
[    7.886422] pstate: a0000085 (NzCv daIf -PAN -UAO)
[    7.886429] pc : __queue_work+0x3b8/0x3d0
[    7.886436] lr : __queue_work+0x1dc/0x3d0
[    7.886440] sp : ffff800012073b50
[    7.886445] x29: ffff800012073b50 x28: ffff8000117bb590 
[    7.886452] x27: 0000000000000100 x26: ffff80001132d018 
[    7.886460] x25: ffff800011336d58 x24: ffff800011629920 
[    7.886467] x23: ffff80001132d018 x22: 000000000000000e 
[    7.886474] x21: 0000000000000002 x20: ffff0000b9c08000 
[    7.886481] x19: ffff0000bd9b8400 x18: 0000000000000000 
[    7.886488] x17: 0000000000000000 x16: 0000000000000000 
[    7.886495] x15: 0000af8c526b5c68 x14: 02be881212d8d480 
[    7.886503] x13: 0000000000000352 x12: 0000000000000001 
[    7.886510] x11: 0000000000000400 x10: 0000000000000040 
[    7.886517] x9 : ffff80001163f5e8 x8 : ffff80001163f5e0 
[    7.886524] x7 : ffff0000b9800028 x6 : 0000000000000000 
[    7.886531] x5 : ffff0000b9800000 x4 : 0000000000000000 
[    7.886538] x3 : ffff0000bd9b4800 x2 : 0000000000000001 
[    7.886545] x1 : 0000000000000000 x0 : ffff8000117bb598 
[    7.886552] Call trace:
[    7.886560]  __queue_work+0x3b8/0x3d0
[    7.886567]  queue_work_on+0x6c/0x90
[    7.886576]  do_init_module+0x188/0x1f0
[    7.886582]  load_module+0x1d00/0x22b0
[    7.886589]  __do_sys_finit_module+0xd0/0xe8
[    7.886595]  __arm64_sys_finit_module+0x1c/0x28
[    7.886605]  el0_svc_common.constprop.0+0x68/0x160
[    7.886613]  do_el0_svc+0x20/0x80
[    7.886621]  el0_sync_handler+0x10c/0x180
[    7.886627]  el0_sync+0x140/0x180
[    7.886638] ---[ end trace a44615ff285cb96c ]---
[    7.886643] WARN for events

For finding what was cause this, I have added the following debug:
@@ -1468,8 +1470,10 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
        /* pwq determined, queue */
        trace_workqueue_queue_work(req_cpu, pwq, work);
 
-       if (WARN_ON(!list_empty(&work->entry)))
+       if (WARN_ON(!list_empty(&work->entry))) {
+               pr_err("WARN for %s\n", wq->name);
                goto out;
+       }

So it seems that it is a "events" workqueue that hit this problem.

Note that classic defconfig do not hit this problem, since it appears with the following config change:
-CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
+# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
+CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y

I dont see any relation between crypto and this problem, but this problem with this config change is reproductible.

Regards

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-02-17 20:48 WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0 Corentin Labbe
@ 2020-02-18 16:35 ` Daniel Jordan
  2020-02-20  9:03   ` Corentin Labbe
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Jordan @ 2020-02-18 16:35 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: mark.rutland, jiangshanlai, linux-kernel, tj, will, linux-arm-kernel

Hi Corentin,

On Mon, Feb 17, 2020 at 09:48:03PM +0100, Corentin Labbe wrote:
> When running some CI test jobs (targeting crypto tests), I always get the following WARNING:

Can you be more specific about which test triggers this?  I used the config
option you mention and failed to reproduce after doing the boot-time crypto
tests and running various tcrypt incantations.

> [    7.886361] ------------[ cut here ]------------
> [    7.886388] WARNING: CPU: 2 PID: 147 at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
> [    7.886394] Modules linked in: ghash_generic
> [    7.886409] CPU: 2 PID: 147 Comm: modprobe Not tainted 5.6.0-rc1-next-20200214-00068-g166c9264f0b1-dirty #545

I was using just plain next-20200214.  Can't find 166c9264f0b1, what tag/branch
were you using exactly?

> So it seems that it is a "events" workqueue that hit this problem.

Looks like "schedule_work(&init_free_wq)" in do_init_module(), can't be sure
though.

thanks,
Daniel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-02-18 16:35 ` Daniel Jordan
@ 2020-02-20  9:03   ` Corentin Labbe
  2020-02-21 17:42     ` Daniel Jordan
  0 siblings, 1 reply; 19+ messages in thread
From: Corentin Labbe @ 2020-02-20  9:03 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: mark.rutland, jiangshanlai, linux-kernel, linux-crypto, tj, will,
	linux-arm-kernel

On Tue, Feb 18, 2020 at 11:35:04AM -0500, Daniel Jordan wrote:
> Hi Corentin,
> 
> On Mon, Feb 17, 2020 at 09:48:03PM +0100, Corentin Labbe wrote:
> > When running some CI test jobs (targeting crypto tests), I always get the following WARNING:
> 
> Can you be more specific about which test triggers this?  I used the config
> option you mention and failed to reproduce after doing the boot-time crypto
> tests and running various tcrypt incantations.
> 

Hello

It appears before any user space start. But according to the "Modules linked", probably ghash is already loaded and perhaps tested.

Removing GHASH lead to:
[    7.920931] ------------[ cut here ]------------
[    7.920955] WARNING: CPU: 1 PID: 120 at kernel/workqueue.c:1469 __queue_work+0x370/0x388
[    7.920960] Modules linked in: ccm

And removing CCM lead to
[    7.798877] ------------[ cut here ]------------
[    7.798902] WARNING: CPU: 2 PID: 127 at kernel/workqueue.c:1469 __queue_work+0x370/0x388
[    7.798907] Modules linked in: ctr

So it confirm that the problem is not related to the tested crypto algorithm.

> > [    7.886361] ------------[ cut here ]------------
> > [    7.886388] WARNING: CPU: 2 PID: 147 at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
> > [    7.886394] Modules linked in: ghash_generic
> > [    7.886409] CPU: 2 PID: 147 Comm: modprobe Not tainted 5.6.0-rc1-next-20200214-00068-g166c9264f0b1-dirty #545
> 
> I was using just plain next-20200214.  Can't find 166c9264f0b1, what tag/branch
> were you using exactly?
> 

The pasted example has some commit to try to debug it.
But I got the same with plain next (like yesterday 5.6.0-rc2-next-20200219 and tomorow 5.6.0-rc2-next-20200220) and master got the same issue.

But for reproductability on different hardware, I agree it is difficult.
For the moment, I got it only on Allwinner H5, A64, H6 SoCs and imx8q.
[    6.611449] ------------[ cut here ]------------
[    6.613234] WARNING: CPU: 1 PID: 157 at /srv/data/clabbe/linux-next/kernel/workqueue.c:1471 __queue_work+0x324/0x3b0
[    6.623809] Modules linked in: ghash_generic
[    6.628101] CPU: 1 PID: 157 Comm: modprobe Not tainted 5.6.0-rc2-next-20200220 #82
[    6.635710] Hardware name: NXP i.MX8MNano DDR4 EVK board (DT)

I tried amlogic boards and some qemu "virt" without success.

(I have added linux-crypto@vger.kernel.org to the CC)

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-02-20  9:03   ` Corentin Labbe
@ 2020-02-21 17:42     ` Daniel Jordan
  2020-02-28 12:33       ` Will Deacon
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Jordan @ 2020-02-21 17:42 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: mark.rutland, jiangshanlai, linux-kernel, Daniel Jordan,
	linux-crypto, tj, will, linux-arm-kernel

On Thu, Feb 20, 2020 at 10:03:50AM +0100, Corentin Labbe wrote:
> But I got the same with plain next (like yesterday 5.6.0-rc2-next-20200219 and tomorow 5.6.0-rc2-next-20200220) and master got the same issue.

Thanks.  I've been trying to reproduce this on an arm board but it's taking a
while to get it setup since I've never used it for kernel work.

Hoping to get it up soon, though someone with a working setup may be in a
better position to help with this.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-02-21 17:42     ` Daniel Jordan
@ 2020-02-28 12:33       ` Will Deacon
  2020-02-28 15:33         ` Daniel Jordan
  0 siblings, 1 reply; 19+ messages in thread
From: Will Deacon @ 2020-02-28 12:33 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: mark.rutland, jiangshanlai, linux-kernel, Corentin Labbe,
	linux-crypto, tj, linux-arm-kernel

On Fri, Feb 21, 2020 at 12:42:23PM -0500, Daniel Jordan wrote:
> On Thu, Feb 20, 2020 at 10:03:50AM +0100, Corentin Labbe wrote:
> > But I got the same with plain next (like yesterday 5.6.0-rc2-next-20200219 and tomorow 5.6.0-rc2-next-20200220) and master got the same issue.
> 
> Thanks.  I've been trying to reproduce this on an arm board but it's taking a
> while to get it setup since I've never used it for kernel work.
> 
> Hoping to get it up soon, though someone with a working setup may be in a
> better position to help with this.

Any joy with this? It sounded to me like the issue also happens on a
mainline kernel. If this is the case, have you managed to bisect it?

Cheers,

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-02-28 12:33       ` Will Deacon
@ 2020-02-28 15:33         ` Daniel Jordan
  2020-03-01 17:53           ` Corentin Labbe
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Jordan @ 2020-02-28 15:33 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, jiangshanlai, linux-kernel, Daniel Jordan,
	Corentin Labbe, linux-crypto, tj, linux-arm-kernel

On Fri, Feb 28, 2020 at 12:33:12PM +0000, Will Deacon wrote:
> On Fri, Feb 21, 2020 at 12:42:23PM -0500, Daniel Jordan wrote:
> > On Thu, Feb 20, 2020 at 10:03:50AM +0100, Corentin Labbe wrote:
> > > But I got the same with plain next (like yesterday 5.6.0-rc2-next-20200219 and tomorow 5.6.0-rc2-next-20200220) and master got the same issue.
> > 
> > Thanks.  I've been trying to reproduce this on an arm board but it's taking a
> > while to get it setup since I've never used it for kernel work.
> > 
> > Hoping to get it up soon, though someone with a working setup may be in a
> > better position to help with this.
> 
> Any joy with this? It sounded to me like the issue also happens on a
> mainline kernel. If this is the case, have you managed to bisect it?

I managed to get recent mainline (rawhide) booting days ago but wasn't able to
reproduce on a rpi 3b+.

My plan had been to try debug-by-email next, but then something exploded
internally and I haven't had time for it yet.  Still intending to help once the
explosion is contained, provided someone can't get to it sooner.

thanks,
Daniel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-02-28 15:33         ` Daniel Jordan
@ 2020-03-01 17:53           ` Corentin Labbe
  2020-03-02 17:25             ` Daniel Jordan
  0 siblings, 1 reply; 19+ messages in thread
From: Corentin Labbe @ 2020-03-01 17:53 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: mark.rutland, jiangshanlai, linux-kernel, linux-crypto, tj,
	Will Deacon, linux-arm-kernel

On Fri, Feb 28, 2020 at 10:33:31AM -0500, Daniel Jordan wrote:
> On Fri, Feb 28, 2020 at 12:33:12PM +0000, Will Deacon wrote:
> > On Fri, Feb 21, 2020 at 12:42:23PM -0500, Daniel Jordan wrote:
> > > On Thu, Feb 20, 2020 at 10:03:50AM +0100, Corentin Labbe wrote:
> > > > But I got the same with plain next (like yesterday 5.6.0-rc2-next-20200219 and tomorow 5.6.0-rc2-next-20200220) and master got the same issue.
> > > 
> > > Thanks.  I've been trying to reproduce this on an arm board but it's taking a
> > > while to get it setup since I've never used it for kernel work.
> > > 
> > > Hoping to get it up soon, though someone with a working setup may be in a
> > > better position to help with this.
> > 
> > Any joy with this? It sounded to me like the issue also happens on a
> > mainline kernel. If this is the case, have you managed to bisect it?
> 
> I managed to get recent mainline (rawhide) booting days ago but wasn't able to
> reproduce on a rpi 3b+.
> 
> My plan had been to try debug-by-email next, but then something exploded
> internally and I haven't had time for it yet.  Still intending to help once the
> explosion is contained, provided someone can't get to it sooner.
> 
> thanks,
> Daniel

Hello

I tried to bisect this problem, but the result is:
# bad: [0ecfebd2b52404ae0c54a878c872bb93363ada36] Linux 5.2
git bisect bad 0ecfebd2b52404ae0c54a878c872bb93363ada36
# good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1
git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd
# bad: [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm
git bisect bad a2d635decbfa9c1e4ae15cb05b68b2559f7f827c
# bad: [82efe439599439a5e1e225ce5740e6cfb777a7dd] Merge tag 'devicetree-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect bad 82efe439599439a5e1e225ce5740e6cfb777a7dd
# bad: [78438ce18f26dbcaa8993bb45d20ffb0cec3bc3e] Merge branch 'stable-fodder' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad 78438ce18f26dbcaa8993bb45d20ffb0cec3bc3e
# good: [275b103a26e218b3d739e5ab15be6b40303a1428] Merge tag 'edac_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
git bisect good 275b103a26e218b3d739e5ab15be6b40303a1428
# bad: [962d5ecca101e65175a8cdb1b91da8e1b8434d96] Merge tag 'regmap-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
git bisect bad 962d5ecca101e65175a8cdb1b91da8e1b8434d96
# good: [181a9096717b8d2128eb1162d07a4f4ee0f9f4b8] crypto: ccree - Make cc_sec_disable static
git bisect good 181a9096717b8d2128eb1162d07a4f4ee0f9f4b8
# good: [5d9e8b3f809f1c12e32fea7061ad2319d2848600] hwmon: (lm25066) Support SAMPLES_FOR_AVG register
git bisect good 5d9e8b3f809f1c12e32fea7061ad2319d2848600
# good: [7aefd944f038c7469571adb37769cb6f3924ecfa] Merge tag 'hwmon-for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
git bisect good 7aefd944f038c7469571adb37769cb6f3924ecfa
# good: [c660a81796d456f0769937dd3ecf4cfd30f0ece6] selftests/kexec: define "require_root_privileges"
git bisect good c660a81796d456f0769937dd3ecf4cfd30f0ece6
# good: [d917fb876f6eaeeea8a2b620d2a266ce26372f4d] selftests: build and run gpio when output directory is the src dir
git bisect good d917fb876f6eaeeea8a2b620d2a266ce26372f4d
# good: [615c4d9a50e25645646c3bafa658aedc22ab7ca9] Merge branch 'regmap-5.2' into regmap-next
git bisect good 615c4d9a50e25645646c3bafa658aedc22ab7ca9
# good: [e59f755ceb6d6f39f90899d2a4e39c3e05837e12] crypto: ccree - use a proper le32 type for le32 val
git bisect good e59f755ceb6d6f39f90899d2a4e39c3e05837e12
# bad: [71ae5fc87c34ecbdca293c2a5c563d6be2576558] Merge tag 'linux-kselftest-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
git bisect bad 71ae5fc87c34ecbdca293c2a5c563d6be2576558
# bad: [81ff5d2cba4f86cd850b9ee4a530cd221ee45aa3] Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect bad 81ff5d2cba4f86cd850b9ee4a530cd221ee45aa3
# first bad commit: [81ff5d2cba4f86cd850b9ee4a530cd221ee45aa3] Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

The only interesting thing I see in this MR is: "Add fuzz testing to testmgr"

But this wont help.

Regards

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-03-01 17:53           ` Corentin Labbe
@ 2020-03-02 17:25             ` Daniel Jordan
  2020-03-02 18:00               ` Robin Murphy
  2020-03-03  7:48               ` Corentin Labbe
  0 siblings, 2 replies; 19+ messages in thread
From: Daniel Jordan @ 2020-03-02 17:25 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: mark.rutland, jiangshanlai, linux-kernel, Daniel Jordan,
	linux-crypto, tj, Will Deacon, linux-arm-kernel

On Sun, Mar 01, 2020 at 06:53:51PM +0100, Corentin Labbe wrote:
> I tried to bisect this problem, but the result is:
...
> # first bad commit: [81ff5d2cba4f86cd850b9ee4a530cd221ee45aa3] Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
> 
> The only interesting thing I see in this MR is: "Add fuzz testing to testmgr"
> 
> But this wont help.

Hm, that merge commit has only a couple lines of powerpc build change, so maybe
there's something nondeterministic going on.

Does this fix it?  I can't verify but figure it's worth trying the simplest
explanation first, which is that the work isn't initialized by the time it's
queued.

thanks,
daniel

---8<---

Subject: [PATCH] module: statically initialize init section freeing data

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
---
 kernel/module.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 33569a01d6e1..db0cda206167 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -88,8 +88,9 @@ EXPORT_SYMBOL_GPL(module_mutex);
 static LIST_HEAD(modules);
 
 /* Work queue for freeing init sections in success case */
-static struct work_struct init_free_wq;
-static struct llist_head init_free_list;
+static void do_free_init(struct work_struct *w);
+static DECLARE_WORK(init_free_wq, do_free_init);
+static LLIST_HEAD(init_free_list);
 
 #ifdef CONFIG_MODULES_TREE_LOOKUP
 
@@ -3501,14 +3502,6 @@ static void do_free_init(struct work_struct *w)
 	}
 }
 
-static int __init modules_wq_init(void)
-{
-	INIT_WORK(&init_free_wq, do_free_init);
-	init_llist_head(&init_free_list);
-	return 0;
-}
-module_init(modules_wq_init);
-
 /*
  * This is where the real work happens.
  *
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-03-02 17:25             ` Daniel Jordan
@ 2020-03-02 18:00               ` Robin Murphy
  2020-03-03 21:30                 ` Daniel Jordan
  2020-03-03  7:48               ` Corentin Labbe
  1 sibling, 1 reply; 19+ messages in thread
From: Robin Murphy @ 2020-03-02 18:00 UTC (permalink / raw)
  To: Daniel Jordan, Corentin Labbe
  Cc: mark.rutland, jiangshanlai, linux-kernel, linux-crypto, tj,
	Will Deacon, linux-arm-kernel

On 02/03/2020 5:25 pm, Daniel Jordan wrote:
> On Sun, Mar 01, 2020 at 06:53:51PM +0100, Corentin Labbe wrote:
>> I tried to bisect this problem, but the result is:
> ...
>> # first bad commit: [81ff5d2cba4f86cd850b9ee4a530cd221ee45aa3] Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
>>
>> The only interesting thing I see in this MR is: "Add fuzz testing to testmgr"
>>
>> But this wont help.
> 
> Hm, that merge commit has only a couple lines of powerpc build change, so maybe
> there's something nondeterministic going on.

Something smelled familiar about this discussion, and sure enough that 
merge contains c4741b230597 ("crypto: run initcalls for generic 
implementations earlier"), which has raised its head before[1].

> Does this fix it?  I can't verify but figure it's worth trying the simplest
> explanation first, which is that the work isn't initialized by the time it's
> queued.

The relative initcall levels would appear to explain the symptom - I 
guess the question is whether this represents a bug in a particular 
test/algorithm (as with the unaligned accesses) or a fundamental problem 
in the infrastructure now being able to poke the module loader too early.

Robin.

[1] 
https://lore.kernel.org/linux-arm-kernel/20190530170737.GB70051@gmail.com/

> thanks,
> daniel
> 
> ---8<---
> 
> Subject: [PATCH] module: statically initialize init section freeing data
> 
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> ---
>   kernel/module.c | 13 +++----------
>   1 file changed, 3 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 33569a01d6e1..db0cda206167 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -88,8 +88,9 @@ EXPORT_SYMBOL_GPL(module_mutex);
>   static LIST_HEAD(modules);
>   
>   /* Work queue for freeing init sections in success case */
> -static struct work_struct init_free_wq;
> -static struct llist_head init_free_list;
> +static void do_free_init(struct work_struct *w);
> +static DECLARE_WORK(init_free_wq, do_free_init);
> +static LLIST_HEAD(init_free_list);
>   
>   #ifdef CONFIG_MODULES_TREE_LOOKUP
>   
> @@ -3501,14 +3502,6 @@ static void do_free_init(struct work_struct *w)
>   	}
>   }
>   
> -static int __init modules_wq_init(void)
> -{
> -	INIT_WORK(&init_free_wq, do_free_init);
> -	init_llist_head(&init_free_list);
> -	return 0;
> -}
> -module_init(modules_wq_init);
> -
>   /*
>    * This is where the real work happens.
>    *
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-03-02 17:25             ` Daniel Jordan
  2020-03-02 18:00               ` Robin Murphy
@ 2020-03-03  7:48               ` Corentin Labbe
  2020-03-03 21:31                 ` Daniel Jordan
  1 sibling, 1 reply; 19+ messages in thread
From: Corentin Labbe @ 2020-03-03  7:48 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: mark.rutland, jiangshanlai, linux-kernel, linux-crypto, tj,
	Will Deacon, linux-arm-kernel

On Mon, Mar 02, 2020 at 12:25:10PM -0500, Daniel Jordan wrote:
> On Sun, Mar 01, 2020 at 06:53:51PM +0100, Corentin Labbe wrote:
> > I tried to bisect this problem, but the result is:
> ...
> > # first bad commit: [81ff5d2cba4f86cd850b9ee4a530cd221ee45aa3] Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
> > 
> > The only interesting thing I see in this MR is: "Add fuzz testing to testmgr"
> > 
> > But this wont help.
> 
> Hm, that merge commit has only a couple lines of powerpc build change, so maybe
> there's something nondeterministic going on.
> 
> Does this fix it?  I can't verify but figure it's worth trying the simplest
> explanation first, which is that the work isn't initialized by the time it's
> queued.
> 
> thanks,
> daniel
> 
> ---8<---
> 
> Subject: [PATCH] module: statically initialize init section freeing data
> 
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> ---
>  kernel/module.c | 13 +++----------
>  1 file changed, 3 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 33569a01d6e1..db0cda206167 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -88,8 +88,9 @@ EXPORT_SYMBOL_GPL(module_mutex);
>  static LIST_HEAD(modules);
>  
>  /* Work queue for freeing init sections in success case */
> -static struct work_struct init_free_wq;
> -static struct llist_head init_free_list;
> +static void do_free_init(struct work_struct *w);
> +static DECLARE_WORK(init_free_wq, do_free_init);
> +static LLIST_HEAD(init_free_list);
>  
>  #ifdef CONFIG_MODULES_TREE_LOOKUP
>  
> @@ -3501,14 +3502,6 @@ static void do_free_init(struct work_struct *w)
>  	}
>  }
>  
> -static int __init modules_wq_init(void)
> -{
> -	INIT_WORK(&init_free_wq, do_free_init);
> -	init_llist_head(&init_free_list);
> -	return 0;
> -}
> -module_init(modules_wq_init);
> -
>  /*
>   * This is where the real work happens.
>   *

Hello

The patch fix the issue. Thanks!

So you could add:
Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Tested-on: sun50i-h6-pine-h64
Tested-on: imx8mn-ddr4-evk
Tested-on: sun50i-a64-bananapi-m64

Thanks again
Regards

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-03-02 18:00               ` Robin Murphy
@ 2020-03-03 21:30                 ` Daniel Jordan
  2020-03-03 22:43                   ` Eric Biggers
       [not found]                   ` <20201001175022.GA22444@Red>
  0 siblings, 2 replies; 19+ messages in thread
From: Daniel Jordan @ 2020-03-03 21:30 UTC (permalink / raw)
  To: Robin Murphy
  Cc: mark.rutland, jiangshanlai, linux-kernel, Daniel Jordan,
	Corentin Labbe, linux-crypto, tj, Will Deacon, linux-arm-kernel

On Mon, Mar 02, 2020 at 06:00:10PM +0000, Robin Murphy wrote:
> On 02/03/2020 5:25 pm, Daniel Jordan wrote:
> Something smelled familiar about this discussion, and sure enough that merge
> contains c4741b230597 ("crypto: run initcalls for generic implementations
> earlier"), which has raised its head before[1].

Yep, that looks suspicious.

The bisect didn't point to that specific commit, even though my version of git
tries commits in the merge.  I'm probably missing something.

> > Does this fix it?  I can't verify but figure it's worth trying the simplest
> > explanation first, which is that the work isn't initialized by the time it's
> > queued.
> 
> The relative initcall levels would appear to explain the symptom - I guess
> the question is whether this represents a bug in a particular test/algorithm
> (as with the unaligned accesses) or a fundamental problem in the
> infrastructure now being able to poke the module loader too early.

I'm not familiar with the crypto code.  Could it be that the commit moved some
request_module() calls before modules_wq_init()?

And, is it "too early" or just "earlier"?  When is it too early for modprobe?

Barring other ideas, Corentin, would you be willing to boot with

    trace_event=initcall:*,module:* trace_options=stacktrace

and

diff --git a/kernel/module.c b/kernel/module.c
index 33569a01d6e1..393be6979a27 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3604,8 +3604,11 @@ static noinline int do_init_module(struct module *mod)
 	 * be cleaned up needs to sync with the queued work - ie
 	 * rcu_barrier()
 	 */
-	if (llist_add(&freeinit->node, &init_free_list))
+	if (llist_add(&freeinit->node, &init_free_list)) {
+		pr_warn("%s: schedule_work for mod=%s\n", __func__, mod->name);
+		dump_stack();
 		schedule_work(&init_free_wq);
+	}
 
 	mutex_unlock(&module_mutex);
 	wake_up_all(&module_wq);

but not my earlier fix and share the dmesg and ftrace output to see if the
theory holds?

Also, could you attach your config?  Curious now what your crypto options look
like after fiddling with some of them today while trying and failing to see
this on x86.

thanks,
Daniel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-03-03  7:48               ` Corentin Labbe
@ 2020-03-03 21:31                 ` Daniel Jordan
  2020-09-25 18:12                   ` Corentin Labbe
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Jordan @ 2020-03-03 21:31 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: mark.rutland, Robin Murphy, jiangshanlai, linux-kernel,
	Daniel Jordan, linux-crypto, tj, Will Deacon, linux-arm-kernel

On Tue, Mar 03, 2020 at 08:48:19AM +0100, Corentin Labbe wrote:
> The patch fix the issue. Thanks!

Thanks for trying it!

> So you could add:
> Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> Tested-on: sun50i-h6-pine-h64
> Tested-on: imx8mn-ddr4-evk
> Tested-on: sun50i-a64-bananapi-m64

I definitely will if the patch turns out to be the right fix.

thanks,
Daniel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-03-03 21:30                 ` Daniel Jordan
@ 2020-03-03 22:43                   ` Eric Biggers
  2020-03-06 16:12                     ` Daniel Jordan
       [not found]                   ` <20201001175022.GA22444@Red>
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Biggers @ 2020-03-03 22:43 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: mark.rutland, Will Deacon, jiangshanlai, linux-kernel,
	Corentin Labbe, linux-crypto, tj, Robin Murphy, linux-arm-kernel

On Tue, Mar 03, 2020 at 04:30:17PM -0500, Daniel Jordan wrote:
> On Mon, Mar 02, 2020 at 06:00:10PM +0000, Robin Murphy wrote:
> > On 02/03/2020 5:25 pm, Daniel Jordan wrote:
> > Something smelled familiar about this discussion, and sure enough that merge
> > contains c4741b230597 ("crypto: run initcalls for generic implementations
> > earlier"), which has raised its head before[1].
> 
> Yep, that looks suspicious.
> 
> The bisect didn't point to that specific commit, even though my version of git
> tries commits in the merge.  I'm probably missing something.
> 
> > > Does this fix it?  I can't verify but figure it's worth trying the simplest
> > > explanation first, which is that the work isn't initialized by the time it's
> > > queued.
> > 
> > The relative initcall levels would appear to explain the symptom - I guess
> > the question is whether this represents a bug in a particular test/algorithm
> > (as with the unaligned accesses) or a fundamental problem in the
> > infrastructure now being able to poke the module loader too early.
> 
> I'm not familiar with the crypto code.  Could it be that the commit moved some
> request_module() calls before modules_wq_init()?
> 
> And, is it "too early" or just "earlier"?  When is it too early for modprobe?
> 
> Barring other ideas, Corentin, would you be willing to boot with
> 
>     trace_event=initcall:*,module:* trace_options=stacktrace
> 
> and
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 33569a01d6e1..393be6979a27 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3604,8 +3604,11 @@ static noinline int do_init_module(struct module *mod)
>  	 * be cleaned up needs to sync with the queued work - ie
>  	 * rcu_barrier()
>  	 */
> -	if (llist_add(&freeinit->node, &init_free_list))
> +	if (llist_add(&freeinit->node, &init_free_list)) {
> +		pr_warn("%s: schedule_work for mod=%s\n", __func__, mod->name);
> +		dump_stack();
>  		schedule_work(&init_free_wq);
> +	}
>  
>  	mutex_unlock(&module_mutex);
>  	wake_up_all(&module_wq);
> 
> but not my earlier fix and share the dmesg and ftrace output to see if the
> theory holds?
> 
> Also, could you attach your config?  Curious now what your crypto options look
> like after fiddling with some of them today while trying and failing to see
> this on x86.
> 

Probably the request_module() is coming from the registration-time crypto
self-tests allocating the generic implementation of algorithm when an
architecture-specific implementation is registered.  This occurs when
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y on Linux v5.2 and later.

If this is causing problems we could do:

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index ccb3d60729fc..d89791700b88 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1667,7 +1667,7 @@ static int test_hash_vs_generic_impl(const char *driver,
 	if (strcmp(generic_driver, driver) == 0) /* Already the generic impl? */
 		return 0;
 
-	generic_tfm = crypto_alloc_shash(generic_driver, 0, 0);
+	generic_tfm = crypto_alloc_shash(generic_driver, 0, CRYPTO_NOLOAD);
 	if (IS_ERR(generic_tfm)) {
 		err = PTR_ERR(generic_tfm);
 		if (err == -ENOENT) {
@@ -2389,7 +2389,7 @@ static int test_aead_vs_generic_impl(struct aead_extra_tests_ctx *ctx)
 	if (strcmp(generic_driver, driver) == 0) /* Already the generic impl? */
 		return 0;
 
-	generic_tfm = crypto_alloc_aead(generic_driver, 0, 0);
+	generic_tfm = crypto_alloc_aead(generic_driver, 0, CRYPTO_NOLOAD);
 	if (IS_ERR(generic_tfm)) {
 		err = PTR_ERR(generic_tfm);
 		if (err == -ENOENT) {
@@ -2993,7 +2993,7 @@ static int test_skcipher_vs_generic_impl(const char *driver,
 	if (strcmp(generic_driver, driver) == 0) /* Already the generic impl? */
 		return 0;
 
-	generic_tfm = crypto_alloc_skcipher(generic_driver, 0, 0);
+	generic_tfm = crypto_alloc_skcipher(generic_driver, 0, CRYPTO_NOLOAD);
 	if (IS_ERR(generic_tfm)) {
 		err = PTR_ERR(generic_tfm);
 		if (err == -ENOENT) {


... but that's not ideal, since it would mean that if someone builds all crypto
algorithms as modules, then the comparison tests could be unnecessarily skipped.

But it is really always wrong to be calling request_module() from other
module_init() functions?  The commit that added 'init_free_wq' was also
introduced in v5.2; maybe that's the problem here?

	commit 1a7b7d9220819afe79d1ec5d759fe4349bd2453e
	Author: Rick Edgecombe <rick.p.edgecombe@intel.com>
	Date:   Thu Apr 25 17:11:37 2019 -0700

	    modules: Use vmalloc special flag

- Eric

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-03-03 22:43                   ` Eric Biggers
@ 2020-03-06 16:12                     ` Daniel Jordan
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Jordan @ 2020-03-06 16:12 UTC (permalink / raw)
  To: Eric Biggers, Corentin Labbe
  Cc: mark.rutland, Will Deacon, jiangshanlai, linux-kernel,
	Daniel Jordan, linux-crypto, tj, Robin Murphy, linux-arm-kernel

On Tue, Mar 03, 2020 at 02:43:27PM -0800, Eric Biggers wrote:
> Probably the request_module() is coming from the registration-time crypto
> self-tests allocating the generic implementation of algorithm when an
> architecture-specific implementation is registered.  This occurs when
> CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y on Linux v5.2 and later.

Ok, I can confirm that if we get the debug output from one of Corentin's
boards.

> If this is causing problems we could do:
> 
> diff --git a/crypto/testmgr.c b/crypto/testmgr.c
> index ccb3d60729fc..d89791700b88 100644
> --- a/crypto/testmgr.c
> +++ b/crypto/testmgr.c
> @@ -1667,7 +1667,7 @@ static int test_hash_vs_generic_impl(const char *driver,
>  	if (strcmp(generic_driver, driver) == 0) /* Already the generic impl? */
>  		return 0;
>  
> -	generic_tfm = crypto_alloc_shash(generic_driver, 0, 0);
> +	generic_tfm = crypto_alloc_shash(generic_driver, 0, CRYPTO_NOLOAD);
>  	if (IS_ERR(generic_tfm)) {
>  		err = PTR_ERR(generic_tfm);
>  		if (err == -ENOENT) {
> @@ -2389,7 +2389,7 @@ static int test_aead_vs_generic_impl(struct aead_extra_tests_ctx *ctx)
>  	if (strcmp(generic_driver, driver) == 0) /* Already the generic impl? */
>  		return 0;
>  
> -	generic_tfm = crypto_alloc_aead(generic_driver, 0, 0);
> +	generic_tfm = crypto_alloc_aead(generic_driver, 0, CRYPTO_NOLOAD);
>  	if (IS_ERR(generic_tfm)) {
>  		err = PTR_ERR(generic_tfm);
>  		if (err == -ENOENT) {
> @@ -2993,7 +2993,7 @@ static int test_skcipher_vs_generic_impl(const char *driver,
>  	if (strcmp(generic_driver, driver) == 0) /* Already the generic impl? */
>  		return 0;
>  
> -	generic_tfm = crypto_alloc_skcipher(generic_driver, 0, 0);
> +	generic_tfm = crypto_alloc_skcipher(generic_driver, 0, CRYPTO_NOLOAD);
>  	if (IS_ERR(generic_tfm)) {
>  		err = PTR_ERR(generic_tfm);
>  		if (err == -ENOENT) {
> 
> 
> ... but that's not ideal, since it would mean that if someone builds all crypto
> algorithms as modules, then the comparison tests could be unnecessarily skipped.

We should try to avoid this then.

> But it is really always wrong to be calling request_module() from other
> module_init() functions?  The commit that added 'init_free_wq' was also
> introduced in v5.2; maybe that's the problem here?
> 
> 	commit 1a7b7d9220819afe79d1ec5d759fe4349bd2453e
> 	Author: Rick Edgecombe <rick.p.edgecombe@intel.com>
> 	Date:   Thu Apr 25 17:11:37 2019 -0700
> 
> 	    modules: Use vmalloc special flag

Yes, I don't see a reason init_free_wq has to be initialized that late.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-03-03 21:31                 ` Daniel Jordan
@ 2020-09-25 18:12                   ` Corentin Labbe
  2020-09-30 18:18                     ` Daniel Jordan
  0 siblings, 1 reply; 19+ messages in thread
From: Corentin Labbe @ 2020-09-25 18:12 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: mark.rutland, Robin Murphy, jiangshanlai, linux-kernel,
	linux-crypto, tj, Will Deacon, linux-arm-kernel

On Tue, Mar 03, 2020 at 04:31:11PM -0500, Daniel Jordan wrote:
> On Tue, Mar 03, 2020 at 08:48:19AM +0100, Corentin Labbe wrote:
> > The patch fix the issue. Thanks!
> 
> Thanks for trying it!
> 
> > So you could add:
> > Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> > Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> > Tested-on: sun50i-h6-pine-h64
> > Tested-on: imx8mn-ddr4-evk
> > Tested-on: sun50i-a64-bananapi-m64
> 
> I definitely will if the patch turns out to be the right fix.
> 
> thanks,
> Daniel

Hello

I forgot about this problem since the patch is in my branch since.
But a co-worker hit this problem recently and without this patch my CI still have it.

Regards

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-09-25 18:12                   ` Corentin Labbe
@ 2020-09-30 18:18                     ` Daniel Jordan
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Jordan @ 2020-09-30 18:18 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: mark.rutland, Robin Murphy, jiangshanlai, linux-kernel,
	Daniel Jordan, linux-crypto, tj, Will Deacon, linux-arm-kernel

On Fri, Sep 25, 2020 at 08:12:03PM +0200, Corentin Labbe wrote:
> On Tue, Mar 03, 2020 at 04:31:11PM -0500, Daniel Jordan wrote:
> > On Tue, Mar 03, 2020 at 08:48:19AM +0100, Corentin Labbe wrote:
> > > The patch fix the issue. Thanks!
> > 
> > Thanks for trying it!
> > 
> > > So you could add:
> > > Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> > > Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> > > Tested-on: sun50i-h6-pine-h64
> > > Tested-on: imx8mn-ddr4-evk
> > > Tested-on: sun50i-a64-bananapi-m64
> > 
> > I definitely will if the patch turns out to be the right fix.
> > 
> > thanks,
> > Daniel
> 
> Hello
> 
> I forgot about this problem since the patch is in my branch since.
> But a co-worker hit this problem recently and without this patch my CI still have it.

Hi,

Sure, I'm happy to help get a fix merged, but let's nail down what the problem
is first.  It'd be useful to have the things requested here:

https://lore.kernel.org/linux-crypto/20200303213017.tanczhqd3nhpeeak@ca-dmjordan1.us.oracle.com/

thanks,
Daniel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
       [not found]                   ` <20201001175022.GA22444@Red>
@ 2020-10-05 17:09                     ` Daniel Jordan
  2020-10-07 19:41                       ` Corentin Labbe
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Jordan @ 2020-10-05 17:09 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: mark.rutland, Will Deacon, jiangshanlai, linux-kernel,
	Daniel Jordan, linux-crypto, tj, Robin Murphy, linux-arm-kernel

On Thu, Oct 01, 2020 at 07:50:22PM +0200, Corentin Labbe wrote:
> On Tue, Mar 03, 2020 at 04:30:17PM -0500, Daniel Jordan wrote:
> > Barring other ideas, Corentin, would you be willing to boot with
> > 
> >     trace_event=initcall:*,module:* trace_options=stacktrace
> > 
> > and
> > 
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 33569a01d6e1..393be6979a27 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -3604,8 +3604,11 @@ static noinline int do_init_module(struct module *mod)
> >  	 * be cleaned up needs to sync with the queued work - ie
> >  	 * rcu_barrier()
> >  	 */
> > -	if (llist_add(&freeinit->node, &init_free_list))
> > +	if (llist_add(&freeinit->node, &init_free_list)) {
> > +		pr_warn("%s: schedule_work for mod=%s\n", __func__, mod->name);
> > +		dump_stack();
> >  		schedule_work(&init_free_wq);
> > +	}
> >  
> >  	mutex_unlock(&module_mutex);
> >  	wake_up_all(&module_wq);
> > 
> > but not my earlier fix and share the dmesg and ftrace output to see if the
> > theory holds?
> > 
> > Also, could you attach your config?  Curious now what your crypto options look
> > like after fiddling with some of them today while trying and failing to see
> > this on x86.
> > 
> > thanks,
> > Daniel
> 
> Hello
> 
> Sorry for the very delayed answer.
> 
> I fail to reproduce it on x86 (qemu and  real hw) and arm.
> It seems to only happen on arm64.

Thanks for the config and dmesg, but there's no ftrace.  I see it's not
configured in your kernel, so could you boot with my earlier debug patch plus
this one and the kernel argument initcall_debug instead?

I'm trying to see whether it really is a request module call from the crypto
tests that's triggering this warning.  Preeetty likely that's what's happening,
but want to be sure since I can't reproduce this.  Then I can post the fix.

diff --git a/crypto/algapi.c b/crypto/algapi.c
index fdabf2675b63..0667c6b4588e 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -393,6 +393,10 @@ static void crypto_wait_for_test(struct crypto_larval *larval)
 {
 	int err;
 
+	pr_warn("%s: cra_name %s cra_driver_name %s\n", __func__,
+		larval->adult->cra_name, larval->adult->cra_driver_name);
+	dump_stack();
+
 	err = crypto_probing_notify(CRYPTO_MSG_ALG_REGISTER, larval->adult);
 	if (err != NOTIFY_STOP) {
 		if (WARN_ON(err != NOTIFY_DONE))
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 3cd075ce2a1e..46c4645be763 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -166,6 +166,8 @@ int __request_module(bool wait, const char *fmt, ...)
 	}
 
 	trace_module_request(module_name, wait, _RET_IP_);
+	pr_warn("%s: %s\n", __func__, module_name);
+	dump_stack();
 
 	ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-10-05 17:09                     ` Daniel Jordan
@ 2020-10-07 19:41                       ` Corentin Labbe
  2020-10-08 17:07                         ` Daniel Jordan
  0 siblings, 1 reply; 19+ messages in thread
From: Corentin Labbe @ 2020-10-07 19:41 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: mark.rutland, Will Deacon, jiangshanlai, linux-kernel,
	linux-crypto, tj, Robin Murphy, linux-arm-kernel

On Mon, Oct 05, 2020 at 01:09:10PM -0400, Daniel Jordan wrote:
> On Thu, Oct 01, 2020 at 07:50:22PM +0200, Corentin Labbe wrote:
> > On Tue, Mar 03, 2020 at 04:30:17PM -0500, Daniel Jordan wrote:
> > > Barring other ideas, Corentin, would you be willing to boot with
> > > 
> > >     trace_event=initcall:*,module:* trace_options=stacktrace
> > > 
> > > and
> > > 
> > > diff --git a/kernel/module.c b/kernel/module.c
> > > index 33569a01d6e1..393be6979a27 100644
> > > --- a/kernel/module.c
> > > +++ b/kernel/module.c
> > > @@ -3604,8 +3604,11 @@ static noinline int do_init_module(struct module *mod)
> > >  	 * be cleaned up needs to sync with the queued work - ie
> > >  	 * rcu_barrier()
> > >  	 */
> > > -	if (llist_add(&freeinit->node, &init_free_list))
> > > +	if (llist_add(&freeinit->node, &init_free_list)) {
> > > +		pr_warn("%s: schedule_work for mod=%s\n", __func__, mod->name);
> > > +		dump_stack();
> > >  		schedule_work(&init_free_wq);
> > > +	}
> > >  
> > >  	mutex_unlock(&module_mutex);
> > >  	wake_up_all(&module_wq);
> > > 
> > > but not my earlier fix and share the dmesg and ftrace output to see if the
> > > theory holds?
> > > 
> > > Also, could you attach your config?  Curious now what your crypto options look
> > > like after fiddling with some of them today while trying and failing to see
> > > this on x86.
> > > 
> > > thanks,
> > > Daniel
> > 
> > Hello
> > 
> > Sorry for the very delayed answer.
> > 
> > I fail to reproduce it on x86 (qemu and  real hw) and arm.
> > It seems to only happen on arm64.
> 
> Thanks for the config and dmesg, but there's no ftrace.  I see it's not
> configured in your kernel, so could you boot with my earlier debug patch plus
> this one and the kernel argument initcall_debug instead?
> 
> I'm trying to see whether it really is a request module call from the crypto
> tests that's triggering this warning.  Preeetty likely that's what's happening,
> but want to be sure since I can't reproduce this.  Then I can post the fix.
> 

I have added CONFIG_FTRACE=y and your second patch.
The boot log can be seen at http://kernel.montjoie.ovh/108789.log

But it seems the latest dump_stack addition flood a bit.
I have started to read ftrace documentation, but if you have a quick what to do in /sys/kernel/debug/tracing, it will be helpfull.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  2020-10-07 19:41                       ` Corentin Labbe
@ 2020-10-08 17:07                         ` Daniel Jordan
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Jordan @ 2020-10-08 17:07 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: mark.rutland, Will Deacon, jiangshanlai, linux-kernel,
	Daniel Jordan, linux-crypto, tj, Robin Murphy, linux-arm-kernel

On Wed, Oct 07, 2020 at 09:41:17PM +0200, Corentin Labbe wrote:
> I have added CONFIG_FTRACE=y and your second patch.
> The boot log can be seen at http://kernel.montjoie.ovh/108789.log
> 
> But it seems the latest dump_stack addition flood a bit.

Heh, sorry for making it spew, there wasn't such a flood when I tried.  Your
output is sufficiently incriminating, so I'll go post the fix now.

> I have started to read ftrace documentation, but if you have a quick what to do in /sys/kernel/debug/tracing, it will be helpfull.

Sure, you can view the trace in /sys/kernel/debug/tracing/trace and
kernel-parameters.txt has the boot options documented.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-10-08 17:02 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-17 20:48 WARNING: at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0 Corentin Labbe
2020-02-18 16:35 ` Daniel Jordan
2020-02-20  9:03   ` Corentin Labbe
2020-02-21 17:42     ` Daniel Jordan
2020-02-28 12:33       ` Will Deacon
2020-02-28 15:33         ` Daniel Jordan
2020-03-01 17:53           ` Corentin Labbe
2020-03-02 17:25             ` Daniel Jordan
2020-03-02 18:00               ` Robin Murphy
2020-03-03 21:30                 ` Daniel Jordan
2020-03-03 22:43                   ` Eric Biggers
2020-03-06 16:12                     ` Daniel Jordan
     [not found]                   ` <20201001175022.GA22444@Red>
2020-10-05 17:09                     ` Daniel Jordan
2020-10-07 19:41                       ` Corentin Labbe
2020-10-08 17:07                         ` Daniel Jordan
2020-03-03  7:48               ` Corentin Labbe
2020-03-03 21:31                 ` Daniel Jordan
2020-09-25 18:12                   ` Corentin Labbe
2020-09-30 18:18                     ` Daniel Jordan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).