All of lore.kernel.org
 help / color / mirror / Atom feed
* ath9k: hwrng blocks for several minutes when phy is un-associated
@ 2022-06-23  5:08 Gregory Erwin
  2022-06-23 12:14 ` Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Gregory Erwin @ 2022-06-23  5:08 UTC (permalink / raw)
  To: Jason A. Donenfeld, Miaoqing Pan, Rui Salvaterra
  Cc: ath9k-devel, linux-crypto, linux-wireless

Hello,

I bisected down to commit [fcd09c90c3c5] "ath9k: use hw_random API instead of
directly dumping into random.c'' while investigating a long delay when entering
suspend on kernels v5.18 onward. There are other reports of hangs or
unresponsiveness at https://bugs.archlinux.org/task/75138 with some more info.

AFAIKT, the issue is triggered by the ath9k hwrng when the interface is up,
but not associated with any AP. In this state, 'dd if=/dev/hwrng' will block
for up to 231 seconds before finally returning an input/output error. Similarly,
I get a kernel log message "hwrng: no data available" every 231 seconds.

The hwrng will unblock when attempting to connect to an SSID that doesn't exist,
but not when performing a scan, so I'm guessing AR_PHY_TST_ADC only produces new
data when the phy is transmitting.

Admittedly, I don't actually know if this blocking behavior is
expected or not, but it certainly seems undesirable.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
  2022-06-23  5:08 ath9k: hwrng blocks for several minutes when phy is un-associated Gregory Erwin
@ 2022-06-23 12:14 ` Jason A. Donenfeld
  2022-06-23 12:16   ` Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-23 12:14 UTC (permalink / raw)
  To: Gregory Erwin
  Cc: Miaoqing Pan, Rui Salvaterra, ath9k-devel, linux-crypto, linux-wireless

Hi Gregory,

On Wed, Jun 22, 2022 at 10:08:15PM -0700, Gregory Erwin wrote:
> Hello,
> 
> I bisected down to commit [fcd09c90c3c5] "ath9k: use hw_random API instead of
> directly dumping into random.c'' while investigating a long delay when entering
> suspend on kernels v5.18 onward. There are other reports of hangs or
> unresponsiveness at https://bugs.archlinux.org/task/75138 with some more info.
> 
> AFAIKT, the issue is triggered by the ath9k hwrng when the interface is up,
> but not associated with any AP. In this state, 'dd if=/dev/hwrng' will block
> for up to 231 seconds before finally returning an input/output error. Similarly,
> I get a kernel log message "hwrng: no data available" every 231 seconds.
> 
> The hwrng will unblock when attempting to connect to an SSID that doesn't exist,
> but not when performing a scan, so I'm guessing AR_PHY_TST_ADC only produces new
> data when the phy is transmitting.
> 
> Admittedly, I don't actually know if this blocking behavior is
> expected or not, but it certainly seems undesirable.

Thanks for the report. I wish somebody from one of those bug reports
would have emailed earlier.

I don't have hardware to test this, but could you let me know if the
below patch does something? I'm sort of guessing, but maybe this is
right?

Jason

diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..a6291f5f0d47 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -80,7 +80,7 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) || fail_stats > 110 || kthread_should_stop())
 			break;

 		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
  2022-06-23 12:14 ` Jason A. Donenfeld
@ 2022-06-23 12:16   ` Jason A. Donenfeld
  2022-06-23 23:36     ` Gregory Erwin
  0 siblings, 1 reply; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-23 12:16 UTC (permalink / raw)
  To: Gregory Erwin
  Cc: Miaoqing Pan, Rui Salvaterra, ath9k-devel, linux-crypto, linux-wireless

Or perhaps more simply:

diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..5b44cd918c2b 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -83,7 +83,8 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
 			break;

-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
+		if (msleep_interruptible(ath9k_rng_delay_get(++fail_stats)))
+			break;
 	}

 	if (wait && !bytes_read && max)


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
  2022-06-23 12:16   ` Jason A. Donenfeld
@ 2022-06-23 23:36     ` Gregory Erwin
  2022-06-23 23:47       ` Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Gregory Erwin @ 2022-06-23 23:36 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Rui Salvaterra, linux-crypto, linux-wireless

No luck.

The first patch caused a warning and oops with ath9k_rng_read() at the
top of the call stack when reading from /dev/hwrng:
  WARNING: CPU: 1 PID: 454 at kernel/kthread.c:75 kthread_should_stop+0x2a/0x30
  BUG: kernel NULL pointer dereference, address: 0000000000000000

The second didn't have a noticeable effect, for better or worse.


On Thu, Jun 23, 2022 at 5:16 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Or perhaps more simply:
>
> diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
> index cb5414265a9b..5b44cd918c2b 100644
> --- a/drivers/net/wireless/ath/ath9k/rng.c
> +++ b/drivers/net/wireless/ath/ath9k/rng.c
> @@ -83,7 +83,8 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
>                 if (!wait || !max || likely(bytes_read) || fail_stats > 110)
>                         break;
>
> -               msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
> +               if (msleep_interruptible(ath9k_rng_delay_get(++fail_stats)))
> +                       break;
>         }
>
>         if (wait && !bytes_read && max)
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
  2022-06-23 23:36     ` Gregory Erwin
@ 2022-06-23 23:47       ` Jason A. Donenfeld
  2022-06-24  0:01         ` Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-23 23:47 UTC (permalink / raw)
  To: Gregory Erwin; +Cc: Rui Salvaterra, Linux Crypto Mailing List, linux-wireless

Hey Gregory,

On Fri, Jun 24, 2022 at 1:36 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
>
> No luck.
>
> The first patch caused a warning and oops with ath9k_rng_read() at the
> top of the call stack when reading from /dev/hwrng:
>   WARNING: CPU: 1 PID: 454 at kernel/kthread.c:75 kthread_should_stop+0x2a/0x30
>   BUG: kernel NULL pointer dereference, address: 0000000000000000
>
> The second didn't have a noticeable effect, for better or worse.

Alright. That's actually getting us somewhere. So the path in question
here is from reading /dev/hwrng, not from the kthread that's doing the
same read.

Can you do a `cat /dev/hwrng > /dev/null`, and then do whatever it is
you do that causes everything to hang, and then while things are hung
in the bad way, look at the contents of /proc/[the pid of the cat you
just ran]/stack?

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
  2022-06-23 23:47       ` Jason A. Donenfeld
@ 2022-06-24  0:01         ` Jason A. Donenfeld
  2022-06-24  0:50           ` Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-24  0:01 UTC (permalink / raw)
  To: Gregory Erwin; +Cc: Rui Salvaterra, Linux Crypto Mailing List, linux-wireless

Hey again,

On Fri, Jun 24, 2022 at 1:47 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hey Gregory,
>
> On Fri, Jun 24, 2022 at 1:36 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
> >
> > No luck.
> >
> > The first patch caused a warning and oops with ath9k_rng_read() at the
> > top of the call stack when reading from /dev/hwrng:
> >   WARNING: CPU: 1 PID: 454 at kernel/kthread.c:75 kthread_should_stop+0x2a/0x30
> >   BUG: kernel NULL pointer dereference, address: 0000000000000000
> >
> > The second didn't have a noticeable effect, for better or worse.
>
> Alright. That's actually getting us somewhere. So the path in question
> here is from reading /dev/hwrng, not from the kthread that's doing the
> same read.
>
> Can you do a `cat /dev/hwrng > /dev/null`, and then do whatever it is
> you do that causes everything to hang, and then while things are hung
> in the bad way, look at the contents of /proc/[the pid of the cat you
> just ran]/stack?

There's another flow I'm interested in. You said it prevents the
system from sleeping. Does it also make a `ip link set wlan0 down`
hang too? If so, could you send the `/proc/[pid of ip link set
down]/stack ` of a hung ip process? That seems like the more relevant
deadlock to look into.

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ath9k: hwrng blocks for several minutes when phy is un-associated
  2022-06-24  0:01         ` Jason A. Donenfeld
@ 2022-06-24  0:50           ` Jason A. Donenfeld
  2022-06-24  1:14             ` [PATCH] ath9k: rng: escape sleep loop when unregistering Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-24  0:50 UTC (permalink / raw)
  To: Gregory Erwin; +Cc: Rui Salvaterra, Linux Crypto Mailing List, linux-wireless

Hey again again,

On Fri, Jun 24, 2022 at 02:01:22AM +0200, Jason A. Donenfeld wrote:
> Hey again,
> 
> On Fri, Jun 24, 2022 at 1:47 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > Hey Gregory,
> >
> > On Fri, Jun 24, 2022 at 1:36 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
> > >
> > > No luck.
> > >
> > > The first patch caused a warning and oops with ath9k_rng_read() at the
> > > top of the call stack when reading from /dev/hwrng:
> > >   WARNING: CPU: 1 PID: 454 at kernel/kthread.c:75 kthread_should_stop+0x2a/0x30
> > >   BUG: kernel NULL pointer dereference, address: 0000000000000000
> > >
> > > The second didn't have a noticeable effect, for better or worse.
> >
> > Alright. That's actually getting us somewhere. So the path in question
> > here is from reading /dev/hwrng, not from the kthread that's doing the
> > same read.
> >
> > Can you do a `cat /dev/hwrng > /dev/null`, and then do whatever it is
> > you do that causes everything to hang, and then while things are hung
> > in the bad way, look at the contents of /proc/[the pid of the cat you
> > just ran]/stack?
> 
> There's another flow I'm interested in. You said it prevents the
> system from sleeping. Does it also make a `ip link set wlan0 down`
> hang too? If so, could you send the `/proc/[pid of ip link set
> down]/stack ` of a hung ip process? That seems like the more relevant
> deadlock to look into.

I think I have a plausible theory. I'll send a real patch, and you can
test it. Incoming shortly...

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] ath9k: rng: escape sleep loop when unregistering
  2022-06-24  0:50           ` Jason A. Donenfeld
@ 2022-06-24  1:14             ` Jason A. Donenfeld
  2022-06-24  5:25               ` Gregory Erwin
  0 siblings, 1 reply; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-24  1:14 UTC (permalink / raw)
  To: Gregory Erwin, linux-wireless, linux-kernel
  Cc: Jason A. Donenfeld, Toke Høiland-Jørgensen, Kalle Valo,
	Rui Salvaterra, stable

There's currently an almost deadlock when ath9k shuts down if no random
bytes are available:

Thread A                                Thread B
-------------------------------------------------------------------------
rng_dev_read
 get_current_rng
  kref_get(&current_rng->ref)
 rng_get_data
  ath9k_rng_read
   msleep_interruptible(...)
                                       ath9k_rng_stop
				        devm_hwrng_unregister
					 devm_hwrng_release
					  hwrng_unregister
					   drop_current_rng
					     kref_put(&current_rng->ref, cleanup_rng)
					      // Does NOT call cleanup_rng here,
					      // because of thread A's kref_get.
					   wait_for_completion(&rng->cleanup_done);
					    // Waits for a really long time...
   // Eventually sleep is over...
 put_rng
  kref_put(&rng->ref, cleanup_rng);
   cleanup_rng
    complete(&rng->cleanup_done);
                                           return

When thread B doesn't return right away, sleep and other functions that
are waiting for ath9k_rng_stop to complete time out, and problems ensue.

This commit fixes the problem by using another completion inside of
ath9k_rng_read so that hwrng_unregister can interrupt it as needed.

Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
I do not have an ath9k and therefore I can't test this myself. The
analysis above was done completely statically, with no dynamic tracing
and just a bug report of symptoms from Gregory. So it might be totally
wrong. Thus, this patch very much requires Gregory's testing. Please
don't apply it until we have his `Tested-by` line.

 drivers/net/wireless/ath/ath9k/ath9k.h |  1 +
 drivers/net/wireless/ath/ath9k/rng.c   | 26 ++++++++++++++++----------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index 3ccf8cfc6b63..731db7f70e5d 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -1072,6 +1072,7 @@ struct ath_softc {
 
 #ifdef CONFIG_ATH9K_HWRNG
 	struct hwrng rng_ops;
+	struct completion rng_shutdown;
 	u32 rng_last;
 	char rng_name[sizeof("ath9k_65535")];
 #endif
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..67c6b03a22ac 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2015 Qualcomm Atheros, Inc.
+ * Copyright (C) 2022 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
  *
  * Permission to use, copy, modify, and/or distribute this software for any
  * purpose with or without fee is hereby granted, provided that the above
@@ -52,18 +53,13 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }
 
-static u32 ath9k_rng_delay_get(u32 fail_stats)
+static unsigned long ath9k_rng_delay_get(u32 fail_stats)
 {
-	u32 delay;
-
 	if (fail_stats < 100)
-		delay = 10;
+		return msecs_to_jiffies(10);
 	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
+		return msecs_to_jiffies(1000);
+	return msecs_to_jiffies(10000);
 }
 
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
@@ -83,7 +79,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
 			break;
 
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
+		if (wait_for_completion_interruptible_timeout(
+			    &sc->rng_shutdown,
+			    ath9k_rng_delay_get(++fail_stats)))
+			break;
 	}
 
 	if (wait && !bytes_read && max)
@@ -91,6 +90,11 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 	return bytes_read;
 }
 
+static void ath9k_rng_cleanup(struct hwrng *rng)
+{
+	complete(&container_of(rng, struct ath_softc, rng_ops)->rng_shutdown);
+}
+
 void ath9k_rng_start(struct ath_softc *sc)
 {
 	static atomic_t serial = ATOMIC_INIT(0);
@@ -104,8 +108,10 @@ void ath9k_rng_start(struct ath_softc *sc)
 
 	snprintf(sc->rng_name, sizeof(sc->rng_name), "ath9k_%u",
 		 (atomic_inc_return(&serial) - 1) & U16_MAX);
+	init_completion(&sc->rng_shutdown);
 	sc->rng_ops.name = sc->rng_name;
 	sc->rng_ops.read = ath9k_rng_read;
+	sc->rng_ops.cleanup = ath9k_rng_cleanup;
 	sc->rng_ops.quality = 320;
 
 	if (devm_hwrng_register(sc->dev, &sc->rng_ops))
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] ath9k: rng: escape sleep loop when unregistering
  2022-06-24  1:14             ` [PATCH] ath9k: rng: escape sleep loop when unregistering Jason A. Donenfeld
@ 2022-06-24  5:25               ` Gregory Erwin
  2022-06-24 19:12                 ` Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Gregory Erwin @ 2022-06-24  5:25 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-wireless, linux-kernel, Toke Høiland-Jørgensen,
	Kalle Valo, Rui Salvaterra, stable

Hi Jason,

I think you are on the right track, but even with this patch
'ip link set wlan0 down' blocks until the hwrng reader gives up.
The reader can either be userspace (dd, cat, etc) or it can also
be the rng_core module. I can replicate the hang in the two different
situations, so I gathered two stack traces for 'ip' depending on the
reader of hwrng:

-userspace-
$ ip link set wlan0 up
$ dd if=/dev/hwrng count=1 & # blocks until interrupted or receives -EIO
$ ip link set wlan0 down & # blocks until dd exits
$ cat /proc/$(pidof ip)/stack
[<0>] devres_release+0x2b/0x60
[<0>] ath9k_rng_stop+0x27/0x40 [ath9k]
[<0>] ath9k_stop+0x40/0x230 [ath9k]
[<0>] drv_stop+0x34/0x100 [mac80211]
[<0>] ieee80211_do_stop+0x685/0x880 [mac80211]
[<0>] ieee80211_stop+0x41/0x170 [mac80211]
[<0>] __dev_close_many+0x9e/0x110
[<0>] __dev_change_flags+0x1a7/0x250
[<0>] dev_change_flags+0x26/0x60
[<0>] do_setlink+0x32d/0x1220
[<0>] __rtnl_newlink+0x5a2/0x9f0
[<0>] rtnl_newlink+0x47/0x70
[<0>] rtnetlink_rcv_msg+0x143/0x370
[<0>] netlink_rcv_skb+0x55/0x100
[<0>] netlink_unicast+0x243/0x390
[<0>] netlink_sendmsg+0x254/0x4b0
[<0>] sock_sendmsg+0x60/0x70
[<0>] ____sys_sendmsg+0x264/0x2a0
[<0>] ___sys_sendmsg+0x96/0xd0
[<0>] __sys_sendmsg+0x7a/0xd0
[<0>] do_syscall_64+0x5f/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae

-rng_core-
$ ip link set wlan0 up
$ # wait a while, maybe a minute or two
$ ip link set wlan0 down & # blocks until 10s after 'hwrng: no data available'
$ cat /proc/$(pidof ip)/stack
[<0>] kthread_stop+0x65/0x170
[<0>] hwrng_unregister+0xbe/0xe0 [rng_core]
[<0>] devres_release+0x2b/0x60
[<0>] ath9k_rng_stop+0x27/0x40 [ath9k]
[<0>] ath9k_stop+0x40/0x230 [ath9k]
[<0>] drv_stop+0x34/0x100 [mac80211]
[<0>] ieee80211_do_stop+0x685/0x880 [mac80211]
[<0>] ieee80211_stop+0x41/0x170 [mac80211]
[<0>] __dev_close_many+0x9e/0x110
[<0>] __dev_change_flags+0x1a7/0x250
[<0>] dev_change_flags+0x26/0x60
[<0>] do_setlink+0x32d/0x1220
[<0>] __rtnl_newlink+0x5a2/0x9f0
[<0>] rtnl_newlink+0x47/0x70
[<0>] rtnetlink_rcv_msg+0x143/0x370
[<0>] netlink_rcv_skb+0x55/0x100
[<0>] netlink_unicast+0x243/0x390
[<0>] netlink_sendmsg+0x254/0x4b0
[<0>] sock_sendmsg+0x60/0x70
[<0>] ____sys_sendmsg+0x264/0x2a0
[<0>] ___sys_sendmsg+0x96/0xd0
[<0>] __sys_sendmsg+0x7a/0xd0
[<0>] do_syscall_64+0x5f/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Regards,
 - Greg.



On Thu, Jun 23, 2022 at 6:15 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> There's currently an almost deadlock when ath9k shuts down if no random
> bytes are available:
>
> Thread A                                Thread B
> -------------------------------------------------------------------------
> rng_dev_read
>  get_current_rng
>   kref_get(&current_rng->ref)
>  rng_get_data
>   ath9k_rng_read
>    msleep_interruptible(...)
>                                        ath9k_rng_stop
>                                         devm_hwrng_unregister
>                                          devm_hwrng_release
>                                           hwrng_unregister
>                                            drop_current_rng
>                                              kref_put(&current_rng->ref, cleanup_rng)
>                                               // Does NOT call cleanup_rng here,
>                                               // because of thread A's kref_get.
>                                            wait_for_completion(&rng->cleanup_done);
>                                             // Waits for a really long time...
>    // Eventually sleep is over...
>  put_rng
>   kref_put(&rng->ref, cleanup_rng);
>    cleanup_rng
>     complete(&rng->cleanup_done);
>                                            return
>
> When thread B doesn't return right away, sleep and other functions that
> are waiting for ath9k_rng_stop to complete time out, and problems ensue.
>
> This commit fixes the problem by using another completion inside of
> ath9k_rng_read so that hwrng_unregister can interrupt it as needed.
>
> Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> Cc: stable@vger.kernel.org
> Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> Link: https://bugs.archlinux.org/task/75138
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> I do not have an ath9k and therefore I can't test this myself. The
> analysis above was done completely statically, with no dynamic tracing
> and just a bug report of symptoms from Gregory. So it might be totally
> wrong. Thus, this patch very much requires Gregory's testing. Please
> don't apply it until we have his `Tested-by` line.
>
>  drivers/net/wireless/ath/ath9k/ath9k.h |  1 +
>  drivers/net/wireless/ath/ath9k/rng.c   | 26 ++++++++++++++++----------
>  2 files changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
> index 3ccf8cfc6b63..731db7f70e5d 100644
> --- a/drivers/net/wireless/ath/ath9k/ath9k.h
> +++ b/drivers/net/wireless/ath/ath9k/ath9k.h
> @@ -1072,6 +1072,7 @@ struct ath_softc {
>
>  #ifdef CONFIG_ATH9K_HWRNG
>         struct hwrng rng_ops;
> +       struct completion rng_shutdown;
>         u32 rng_last;
>         char rng_name[sizeof("ath9k_65535")];
>  #endif
> diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
> index cb5414265a9b..67c6b03a22ac 100644
> --- a/drivers/net/wireless/ath/ath9k/rng.c
> +++ b/drivers/net/wireless/ath/ath9k/rng.c
> @@ -1,5 +1,6 @@
>  /*
>   * Copyright (c) 2015 Qualcomm Atheros, Inc.
> + * Copyright (C) 2022 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
>   *
>   * Permission to use, copy, modify, and/or distribute this software for any
>   * purpose with or without fee is hereby granted, provided that the above
> @@ -52,18 +53,13 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
>         return j << 2;
>  }
>
> -static u32 ath9k_rng_delay_get(u32 fail_stats)
> +static unsigned long ath9k_rng_delay_get(u32 fail_stats)
>  {
> -       u32 delay;
> -
>         if (fail_stats < 100)
> -               delay = 10;
> +               return msecs_to_jiffies(10);
>         else if (fail_stats < 105)
> -               delay = 1000;
> -       else
> -               delay = 10000;
> -
> -       return delay;
> +               return msecs_to_jiffies(1000);
> +       return msecs_to_jiffies(10000);
>  }
>
>  static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
> @@ -83,7 +79,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
>                 if (!wait || !max || likely(bytes_read) || fail_stats > 110)
>                         break;
>
> -               msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
> +               if (wait_for_completion_interruptible_timeout(
> +                           &sc->rng_shutdown,
> +                           ath9k_rng_delay_get(++fail_stats)))
> +                       break;
>         }
>
>         if (wait && !bytes_read && max)
> @@ -91,6 +90,11 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
>         return bytes_read;
>  }
>
> +static void ath9k_rng_cleanup(struct hwrng *rng)
> +{
> +       complete(&container_of(rng, struct ath_softc, rng_ops)->rng_shutdown);
> +}
> +
>  void ath9k_rng_start(struct ath_softc *sc)
>  {
>         static atomic_t serial = ATOMIC_INIT(0);
> @@ -104,8 +108,10 @@ void ath9k_rng_start(struct ath_softc *sc)
>
>         snprintf(sc->rng_name, sizeof(sc->rng_name), "ath9k_%u",
>                  (atomic_inc_return(&serial) - 1) & U16_MAX);
> +       init_completion(&sc->rng_shutdown);
>         sc->rng_ops.name = sc->rng_name;
>         sc->rng_ops.read = ath9k_rng_read;
> +       sc->rng_ops.cleanup = ath9k_rng_cleanup;
>         sc->rng_ops.quality = 320;
>
>         if (devm_hwrng_register(sc->dev, &sc->rng_ops))
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] ath9k: rng: escape sleep loop when unregistering
  2022-06-24  5:25               ` Gregory Erwin
@ 2022-06-24 19:12                 ` Jason A. Donenfeld
  2022-06-24 20:44                   ` [PATCH v2] ath9k: sleep for less time when unregistering hwrng Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-24 19:12 UTC (permalink / raw)
  To: Gregory Erwin
  Cc: linux-wireless, linux-kernel, Toke Høiland-Jørgensen,
	Kalle Valo, Rui Salvaterra, stable

Hi Gregory,

On Thu, Jun 23, 2022 at 10:25:26PM -0700, Gregory Erwin wrote:
> Hi Jason,
> 
> I think you are on the right track, but even with this patch
> 'ip link set wlan0 down' blocks until the hwrng reader gives up.
> The reader can either be userspace (dd, cat, etc) or it can also
> be the rng_core module. I can replicate the hang in the two different
> situations, so I gathered two stack traces for 'ip' depending on the
> reader of hwrng:

Thanks for the traces. I'll send a v2 to you shortly.

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2] ath9k: sleep for less time when unregistering hwrng
  2022-06-24 19:12                 ` Jason A. Donenfeld
@ 2022-06-24 20:44                   ` Jason A. Donenfeld
  2022-06-25  0:13                     ` Gregory Erwin
  2022-06-27  6:45                     ` Herbert Xu
  0 siblings, 2 replies; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-24 20:44 UTC (permalink / raw)
  To: Gregory Erwin, linux-wireless, linux-kernel
  Cc: Jason A. Donenfeld, Toke Høiland-Jørgensen, Kalle Valo,
	Rui Salvaterra, Herbert Xu, stable

Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and prevents calling msleep_interruptible() for tons of time when a
thread is supposed to be shutting down, since msleep_interruptible()
isn't actually interrupted by kthread_stop().

Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
I do not have an ath9k and therefore I can't test this myself. The
analysis above was done completely statically, with no dynamic tracing
and just a bug report of symptoms from Gregory. So it might be totally
wrong. Thus, this patch very much requires Gregory's testing. Please
don't apply it until we have his `Tested-by` line.

 drivers/char/hw_random/core.c        | 10 ++++++++--
 drivers/net/wireless/ath/ath9k/rng.c | 19 ++-----------------
 2 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..af1c1905bb7e 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
 			break;
 
 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			int i;
+
+			for (i = 0; i < 100; ++i) {
+				if (kthread_should_stop() ||
+				    msleep_interruptible(10000 / 100))
+					goto out;
+			}
 			continue;
 		}
 
@@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused)
 		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
 					   entropy >> 10);
 	}
+out:
 	hwrng_fill = NULL;
 	return 0;
 }
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..883110c66e5e 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }
 
-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,9 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) ||
+		    ++fail_stats >= 100 || msleep_interruptible(5))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}
 
 	if (wait && !bytes_read && max)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] ath9k: sleep for less time when unregistering hwrng
  2022-06-24 20:44                   ` [PATCH v2] ath9k: sleep for less time when unregistering hwrng Jason A. Donenfeld
@ 2022-06-25  0:13                     ` Gregory Erwin
  2022-06-25  0:40                       ` Jason A. Donenfeld
  2022-06-27  6:45                     ` Herbert Xu
  1 sibling, 1 reply; 28+ messages in thread
From: Gregory Erwin @ 2022-06-25  0:13 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-wireless, linux-kernel, Toke Høiland-Jørgensen,
	Kalle Valo, Rui Salvaterra, Herbert Xu, stable

On Fri, Jun 24, 2022 at 1:44 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Even though hwrng provides a `wait` parameter, it doesn't work very well
> when waiting for a long time. There are numerous deadlocks that emerge
> related to shutdown. Work around this API limitation by waiting for a
> shorter amount of time and erroring more frequently. This commit also
> prevents hwrng from splatting messages to dmesg when there's a timeout
> and prevents calling msleep_interruptible() for tons of time when a
> thread is supposed to be shutting down, since msleep_interruptible()
> isn't actually interrupted by kthread_stop().
>
> Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: stable@vger.kernel.org
> Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> Link: https://bugs.archlinux.org/task/75138
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> I do not have an ath9k and therefore I can't test this myself. The
> analysis above was done completely statically, with no dynamic tracing
> and just a bug report of symptoms from Gregory. So it might be totally
> wrong. Thus, this patch very much requires Gregory's testing. Please
> don't apply it until we have his `Tested-by` line.
>
>  drivers/char/hw_random/core.c        | 10 ++++++++--
>  drivers/net/wireless/ath/ath9k/rng.c | 19 ++-----------------
>  2 files changed, 10 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index 16f227b995e8..af1c1905bb7e 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
>                         break;
>
>                 if (rc <= 0) {
> -                       pr_warn("hwrng: no data available\n");
> -                       msleep_interruptible(10000);
> +                       int i;
> +
> +                       for (i = 0; i < 100; ++i) {
> +                               if (kthread_should_stop() ||
> +                                   msleep_interruptible(10000 / 100))
> +                                       goto out;
> +                       }
>                         continue;
>                 }
>
> @@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused)
>                 add_hwgenerator_randomness((void *)rng_fillbuf, rc,
>                                            entropy >> 10);
>         }
> +out:
>         hwrng_fill = NULL;
>         return 0;
>  }
> diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
> index cb5414265a9b..883110c66e5e 100644
> --- a/drivers/net/wireless/ath/ath9k/rng.c
> +++ b/drivers/net/wireless/ath/ath9k/rng.c
> @@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
>         return j << 2;
>  }
>
> -static u32 ath9k_rng_delay_get(u32 fail_stats)
> -{
> -       u32 delay;
> -
> -       if (fail_stats < 100)
> -               delay = 10;
> -       else if (fail_stats < 105)
> -               delay = 1000;
> -       else
> -               delay = 10000;
> -
> -       return delay;
> -}
> -
>  static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
>  {
>         struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
> @@ -80,10 +66,9 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
>                         bytes_read += max & 3UL;
>                         memzero_explicit(&word, sizeof(word));
>                 }
> -               if (!wait || !max || likely(bytes_read) || fail_stats > 110)
> +               if (!wait || !max || likely(bytes_read) ||
> +                   ++fail_stats >= 100 || msleep_interruptible(5))
>                         break;
> -
> -               msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
>         }
>
>         if (wait && !bytes_read && max)
> --
> 2.35.1
>

Jason,

This patch is working as you described. Trying to read from /dev/hwrng
consistently blocks for only 1.3s before returning an IO error. The longest
that I observed 'ip link set wlan0 down' to block was also about 1.3s,
and that was immediately after 'cat /dev/hwrng'. Additionally, the longest
duration that I observed for wiphy_suspend() to return was just under 100ms.

Tested-by: Gregory Erwin <gregerwin256@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] ath9k: sleep for less time when unregistering hwrng
  2022-06-25  0:13                     ` Gregory Erwin
@ 2022-06-25  0:40                       ` Jason A. Donenfeld
  0 siblings, 0 replies; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-25  0:40 UTC (permalink / raw)
  To: Gregory Erwin, Toke Høiland-Jørgensen, Kalle Valo
  Cc: linux-wireless, LKML, Rui Salvaterra, Herbert Xu, stable

Hi Gregory,

On Sat, Jun 25, 2022 at 2:13 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
> This patch is working as you described. Trying to read from /dev/hwrng
> consistently blocks for only 1.3s before returning an IO error. The longest
> that I observed 'ip link set wlan0 down' to block was also about 1.3s,
> and that was immediately after 'cat /dev/hwrng'. Additionally, the longest
> duration that I observed for wiphy_suspend() to return was just under 100ms.
>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>

Great, thanks for testing. I think that barring more invasive changes
to the hwrng subsystem, a heuristic approach like this is the best
we're going to do inside the ath9k driver itself.

So Toke/Kalle - can you queue this up?

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] ath9k: sleep for less time when unregistering hwrng
  2022-06-24 20:44                   ` [PATCH v2] ath9k: sleep for less time when unregistering hwrng Jason A. Donenfeld
  2022-06-25  0:13                     ` Gregory Erwin
@ 2022-06-27  6:45                     ` Herbert Xu
  2022-06-27  9:29                       ` [PATCH v3] " Jason A. Donenfeld
  1 sibling, 1 reply; 28+ messages in thread
From: Herbert Xu @ 2022-06-27  6:45 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Gregory Erwin, linux-wireless, linux-kernel,
	Toke Høiland-Jørgensen, Kalle Valo, Rui Salvaterra,
	stable

On Fri, Jun 24, 2022 at 10:44:33PM +0200, Jason A. Donenfeld wrote:
.
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index 16f227b995e8..af1c1905bb7e 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
>  			break;
>  
>  		if (rc <= 0) {
> -			pr_warn("hwrng: no data available\n");
> -			msleep_interruptible(10000);
> +			int i;
> +
> +			for (i = 0; i < 100; ++i) {
> +				if (kthread_should_stop() ||
> +				    msleep_interruptible(10000 / 100))
> +					goto out;
> +			}

Please use schedule_timeout_interruptible.  But if you're going
to make it interruptible it should probably at least try to do
something about signals rather than just ignore them.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3] ath9k: sleep for less time when unregistering hwrng
  2022-06-27  6:45                     ` Herbert Xu
@ 2022-06-27  9:29                       ` Jason A. Donenfeld
  2022-06-27 10:49                         ` [PATCH v4] " Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-27  9:29 UTC (permalink / raw)
  To: linux-wireless, linux-kernel, Toke Høiland-Jørgensen
  Cc: Jason A. Donenfeld, Gregory Erwin, Kalle Valo, Rui Salvaterra,
	Herbert Xu, stable

Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and prevents calling msleep_interruptible() for tons of time when a
thread is supposed to be shutting down, since msleep_interruptible()
isn't actually interrupted by kthread_stop().

Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/hw_random/core.c        | 10 ++++++++--
 drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
 2 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..a15273271d87 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
 			break;
 
 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			int i;
+
+			for (i = 0; i < 100; ++i) {
+				if (kthread_should_stop() ||
+				    schedule_timeout_interruptible(HZ / 20))
+					goto out;
+			}
 			continue;
 		}
 
@@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused)
 		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
 					   entropy >> 10);
 	}
+out:
 	hwrng_fill = NULL;
 	return 0;
 }
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..39195f89ea85 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }
 
-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 ||
+		    schedule_timeout_interruptible(HZ / 20) ||
+		    ((current->flags & PF_KTHREAD) && kthread_should_stop()))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}
 
 	if (wait && !bytes_read && max)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4] ath9k: sleep for less time when unregistering hwrng
  2022-06-27  9:29                       ` [PATCH v3] " Jason A. Donenfeld
@ 2022-06-27 10:49                         ` Jason A. Donenfeld
  2022-06-27 11:37                           ` [PATCH v5] " Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-27 10:49 UTC (permalink / raw)
  To: linux-wireless, linux-kernel, Toke Høiland-Jørgensen
  Cc: Jason A. Donenfeld, Gregory Erwin, Kalle Valo, Rui Salvaterra,
	Herbert Xu, stable

Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and prevents calling msleep_interruptible() for tons of time when a
thread is supposed to be shutting down, since msleep_interruptible()
isn't actually interrupted by kthread_stop().

Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/hw_random/core.c        | 10 ++++++++--
 drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
 2 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..a15273271d87 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
 			break;
 
 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			int i;
+
+			for (i = 0; i < 100; ++i) {
+				if (kthread_should_stop() ||
+				    schedule_timeout_interruptible(HZ / 20))
+					goto out;
+			}
 			continue;
 		}
 
@@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused)
 		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
 					   entropy >> 10);
 	}
+out:
 	hwrng_fill = NULL;
 	return 0;
 }
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..757603d1949d 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }
 
-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 ||
+		    ((current->flags & PF_KTHREAD) && kthread_should_stop()) ||
+		    schedule_timeout_interruptible(HZ / 20))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}
 
 	if (wait && !bytes_read && max)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v5] ath9k: sleep for less time when unregistering hwrng
  2022-06-27 10:49                         ` [PATCH v4] " Jason A. Donenfeld
@ 2022-06-27 11:37                           ` Jason A. Donenfeld
  2022-06-27 12:07                             ` [PATCH v6] " Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-27 11:37 UTC (permalink / raw)
  To: linux-wireless, linux-kernel, Toke Høiland-Jørgensen
  Cc: Jason A. Donenfeld, Gregory Erwin, Kalle Valo, Rui Salvaterra,
	Herbert Xu, stable

Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and switches to using schedule_timeout_interruptible(), so that the
kthread can be stopped.

Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Sorry for all the churn here in sending a v4 and v5 so soon. The
semantics of schedule_timeout_interruptible vs msleep_interruptible with
respect to kthreads is kind of confusing. I'll send a follow up patch
for that elsewhere. For now I think this should suffice for fixing the
bug.

 drivers/char/hw_random/core.c        |  3 +--
 drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
 2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..5309fab98631 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,7 @@ static int hwrng_fillfn(void *unused)
 			break;
 
 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			schedule_timeout_interruptible(HZ * 10);
 			continue;
 		}
 
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..757603d1949d 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }
 
-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 ||
+		    ((current->flags & PF_KTHREAD) && kthread_should_stop()) ||
+		    schedule_timeout_interruptible(HZ / 20))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}
 
 	if (wait && !bytes_read && max)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v6] ath9k: sleep for less time when unregistering hwrng
  2022-06-27 11:37                           ` [PATCH v5] " Jason A. Donenfeld
@ 2022-06-27 12:07                             ` Jason A. Donenfeld
  2022-06-27 12:18                               ` Toke Høiland-Jørgensen
                                                 ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-27 12:07 UTC (permalink / raw)
  To: linux-wireless, linux-kernel, Toke Høiland-Jørgensen
  Cc: Jason A. Donenfeld, Gregory Erwin, Kalle Valo, Rui Salvaterra,
	Herbert Xu, stable

Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and switches to using schedule_timeout_interruptible(), so that the
kthread can be stopped.

Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/hw_random/core.c        |  5 +++--
 drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
 2 files changed, 6 insertions(+), 19 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..9987f0642285 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,9 @@ static int hwrng_fillfn(void *unused)
 			break;
 
 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			if (kthread_should_stop())
+				break;
+			schedule_timeout_interruptible(HZ * 10);
 			continue;
 		}
 
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..757603d1949d 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }
 
-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 ||
+		    ((current->flags & PF_KTHREAD) && kthread_should_stop()) ||
+		    schedule_timeout_interruptible(HZ / 20))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}
 
 	if (wait && !bytes_read && max)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
  2022-06-27 12:07                             ` [PATCH v6] " Jason A. Donenfeld
@ 2022-06-27 12:18                               ` Toke Høiland-Jørgensen
  2022-06-28  1:39                                 ` Gregory Erwin
  2022-06-28  1:53                               ` Herbert Xu
  2022-06-28 10:45                               ` Toke Høiland-Jørgensen
  2 siblings, 1 reply; 28+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-06-27 12:18 UTC (permalink / raw)
  To: Jason A. Donenfeld, linux-wireless, linux-kernel, Gregory Erwin
  Cc: Kalle Valo, Rui Salvaterra, Herbert Xu, stable

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> Even though hwrng provides a `wait` parameter, it doesn't work very well
> when waiting for a long time. There are numerous deadlocks that emerge
> related to shutdown. Work around this API limitation by waiting for a
> shorter amount of time and erroring more frequently. This commit also
> prevents hwrng from splatting messages to dmesg when there's a timeout
> and switches to using schedule_timeout_interruptible(), so that the
> kthread can be stopped.
>
> Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: stable@vger.kernel.org
> Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> Link: https://bugs.archlinux.org/task/75138
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Gregory, care to take this version for a spin as well to double-check
that it still resolves the issue? :)

-Toke


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
  2022-06-27 12:18                               ` Toke Høiland-Jørgensen
@ 2022-06-28  1:39                                 ` Gregory Erwin
  2022-06-28 10:46                                   ` Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Gregory Erwin @ 2022-06-28  1:39 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jason A. Donenfeld, linux-wireless, linux-kernel, Kalle Valo,
	Rui Salvaterra, Herbert Xu, stable

On Mon, Jun 27, 2022 at 5:18 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> "Jason A. Donenfeld" <Jason@zx2c4.com> writes:
>
> > Even though hwrng provides a `wait` parameter, it doesn't work very well
> > when waiting for a long time. There are numerous deadlocks that emerge
> > related to shutdown. Work around this API limitation by waiting for a
> > shorter amount of time and erroring more frequently. This commit also
> > prevents hwrng from splatting messages to dmesg when there's a timeout
> > and switches to using schedule_timeout_interruptible(), so that the
> > kthread can be stopped.
> >
> > Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> > Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> > Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> > Cc: Kalle Valo <kvalo@kernel.org>
> > Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > Cc: stable@vger.kernel.org
> > Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> > Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> > Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> > Link: https://bugs.archlinux.org/task/75138
> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
>
> Gregory, care to take this version for a spin as well to double-check
> that it still resolves the issue? :)
>
> -Toke
>

With patch v6, reads from /dev/hwrng block for 5-6s, during which 'ip link set
wlan0 down' will also block. Otherwise, 'ip link set wlan0 down' returns
immediately. Similarly, wiphy_suspend() consistently returns in under 10ms.

Tested-by: Gregory Erwin <gregerwin256@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
  2022-06-27 12:07                             ` [PATCH v6] " Jason A. Donenfeld
  2022-06-27 12:18                               ` Toke Høiland-Jørgensen
@ 2022-06-28  1:53                               ` Herbert Xu
  2022-06-28 10:45                               ` Toke Høiland-Jørgensen
  2 siblings, 0 replies; 28+ messages in thread
From: Herbert Xu @ 2022-06-28  1:53 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-wireless, linux-kernel, Toke Høiland-Jørgensen,
	Gregory Erwin, Kalle Valo, Rui Salvaterra, stable

On Mon, Jun 27, 2022 at 02:07:35PM +0200, Jason A. Donenfeld wrote:
> Even though hwrng provides a `wait` parameter, it doesn't work very well
> when waiting for a long time. There are numerous deadlocks that emerge
> related to shutdown. Work around this API limitation by waiting for a
> shorter amount of time and erroring more frequently. This commit also
> prevents hwrng from splatting messages to dmesg when there's a timeout
> and switches to using schedule_timeout_interruptible(), so that the
> kthread can be stopped.
> 
> Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: stable@vger.kernel.org
> Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> Link: https://bugs.archlinux.org/task/75138
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  drivers/char/hw_random/core.c        |  5 +++--
>  drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
>  2 files changed, 6 insertions(+), 19 deletions(-)

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
  2022-06-27 12:07                             ` [PATCH v6] " Jason A. Donenfeld
  2022-06-27 12:18                               ` Toke Høiland-Jørgensen
  2022-06-28  1:53                               ` Herbert Xu
@ 2022-06-28 10:45                               ` Toke Høiland-Jørgensen
  2 siblings, 0 replies; 28+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-06-28 10:45 UTC (permalink / raw)
  To: Jason A. Donenfeld, linux-wireless, linux-kernel
  Cc: Gregory Erwin, Kalle Valo, Rui Salvaterra, Herbert Xu, stable

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> Even though hwrng provides a `wait` parameter, it doesn't work very well
> when waiting for a long time. There are numerous deadlocks that emerge
> related to shutdown. Work around this API limitation by waiting for a
> shorter amount of time and erroring more frequently. This commit also
> prevents hwrng from splatting messages to dmesg when there's a timeout
> and switches to using schedule_timeout_interruptible(), so that the
> kthread can be stopped.
>
> Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: stable@vger.kernel.org
> Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> Link: https://bugs.archlinux.org/task/75138
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
  2022-06-28  1:39                                 ` Gregory Erwin
@ 2022-06-28 10:46                                   ` Jason A. Donenfeld
  2022-06-28 10:48                                     ` Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-28 10:46 UTC (permalink / raw)
  To: Gregory Erwin
  Cc: Toke Høiland-Jørgensen, linux-wireless, LKML,
	Kalle Valo, Rui Salvaterra, Herbert Xu, stable

Hi Gregory,

On Tue, Jun 28, 2022 at 3:39 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
>
> On Mon, Jun 27, 2022 at 5:18 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > "Jason A. Donenfeld" <Jason@zx2c4.com> writes:
> >
> > > Even though hwrng provides a `wait` parameter, it doesn't work very well
> > > when waiting for a long time. There are numerous deadlocks that emerge
> > > related to shutdown. Work around this API limitation by waiting for a
> > > shorter amount of time and erroring more frequently. This commit also
> > > prevents hwrng from splatting messages to dmesg when there's a timeout
> > > and switches to using schedule_timeout_interruptible(), so that the
> > > kthread can be stopped.
> > >
> > > Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> > > Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> > > Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> > > Cc: Kalle Valo <kvalo@kernel.org>
> > > Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> > > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > > Cc: stable@vger.kernel.org
> > > Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> > > Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> > > Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> > > Link: https://bugs.archlinux.org/task/75138
> > > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> >
> > Gregory, care to take this version for a spin as well to double-check
> > that it still resolves the issue? :)
> >
> > -Toke
> >
>
> With patch v6, reads from /dev/hwrng block for 5-6s, during which 'ip link set
> wlan0 down' will also block. Otherwise, 'ip link set wlan0 down' returns
> immediately. Similarly, wiphy_suspend() consistently returns in under 10ms.
>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>

Oh 5-6s... so it's actually worse. Yikes. Sounds like v4 might have
been the better patch, then?

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
  2022-06-28 10:46                                   ` Jason A. Donenfeld
@ 2022-06-28 10:48                                     ` Jason A. Donenfeld
  2022-06-28 10:51                                       ` Herbert Xu
  0 siblings, 1 reply; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-28 10:48 UTC (permalink / raw)
  To: Gregory Erwin
  Cc: Toke Høiland-Jørgensen, linux-wireless, LKML,
	Kalle Valo, Rui Salvaterra, Herbert Xu, stable

On Tue, Jun 28, 2022 at 12:46 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Gregory,
>
> On Tue, Jun 28, 2022 at 3:39 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
> >
> > On Mon, Jun 27, 2022 at 5:18 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > > "Jason A. Donenfeld" <Jason@zx2c4.com> writes:
> > >
> > > > Even though hwrng provides a `wait` parameter, it doesn't work very well
> > > > when waiting for a long time. There are numerous deadlocks that emerge
> > > > related to shutdown. Work around this API limitation by waiting for a
> > > > shorter amount of time and erroring more frequently. This commit also
> > > > prevents hwrng from splatting messages to dmesg when there's a timeout
> > > > and switches to using schedule_timeout_interruptible(), so that the
> > > > kthread can be stopped.
> > > >
> > > > Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> > > > Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> > > > Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> > > > Cc: Kalle Valo <kvalo@kernel.org>
> > > > Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> > > > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > > > Cc: stable@vger.kernel.org
> > > > Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> > > > Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> > > > Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> > > > Link: https://bugs.archlinux.org/task/75138
> > > > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> > >
> > > Gregory, care to take this version for a spin as well to double-check
> > > that it still resolves the issue? :)
> > >
> > > -Toke
> > >
> >
> > With patch v6, reads from /dev/hwrng block for 5-6s, during which 'ip link set
> > wlan0 down' will also block. Otherwise, 'ip link set wlan0 down' returns
> > immediately. Similarly, wiphy_suspend() consistently returns in under 10ms.
> >
> > Tested-by: Gregory Erwin <gregerwin256@gmail.com>
>
> Oh 5-6s... so it's actually worse. Yikes. Sounds like v4 might have
> been the better patch, then?

$ curl https://lore.kernel.org/lkml/20220627104955.534013-1-Jason@zx2c4.com/raw
| git am

That one, if you want to give it a spin and see if that 5-6s is back
to ~1s or less.

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
  2022-06-28 10:48                                     ` Jason A. Donenfeld
@ 2022-06-28 10:51                                       ` Herbert Xu
  2022-06-28 10:55                                         ` Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Herbert Xu @ 2022-06-28 10:51 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Gregory Erwin, Toke Høiland-Jørgensen, linux-wireless,
	LKML, Kalle Valo, Rui Salvaterra, stable

On Tue, Jun 28, 2022 at 12:48:50PM +0200, Jason A. Donenfeld wrote:
>
> $ curl https://lore.kernel.org/lkml/20220627104955.534013-1-Jason@zx2c4.com/raw
> | git am
> 
> That one, if you want to give it a spin and see if that 5-6s is back
> to ~1s or less.

Whatever caused kthread_should_stop to return true should have
woken the thread and caused schedule_timeout to return.

If it's not waking the thread up then we should find out why.

Oh wait you're checking kthread_should_stop before the schedule
call instead of afterwards, that would do it.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
  2022-06-28 10:51                                       ` Herbert Xu
@ 2022-06-28 10:55                                         ` Jason A. Donenfeld
  2022-06-28 12:05                                           ` Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-28 10:55 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Gregory Erwin, Toke Høiland-Jørgensen, linux-wireless,
	LKML, Kalle Valo, Rui Salvaterra, stable

Hi Herbert,

On Tue, Jun 28, 2022 at 12:52 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Tue, Jun 28, 2022 at 12:48:50PM +0200, Jason A. Donenfeld wrote:
> >
> > $ curl https://lore.kernel.org/lkml/20220627104955.534013-1-Jason@zx2c4.com/raw
> > | git am
> >
> > That one, if you want to give it a spin and see if that 5-6s is back
> > to ~1s or less.
>
> Whatever caused kthread_should_stop to return true should have
> woken the thread and caused schedule_timeout to return.
>
> If it's not waking the thread up then we should find out why.
>
> Oh wait you're checking kthread_should_stop before the schedule
> call instead of afterwards, that would do it.

Oh, that's a really good observation, thank you! I'll send a patch
that does it right. I have to check kthread_should_stop() before,
because the wake up might have been consumed by a sleep inside of the
hwrng ->read_data() function, causing it to return early. So we have
to check it before going to sleep again. And after, I need to be
checking the return value of schedule_timeout_interruptible(), which
I'm not in this latest. So v+1 coming up.

By the way, this thread might interest you:
https://lore.kernel.org/lkml/20220627145716.641185-1-Jason@zx2c4.com/

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
  2022-06-28 10:55                                         ` Jason A. Donenfeld
@ 2022-06-28 12:05                                           ` Jason A. Donenfeld
  2022-06-28 15:14                                             ` Jason A. Donenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-28 12:05 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Gregory Erwin, Toke Høiland-Jørgensen, linux-wireless,
	LKML, Kalle Valo, Rui Salvaterra, stable

Hi again,

On Tue, Jun 28, 2022 at 12:55:57PM +0200, Jason A. Donenfeld wrote:
> > Oh wait you're checking kthread_should_stop before the schedule
> > call instead of afterwards, that would do it.
> 
> Oh, that's a really good observation, thank you! 

Wait, no. I already am kthread_should_stop() it afterwards. That
"continue" goes to the top of the loop, which checks it in the while()
statement. So something else is amiss. I guess I'll investigate.

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
  2022-06-28 12:05                                           ` Jason A. Donenfeld
@ 2022-06-28 15:14                                             ` Jason A. Donenfeld
  0 siblings, 0 replies; 28+ messages in thread
From: Jason A. Donenfeld @ 2022-06-28 15:14 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Gregory Erwin, Toke Høiland-Jørgensen, linux-wireless,
	LKML, Kalle Valo, Rui Salvaterra, stable

Hi Herbert,

On Tue, Jun 28, 2022 at 02:05:34PM +0200, Jason A. Donenfeld wrote:
> Hi again,
> 
> On Tue, Jun 28, 2022 at 12:55:57PM +0200, Jason A. Donenfeld wrote:
> > > Oh wait you're checking kthread_should_stop before the schedule
> > > call instead of afterwards, that would do it.
> > 
> > Oh, that's a really good observation, thank you! 
> 
> Wait, no. I already am kthread_should_stop() it afterwards. That
> "continue" goes to the top of the loop, which checks it in the while()
> statement. So something else is amiss. I guess I'll investigate.

I worked out a solution for the "larger problem" now. It's a bit
involved, but not too bad. I'll post a patch shortly.

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2022-06-28 15:14 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-23  5:08 ath9k: hwrng blocks for several minutes when phy is un-associated Gregory Erwin
2022-06-23 12:14 ` Jason A. Donenfeld
2022-06-23 12:16   ` Jason A. Donenfeld
2022-06-23 23:36     ` Gregory Erwin
2022-06-23 23:47       ` Jason A. Donenfeld
2022-06-24  0:01         ` Jason A. Donenfeld
2022-06-24  0:50           ` Jason A. Donenfeld
2022-06-24  1:14             ` [PATCH] ath9k: rng: escape sleep loop when unregistering Jason A. Donenfeld
2022-06-24  5:25               ` Gregory Erwin
2022-06-24 19:12                 ` Jason A. Donenfeld
2022-06-24 20:44                   ` [PATCH v2] ath9k: sleep for less time when unregistering hwrng Jason A. Donenfeld
2022-06-25  0:13                     ` Gregory Erwin
2022-06-25  0:40                       ` Jason A. Donenfeld
2022-06-27  6:45                     ` Herbert Xu
2022-06-27  9:29                       ` [PATCH v3] " Jason A. Donenfeld
2022-06-27 10:49                         ` [PATCH v4] " Jason A. Donenfeld
2022-06-27 11:37                           ` [PATCH v5] " Jason A. Donenfeld
2022-06-27 12:07                             ` [PATCH v6] " Jason A. Donenfeld
2022-06-27 12:18                               ` Toke Høiland-Jørgensen
2022-06-28  1:39                                 ` Gregory Erwin
2022-06-28 10:46                                   ` Jason A. Donenfeld
2022-06-28 10:48                                     ` Jason A. Donenfeld
2022-06-28 10:51                                       ` Herbert Xu
2022-06-28 10:55                                         ` Jason A. Donenfeld
2022-06-28 12:05                                           ` Jason A. Donenfeld
2022-06-28 15:14                                             ` Jason A. Donenfeld
2022-06-28  1:53                               ` Herbert Xu
2022-06-28 10:45                               ` Toke Høiland-Jørgensen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.