linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/3] fix stuck in accessing hwrng attributes
@ 2014-09-15 16:02 Amos Kong
  2014-09-15 16:02 ` [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection Amos Kong
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Amos Kong @ 2014-09-15 16:02 UTC (permalink / raw)
  To: virtualization; +Cc: kvm, herbert, m, mb, mpm, rusty, amit.shah, linux-kernel

If we read from /dev/hwrng with a long-running dd process, it takes
too much CPU time and holds the mutex lock almost continuously. When
we then check the hwrng attributes from sysfs with cat, the cat
process gets stuck waiting for the lock to be released.
The problem can only be reproduced in a non-SMP guest with a slow backend.

This patchset resolves the issue by changing rng_dev_read() to always
schedule for 10 jiffies after releasing the mutex lock; the cat
process then has a chance to take the lock and execute the protected
code without getting stuck.

Thanks.

V2: updated the commit log to describe PATCH 2; split the second patch.

Amos Kong (3):
  virtio-rng cleanup: move some code out of mutex protection
  hw_random: fix stuck in catting hwrng attributes
  hw_random: increase schedule timeout in rng_dev_read()

 drivers/char/hw_random/core.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection
  2014-09-15 16:02 [PATCH v2 0/3] fix stuck in accessing hwrng attributes Amos Kong
@ 2014-09-15 16:02 ` Amos Kong
  2014-09-15 16:13   ` Michael Büsch
  2014-09-15 16:02 ` [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes Amos Kong
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Amos Kong @ 2014-09-15 16:02 UTC (permalink / raw)
  To: virtualization; +Cc: kvm, herbert, m, mb, mpm, rusty, amit.shah, linux-kernel

It doesn't save as much CPU time as expected; it's just a cleanup.

Signed-off-by: Amos Kong <akong@redhat.com>
---
 drivers/char/hw_random/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index aa30a25..c591d7e 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
 		return -ERESTARTSYS;
 	if (current_rng)
 		name = current_rng->name;
-	ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
 	mutex_unlock(&rng_mutex);
+	ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
 
 	return ret;
 }
@@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct device *dev,
 	ssize_t ret = 0;
 	struct hwrng *rng;
 
+	buf[0] = '\0';
 	err = mutex_lock_interruptible(&rng_mutex);
 	if (err)
 		return -ERESTARTSYS;
-	buf[0] = '\0';
 	list_for_each_entry(rng, &rng_list, list) {
 		strncat(buf, rng->name, PAGE_SIZE - ret - 1);
 		ret += strlen(rng->name);
 		strncat(buf, " ", PAGE_SIZE - ret - 1);
 		ret++;
 	}
+	mutex_unlock(&rng_mutex);
 	strncat(buf, "\n", PAGE_SIZE - ret - 1);
 	ret++;
-	mutex_unlock(&rng_mutex);
 
 	return ret;
 }
-- 
1.9.3



* [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes
  2014-09-15 16:02 [PATCH v2 0/3] fix stuck in accessing hwrng attributes Amos Kong
  2014-09-15 16:02 ` [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection Amos Kong
@ 2014-09-15 16:02 ` Amos Kong
  2014-09-18  2:43   ` Rusty Russell
  2014-09-15 16:02 ` [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read() Amos Kong
  2014-09-17  9:30 ` [PATCH v2 0/3] fix stuck in accessing hwrng attributes Herbert Xu
  3 siblings, 1 reply; 20+ messages in thread
From: Amos Kong @ 2014-09-15 16:02 UTC (permalink / raw)
  To: virtualization; +Cc: kvm, herbert, m, mb, mpm, rusty, amit.shah, linux-kernel

I started a QEMU (non-smp) guest with one virtio-rng device, and read
random data from /dev/hwrng by dd:

 # dd if=/dev/hwrng of=/dev/null &

In the same time, if I check hwrng attributes from sysfs by cat:

 # cat /sys/class/misc/hw_random/rng_*

The cat process always gets stuck with a slow backend (5 K/s); if we
use a quick backend (1.2 M/s), the cat process takes 1 to 2
minutes. The hang does not occur in an SMP guest.

The read syscall enters the kernel and calls rng_dev_read() in user
context. We used need_resched() to check whether other tasks need to
run, but it almost always returns false, so we immediately re-acquire
the mutex lock. The process accessing the attributes always fails to
acquire the lock, so the cat gets stuck.

User context doesn't allow other user contexts to run on that CPU
unless the kernel code sleeps for some reason. This is why
need_resched() almost always returns false here.

This patch removes the need_resched() check and always schedules other
tasks, so they have a chance to take the lock and execute the
protected code.

Signed-off-by: Amos Kong <akong@redhat.com>
---
 drivers/char/hw_random/core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index c591d7e..263a370 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 
 		mutex_unlock(&rng_mutex);
 
-		if (need_resched())
-			schedule_timeout_interruptible(1);
+		schedule_timeout_interruptible(1);
 
 		if (signal_pending(current)) {
 			err = -ERESTARTSYS;
-- 
1.9.3



* [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read()
  2014-09-15 16:02 [PATCH v2 0/3] fix stuck in accessing hwrng attributes Amos Kong
  2014-09-15 16:02 ` [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection Amos Kong
  2014-09-15 16:02 ` [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes Amos Kong
@ 2014-09-15 16:02 ` Amos Kong
  2014-09-15 16:13   ` Michael Büsch
  2014-09-17  9:30 ` [PATCH v2 0/3] fix stuck in accessing hwrng attributes Herbert Xu
  3 siblings, 1 reply; 20+ messages in thread
From: Amos Kong @ 2014-09-15 16:02 UTC (permalink / raw)
  To: virtualization; +Cc: kvm, herbert, m, mb, mpm, rusty, amit.shah, linux-kernel

This patch increases the schedule timeout to 10 jiffies, which is
more appropriate; other tasks can then acquire the mutex lock more easily.

Signed-off-by: Amos Kong <akong@redhat.com>
---
 drivers/char/hw_random/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 263a370..b5d1b6f 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -195,7 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 
 		mutex_unlock(&rng_mutex);
 
-		schedule_timeout_interruptible(1);
+		schedule_timeout_interruptible(10);
 
 		if (signal_pending(current)) {
 			err = -ERESTARTSYS;
-- 
1.9.3



* Re: [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection
  2014-09-15 16:02 ` [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection Amos Kong
@ 2014-09-15 16:13   ` Michael Büsch
  2014-09-16  0:30     ` Amos Kong
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Büsch @ 2014-09-15 16:13 UTC (permalink / raw)
  To: Amos Kong
  Cc: virtualization, kvm, herbert, mpm, rusty, amit.shah, linux-kernel

On Tue, 16 Sep 2014 00:02:27 +0800
Amos Kong <akong@redhat.com> wrote:

> It doesn't save as much CPU time as expected; it's just a cleanup.
> 
> Signed-off-by: Amos Kong <akong@redhat.com>
> ---
>  drivers/char/hw_random/core.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index aa30a25..c591d7e 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
>  		return -ERESTARTSYS;
>  	if (current_rng)
>  		name = current_rng->name;
> -	ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
>  	mutex_unlock(&rng_mutex);
> +	ret = snprintf(buf, PAGE_SIZE, "%s\n", name);

I'm not sure this is safe.
Name is just a pointer.
What if the hwrng gets unregistered after unlock and just before the snprintf?

>  	return ret;
>  }
> @@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct device *dev,
>  	ssize_t ret = 0;
>  	struct hwrng *rng;
>  
> +	buf[0] = '\0';
>  	err = mutex_lock_interruptible(&rng_mutex);
>  	if (err)
>  		return -ERESTARTSYS;
> -	buf[0] = '\0';
>  	list_for_each_entry(rng, &rng_list, list) {
>  		strncat(buf, rng->name, PAGE_SIZE - ret - 1);
>  		ret += strlen(rng->name);
>  		strncat(buf, " ", PAGE_SIZE - ret - 1);
>  		ret++;
>  	}
> +	mutex_unlock(&rng_mutex);
>  	strncat(buf, "\n", PAGE_SIZE - ret - 1);
>  	ret++;
> -	mutex_unlock(&rng_mutex);
>  
>  	return ret;
>  }

This looks ok.

-- 
Michael



* Re: [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read()
  2014-09-15 16:02 ` [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read() Amos Kong
@ 2014-09-15 16:13   ` Michael Büsch
  2014-09-16  0:27     ` Amos Kong
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Büsch @ 2014-09-15 16:13 UTC (permalink / raw)
  To: Amos Kong
  Cc: virtualization, kvm, herbert, mpm, rusty, amit.shah, linux-kernel

On Tue, 16 Sep 2014 00:02:29 +0800
Amos Kong <akong@redhat.com> wrote:

> This patch increases the schedule timeout to 10 jiffies, which is
> more appropriate; other tasks can then acquire the mutex lock more easily.
> 
> Signed-off-by: Amos Kong <akong@redhat.com>
> ---
>  drivers/char/hw_random/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index 263a370..b5d1b6f 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -195,7 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
>  
>  		mutex_unlock(&rng_mutex);
>  
> -		schedule_timeout_interruptible(1);
> +		schedule_timeout_interruptible(10);
>  
>  		if (signal_pending(current)) {
>  			err = -ERESTARTSYS;

Does a schedule of 1 ms or 10 ms decrease the throughput?
I think we need some benchmarks.

-- 
Michael



* Re: [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read()
  2014-09-15 16:13   ` Michael Büsch
@ 2014-09-16  0:27     ` Amos Kong
  2014-09-16 15:01       ` Michael Büsch
  0 siblings, 1 reply; 20+ messages in thread
From: Amos Kong @ 2014-09-16  0:27 UTC (permalink / raw)
  To: Michael Büsch
  Cc: virtualization, kvm, herbert, mpm, rusty, amit.shah, linux-kernel

On Mon, Sep 15, 2014 at 06:13:31PM +0200, Michael Büsch wrote:
> On Tue, 16 Sep 2014 00:02:29 +0800
> Amos Kong <akong@redhat.com> wrote:
> 
> > This patch increases the schedule timeout to 10 jiffies, which is
> > more appropriate; other tasks can then acquire the mutex lock more easily.
> > 
> > Signed-off-by: Amos Kong <akong@redhat.com>
> > ---
> >  drivers/char/hw_random/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> > index 263a370..b5d1b6f 100644
> > --- a/drivers/char/hw_random/core.c
> > +++ b/drivers/char/hw_random/core.c
> > @@ -195,7 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
> >  
> >  		mutex_unlock(&rng_mutex);
> >  
> > -		schedule_timeout_interruptible(1);
> > +		schedule_timeout_interruptible(10);
> >  
> >  		if (signal_pending(current)) {
> >  			err = -ERESTARTSYS;
> 
> Does a schedule of 1 ms or 10 ms decrease the throughput?

In my test environment, 1 jiffy always works (100%); as suggested by
Amit, 10 jiffies is more appropriate.

After applying the current 3 patches, there is a throughput regression:

  1.2 M/s -> 6 K/s

We could schedule only at the end of the loop (size == 0), and only
for a non-SMP guest, so SMP guests won't be affected.

|               if (!size && num_online_cpus() == 1)
|                       schedule_timeout_interruptible(timeout);


Setting the timeout to 1:
  non-SMP guest with a quick backend (1.2 M/s) -> about 49 K/s

Setting the timeout to 10:
  non-SMP guest with a quick backend (1.2 M/s) -> about 490 K/s

We might need other benchmarks to measure the performance, but we can
clearly see that the change caused a regression.

As we discussed in the other thread, need_resched() should work in
this case, so these patches might be the wrong fix.

> I think we need some benchmarks.
> 
> -- 
> Michael



-- 
			Amos.



* Re: [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection
  2014-09-15 16:13   ` Michael Büsch
@ 2014-09-16  0:30     ` Amos Kong
  0 siblings, 0 replies; 20+ messages in thread
From: Amos Kong @ 2014-09-16  0:30 UTC (permalink / raw)
  To: Michael Büsch
  Cc: virtualization, kvm, herbert, mpm, rusty, amit.shah, linux-kernel

On Mon, Sep 15, 2014 at 06:13:20PM +0200, Michael Büsch wrote:
> On Tue, 16 Sep 2014 00:02:27 +0800
> Amos Kong <akong@redhat.com> wrote:
> 
> > It doesn't save as much CPU time as expected; it's just a cleanup.
> > 
> > Signed-off-by: Amos Kong <akong@redhat.com>
> > ---
> >  drivers/char/hw_random/core.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> > index aa30a25..c591d7e 100644
> > --- a/drivers/char/hw_random/core.c
> > +++ b/drivers/char/hw_random/core.c
> > @@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
> >  		return -ERESTARTSYS;
> >  	if (current_rng)
> >  		name = current_rng->name;
> > -	ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
> >  	mutex_unlock(&rng_mutex);
> > +	ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
> 
> I'm not sure this is safe.
> Name is just a pointer.
> What if the hwrng gets unregistered after unlock and just before the snprintf?

Oh, it points to the protected current_rng->name; I will drop this
cleanup. Thanks.
 
> >  	return ret;
> >  }
> > @@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct device *dev,
> >  	ssize_t ret = 0;
> >  	struct hwrng *rng;
> >  
> > +	buf[0] = '\0';
> >  	err = mutex_lock_interruptible(&rng_mutex);
> >  	if (err)
> >  		return -ERESTARTSYS;
> > -	buf[0] = '\0';
> >  	list_for_each_entry(rng, &rng_list, list) {
> >  		strncat(buf, rng->name, PAGE_SIZE - ret - 1);
> >  		ret += strlen(rng->name);
> >  		strncat(buf, " ", PAGE_SIZE - ret - 1);
> >  		ret++;
> >  	}
> > +	mutex_unlock(&rng_mutex);
> >  	strncat(buf, "\n", PAGE_SIZE - ret - 1);
> >  	ret++;
> > -	mutex_unlock(&rng_mutex);
> >  
> >  	return ret;
> >  }
> 
> This looks ok.
> 
> -- 
> Michael

-- 
			Amos.



* Re: [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read()
  2014-09-16  0:27     ` Amos Kong
@ 2014-09-16 15:01       ` Michael Büsch
  0 siblings, 0 replies; 20+ messages in thread
From: Michael Büsch @ 2014-09-16 15:01 UTC (permalink / raw)
  To: Amos Kong
  Cc: virtualization, kvm, herbert, mpm, rusty, amit.shah, linux-kernel

On Tue, 16 Sep 2014 08:27:40 +0800
Amos Kong <akong@redhat.com> wrote:

> Setting the timeout to 10:
>   non-SMP guest with a quick backend (1.2 M/s) -> about 490 K/s

That sounds like an awful lot. This is a 60% loss in throughput.
I don't think we can live with that.

-- 
Michael



* Re: [PATCH v2 0/3] fix stuck in accessing hwrng attributes
  2014-09-15 16:02 [PATCH v2 0/3] fix stuck in accessing hwrng attributes Amos Kong
                   ` (2 preceding siblings ...)
  2014-09-15 16:02 ` [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read() Amos Kong
@ 2014-09-17  9:30 ` Herbert Xu
  3 siblings, 0 replies; 20+ messages in thread
From: Herbert Xu @ 2014-09-17  9:30 UTC (permalink / raw)
  To: Amos Kong; +Cc: virtualization, kvm, m, mb, mpm, rusty, amit.shah, linux-kernel

On Tue, Sep 16, 2014 at 12:02:26AM +0800, Amos Kong wrote:
> If we read from /dev/hwrng with a long-running dd process, it takes
> too much CPU time and holds the mutex lock almost continuously. When
> we then check the hwrng attributes from sysfs with cat, the cat
> process gets stuck waiting for the lock to be released.
> The problem can only be reproduced in a non-SMP guest with a slow backend.
> 
> This patchset resolves the issue by changing rng_dev_read() to always
> schedule for 10 jiffies after releasing the mutex lock; the cat
> process then has a chance to take the lock and execute the protected
> code without getting stuck.

Sorry, I'm not going to accept your fix, which simply papers over
the problem.

Please bite the bullet and convert this over to RCU.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes
  2014-09-15 16:02 ` [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes Amos Kong
@ 2014-09-18  2:43   ` Rusty Russell
  2014-09-18  2:48     ` [PATCH 1/5] hw_random: place mutex around read functions and buffers Rusty Russell
  2014-09-18 12:47     ` [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes Amos Kong
  0 siblings, 2 replies; 20+ messages in thread
From: Rusty Russell @ 2014-09-18  2:43 UTC (permalink / raw)
  To: Amos Kong, virtualization
  Cc: kvm, herbert, m, mb, mpm, amit.shah, linux-kernel, Linus Torvalds

Amos Kong <akong@redhat.com> writes:

> I started a QEMU (non-smp) guest with one virtio-rng device, and read
> random data from /dev/hwrng by dd:
>
>  # dd if=/dev/hwrng of=/dev/null &
>
> In the same time, if I check hwrng attributes from sysfs by cat:
>
>  # cat /sys/class/misc/hw_random/rng_*
>
> The cat process always gets stuck with a slow backend (5 K/s); if we
> use a quick backend (1.2 M/s), the cat process takes 1 to 2
> minutes. The hang does not occur in an SMP guest.
>
> The read syscall enters the kernel and calls rng_dev_read() in user
> context. We used need_resched() to check whether other tasks need to
> run, but it almost always returns false, so we immediately re-acquire
> the mutex lock. The process accessing the attributes always fails to
> acquire the lock, so the cat gets stuck.
>
> User context doesn't allow other user contexts to run on that CPU
> unless the kernel code sleeps for some reason. This is why
> need_resched() almost always returns false here.
>
> This patch removes the need_resched() check and always schedules other
> tasks, so they have a chance to take the lock and execute the
> protected code.

OK, this is going to be a rant.

Your explanation doesn't make sense at all.  Worse, your solution breaks
the advice of Kernighan & Plauger: "Don't patch bad code - rewrite it."

But worst of all, this detailed explanation might have convinced me you
understood the problem better than I did, and applied your patch.

I did some tests.  For me, as expected, the process spends its time
inside the virtio rng read function, holding the mutex and thus blocking
sysfs access; it's not a failure of this code at all.

Your schedule_timeout() "fix" probably just helps by letting the host
refresh entropy, so we spend less time waiting in the read fn.

I will post a series, which unfortunately is only lightly tested, then
I'm going to have some beer to begin my holiday.  That may help me
forget my disappointment at seeing respected fellow developers
monkey-patching random code they don't understand.

Grrr....
Rusty.

> Signed-off-by: Amos Kong <akong@redhat.com>
> ---
>  drivers/char/hw_random/core.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index c591d7e..263a370 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
>  
>  		mutex_unlock(&rng_mutex);
>  
> -		if (need_resched())
> -			schedule_timeout_interruptible(1);
> +		schedule_timeout_interruptible(1);
>  
>  		if (signal_pending(current)) {
>  			err = -ERESTARTSYS;
> -- 
> 1.9.3



* [PATCH 1/5] hw_random: place mutex around read functions and buffers.
  2014-09-18  2:43   ` Rusty Russell
@ 2014-09-18  2:48     ` Rusty Russell
  2014-09-18  2:48       ` [PATCH 2/5] hw_random: use reference counts on each struct hwrng Rusty Russell
                         ` (3 more replies)
  2014-09-18 12:47     ` [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes Amos Kong
  1 sibling, 4 replies; 20+ messages in thread
From: Rusty Russell @ 2014-09-18  2:48 UTC (permalink / raw)
  To: Amos Kong, virtualization, kvm, herbert, m, mb, mpm, amit.shah,
	linux-kernel
  Cc: Rusty Russell

There's currently a big lock around everything, and it means that we
can't query sysfs (eg /sys/devices/virtual/misc/hw_random/rng_current)
while the rng is reading.  This is a real problem when the rng is slow,
or blocked (eg. virtio_rng with qemu's default /dev/random backend).

This doesn't help (it leaves the current lock untouched), just adds a
lock to protect the read function and the static buffers, in preparation
for transition.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/char/hw_random/core.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index aa30a25c8d49..b1b6042ad85c 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -53,7 +53,10 @@
 static struct hwrng *current_rng;
 static struct task_struct *hwrng_fill;
 static LIST_HEAD(rng_list);
+/* Protects rng_list and current_rng */
 static DEFINE_MUTEX(rng_mutex);
+/* Protects rng read functions, data_avail, rng_buffer and rng_fillbuf */
+static DEFINE_MUTEX(reading_mutex);
 static int data_avail;
 static u8 *rng_buffer, *rng_fillbuf;
 static unsigned short current_quality;
@@ -81,7 +84,9 @@ static void add_early_randomness(struct hwrng *rng)
 	unsigned char bytes[16];
 	int bytes_read;
 
+	mutex_lock(&reading_mutex);
 	bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1);
+	mutex_unlock(&reading_mutex);
 	if (bytes_read > 0)
 		add_device_randomness(bytes, bytes_read);
 }
@@ -128,6 +133,7 @@ static inline int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size,
 			int wait) {
 	int present;
 
+	BUG_ON(!mutex_is_locked(&reading_mutex));
 	if (rng->read)
 		return rng->read(rng, (void *)buffer, size, wait);
 
@@ -160,13 +166,14 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 			goto out_unlock;
 		}
 
+		mutex_lock(&reading_mutex);
 		if (!data_avail) {
 			bytes_read = rng_get_data(current_rng, rng_buffer,
 				rng_buffer_size(),
 				!(filp->f_flags & O_NONBLOCK));
 			if (bytes_read < 0) {
 				err = bytes_read;
-				goto out_unlock;
+				goto out_unlock_reading;
 			}
 			data_avail = bytes_read;
 		}
@@ -174,7 +181,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 		if (!data_avail) {
 			if (filp->f_flags & O_NONBLOCK) {
 				err = -EAGAIN;
-				goto out_unlock;
+				goto out_unlock_reading;
 			}
 		} else {
 			len = data_avail;
@@ -186,7 +193,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 			if (copy_to_user(buf + ret, rng_buffer + data_avail,
 								len)) {
 				err = -EFAULT;
-				goto out_unlock;
+				goto out_unlock_reading;
 			}
 
 			size -= len;
@@ -194,6 +201,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 		}
 
 		mutex_unlock(&rng_mutex);
+		mutex_unlock(&reading_mutex);
 
 		if (need_resched())
 			schedule_timeout_interruptible(1);
@@ -208,6 +216,9 @@ out:
 out_unlock:
 	mutex_unlock(&rng_mutex);
 	goto out;
+out_unlock_reading:
+	mutex_unlock(&reading_mutex);
+	goto out_unlock;
 }
 
 
@@ -348,13 +359,16 @@ static int hwrng_fillfn(void *unused)
 	while (!kthread_should_stop()) {
 		if (!current_rng)
 			break;
+		mutex_lock(&reading_mutex);
 		rc = rng_get_data(current_rng, rng_fillbuf,
 				  rng_buffer_size(), 1);
+		mutex_unlock(&reading_mutex);
 		if (rc <= 0) {
 			pr_warn("hwrng: no data available\n");
 			msleep_interruptible(10000);
 			continue;
 		}
+		/* Outside lock, sure, but y'know: randomness. */
 		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
 					   rc * current_quality * 8 >> 10);
 	}
-- 
1.9.1



* [PATCH 2/5] hw_random: use reference counts on each struct hwrng.
  2014-09-18  2:48     ` [PATCH 1/5] hw_random: place mutex around read functions and buffers Rusty Russell
@ 2014-09-18  2:48       ` Rusty Russell
  2014-09-18 12:22         ` Amos Kong
  2014-09-18  2:48       ` [PATCH 3/5] hw_random: fix unregister race Rusty Russell
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2014-09-18  2:48 UTC (permalink / raw)
  To: Amos Kong, virtualization, kvm, herbert, m, mb, mpm, amit.shah,
	linux-kernel
  Cc: Rusty Russell

current_rng holds one reference, and we bump it every time we want
to do a read from it.

This means we only hold the rng_mutex to grab or drop a reference,
so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
block on read of /dev/hwrng.

Using a kref is overkill (we're always under the rng_mutex), but
a standard pattern.

This also solves the problem that the hwrng_fillfn thread was
accessing current_rng without a lock, which could change (eg. to NULL)
underneath it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/char/hw_random/core.c | 135 ++++++++++++++++++++++++++++--------------
 include/linux/hw_random.h     |   2 +
 2 files changed, 94 insertions(+), 43 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index b1b6042ad85c..dc9092a1075d 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -42,6 +42,7 @@
 #include <linux/delay.h>
 #include <linux/slab.h>
 #include <linux/random.h>
+#include <linux/err.h>
 #include <asm/uaccess.h>
 
 
@@ -91,6 +92,59 @@ static void add_early_randomness(struct hwrng *rng)
 		add_device_randomness(bytes, bytes_read);
 }
 
+static inline void cleanup_rng(struct kref *kref)
+{
+	struct hwrng *rng = container_of(kref, struct hwrng, ref);
+
+	if (rng->cleanup)
+		rng->cleanup(rng);
+}
+
+static void set_current_rng(struct hwrng *rng)
+{
+	BUG_ON(!mutex_is_locked(&rng_mutex));
+	kref_get(&rng->ref);
+	current_rng = rng;
+}
+
+static void drop_current_rng(void)
+{
+	BUG_ON(!mutex_is_locked(&rng_mutex));
+	if (!current_rng)
+		return;
+
+	kref_put(&current_rng->ref, cleanup_rng);
+	current_rng = NULL;
+}
+
+/* Returns ERR_PTR(), NULL or refcounted hwrng */
+static struct hwrng *get_current_rng(void)
+{
+	struct hwrng *rng;
+
+	if (mutex_lock_interruptible(&rng_mutex))
+		return ERR_PTR(-ERESTARTSYS);
+
+	rng = current_rng;
+	if (rng)
+		kref_get(&rng->ref);
+
+	mutex_unlock(&rng_mutex);
+	return rng;
+}
+
+static void put_rng(struct hwrng *rng)
+{
+	/*
+	 * Hold rng_mutex here so we serialize in case they set_current_rng
+	 * on rng again immediately.
+	 */
+	mutex_lock(&rng_mutex);
+	if (rng)
+		kref_put(&rng->ref, cleanup_rng);
+	mutex_unlock(&rng_mutex);
+}
+
 static inline int hwrng_init(struct hwrng *rng)
 {
 	if (rng->init) {
@@ -113,12 +167,6 @@ static inline int hwrng_init(struct hwrng *rng)
 	return 0;
 }
 
-static inline void hwrng_cleanup(struct hwrng *rng)
-{
-	if (rng && rng->cleanup)
-		rng->cleanup(rng);
-}
-
 static int rng_dev_open(struct inode *inode, struct file *filp)
 {
 	/* enforce read-only access to this chrdev */
@@ -154,21 +202,22 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 	ssize_t ret = 0;
 	int err = 0;
 	int bytes_read, len;
+	struct hwrng *rng;
 
 	while (size) {
-		if (mutex_lock_interruptible(&rng_mutex)) {
-			err = -ERESTARTSYS;
+		rng = get_current_rng();
+		if (IS_ERR(rng)) {
+			err = PTR_ERR(rng);
 			goto out;
 		}
-
-		if (!current_rng) {
+		if (!rng) {
 			err = -ENODEV;
-			goto out_unlock;
+			goto out;
 		}
 
 		mutex_lock(&reading_mutex);
 		if (!data_avail) {
-			bytes_read = rng_get_data(current_rng, rng_buffer,
+			bytes_read = rng_get_data(rng, rng_buffer,
 				rng_buffer_size(),
 				!(filp->f_flags & O_NONBLOCK));
 			if (bytes_read < 0) {
@@ -200,7 +249,6 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 			ret += len;
 		}
 
-		mutex_unlock(&rng_mutex);
 		mutex_unlock(&reading_mutex);
 
 		if (need_resched())
@@ -210,15 +258,16 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 			err = -ERESTARTSYS;
 			goto out;
 		}
+
+		put_rng(rng);
 	}
 out:
 	return ret ? : err;
-out_unlock:
-	mutex_unlock(&rng_mutex);
-	goto out;
+
 out_unlock_reading:
 	mutex_unlock(&reading_mutex);
-	goto out_unlock;
+	put_rng(rng);
+	goto out;
 }
 
 
@@ -257,8 +306,8 @@ static ssize_t hwrng_attr_current_store(struct device *dev,
 			err = hwrng_init(rng);
 			if (err)
 				break;
-			hwrng_cleanup(current_rng);
-			current_rng = rng;
+			drop_current_rng();
+			set_current_rng(rng);
 			err = 0;
 			break;
 		}
@@ -272,17 +321,15 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
 				       struct device_attribute *attr,
 				       char *buf)
 {
-	int err;
 	ssize_t ret;
-	const char *name = "none";
+	struct hwrng *rng;
 
-	err = mutex_lock_interruptible(&rng_mutex);
-	if (err)
-		return -ERESTARTSYS;
-	if (current_rng)
-		name = current_rng->name;
-	ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
-	mutex_unlock(&rng_mutex);
+	rng = get_current_rng();
+	if (IS_ERR(rng))
+		return PTR_ERR(rng);
+
+	ret = snprintf(buf, PAGE_SIZE, "%s\n", rng ? rng->name : "none");
+	put_rng(rng);
 
 	return ret;
 }
@@ -357,12 +404,16 @@ static int hwrng_fillfn(void *unused)
 	long rc;
 
 	while (!kthread_should_stop()) {
-		if (!current_rng)
+		struct hwrng *rng;
+
+		rng = get_current_rng();
+		if (IS_ERR(rng) || !rng)
 			break;
 		mutex_lock(&reading_mutex);
-		rc = rng_get_data(current_rng, rng_fillbuf,
+		rc = rng_get_data(rng, rng_fillbuf,
 				  rng_buffer_size(), 1);
 		mutex_unlock(&reading_mutex);
+		put_rng(rng);
 		if (rc <= 0) {
 			pr_warn("hwrng: no data available\n");
 			msleep_interruptible(10000);
@@ -423,14 +474,13 @@ int hwrng_register(struct hwrng *rng)
 		err = hwrng_init(rng);
 		if (err)
 			goto out_unlock;
-		current_rng = rng;
+		set_current_rng(rng);
 	}
 	err = 0;
 	if (!old_rng) {
 		err = register_miscdev();
 		if (err) {
-			hwrng_cleanup(rng);
-			current_rng = NULL;
+			drop_current_rng();
 			goto out_unlock;
 		}
 	}
@@ -457,22 +507,21 @@ EXPORT_SYMBOL_GPL(hwrng_register);
 
 void hwrng_unregister(struct hwrng *rng)
 {
-	int err;
-
 	mutex_lock(&rng_mutex);
 
 	list_del(&rng->list);
 	if (current_rng == rng) {
-		hwrng_cleanup(rng);
-		if (list_empty(&rng_list)) {
-			current_rng = NULL;
-		} else {
-			current_rng = list_entry(rng_list.prev, struct hwrng, list);
-			err = hwrng_init(current_rng);
-			if (err)
-				current_rng = NULL;
+		drop_current_rng();
+		if (!list_empty(&rng_list)) {
+			struct hwrng *tail;
+
+			tail = list_entry(rng_list.prev, struct hwrng, list);
+
+			if (hwrng_init(tail) == 0)
+				set_current_rng(tail);
 		}
 	}
+
 	if (list_empty(&rng_list)) {
 		unregister_miscdev();
 		if (hwrng_fill)
diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
index 914bb08cd738..c212e71ea886 100644
--- a/include/linux/hw_random.h
+++ b/include/linux/hw_random.h
@@ -14,6 +14,7 @@
 
 #include <linux/types.h>
 #include <linux/list.h>
+#include <linux/kref.h>
 
 /**
  * struct hwrng - Hardware Random Number Generator driver
@@ -44,6 +45,7 @@ struct hwrng {
 
 	/* internal. */
 	struct list_head list;
+	struct kref ref;
 };
 
 /** Register a new Hardware Random Number Generator driver. */
-- 
1.9.1



* [PATCH 3/5] hw_random: fix unregister race.
  2014-09-18  2:48     ` [PATCH 1/5] hw_random: place mutex around read functions and buffers Rusty Russell
  2014-09-18  2:48       ` [PATCH 2/5] hw_random: use reference counts on each struct hwrng Rusty Russell
@ 2014-09-18  2:48       ` Rusty Russell
  2014-10-21 14:15         ` Herbert Xu
  2014-09-18  2:48       ` [PATCH 4/5] hw_random: don't double-check old_rng Rusty Russell
  2014-09-18  2:48       ` [PATCH 5/5] hw_random: don't init list element we're about to add to list Rusty Russell
  3 siblings, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2014-09-18  2:48 UTC (permalink / raw)
  To: Amos Kong, virtualization, kvm, herbert, m, mb, mpm, amit.shah,
	linux-kernel
  Cc: Rusty Russell

The previous patch added one potential problem: we can still be
reading from a hwrng when it's unregistered.  Add a wait for zero
in the hwrng_unregister path.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/char/hw_random/core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index dc9092a1075d..b4a21e9521cf 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex);
 static DEFINE_MUTEX(reading_mutex);
 static int data_avail;
 static u8 *rng_buffer, *rng_fillbuf;
+static DECLARE_WAIT_QUEUE_HEAD(rng_done);
 static unsigned short current_quality;
 static unsigned short default_quality; /* = 0; default to "off" */
 
@@ -98,6 +99,7 @@ static inline void cleanup_rng(struct kref *kref)
 
 	if (rng->cleanup)
 		rng->cleanup(rng);
+	wake_up_all(&rng_done);
 }
 
 static void set_current_rng(struct hwrng *rng)
@@ -529,6 +531,9 @@ void hwrng_unregister(struct hwrng *rng)
 	}
 
 	mutex_unlock(&rng_mutex);
+
+	/* Just in case rng is reading right now, wait. */
+	wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0);
 }
 EXPORT_SYMBOL_GPL(hwrng_unregister);
 
-- 
1.9.1



* [PATCH 4/5] hw_random: don't double-check old_rng.
  2014-09-18  2:48     ` [PATCH 1/5] hw_random: place mutex around read functions and buffers Rusty Russell
  2014-09-18  2:48       ` [PATCH 2/5] hw_random: use reference counts on each struct hwrng Rusty Russell
  2014-09-18  2:48       ` [PATCH 3/5] hw_random: fix unregister race Rusty Russell
@ 2014-09-18  2:48       ` Rusty Russell
  2014-09-18  2:48       ` [PATCH 5/5] hw_random: don't init list element we're about to add to list Rusty Russell
  3 siblings, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2014-09-18  2:48 UTC (permalink / raw)
  To: Amos Kong, virtualization, kvm, herbert, m, mb, mpm, amit.shah,
	linux-kernel
  Cc: Rusty Russell

Interesting anti-pattern.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/char/hw_random/core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index b4a21e9521cf..6a34feca6b43 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -472,14 +472,13 @@ int hwrng_register(struct hwrng *rng)
 	}
 
 	old_rng = current_rng;
+	err = 0;
 	if (!old_rng) {
 		err = hwrng_init(rng);
 		if (err)
 			goto out_unlock;
 		set_current_rng(rng);
-	}
-	err = 0;
-	if (!old_rng) {
+
 		err = register_miscdev();
 		if (err) {
 			drop_current_rng();
-- 
1.9.1



* [PATCH 5/5] hw_random: don't init list element we're about to add to list.
  2014-09-18  2:48     ` [PATCH 1/5] hw_random: place mutex around read functions and buffers Rusty Russell
                         ` (2 preceding siblings ...)
  2014-09-18  2:48       ` [PATCH 4/5] hw_random: don't double-check old_rng Rusty Russell
@ 2014-09-18  2:48       ` Rusty Russell
  3 siblings, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2014-09-18  2:48 UTC (permalink / raw)
  To: Amos Kong, virtualization, kvm, herbert, m, mb, mpm, amit.shah,
	linux-kernel
  Cc: Rusty Russell

Another interesting anti-pattern.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/char/hw_random/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 6a34feca6b43..96fa06716e95 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -485,7 +485,6 @@ int hwrng_register(struct hwrng *rng)
 			goto out_unlock;
 		}
 	}
-	INIT_LIST_HEAD(&rng->list);
 	list_add_tail(&rng->list, &rng_list);
 
 	if (old_rng && !rng->init) {
-- 
1.9.1



* Re: [PATCH 2/5] hw_random: use reference counts on each struct hwrng.
  2014-09-18  2:48       ` [PATCH 2/5] hw_random: use reference counts on each struct hwrng Rusty Russell
@ 2014-09-18 12:22         ` Amos Kong
  0 siblings, 0 replies; 20+ messages in thread
From: Amos Kong @ 2014-09-18 12:22 UTC (permalink / raw)
  To: Rusty Russell
  Cc: virtualization, kvm, herbert, m, mpm, amit.shah, linux-kernel

On Thu, Sep 18, 2014 at 12:18:23PM +0930, Rusty Russell wrote:
> current_rng holds one reference, and we bump it every time we want
> to do a read from it.
> 
> This means we only hold the rng_mutex to grab or drop a reference,
> so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
> block on read of /dev/hwrng.
> 
> Using a kref is overkill (we're always under the rng_mutex), but
> a standard pattern.
> 
> This also solves the problem that the hwrng_fillfn thread was
> accessing current_rng without a lock, which could change (eg. to NULL)
> underneath it.

Hi Rusty,
 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  drivers/char/hw_random/core.c | 135 ++++++++++++++++++++++++++++--------------
>  include/linux/hw_random.h     |   2 +
>  2 files changed, 94 insertions(+), 43 deletions(-)

...

>  static int rng_dev_open(struct inode *inode, struct file *filp)
>  {
>  	/* enforce read-only access to this chrdev */
> @@ -154,21 +202,22 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
>  	ssize_t ret = 0;
>  	int err = 0;
>  	int bytes_read, len;
> +	struct hwrng *rng;
>  
>  	while (size) {
> -		if (mutex_lock_interruptible(&rng_mutex)) {
> -			err = -ERESTARTSYS;
> +		rng = get_current_rng();
> +		if (IS_ERR(rng)) {
> +			err = PTR_ERR(rng);
>  			goto out;
>  		}
> -
> -		if (!current_rng) {
> +		if (!rng) {
>  			err = -ENODEV;
> -			goto out_unlock;
> +			goto out;
>  		}
>  
>  		mutex_lock(&reading_mutex);
>  		if (!data_avail) {
> -			bytes_read = rng_get_data(current_rng, rng_buffer,
> +			bytes_read = rng_get_data(rng, rng_buffer,
>  				rng_buffer_size(),
>  				!(filp->f_flags & O_NONBLOCK));
>  			if (bytes_read < 0) {
> @@ -200,7 +249,6 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
>  			ret += len;
>  		}
>  
> -		mutex_unlock(&rng_mutex);
>  		mutex_unlock(&reading_mutex);
>  
>  		if (need_resched())
> @@ -210,15 +258,16 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
>  			err = -ERESTARTSYS;

We need a put_rng() in this error path. Otherwise, hot-unplug will hang
at the end of hwrng_unregister():

|        /* Just in case rng is reading right now, wait. */
|        wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0);

Steps to reproduce the hang:
  guest) # dd if=/dev/hwrng of=/dev/null 
  cancel dd process after 10 seconds
  guest) # dd if=/dev/hwrng of=/dev/null &
  hotunplug rng device from qemu monitor
  result: device can't be removed (it can still be found in the QEMU monitor)


diff --git a/drivers/char/hw_random/core.c
b/drivers/char/hw_random/core.c
index 96fa067..4e22d70 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -258,6 +258,7 @@ static ssize_t rng_dev_read(struct file *filp,
char __user *buf,
 
                if (signal_pending(current)) {
                        err = -ERESTARTSYS;
+                       put_rng(rng);
                        goto out;
                }

>  			goto out;
>  		}
> +
> +		put_rng(rng);
>  	}
>  out:
>  	return ret ? : err;
> -out_unlock:
> -	mutex_unlock(&rng_mutex);
> -	goto out;
> +
>  out_unlock_reading:
>  	mutex_unlock(&reading_mutex);
> -	goto out_unlock;
> +	put_rng(rng);
> +	goto out;
>  }
>  
>  
> @@ -257,8 +306,8 @@ static ssize_t hwrng_attr_current_store(struct device *dev,
>  			err = hwrng_init(rng);
>  			if (err)
>  				break;
> -			hwrng_cleanup(current_rng);
> -			current_rng = rng;
> +			drop_current_rng();
> +			set_current_rng(rng);
>  			err = 0;
>  			break;
>  		}
> @@ -272,17 +321,15 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
>  				       struct device_attribute *attr,
>  				       char *buf)
>  {
> -	int err;
>  	ssize_t ret;
> -	const char *name = "none";
> +	struct hwrng *rng;
>  
> -	err = mutex_lock_interruptible(&rng_mutex);
> -	if (err)
> -		return -ERESTARTSYS;
> -	if (current_rng)
> -		name = current_rng->name;
> -	ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
> -	mutex_unlock(&rng_mutex);
> +	rng = get_current_rng();
> +	if (IS_ERR(rng))
> +		return PTR_ERR(rng);
> +
> +	ret = snprintf(buf, PAGE_SIZE, "%s\n", rng ? rng->name : "none");
> +	put_rng(rng);
>  
>  	return ret;
>  }
> @@ -357,12 +404,16 @@ static int hwrng_fillfn(void *unused)
>  	long rc;
>  
>  	while (!kthread_should_stop()) {
> -		if (!current_rng)
> +		struct hwrng *rng;
> +
> +		rng = get_current_rng();
> +		if (IS_ERR(rng) || !rng)
>  			break;
>  		mutex_lock(&reading_mutex);
> -		rc = rng_get_data(current_rng, rng_fillbuf,
> +		rc = rng_get_data(rng, rng_fillbuf,
>  				  rng_buffer_size(), 1);
>  		mutex_unlock(&reading_mutex);
> +		put_rng(rng);

^^^
This put_rng() caused a deadlock; I describe it at the bottom.
                

>  		if (rc <= 0) {
>  			pr_warn("hwrng: no data available\n");
>  			msleep_interruptible(10000);
> @@ -423,14 +474,13 @@ int hwrng_register(struct hwrng *rng)
>  		err = hwrng_init(rng);
>  		if (err)
>  			goto out_unlock;
> -		current_rng = rng;
> +		set_current_rng(rng);
>  	}
>  	err = 0;
>  	if (!old_rng) {
>  		err = register_miscdev();
>  		if (err) {
> -			hwrng_cleanup(rng);
> -			current_rng = NULL;
> +			drop_current_rng();
>  			goto out_unlock;
>  		}
>  	}
> @@ -457,22 +507,21 @@ EXPORT_SYMBOL_GPL(hwrng_register);
>  
>  void hwrng_unregister(struct hwrng *rng)
>  {
> -	int err;
> -
>  	mutex_lock(&rng_mutex);
>  
>  	list_del(&rng->list);
>  	if (current_rng == rng) {
> -		hwrng_cleanup(rng);
> -		if (list_empty(&rng_list)) {
> -			current_rng = NULL;
> -		} else {
> -			current_rng = list_entry(rng_list.prev, struct hwrng, list);
> -			err = hwrng_init(current_rng);
> -			if (err)
> -				current_rng = NULL;
> +		drop_current_rng();
> +		if (!list_empty(&rng_list)) {
> +			struct hwrng *tail;
> +
> +			tail = list_entry(rng_list.prev, struct hwrng, list);
> +
> +			if (hwrng_init(tail) == 0)
> +				set_current_rng(tail);
>  		}
>  	}
> +
>  	if (list_empty(&rng_list)) {
>  		unregister_miscdev();
>  		if (hwrng_fill)

Both hwrng_unregister() and put_rng() grab rng_mutex; if
hwrng_unregister() takes the lock first, hwrng_fillfn() will block in
put_rng() waiting for it.

Right now, kthread_stop() is inside the lock protection, but it tries to
wake up the fillfn thread and waits for its completion:

         |   wake_up_process(k);
         |   wait_for_completion(&kthread->exited);

The solution is to move kthread_stop() outside of the lock protection.


@@ -524,11 +525,11 @@ void hwrng_unregister(struct hwrng *rng)
 
        if (list_empty(&rng_list)) {
                unregister_miscdev();
+               mutex_unlock(&rng_mutex);
                if (hwrng_fill)
                        kthread_stop(hwrng_fill);
-       }
-
-       mutex_unlock(&rng_mutex);
+       } else
+               mutex_unlock(&rng_mutex);
 
        /* Just in case rng is reading right now, wait. */
        wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0);

================
After applying my two additional fixes, both the catting hang and the
hot-unplug issue are resolved.

| test 0:
|   hotunplug rng device from qemu monitor
| 
| test 1:
|   guest) # dd if=/dev/hwrng of=/dev/null &
|   hotunplug rng device from qemu monitor
| 
| test 2:
|   guest) # dd if=/dev/random of=/dev/null &
|   hotunplug rng device from qemu monitor
| 
| test 4:
|   guest) # dd if=/dev/hwrng of=/dev/null &
|   cat /sys/devices/virtual/misc/hw_random/rng_*
| 
| test 5:
|   guest) # dd if=/dev/hwrng of=/dev/null 
|   cancel dd process after 10 seconds
|   guest) # dd if=/dev/hwrng of=/dev/null &
|   hotunplug rng device from qemu monitor
|
| test 6:
|   use a fifo as rng backend, execute test 0 ~ 5 with no input of fifo

Tests all passed :-)

I know you are going on (or have already started) your holiday; I will
post a v2 with my additional patches.

Thanks, Amos

> diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
> index 914bb08cd738..c212e71ea886 100644
> --- a/include/linux/hw_random.h
> +++ b/include/linux/hw_random.h
> @@ -14,6 +14,7 @@
>  
>  #include <linux/types.h>
>  #include <linux/list.h>
> +#include <linux/kref.h>
>  
>  /**
>   * struct hwrng - Hardware Random Number Generator driver
> @@ -44,6 +45,7 @@ struct hwrng {
>  
>  	/* internal. */
>  	struct list_head list;
> +	struct kref ref;
>  };
>  
>  /** Register a new Hardware Random Number Generator driver. */
> -- 
> 1.9.1


* Re: [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes
  2014-09-18  2:43   ` Rusty Russell
  2014-09-18  2:48     ` [PATCH 1/5] hw_random: place mutex around read functions and buffers Rusty Russell
@ 2014-09-18 12:47     ` Amos Kong
  1 sibling, 0 replies; 20+ messages in thread
From: Amos Kong @ 2014-09-18 12:47 UTC (permalink / raw)
  To: Rusty Russell
  Cc: virtualization, kvm, herbert, m, mb, mpm, amit.shah,
	linux-kernel, Linus Torvalds

On Thu, Sep 18, 2014 at 12:13:08PM +0930, Rusty Russell wrote:
> Amos Kong <akong@redhat.com> writes:
> 
> > I started a QEMU (non-smp) guest with one virtio-rng device, and read
> > random data from /dev/hwrng by dd:
> >
> >  # dd if=/dev/hwrng of=/dev/null &
> >
> > In the same time, if I check hwrng attributes from sysfs by cat:
> >
> >  # cat /sys/class/misc/hw_random/rng_*
> >
> > The cat process always gets stuck with a slow backend (5 k/s); with
> > a quick backend (1.2 M/s), the cat process still takes 1 to 2
> > minutes. The stall doesn't occur on SMP guests.
> >
> > The read syscall enters the kernel and calls rng_dev_read() in user
> > context. We used need_resched() to check whether other tasks need
> > to run, but it almost always returns false, and we re-take the
> > mutex lock. The attribute-accessing process always fails to grab
> > the lock, so the cat gets stuck.
> >
> > User context doesn't allow other user contexts to run on that CPU
> > unless the kernel code sleeps for some reason. This is why
> > need_resched() always returns false here.
> >
> > This patch removes the need_resched() check and always schedules
> > other tasks, so they have a chance to take the lock and execute the
> > protected code.
 
Hi Rusty,

> OK, this is going to be a rant.
> 
> Your explanation doesn't make sense at all.  Worse, your solution breaks
> the advice of Kernighan & Plaugher: "Don't patch bad code - rewrite
> it.".
> 
> But worst of all, this detailed explanation might have convinced me you
> understood the problem better than I did, and applied your patch.
 
I'm sorry about the misleading explanation.

> I did some tests.  For me, as expected, the process spends its time
> inside the virtio rng read function, holding the mutex and thus blocking
> sysfs access; it's not a failure of this code at all.

Got it now.

The catting hang bug was found while I was trying to fix the hot-unplug
issue; the hot-unplug issue can't be reproduced if I try to debug with
gdb or printk. So I forgot to debug the cat hang ... and instead spent
time misunderstanding the scheduling code :(

> Your schedule_timeout() "fix" probably just helps by letting the host
> refresh entropy, so we spend less time waiting in the read fn.
> 
> I will post a series, which unfortunately is only lightly tested, then
> I'm going to have some beer to begin my holiday.  That may help me
> forget my disappointment at seeing respected fellow developers
> monkey-patching random code they don't understand.

I just posted a V2 with two additional fixes, hotunplugging works well now :)

> Grrr....

Enjoy your holiday!
Amos

> Rusty.
>
> > Signed-off-by: Amos Kong <akong@redhat.com>
> > ---
> >  drivers/char/hw_random/core.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> > index c591d7e..263a370 100644
> > --- a/drivers/char/hw_random/core.c
> > +++ b/drivers/char/hw_random/core.c
> > @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
> >  
> >  		mutex_unlock(&rng_mutex);
> >  
> > -		if (need_resched())
> > -			schedule_timeout_interruptible(1);
> > +		schedule_timeout_interruptible(1);
> >  
> >  		if (signal_pending(current)) {
> >  			err = -ERESTARTSYS;
> > -- 
> > 1.9.3

-- 
			Amos.


* Re: [PATCH 3/5] hw_random: fix unregister race.
  2014-09-18  2:48       ` [PATCH 3/5] hw_random: fix unregister race Rusty Russell
@ 2014-10-21 14:15         ` Herbert Xu
  2014-11-03 15:24           ` Amos Kong
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2014-10-21 14:15 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Amos Kong, virtualization, kvm, m, mb, mpm, amit.shah, linux-kernel

On Thu, Sep 18, 2014 at 12:18:24PM +0930, Rusty Russell wrote:
> The previous patch added one potential problem: we can still be
> reading from a hwrng when it's unregistered.  Add a wait for zero
> in the hwrng_unregister path.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  drivers/char/hw_random/core.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index dc9092a1075d..b4a21e9521cf 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex);
>  static DEFINE_MUTEX(reading_mutex);
>  static int data_avail;
>  static u8 *rng_buffer, *rng_fillbuf;
> +static DECLARE_WAIT_QUEUE_HEAD(rng_done);
>  static unsigned short current_quality;
>  static unsigned short default_quality; /* = 0; default to "off" */
>  
> @@ -98,6 +99,7 @@ static inline void cleanup_rng(struct kref *kref)
>  
>  	if (rng->cleanup)
>  		rng->cleanup(rng);
> +	wake_up_all(&rng_done);
>  }
>  
>  static void set_current_rng(struct hwrng *rng)
> @@ -529,6 +531,9 @@ void hwrng_unregister(struct hwrng *rng)
>  	}
>  
>  	mutex_unlock(&rng_mutex);
> +
> +	/* Just in case rng is reading right now, wait. */
> +	wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0);

While it's obviously better than what we have now, I don't believe
this is 100% safe as the cleanup function might still be running
even after the ref count hits zero.  Once we return from this function
the module may be unloaded so we need to ensure that nothing is
running at this point.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH 3/5] hw_random: fix unregister race.
  2014-10-21 14:15         ` Herbert Xu
@ 2014-11-03 15:24           ` Amos Kong
  0 siblings, 0 replies; 20+ messages in thread
From: Amos Kong @ 2014-11-03 15:24 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Rusty Russell, virtualization, kvm, m, mb, mpm, amit.shah, linux-kernel

On Tue, Oct 21, 2014 at 10:15:23PM +0800, Herbert Xu wrote:
> On Thu, Sep 18, 2014 at 12:18:24PM +0930, Rusty Russell wrote:
> > The previous patch added one potential problem: we can still be
> > reading from a hwrng when it's unregistered.  Add a wait for zero
> > in the hwrng_unregister path.
> > 
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > ---
> >  drivers/char/hw_random/core.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> > index dc9092a1075d..b4a21e9521cf 100644
> > --- a/drivers/char/hw_random/core.c
> > +++ b/drivers/char/hw_random/core.c
> > @@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex);
> >  static DEFINE_MUTEX(reading_mutex);
> >  static int data_avail;
> >  static u8 *rng_buffer, *rng_fillbuf;
> > +static DECLARE_WAIT_QUEUE_HEAD(rng_done);
> >  static unsigned short current_quality;
> >  static unsigned short default_quality; /* = 0; default to "off" */
> >  
> > @@ -98,6 +99,7 @@ static inline void cleanup_rng(struct kref *kref)
> >  
> >  	if (rng->cleanup)
> >  		rng->cleanup(rng);

        rng->cleanup_done = true;

> > +	wake_up_all(&rng_done);
> >  }
> >  
> >  static void set_current_rng(struct hwrng *rng)
> > @@ -529,6 +531,9 @@ void hwrng_unregister(struct hwrng *rng)
> >  	}
> >  
> >  	mutex_unlock(&rng_mutex);
> > +
> > +	/* Just in case rng is reading right now, wait. */
> > +	wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0);

Hi Rusty,
 
After initialization (kref_init()), the refcount is 1, so we need one
more kref_put() after each drop_current_rng() to release the last
reference; only then will the cleanup function be called.


> While it's obviously better than what we have now, I don't believe
> this is 100% safe as the cleanup function might still be running
> even after the ref count hits zero.  Once we return from this function
> the module may be unloaded so we need to ensure that nothing is
> running at this point.

I found that wait_event() can still pass and finish the unregister even
when the cleanup function hasn't run (wake_up_all() isn't called). So I
added a cleanup_done flag to indicate that the rng device has been
cleaned up.


+       /* Just in case rng is reading right now, wait. */
+       wait_event(rng_done, rng->cleanup_done &&
+                  atomic_read(&rng->ref.refcount) == 0);

I will post the new v4 later.
 
> Cheers,
> -- 
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

-- 
			Amos.


end of thread, other threads:[~2014-11-03 15:25 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-15 16:02 [PATCH v2 0/3] fix stuck in accessing hwrng attributes Amos Kong
2014-09-15 16:02 ` [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection Amos Kong
2014-09-15 16:13   ` Michael Büsch
2014-09-16  0:30     ` Amos Kong
2014-09-15 16:02 ` [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes Amos Kong
2014-09-18  2:43   ` Rusty Russell
2014-09-18  2:48     ` [PATCH 1/5] hw_random: place mutex around read functions and buffers Rusty Russell
2014-09-18  2:48       ` [PATCH 2/5] hw_random: use reference counts on each struct hwrng Rusty Russell
2014-09-18 12:22         ` Amos Kong
2014-09-18  2:48       ` [PATCH 3/5] hw_random: fix unregister race Rusty Russell
2014-10-21 14:15         ` Herbert Xu
2014-11-03 15:24           ` Amos Kong
2014-09-18  2:48       ` [PATCH 4/5] hw_random: don't double-check old_rng Rusty Russell
2014-09-18  2:48       ` [PATCH 5/5] hw_random: don't init list element we're about to add to list Rusty Russell
2014-09-18 12:47     ` [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes Amos Kong
2014-09-15 16:02 ` [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read() Amos Kong
2014-09-15 16:13   ` Michael Büsch
2014-09-16  0:27     ` Amos Kong
2014-09-16 15:01       ` Michael Büsch
2014-09-17  9:30 ` [PATCH v2 0/3] fix stuck in accessing hwrng attributes Herbert Xu
